Ollama Provider

Run open-source models locally with Ollama for privacy and cost savings.

Prerequisites

  1. Install Ollama: ollama.ai
  2. Pull a model: ollama pull llama3.2
  3. Verify: ollama list
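You can also verify the server over HTTP instead of the CLI: Ollama exposes a REST API, and GET /api/tags returns the locally pulled models (the same information as ollama list). A minimal sketch, assuming the default port 11434; the function names here are illustrative, not part of any library:

```python
import json
from urllib.request import urlopen

def parse_tags(payload: dict) -> list:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def list_local_models(base_url: str = "http://localhost:11434") -> list:
    """Rough HTTP equivalent of `ollama list` (GET /api/tags)."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_tags(json.load(resp))
```

If `list_local_models()` raises a connection error, the server is not running (see Troubleshooting below).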

Configuration

[auth]
provider = "ollama"

[model]
provider_id = "ollama"
model = "llama3.2"
base_url = "http://localhost:11434/v1"
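The base_url above points at Ollama's OpenAI-compatible endpoint, so requests follow the standard chat-completions shape. A sketch of the request that gets built under this configuration (the helper name is hypothetical, not savfox internals):

```python
import json

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:11434/v1"):
    """Build the URL and JSON body for an OpenAI-style chat completion
    request against Ollama's /v1 compatibility endpoint."""
    url = f"{base_url}/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(body)
```

POSTing that body with Content-Type: application/json to the returned URL is all an OpenAI-style client does, which is why any such client can talk to a local Ollama server.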

Available Models

Model            Size          Description
llama3.2         3B/1B         Meta's latest Llama
llama3.1         8B/70B        Previous generation
qwen2.5          7B/14B/72B    Alibaba Qwen model
mistral          7B            Mistral AI model
codellama        7B/13B/34B    Code-focused Llama
deepseek-coder   6.7B          Code generation
phi3             3.8B          Microsoft's small model

Pull models:

ollama pull llama3.2
ollama pull mistral
ollama pull codellama

CLI Usage

# Use Llama 3.2
savfox -m ollama:llama3.2 exec "Task"

# Use with OSS flag
savfox --oss exec "Task"

Configuration Options

[model.ollama]
base_url = "http://localhost:11434/v1"
temperature = 0.7
num_ctx = 4096
num_gpu = 1
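num_ctx (context window length) and num_gpu (layers offloaded to the GPU) are Ollama runtime options; on Ollama's native API they travel in a per-request "options" object. A sketch of such a payload, assuming the native /api/generate endpoint; the builder function is illustrative:

```python
def build_generate_request(model: str, prompt: str,
                           num_ctx: int = 4096, num_gpu: int = 1,
                           temperature: float = 0.7) -> dict:
    """Payload for Ollama's native /api/generate endpoint. Runtime
    settings like num_ctx and num_gpu go in the "options" object."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete response instead of chunks
        "options": {
            "num_ctx": num_ctx,
            "num_gpu": num_gpu,
            "temperature": temperature,
        },
    }
```

Lowering num_ctx in this payload is the same lever the troubleshooting section suggests for out-of-memory errors.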

GPU Acceleration

Ollama automatically uses the GPU when one is available:

  • macOS: Metal (Apple Silicon)
  • Linux: CUDA (NVIDIA)
  • Windows: CUDA (NVIDIA)

Memory Requirements

Model Size   RAM Required
3B           8 GB
7B           16 GB
13B          32 GB
70B          128 GB+
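If you script model selection, the table above is easy to encode as a lookup. A small sketch (the figures are the approximate minimums from the table, and the helper name is illustrative):

```python
# Approximate minimum RAM per model size tier, from the table above.
RAM_BY_SIZE = {"3B": 8, "7B": 16, "13B": 32, "70B": 128}

def min_ram_gb(model_size: str) -> int:
    """Look up the rough minimum RAM (in GB) for a model size tier."""
    try:
        return RAM_BY_SIZE[model_size]
    except KeyError:
        raise ValueError(f"unknown size tier: {model_size}") from None
```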

Troubleshooting

Connection refused

  1. Ensure Ollama is running: ollama serve
  2. Check that base_url in your configuration matches the server address
  3. Verify the port (default: 11434)
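Step 3 can be automated with a quick TCP probe before digging deeper. A minimal sketch, assuming the default host and port; the function name is illustrative:

```python
import socket

def ollama_reachable(host: str = "localhost", port: int = 11434,
                     timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections on the
    Ollama port; False on refusal, timeout, or resolution failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result means the problem is at the network level (server not started, wrong port, or firewall), not in the model configuration.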

Out of memory

  1. Use a smaller model
  2. Reduce num_ctx
  3. Close other applications

Slow responses

  1. Enable GPU acceleration
  2. Use a smaller model
  3. Reduce context length