Ollama Provider

Run open-source models locally with Ollama for privacy and cost savings.

Prerequisites

  1. Install Ollama: ollama.ai
  2. Pull a model: ollama pull llama3.2
  3. Verify: ollama list
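You can also verify the server over HTTP instead of the CLI: Ollama exposes a REST API, and GET /api/tags returns the locally pulled models (the same information as ollama list). A minimal sketch, assuming the default port 11434; the function names here are illustrative, not part of any library:

```python
import json
from urllib.request import urlopen

def parse_tags(payload: dict) -> list:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def list_local_models(base_url: str = "http://localhost:11434") -> list:
    """Rough HTTP equivalent of `ollama list` (GET /api/tags)."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_tags(json.load(resp))
```

If `list_local_models()` raises a connection error, the server is not running (see Troubleshooting below).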

Configuration

[auth]
provider = "ollama"

[model]
provider_id = "ollama"
model = "llama3.2"
base_url = "http://localhost:11434/v1"
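The base_url above points at Ollama's OpenAI-compatible endpoint, so requests follow the standard chat-completions shape. A sketch of the request that gets built under this configuration (the helper name is hypothetical, not savfox internals):

```python
import json

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:11434/v1"):
    """Build the URL and JSON body for an OpenAI-style chat completion
    request against Ollama's /v1 compatibility endpoint."""
    url = f"{base_url}/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(body)
```

POSTing that body with Content-Type: application/json to the returned URL is all an OpenAI-style client does, which is why any such client can talk to a local Ollama server.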

Available Models

Model            Size          Description
llama3.2         3B/1B         Meta's latest Llama
llama3.1         8B/70B        Previous generation
qwen2.5          7B/14B/72B    Alibaba Qwen model
mistral          7B            Mistral AI model
codellama        7B/13B/34B    Code-focused Llama
deepseek-coder   6.7B          Code generation
phi3             3.8B          Microsoft's small model

Pull models:

ollama pull llama3.2
ollama pull mistral
ollama pull codellama

CLI Usage

# Use Llama 3.2
savfox -m ollama:llama3.2 exec "Task"

# Use with OSS flag
savfox --oss exec "Task"

Configuration Options

[model.ollama]
base_url = "http://localhost:11434/v1"
temperature = 0.7
num_ctx = 4096
num_gpu = 1
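num_ctx (context window length) and num_gpu (layers offloaded to the GPU) are Ollama runtime options; on Ollama's native API they travel in a per-request "options" object. A sketch of such a payload, assuming the native /api/generate endpoint; the builder function is illustrative:

```python
def build_generate_request(model: str, prompt: str,
                           num_ctx: int = 4096, num_gpu: int = 1,
                           temperature: float = 0.7) -> dict:
    """Payload for Ollama's native /api/generate endpoint. Runtime
    settings like num_ctx and num_gpu go in the "options" object."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete response instead of chunks
        "options": {
            "num_ctx": num_ctx,
            "num_gpu": num_gpu,
            "temperature": temperature,
        },
    }
```

Lowering num_ctx in this payload is the same lever the troubleshooting section suggests for out-of-memory errors.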

GPU Acceleration

Ollama automatically uses the GPU when one is available:

  • macOS: Metal (Apple Silicon)
  • Linux: CUDA (NVIDIA)
  • Windows: CUDA (NVIDIA)

Memory Requirements

Model Size   RAM Required
3B           8 GB
7B           16 GB
13B          32 GB
70B          128 GB+
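If you script model selection, the table above is easy to encode as a lookup. A small sketch (the figures are the approximate minimums from the table, and the helper name is illustrative):

```python
# Approximate minimum RAM per model size tier, from the table above.
RAM_BY_SIZE = {"3B": 8, "7B": 16, "13B": 32, "70B": 128}

def min_ram_gb(model_size: str) -> int:
    """Look up the rough minimum RAM (in GB) for a model size tier."""
    try:
        return RAM_BY_SIZE[model_size]
    except KeyError:
        raise ValueError(f"unknown size tier: {model_size}") from None
```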

Troubleshooting

Connection refused

  1. Ensure Ollama is running: ollama serve
  2. Check that base_url in your configuration matches the server address
  3. Verify the port (default: 11434)
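Step 3 can be automated with a quick TCP probe before digging deeper. A minimal sketch, assuming the default host and port; the function name is illustrative:

```python
import socket

def ollama_reachable(host: str = "localhost", port: int = 11434,
                     timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections on the
    Ollama port; False on refusal, timeout, or resolution failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result means the problem is at the network level (server not started, wrong port, or firewall), not in the model configuration.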

Out of memory

  1. Use a smaller model
  2. Reduce num_ctx
  3. Close other applications

Slow responses

  1. Enable GPU acceleration
  2. Use a smaller model
  3. Reduce context length