Provider support

Your model. Your rules.

Santra is provider-agnostic. Any server that speaks the OpenAI chat-completions protocol works. Set one environment variable and you're connected.
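As a sketch, connecting Santra to a hosted endpoint is three exports (the variable names are the ones used throughout this page; the URL and model name are illustrative):

```shell
# OPENAI_BASE_URL selects the server, OPENAI_API_KEY authenticates,
# and SANTRA_MODEL picks the model. URL and model are examples only.
$ export OPENAI_BASE_URL=https://api.openai.com/v1
$ export OPENAI_API_KEY=sk-...
$ export SANTRA_MODEL=gpt-4o
```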

First-class support

Anthropic (recommended)
Claude models have excellent code comprehension and instruction following. claude-sonnet-4-6 is the default recommended model.
claude-opus-4-7 · claude-sonnet-4-6 · claude-haiku-4-5
OpenAI
GPT-4o and o1 series with strong general coding capabilities.
gpt-4o · gpt-4o-mini · o1-mini
Nvidia NIM (enterprise)
OpenAI-compatible endpoint for Nvidia-hosted or self-hosted LLaMA, Mistral, and Nemotron models.
meta/llama-3.1-405b-instruct · mistralai/mistral-large · nvidia/nemotron-4-340b

OpenAI-compatible

Any server implementing POST /v1/chat/completions works with Santra. Point OPENAI_BASE_URL at your server's base URL and you're done.

Groq · Together AI · Fireworks AI · Perplexity · Anyscale · DeepInfra · Mistral AI · Cohere · Ollama · LM Studio · vLLM · text-generation-webui
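Before wiring a server into Santra, you can smoke-test its compatibility by sending one chat-completion request directly (a sketch; it assumes OPENAI_BASE_URL and OPENAI_API_KEY are already exported, and the model name must be one your server actually serves):

```shell
# One request against the standard chat-completions route.
# A compatible server returns a JSON body with a "choices" array.
$ curl -s "$OPENAI_BASE_URL/chat/completions" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Say hi"}]}'
```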

Nvidia NIM — on-prem inference

Nvidia NIM lets you deploy frontier models on your own hardware behind an OpenAI-compatible API surface. Ideal for data-sovereignty requirements and regulated environments.

# Nvidia-hosted (cloud)
$ export OPENAI_BASE_URL=https://integrate.api.nvidia.com/v1
$ export OPENAI_API_KEY=nvapi-...
$ export SANTRA_MODEL=meta/llama-3.1-405b-instruct

# Self-hosted NIM
$ export OPENAI_BASE_URL=http://nim.corp.internal:8000/v1
$ export OPENAI_API_KEY=your-nim-key

Fully offline — Ollama / LM Studio

No internet required for inference. Run locally and nothing leaves your machine.

Ollama
$ ollama pull llama3.1:70b
$ export OPENAI_BASE_URL=http://localhost:11434/v1
$ export OPENAI_API_KEY=ollama
LM Studio
$ export OPENAI_BASE_URL=http://localhost:1234/v1
$ export OPENAI_API_KEY=lm-studio