Provider support
Your model. Your rules.
Santra is provider-agnostic. Any server that speaks the OpenAI chat-completions protocol works. Set one environment variable and you're connected.
First-class support
OpenAI-compatible
Any server implementing POST /v1/chat/completions works with Santra. Set OPENAI_BASE_URL to your server's base URL.
Groq · Together AI · Fireworks AI · Perplexity · Anyscale · DeepInfra · Mistral AI · Cohere · Ollama · LM Studio · vLLM · text-generation-webui
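Unsure whether a server qualifies? One curl settles it. Here's a minimal probe of the chat-completions endpoint; the model name is a placeholder, so use one your server actually hosts:
# If this returns a completion, Santra can talk to the server
$ curl $OPENAI_BASE_URL/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "your-model", "messages": [{"role": "user", "content": "ping"}]}'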
Nvidia NIM — on-prem inference
Nvidia NIM lets you deploy frontier models on your own hardware behind an OpenAI-compatible API. Ideal for data sovereignty and regulated environments.
# Nvidia-hosted (cloud)
$ export OPENAI_BASE_URL=https://integrate.api.nvidia.com/v1
$ export OPENAI_API_KEY=nvapi-...
$ export SANTRA_MODEL=meta/llama-3.1-405b-instruct
# Self-hosted NIM
$ export OPENAI_BASE_URL=http://nim.corp.internal:8000/v1
$ export OPENAI_API_KEY=your-nim-key
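Not sure which model IDs your deployment serves? The standard OpenAI models endpoint lists them, and the id each entry returns is what goes into SANTRA_MODEL:
# Works against both the hosted and self-hosted endpoints
$ curl -H "Authorization: Bearer $OPENAI_API_KEY" $OPENAI_BASE_URL/models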
Fully offline — Ollama / LM Studio
No internet required for inference. Run locally and nothing leaves your machine.
Ollama
$ ollama pull llama3.1:70b
$ export OPENAI_BASE_URL=http://localhost:11434/v1
# Ollama doesn't check the key; any non-empty value works
$ export OPENAI_API_KEY=ollama
# Match the tag you pulled
$ export SANTRA_MODEL=llama3.1:70b
LM Studio
$ export OPENAI_BASE_URL=http://localhost:1234/v1
$ export OPENAI_API_KEY=lm-studio
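Both local servers implement the standard models endpoint, so you can confirm what's loaded before pointing Santra at it (swap in port 11434 for Ollama):
# Lists the models the local server currently offers
$ curl http://localhost:1234/v1/models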