Skip to main content
One endpoint for every AI tool. Bifrost handles the routing.
Bifrost is the OpenAI-compatible HTTP gateway that sits between every AI tool on the workstation and whichever provider eventually answers the call. It exposes http://localhost:30080/v1/chat/completions and fans out to OpenAI, Gemini, OpenRouter, and the local MLX server based on the task class.

Model routing conventions

Never hardcode model identifiers in committed config. Models change frequently; identifiers rot. Tools resolve task classes (Research, Coding, Review, Pre-commit) to a current model at call time via listmodels.
ContextFormat
Local MLX models through Bifrostmlx-local/<model> — Bifrost expects provider/model
Direct vllm-mlx on port 11434bare HuggingFace model ID — no prefix
Cloud models through Bifrostunprefixed — Bifrost routes by task class

Local-only mode

When localOnlyMode is enabled or the --local flag is passed, every request routes to the MLX inference server on port 11434. No cloud API calls occur. Verify the LaunchAgent is running before enabling local-only mode:
launchctl list | grep vllm-mlx

Priority in the AI gateway stack

Bifrost is the second layer in the gateway priority order:
  1. Anthropic official — Claude Code plugins, skills, patterns
  2. Bifrost AI gateway — multi-provider routing at localhost:30080
  3. Personal or custom — only when no alternative exists

Capabilities

Bifrost supports 23+ providers including OpenAI, Anthropic, AWS Bedrock, Google Vertex, and local inference servers. Key features:
  • Intelligent failover — transparent routing to a configured fallback when a provider is unavailable
  • Semantic caching — caches responses by semantic similarity, reducing cost and latency
  • MCP support — Model Context Protocol integration for multi-tool coordination
  • Prometheus metrics — built-in observability for latency, throughput, and cost tracking
Performance at scale: <100 µs gateway overhead at 5,000 RPS.

Deployment

Bifrost runs locally as a lightweight gateway process. Options:
npx bifrost@latest       # 30-second startup via NPX
Docker containers and a Go SDK are also available for embedded or orchestrated deployments.

See also

  • AI development pipeline — how Bifrost fits into the full model-routing pipeline
  • nix-ai — Nix package and config layer that manages the Bifrost process