Set a dollar limit per key, session, or user. Requests get rejected before they reach the provider.
# wrap any command
$ npx @f3d1/llmkit-cli -- python agent.py
... agent runs normally ...
claude-sonnet-4 $0.0847 1,204 in / 380 out cache saved $0.31
gpt-4.1-mini $0.0182 890 in / 241 out
Session total: $4.12 / $50.00 budget 14 reqs in 38s
11 tools for cost tracking inside your IDE. 5 work locally by reading Claude Code, Cursor, and Cline session data. No account needed.
Learn more ->
Budget enforcement that actually blocks requests. The reservation pattern: estimate cost before the request, reject if it would exceed the limit, settle with the actual cost after. Per-key and per-session limits.
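The reservation flow above can be sketched in a few lines. This is an illustrative Python sketch, not llmkit's actual code; the `Budget`, `reserve`, and `settle` names are assumptions:

```python
from dataclasses import dataclass

class BudgetExceeded(Exception):
    pass

@dataclass
class Budget:
    limit: float          # dollar cap for this key/session/user
    reserved: float = 0.0 # estimates for in-flight requests
    spent: float = 0.0    # settled actual cost

def reserve(budget: Budget, estimate: float) -> None:
    # Reject before the request ever reaches the provider.
    if budget.spent + budget.reserved + estimate > budget.limit:
        raise BudgetExceeded(f"would exceed ${budget.limit:.2f} limit")
    budget.reserved += estimate

def settle(budget: Budget, estimate: float, actual: float) -> None:
    # Release the reservation and record what the request really cost.
    budget.reserved -= estimate
    budget.spent += actual
```

Usage follows the three steps in order: `reserve` before calling the provider, make the call, then `settle` with the actual token cost. Counting in-flight reservations against the limit is what makes "exceeded means rejected" hold even with concurrent requests.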
Get started ->
Spend by model, provider, and session. Request log with full cost breakdown. API key management, budget configuration, anomaly detection.
Try it free ->
11 providers. 730+ models priced. Cache-aware pricing that tracks read and write tokens separately.
Anthropic: 29 models
OpenAI: 145 models
Google Gemini: 50 models
xAI Grok: 39 models
DeepSeek: 6 models
Groq: 37 models
Mistral: 63 models
Together: 105 models
Fireworks: 257 models
Ollama: local
OpenRouter: meta-gateway
Cost is reserved before the request is sent. Exceeding the budget means the request is rejected, not merely logged after the fact.
Prompt caching makes tokens up to 90% cheaper. We track cached and uncached separately.
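Cache-aware pricing means the four token classes bill at different rates. A minimal sketch of the arithmetic, assuming Anthropic's published multipliers (cache reads at 0.1x the input rate, cache writes at 1.25x); the function name and dict keys are illustrative:

```python
def request_cost(tokens: dict, price_in: float, price_out: float) -> float:
    """Dollar cost of one request; prices are per million tokens.

    Assumes cache reads bill at 0.1x the input rate and cache writes
    at 1.25x -- the multipliers Anthropic publishes for prompt caching.
    """
    per_tok_in = price_in / 1_000_000
    per_tok_out = price_out / 1_000_000
    return (
        tokens.get("input", 0) * per_tok_in
        + tokens.get("cache_read", 0) * per_tok_in * 0.10   # 90% cheaper
        + tokens.get("cache_write", 0) * per_tok_in * 1.25  # 25% premium
        + tokens.get("output", 0) * per_tok_out
    )
```

With a 10,000-token cached prompt at $3/M input, the cached read costs $0.003 instead of $0.03, which is why tracking cached and uncached tokens separately matters for the "cache saved" line in the session summary.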
MIT licensed. Self-host on Cloudflare Workers free tier. Your keys stay in your infra.
Free while in beta. No credit card.
MIT licensed. Built with Claude Code. Source on GitHub.