A local proxy that scores every request's difficulty and routes it to the cheapest Claude model that can handle it — Haiku, Sonnet, or Opus. Every response reports exactly how much you saved.
Get started npm ↗[claude-router] → haiku (heuristic, 0ms) | cost: $0.0010 | saved: $0.0030 vs claude-sonnet-5 [claude-router] → sonnet (heuristic, 0ms) | cost: $0.0180 | saved: $0.0000 vs claude-sonnet-5 [claude-router] → opus (hybrid, 48ms) | cost: $0.0890 | extra: $0.0740 vs claude-sonnet-5 [claude-router] → haiku (ai, 0ms, cached) | cost: $0.0008 | saved: $0.0034 vs claude-sonnet-5
Your client pins one model, so you pay one price for everything — including the "rename this variable" turns. claude-router prices each request for what it actually needs.
Point any Anthropic-compatible app — Claude Code, Cursor, Cline, your own SDK code — at the proxy via ANTHROPIC_BASE_URL. Or import it as a TypeScript library.
Cost math includes prompt-cache pricing. Exact cents saved per call, lifetime totals with claude-router stats, and a live dashboard.
A 0-ms heuristic scores each prompt. Ambiguous cases get confirmed by Haiku itself for ~$0.00004 — cached, time-boxed, and never able to block a request.
Truncated or refused responses auto-retry one tier up. Rate-limited tiers fall forward to the next. Failures degrade gracefully, never fatally.
One command on Windows, macOS, or Linux: background daemon, login autostart, env var, Claude Code statusline — every ✓ verified against a real health check.
Binds to 127.0.0.1 only. Your keys never leave your machine — the proxy forwards whatever credentials your app already sends.
Every request gets a difficulty score from 0 to 100. The score decides the tier; thresholds are tunable in the config.
Scores in the ambiguous middle band (40–60) — or prompts the heuristic can't read, like non-English text — get a second opinion from Haiku itself: a 1–3 complexity rating costing ~$0.00004, cached so repeats are free, with a 1.5s timeout that falls back to the heuristic. After the response, two safety nets: truncated or refused answers retry one tier up; rate-limited tiers fall forward.
1Install and set up (starts the proxy, wires up autostart + ANTHROPIC_BASE_URL):
npm install -g @sheruq/claude-router
claude-router install --force-route2Open a new terminal and use Claude Code normally:
claude3Watch the savings:
claude-router stats # lifetime savings + per-day breakdown
claude-router status # health and routing state
claude-router doctor # diagnose setup problems--force-route routes every request by difficulty even when the client pins a model (Claude Code always does). The proxy adapts model-specific parameters to the routed tier, so nothing breaks.
| Model | Input $/1M | Output $/1M |
|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 |
| Claude Sonnet 5 | $3.00 | $15.00 |
| Claude Opus 4.8 | $5.00 | $25.00 |
Most coding-session turns are simple — file edits, short questions, formatting. Haiku output costs 3× less than Sonnet and 5× less than Opus. Routing the easy majority down while keeping Opus for the genuinely hard questions is where the savings come from — and every response header tells you exactly what you saved.
Prefer routing inside your own app? Same engine, no proxy.
npm install @sheruq/claude-routerimport { createRouter } from '@sheruq/claude-router';
const router = createRouter({ apiKey: process.env.ANTHROPIC_API_KEY });
const res = await router.send({
messages: [{ role: 'user', content: 'Translate to French: Hello' }],
max_tokens: 100,
});
res.meta.tier; // 'haiku'
res.meta.savedCents; // 1.2