claude-router — auto-route Claude API calls to the cheapest capable model

Why it exists

Your client pins one model, so you pay one price for everything — including the "rename this variable" turns. claude-router prices each request for what it actually needs.

◆Zero code changes

Point any Anthropic-compatible app — Claude Code, Cursor, Cline, your own SDK code — at the proxy via ANTHROPIC_BASE_URL. Or import it as a TypeScript library.

◆Measurable savings

Cost math includes prompt-cache pricing. Exact cents saved per call, lifetime totals with claude-router stats, and a live dashboard.

◆Smart classification

A 0-ms heuristic scores each prompt. Ambiguous cases get confirmed by Haiku itself for ~$0.00004 — cached, time-boxed, and never able to block a request.

◆Self-healing

Truncated or refused responses auto-retry one tier up. Rate-limited tiers fall forward to the next. Failures degrade gracefully, never fatally.

◆Cross-platform install

One command on Windows, macOS, or Linux: background daemon, login autostart, env var, Claude Code statusline — every ✓ verified against a real health check.

◆Local & private

Binds to 127.0.0.1 only. Your keys never leave your machine — the proxy forwards whatever credentials your app already sends.

How routing works

Every request gets a difficulty score from 0 to 100. The score decides the tier; thresholds are tunable in the config.

Haiku · score < 30

Sonnet · 30 – 70

Opus · > 70

03070100

− simple verbs (translate, list, format) + complex verbs (architect, prove, design) + math & science signals + code blocks + tool use & images + long context & deep conversations + expert system prompts

Scores in the ambiguous middle band (40–60) — or prompts the heuristic can't read, like non-English text — get a second opinion from Haiku itself: a 1–3 complexity rating costing ~$0.00004, cached so repeats are free, with a 1.5s timeout that falls back to the heuristic. After the response, two safety nets: truncated or refused answers retry one tier up; rate-limited tiers fall forward.

Get started in 60 seconds

1Install and set up (starts the proxy, wires up autostart + ANTHROPIC_BASE_URL):

npm install -g @sheruq/claude-router
claude-router install --force-route

2Open a new terminal and use Claude Code normally:

claude

3Watch the savings:

claude-router stats        # lifetime savings + per-day breakdown
claude-router status       # health and routing state
claude-router doctor       # diagnose setup problems

--force-route routes every request by difficulty even when the client pins a model (Claude Code always does). The proxy adapts model-specific parameters to the routed tier, so nothing breaks.

The economics

Model	Input $/1M	Output $/1M
Claude Haiku 4.5	$1.00	$5.00
Claude Sonnet 5	$3.00	$15.00
Claude Opus 4.8	$5.00	$25.00

Most coding-session turns are simple — file edits, short questions, formatting. Haiku output costs 3× less than Sonnet and 5× less than Opus. Routing the easy majority down while keeping Opus for the genuinely hard questions is where the savings come from — and every response header tells you exactly what you saved.

Or use it as a library

Prefer routing inside your own app? Same engine, no proxy.

npm install @sheruq/claude-router

import { createRouter } from '@sheruq/claude-router';

const router = createRouter({ apiKey: process.env.ANTHROPIC_API_KEY });

const res = await router.send({
  messages: [{ role: 'user', content: 'Translate to French: Hello' }],
  max_tokens: 100,
});

res.meta.tier;        // 'haiku'
res.meta.savedCents;  // 1.2