Open source · OpenClaw, Codex, Claude + any MCP client

You're paying for the wrong model.

Smart Spawn scores every model against 5 benchmark sources and routes to the one that fits your task and budget.

smart-spawn
Claude Opus 4.6 / DeepSeek R1 / GPT-5 / Gemini 2.5 Flash / Kimi K2.5 / Llama 4 Maverick / Mistral Large / Claude Sonnet 4.5 / Qwen 3 235B / Grok 3 / Gemini 2.5 Pro / Command R+

Four ways to route

Different problems need different strategies.

Single

One task. Best model. Done.

Task ——→ [Score] ——→ Best Model

Describe what you need. Smart Spawn scores every model in your budget tier for that category and picks the winner. Coding tasks get coders. Research gets readers.
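A minimal sketch of what single-mode routing might look like. The model names, tiers, and scores below are illustrative placeholders, not Smart Spawn's real catalog or API:

```python
# Illustrative catalog: names, tiers, and scores are made up, not real data.
CATALOG = [
    {"name": "deepseek-r1",      "tier": "budget",  "scores": {"coding": 0.82, "research": 0.70}},
    {"name": "gemini-2.5-flash", "tier": "budget",  "scores": {"coding": 0.74, "research": 0.81}},
    {"name": "gpt-5",            "tier": "premium", "scores": {"coding": 0.91, "research": 0.88}},
]

def route_single(category: str, budget: str) -> str:
    """Pick the highest-scoring model for this category within the budget tier."""
    eligible = [m for m in CATALOG if m["tier"] == budget]
    best = max(eligible, key=lambda m: m["scores"].get(category, 0.0))
    return best["name"]
```

On this toy catalog, a budget coding task routes to `deepseek-r1` while a budget research task routes to `gemini-2.5-flash` — same tier, different winners per category.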

Collective

Cheap models, expensive results.

         ┌──→ Model A ──┐
Task ────├──→ Model B ──┤──→ Merge
         └──→ Model C ──┘

Same prompt, three diverse models, parallel execution. Merge the best parts. Budget models brainstorming together regularly match a premium model on creative work, at about 1/50th the price.
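Sketched in Python, collective mode is a fan-out/merge. `call_model` here is a stub standing in for real provider API calls:

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, prompt: str) -> str:
    # Stub: a real implementation would call each provider's API here.
    return f"[{model}] draft for: {prompt}"

def collective(prompt: str, models: list[str]) -> list[str]:
    # Fan the same prompt out to diverse models in parallel...
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        drafts = list(pool.map(lambda m: call_model(m, prompt), models))
    # ...then merge. In practice a merge step combines the best parts;
    # this sketch simply returns all drafts.
    return drafts
```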

Cascade

Start cheap. Escalate only if you have to.

Task ──→ Cheap ──→ Good?
                    ├── Yes ─→ Done (saved 90%)
                    └── No ──→ Premium ─→ Done

The $0.10 model goes first. If it handles the task (and it usually does), you save 90%+. If not, premium takes over. Most routine tasks never need to escalate.
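A cascade is a few lines of control flow. What counts as "good" depends on your quality check; everything below is a hedged sketch with stub models:

```python
from typing import Callable

def cascade(task: str,
            cheap: Callable[[str], str],
            premium: Callable[[str], str],
            is_good: Callable[[str], bool]) -> tuple[str, str]:
    """Try the cheap model first; escalate to premium only on failure."""
    draft = cheap(task)
    if is_good(draft):
        return draft, "budget"       # premium call avoided
    return premium(task), "premium"  # escalation path
```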

Swarm

Big problems, decomposed.

        ┌── Research ─→ Gemini Flash
Task ───├── Code ─────→ DeepSeek R1
        ├── Design ───→ Claude Sonnet
        └── Review ───→ GPT-5

Break a complex project into a dependency graph of subtasks. Each piece gets the model that scores highest for that specific job. Research to a context specialist, code to a code machine, review to a reasoning engine.
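The dependency graph above can be sketched with the standard library's topological sorter. The subtask-to-model assignments mirror the diagram and are illustrative, not Smart Spawn's actual output:

```python
from graphlib import TopologicalSorter

# Each subtask lists the subtasks it depends on.
DEPS = {
    "research": [],
    "code":     ["research"],
    "design":   ["research"],
    "review":   ["code", "design"],
}

# Illustrative assignments: each subtask goes to the model that
# scores highest for that category.
ASSIGN = {
    "research": "gemini-2.5-flash",
    "code":     "deepseek-r1",
    "design":   "claude-sonnet-4.5",
    "review":   "gpt-5",
}

# Run subtasks in dependency order, each with its assigned model.
plan = [(task, ASSIGN[task]) for task in TopologicalSorter(DEPS).static_order()]
```

Research runs first, code and design can run in parallel once it finishes, and review waits for both.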

Cheap models need better prompts

A $0.10 model with a well-structured expert prompt regularly outperforms a $15 model with a lazy one. Smart Spawn builds that prompt for you.

You provide
task: "Build a billing dashboard"
persona: fullstack-engineer
stack: [nextjs, typescript, stripe]
domain: saas
Smart Spawn builds

Expert system prompt with role context, stack conventions, domain knowledge, output format constraints, and guardrails.

15+ personas, 30+ stack blocks, 8 domains, configurable guardrails. The API composes them into a single prompt before routing.
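Prompt composition can be as simple as concatenating blocks. The block text below is invented for illustration; Smart Spawn's actual persona, stack, and domain blocks differ:

```python
# Invented block libraries, standing in for the real persona/stack/domain blocks.
PERSONAS = {"fullstack-engineer": "You are a senior full-stack engineer."}
STACKS = {
    "nextjs":     "Use Next.js App Router conventions.",
    "typescript": "All code in strict TypeScript.",
    "stripe":     "Use Stripe's official SDK for billing.",
}
DOMAINS = {"saas": "Context: a multi-tenant SaaS product."}

def compose_prompt(persona: str, stack: list[str], domain: str,
                   guardrails: list[str]) -> str:
    """Concatenate persona, stack, domain, and guardrail blocks into one system prompt."""
    parts = [PERSONAS[persona], *(STACKS[s] for s in stack),
             DOMAINS[domain], *guardrails]
    return "\n".join(parts)
```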

This is why collective mode works so well with budget models. Three cheap agents with expert-crafted prompts, brainstorming in parallel, routinely match a single premium model on creative and architectural tasks.

Real numbers

6,000 requests · 168M input tokens · 15M output tokens
Without routing (Opus 4.6) $2,550/mo
With Smart Spawn $600/mo
You save $1,950/mo (~76%)

Based on power-user workload. Routing splits 15-20% to premium models, remainder to cost-optimized alternatives. All pricing from published API rates.
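The arithmetic behind the headline numbers, for anyone who wants to check it:

```python
without_routing = 2550   # $/mo, everything on the premium model
with_routing    = 600    # $/mo, 15-20% premium, rest cost-optimized
saved = without_routing - with_routing   # $1,950/mo
pct   = saved / without_routing * 100    # ~76.5%
```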

~80%
Tasks resolve at budget tier
Cascade catches the 20% that actually need premium. The rest? A $0.10 model handles it.
50×
Cheaper for brainstorming
3 budget models collectively ($0.30/M) vs 1 premium ($15/M). The collective wins on creative work.
6hr
Benchmark refresh
New models ship weekly. Scores go stale. Smart Spawn re-pulls from all 5 sources every 6 hours.

How scoring works

Five benchmark sources. Z-score normalization. Category-specific weighting.

Data Sources

OpenRouter
Model catalog, pricing, capabilities, context lengths
Artificial Analysis
Intelligence, coding, and math quality indices
HuggingFace Leaderboard
MMLU, BBH, and academic benchmarks
LMArena (Chatbot Arena)
ELO ratings from human preference battles
LiveBench
Contamination-free coding and reasoning scores

The Pipeline

01 Normalize

Different benchmarks use different scales. An "intelligence index" of 65 means something completely different from an Arena ELO of 1350. Everything gets z-score normalized, so 2σ above average on any benchmark maps to the same score.
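Z-score normalization fits in a few lines, here assuming population statistics over the tracked models:

```python
from statistics import mean, pstdev

def z_normalize(raw: dict[str, float]) -> dict[str, float]:
    """Map raw benchmark scores to 'standard deviations above average'."""
    mu, sigma = mean(raw.values()), pstdev(raw.values())
    return {model: (score - mu) / sigma for model, score in raw.items()}
```

After this step a model 2σ above the pack scores 2.0 whether the source scale was a 0-100 quality index or a four-digit ELO.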

02 Categorize

Models get scored per category: coding, reasoning, creative, vision, research, general. Coding benchmarks carry more weight for coding tasks, creativity benchmarks for creative ones. Each task type gets its own ranking.
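Category scoring is then a weighted sum over the normalized benchmarks. The benchmark names and weights below are invented for illustration, not the shipped values:

```python
# Illustrative per-category weights; real values are Smart Spawn's own.
WEIGHTS = {
    "coding":   {"livebench_coding": 0.5, "aa_coding": 0.3, "arena_elo": 0.2},
    "creative": {"arena_elo": 0.6, "aa_intelligence": 0.4},
}

def category_score(z: dict[str, float], category: str) -> float:
    """Weighted sum of one model's z-scores for a task category."""
    return sum(w * z.get(bench, 0.0) for bench, w in WEIGHTS[category].items())
```

A model with strong coding z-scores but an average Arena ELO ranks high for coding tasks and middling for creative ones.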

03 Blend

Final score = benchmarks + your personal feedback + community ratings + context signals. Your ratings feed back into future picks.
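The blend itself might look like a convex combination of the four signals. The weights here are assumptions for the sketch, not the shipped values:

```python
def blend(benchmark: float, personal: float, community: float, context: float,
          weights: tuple[float, float, float, float] = (0.6, 0.2, 0.1, 0.1)) -> float:
    """Convex combination of the four scoring signals (weights are assumed)."""
    signals = (benchmark, personal, community, context)
    return sum(w * s for w, s in zip(weights, signals))
```

Because the weights sum to 1, the blended score stays on the same scale as its inputs, and your personal feedback nudges future picks without overriding the benchmarks.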

Live from the API

Real numbers. Right now.


Try it yourself

Describe a task, pick a budget, see what comes back. No signup, no API key.

Three ways in

OpenClaw plugin, lightweight skill, or MCP server. All talk to the same scoring engine.

Install
openclaw plugins install @deeflectcom/smart-spawn
Config (optional)
# openclaw.yaml
plugins:
  smart-spawn:
    budget: medium
    mode: single

Full tool integration for OpenClaw. Your agent gets smart_spawn as a native command with all spawn modes, feedback loops, and local scoring.

Ready to stop picking models by hand?

One command. No config. Your agent starts picking the right model immediately.