Smart Spawn scores every model against five benchmark sources and routes to the one that fits your task and budget.
Different problems need different strategies.
One task. Best model. Done.
Describe what you need. Smart Spawn scores every model in your budget tier for that category and picks the winner. Coding tasks get coders. Research gets readers.
Cheap models, expensive results.
Same prompt, three diverse models, parallel execution. Merge the best parts. Budget models brainstorming together regularly match premium on creative work. For about 1/50th the price.
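A rough sketch of that fan-out, assuming a hypothetical callModel helper and placeholder model IDs rather than Smart Spawn's published API:

// Hypothetical sketch of collective mode: send the same prompt to three budget
// models in parallel, then run a merge pass over the drafts.
// `callModel` and the model IDs are placeholders, not the real API.
async function collective(
  prompt: string,
  callModel: (model: string, prompt: string) => Promise<string>
): Promise<string> {
  const models = ["budget-a", "budget-b", "budget-c"];
  const drafts = await Promise.all(models.map((m) => callModel(m, prompt)));
  // Merge step: combine the strongest parts of each draft into one answer.
  return callModel(models[0], "Merge the best parts of these drafts:\n\n" + drafts.join("\n---\n"));
}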
Start cheap. Escalate only if you have to.
The $0.10 model goes first. If it handles it (and it usually does), you've saved 90%+. If not, premium takes over. Most routine tasks never need to escalate.
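A minimal sketch of that escalation loop, again with hypothetical helpers (callModel, looksGood) standing in for the real plumbing:

// Hypothetical sketch of escalation mode: cheap model first, premium only when
// a quality check fails. Helper names and model IDs are placeholders.
async function escalate(
  prompt: string,
  callModel: (model: string, prompt: string) => Promise<string>,
  looksGood: (answer: string) => boolean
): Promise<string> {
  const cheap = await callModel("budget-model", prompt); // ~$0.10 tier goes first
  if (looksGood(cheap)) return cheap;                    // most routine tasks stop here
  return callModel("premium-model", prompt);             // escalate only on failure
}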
Big problems, decomposed.
Break a complex project into a dependency graph of subtasks. Each piece gets the model that scores highest for that specific job. Research to a context specialist, code to a code machine, review to a reasoning engine.
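One way to picture that graph, with a hypothetical bestModelFor function standing in for the per-category ranking; the names and structure here are illustrative:

// Hypothetical sketch of decomposed mode: a dependency graph of subtasks, each
// assigned the top-ranked model for its own category.
type Category = "research" | "coding" | "reasoning";
type Subtask = { id: string; category: Category; dependsOn: string[] };

const graph: Subtask[] = [
  { id: "gather-requirements", category: "research",  dependsOn: [] },
  { id: "implement-dashboard", category: "coding",    dependsOn: ["gather-requirements"] },
  { id: "review-architecture", category: "reasoning", dependsOn: ["implement-dashboard"] },
];

function assignModels(bestModelFor: (category: Category) => string) {
  // Each node is routed to whichever model scores highest for its category.
  return graph.map((task) => ({ ...task, model: bestModelFor(task.category) }));
}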
A $0.10 model with a well-structured expert prompt regularly outperforms a $15 model with a lazy one. Smart Spawn builds that prompt for you.
task: "Build a billing dashboard"
persona: fullstack-engineer
stack: [nextjs, typescript, stripe]
domain: saas
Expert system prompt with role context, stack conventions, domain knowledge, output format constraints, and guardrails.
15+ personas, 30+ stack blocks, 8 domains, configurable guardrails. The API composes them into a single prompt before routing.
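A toy version of that composition step, using made-up block text rather than the library's real persona, stack, and domain templates:

// Hypothetical sketch of prompt composition: persona, stack, and domain blocks
// joined into one expert system prompt before routing. Block text is illustrative.
const personas = { "fullstack-engineer": "You are a senior fullstack engineer." };
const stacks = {
  nextjs: "Follow Next.js App Router conventions.",
  typescript: "Use strict TypeScript.",
  stripe: "Integrate payments through Stripe's official SDK.",
};
const domains = { saas: "Assume a multi-tenant SaaS product." };

function composePrompt(
  persona: keyof typeof personas,
  stack: (keyof typeof stacks)[],
  domain: keyof typeof domains
): string {
  return [personas[persona], ...stack.map((s) => stacks[s]), domains[domain]].join("\n");
}

// composePrompt("fullstack-engineer", ["nextjs", "typescript", "stripe"], "saas")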
This is why collective mode works so well on budget models. Three cheap agents with expert-crafted prompts, brainstorming in parallel, routinely match a single premium model on creative and architectural tasks.
Based on a power-user workload. Routing sends 15-20% of tasks to premium models and the rest to cost-optimized alternatives. All pricing is from published API rates.
Five benchmark sources. Z-score normalization. Category-specific weighting.
Different benchmarks use different scales. An "intelligence index" of 65 means something completely different from an Arena ELO of 1350. Everything gets z-score normalized, so 2σ above average on any benchmark maps to the same score.
Models get scored per category: coding, reasoning, creative, vision, research, general. Coding benchmarks carry more weight for coding tasks, creativity benchmarks for creative ones. Each task type gets its own ranking.
Final score = benchmarks + your personal feedback + community ratings + context signals. Your ratings feed back into future picks.
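The arithmetic behind that, sketched with illustrative weights and field names; the engine's actual coefficients aren't published here:

// Hypothetical sketch of the scoring pipeline: z-score each benchmark so the
// scales are comparable, weight by task category, then blend in feedback signals.
function zScore(value: number, mean: number, stdDev: number): number {
  return (value - mean) / stdDev; // 2σ above average maps to 2.0 on any benchmark
}

function categoryScore(z: Record<string, number>, weights: Record<string, number>): number {
  // For a coding task, weights might look like { codingBench: 0.6, reasoningBench: 0.3, arenaElo: 0.1 }.
  return Object.entries(weights).reduce((sum, [bench, w]) => sum + w * (z[bench] ?? 0), 0);
}

function finalScore(benchmarks: number, personalFeedback: number, communityRatings: number, contextSignals: number): number {
  // Illustrative blend of the four signal groups; real coefficients will differ.
  return 0.6 * benchmarks + 0.2 * personalFeedback + 0.1 * communityRatings + 0.1 * contextSignals;
}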
Real numbers. Right now.
Describe a task, pick a budget, see what comes back. No signup, no API key.
OpenClaw plugin, lightweight skill, or MCP server. All talk to the same scoring engine.
openclaw plugins install @deeflectcom/smart-spawn
# openclaw.yaml
plugins:
  smart-spawn:
    budget: medium
    mode: single
Full tool integration for OpenClaw. Your agent gets smart_spawn as a native command with all spawn modes, feedback loops, and local scoring.
One command. No config. Your agent starts picking the right model immediately.