Agent-harness benchmark · 350 tasks · 8 languages

Claw-SWE-Bench Leaderboard

Full-350 · cost vs. quality

Cost–Resolve Pareto

Total API cost (log scale) vs. Pass@1 for all 5 claws × 2 models. Points on the frontier line are the most cost-efficient: no other configuration reaches a higher Pass@1 at equal or lower cost. A cheaper claw can match a pricier one.

[Figure: scatter plot. X-axis: Total API cost for full 350-instance run (USD, log scale), ticks from $20 to $500. Y-axis: Resolved rate / Pass@1 (%), ticks from 40 to 75. Points: OpenClaw, Hermes, ZeroClaw, Generic, and NanoBot, each plotted for GLM 5.1 and Qwen 3.6-flash. Line: Pareto frontier.]
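
A frontier like the one described in the caption can be computed with a single sorted pass. The sketch below is illustrative only and is not the leaderboard's own plotting code: the `Run` type, the `pareto_frontier` function, and the `config-*` labels are hypothetical, and the cost and Pass@1 values are made-up placeholders rather than measured results.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """One claw/model configuration (hypothetical structure)."""
    label: str
    cost_usd: float   # total API cost for the full run
    pass_at_1: float  # resolved rate in percent


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return runs that no other run beats on both cost and Pass@1."""
    # Cheapest first; among equal costs, the higher Pass@1 comes first.
    ordered = sorted(runs, key=lambda r: (r.cost_usd, -r.pass_at_1))
    frontier: list[Run] = []
    best_so_far = float("-inf")
    for run in ordered:
        # Keep a run only if it improves on every cheaper-or-equal run.
        if run.pass_at_1 > best_so_far:
            frontier.append(run)
            best_so_far = run.pass_at_1
    return frontier


# Placeholder numbers for illustration, NOT leaderboard data.
runs = [
    Run("config-A", 30.0, 50.0),
    Run("config-B", 60.0, 48.0),   # dominated: costs more than A, scores lower
    Run("config-C", 120.0, 65.0),
    Run("config-D", 400.0, 65.0),  # dominated: same score as C at higher cost
]

print([r.label for r in pareto_frontier(runs)])  # ['config-A', 'config-C']
```

Sorting by cost first means each run only needs to be compared against the best Pass@1 seen so far, so the pass is O(n log n) overall. The log scale on the cost axis affects only how the plot is drawn, not which points are on the frontier.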