Agent-harness benchmark · 350 tasks · 8 languages
Claw-SWE-Bench Leaderboard
Full-350 · cost vs. quality
Cost–Resolve Pareto
Total API cost (log scale) vs. Pass@1 for all 5 claws × 2 models (10 configurations). Points on the frontier line are the most cost-efficient: no other configuration reaches a higher Pass@1 at equal or lower cost. A cheaper claw can match a pricier one.
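As a rough illustration of how a cost/Pass@1 frontier like this one can be computed, here is a minimal Python sketch. The `Run` type, the `pareto_frontier` function, and every number in `example_runs` are hypothetical placeholders made up for illustration; they are not leaderboard results, and this is not necessarily how the leaderboard itself draws the line.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    name: str         # hypothetical label for a claw/model pair
    cost_usd: float   # total API cost for the full task set
    pass_at_1: float  # fraction of tasks resolved on the first attempt


def pareto_frontier(runs: List[Run]) -> List[Run]:
    """Return the runs that no other run beats on both cost and Pass@1."""
    # Sort by cost ascending; on equal cost, put the higher Pass@1 first.
    ordered = sorted(runs, key=lambda r: (r.cost_usd, -r.pass_at_1))
    frontier: List[Run] = []
    best_pass = float("-inf")
    for run in ordered:
        # A run stays only if it beats every cheaper (or equally priced) run.
        if run.pass_at_1 > best_pass:
            frontier.append(run)
            best_pass = run.pass_at_1
    return frontier


# Made-up placeholder data, NOT real leaderboard numbers.
example_runs = [
    Run("a", 10.0, 0.30),
    Run("b", 20.0, 0.25),   # dominated by "a": costs more, resolves less
    Run("c", 50.0, 0.45),
    Run("d", 50.0, 0.40),   # dominated by "c": same cost, resolves less
    Run("e", 200.0, 0.45),  # dominated by "c": same Pass@1, costs more
]

frontier_names = [r.name for r in pareto_frontier(example_runs)]
```

The log scale on the cost axis only affects how the points are plotted. Dominance depends on the ordering of costs, so the frontier is the same whether it is computed on raw or log-transformed cost.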