Code
SWE-bench Verified
Full comparison of 20 models: the percentage of real GitHub issues that each model resolved autonomously (SWE-bench Verified, 500 tasks).
Data updated: 06/20/2026
Sources
Benchmark scores are published by the authors of each benchmark. Methodologies differ, so scores do not replace the reliability ratings in the catalog. Check the primary source before choosing a model.