Task
Detailed breakdown of individual task performance across different models.
| Task Name (14 tasks)             | claude-4-6-sonnet | gemini-3-flash | gemini-3.1-pro | glm-4.7 | gpt-5.2-codex |
|----------------------------------|-------------------|----------------|----------------|---------|---------------|
| connection_status_checker_python | 193.4s            | 339.0s         | 203.0s         | 189.5s  | 98.1s         |
| custom_scope_session_ts          | 166.5s            | 600.1s         | 600.1s         | 159.0s  | 206.7s        |
| filter_tools_by_app_cli          | 591.8s            | 600.1s         | 600.0s         | 83.4s   | 36.6s         |
| github_auth_redirect             | 150.0s            | 123.4s         | 600.1s         | 426.3s  | 458.2s        |
| github_issue_trigger             | 187.6s            | 129.0s         | 600.0s         | 600.1s  | 81.0s         |
| github_label_commenter_python    | 279.9s            | 182.2s         | 238.7s         | 600.1s  | 269.8s        |
| github_repo_description_cli      | 600.1s            | 600.1s         | 377.8s         | 575.0s  | 219.7s        |
| gmail_summary_agent              | 274.0s            | 600.1s         | 600.1s         | 260.4s  | 170.0s        |
| list_tools_python                | 103.4s            | 600.0s         | 587.1s         | 276.2s  | 213.7s        |
| morning_sweep_agent              | 441.7s            | 236.2s         | 600.1s         | 582.7s  | 393.0s        |
| python_github_star               | 141.5s            | 268.1s         | 600.1s         | 271.2s  | 69.0s         |
| schema_modifier_ts               | 404.8s            | 137.9s         | 600.1s         | 319.2s  | 518.2s        |
| trigger_lifecycle_ts             | 284.7s            | 344.3s         | 531.5s         | 600.1s  | 364.9s        |
| v2_to_v3_migration               | 88.3s             | 466.9s         | 197.0s         | 112.2s  | 38.4s         |