Live Benchmarks
Composio Benchmark
Performance results of AI coding models on Composio.
Total tasks: 14
Last run: 4/17/2026
Model Performance
Rank  Model                    Passed  Avg Duration  Success Rate
#1    claude-4-6-sonnet (NEW)  11      294.7s        79%
#2    gemini-3.1-pro           9       510.9s        64%
#3    glm-4.7                  9       376.7s        64%
#4    gemini-3-flash           6       388.9s        43%
#5    gpt-5.2-codex            6       239.6s        43%
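
The success rates shown are consistent with dividing each model's passed count by the 14 total tasks and rounding to the nearest whole percent. The page does not state its formula, so the sketch below is an assumption that happens to reproduce the displayed figures; the names `TOTAL_TASKS`, `passed`, and `success_rate` are illustrative only.

```python
# Assumed formula: success rate = passed / total tasks, rounded to a whole percent.
# The benchmark page does not document this; it is inferred from the listed numbers.
TOTAL_TASKS = 14

# Passed counts copied from the leaderboard.
passed = {
    "claude-4-6-sonnet": 11,
    "gemini-3.1-pro": 9,
    "glm-4.7": 9,
    "gemini-3-flash": 6,
    "gpt-5.2-codex": 6,
}


def success_rate(passed_count: int, total: int = TOTAL_TASKS) -> int:
    """Return the pass percentage rounded to the nearest whole number."""
    return round(100 * passed_count / total)


rates = {model: success_rate(count) for model, count in passed.items()}

for model, rate in rates.items():
    print(f"{model}: {rate}%")
```

Models with the same passed count get the same rate under this formula, so it does not explain how the leaderboard orders tied entries.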