Live Benchmarks

Composio Benchmark

Performance results of AI coding models on Composio.

Total tasks: 14

Last run: 4/17/2026

Model Performance

Model	Passed	Avg Duration	Success Rate
#1 claude-4-6-sonnet (NEW)	11	294.7s	79%
#2 gemini-3.1-pro	9	510.9s	64%
#3 glm-4.7	9	376.7s	64%
#4 gemini-3-flash	6	388.9s	43%
#5 gpt-5.2-codex	6	239.6s	43%
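
The Success Rate column appears to be the Passed count divided by the total of 14 tasks, rounded to the nearest whole percent. The page does not state this formula, so treat it as an assumption; it does reproduce every row above. A minimal Python sketch, where the names `TOTAL_TASKS`, `PASSED`, and `success_rate` are illustrative and not part of the benchmark:

```python
# Assumption: success rate = passed / total tasks, rounded to a whole percent.
# The formula is inferred from the table, not documented by the benchmark.
TOTAL_TASKS = 14

# Passed counts copied from the Model Performance table.
PASSED = {
    "claude-4-6-sonnet": 11,
    "gemini-3.1-pro": 9,
    "glm-4.7": 9,
    "gemini-3-flash": 6,
    "gpt-5.2-codex": 6,
}


def success_rate(passed: int, total: int = TOTAL_TASKS) -> int:
    """Return the pass percentage rounded to the nearest integer."""
    return round(100 * passed / total)


if __name__ == "__main__":
    for model, passed in PASSED.items():
        print(f"{model}: {passed}/{TOTAL_TASKS} = {success_rate(passed)}%")
```

With only 14 tasks, each task is worth about 7 percentage points, so the gap between 64% and 79% corresponds to two tasks.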