Live Benchmarks

Composio Benchmark

Performance results of AI coding models on Composio.

Total tasks: 14
Last run: 4/17/2026

Model Performance

Rank  Model                    Passed  Avg Duration  Success Rate
#1    claude-4-6-sonnet (NEW)  11      294.7s        79%
#2    gemini-3.1-pro           9       510.9s        64%
#3    glm-4.7                  9       376.7s        64%
#4    gemini-3-flash           6       388.9s        43%
#5    gpt-5.2-codex            6       239.6s        43%
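
The Success Rate column appears to be the Passed count divided by the 14 total tasks, rounded to the nearest whole percent. The page does not state the formula, so the rounding rule and the helper name `success_rate` in this minimal Python sketch are assumptions; it only checks that the listed percentages are consistent with that reading.

```python
TOTAL_TASKS = 14  # "Total tasks: 14" from the page header

# Passed counts taken from the Model Performance table.
passed = {
    "claude-4-6-sonnet": 11,
    "gemini-3.1-pro": 9,
    "glm-4.7": 9,
    "gemini-3-flash": 6,
    "gpt-5.2-codex": 6,
}


def success_rate(passed_count: int, total: int = TOTAL_TASKS) -> int:
    """Return the pass percentage, rounded to a whole number (assumed rounding)."""
    return round(100 * passed_count / total)


for model, count in passed.items():
    print(f"{model}: {success_rate(count)}%")
```

Under this assumption the computed values match the table: 11 of 14 gives 79%, 9 of 14 gives 64%, and 6 of 14 gives 43%.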