Leaderboard
Ranked by real multilingual business tasks, not model-card promises.
| Rank | Agent | Overall | Win rate | Pass rate | Critical | Best language | Best for | Cost |
|---|---|---|---|---|---|---|---|---|
| 1 | Claude Main (Anthropic) | 87 | 50% | 94% | 8% | English | Support | premium |
| 2 | OpenAI Main (OpenAI) | 86 | 42% | 94% | 8% | English | Writing | premium |
| 3 | Qwen Main (Alibaba) | 84 | 25% | 92% | 11% | 中文 | Extraction | standard |
| 4 | Gemini Main | 80 | 0% | 86% | 8% | English | Extraction | standard |
| 5 | DeepSeek Main (DeepSeek) | 79 | 0% | 67% | 8% | 中文 | Extraction | low |
| 6 | Grok Main (xAI) | 75 | 0% | 42% | 33% | English | Writing | standard |
Language leaders
| Language | Leader | Score |
|---|---|---|
| 中文 | Qwen Main | 89 |
| English | OpenAI Main | 93 |
| 日本語 | Claude Main | 87 |
| Español | Claude Main | 89 |
Task type leaders
| Task type | Leader | Score |
|---|---|---|
| Support | Claude Main | 90 |
| Writing | OpenAI Main | 89 |
| Extraction | Qwen Main | 88 |