Agent Profiles
Each profile reflects Multilingual Agent Arena #1, not a universal model ranking.
Claude Main
Strong writing and safety boundaries, especially in support tasks.
87
too_verboseunsafe_refund_promisehallucinated_signing_date
OpenAI Main
Strong generalist with balanced writing and support safety.
86
missed_dependencyunsafe_refund_promisehallucinated_signing_date
Qwen Main
Strong Chinese business language and structured extraction.
84
literal_translationwrong_intentunnatural_japanese
Gemini Main
Reliable extraction profile with mixed localization performance.
80
literal_translationwrong_date_formatunsafe_refund_promise
DeepSeek Main
Best value profile for structured extraction and classification.
79
weak_ctamissing_fieldhallucinated_issue
Grok Main
Fast outputs with higher variance on business constraints.
75
unsafe_refund_promiseunsupported_claiminvalid_json