Claude Main

Strong writing and safety boundaries, especially in support tasks.

Anthropic, premium, Arena #1

Profile metrics

Overall score: 87
Win rate: 50%
Pass rate: 94%
Critical failure rate: 8%
Format pass rate: 100%
Average run cost: $0.0247

Common failure tags

too_verbose, unsafe_refund_promise, hallucinated_signing_date

Language performance

Chinese (中文): 81
English: 90
Japanese (日本語): 87
Spanish (Español): 89

Task type performance

Support: 90
Writing: 88
Extraction: 82