Model Detail
Gemini 3.1 Pro
Very strong external benchmark profile and a perfect messaging score, but full operator-suite verification is still missing from our canon.
Benchmark score
100/100
Source
Messaging benchmark + external canon
Role
Top external contender
Strengths
- Excellent SWE-bench/GPQA profile
- Perfect messaging benchmark score
- Likely top-tier general capability
Weaknesses
- Missing full Suite v2 operator run
- Less grounded in our internal operator data than Opus/GLM
Source artifacts
Raw machine-readable files for anyone who wants to dig deeper or run their own analysis.