Model Detail

GPT-5.4

Leads terminal-style external benchmarks, but our structured internal operator benchmark data for it is still incomplete.

Benchmark score: 88/100
Source: External benchmark canon
Role: Terminal/coding specialist
Strengths
  • #1 on Terminal-Bench 2.0
  • Very strong coding/execution profile
  • Competitive frontier model
Weaknesses
  • A blocked provider path prevented earlier internal benchmarking
  • Less direct operator-suite evidence in our canon

Source artifacts

Raw machine-readable files for anyone who wants to dig deeper or run their own analysis.