Ranked entrants: 20

With Hard Intelligence data: 20

Current leader: GPT‑5.5 (overall 88.22)

Score spread: 33.09 (#1 to #20)

Formula: lane mean (Full + SWE + published Hard Intelligence when measured)

Data refresh: Jul 09, 2026 (static HTML plus public JSON)

Overall ladder

Every ranked entrant ordered by public overall score.

GPT‑5.5 88.22 rank #1

GPT‑5.6 Terra 87.70 rank #2

DeepSeek V4 Flash 86.08 rank #3

GPT‑5.6 Sol 84.57 rank #4

Claude Opus 4.8 84.45 rank #5

Claude Fable 5 83.74 rank #6

GLM‑5.2 83.68 rank #7

Gemini 3.5 Flash 83.38 rank #8

DeepSeek V4 Pro 83.36 rank #9

GPT‑5.6 Luna 82.13 rank #10

Claude Sonnet 5 82.08 rank #11

Qwen3.7 Max 80.71 rank #12

MiniMax M3 79.66 rank #13

MiniMax M3 Direct Plus 74.63 rank #14

Kimi K2.7 Code 71.01 rank #15

Step 3.7 Flash 68.06 rank #16

NVIDIA Nemotron 3 Ultra 67.58 rank #17

Gemma4‑12B‑Coder Fable5/Composer2.5 Q4_K_M (local model) 61.19 rank #18

Ornith‑1.0‑35B Q4_K_M (local model) 55.57 rank #19

Qwythos‑9B Claude Mythos Q8_0 (local model) 55.13 rank #20

The overall score is the mean of the measured lanes. It summarizes the source measurements; it does not replace them.
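A minimal Python sketch of how the ladder follows from the lanes, assuming the overall is the plain unweighted mean of Full, SWE, and Hard Intelligence as the Formula column states. It uses four rows from the ranking table and also reproduces the score spread (top overall minus bottom overall). The `LANES` dict and the `overall`/`ladder` helpers are hypothetical names for illustration, not fields of the published JSON; model names use plain ASCII hyphens.

```python
from statistics import mean

# Lane scores (Full, SWE, Hard Intelligence) copied from the ranking table.
LANES = {
    "GPT-5.5": (85.76, 85.51, 93.40),
    "GPT-5.6 Terra": (88.85, 86.08, 88.18),
    "DeepSeek V4 Flash": (95.80, 86.95, 75.48),
    "Qwythos-9B Claude Mythos Q8_0": (74.98, 46.51, 43.91),
}

def overall(lanes):
    """Overall score: plain mean of the lane scores, rounded for display."""
    return round(mean(lanes), 2)

def ladder(rows):
    """Return (model, overall) pairs ordered from highest to lowest overall."""
    scored = [(model, overall(lanes)) for model, lanes in rows.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = ladder(LANES)
for position, (model, score) in enumerate(ranked, start=1):
    print(f"#{position} {model} {score:.2f}")

# Score spread: distance between the first and last rung of the ladder.
spread = round(ranked[0][1] - ranked[-1][1], 2)
print(f"Score spread {spread:.2f}")
```

With only these four rows, the first and last rungs happen to be the page's #1 and #20, so the spread matches the published 33.09.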

Tradeoff scatter maps

Each point is one tested model, plotted against two public axes.

Cost × overall

Runtime × overall

Recorded tokens/item × cost

Use these maps to weigh quality against cost, speed, and recorded token use. Runtime and token axes are normalized per item, and hover cards show their coverage; they are telemetry, not inputs to the current overall score.

Lane contrast

Top eight entrants with Full, SWE, and Hard Intelligence shown side by side.

#1 GPT‑5.5

Full 85.76 SWE 85.51 Hard 93.40

#2 GPT‑5.6 Terra

Full 88.85 SWE 86.08 Hard 88.18

#3 DeepSeek V4 Flash

Full 95.80 SWE 86.95 Hard 75.48

#4 GPT‑5.6 Sol

Full 86.27 SWE 76.12 Hard 91.31

#5 Claude Opus 4.8

Full 83.32 SWE 88.67 Hard 81.37

#6 Claude Fable 5

Full 77.02 SWE 81.42 Hard 92.78

#7 GLM‑5.2

Full 86.87 SWE 89.58 Hard 74.58

#8 Gemini 3.5 Flash

Full 88.79 SWE 73.59 Hard 87.75

Hard Intelligence is shown as its own lane so cross-lane strengths and weaknesses stay visible.

Measured cost context

Cost is shown because deployment economics matter, but it does not feed into the capability scores. Rows follow overall rank order, so each rank below is the overall rank, not a cost rank.

GPT‑5.5 $4.071 rank #1

GPT‑5.6 Terra $1.867 rank #2

DeepSeek V4 Flash $0.290 rank #3

GPT‑5.6 Sol $3.858 rank #4

Claude Opus 4.8 $6.115 rank #5

Claude Fable 5 $12.285 rank #6

GLM‑5.2 $2.768 rank #7

Gemini 3.5 Flash $1.646 rank #8

DeepSeek V4 Pro $0.335 rank #9

GPT‑5.6 Luna $1.000 rank #10

Claude Sonnet 5 $2.812 rank #11

Qwen3.7 Max $0.906 rank #12

MiniMax M3 $0.182 rank #13

MiniMax M3 Direct Plus $0.0041 rank #14

Kimi K2.7 Code $0.732 rank #15

Step 3.7 Flash $0.494 rank #16

NVIDIA Nemotron 3 Ultra $0.489 rank #17

Gemma4‑12B‑Coder Fable5/Composer2.5 Q4_K_M $0 rank #18

Ornith‑1.0‑35B Q4_K_M $0 rank #19

Qwythos‑9B Claude Mythos Q8_0 $0 rank #20

Expensive rows are not penalized in the score itself; cost stays visible as telemetry and is part of how the table should be read.

Lane balance pressure

Largest gap between each entrant’s strongest and weakest measured major lane.

Step 3.7 Flash: gap 55.80 (Hard Intelligence 33.99 vs Full / Agentic 89.79) · rank #16

Ornith‑1.0‑35B Q4_K_M: gap 38.80 (SWE MVP 39.16 vs Full / Agentic 77.96) · rank #19

Gemma4‑12B‑Coder Fable5/Composer2.5 Q4_K_M: gap 31.64 (Hard Intelligence 48.96 vs Full / Agentic 80.60) · rank #18

Qwythos‑9B Claude Mythos Q8_0: gap 31.07 (Hard Intelligence 43.91 vs Full / Agentic 74.98) · rank #20

Kimi K2.7 Code: gap 29.34 (SWE MVP 58.61 vs Full / Agentic 87.95) · rank #15

NVIDIA Nemotron 3 Ultra: gap 20.91 (SWE MVP 58.63 vs Full / Agentic 79.54) · rank #17

DeepSeek V4 Flash: gap 20.32 (Hard Intelligence 75.48 vs Full / Agentic 95.80) · rank #3

MiniMax M3: gap 20.11 (Hard Intelligence 66.77 vs SWE MVP 86.88) · rank #13

Qwen3.7 Max: gap 19.09 (Hard Intelligence 69.90 vs SWE MVP 88.99) · rank #12

MiniMax M3 Direct Plus: gap 16.55 (SWE MVP 67.77 vs Full / Agentic 84.32) · rank #14

Claude Fable 5: gap 15.76 (Full / Agentic 77.02 vs Hard Intelligence 92.78) · rank #6

Gemini 3.5 Flash: gap 15.20 (SWE MVP 73.59 vs Full / Agentic 88.79) · rank #8

GPT‑5.6 Sol: gap 15.19 (SWE MVP 76.12 vs Hard Intelligence 91.31) · rank #4

GLM‑5.2: gap 15.00 (Hard Intelligence 74.58 vs SWE MVP 89.58) · rank #7

GPT‑5.6 Luna: gap 14.06 (SWE MVP 75.84 vs Full / Agentic 89.90) · rank #10

DeepSeek V4 Pro: gap 13.67 (Hard Intelligence 78.38 vs Full / Agentic 92.04) · rank #9

Claude Sonnet 5: gap 8.45 (SWE MVP 78.08 vs Hard Intelligence 86.53) · rank #11

GPT‑5.5: gap 7.89 (SWE MVP 85.51 vs Hard Intelligence 93.40) · rank #1

Claude Opus 4.8: gap 7.30 (Hard Intelligence 81.37 vs SWE MVP 88.67) · rank #5

GPT‑5.6 Terra: gap 2.77 (SWE MVP 86.08 vs Full / Agentic 88.85) · rank #2

Lower pressure means a more even profile; higher pressure explains why one strong lane may not lift the overall rank by itself.
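The pressure figure is the strongest measured lane minus the weakest. Here is a small Python sketch that computes it for three entrants and sorts them from most to least lopsided, as the list does. `LANES` and `balance_pressure` are hypothetical names. Because this works from the table's two-decimal values, a gap can differ by 0.01 from the chart, which presumably uses unrounded scores (DeepSeek V4 Pro gives 13.66 here against the listed 13.67).

```python
# Lane scores for a few entrants, copied from the ranking table.
LANES = {
    "Step 3.7 Flash": {"Full / Agentic": 89.79, "SWE MVP": 80.39, "Hard Intelligence": 33.99},
    "GPT-5.5": {"Full / Agentic": 85.76, "SWE MVP": 85.51, "Hard Intelligence": 93.40},
    "GPT-5.6 Terra": {"Full / Agentic": 88.85, "SWE MVP": 86.08, "Hard Intelligence": 88.18},
}

def balance_pressure(lanes):
    """Gap between the strongest and weakest measured lane, plus the lanes involved."""
    weakest = min(lanes, key=lanes.get)
    strongest = max(lanes, key=lanes.get)
    gap = round(lanes[strongest] - lanes[weakest], 2)
    return gap, weakest, strongest

# Order entrants from most to least lopsided (largest gap first).
rows = sorted(
    ((model, *balance_pressure(lanes)) for model, lanes in LANES.items()),
    key=lambda row: row[1],
    reverse=True,
)
for model, gap, weakest, strongest in rows:
    print(f"{model} {gap:.2f} {weakest} {LANES[model][weakest]:.2f} "
          f"vs {strongest} {LANES[model][strongest]:.2f}")
```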

Breadth wins the top spot

GPT‑5.5 leads because its measured lanes stay high together: overall 88.22, Full 85.76, SWE 85.51, and Hard Intelligence 93.40.

Full / Agentic alone does not decide

DeepSeek V4 Flash owns Full rank #1 at 95.80, but the overall mean also averages in its SWE (86.95) and Hard Intelligence (75.48) scores, so it ranks #3 overall.

SWE is a separate capability signal

GLM‑5.2 owns SWE rank #1 at 89.58. That lane rewards practical implementation and review behavior rather than only general prompt competence.

Hard Intelligence reshapes the table

GPT‑5.5 owns Hard Intelligence rank #1 at 93.40. That lane tests active inquiry, adaptation, repair, and authority integrity separately from Full and SWE.

The clearest drag is visible

Step 3.7 Flash averages 85.09 across Full and SWE, but its Hard Intelligence score of 33.99 pulls the three-lane overall down to 68.06.
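Here is the arithmetic behind this card, using the table's rounded lane values. The symbol m is introduced here as shorthand for the Full/SWE average. With an equal-weight mean, each lane counts for one third, so the gap between m and Hard Intelligence costs one third of its size in overall points.

```latex
\[
m = \frac{89.79 + 80.39}{2} = 85.09,
\qquad
\text{overall} = \frac{89.79 + 80.39 + 33.99}{3} = \frac{204.17}{3} \approx 68.06
\]
\[
\text{overall} = m - \frac{m - 33.99}{3} = 85.09 - \frac{51.10}{3} \approx 85.09 - 17.03 = 68.06
\]
```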

Full ranking table

Table with reasons, not just numbers.

Each row states the score formula, lane ranks, cost context, and the main reason the entrant lands at its current position.

Ranking data

Rank	Model	Overall	Full (lane rank)	SWE (lane rank)	Hard IQ (lane rank)	Formula	Cost + telemetry	Why here
#1	GPT‑5.5	88.22	85.76 #10	85.51 #7	93.40 #1	mean(Full, SWE, Hard Intelligence)	$4.071 19.57s/item · tokens complete	Overall 88.22 uses mean(Full, SWE, Hard Intelligence). Strength signal: Hard Intelligence rank #1. Main limiter: SWE MVP at 85.51. Hard Intelligence contributes to the ranking as a separate measured lane.
#2	GPT‑5.6 Terra	87.70	88.85 #5	86.08 #6	88.18 #4	mean(Full, SWE, Hard Intelligence)	$1.867 5.75s/item · tokens complete	Overall 87.70 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full / Agentic at 88.85. Main limiter: SWE MVP at 86.08. Hard Intelligence contributes to the ranking as a separate measured lane.
#3	DeepSeek V4 Flash	86.08	95.80 #1	86.95 #4	75.48 #10	mean(Full, SWE, Hard Intelligence)	$0.290 22.93s/item · tokens complete	Overall 86.08 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full rank #1. Main limiter: Hard Intelligence at 75.48. Hard Intelligence contributes to the ranking as a separate measured lane.
#4	GPT‑5.6 Sol	84.57	86.27 #9	76.12 #12	91.31 #3	mean(Full, SWE, Hard Intelligence)	$3.858 13.30s/item · tokens complete	Overall 84.57 uses mean(Full, SWE, Hard Intelligence). Strength signal: Hard Intelligence rank #3. Main limiter: SWE MVP at 76.12. Hard Intelligence contributes to the ranking as a separate measured lane.
#5	Claude Opus 4.8	84.45	83.32 #13	88.67 #3	81.37 #7	mean(Full, SWE, Hard Intelligence)	$6.115 10.43s/item · tokens complete	Overall 84.45 uses mean(Full, SWE, Hard Intelligence). Strength signal: SWE rank #3. Main limiter: Hard Intelligence at 81.37. Hard Intelligence contributes to the ranking as a separate measured lane.
#6	Claude Fable 5	83.74	77.02 #19	81.42 #8	92.78 #2	mean(Full, SWE, Hard Intelligence)	$12.285 12.37s/item · tokens complete	Overall 83.74 uses mean(Full, SWE, Hard Intelligence). Strength signal: Hard Intelligence rank #2. Main limiter: Full / Agentic at 77.02. Hard Intelligence contributes to the ranking as a separate measured lane.
#7	GLM‑5.2	83.68	86.87 #8	89.58 #1	74.58 #11	mean(Full, SWE, Hard Intelligence)	$2.768 91.19s/item · tokens complete	Overall 83.68 uses mean(Full, SWE, Hard Intelligence). Strength signal: SWE rank #1. Main limiter: Hard Intelligence at 74.58. Hard Intelligence contributes to the ranking as a separate measured lane.
#8	Gemini 3.5 Flash	83.38	88.79 #6	73.59 #14	87.75 #5	mean(Full, SWE, Hard Intelligence)	$1.646 9.13s/item · tokens complete	Overall 83.38 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full / Agentic at 88.79. Main limiter: SWE MVP at 73.59. Hard Intelligence contributes to the ranking as a separate measured lane.
#9	DeepSeek V4 Pro	83.36	92.04 #2	79.68 #10	78.38 #9	mean(Full, SWE, Hard Intelligence)	$0.335 23.25s/item · tokens complete	Overall 83.36 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full rank #2. Main limiter: Hard Intelligence at 78.38. Hard Intelligence contributes to the ranking as a separate measured lane.
#10	GPT‑5.6 Luna	82.13	89.90 #3	75.84 #13	80.66 #8	mean(Full, SWE, Hard Intelligence)	$1.000 8.02s/item · tokens complete	Overall 82.13 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full rank #3. Main limiter: SWE MVP at 75.84. Hard Intelligence contributes to the ranking as a separate measured lane.
#11	Claude Sonnet 5	82.08	81.64 #15	78.08 #11	86.53 #6	mean(Full, SWE, Hard Intelligence)	$2.812 12.51s/item · tokens complete	Overall 82.08 uses mean(Full, SWE, Hard Intelligence). Strength signal: Hard Intelligence at 86.53. Main limiter: SWE MVP at 78.08. Hard Intelligence contributes to the ranking as a separate measured lane.
#12	Qwen3.7 Max	80.71	83.24 #14	88.99 #2	69.90 #13	mean(Full, SWE, Hard Intelligence)	$0.906 22.71s/item · tokens complete	Overall 80.71 uses mean(Full, SWE, Hard Intelligence). Strength signal: SWE rank #2. Main limiter: Hard Intelligence at 69.90. Hard Intelligence contributes to the ranking as a separate measured lane.
#13	MiniMax M3	79.66	85.33 #11	86.88 #5	66.77 #14	mean(Full, SWE, Hard Intelligence)	$0.182 27.52s/item · tokens complete	Overall 79.66 uses mean(Full, SWE, Hard Intelligence). Strength signal: SWE MVP at 86.88. Main limiter: Hard Intelligence at 66.77. Hard Intelligence contributes to the ranking as a separate measured lane.
#14	MiniMax M3 Direct Plus	74.63	84.32 #12	67.77 #15	71.80 #12	mean(Full, SWE, Hard Intelligence)	$0.0041 21.94s/item · tokens complete	Overall 74.63 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full / Agentic at 84.32. Main limiter: SWE MVP at 67.77. Hard Intelligence contributes to the ranking as a separate measured lane.
#15	Kimi K2.7 Code	71.01	87.95 #7	58.61 #17	66.46 #15	mean(Full, SWE, Hard Intelligence)	$0.732 28.60s/item · tokens complete	Overall 71.01 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full / Agentic at 87.95. Main limiter: SWE MVP at 58.61. Hard Intelligence contributes to the ranking as a separate measured lane.
#16	Step 3.7 Flash	68.06	89.79 #4	80.39 #9	33.99 #20	mean(Full, SWE, Hard Intelligence)	$0.494 27.77s/item · tokens complete	Overall 68.06 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full / Agentic at 89.79. Main limiter: Hard Intelligence at 33.99. Hard Intelligence contributes to the ranking as a separate measured lane.
#17	NVIDIA Nemotron 3 Ultra	67.58	79.54 #17	58.63 #16	64.56 #16	mean(Full, SWE, Hard Intelligence)	$0.489 70.71s/item · tokens complete	Overall 67.58 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full / Agentic at 79.54. Main limiter: SWE MVP at 58.63. Hard Intelligence contributes to the ranking as a separate measured lane.
#18	Gemma4‑12B‑Coder Fable5/Composer2.5 Q4_K_M (local model)	61.19	80.60 #16	54.01 #18	48.96 #18	mean(Full, SWE, Hard Intelligence)	$0 21.52s/item · tokens complete	Overall 61.19 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full / Agentic at 80.60. Main limiter: Hard Intelligence at 48.96. Hard Intelligence contributes to the ranking as a separate measured lane.
#19	Ornith‑1.0‑35B Q4_K_M (local model)	55.57	77.96 #18	39.16 #20	49.58 #17	mean(Full, SWE, Hard Intelligence)	$0 117.66s/item · tokens complete	Overall 55.57 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full / Agentic at 77.96. Main limiter: SWE MVP at 39.16. Hard Intelligence contributes to the ranking as a separate measured lane.
#20	Qwythos‑9B Claude Mythos Q8_0 (local model)	55.13	74.98 #20	46.51 #19	43.91 #19	mean(Full, SWE, Hard Intelligence)	$0 21.13s/item · tokens complete	Overall 55.13 uses mean(Full, SWE, Hard Intelligence). Strength signal: Full / Agentic at 74.98. Main limiter: Hard Intelligence at 43.91. Hard Intelligence contributes to the ranking as a separate measured lane.
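You can reproduce the lane ranks beside each score (for example "95.80 #1") by sorting all rows on that one column, highest first. The Python sketch below does this for a five-row subset to show how a model can lead one lane and still sit lower overall. Ranks here are relative to the rows included; the page's own tie-breaking rule is not stated, so this sketch assumes no ties. `ROWS` and `rank_by` are hypothetical names.

```python
# (model, overall, Full) for a subset of rows, copied from the table.
ROWS = [
    ("GPT-5.5", 88.22, 85.76),
    ("GPT-5.6 Terra", 87.70, 88.85),
    ("DeepSeek V4 Flash", 86.08, 95.80),
    ("DeepSeek V4 Pro", 83.36, 92.04),
    ("GPT-5.6 Luna", 82.13, 89.90),
]

def rank_by(rows, index):
    """Map model -> 1-based rank when rows are sorted by column `index`, highest first."""
    ordered = sorted(rows, key=lambda row: row[index], reverse=True)
    return {row[0]: position for position, row in enumerate(ordered, start=1)}

overall_rank = rank_by(ROWS, 1)
full_rank = rank_by(ROWS, 2)
for model, *_ in ROWS:
    print(f"{model}: overall #{overall_rank[model]}, Full #{full_rank[model]}")
```

Within this subset, DeepSeek V4 Flash comes out Full #1 but overall #3, and GPT‑5.5 is Full #5 but overall #1, the same pattern the full table shows.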

Interpretation

Why the leader leads

Leader: GPT‑5.5

Overall: 88.22

Full: 85.76

SWE: 85.51

Hard IQ: 93.40

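The top rank goes to the entrant with the highest mean across the measured lanes, not simply the best isolated lane score. GPT‑5.5 does not have the most even profile (GPT‑5.6 Terra's lane gap is 2.77 against its 7.89), but all three of its lanes stay high.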

Lane policy

How Hard Intelligence is handled

Scope: active inquiry + adaptation + repair

Formula role: included when measured

Blank cells: not yet measured

Interpretation: separate from Full and SWE

When a Hard Intelligence score is published, it becomes the third major lane in the overall mean. Otherwise the row remains ranked by the measured lanes it has.
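Below is a minimal Python sketch of this policy: a Hard Intelligence value of `None` is skipped, not treated as zero, so a row without it is averaged over Full and SWE only. The function name is hypothetical. All 20 rows on this page currently carry Hard Intelligence data, so the two-lane case below uses made-up values purely for illustration.

```python
from statistics import mean
from typing import Optional

def overall_score(full: float, swe: float, hard: Optional[float] = None) -> float:
    """Mean of the measured major lanes.

    Hard Intelligence joins the mean only when a score has been published;
    a missing (None) value is left out rather than counted as zero.
    """
    measured = [score for score in (full, swe, hard) if score is not None]
    return round(mean(measured), 2)

# Published row with all three lanes measured (GPT-5.5).
print(overall_score(85.76, 85.51, 93.40))
# Hypothetical row with Hard Intelligence not yet measured (illustrative values only).
print(overall_score(90.0, 80.0))
```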

Tie-break reading

How to compare close rows

Overall: first glance

Lane ranks: diagnosis

Cost: runtime context

Reliability: operational risk

Close overall scores should be read through the lane breakdown. A model can be strong for building software while weaker at active inquiry, or the reverse.
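For example, GPT‑5.6 Luna (82.13) and Claude Sonnet 5 (82.08) sit only 0.05 apart overall, yet their lane profiles point in opposite directions. The short Python sketch below computes the per-lane differences from the table values. Under an equal-weight mean, the overall gap equals the average of those lane gaps. `lane_deltas` is a hypothetical helper name.

```python
LANE_NAMES = ("Full", "SWE", "Hard Intelligence")

# Lane scores copied from the ranking table (rows #10 and #11).
luna = {"Full": 89.90, "SWE": 75.84, "Hard Intelligence": 80.66}
sonnet = {"Full": 81.64, "SWE": 78.08, "Hard Intelligence": 86.53}

def lane_deltas(a, b):
    """Per-lane difference a - b, rounded to the table's two decimals."""
    return {lane: round(a[lane] - b[lane], 2) for lane in LANE_NAMES}

deltas = lane_deltas(luna, sonnet)
for lane, delta in deltas.items():
    print(f"{lane}: {delta:+.2f}")

# Equal weights: the overall gap is the average of the lane gaps.
overall_gap = sum(deltas.values()) / len(deltas)
print(f"Overall: {overall_gap:+.2f}")
```

Luna's 8.26-point Full lead is nearly cancelled by Sonnet's edges in SWE and Hard Intelligence. That is why the lane breakdown, not the overall, is the better guide for picking between them.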
