Model result · rank #11

Step 3.7 Flash

OpenRouter · stepfun/step-3.7-flash · extra-high reasoning · Full + SWE + Hard measured. Public result card with the model’s overall score, lane measurements, runtime/cost telemetry, and ranking formula.

Overall score: 68.06

Rank #11

Full / Agentic: 89.79

Full rank #3

SWE MVP: 80.39

SWE rank #7

Hard Intelligence: 33.99

Hard rank #11

Measured cost: $0.494

100.0% reliability

Overall

All-around publication view

Score: 68.06
Formula: mean(Full, SWE, Hard Intelligence)
Basis: OpenRouter · stepfun/step-3.7-flash · extra-high reasoning · Full + SWE + Hard measured

The overall score averages the measured major lanes while keeping each source measurement visible.
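As a quick check of the formula above, the following sketch recomputes the overall score from the three lane scores published on this card. The variable names are illustrative, and rounding to two decimals is an assumption based on how the scores are displayed.

```python
from statistics import mean

# Lane scores as published on this card.
lane_scores = {
    "Full / Agentic": 89.79,
    "SWE MVP": 80.39,
    "Hard Intelligence": 33.99,
}

# Unweighted arithmetic mean of the three lanes, rounded to two
# decimals (the rounding step is an assumption about the display).
overall = round(mean(lane_scores.values()), 2)
print(overall)
```

Under these assumptions the result matches the published overall score of 68.06.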

Lane 01

Full / Agentic benchmark

Final: 89.79
Capability: 89.97
Agentic: 84.11
Pass rate: 88.4%
Prompts: 43

This lane captures instruction following, structured behavior, tool discipline, and general agentic reliability.

Lane 02

Software engineering MVP

SWE score: 80.39
Focused final: 71.22
Capability: 80.39
Daily driver: 67.40
Prompts: 24

This lane is closer to implementation usefulness: source handling, architecture cleanliness, and deliverable quality.

Lane 03

Hard Intelligence diagnostic

Hard score: 33.99
Active inquiry: 17.50
Online adaptation: 80.00
Self-repair: 29.46
Authority integrity: 9.00

Hard Intelligence measures active inquiry, online adaptation, evidence-driven self-repair, and authority/salience integrity.

Telemetry

Runtime economics

Full cost: $0.119
SWE cost: $0.260
Hard cost: $0.116
Full avg seconds: 7.20
Hard records: 8
Decode: 104.69

Cost, time, and runtime basis are telemetry. They explain tradeoffs; they do not secretly overwrite the capability scores.
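The per-lane costs can be summed to cross-check the measured cost shown at the top of the card. This is a minimal sketch with illustrative names; it assumes the measured cost is simply the sum of the three lane costs, which the card does not state explicitly.

```python
# Per-lane costs in USD as published on this card.
lane_costs_usd = {"Full": 0.119, "SWE": 0.260, "Hard": 0.116}

# Sum of the displayed lane costs, rounded to the displayed precision.
total = round(sum(lane_costs_usd.values()), 3)

# Difference from the measured cost shown on the card ($0.494).
gap = round(total - 0.494, 3)
print(total, gap)
```

The displayed lane costs sum to $0.495, which is $0.001 above the published $0.494. One plausible explanation, though it is only an assumption, is that each lane cost is rounded independently before display.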

Interpretation

Why this result lands here.

The model is stronger in the Full/Agentic lane (89.79) than in the SWE lane (80.39), so the overall score is shown with both component lanes visible. The Hard Intelligence score of 33.99 is the lowest of the three lanes. It is published alongside Full/Agentic and SWE for the current ranking and carries equal weight in the overall mean.