Model result · rank #2

Qwen3.7 Max

OpenRouter · extra-high reasoning. Public result card with the model’s overall score, lane measurements, runtime/cost telemetry, and ranking formula.

Overall score: 86.11

Rank #2

Full / Agentic: 83.24

Full rank #9

SWE MVP: 88.99

SWE rank #1

Measured cost: $0.851

100.0% reliability

Overall

All-around publication view

Score: 86.11
Formula: 50% Full + 50% SWE
Basis: OpenRouter · extra-high reasoning

The overall score is calculated from the Full/Agentic and SWE lanes, keeping the aggregate comparable while preserving the measurements behind it.
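As a quick check, the formula above can be reproduced from the two lane scores shown on this card. This is a minimal sketch: the function name `overall_score` is hypothetical, and it assumes the published lane scores are rounded, so the result may differ from the displayed 86.11 in the last digit.

```python
def overall_score(full: float, swe: float) -> float:
    """Equal-weight blend of the two lanes: 50% Full + 50% SWE."""
    return 0.5 * full + 0.5 * swe


# Lane scores as published on this card.
score = overall_score(full=83.24, swe=88.99)
print(f"{score:.3f}")
```

The unrounded blend comes out at roughly 86.115, which agrees with the displayed 86.11 to within the rounding of the inputs.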

Lane 01

Full / Agentic benchmark

Final: 83.24
Capability: 92.37
Agentic: 94.00
Pass rate: 93.0%
Prompts: 43

This lane captures instruction following, structured behavior, tool discipline, and general agentic reliability.

Lane 02

Software engineering MVP

SWE score: 88.99
Focused final: 78.27
Capability: 88.63
Daily driver: 74.14
Prompts: 24

This lane is closer to practical implementation work: source handling, architecture cleanliness, and deliverable quality.

Telemetry

Runtime economics

Full cost: $0.669
SWE cost: $0.182
Full avg seconds: 16.21
SWE time: 740.56 s
Decode: 45.00

Cost, time, and runtime basis are telemetry. They explain tradeoffs; they do not secretly overwrite the capability scores.
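The card does not state how the measured cost is derived, but the figures are consistent with it being the sum of the two lane costs. The sketch below checks that assumption using only the values published here; the variable names are illustrative.

```python
import math

full_cost = 0.669      # Full cost in USD, from this card
swe_cost = 0.182       # SWE cost in USD, from this card
measured_cost = 0.851  # Measured cost in USD, from the card summary

# Assumption: measured cost = Full cost + SWE cost.
total = full_cost + swe_cost

# Compare with a tolerance because these are binary floats.
matches = math.isclose(total, measured_cost, abs_tol=1e-9)
print(f"${total:.3f}", matches)
```

If the two values did not match, the measured cost would have to include something beyond the two lanes, which this card does not indicate.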

Interpretation

Why this result lands here.

This is a top-tier all-around entrant: the overall score of 86.11 ranks #2 and remains close to the leader. The lane-level tradeoffs are shown separately: rank #1 in SWE MVP, but rank #9 in Full / Agentic.