Rank #4
Model result · rank #4
GLM‑5.2
OpenRouter · z-ai/glm-5.2 · maximum reasoning · Full + SWE + Hard measured. Public result card with the model’s overall score, lane measurements, runtime/cost telemetry, and ranking formula.
Full rank #6
SWE rank #1
Hard rank #6
100.0% reliability
All-around publication view
The overall score averages the measured major lanes while keeping each source measurement visible.
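As a rough illustration of that averaging, the sketch below assumes an unweighted mean over whichever lanes have a measurement. The card does not publish the exact formula here, so the equal weighting, the function name `overall_score`, and the lane keys are assumptions. Only the Hard Intelligence value (74.58) comes from this card; the other two numbers are placeholders.

```python
from statistics import mean
from typing import Mapping, Optional


def overall_score(lanes: Mapping[str, Optional[float]]) -> float:
    """Return the unweighted mean of the lanes that were measured.

    Assumption: every measured lane counts equally. A lane with no
    measurement is passed as None and is skipped, not treated as 0.
    """
    measured = [score for score in lanes.values() if score is not None]
    if not measured:
        raise ValueError("at least one lane must be measured")
    return mean(measured)


# Placeholder inputs: only "hard" (74.58) is taken from the card.
# The "full" and "swe" values are hypothetical, for illustration only.
example_lanes = {"full": 80.0, "swe": 90.0, "hard": 74.58}
example_overall = overall_score(example_lanes)
```

Keeping the per-lane values in a mapping, rather than storing only the mean, matches the card's intent that each source measurement stays visible next to the overall score.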
Full / Agentic benchmark
This lane captures instruction following, structured behavior, tool discipline, and general agentic reliability.
Software engineering MVP
This lane is closer to implementation usefulness: source handling, architecture cleanliness, and deliverable quality.
Hard Intelligence diagnostic
Hard Intelligence measures active inquiry, online adaptation, evidence-driven self-repair, and authority/salience integrity.
Runtime economics
Cost, time, and token basis are reported as normalized telemetry. They explain tradeoffs but do not yet change the capability score.
Why this result lands here.
This result is best read as a balanced benchmark entry: one overall score plus the lane measurements that produced it. The Hard Intelligence score is 74.58 and contributes to the overall score alongside the Full/Agentic and SWE scores. GLM‑5.2 was run through OpenRouter with maximum reasoning across the public benchmark suites. The SWE score reflects all prompt outcomes from the public software-engineering suite.