Model result · rank #3

DeepSeek V4 Flash

DeepSeek direct API · refreshed Hard Intelligence telemetry. Public result card with the model’s overall score, lane measurements, runtime/cost telemetry, and ranking formula.


Overall score: 86.08

Rank #3

Full / Agentic: 95.80

Full rank #1

SWE MVP: 86.95

SWE rank #4

Hard Intelligence: 75.48

Hard rank #10

Measured cost: $0.290

100.0% reliability

Overall

All-around publication view

Score: 86.08

Formula: mean(Full, SWE, Hard Intelligence)

Basis: DeepSeek direct API · refreshed Hard Intelligence telemetry

The overall score is the unweighted mean of the three major lanes, while each lane's underlying measurements remain visible below.
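The formula line can be checked directly against the published lane scores. A minimal Python sketch, assuming equal weights exactly as `mean(Full, SWE, Hard Intelligence)` states; `LANE_SCORES` and `overall_score` are illustrative names, not part of any published tool.

```python
# Lane scores as published on this card.
LANE_SCORES = {
    "Full / Agentic": 95.80,
    "SWE MVP": 86.95,
    "Hard Intelligence": 75.48,
}

def overall_score(lanes: dict[str, float]) -> float:
    """Unweighted mean of the lane scores: mean(Full, SWE, Hard Intelligence)."""
    return sum(lanes.values()) / len(lanes)

score = overall_score(LANE_SCORES)
print(f"{score:.2f}")  # prints 86.08
```

Because the weights are equal, each lane moves the overall score by one third of its own change.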

Lane 01

Full / Agentic benchmark

Final: 95.80

Capability: 96.80

Agentic: 96.32

Pass rate: 97.7%

Prompts: 43

This lane captures instruction following, structured behavior, tool discipline, and general agentic reliability.
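The card does not say whether the pass rate is counted per prompt or per individual check. If it is counted per whole prompt, only one pass count out of 43 rounds to the displayed 97.7%, as this small sketch shows; `counts_matching` is a hypothetical helper written for this check.

```python
# Full / Agentic lane figures as published on this card.
PROMPTS = 43
DISPLAYED_PASS_RATE = 97.7  # percent, shown to one decimal place

def counts_matching(rate_pct: float, total: int) -> list[int]:
    """Whole pass counts k whose rate k / total rounds to the displayed percentage."""
    return [k for k in range(total + 1) if round(100 * k / total, 1) == rate_pct]

print(counts_matching(DISPLAYED_PASS_RATE, PROMPTS))  # prints [42]
```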

Lane 02

Software engineering MVP

SWE score: 86.95

Focused final: 86.59

Capability: 84.53

Daily driver: 87.04

Prompts: 24

This lane is closer to implementation usefulness: source handling, architecture cleanliness, and deliverable quality.

Lane 03

Hard Intelligence diagnostic

Hard score: 75.48

Active inquiry: 80.74

Online adaptation: 81.70

Self-repair: 73.28

Authority integrity: 66.21

Hard Intelligence measures active inquiry, online adaptation, evidence-driven self-repair, and authority/salience integrity.
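The card does not publish a formula for the Hard score, but an equal-weight mean of the four sub-scores reproduces the displayed 75.48. The sketch below assumes those equal weights; treat that as an observation from the numbers rather than the benchmark's documented method. `hard_score` is a hypothetical helper.

```python
from statistics import fmean

# Hard Intelligence sub-scores as published on this card.
HARD_SUBSCORES = {
    "Active inquiry": 80.74,
    "Online adaptation": 81.70,
    "Self-repair": 73.28,
    "Authority integrity": 66.21,
}

def hard_score(subscores: dict[str, float]) -> float:
    """Equal-weight mean of the sub-scores (assumed; the card gives no formula)."""
    return fmean(subscores.values())

print(f"{hard_score(HARD_SUBSCORES):.2f}")  # prints 75.48

# The lowest sub-score marks where this lane loses the most ground.
weakest = min(HARD_SUBSCORES, key=HARD_SUBSCORES.get)
print(weakest)  # prints Authority integrity
```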

Telemetry

Runtime economics

Total cost: $0.290

Cost / scored item: $0.0009

Seconds / timed item: 22.93s

Runtime coverage: 100.0%

Recorded tokens / item: 4.5k

Token coverage: 100.0%

Cost, time, and token figures are normalized telemetry. They help explain tradeoffs between models but are not yet factored into the capability score.
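Per-item figures are totals divided by a count of scored items, and the card does not publish that count. As an illustrative check, the sketch below bounds which item counts are consistent with both rounded figures ($0.290 total, $0.0009 per scored item), assuming each displayed value may be off by up to half its last digit. `implied_item_count_range` is a hypothetical helper, and its output is a bound, not the actual count.

```python
import math

# Telemetry as displayed on the card (rounded values).
TOTAL_COST_USD = 0.290      # displayed to 3 decimal places
COST_PER_ITEM_USD = 0.0009  # displayed to 4 decimal places

def implied_item_count_range(total: float, total_step: float,
                             per_item: float, per_item_step: float) -> tuple[int, int]:
    """Whole item counts n for which total / n fits both rounded values.

    A value v displayed with resolution `step` is treated as lying in
    [v - step / 2, v + step / 2).
    """
    total_lo, total_hi = total - total_step / 2, total + total_step / 2
    unit_lo, unit_hi = per_item - per_item_step / 2, per_item + per_item_step / 2
    # Fewest items: smallest plausible total at the largest plausible unit cost.
    n_min = math.ceil(total_lo / unit_hi)
    # Most items: largest plausible total at the smallest plausible unit cost.
    n_max = math.floor(total_hi / unit_lo)
    return n_min, n_max

print(implied_item_count_range(TOTAL_COST_USD, 0.001, COST_PER_ITEM_USD, 0.0001))
# prints (305, 341)
```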

Interpretation

Why this result lands here.

This is a top-tier all-around entrant: its aggregate score remains close to the leader's, with lane-level tradeoffs shown separately. Its strongest lane is Full / Agentic (95.80, rank #1) and its weakest is Hard Intelligence (75.48, rank #10). Hard Intelligence is published alongside Full / Agentic and SWE for the current ranking and counts for one third of the overall score.