Model result · rank #14

MiniMax M3 Direct Plus

MiniMax Plus · direct API · SWE extra-high reasoning. Public result card with the model’s overall score, lane measurements, runtime/cost telemetry, and ranking formula.


Overall score: 74.63

Rank #14

Full / Agentic: 84.32

Full rank #12

SWE MVP: 67.77

SWE rank #15

Hard Intelligence: 71.80

Hard rank #12

Measured cost: $0.0041

100.0% reliability

Overall

All-around publication view

Score: 74.63

Formula: mean(Full, SWE, Hard Intelligence)

Basis: MiniMax Plus · direct API · SWE extra-high reasoning

The overall score averages the measured major lanes while keeping each source measurement visible.
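
As a quick check, the stated formula can be reproduced from the three lane scores published on this card. This is a minimal sketch: the function name `overall_score` and the rounding to two decimals are assumptions made for illustration, not part of the published scoring code.

```python
from statistics import mean


def overall_score(full: float, swe: float, hard: float) -> float:
    """Unweighted mean of the three lane scores, per the card's formula.

    Rounding to two decimals is an assumption that mirrors how the
    card displays its scores.
    """
    return round(mean([full, swe, hard]), 2)


# Lane scores as published on this card.
score = overall_score(full=84.32, swe=67.77, hard=71.80)
print(score)
```

With the published lane values, the result agrees with the 74.63 shown above.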

Lane 01

Full / Agentic benchmark

Final: 84.32

Capability: 87.74

Agentic: 79.86

Pass rate: 88.4%

Prompts: 43

This lane captures instruction following, structured behavior, tool discipline, and general agentic reliability.

Lane 02

Software engineering MVP

SWE score: 67.77

Focused final: 61.86

Capability: 62.62

Daily driver: 60.93

Prompts: 24

This lane is closer to implementation usefulness: source handling, architecture cleanliness, and deliverable quality.

Lane 03

Hard Intelligence diagnostic

Hard score: 71.80

Active inquiry: 17.50

Online adaptation: 98.50

Self-repair: 71.19

Authority integrity: 100.00

Hard Intelligence measures active inquiry, online adaptation, evidence-driven self-repair, and authority/salience integrity.
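
The card does not state how the Hard score is derived from its four components. The published component values do average to the published Hard score, which is consistent with an unweighted mean, but that aggregation rule is an assumption here rather than a documented formula. The sketch below checks the arithmetic; the name `hard_score_guess` is hypothetical.

```python
from statistics import mean


def hard_score_guess(components: dict[str, float]) -> float:
    """Unweighted mean of the Hard Intelligence components.

    ASSUMPTION: the card does not document this rule; it is inferred
    only because the published numbers happen to match it.
    """
    return round(mean(components.values()), 2)


# Component values as published on this card.
hard_components = {
    "active_inquiry": 17.50,
    "online_adaptation": 98.50,
    "self_repair": 71.19,
    "authority_integrity": 100.00,
}

print(hard_score_guess(hard_components))
```

Under that assumption, the low Active inquiry value (17.50) is the main factor pulling the Hard score down, since the other three components are all above 71.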

Telemetry

Runtime economics

Total cost: $0.0041

Cost / scored item: $0.0001

Seconds / timed item: 21.94s

Runtime coverage: 100.0%

Recorded tokens / item: 4.9k

Token coverage: 100.0%

Cost, time, and token basis are normalized telemetry. They explain tradeoffs; they do not overwrite the capability score yet.

Interpretation

Why this result lands here.

The model is stronger in the Full/Agentic lane (84.32) than in the SWE lane (67.77), so the overall score is shown with each component lane visible rather than as a single number. The Hard Intelligence score of 71.80 is published alongside Full/Agentic and SWE for the current ranking and contributes to the overall score with the same weight as the other two lanes.