Model result · rank #15
Qwythos‑9B Claude Mythos Q8_0
Local model
Local GGUF · llama.cpp Vulkan · Q8_0 · 256K allocation verified.
Public result card with the model’s overall score, lane measurements, runtime/cost telemetry, and ranking formula.
Full rank #15
SWE rank #15
Hard rank #13
100.0% reliability
All-around publication view
The overall score averages the measured major lanes while keeping each source measurement visible.
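As a minimal sketch of the averaging described above, assuming an unweighted arithmetic mean over whichever lanes have a measurement: the function name `overall_score`, the lane keys, and the example values are all hypothetical and are not taken from this card, and the actual ranking formula may weight lanes differently.

```python
from statistics import fmean
from typing import Mapping, Optional


def overall_score(lanes: Mapping[str, Optional[float]]) -> float:
    """Average the measured lanes; lanes with no measurement (None) are skipped.

    Assumption: an unweighted mean. The card only says the overall score
    "averages the measured major lanes", so any weighting is unknown.
    """
    measured = [score for score in lanes.values() if score is not None]
    if not measured:
        raise ValueError("no measured lanes to average")
    return fmean(measured)


# Hypothetical lane scores for illustration only (not this model's results).
example_lanes = {
    "full_agentic": 80.0,
    "swe": 60.0,
    "hard_intelligence": 40.0,
}

print(overall_score(example_lanes))  # 60.0
```

Keeping the per-lane mapping next to the computed mean mirrors the card's intent: the average summarizes the lanes without hiding any source measurement.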
Full / Agentic benchmark
This lane captures instruction following, structured behavior, tool discipline, and general agentic reliability.
Software engineering MVP
This lane is closer to implementation usefulness: source handling, architecture cleanliness, and deliverable quality.
Hard Intelligence diagnostic
Hard Intelligence measures active inquiry, online adaptation, evidence-driven self-repair, and authority/salience integrity.
Runtime economics
Cost, time, and token basis are normalized telemetry. They explain tradeoffs; they do not overwrite the capability score yet.
Why this result lands here.
The model is stronger in the Full/Agentic lane than in the SWE lane, so the overall score is shown with both component lanes visible. The Hard Intelligence score is 43.91 and contributes to the overall score alongside Full/Agentic and SWE. This is a local model row: it was benchmarked on local hardware with no API metering. The 256K allocation and runtime fit were verified for this Q8_0 local Vulkan entrant; filled-context retrieval at 256K is a separate question and remains unproven. Q8 shows stronger full-suite quality than Q6, but the SWE and Hard Intelligence diagnostic lanes remain weak.