Model result · rank #11

NVIDIA Nemotron 3 Ultra

OpenRouter · nvidia/nemotron-3-ultra-550b-a55b · xhigh reasoning. Public result card with the model’s overall score, lane measurements, runtime/cost telemetry, and ranking formula.

Overall score: 69.08

Rank #11

Full / Agentic: 79.54

Full rank #10

SWE MVP: 58.63

SWE rank #10

Measured cost: $0.369

100.0% reliability

Overall

All-around publication view

Score: 69.08
Formula: 50% Full + 50% SWE
Basis: OpenRouter · nvidia/nemotron-3-ultra-550b-a55b · xhigh reasoning

The overall score is calculated from the Full/Agentic and SWE lanes, keeping the aggregate comparable while preserving the measurements behind it.
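The stated formula (50% Full + 50% SWE) can be checked against the two lane scores on this card. The following is a minimal sketch, not the ranking's actual code: the function name `overall_score` and its weight parameters are hypothetical, and only the two lane scores and the 50/50 weighting come from the card.

```python
# Lane scores as published on this card.
FULL_SCORE = 79.54
SWE_SCORE = 58.63


def overall_score(full, swe, w_full=0.5, w_swe=0.5):
    """Weighted mean of two lane scores.

    The 50/50 default mirrors the card's stated formula; the function
    itself is a hypothetical helper, not the site's implementation.
    """
    return w_full * full + w_swe * swe


score = overall_score(FULL_SCORE, SWE_SCORE)
print(f"{score:.3f}")
```

From the rounded lane scores this gives roughly 69.085, which agrees with the published 69.08 to within 0.01. The small gap is presumably a rounding or truncation effect, since the card may compute the aggregate from unrounded lane values; that is an assumption, not something the card states.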

Lane 01

Full / Agentic benchmark

Final: 79.54
Capability: 95.29
Agentic: 92.57
Pass rate: 93.0%
Prompts: 43

This lane captures instruction following, structured behavior, tool discipline, and general agentic reliability.

Lane 02

Software engineering MVP

SWE score: 58.63
Focused final: 60.73
Capability: 58.63
Daily driver: 60.63
Prompts: 24

This lane is closer to implementation usefulness: source handling, architecture cleanliness, and deliverable quality.

Telemetry

Runtime economics

Full cost: $0.312
SWE cost: $0.056
Full avg seconds: 68.94
SWE time: 2234.05 s
Decode: 5.74

Cost, time, and runtime basis are telemetry. They explain tradeoffs; they do not secretly overwrite the capability scores.
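As a quick sanity check on the cost telemetry, the two per-lane costs can be summed and compared with the measured cost reported at the top of the card. This sketch assumes the measured cost is simply the sum of the two lane costs, which the card does not state explicitly; the variable names are illustrative.

```python
# Per-lane costs and headline cost as published on this card (USD).
FULL_COST = 0.312
SWE_COST = 0.056
MEASURED_COST = 0.369

# Assumption: the headline cost is the sum of the two lane costs.
total_cost = FULL_COST + SWE_COST
gap = abs(MEASURED_COST - total_cost)

print(f"sum of lanes: ${total_cost:.3f}, gap to headline: ${gap:.3f}")
```

The lanes sum to about $0.368, one tenth of a cent below the published $0.369. That is consistent with each lane cost being rounded to three decimals before display, though the card does not confirm this.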

Interpretation

Why this result lands here.

The model is stronger in the Full/Agentic lane (79.54) than in the SWE lane (58.63), so the overall score is shown with both component lanes visible. Exact OpenRouter model: nvidia/nemotron-3-ultra-550b-a55b. The Full lane uses the current full_suite_v3 run; suite metadata remains visible per row. The SWE lane is software_engineering_mvp_v1 (n=3), not the external SWE-bench Verified benchmark. The low SWE rank is driven by patch formatting and application failures: no_patch_found=1, patch_rejected=7, tests_failed=3 across 24 SWE records.
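The failure counts quoted above can be tallied to see how much of the SWE shortfall comes from patch handling rather than failing tests. This is a minimal sketch that assumes the three categories are mutually exclusive, with each record counted at most once; the card does not say so. The dictionary keys are the card's own labels, and the other names are illustrative.

```python
# Failure counts and record total as reported on this card.
failures = {
    "no_patch_found": 1,
    "patch_rejected": 7,
    "tests_failed": 3,
}
SWE_RECORDS = 24

# Assumption: categories do not overlap, so the counts can be added.
total_failures = sum(failures.values())

# Failures where no usable patch was applied, as opposed to a patch
# that applied but did not pass the tests.
patch_related = failures["no_patch_found"] + failures["patch_rejected"]

print(f"{total_failures} of {SWE_RECORDS} records failed")
print(f"{patch_related} of {total_failures} failures are patch-related")
```

Under that assumption, 11 of the 24 records hit one of the listed failure modes, and 8 of those 11 never reached the test stage because no patch was found or the patch was rejected.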