Model result · rank #11
NVIDIA Nemotron 3 Ultra
OpenRouter · nvidia/nemotron-3-ultra-550b-a55b · xhigh reasoning. Public result card with the model’s overall score, lane measurements, runtime/cost telemetry, and ranking formula.
Full rank #10
SWE rank #10
100.0% reliability
All-around publication view
The overall score is calculated from the Full/Agentic and SWE lanes, keeping the aggregate comparable while preserving the measurements behind it.
Full / Agentic benchmark
This lane captures instruction following, structured behavior, tool discipline, and general agentic reliability.
Software engineering MVP
This lane is closer to implementation usefulness: source handling, architecture cleanliness, and deliverable quality.
Runtime economics
Cost, time, and runtime basis are telemetry. They explain tradeoffs; they do not secretly overwrite the capability scores.
Why this result lands here.
The model is stronger in the Full/Agentic lane than in the SWE lane, so the overall score is shown with both component lanes visible. Exact OpenRouter model: nvidia/nemotron-3-ultra-550b-a55b. The Full lane uses the current full_suite_v3 run; suite metadata remains visible per row. The SWE lane is software_engineering_mvp_v1 (n=3), not the external SWE-bench Verified benchmark. The low SWE rank is driven mainly by patch formatting/application failures (no_patch_found=1, patch_rejected=7), with a further tests_failed=3, across 24 SWE records.
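The failure counts above can be tallied with a short sketch. It uses only the numbers stated on this card. The function name `summarize`, the grouping of `no_patch_found` and `patch_rejected` as the patch formatting/application categories, and the assumption that each SWE record carries at most one failure label are illustrative and are not confirmed by the card.

```python
from collections import Counter

# Failure counts and record total copied from the result card.
failure_counts = Counter(no_patch_found=1, patch_rejected=7, tests_failed=3)
TOTAL_SWE_RECORDS = 24

# Assumption: these two categories are the "patch formatting/application" failures.
PATCH_CATEGORIES = ("no_patch_found", "patch_rejected")


def summarize(counts, total, patch_categories):
    """Tally failures, assuming each record has at most one failure label."""
    failed = sum(counts.values())
    patch_failed = sum(counts[name] for name in patch_categories)
    return {
        "failed": failed,
        "without_listed_failure": total - failed,
        "patch_failed": patch_failed,
        "patch_share_of_failures": patch_failed / failed,
        "failure_rate": failed / total,
    }


summary = summarize(failure_counts, TOTAL_SWE_RECORDS, PATCH_CATEGORIES)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under those assumptions, 11 of the 24 SWE records carry a listed failure, and 8 of those 11 fall in the patch categories. The remaining 13 records are only "without a listed failure"; the card does not say how they were scored.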