OpenAI · Apache 2.0 license

GPT OSS 20B

GPT OSS 20B by OpenAI appears in 2 sources and has a Reasoning index of 19.77. Most relevant for: Math, Open weights, RAG.

Creator: OpenAI

Release date: 2025-08-05

Knowledge cutoff: Not published

Context: Unknown

Input price: Unknown

Output price: Unknown

Modality: Text

Country: US

Metrics

All source-backed metrics


LLM Stats Rank: 96 (ranking · source_rank)

LLM Stats Code Index (estimated from arena): 13.87 score (metric)

Reasoning: 19.77 (reasoning · index_reasoning)

Math: 22.27 (math · index_math)

Writing: -5.14 (writing · index_communication)

Vision: 3.48 (multimodal · index_vision)

Tool calling: -7.37 (tool_calling · index_tool_calling)

Finance: 16.48 (domain · index_finance)

Legal: 16.55 (domain · index_legal)

Healthcare: 21.39 (domain · index_healthcare)

GPQA: 71.5 % (reasoning · gpqa_score)

Code Arena: 533.66 (coding · coding_arena_score)

Humanity's Last Exam: 10.9 % (reasoning · hle_score)

TAU2 Retail: 54.8 % (agent · tau_bench_retail_score)

Parameters: 20,900,000,000 params, about 20.9B (model · params)

AIME 2025: 98.7 % (metric)

Evidence

Citations and source overlap

Vellum: Top models per benchmark from page-level structured data (JSON-LD). Retrieved 2026-05-20.

LLM Stats: Public full LLM leaderboard advanced explorer snapshot, including ranking, benchmark indexes, pricing, speed, context, license, and model metadata. Retrieved 2026-06-01.

FAQ

How should I read this profile?

Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are replicated from the linked sources, and each value stays attributed to the source that reported it.

Is GPT OSS 20B verified by EvalKit?

No. EvalKit currently shows 0 verified rows for this model and will not show any until it has evidence from real runs.

Why can metrics disagree?

Different sources test different tasks, dates, prompts, and aggregation methods. EvalKit keeps those differences visible instead of merging them into a fake universal score.
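One way to keep metrics source-scoped is to store every value together with the source that reported it and to compare sources side by side instead of averaging them. The Python sketch below is a hypothetical illustration, not EvalKit's actual implementation: `MetricRow`, `by_metric`, and `disagreements` are invented names, and the source labels, values, and dates are placeholders rather than data from this profile.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRow:
    # One value as reported by one source (hypothetical schema).
    source: str
    metric: str
    value: float
    retrieved: str  # ISO date of the source snapshot


def by_metric(rows):
    """Group rows by metric, keeping each source's value separate (no averaging)."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row.metric][row.source] = row.value
    return dict(grouped)


def disagreements(rows, tolerance=0.0):
    """Return metrics whose sources differ by more than `tolerance`."""
    out = {}
    for metric, per_source in by_metric(rows).items():
        values = list(per_source.values())
        if len(values) > 1 and max(values) - min(values) > tolerance:
            out[metric] = per_source
    return out


# Placeholder rows: sources and values are hypothetical.
rows = [
    MetricRow("source_a", "gpqa_score", 70.0, "2026-01-15"),
    MetricRow("source_b", "gpqa_score", 72.0, "2026-02-01"),
    MetricRow("source_b", "hle_score", 10.0, "2026-02-01"),
]

print(by_metric(rows))
# {'gpqa_score': {'source_a': 70.0, 'source_b': 72.0}, 'hle_score': {'source_b': 10.0}}
print(disagreements(rows, tolerance=1.0))
# {'gpqa_score': {'source_a': 70.0, 'source_b': 72.0}}
```

Because the grouped view keeps one entry per source, a reader sees both GPQA values and can inspect why they differ, while a metric reported by a single source (here `hle_score`) is never flagged as a disagreement.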