OpenAI · Apache 2.0 license
GPT OSS 20B
GPT OSS 20B by OpenAI appears in 2 sources, with a Reasoning index of 19.77. This profile is most relevant to Math, Open weights, and RAG.
Creator: OpenAI
Release date: 2025-08-05
Knowledge cutoff: Not published
Context: Unknown
Input price: Unknown
Output price: Unknown
Modality: text
Country: US
Metrics
All source-backed metrics
LLM Stats Rank: 96 (ranking · source_rank)
LLM Stats Code Index (estimated from arena): 13.87 (metric)
Reasoning: 19.77 (reasoning · index_reasoning)
Math: 22.27 (math · index_math)
Writing: -5.14 (writing · index_communication)
Vision: 3.48 (multimodal · index_vision)
Tool calling: -7.37 (tool_calling · index_tool_calling)
Finance: 16.48 (domain · index_finance)
Legal: 16.55 (domain · index_legal)
Healthcare: 21.39 (domain · index_healthcare)
GPQA: 71.5% (reasoning · gpqa_score)
Code Arena: 533.66 (coding · coding_arena_score)
Humanity's Last Exam: 10.9% (reasoning · hle_score)
TAU2 Retail: 54.8% (agent · tau_bench_retail_score)
Parameters: 20,900,000,000 params, about 20.9B (model · params)
AIME 2025: 98.7% (metric)
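Each metric row above pairs a display name and value with a category and, where shown, a source key (for example `reasoning · gpqa_score`). Rows in the same category can still sit on very different scales: the Reasoning index (19.77) and GPQA (71.5%) are both `reasoning`, but one is an index and the other a percentage. A minimal sketch of that row shape, assuming a hypothetical `MetricRow` record and `by_category` helper (neither is part of EvalKit), groups rows by category without averaging their values:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricRow:
    """One metric row from this profile (hypothetical shape, not EvalKit's schema)."""
    name: str              # display name, e.g. "GPQA"
    value: float           # numeric value as shown on the page
    unit: Optional[str]    # "%", "params", or None for unitless indices
    category: str          # e.g. "reasoning", "math", "agent"
    key: Optional[str]     # source key, e.g. "gpqa_score"; None if not shown


def by_category(rows: list[MetricRow]) -> dict[str, list[MetricRow]]:
    """Group rows by category, keeping every value separate (no averaging)."""
    groups: dict[str, list[MetricRow]] = defaultdict(list)
    for row in rows:
        groups[row.category].append(row)
    return dict(groups)


# A few rows copied from the profile above.
rows = [
    MetricRow("Reasoning", 19.77, None, "reasoning", "index_reasoning"),
    MetricRow("GPQA", 71.5, "%", "reasoning", "gpqa_score"),
    MetricRow("Humanity's Last Exam", 10.9, "%", "reasoning", "hle_score"),
    MetricRow("Math", 22.27, None, "math", "index_math"),
    MetricRow("TAU2 Retail", 54.8, "%", "agent", "tau_bench_retail_score"),
]

groups = by_category(rows)
print([r.name for r in groups["reasoning"]])
```

Keeping the unit on each row makes it harder to accidentally mix an index with a percentage when comparing rows.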
Evidence
Citations and source overlap
FAQ
How should I read this profile?
Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are copied from the linked sources, and each value stays attributed to the source it came from.
Is GPT OSS 20B verified by EvalKit?
No. EvalKit currently shows 0 verified rows for this model; a row is marked verified only once evidence from an actual EvalKit run exists.
Why can metrics disagree?
Different sources test different tasks, dates, prompts, and aggregation methods. EvalKit keeps those differences visible instead of merging them into an artificial universal score.
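One way to keep those differences visible is to store every value under a (source, metric key) pair and return all sources side by side on lookup, rather than computing a mean. The sketch below assumes a hypothetical in-memory `SourceScopedStore`; the source names, the `example_score` key, and its values are placeholders for illustration, not data from this profile or EvalKit's actual storage.

```python
from dataclasses import dataclass, field


@dataclass
class SourceScopedStore:
    """Hypothetical store that keeps every value attributed to its source."""
    # (source, metric_key) -> value; values are never merged across sources.
    values: dict[tuple[str, str], float] = field(default_factory=dict)

    def record(self, source: str, metric_key: str, value: float) -> None:
        """Store one source's value for one metric."""
        self.values[(source, metric_key)] = value

    def lookup(self, metric_key: str) -> dict[str, float]:
        """Return each source's value for a metric, side by side."""
        return {
            source: value
            for (source, key), value in self.values.items()
            if key == metric_key
        }


store = SourceScopedStore()
# Placeholder sources and values, for illustration only.
store.record("source_a", "example_score", 40.0)
store.record("source_b", "example_score", 45.0)
store.record("source_a", "other_score", 10.0)

print(store.lookup("example_score"))
```

Because `lookup` returns one entry per source, a disagreement between sources shows up directly in the result instead of being hidden inside a single averaged number.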