DeepSeek · MIT license
DeepSeek-V4-Pro-Max
DeepSeek-V4-Pro-Max by DeepSeek appears in 1 source, with a Reasoning index of 56.96. Best read for code quality, long context, and low cost.
Creator: DeepSeek
Release date: 2026-04-23
Knowledge cutoff: Not published
Context: 1M tokens
Input price: $1.74/M tokens
Output price: $3.48/M tokens
Modality: text
Country: CN
Metrics
All source-backed metrics
LLM Stats Rank: 20 (ranking · source_rank)
Reasoning: 56.96 (reasoning · index_reasoning)
Math: 52.92 (math · index_math)
Coding: 43.53 (coding · index_code)
Research: 32.92 (research · index_search)
Vision: 33.51 (multimodal · index_vision)
Tool calling: 33.15 (tool_calling · index_tool_calling)
Long context: 19.81 (long_context · index_long_context)
Finance: 42.02 (domain · index_finance)
Legal: 42.52 (domain · index_legal)
Healthcare: 47.79 (domain · index_healthcare)
GPQA: 90.1 % (reasoning · gpqa_score)
SWE-bench Verified: 80.6 % (coding · swe_bench_verified_score)
Code Arena: 1,317.2 (coding · coding_arena_score; an arena rating, not a percentage)
Humanity's Last Exam: 48.2 % (reasoning · hle_score)
SimpleQA: 57.9 % (knowledge · simpleqa_score)
BrowseComp: 83.4 % (research · browsecomp_score)
Toolathlon: 51.8 % (tool_calling · toolathlon_score)
MCP Atlas: 73.6 % (tool_calling · mcp_atlas_score)
SWE-bench Pro: 55.4 % (coding · swe_bench_pro_score)
Context: 1,048,576 tokens (context · context)
Speed: 43.9 c/s (performance · throughput)
Latency: 126,951.07 ms (performance · latency)
Input price: 1.74 $/M (pricing · input_price)
Output price: 3.48 $/M (pricing · output_price)
Parameters: 1,600,000,000,000 params (model · params)
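The listed prices can be turned into a rough per-request cost. The sketch below is illustrative only: it assumes billing is linear in token count at the listed input and output prices, and the function name and the sample token counts are hypothetical, not part of this profile.

```python
# Prices copied from the profile (USD per million tokens).
INPUT_PRICE_PER_M = 1.74
OUTPUT_PRICE_PER_M = 3.48


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumes simple linear pricing; any caching discounts, minimum
    charges, or tiered rates a provider may apply are not modelled.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 4,000 output tokens.
    print(f"{estimate_cost_usd(200_000, 4_000):.4f} USD")
```

At these prices an output token costs twice as much as an input token, so long generations tend to dominate the bill sooner than long prompts do.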
Evidence
Citations and source overlap
FAQ
How should I read this profile?
Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are replicated from linked sources and kept source-scoped.
Is DeepSeek-V4-Pro-Max verified by EvalKit?
No. EvalKit shows 0 verified rows for this model and will continue to do so until real run evidence exists.
Why can metrics disagree?
Different sources test different tasks, dates, prompts, and aggregation methods. EvalKit keeps those differences visible instead of merging them into a fake universal score.
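One way to picture source-scoped values is a record per source and metric, grouped for display rather than averaged. The sketch below is a hypothetical illustration, not EvalKit's actual schema: the `MetricRow` type, the source names, and the second GPQA value are invented placeholders, and only the 90.1 figure comes from this profile.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRow:
    source: str             # hypothetical source identifier
    metric: str             # metric key, e.g. "gpqa_score"
    value: float
    verified: bool = False  # True only when backed by a real run


def group_by_metric(rows):
    """Group rows by metric while keeping each source's value separate."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row.metric][row.source] = row.value
    return dict(grouped)


def verified_count(rows):
    """Count rows that are backed by real run evidence."""
    return sum(1 for row in rows if row.verified)


rows = [
    MetricRow("source_a", "gpqa_score", 90.1),  # value listed in this profile
    MetricRow("source_b", "gpqa_score", 88.0),  # placeholder, not a real result
]

if __name__ == "__main__":
    print(group_by_metric(rows))
    print(verified_count(rows))
```

Keeping the values side by side leaves any disagreement visible to the reader instead of hiding it behind a single merged number.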