Anthropic · proprietary
Claude Haiku 4.5
Claude Haiku 4.5 by Anthropic appears in 1 source, with a Reasoning index of 35.4. Best read for: Code quality, LLM, Low cost.
Creator: Anthropic
Release date: 2025-10-15
Knowledge cutoff: Not published
Context: 200K tokens
Input price: $1/M tokens
Output price: $5/M tokens
Modality: text + vision
Country: US
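To make the listed prices concrete, here is a minimal sketch of a per-request cost estimate. It assumes flat per-token pricing at the rates shown above, with no caching, batch, or volume discounts; the function and constant names are hypothetical and are not part of any Anthropic or EvalKit API.

```python
# Prices copied from the profile above, in US dollars per million tokens.
INPUT_PRICE_PER_M = 1.0
OUTPUT_PRICE_PER_M = 5.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming flat per-token pricing.

    Hypothetical helper: real billing may differ (caching, batching, tiers).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 150,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost_usd(150_000, 2_000):.2f}")
```

Under these assumptions, output tokens cost five times as much as input tokens, so long replies dominate the bill faster than long prompts do.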
Metrics
All source-backed metrics
LLM Stats Rank: 50 (ranking · source_rank)
Reasoning: 35.4 (reasoning · index_reasoning)
Math: 20.89 (math · index_math)
Coding: 25.09 (coding · index_code)
Writing: 25.23 (writing · index_communication)
Vision: 16.63 (multimodal · index_vision)
Tool calling: 21.32 (tool_calling · index_tool_calling)
Long context: 15.04 (long_context · index_long_context)
Healthcare: -0.1 (domain · index_healthcare)
GPQA: 73 % (reasoning · gpqa_score)
AIME 2025: 80.7 % (math · aime_2025_score)
SWE-bench Verified: 73.3 % (coding · swe_bench_verified_score)
Code Arena: 963.43 % (coding · coding_arena_score)
MMMLU: 83 % (reasoning · mmmlu_score)
OSWorld: 50.7 % (agent · osworld_score)
Terminal Bench: 41 % (coding · terminal_bench_score)
Context: 200,000 tokens (context · context)
Speed: 222.37 c/s (performance · throughput)
Input price: 1 $/M (pricing · input_price)
Output price: 5 $/M (pricing · output_price)
Evidence
Citations and source overlap
FAQ
How should I read this profile?
Treat this as a source-backed model dossier, not an EvalKit-run verification. The public values are replicated from linked sources and kept source-scoped.
Is Claude Haiku 4.5 verified by EvalKit?
No. EvalKit currently shows 0 verified rows until real run evidence exists.
Why can metrics disagree?
Different sources test different tasks, dates, prompts, and aggregation methods. EvalKit keeps those differences visible instead of merging them into a fake universal score.
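One way to picture "source-scoped" values is a structure that stores each metric per source and reports the disagreement instead of averaging it away. The sketch below is illustrative only: the class, function names, source labels, and the `example_score` values are hypothetical placeholders, not EvalKit's actual schema or real measurements.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRow:
    source: str             # hypothetical source label
    metric: str             # metric key, e.g. "example_score"
    value: float
    verified: bool = False  # True only if backed by a real evaluation run


def group_by_source(rows):
    """Keep every value keyed by (metric, source); never merge across sources."""
    scoped = defaultdict(dict)
    for row in rows:
        scoped[row.metric][row.source] = row.value
    return {metric: dict(by_source) for metric, by_source in scoped.items()}


def spread(by_source):
    """Report how far sources disagree, rather than hiding it in an average."""
    values = list(by_source.values())
    return max(values) - min(values)


# Placeholder rows for illustration; these are not real benchmark results.
rows = [
    MetricRow("source_a", "example_score", 70.0),
    MetricRow("source_b", "example_score", 75.0),
]

scoped = group_by_source(rows)
verified_count = sum(row.verified for row in rows)

if __name__ == "__main__":
    print(scoped)
    print("spread:", spread(scoped["example_score"]))
    print("verified rows:", verified_count)
```

Keeping the source as part of the key means two values for the same metric can coexist, and a reader can see the gap between them directly.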