Model catalog

Search the frontier by model, provider, price, context, and evidence.

EvalKit turns public leaderboard rows into model profiles so you can start with the model, then inspect the citations behind every score.
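The row-to-profile step described above can be pictured with the minimal sketch below. The field names (`provider`, `model`, `source`, `metric`, `value`, `citation`), the `Score` and `Profile` classes, and the `build_profiles` helper are all hypothetical, chosen for illustration; they are not EvalKit's actual schema or API, and the rows are placeholders rather than real leaderboard data.

```python
from dataclasses import dataclass, field


@dataclass
class Score:
    source: str    # leaderboard the row came from
    metric: str    # metric name as that source reports it
    value: float
    citation: str  # URL or reference backing the score


@dataclass
class Profile:
    model: str
    provider: str
    scores: list = field(default_factory=list)

    def source_count(self) -> int:
        # Count distinct sources, not rows.
        return len({s.source for s in self.scores})


def build_profiles(rows):
    """Group leaderboard rows by (provider, model), keeping each score's citation."""
    profiles = {}
    for row in rows:
        key = (row["provider"], row["model"])
        profile = profiles.setdefault(key, Profile(row["model"], row["provider"]))
        profile.scores.append(
            Score(row["source"], row["metric"], row["value"], row["citation"])
        )
    return list(profiles.values())


# Placeholder rows for illustration only; not real leaderboard data.
rows = [
    {"provider": "Provider X", "model": "Model A", "source": "board-1",
     "metric": "Reasoning", "value": 50.0, "citation": "https://example.com/board-1"},
    {"provider": "Provider X", "model": "Model A", "source": "board-2",
     "metric": "Coding", "value": 40.0, "citation": "https://example.com/board-2"},
    {"provider": "Provider Y", "model": "Model B", "source": "board-1",
     "metric": "Reasoning", "value": 45.0, "citation": "https://example.com/board-1"},
]

profiles = build_profiles(rows)
for p in profiles:
    print(p.model, p.source_count(), [(s.source, s.metric, s.value) for s in p.scores])
```

Each score stays attached to its source and citation, so a profile never collapses several leaderboards into one number.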


410 models from 410 profiles

Source-scoped metrics. No fake universal score.
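Searching by provider, price, context, and evidence can be sketched as a filter over card fields. The `Card` class and `search` function below are hypothetical illustrations, not EvalKit's API. The sample values are copied from cards in this catalog, with "1M" read as 1,000,000 tokens and "200K" as 200,000, which is an assumption about the rounding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Card:
    model: str
    provider: str
    context_tokens: Optional[int]     # None where the catalog shows "Unknown"
    input_usd_per_m: Optional[float]  # None where the catalog shows "Unknown"
    sources: int                      # number of sources backing the profile


def search(cards, provider=None, max_input_usd=None, min_context=None, min_sources=1):
    """Return cards matching every supplied filter.

    A card with an unknown price or context is excluded whenever that
    field is filtered on, because the constraint cannot be verified.
    """
    results = []
    for card in cards:
        if provider is not None and card.provider != provider:
            continue
        if max_input_usd is not None and (
            card.input_usd_per_m is None or card.input_usd_per_m > max_input_usd
        ):
            continue
        if min_context is not None and (
            card.context_tokens is None or card.context_tokens < min_context
        ):
            continue
        if card.sources < min_sources:
            continue
        results.append(card)
    return results


# Values copied from cards in this catalog (see the assumptions noted above).
CARDS = [
    Card("Claude Opus 4.6", "Anthropic", 1_000_000, 5.0, 3),
    Card("Gemini 3.1 Pro", "Google", 1_000_000, 2.5, 1),
    Card("GLM-5.1", "Zhipu AI", 200_000, 1.4, 2),
    Card("Claude Opus 4.5", "Anthropic", None, None, 2),
]

cheap_long = search(CARDS, max_input_usd=3.0, min_context=1_000_000)
print([c.model for c in cheap_long])
```

Treating "Unknown" as a non-match rather than as zero keeps a missing price from looking like a free model.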

#1 · Anthropic

Claude Opus 4.6

Created by Anthropic
3 sources

Claude Opus 4.6 by Anthropic appears in 3 sources with Reasoning at 59.45. Best read for Code quality, Coding, High intelligence.

Top metric: 59.45 · Reasoning
Context: 1M tokens
Input: $5/M · public snapshot
License: proprietary · text + vision
Code quality · Coding · High intelligence · LLM

#2 · Google

Gemini 3.1 Pro

Created by Google
1 source

Gemini 3.1 Pro by Google appears in 1 source with Reasoning at 59.1. Best read for LLM, Long context, Low cost.

Top metric: 59.1 · Reasoning
Context: 1M tokens
Input: $2.5/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#3 · OpenAI

GPT-5.5

Created by OpenAI
3 sources

GPT-5.5 by OpenAI appears in 3 sources with Reasoning at 62.25. Best read for Coding, High intelligence, LLM.

Top metric: 62.25 · Reasoning
Context: 1.1M tokens
Input: $5/M · public snapshot
License: proprietary · text + vision
Coding · High intelligence · LLM · Long context

#4 · Anthropic

Claude Opus 4.7

Created by Anthropic
3 sources

Claude Opus 4.7 by Anthropic appears in 3 sources with Reasoning at 62.47. Best read for Code quality, Coding, High intelligence.

Top metric: 62.47 · Reasoning
Context: 1M tokens
Input: $5/M · public snapshot
License: proprietary · text + vision
Code quality · Coding · High intelligence · LLM

#5 · Zhipu AI

GLM-5.1

Created by Zhipu AI
2 sources

GLM-5.1 by Zhipu AI appears in 2 sources with Reasoning at 54.18. Best read for Coding, High intelligence, Low cost.

Top metric: 54.18 · Reasoning
Context: 200K tokens
Input: $1.4/M · public snapshot
License: mit · text
Coding · High intelligence · Low cost · Math

#6 · OpenAI

GPT-5.4

Created by OpenAI
2 sources

GPT-5.4 by OpenAI appears in 2 sources with Reasoning at 57.57. Best read for Coding, High intelligence, LLM.

Top metric: 57.57 · Reasoning
Context: 1M tokens
Input: $2.5/M · public snapshot
License: proprietary · text + vision
Coding · High intelligence · LLM · Long context

#7 · Google

Gemini 3 Flash

Created by Google
2 sources

Gemini 3 Flash by Google appears in 2 sources with Reasoning at 49.35. Best read for Coding, High intelligence, LLM.

Top metric: 49.35 · Reasoning
Context: 1M tokens
Input: $0.5/M · public snapshot
License: proprietary · text + vision
Coding · High intelligence · LLM · Long context

#8 · Anthropic

Claude Sonnet 4.6

Created by Anthropic
2 sources

Claude Sonnet 4.6 by Anthropic appears in 2 sources with Reasoning at 52.16. Best read for Code quality, Coding, High intelligence.

Top metric: 52.16 · Reasoning
Context: 200K tokens
Input: $3/M · public snapshot
License: proprietary · text + vision
Code quality · Coding · High intelligence · LLM

#9 · Anthropic

Claude Opus 4.5

Created by Anthropic
2 sources

Claude Opus 4.5 by Anthropic appears in 2 sources with Reasoning at 54.19. Best read for Code quality, Coding, LLM.

Top metric: 54.19 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
Code quality · Coding · LLM · Multimodal

#10 · Zhipu AI

GLM-5

Created by Zhipu AI
2 sources

GLM-5 by Zhipu AI appears in 2 sources with Reasoning at 51.47. Best read for Code quality, Coding, High intelligence.

Top metric: 51.47 · Reasoning
Context: 200K tokens
Input: $1/M · public snapshot
License: mit · text
Code quality · Coding · High intelligence · Low cost

#11 · Google

Gemini 3 Pro

Created by Google
3 sources

Gemini 3 Pro by Google appears in 3 sources with Reasoning at 49.76. Best read for Code quality, Coding, High intelligence.

Top metric: 49.76 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
Code quality · Coding · High intelligence · LLM

#12 · Moonshot AI

Kimi K2.6

Created by Moonshot AI
2 sources

Kimi K2.6 by Moonshot AI appears in 2 sources with Reasoning at 58.13. Best read for Code quality, Coding, High intelligence.

Top metric: 58.13 · Reasoning
Context: 262K tokens
Input: $0.95/M · public snapshot
License: modified_mit_license · text + vision
Code quality · Coding · High intelligence · Low cost

#13 · OpenAI

GPT-5.2

Created by OpenAI
2 sources

GPT-5.2 by OpenAI appears in 2 sources with Reasoning at 53.54. Best read for Code quality, Coding, LLM.

Top metric: 53.54 · Reasoning
Context: 400K tokens
Input: $1.75/M · public snapshot
License: proprietary · text + vision
Code quality · Coding · LLM · Low cost

#14 · Google

Gemini 3.5 Flash

Created by Google
1 source

Gemini 3.5 Flash by Google appears in 1 source with Reasoning at 59.19. Best read for LLM, Long context, Low cost.

Top metric: 59.19 · Reasoning
Context: 1M tokens
Input: $1.5/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#15 · Moonshot AI

Kimi K2.5

Created by Moonshot AI
1 source

Kimi K2.5 by Moonshot AI appears in 1 source with Reasoning at 49.83. Best read for Code quality, Multimodal, Open weights.

Top metric: 49.83 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: mit · text + vision
Code quality · Multimodal · Open weights · Tool use

#16 · Anthropic

Claude Sonnet 4.5

Created by Anthropic
2 sources

Claude Sonnet 4.5 by Anthropic appears in 2 sources with Reasoning at 40.2. Best read for Code quality, Coding, LLM.

Top metric: 40.2 · Reasoning
Context: 200K tokens
Input: $3/M · public snapshot
License: proprietary · text + vision
Code quality · Coding · LLM · Low cost

#17 · OpenAI

GPT-5.4 mini

Created by OpenAI
1 source

GPT-5.4 mini by OpenAI appears in 1 source with Reasoning at 45.79. Best read for LLM, Low cost, Multimodal.

Top metric: 45.79 · Reasoning
Context: 400K tokens
Input: $0.75/M · public snapshot
License: proprietary · text + vision
LLM · Low cost · Multimodal · Tool use

#18 · OpenAI

GPT-5.3 Codex

Created by OpenAI
1 source

GPT-5.3 Codex by OpenAI appears in 1 source with Reasoning at 56.17. Best read for LLM, Low cost, Multimodal.

Top metric: 56.17 · Reasoning
Context: 400K tokens
Input: $1.75/M · public snapshot
License: proprietary · text + vision
LLM · Low cost · Multimodal · Tool use

#19 · OpenAI

GPT-5.5 Instant

Created by OpenAI
2 sources

GPT-5.5 Instant by OpenAI appears in 2 sources with Reasoning at 42.54. Best read for Coding, High intelligence, LLM.

Top metric: 42.54 · Reasoning
Context: 400K tokens
Input: $5/M · public snapshot
License: proprietary · text + vision
Coding · High intelligence · LLM · Low cost

#20 · DeepSeek

DeepSeek-V4-Pro-Max

Created by DeepSeek
1 source

DeepSeek-V4-Pro-Max by DeepSeek appears in 1 source with Reasoning at 56.96. Best read for Code quality, Long context, Low cost.

Top metric: 56.96 · Reasoning
Context: 1M tokens
Input: $1.74/M · public snapshot
License: mit · text
Code quality · Long context · Low cost · Open weights

#21 · OpenAI

GPT-5 High

Created by OpenAI
1 source

GPT-5 High by OpenAI appears in 1 source with Reasoning at 47.84. Best read for LLM, Math, Multimodal.

Top metric: 47.84 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
LLM · Math · Multimodal · Reasoning

#22 · Alibaba Cloud / Qwen Team

Qwen3.5-397B-A17B

Created by Alibaba Cloud / Qwen Team
2 sources

Qwen3.5-397B-A17B by Alibaba Cloud / Qwen Team appears in 2 sources with Reasoning at 48.93. Best read for Code quality, Coding, High intelligence.

Top metric: 48.93 · Reasoning
Context: 262K tokens
Input: $0.6/M · public snapshot
License: apache_2_0 · text + vision
Code quality · Coding · High intelligence · Low cost

#23 · Google

Gemma 4 26B-A4B

Created by Google
2 sources

Gemma 4 26B-A4B by Google appears in 2 sources with Reasoning at 35.5. Best read for Coding, High intelligence, Low cost.

Top metric: 35.5 · Reasoning
Context: 262K tokens
Input: $0.13/M · public snapshot
License: apache_2_0 · text + vision
Coding · High intelligence · Low cost · Multimodal

#24 · OpenAI

GPT-5.2 Codex

Created by OpenAI
1 source

GPT-5.2 Codex by OpenAI appears in 1 source with Reasoning at 52.1. Best read for LLM, Low cost, Multimodal.

Top metric: 52.1 · Reasoning
Context: 400K tokens
Input: $1.75/M · public snapshot
License: proprietary · text + vision
LLM · Low cost · Multimodal · Tool use

#25 · Alibaba Cloud / Qwen Team

Qwen3.7 Max

Created by Alibaba Cloud / Qwen Team
1 source

Qwen3.7 Max by Alibaba Cloud / Qwen Team appears in 1 source with Reasoning at 60.25. Best read for Code quality, LLM, Long context.

Top metric: 60.25 · Reasoning
Context: 1M tokens
Input: $1.25/M · public snapshot
License: proprietary · text
Code quality · LLM · Long context · Low cost

#26 · Alibaba Cloud / Qwen Team

Qwen3.6 Plus

Created by Alibaba Cloud / Qwen Team
2 sources

Qwen3.6 Plus by Alibaba Cloud / Qwen Team appears in 2 sources with Reasoning at 52.14. Best read for Coding, High intelligence, LLM.

Top metric: 52.14 · Reasoning
Context: 1M tokens
Input: $0.5/M · public snapshot
License: proprietary · text + vision
Coding · High intelligence · LLM · Long context

#27 · OpenAI

GPT-5.1

Created by OpenAI
2 sources

GPT-5.1 by OpenAI appears in 2 sources with Reasoning at 47.52. Best read for Code quality, Coding, High intelligence.

Top metric: 47.52 · Reasoning
Context: 400K tokens
Input: $1.25/M · public snapshot
License: proprietary · text + vision
Code quality · Coding · High intelligence · LLM

#28 · Anthropic

Claude Opus 4.1

Created by Anthropic
1 source

Claude Opus 4.1 by Anthropic appears in 1 source with Reasoning at 38.24. Best read for Code quality, LLM, Multimodal.

Top metric: 38.24 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
Code quality · LLM · Multimodal · Tool use

#29 · OpenAI

GPT-5.1 Medium

Created by OpenAI
1 source

GPT-5.1 Medium by OpenAI appears in 1 source with Reasoning at 45.89. Best read for LLM, Low cost, Math.

Top metric: 45.89 · Reasoning
Context: 400K tokens
Input: $1.25/M · public snapshot
License: proprietary · text + vision
LLM · Low cost · Math · Multimodal

#30 · xAI

Grok-4.20 Beta Non-Reasoning

Created by xAI
1 source

Grok-4.20 Beta Non-Reasoning by xAI appears in 1 source with LLM Stats Code Index (estimated from arena) at 34.59 score. Best read for LLM, Multimodal.

Top metric: 34.59 score · LLM Stats Code Index (estimated from arena)
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
LLM · Multimodal

#31 · MiniMax

MiniMax M2.7

Created by MiniMax
1 source

MiniMax M2.7 by MiniMax appears in 1 source with Reasoning at 53.06. Best read for Low cost, Open weights, Reasoning.

Top metric: 53.06 · Reasoning
Context: 205K tokens
Input: $0.3/M · public snapshot
License: mit · text
Low cost · Open weights · Reasoning · Tool use

#32 · Zhipu AI

GLM-4.6

Created by Zhipu AI
2 sources

GLM-4.6 by Zhipu AI appears in 2 sources with Reasoning at 37.73. Best read for Code quality, Coding, High intelligence.

Top metric: 37.73 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: mit · text + vision
Code quality · Coding · High intelligence · Math

#33 · OpenAI

GPT-5.1 High

Created by OpenAI
2 sources

GPT-5.1 High by OpenAI appears in 2 sources with Reasoning at 53.33. Best read for Coding, High intelligence, LLM.

Top metric: 53.33 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
Coding · High intelligence · LLM · Math

#34 · OpenAI

GPT-5 mini

Created by OpenAI
1 source

GPT-5 mini by OpenAI appears in 1 source with Reasoning at 36.61. Best read for LLM, Low cost, Math.

Top metric: 36.61 · Reasoning
Context: 400K tokens
Input: $0.25/M · public snapshot
License: proprietary · text + vision
LLM · Low cost · Math · Multimodal

#35 · Google

Gemma 4 31B

Created by Google
2 sources

Gemma 4 31B by Google appears in 2 sources with Reasoning at 44.8. Best read for Coding, High intelligence, Low cost.

Top metric: 44.8 · Reasoning
Context: 262K tokens
Input: $0.14/M · public snapshot
License: apache_2_0 · text + vision
Coding · High intelligence · Low cost · Multimodal

#36 · DeepSeek

DeepSeek-V4-Flash-Max

Created by DeepSeek
1 source

DeepSeek-V4-Flash-Max by DeepSeek appears in 1 source with Reasoning at 51.86. Best read for Code quality, Long context, Low cost.

Top metric: 51.86 · Reasoning
Context: 1M tokens
Input: $0.14/M · public snapshot
License: mit · text
Code quality · Long context · Low cost · Open weights

#37 · xAI

Grok-3

Created by xAI
1 source

Grok-3 by xAI appears in 1 source with Reasoning at 39.69. Best read for LLM, Low cost, Math.

Top metric: 39.69 · Reasoning
Context: 128K tokens
Input: $3/M · public snapshot
License: proprietary · text + vision
LLM · Low cost · Math · Multimodal

#38 · OpenAI

GPT-5 Medium

Created by OpenAI
1 source

GPT-5 Medium by OpenAI appears in 1 source with Reasoning at 43.51. Best read for LLM, Math, Multimodal.

Top metric: 43.51 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
LLM · Math · Multimodal · Reasoning

#39 · OpenAI

GPT-5.1 Thinking

Created by OpenAI
1 source

GPT-5.1 Thinking by OpenAI appears in 1 source with Reasoning at 47.47. Best read for Code quality, LLM, Multimodal.

Top metric: 47.47 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
Code quality · LLM · Multimodal · Tool use

#40 · Zhipu AI

GLM-4.7

Created by Zhipu AI
2 sources

GLM-4.7 by Zhipu AI appears in 2 sources with Reasoning at 43.81. Best read for Code quality, Coding, High intelligence.

Top metric: 43.81 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: mit · text + vision
Code quality · Coding · High intelligence · Multimodal

#41 · Alibaba Cloud / Qwen Team

Qwen3.6-27B

Created by Alibaba Cloud / Qwen Team
1 source

Qwen3.6-27B by Alibaba Cloud / Qwen Team appears in 1 source with Reasoning at 45.74. Best read for Code quality, Low cost, Multimodal.

Top metric: 45.74 · Reasoning
Context: 262K tokens
Input: $0.6/M · public snapshot
License: apache_2_0 · text + vision
Code quality · Low cost · Multimodal · Open weights

#42 · MiniMax

MiniMax M2.5

Created by MiniMax
1 source

MiniMax M2.5 by MiniMax appears in 1 source with Reasoning at 52.51. Best read for Code quality, Long context, Low cost.

Top metric: 52.51 · Reasoning
Context: 1M tokens
Input: $0.3/M · public snapshot
License: mit · text
Code quality · Long context · Low cost · Open weights

#43 · OpenAI

GPT-4.1 mini

Created by OpenAI
1 source

GPT-4.1 mini by OpenAI appears in 1 source with Reasoning at 15.71. Best read for LLM, Long context, Low cost.

Top metric: 15.71 · Reasoning
Context: 1M tokens
Input: $0.4/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#44 · Meituan

LongCat-Flash-Chat

Created by Meituan
2 sources

LongCat-Flash-Chat by Meituan appears in 2 sources with Reasoning at 23.41. Best read for Code quality, Coding, High intelligence.

Top metric: 23.41 · Reasoning
Context: 128K tokens
Input: $0.3/M · public snapshot
License: mit · text
Code quality · Coding · High intelligence · Low cost

#45 · Moonshot AI

Kimi K2 0905

Created by Moonshot AI
1 source

Kimi K2 0905 by Moonshot AI appears in 1 source with Reasoning at 25.5. Best read for LLM, Math, Reasoning.

Top metric: 25.5 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text
LLM · Math · Reasoning

#46 · Google

Gemini 2.5 Pro

Created by Google
2 sources

Gemini 2.5 Pro by Google appears in 2 sources with Reasoning at 35.05. Best read for Coding, High intelligence, LLM.

Top metric: 35.05 · Reasoning
Context: 1M tokens
Input: $1.25/M · public snapshot
License: proprietary · text + vision
Coding · High intelligence · LLM · Long context

#47 · Google

Gemini 3.1 Flash-Lite

Created by Google
1 source

Gemini 3.1 Flash-Lite by Google appears in 1 source with Reasoning at 41.75. Best read for LLM, Long context, Low cost.

Top metric: 41.75 · Reasoning
Context: 1M tokens
Input: $0.25/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#48 · DeepSeek

DeepSeek-V3.2 (Non-thinking)

Created by DeepSeek
1 source

DeepSeek-V3.2 (Non-thinking) by DeepSeek appears in 1 source with LLM Stats Code Index (estimated from arena) at 22.77 score. Best read for Low cost, Open weights.

Top metric: 22.77 score · LLM Stats Code Index (estimated from arena)
Context: 131K tokens
Input: $0.28/M · public snapshot
License: mit · text
Low cost · Open weights

#49 · MiniMax

MiniMax M2

Created by MiniMax
1 source

MiniMax M2 by MiniMax appears in 1 source with Reasoning at 33.6. Best read for Code quality, Long context, Low cost.

Top metric: 33.6 · Reasoning
Context: 1M tokens
Input: $0.3/M · public snapshot
License: mit · text
Code quality · Long context · Low cost · Open weights

#50 · Anthropic

Claude Haiku 4.5

Created by Anthropic
1 source

Claude Haiku 4.5 by Anthropic appears in 1 source with Reasoning at 35.4. Best read for Code quality, LLM, Low cost.

Top metric: 35.4 · Reasoning
Context: 200K tokens
Input: $1/M · public snapshot
License: proprietary · text + vision
Code quality · LLM · Low cost · Multimodal

#51 · Moonshot AI

Kimi K2-Thinking-0905

Created by Moonshot AI
1 source

Kimi K2-Thinking-0905 by Moonshot AI appears in 1 source with Reasoning at 45.23. Best read for Code quality, Math, Open weights.

Top metric: 45.23 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: mit · text
Code quality · Math · Open weights · Reasoning

#52 · OpenAI

GPT-5.1 Instant

Created by OpenAI
1 source

GPT-5.1 Instant by OpenAI appears in 1 source with Reasoning at 48.57. Best read for Code quality, LLM, Low cost.

Top metric: 48.57 · Reasoning
Context: 400K tokens
Input: $1.25/M · public snapshot
License: proprietary · text + vision
Code quality · LLM · Low cost · Multimodal

#53 · Anthropic

Claude Opus 4

Created by Anthropic
1 source

Claude Opus 4 by Anthropic appears in 1 source with Reasoning at 35.16. Best read for Code quality, LLM, Multimodal.

Top metric: 35.16 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
Code quality · LLM · Multimodal · Tool use

#54 · xAI

Grok 4 Fast

Created by xAI
1 source

Grok 4 Fast by xAI appears in 1 source with Reasoning at 40.93. Best read for LLM, Long context, Low cost.

Top metric: 40.93 · Reasoning
Context: 2M tokens
Input: $0.2/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#55 · Google

Gemini 2.5 Flash

Created by Google
2 sources

Gemini 2.5 Flash by Google appears in 2 sources with Reasoning at 28.49. Best read for Coding, High intelligence, LLM.

Top metric: 28.49 · Reasoning
Context: 1M tokens
Input: $0.3/M · public snapshot
License: proprietary · text + vision
Coding · High intelligence · LLM · Long context

#56 · OpenAI

GPT-5

Created by OpenAI
1 source

GPT-5 by OpenAI appears in 1 source with Reasoning at 44.66. Best read for Code quality, LLM, Multimodal.

Top metric: 44.66 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
Code quality · LLM · Multimodal · Tool use

#57 · Anthropic

Claude Sonnet 4

Created by Anthropic
1 source

Claude Sonnet 4 by Anthropic appears in 1 source with Reasoning at 30.27. Best read for Code quality, LLM, Multimodal.

Top metric: 30.27 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
Code quality · LLM · Multimodal · Tool use

#58 · Mistral AI

Mistral Large 3 (675B Instruct 2512)

Created by Mistral AI
1 source

Mistral Large 3 (675B Instruct 2512) by Mistral AI appears in 1 source with Reasoning at 10.23. Best read for Low cost, Math, Multimodal.

Top metric: 10.23 · Reasoning
Context: 262K tokens
Input: $0.5/M · public snapshot
License: apache_2_0 · text + vision
Low cost · Math · Multimodal · Open weights

#59 · MiniMax

MiniMax M2.1

Created by MiniMax
1 source

MiniMax M2.1 by MiniMax appears in 1 source with Reasoning at 41.25. Best read for Code quality, Long context, Low cost.

Top metric: 41.25 · Reasoning
Context: 1M tokens
Input: $0.3/M · public snapshot
License: mit · text
Code quality · Long context · Low cost · Open weights

#60 · xAI

Grok 4.3

Created by xAI
2 sources

Grok 4.3 by xAI appears in 2 sources with LLM Stats Code Index (estimated from arena) at 25.31 score. Best read for Coding, High intelligence, LLM.

Top metric: 25.31 score · LLM Stats Code Index (estimated from arena)
Context: 1M tokens
Input: $1.25/M · public snapshot
License: proprietary · text
Coding · High intelligence · LLM · Long context

#61 · OpenAI

GPT-4.1

Created by OpenAI
1 source

GPT-4.1 by OpenAI appears in 1 source with Reasoning at 21.09. Best read for LLM, Long context, Low cost.

Top metric: 21.09 · Reasoning
Context: 1M tokens
Input: $2/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#62 · DeepSeek

DeepSeek-V3.2-Speciale

Created by DeepSeek
1 source

DeepSeek-V3.2-Speciale by DeepSeek appears in 1 source with Reasoning at 43.22. Best read for Code quality, Math, Open weights.

Top metric: 43.22 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: mit · text
Code quality · Math · Open weights · Tool use

#63 · Anthropic

Claude Opus 4.8

Created by Anthropic
1 source

Claude Opus 4.8 by Anthropic appears in 1 source with Reasoning at 65.69. Best read for LLM, Long context, Low cost.

Top metric: 65.69 · Reasoning
Context: 1M tokens
Input: $5/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#64 · Xiaomi

MiMo-V2-Flash

Created by Xiaomi
1 source

MiMo-V2-Flash by Xiaomi appears in 1 source with Reasoning at 38.43. Best read for Code quality, Math, Open weights.

Top metric: 38.43 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: mit · text
Code quality · Math · Open weights · Tool use

#65 · xAI

Grok-4.20 Beta Reasoning

Created by xAI
1 source

Grok-4.20 Beta Reasoning by xAI appears in 1 source with LLM Stats Code Index (estimated from arena) at 20.12 score. Best read for LLM, Multimodal.

Top metric: 20.12 score · LLM Stats Code Index (estimated from arena)
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
LLM · Multimodal

#66 · Meituan

LongCat-Flash-Thinking

Created by Meituan
1 source

LongCat-Flash-Thinking by Meituan appears in 1 source with Reasoning at 35.89. Best read for Code quality, Math, Open weights.

Top metric: 35.89 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: mit · text
Code quality · Math · Open weights · Tool use

#67 · OpenAI

GPT-5.4 nano

Created by OpenAI
1 source

GPT-5.4 nano by OpenAI appears in 1 source with Reasoning at 39.89. Best read for LLM, Low cost, Multimodal.

Top metric: 39.89 · Reasoning
Context: 400K tokens
Input: $0.2/M · public snapshot
License: proprietary · text + vision
LLM · Low cost · Multimodal · Tool use

#68 · Alibaba Cloud / Qwen Team

Qwen3.5-27B

Created by Alibaba Cloud / Qwen Team
1 source

Qwen3.5-27B by Alibaba Cloud / Qwen Team appears in 1 source with Reasoning at 41.95. Best read for Code quality, Low cost, Multimodal.

Top metric: 41.95 · Reasoning
Context: 262K tokens
Input: $0.3/M · public snapshot
License: apache_2_0 · text + vision
Code quality · Low cost · Multimodal · Open weights

#69 · Alibaba Cloud / Qwen Team

Qwen3 Max

Created by Alibaba Cloud / Qwen Team
1 source

Qwen3 Max by Alibaba Cloud / Qwen Team appears in 1 source with Reasoning at 28.55. Best read for Code quality, LLM, Math.

Top metric: 28.55 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text
Code quality · LLM · Math · Tool use

#70 · Zhipu AI

GLM-4.7-Flash

Created by Zhipu AI
1 source

GLM-4.7-Flash by Zhipu AI appears in 1 source with Reasoning at 31.38. Best read for Code quality, Math, Open weights.

Top metric: 31.38 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: mit · text
Code quality · Math · Open weights · Tool use

#71 · Meituan

LongCat-Flash-Lite

Created by Meituan
1 source

LongCat-Flash-Lite by Meituan appears in 1 source with Reasoning at 23. Best read for Code quality, Low cost, Open weights.

Top metric: 23 · Reasoning
Context: 256K tokens
Input: $0.1/M · public snapshot
License: mit · text
Code quality · Low cost · Open weights · Tool use

#72 · DeepSeek

DeepSeek-V3.2-Exp

Created by DeepSeek
2 sources

DeepSeek-V3.2-Exp by DeepSeek appears in 2 sources with Reasoning at 35.69. Best read for Code quality, Coding, High intelligence.

Top metric: 35.69 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: mit · text
Code quality · Coding · High intelligence · Math

#73 · Alibaba Cloud / Qwen Team

Qwen3.5-122B-A10B

Created by Alibaba Cloud / Qwen Team
2 sources

Qwen3.5-122B-A10B by Alibaba Cloud / Qwen Team appears in 2 sources with Reasoning at 43.05. Best read for Code quality, Coding, High intelligence.

Top metric: 43.05 · Reasoning
Context: 262K tokens
Input: $0.4/M · public snapshot
License: apache_2_0 · text + vision
Code quality · Coding · High intelligence · Low cost

#74 · Zhipu AI

GLM-4.5

Created by Zhipu AI
2 sources

GLM-4.5 by Zhipu AI appears in 2 sources with Reasoning at 33.95. Best read for Code quality, Coding, High intelligence.

Top metric: 33.95 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: mit · text
Code quality · Coding · High intelligence · Math

#75 · OpenAI

GPT-5.1 Codex

Created by OpenAI
1 source

GPT-5.1 Codex by OpenAI appears in 1 source with Reasoning at 42.49. Best read for Code quality, LLM, Multimodal.

Top metric: 42.49 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
Code quality · LLM · Multimodal · Tool use

#76 · StepFun

Step-3.5-Flash

Created by StepFun
1 source

Step-3.5-Flash by StepFun appears in 1 source with Reasoning at 49.2. Best read for Code quality, Low cost, Open weights.

Top metric: 49.2 · Reasoning
Context: 66K tokens
Input: $0.1/M · public snapshot
License: apache_2_0 · text
Code quality · Low cost · Open weights · Tool use

#77 · OpenAI

GPT-5.3 Chat

Created by OpenAI
1 source

GPT-5.3 Chat by OpenAI appears in 1 source with LLM Stats Code Index (estimated from arena) at 27.85 score. Best read for LLM, Low cost, Multimodal.

Top metric: 27.85 score · LLM Stats Code Index (estimated from arena)
Context: 128K tokens
Input: $1.75/M · public snapshot
License: proprietary · text + vision
LLM · Low cost · Multimodal

#78 · OpenAI

GPT-5.1 Codex High

Created by OpenAI
1 source

GPT-5.1 Codex High by OpenAI appears in 1 source with Reasoning at 44.32. Best read for LLM, Math, Multimodal.

Top metric: 44.32 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
LLM · Math · Multimodal · Reasoning

#79 · OpenAI

GPT OSS 120B High

Created by OpenAI
1 source

GPT OSS 120B High by OpenAI appears in 1 source with Reasoning at 31.76. Best read for Low cost, Math, Open weights.

Top metric: 31.76 · Reasoning
Context: 131K tokens
Input: $0.1/M · public snapshot
License: apache_2_0 · text
Low cost · Math · Open weights · Tool use

#80 · Alibaba Cloud / Qwen Team

Qwen3 VL 235B A22B Instruct

Created by Alibaba Cloud / Qwen Team
2 sources

Qwen3 VL 235B A22B Instruct by Alibaba Cloud / Qwen Team appears in 2 sources with Reasoning at 26.03. Best read for Coding, High intelligence, Low cost.

Top metric: 26.03 · Reasoning
Context: 262K tokens
Input: $0.3/M · public snapshot
License: apache_2_0 · text + vision
Coding · High intelligence · Low cost · Multimodal

#81 · xAI

Grok-4 Fast Reasoning

Created by xAI
1 source

Grok-4 Fast Reasoning by xAI appears in 1 source with LLM Stats Code Index (estimated from arena) at 26.07 score. Best read for LLM, Long context, Low cost.

Top metric: 26.07 score · LLM Stats Code Index (estimated from arena)
Context: 2M tokens
Input: $0.2/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#82 · xAI

Grok-4 Fast Non-Reasoning

Created by xAI
1 source

Grok-4 Fast Non-Reasoning by xAI appears in 1 source with LLM Stats Code Index (estimated from arena) at 23.23 score. Best read for LLM, Long context, Low cost.

Top metric: 23.23 score · LLM Stats Code Index (estimated from arena)
Context: 2M tokens
Input: $0.2/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#83 · Alibaba Cloud / Qwen Team

Qwen3 VL 4B Thinking

Created by Alibaba Cloud / Qwen Team
1 source

Qwen3 VL 4B Thinking by Alibaba Cloud / Qwen Team appears in 1 source with Reasoning at 15.02. Best read for Low cost, Multimodal, Open weights.

Top metric: 15.02 · Reasoning
Context: 262K tokens
Input: $0.1/M · public snapshot
License: apache_2_0 · text + vision
Low cost · Multimodal · Open weights · Tool use

#84 · xAI

Grok-4.1 Fast Non-Reasoning

Created by xAI
1 source

Grok-4.1 Fast Non-Reasoning by xAI appears in 1 source with LLM Stats Code Index (estimated from arena) at 23.08 score. Best read for LLM, Long context, Low cost.

Top metric: 23.08 score · LLM Stats Code Index (estimated from arena)
Context: 2M tokens
Input: $0.2/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#85 · OpenAI

GPT-5 nano

Created by OpenAI
1 source

GPT-5 nano by OpenAI appears in 1 source with Reasoning at 24.61. Best read for LLM, Math, Multimodal.

Top metric: 24.61 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
LLM · Math · Multimodal · Reasoning

#86 · xAI

Grok-4.20 Multi-Agent Beta

Created by xAI
1 source

Grok-4.20 Multi-Agent Beta by xAI appears in 1 source with LLM Stats Code Index (estimated from arena) at 21.9 score. Best read for LLM, Multimodal.

Top metric: 21.9 score · LLM Stats Code Index (estimated from arena)
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
LLM · Multimodal

#87 · Anthropic

Claude 3.7 Sonnet

Created by Anthropic
1 source

Claude 3.7 Sonnet by Anthropic appears in 1 source with Reasoning at 28.92. Best read for Code quality, LLM, Multimodal.

Top metric: 28.92 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: proprietary · text + vision
Code quality · LLM · Multimodal · Tool use

#88 · xAI

Grok-4.1 Fast Reasoning

Created by xAI
1 source

Grok-4.1 Fast Reasoning by xAI appears in 1 source with LLM Stats Code Index (estimated from arena) at 19.21 score. Best read for LLM, Long context, Low cost.

Top metric: 19.21 score · LLM Stats Code Index (estimated from arena)
Context: 2M tokens
Input: $0.2/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#89 · xAI

Grok Code Fast 1

Created by xAI
1 source

Grok Code Fast 1 by xAI appears in 1 source with Reasoning at 31.93. Best read for Code quality, LLM, Low cost.

Top metric: 31.93 · Reasoning
Context: 256K tokens
Input: $0.2/M · public snapshot
License: proprietary · text
Code quality · LLM · Low cost · Reasoning

#90 · Alibaba Cloud / Qwen Team

Qwen3-Coder

Created by Alibaba Cloud / Qwen Team
1 source

Qwen3-Coder by Alibaba Cloud / Qwen Team appears in 1 source with LLM Stats Code Index (estimated from arena) at 17.3 score. Best read for Open weights.

Top metric: 17.3 score · LLM Stats Code Index (estimated from arena)
Context: Unknown
Input: Unknown · public snapshot
License: apache_2_0 · text
Open weights

#91 · Mistral AI

Mistral Small 4

Created by Mistral AI
1 source

Mistral Small 4 by Mistral AI appears in 1 source with Reasoning at 23.7. Best read for Low cost, Math, Multimodal.

Top metric: 23.7 · Reasoning
Context: 256K tokens
Input: $0.15/M · public snapshot
License: apache_2_0 · text + vision
Low cost · Math · Multimodal · Open weights

#92 · DeepSeek

DeepSeek-V3.2

Created by DeepSeek
2 sources

DeepSeek-V3.2 by DeepSeek appears in 2 sources with Reasoning at 41.8. Best read for Code quality, Coding, High intelligence.

Top metric: 41.8 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: mit · text
Code quality · Coding · High intelligence · Math

#93 · OpenAI

GPT-4.1 nano

Created by OpenAI
1 source

GPT-4.1 nano by OpenAI appears in 1 source with Reasoning at 0.73. Best read for LLM, Long context, Low cost.

Top metric: 0.73 · Reasoning
Context: 1M tokens
Input: $0.1/M · public snapshot
License: proprietary · text + vision
LLM · Long context · Low cost · Multimodal

#94 · Alibaba Cloud / Qwen Team

Qwen3 VL 235B A22B Thinking

Created by Alibaba Cloud / Qwen Team
1 source

Qwen3 VL 235B A22B Thinking by Alibaba Cloud / Qwen Team appears in 1 source with Reasoning at 31.71. Best read for Math, Multimodal, Open weights.

Top metric: 31.71 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: apache_2_0 · text + vision
Math · Multimodal · Open weights · Tool use

#95 · DeepSeek

DeepSeek-V3 0324

Created by DeepSeek
1 source

DeepSeek-V3 0324 by DeepSeek appears in 1 source with Reasoning at 16.18. Best read for Low cost, Math, Open weights.

Top metric: 16.18 · Reasoning
Context: 164K tokens
Input: $0.28/M · public snapshot
License: mit_+_model_license_(commercial_use_allowed) · text
Low cost · Math · Open weights · Reasoning

#96 · OpenAI

GPT OSS 20B

Created by OpenAI
2 sources

GPT OSS 20B by OpenAI appears in 2 sources with Reasoning at 19.77. Best read for Math, Open weights, RAG.

Top metric: 19.77 · Reasoning
Context: Unknown
Input: Unknown · public snapshot
License: apache_2_0 · text
Math · Open weights · RAG · Reasoning