EvalKit

LLM Leaderboard

Compare LLMs on reasoning, coding, speed, context, and price. Every row is citation-backed — sourced from public benchmarks and traceable to its origin.

How EvalKit scores work

The EvalKit Score is a composite of reasoning (40%), coding (35%), and agent capability (15%), blended across public benchmark data. Rows without a direct public metric show an estimate — marked with ~ — derived from provider or model-family medians. Source citations accompany every row.
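The weighting and the ~ estimates described above can be sketched roughly as follows. This is an illustrative sketch, not EvalKit's actual implementation: the function names (`composite`, `fill_estimate`, `show`) are hypothetical, and because the stated weights sum to 90%, the sketch assumes they are renormalized by their total. The published score may blend inputs not listed here, so the result is not expected to match the leaderboard exactly.

```python
from statistics import median

# Weights as stated in the text; they sum to 0.90, not 1.0.
WEIGHTS = {"reasoning": 0.40, "coding": 0.35, "agent": 0.15}


def composite(metrics: dict[str, float]) -> float:
    """Weighted mean of the sub-scores, renormalized by the total weight (assumption)."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS) / total


def fill_estimate(value, family_values):
    """Return (value, is_estimate); a missing value falls back to the family median."""
    if value is not None:
        return value, False
    return median(family_values), True


def show(value, is_estimate):
    """Format a metric, prefixing estimates with ~ as the leaderboard does."""
    return f"~{value:g}" if is_estimate else f"{value:g}"


if __name__ == "__main__":
    # Sub-scores for Claude Opus 4.8 as listed on the leaderboard.
    opus = {"reasoning": 65.7, "coding": 52.3, "agent": 82.2}
    # 63.2 under these assumptions; the listed Score is 66.9.
    print(round(composite(opus), 1))

    # Hypothetical family values, for illustration only.
    print(show(*fill_estimate(None, [100.0, 120.0, 140.0])))
```

The gap between the sketch's result and the listed score suggests the published composite uses further inputs or a different normalization, so treat the function as a way to reason about the weights, not to reproduce rankings.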

Weekly refresh · 8 public sources · Citation policy

Rank	Model	Provider	Score	Reasoning	Coding	Agent	Code arena	Context	Speed	Pricing $/M
#1	Claude Opus 4.8	Anthropic	66.9	65.7	52.3	82.2	805	1.0M	117/s	$5
#2	Claude Mythos Preview	Anthropic	65.3	72.5	57.8	40.3	94	~600K	~117/s	~$4
#3	GPT-5.5	OpenAI	64.1	62.3	51	75.3	1,948	1.1M	175/s	$5
#4	Claude Opus 4.7	Anthropic	63.7	62.5	48.8	77.3	1,927	1.0M	35/s	$5
#5	Gemini 3.5 Flash	Google	62.4	59.2	46.4	83.6	1,507	1.0M	189/s	$1.5
#6	Qwen3.7 Max	Alibaba Cloud / Qwen Team	62.3	60.3	47.9	76.4	1,213	1.0M	174/s	$1.25
#7	Gemini 3.1 Pro	Google	59.1	59.1	43.2	69.2	2,086	1.0M	129/s	$2.5
#8	DeepSeek-V4-Pro-Max	DeepSeek	59.1	57	43.5	73.6	1,317	1.0M	44/s	$1.74
#9	Claude Opus 4.6	Anthropic	58.4	59.5	43.6	62.7	2,125	1.0M	51/s	$5
#10	GPT-5.4	OpenAI	58.2	57.6	43	67.2	1,726	1.0M	203/s	$2.5
#11	GLM-5.1	Zhipu AI	57.5	54.2	43	71.8	1,802	200K	177/s	$1.4
#12	Qwen3.6 Plus	Alibaba Cloud / Qwen Team	56.6	52.1	41.7	74.1	1,211	1.0M	117/s	$0.5
#13	Kimi K2.6	Moonshot AI	56.1	58.1	43.7	50	1,547	262K	50/s	$0.95
#14	Claude Opus 4.5	Anthropic	54.9	54.2	39.8	62.3	1,614	~600K	23/s	~$4
#15	DeepSeek-V4-Flash-Max	DeepSeek	54.3	51.9	37.6	69	1,127	1.0M	176/s	$0.14
#16	GLM-5	Zhipu AI	53.4	51.5	36.1	67.8	1,596	200K	262/s	$1
#17	GPT-5.3 Codex	OpenAI	53.4	56.2	43.4	38.1	1,367	400K	233/s	$1.75
#18	Claude Sonnet 4.6	Anthropic	52.8	52.2	36.5	61.3	1,701	200K	121/s	$3
#19	GPT-5.2	OpenAI	52.6	53.5	34.6	60.6	1,519	400K	181/s	$1.75
#20	MiniMax M2.7	MiniMax	51.8	53.1	38.9	46.3	1,156	205K	174/s	$0.3

Each entry below is replicated from the public source LLM Stats.

#1 Claude Opus 4.8 · Anthropic · LLM · Score 66.9 · Reasoning 65.7 · Coding 52.3 · Agent 82.2
#2 Claude Mythos Preview · Anthropic · LLM · Score 65.3 · Reasoning 72.5 · Coding 57.8 · Agent 40.3
#3 GPT-5.5 · OpenAI · LLM · Score 64.1 · Reasoning 62.3 · Coding 51 · Agent 75.3
#4 Claude Opus 4.7 · Anthropic · LLM · Score 63.7 · Reasoning 62.5 · Coding 48.8 · Agent 77.3
#5 Gemini 3.5 Flash · Google · LLM · Score 62.4 · Reasoning 59.2 · Coding 46.4 · Agent 83.6
#6 Qwen3.7 Max · Alibaba Cloud / Qwen Team · LLM · Score 62.3 · Reasoning 60.3 · Coding 47.9 · Agent 76.4
#7 Gemini 3.1 Pro · Google · LLM · Score 59.1 · Reasoning 59.1 · Coding 43.2 · Agent 69.2
#8 DeepSeek-V4-Pro-Max · DeepSeek · Open LLM · Score 59.1 · Reasoning 57 · Coding 43.5 · Agent 73.6
#9 Claude Opus 4.6 · Anthropic · LLM · Score 58.4 · Reasoning 59.5 · Coding 43.6 · Agent 62.7
#10 GPT-5.4 · OpenAI · LLM · Score 58.2 · Reasoning 57.6 · Coding 43 · Agent 67.2
#11 GLM-5.1 · Zhipu AI · Open LLM · Score 57.5 · Reasoning 54.2 · Coding 43 · Agent 71.8
#12 Qwen3.6 Plus · Alibaba Cloud / Qwen Team · LLM · Score 56.6 · Reasoning 52.1 · Coding 41.7 · Agent 74.1
#13 Kimi K2.6 · Moonshot AI · Open LLM · Score 56.1 · Reasoning 58.1 · Coding 43.7 · Agent 50
#14 Claude Opus 4.5 · Anthropic · LLM · Score 54.9 · Reasoning 54.2 · Coding 39.8 · Agent 62.3
#15 DeepSeek-V4-Flash-Max · DeepSeek · Open LLM · Score 54.3 · Reasoning 51.9 · Coding 37.6 · Agent 69
#16 GLM-5 · Zhipu AI · Open LLM · Score 53.4 · Reasoning 51.5 · Coding 36.1 · Agent 67.8
#17 GPT-5.3 Codex · OpenAI · LLM · Score 53.4 · Reasoning 56.2 · Coding 43.4 · Agent 38.1
#18 Claude Sonnet 4.6 · Anthropic · LLM · Score 52.8 · Reasoning 52.2 · Coding 36.5 · Agent 61.3
#19 GPT-5.2 · OpenAI · LLM · Score 52.6 · Reasoning 53.5 · Coding 34.6 · Agent 60.6
#20 MiniMax M2.7 · MiniMax · Open LLM · Score 51.8 · Reasoning 53.1 · Coding 38.9 · Agent 46.3