Google · proprietary

Gemini 3 Flash

Gemini 3 Flash by Google appears in 2 sources, with a Reasoning index of 49.35. Relevant categories: Coding, High intelligence, LLM.

Creator: Google

Release date: 2025-12-17

Knowledge cutoff: Not published

Context: 1M tokens

Input price: $0.5/M tokens

Output price: $3/M tokens

Modality: text + vision

Country: US
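The two listed prices turn into a per-request cost by simple proportion. Here is a minimal sketch, assuming billing is strictly linear in token count at the two rates above with no other charges. The helper name `estimate_cost_usd` is hypothetical.

```python
# Prices as listed in this profile (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.5
OUTPUT_PRICE_PER_M = 3.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: linear cost estimate from the listed per-million rates.

    Assumes no caching discounts, tiers, or other fees (none are listed here).
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_PRICE_PER_M
    output_cost = output_tokens * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # 200k input tokens and 10k output tokens: 0.10 + 0.03 USD.
    print(f"${estimate_cost_usd(200_000, 10_000):.4f}")  # prints $0.1300
```

Output tokens cost six times as much as input tokens at these rates, so long generations dominate the bill even when prompts are large.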

Metrics

All source-backed metrics

LLM Stats Rank: 7 (ranking · source_rank)

Reasoning: 49.35 (reasoning · index_reasoning)

Math: 51.04 (math · index_math)

Coding: 30.56 (coding · index_code)

Vision: 33.31 (multimodal · index_vision)

Tool calling: 23.03 (tool_calling · index_tool_calling)

Long context: 4.44 (long_context · index_long_context)

Healthcare: 44.26 (domain · index_healthcare)

GPQA: 90.4% (reasoning · gpqa_score)

AIME 2025: 99.7% (math · aime_2025_score)

SWE-bench Verified: 78% (coding · swe_bench_verified_score)

Code Arena: 1,703.76 (coding · coding_arena_score)

Humanity's Last Exam: 43.5% (reasoning · hle_score)

ARC-AGI v2: 33.6% (reasoning · arc_agi_v2_score)

MMMLU: 91.8% (reasoning · mmmlu_score)

MMMU-Pro: 81.2% (multimodal · mmmu_pro_score)

SimpleQA: 68.7% (knowledge · simpleqa_score)

Toolathlon: 49.4% (tool_calling · toolathlon_score)

MCP Atlas: 57.4% (tool_calling · mcp_atlas_score)

MRCR v2: 22.1% (long_context · mrcr_v2_score)

CharXiv-R: 80.3% (multimodal · charxiv_r_score)

ScreenSpot Pro: 69.1% (multimodal · screenspot_pro_score)

Context: 1,000,000 tokens (context · context)

Speed: 490.68 c/s (performance · throughput)

Latency: 3,213.13 ms (performance · latency)

Input price: $0.5/M tokens (pricing · input_price)

Output price: $3/M tokens (pricing · output_price)

Arena Rating: 1,466.6 (metric)

Arena Rank: 13 (metric)

Vote Count: 30,750 (metric)
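An arena rating is most informative as a difference between two models. Here is a minimal sketch, assuming the rating uses the Elo-style scale (base 10, 400-point factor) common on LM Arena leaderboards. The 1,400-rated opponent is hypothetical, not a model from these sources.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Elo-style expected probability that model A is preferred over model B.

    Assumes base-10 logistic scaling with a 400-point factor; ties are ignored.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    gemini_3_flash = 1466.6      # Arena Rating listed in this profile
    hypothetical_peer = 1400.0   # illustrative opponent, not from the sources
    p = expected_win_rate(gemini_3_flash, hypothetical_peer)
    print(f"expected preference rate vs. hypothetical peer: {p:.3f}")
```

Under this assumption, a gap of about 67 points implies only a modest preference, a little under 60% of head-to-head votes.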

Evidence

Citations and source overlap

Arena AI: LM Arena text leaderboard (overall), mirrored in the public leaderboard dataset. Retrieved 2026-05-20.

LLM Stats: Public full LLM leaderboard advanced-explorer snapshot, including ranking, benchmark indexes, pricing, speed, context, license, and model metadata. Retrieved 2026-06-01.

FAQ

How should I read this profile?

Treat this page as a model dossier compiled from published sources, not as the result of an EvalKit run. Public values are copied from the linked sources, and each value stays attributed to the source that reported it.

Is Gemini 3 Flash verified by EvalKit?

No. EvalKit currently shows 0 verified rows for this model. Verified rows will appear only once real run evidence exists.

Why can metrics disagree?

Different sources test different tasks, on different dates, with different prompts and aggregation methods. EvalKit keeps those differences visible instead of merging them into a single universal score that no source actually reported.
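One way to picture source-scoped storage is to keep each value attached to the source that published it and never average across sources. Here is a minimal sketch, assuming a hypothetical `SourcedMetric` record. It is not EvalKit's actual schema. The values and retrieval dates come from this page. The key names for the Arena fields are also hypothetical, because the profile labels them only as "metric".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourcedMetric:
    """Hypothetical record: one metric value tied to the source that published it."""
    source: str
    key: str
    value: float
    retrieved: date


# Values copied from this profile; Arena key names are hypothetical.
METRICS = [
    SourcedMetric("LLM Stats", "index_reasoning", 49.35, date(2026, 6, 1)),
    SourcedMetric("LLM Stats", "source_rank", 7, date(2026, 6, 1)),
    SourcedMetric("Arena AI", "arena_rating", 1466.6, date(2026, 5, 20)),
    SourcedMetric("Arena AI", "arena_rank", 13, date(2026, 5, 20)),
]


def by_source(metrics: list[SourcedMetric], source: str) -> dict[str, float]:
    """Return one source's metrics; values from different sources are never merged."""
    return {m.key: m.value for m in metrics if m.source == source}


if __name__ == "__main__":
    # The two ranks (7 and 13) come from different leaderboards, so they stay apart.
    print(by_source(METRICS, "LLM Stats"))
    print(by_source(METRICS, "Arena AI"))
```

Keeping the source and retrieval date on every record means a disagreement, such as rank 7 in one leaderboard and rank 13 in another, stays explainable rather than being averaged away.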