Supported Models
🤖 LLM Model Selection Guide (2025 Q2)
🗺️ Quick-pick matrix
| Need this most | Pick this model | Why |
|---|---|---|
| ⚡ Sub-300 ms, < $0.001 | Gemini 2.0 Flash Lite · Claude 3 Haiku | First-token latency ≈ 0.25 s; rock-bottom cost |
| 🧠 Elite reasoning (budget ≠ issue) | Claude 3 Opus / 3.7 Sonnet IR | Top scores on MMLU, GPQA, HumanEval |
| 📚 ≥ 1 M-token context | Gemini 1.5 Flash / Pro | Context windows up to 2 M tokens; multimodal |
| 🖥️ Self-host / on-prem | Llama 3 70B Instruct | Open weights, GPT-3.5-grade accuracy |
| 🔍 Turn-key RAG | Cohere Command R / R-Plus | Built-in retrieval & function calling; 128 K context |
| 💵 AWS-native, 300 K context | Amazon Nova Pro | Runs inside Bedrock; IAM, VPC, KMS integration |
| 🐍 Bilingual EN-ZH coding | DeepSeek R1-V1 | 90 % MMLU, 97 % MATH500; open weights |
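The matrix above can be sketched as a simple routing table. The requirement keys and the default fallback below are illustrative assumptions, and the values are model family names, not exact API identifiers.

```python
# Sketch: route a request to a model family based on the quick-pick matrix.
# Keys and default are assumptions; names are families, not API model IDs.

def pick_model(need: str) -> str:
    """Map a dominant requirement to a suggested model family."""
    matrix = {
        "low_latency": "Gemini 2.0 Flash Lite",   # sub-300 ms, rock-bottom cost
        "elite_reasoning": "Claude 3.7 Sonnet IR",  # top MMLU/GPQA/HumanEval tier
        "long_context": "Gemini 1.5 Pro",         # up to 2 M-token context
        "self_host": "Llama 3 70B Instruct",      # open weights
        "rag": "Cohere Command R-Plus",           # built-in retrieval + tool use
        "aws_native": "Amazon Nova Pro",          # Bedrock / IAM / VPC / KMS
    }
    return matrix.get(need, "Gemini 2.0 Flash")   # mid-tier default

print(pick_model("rag"))
```

In production this lookup would usually sit behind a classifier or a per-route config rather than a hard-coded dict, but the mapping is the same.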
Google · Gemini family
Gemini 2.0 Flash Lite • 1 M ctx
Gemini 2.0 Flash • 1 M ctx
Gemini 1.5 Flash 8B / Flash • 1 M ctx
Gemini 1.5 Pro • 2 M ctx
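A practical way to use these context figures is to check whether a document fits a given window before sending it. In the sketch below, the ~4-characters-per-token ratio is a rough assumption, not the real tokenizer, and the output reserve is an arbitrary default.

```python
# Sketch: pre-flight check that a document fits a Gemini context window.
# Window sizes come from the list above; the 4-chars-per-token estimate
# is an assumption, not the actual tokenizer.

GEMINI_CTX = {
    "gemini-2.0-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
    "gemini-1.5-flash": 1_000_000,
    "gemini-1.5-pro": 2_000_000,
}

def fits(model: str, text: str, reserve_output: int = 8_192) -> bool:
    """True if the estimated prompt plus an output reserve fits the window."""
    est_tokens = len(text) // 4
    return est_tokens + reserve_output <= GEMINI_CTX[model]
```

For exact counts, use the provider's token-counting endpoint instead of the character heuristic.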
Anthropic · Claude 3 / 3.5 / 3.7 line
| Model | Context | Benchmarks* | Multimodal | Tool use | Notes |
|---|---|---|---|---|---|
| Haiku | 200 K | ~75 % MMLU | Text | ⚙︎ (prompt-level) | Sub-second latency, budget tier |
| Sonnet 3.5 | 200 K | ~89 % MMLU, ~92 % HumanEval | Vision SOTA (June ’24) | ⚙︎ | Mid-tier; beats 3 Opus |
| Sonnet 3.7 IR | 200 K (+128 K CoT) | ≈ GPT-4 on hard math | Vision | ⚙︎ + integrated reasoning switch | Fast or reflective modes |
| Opus 3.0 | 200 K | ~87 % MMLU | Vision | ⚙︎ | Deep reasoning, higher latency |
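The 3.7 Sonnet row raises a budgeting question: how many tokens to spend on reflection versus the final answer inside the 200 K window. A minimal sketch, assuming a simple half-split policy (the split rule is illustrative, not Anthropic's):

```python
# Sketch: divide remaining tokens between extended thinking and the answer.
# Window and CoT cap come from the table above; the 50/50 split is an
# illustrative assumption, not a documented policy.

CONTEXT_WINDOW = 200_000
MAX_THINKING = 128_000

def budgets(prompt_tokens: int, want_reflection: bool) -> dict:
    """Return a rough token budget for thinking vs. final answer."""
    remaining = CONTEXT_WINDOW - prompt_tokens
    thinking = min(MAX_THINKING, remaining // 2) if want_reflection else 0
    return {"thinking": thinking, "answer": remaining - thinking}
```

In fast mode the whole remainder goes to the answer; in reflective mode the thinking budget is capped by the 128 K CoT limit.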
Alignment & safety
DeepSeek · R1-V1 (open)
Meta · Llama 3 & 4
Llama 3 Instruct
| Size | Context | MMLU | HumanEval | Best for |
|---|---|---|---|---|
| 1 B / 8 B | 128 K | 60 % | 20-30 % | Edge, CPU, tagging |
| 70 B | 128 K | 85 % | 50 % | Self-hosted RAG, privacy |
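For the self-host use case, a rough sizing check helps decide between the 8 B and 70 B variants. The memory rule of thumb below (weights ≈ params × bits/8, plus ~20 % overhead for KV cache and activations) is an assumption for illustration, not a vendor figure.

```python
# Sketch: pick the largest Llama 3 Instruct size that fits available VRAM.
# The memory estimate is a rule-of-thumb assumption, not a measured number.

def fits_in_vram(params_b: float, vram_gb: float, bits: int = 4) -> bool:
    weight_gb = params_b * bits / 8      # e.g. 70 B at 4-bit ≈ 35 GB
    return weight_gb * 1.2 <= vram_gb    # ~20 % overhead for cache/activations

def pick_llama(vram_gb: float) -> str:
    for size in (70, 8, 1):              # prefer the largest model that fits
        if fits_in_vram(size, vram_gb):
            return f"Llama 3 {size}B Instruct"
    return "no local fit; consider a hosted API"
```

Quantization level matters: at 8-bit the 70 B weights roughly double to ~70 GB, pushing it past a single 48 GB card.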
Llama 4 Scout 17 B • 10 M ctx
Llama 4 Maverick 17 B • 1 M ctx
Cohere · Command family
| Model | Params | Context | Benchmarks | Tool use | Niche |
|---|---|---|---|---|---|
| Light | ~ ? B | 4 K | 55 % MMLU | Prompt-level | Cheap chat, routing |
| R | 35 B | 128 K | 75 % MMLU, top Arena (’24) | Native func-call, cites sources | Long-doc RAG |
| R Plus | 104 B | 128 K | ≈ GPT-4 on RAG, 2× faster | Multi-step, self-correct | Frontier RAG & agents |
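Command R's "cites sources" behavior returns citations keyed to the documents you pass in; the bookkeeping on the client side can be sketched as below. The field name `document_ids` and the document snippets are illustrative, not the exact Cohere response schema.

```python
# Sketch: turn a grounded answer plus citation records into a numbered
# source list, as a RAG client might after a Command R call. The citation
# shape here is an assumption for illustration.

def format_cited_answer(answer: str, citations: list, docs: dict) -> str:
    """Append one numbered source line per distinct cited document."""
    lines = [answer, "", "Sources:"]
    seen = []
    for cite in citations:
        for doc_id in cite["document_ids"]:
            if doc_id not in seen:           # number each document once
                seen.append(doc_id)
                lines.append(f"[{len(seen)}] {docs[doc_id]}")
    return "\n".join(lines)
```

The same pattern works for any retrieval-tuned model: keep your own doc-ID-to-title map and render citations client-side.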
Amazon · Nova series
| Model | Context | Multimodal | Benchmarks (≈) | Fine-tune | Ideal |
|---|---|---|---|---|---|
| Micro | 128 K | ❌ | Beats GPT-4o-mini by ≈ 2 % | ✔︎ text | Low-latency bulk chat |
| Lite | 300 K | ✔︎ (image + video) | Strong vision-text, < Nova Pro | ✔︎ text + vision | Doc & media analysis |
| Pro | 300 K | ✔︎ (image + video) | Near GPT-4o on RAG, 2× faster, 65 % cheaper | ✔︎ | Enterprise, multilingual, agents |
🛠️ Cost-saving tips
1. Route high-volume, simple traffic (tagging, short chat, classification) to the budget tier — Gemini 2.0 Flash Lite or Claude 3 Haiku — and reserve Opus / 3.7 Sonnet for genuinely hard reasoning.
2. Pay for long context only when a request needs it; most prompts fit comfortably under 128 K, so don't default to the 1-2 M-token models.
3. For document Q&A, prefer retrieval-tuned models (Command R / R-Plus) over stuffing whole corpora into a frontier model's context window.
4. For steady, privacy-sensitive workloads, self-hosting Llama 3 70B can undercut per-token API pricing once GPU costs are amortized.
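Routing savings come down to simple arithmetic. The sketch below compares tiers with placeholder prices — these are illustrative numbers only, so check each provider's current price sheet before relying on them.

```python
# Sketch: per-request cost by tier. Prices are PLACEHOLDERS for
# illustration, not real price-sheet figures.

PRICE_PER_1M_INPUT = {   # USD per 1M input tokens (hypothetical)
    "budget": 0.10,
    "mid": 3.00,
    "frontier": 15.00,
}

def cost(tier: str, input_tokens: int) -> float:
    """Input-side cost in USD for a single request."""
    return PRICE_PER_1M_INPUT[tier] * input_tokens / 1_000_000
```

Even with made-up numbers the shape of the argument holds: shifting 90 % of traffic from the frontier tier to the budget tier cuts input spend by well over an order of magnitude.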
Modified at 2025-05-08 07:17:54