Context Windows for LLMs Supported in MaestroQA
| Model Name | Context Window | Max Output Tokens |
| --- | --- | --- |
| **OpenAI Models** | | |
| GPT-4o Mini | 128,000 | 16,384 |
| GPT-4 | 128,000 | 16,384 |
| **Anthropic Models** | | |
| Claude Haiku | 200,000 | 4,096 |
| Claude Haiku 3.5 | 200,000 | 8,192 |
| Claude Sonnet | 200,000 | 8,192 |
| **Meta Models** | | |
| Llama 3 (11B) | 128,000 | 2,048 |
| Llama 3 (70B) | 128,000 | 2,048 |
| **Cohere Models** | | |
| Cohere Command R | 128,000 | 4,000 |
| Cohere Command R Plus | 128,000 | 4,000 |
| **Amazon Models** | | |
| Amazon Nova Lite | 300,000 | 5,000 |
| Amazon Nova Pro | 300,000 | 5,000 |
| **Google Models** | | |
| Gemini Pro 2.0 (Experimental) | 2,096,000 | 8,192 |
| Gemini Flash 2.0 | 1,048,576 | 8,192 |
| Gemini Flash-Lite 2.0 | 1,048,576 | 8,192 |
What does Context Window mean?
The context window is the AI model's working memory for a single request:
- It is measured in tokens (a token is roughly 4 characters, or about ¾ of a word, in English)
- When a conversation exceeds this limit, the oldest information falls outside the window and becomes inaccessible to the AI
- Real-world impact: a larger context window lets you discuss complex topics, analyze longer documents, or maintain longer conversation threads without the AI losing track
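To make the "about 4 characters per token" rule of thumb concrete, here is a minimal sketch of a rough token estimator. This is only an approximation for English text; exact counts require the model's own tokenizer (for example, OpenAI's `tiktoken` library), and the function name below is ours, not part of any MaestroQA or vendor API.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count using the ~4 characters-per-token
    rule of thumb for English text. Real tokenizers vary."""
    return max(1, round(len(text) / 4))

# A ~128,000-token context window therefore holds on the order of
# 512,000 characters of English text.
print(estimate_tokens("The context window is the model's working memory."))
```

Use an estimate like this only for ballpark planning; when you are close to a model's limit, count tokens with the actual tokenizer.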
What does Max Output Tokens mean?
Max output tokens define how much text an AI can generate in a single response:
- It sets the upper limit on how long each individual AI reply can be
- It ranges from 2,048 tokens (about 1-2 pages) to 16,384 tokens (about 8-12 pages) depending on the model
- Real-world impact: higher limits allow for more comprehensive answers, detailed code explanations, or thorough document analysis without interruption
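The two limits interact: a prompt plus a maximum-length reply must fit inside the context window. The sketch below illustrates that arithmetic using figures from the table above; the dictionary keys and the `fits` helper are illustrative names we chose, not part of any real API.

```python
# (context window, max output tokens) for a few models from the table above.
CONTEXT_LIMITS = {
    "gpt-4o-mini": (128_000, 16_384),
    "claude-haiku-3.5": (200_000, 8_192),
    "amazon-nova-pro": (300_000, 5_000),
}

def fits(model: str, prompt_tokens: int) -> bool:
    """True if the prompt still leaves room for a full-length reply."""
    window, max_output = CONTEXT_LIMITS[model]
    return prompt_tokens + max_output <= window

print(fits("gpt-4o-mini", 100_000))  # True: 100,000 + 16,384 <= 128,000
print(fits("gpt-4o-mini", 120_000))  # False: 120,000 + 16,384 > 128,000
```

In practice this is why a model with a large max output can "run out" of window sooner: the more room you reserve for the reply, the less remains for the prompt and conversation history.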