Key Highlights
- Google unveiled two additional Gemini API service tiers: Flex and Priority
- Flex provides 50% cost savings for tasks that can tolerate delayed responses
- Priority charges a 75–100% premium for mission-critical, time-sensitive operations
- Batch API continues to offer a 50% discount with latency of up to 24 hours
- Caching tier uses token-based pricing tied to storage time
On April 2, Google rolled out significant changes to its Gemini API pricing structure, adding Flex and Priority to bring the lineup to five service tiers: Standard, Flex, Priority, Batch, and Caching. The expansion lets developers trade off cost, latency, and reliability to suit each workload.
The newly introduced Flex tier targets background operations where immediate response isn’t critical. By using spare compute capacity during off-peak periods, it delivers a 50% cost reduction compared to standard pricing. Responses can take anywhere from one to fifteen minutes, with no guaranteed delivery time. Ideal applications include customer relationship management updates, computational research tasks, and autonomous agent workflows.
What differentiates Flex from the current Batch API is its synchronous endpoint architecture. Developers avoid the complexity of file management and job status monitoring, gaining a more straightforward implementation while maintaining identical cost benefits.
At the other end of the spectrum, the Priority tier addresses high-stakes applications. Priced 75% to 100% above standard rates, it guarantees superior reliability for time-critical business operations. Latency ranges from milliseconds to just a few seconds.
Google identifies Priority as optimal for real-time customer service chatbots, financial fraud monitoring, and automated content filtering systems. When Priority tier capacity limits are reached, excess requests gracefully downgrade to Standard tier instead of generating errors.
Complete Pricing Tier Overview
The previously available Batch API keeps its 50% discount off standard rates in exchange for processing windows of up to 24 hours. It remains the best choice for large-scale offline workloads where timing isn’t a concern.
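To make the relative pricing concrete, here is a minimal sketch that turns the discounts and premiums reported in this article into multipliers on a Standard-tier cost. The function and table names are hypothetical, the Caching tier is left out because its pricing depends on token volume and storage time rather than a flat multiplier, and actual per-token rates should be taken from Google's published pricing.

```python
from typing import Tuple

# Multipliers relative to the Standard rate, as reported in this article.
# Each entry is (low, high); Priority is a range because the premium is 75-100%.
TIER_MULTIPLIERS = {
    "standard": (1.0, 1.0),
    "flex": (0.5, 0.5),      # 50% savings
    "batch": (0.5, 0.5),     # 50% discount
    "priority": (1.75, 2.0), # 75% to 100% premium
}


def estimate_cost_range(standard_cost: float, tier: str) -> Tuple[float, float]:
    """Return the (low, high) estimated cost for a workload on the given tier.

    `standard_cost` is what the same workload would cost on the Standard tier.
    """
    if tier not in TIER_MULTIPLIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    low, high = TIER_MULTIPLIERS[tier]
    return (standard_cost * low, standard_cost * high)


if __name__ == "__main__":
    for name in TIER_MULTIPLIERS:
        print(name, estimate_cost_range(100.0, name))
```

For a workload that would cost 100 units on Standard, this gives 50 on Flex or Batch and between 175 and 200 on Priority.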
The Caching tier operates on a pricing model determined by token volume and retention period. Google recommends this option for conversational agents with extensive system prompts, recurring analysis of large video content, or searches across substantial document collections.
Both Flex and Priority tiers operate through a single service_tier parameter within API calls. Developers can switch between tiers by modifying one configuration setting, with the API response indicating which tier processed each request.
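As a rough illustration of the one-setting switch, the sketch below builds a GenerateContent-style JSON body with a `service_tier` field and checks which tier a response reports. Several details are assumptions rather than confirmed API behavior: the lowercase tier values, the placement of `service_tier` at the top level of the request, and the name of the response field carrying the serving tier. The helper names are hypothetical, and no network call is made.

```python
import json

# Assumed spellings of the tier values; check the official API reference.
ALLOWED_TIERS = {"standard", "flex", "priority"}


def build_request_body(prompt: str, service_tier: str = "standard") -> dict:
    """Build a request body; only the service_tier value changes between tiers."""
    if service_tier not in ALLOWED_TIERS:
        raise ValueError(f"unsupported service tier: {service_tier!r}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        # Assumption: the parameter sits at the top level of the request body.
        "service_tier": service_tier,
    }


def was_downgraded(requested_tier: str, response: dict) -> bool:
    """Report whether the response names a different tier than was requested.

    Assumption: the response exposes the serving tier under "service_tier".
    Returns False when the field is absent.
    """
    served = response.get("service_tier")
    return served is not None and served != requested_tier


if __name__ == "__main__":
    body = build_request_body("Summarize yesterday's CRM updates.", "flex")
    print(json.dumps(body, indent=2))
```

A check like `was_downgraded` is one way an application could log Priority requests that were served at Standard once capacity limits were reached.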
Flex tier access extends to all paid tier users for GenerateContent and Interactions API requests. Priority tier access is restricted to Tier 2 and Tier 3 paid accounts on the same endpoints.
Developer Benefits
The consolidated interface represents the most significant advancement. Previously, managing both background and interactive workloads demanded separate architectural approaches using different synchronous and asynchronous systems. The updated system handles both scenarios through uniform synchronous endpoints.
Google positioned this enhancement as part of its continued investment in AI agent development, recognizing that modern applications frequently require simultaneous handling of both low-priority background tasks and urgent interactive functions.
Gemini API product manager Lucia Loher and engineering lead Hussein Hassan Harrirou announced these changes on April 2, 2026.


