Key Highlights
- Alphabet introduces specialized 8th-generation tensor processing units: TPU 8t for model training and TPU 8i for inference workloads
- TPU 8i achieves 80% improvement in cost efficiency compared to its predecessor Ironwood
- Chips developed in partnership with Broadcom, with input from Google DeepMind teams
- TPU 8t training processor supports configurations up to 9,600 chips with doubled interconnect bandwidth versus Ironwood
- Cloud customers will gain access to both processors through Google Cloud later in 2025
Alphabet announced a significant architectural shift in its AI chip strategy Wednesday, introducing two distinct processors that divide training and inference responsibilities for the first time in its TPU history.
The newly revealed eighth-generation lineup features the TPU 8t for model training operations and the TPU 8i optimized for inference tasks that execute trained models in real-world applications. Broadcom served as co-development partner, extending a collaboration that has spanned more than ten years.
This architectural division represents a departure from prior generations. Earlier TPU versions combined both training and inference capabilities within single chip designs. Google attributes the new approach to the expanding requirements of agentic AI systems that function in persistent cycles with minimal human oversight.
“With the rise of AI agents, we determined the community would benefit from chips individually specialized to the needs of training and serving,” said Amin Vahdat, Google’s SVP and chief technologist for AI and infrastructure.
The inference-focused TPU 8i incorporates 384 megabytes of SRAM within each chip — representing a threefold increase over Ironwood’s capacity. According to Google, this memory expansion eliminates latency bottlenecks that occur when multiple users simultaneously query models, a phenomenon the company refers to as the “waiting room” effect.
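The effect is easiest to see with a toy queueing model. In the sketch below, every number (slot counts, service time, request volume) is an illustrative assumption, not a TPU specification; it only shows why more on-chip capacity shortens the wait when requests arrive faster than a full chip can admit them.

```python
# Illustrative-only model of the "waiting room" effect: if on-chip memory
# holds `slots` concurrent request contexts, request slots+1 must wait
# for a slot to free up. All numbers are assumptions, not TPU specs.
def avg_wait(num_requests: int, slots: int, service_ms: float) -> float:
    """Average queueing delay when requests are admitted in waves of `slots`."""
    waits = [(i // slots) * service_ms for i in range(num_requests)]
    return sum(waits) / num_requests

requests, service_ms = 300, 50.0
# Hypothetical capacity before vs. after a threefold memory increase:
for slots in (100, 300):
    print(f"{slots} slots -> avg wait {avg_wait(requests, slots, service_ms):.1f} ms")
# 100 slots -> avg wait 50.0 ms; 300 slots -> avg wait 0.0 ms.
# More on-chip capacity means fewer requests queue behind a full chip.
```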
Enhanced Economics for Inference Operations
Performance economics improve substantially with the TPU 8i, which delivers 80% better performance per dollar than Ironwood. In practical terms, organizations can handle roughly 1.8 times, nearly double, the workload at identical spend.
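As a back-of-envelope check on that framing, the stated 80% figure works out to a 1.8x workload ratio at equal spend. The baseline throughput and budget below are arbitrary placeholders; only the ratio comes from Google's claim.

```python
# Back-of-envelope: what an 80% performance-per-dollar gain buys.
# The 1.8x factor is Google's stated figure; the baseline throughput
# and budget are arbitrary illustrative numbers.
baseline_perf_per_dollar = 100.0   # units of work per $ on Ironwood (illustrative)
gain = 1.80                        # 80% improvement claimed for TPU 8i

tpu8i_perf_per_dollar = baseline_perf_per_dollar * gain

budget = 1_000_000.0               # identical spend on both generations
ironwood_work = budget * baseline_perf_per_dollar
tpu8i_work = budget * tpu8i_perf_per_dollar

print(f"Workload ratio at equal spend: {tpu8i_work / ironwood_work:.2f}x")
# -> 1.80x, i.e. nearly double the workload for the same expenditure
```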
Energy efficiency also improves markedly: the chip delivers up to twice the performance per watt, via integrated power management that dynamically adjusts power draw to computational demand.
Both processors now utilize Google’s Axion CPU as their host platform for the first time, enabling optimization at the system architecture level rather than limiting efficiency gains to individual chip components.
For training applications, the TPU 8t superpod architecture supports deployments of up to 9,600 chips with 2 petabytes of high-bandwidth memory. Inter-chip bandwidth is double Ironwood's, which, by Google's estimates, could shrink frontier-model development timelines from months to weeks.
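Dividing the stated totals gives a rough per-chip figure; this is a derived estimate rather than a number Google quoted directly, and it assumes decimal petabytes.

```python
# Rough per-chip HBM estimate from the stated superpod totals.
# Derived from the article's figures, not published directly by Google.
total_hbm_bytes = 2e15      # 2 petabytes of HBM across the superpod (decimal PB assumed)
chips = 9_600               # maximum TPU 8t superpod configuration

per_chip_gb = total_hbm_bytes / chips / 1e9
print(f"~{per_chip_gb:.0f} GB of HBM per chip")   # -> ~208 GB per chip
```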
The training processor also achieves 2.8 times the computational performance of the seventh-generation Ironwood platform at equivalent price points.
Enterprise and Research Adoption
Multiple organizations have already integrated Google’s TPU technology into their workflows. Citadel Securities developed quantitative research platforms using TPU infrastructure. The entire network of 17 U.S. Department of Energy national laboratories operates AI co-scientist applications on the chips. Anthropic has secured commitments for multiple gigawatts of Google TPU computing capacity.
DA Davidson analysts projected in September that the combined value of Google’s TPU operations and DeepMind could reach approximately $900 billion.
Google maintains an exclusive distribution model for TPUs: the processors are available solely through Google Cloud services rather than direct hardware sales. Nvidia continues to supply GPU hardware to Google, which confirmed it will be among the first cloud platforms to deploy Nvidia's forthcoming Vera Rubin system later this year.
Google DeepMind teams participated directly in the chip design process, subsequently utilizing the processors to train Gemini models and optimize algorithms powering Search and YouTube platforms.
Google confirmed that both the TPU 8t training chip and TPU 8i inference chip will reach general availability for cloud platform customers before the end of 2025.