Google Just Split Its AI Chip Into Two. One for Training. One for Inference. That's a Bigger Deal Than It Sounds.

Google · AI Chips · Infrastructure
Karan Gosrani, Team Converzoy

For years, the default assumption in AI infrastructure was that you needed one type of chip that could do everything. Train models, run inference, handle agent workloads, serve production traffic. Nvidia's H100 and its successors were built around that assumption. The market rewarded it.

Google just made a different bet.

At Cloud Next 2026, Google unveiled its eighth-generation TPU — and for the first time, it's not one chip. It's two. TPU 8t is built exclusively for training. TPU 8i is built exclusively for inference. They have different architectures, different scale profiles, and different performance characteristics. The message is clear: the era of the general-purpose AI chip doing everything is ending, and Google is building for what comes next.

What Each Chip Actually Does

TPU 8t is the training chip. Google describes it as a "training powerhouse" built to compress frontier model development cycles from months to weeks. A single TPU 8t superpod scales to 9,600 chips, delivers 121 exaflops of FP4 compute, carries two petabytes of high-bandwidth memory, and doubles the interchip interconnect bandwidth of the previous generation. For the companies that need to train large models repeatedly and fast, those numbers matter.
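
To put those pod-scale numbers in per-chip terms, here is a quick back-of-envelope calculation. The per-chip figures are our own arithmetic derived from the published pod totals, not specs Google has quoted:

```python
# Back-of-envelope math on the published TPU 8t superpod figures.
# The derived per-chip numbers are our own arithmetic, not Google specs.

POD_CHIPS = 9_600
POD_FP4_EXAFLOPS = 121      # FP4 compute across a full superpod
POD_HBM_PETABYTES = 2       # high-bandwidth memory across a full superpod

fp4_pflops_per_chip = POD_FP4_EXAFLOPS * 1_000 / POD_CHIPS    # 1 EF = 1,000 PF
hbm_gb_per_chip = POD_HBM_PETABYTES * 1_000_000 / POD_CHIPS   # 1 PB = 1,000,000 GB

print(f"~{fp4_pflops_per_chip:.1f} PFLOPS of FP4 per chip")   # ~12.6
print(f"~{hbm_gb_per_chip:.0f} GB of HBM per chip")           # ~208
```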

TPU 8i is the inference chip, and it's where the more interesting architectural story sits. It's built specifically for agentic workloads, which have different performance requirements than batch training runs. Agents need low latency, high memory bandwidth for serving multiple concurrent sessions, and the ability to handle the back-and-forth of multi-step tasks without degrading. TPU 8i scales to 1,152 chips per pod, delivers 11.6 exaflops of FP8 compute, and carries 331.8 TB of HBM across a full pod.
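
One way to read the architectural split is to compare how much HBM each design carries per unit of compute. The sketch below runs that ratio from the published pod totals. Note that the two chips are quoted in different precisions (FP4 vs FP8), so treat the comparison as directional rather than apples-to-apples:

```python
# HBM capacity per unit of pod compute, from the published figures.
# Rough comparison only: 8t is quoted in FP4 and 8i in FP8, so the
# ratio is indicative. All derived numbers are our own arithmetic.

tpu_8t_hbm_bytes = 2e15        # 2 PB
tpu_8t_flops = 121e18          # 121 exaFLOPS (FP4)

tpu_8i_hbm_bytes = 331.8e12    # 331.8 TB
tpu_8i_flops = 11.6e18         # 11.6 exaFLOPS (FP8)

ratio_8t = tpu_8t_hbm_bytes / tpu_8t_flops   # ~0.017 bytes per FLOP/s
ratio_8i = tpu_8i_hbm_bytes / tpu_8i_flops   # ~0.029 bytes per FLOP/s

print(f"8t: {ratio_8t:.4f} bytes of HBM per FLOP/s")
print(f"8i: {ratio_8i:.4f} bytes of HBM per FLOP/s (~{ratio_8i/ratio_8t:.1f}x the 8t)")
```

The inference chip carries roughly 1.7x more memory per unit of compute, which is consistent with the memory-bandwidth-first story Google is telling about agentic serving.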

Google claims an 80% price-performance improvement for the 8i over its predecessor, and up to 2.8x for the 8t.

Why Splitting the Chip Matters

The decision to build two chips instead of one is a strategic statement about where AI workloads are going.

Training and inference have always had different requirements. Training is batch-oriented, memory-heavy, tolerant of latency, and benefits from raw compute density. Inference is latency-sensitive, session-concurrent, and needs memory bandwidth more than peak compute. A chip optimised for both tends to be mediocre at each.
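
A toy roofline check makes the inference point concrete: at batch size one, decoding a single token reads every model weight from memory in exchange for only about two FLOPs of work per weight, so memory time dominates compute time on any modern accelerator. The figures below are assumed for illustration, not TPU specs:

```python
# Why inference wants bandwidth more than peak compute: a toy roofline
# check for single-token, batch-size-one decode.

def decode_is_bandwidth_bound(model_params: float,
                              bytes_per_param: float,
                              peak_flops: float,
                              hbm_bandwidth: float) -> bool:
    """Each decoded token touches every weight once (~2 FLOPs per weight)."""
    compute_time = 2 * model_params / peak_flops                  # seconds of math
    memory_time = model_params * bytes_per_param / hbm_bandwidth  # seconds of reads
    return memory_time > compute_time

# A hypothetical 70B-parameter model in 8-bit weights, on a chip with
# 10 PFLOPS of compute and 3 TB/s of HBM bandwidth (assumed figures):
print(decode_is_bandwidth_bound(70e9, 1, 10e15, 3e12))  # True: memory-bound
```

With those assumed numbers, the memory read takes roughly a thousand times longer than the arithmetic, which is why an inference chip trades peak compute for bandwidth.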

Nvidia has managed this tension by building increasingly powerful general-purpose GPUs and letting software handle the workload differentiation. That works when you're selling into a market that doesn't have better options. It works less well when you're competing against purpose-built silicon from a company with Google's infrastructure scale.

The agentic AI angle is the forward-looking piece. [Google declared the age of the agentic cloud at Cloud Next](https://converzoy.com/insights/google-cloud-next-2026-agentic-cloud), and TPU 8i is the hardware layer of that strategy. Agents running continuously, handling multiple tasks in parallel, serving real-time user requests — that workload profile is fundamentally different from serving a single-turn chat response. A chip designed specifically for it should outperform a general-purpose GPU on the metrics that matter for agents: latency, concurrency, cost per session.
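
As a rough illustration of why cost per session becomes the metric that matters, consider how token volume compounds across a multi-step agent session when the model re-reads its growing context at each step. Every number here is hypothetical, chosen only to show the shape of the curve:

```python
# Toy cost-per-session comparison: a single-turn chat vs a multi-step
# agent session. All figures are hypothetical, for shape only.

COST_PER_MILLION_TOKENS = 1.0  # assumed serving cost, dollars

def session_cost(steps: int, tokens_per_step: int) -> float:
    # An agent re-reads its growing context at each step, so total token
    # volume grows roughly quadratically with the number of steps.
    total_tokens = sum(tokens_per_step * step for step in range(1, steps + 1))
    return total_tokens / 1e6 * COST_PER_MILLION_TOKENS

print(f"chat (1 step):    ${session_cost(1, 2_000):.4f}")   # ~$0.002
print(f"agent (20 steps): ${session_cost(20, 2_000):.4f}")  # ~$0.42
```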

The Nvidia Context

Every major tech company building its own silicon is a vote against Nvidia's long-term dominance. [SpaceX's decision to manufacture its own GPUs](https://converzoy.com/insights/spacex-in-house-gpu-terafab) through the Terafab project is the most recent example. Google's TPU program is the longest-running and most mature.

The difference with TPU 8 is the specificity. Previous TPU generations were positioned as training alternatives for Google's own workloads, with some availability to Cloud customers. TPU 8t and 8i are positioned as products — competitive offerings for enterprise customers choosing infrastructure for the agentic era.

The 121 exaflops figure for TPU 8t is the headline number, but the more interesting competition is at the inference layer. Inference is where the volume is. Every user query, every agent action, every API call runs inference. The company that owns inference at scale owns the recurring revenue. Google is making a direct play for that with TPU 8i.

Both chips are coming to general availability later in 2026. By the time they're broadly accessible, the agentic workload profile they're designed for will be mainstream. The timing is deliberate.
