1. Executive Summary
Google’s new TurboQuant announcement triggered a clean market instinct: if large-language-model inference can run with materially less memory, then memory suppliers should be worth less. That reaction is understandable. It is also likely too aggressive in its current form. Google Research said TurboQuant can reduce KV-cache memory by at least 6x and speed up attention-logit computation by up to 8x on Nvidia H100 in one benchmark configuration. Those are real numbers. But they describe a specific inference bottleneck, not proof that total AI memory demand is about to roll over.
The market appears to be collapsing two separate ideas into one. The first is memory intensity per workload: how much memory one task needs. TurboQuant clearly improves that. The second is aggregate ecosystem memory demand: how much total advanced memory hyperscalers and AI systems consume overall. That depends not only on efficiency, but on usage growth, longer context windows, model scale, broader deployment, and infrastructure build-out. A fall in memory per task does not automatically imply a fall in total memory demand.
That distinction matters because Google’s own infrastructure roadmap still points the other way. Google Cloud’s Ironwood disclosures describe 192 GiB HBM per chip, 7.4 TB/s peak HBM bandwidth, 1.77 PB of directly accessible HBM, and pod scale of up to 9,216 chips. Alphabet also said on its latest earnings call that Gemini serving unit costs were reduced by 78% over 2025, yet the company still expects 2026 CapEx of $175 billion to $185 billion, versus $91.4 billion in 2025, while describing Google Cloud demand as strong despite a tight supply environment. That is not the behavior of a company acting as though memory and AI infrastructure are suddenly becoming less important.
Micron’s own positioning is also stronger than the price action suggests. In its fiscal Q1 2026 prepared remarks, Micron said it had completed agreements on price and volume for its entire calendar 2026 HBM supply, and that it expects HBM TAM to grow from approximately $35 billion in 2025 to around $100 billion in 2028, implying roughly 2.86x growth over three years, or about 40% CAGR. Micron also said tight supply conditions should persist through and beyond calendar 2026. That is not the language of a company seeing demand destruction. It is the language of a company operating inside a constrained, high-value product cycle.
The neutral conclusion of this report is straightforward: Google improved a bottleneck, but it did not yet disprove the broader HBM supercycle. The stronger conclusion is that the market may be confusing a workload-level efficiency gain with a system-level demand collapse. If anything, the deeper signal may be the opposite: memory has become valuable enough that saving it is now frontier research.
2. What Actually Happened
Google Research introduced TurboQuant as a training-free compression algorithm for large language model KV caches and vector search. According to coverage of the release, the method quantizes KV caches down to roughly 3 bits per value, cuts KV-cache memory by at least 6x, and speeds up attention-logit computation by up to 8x on H100 in one benchmark setup. Google also positioned the algorithm as having negligible runtime overhead and as suitable for production inference and large-scale vector search.
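To make the memory claim concrete, the sketch below sizes a KV cache at 16 bits versus 3 bits per value. The model configuration is an illustrative assumption, not Google’s benchmark setup, and raw bit-width alone gives about 5.3x; the “at least 6x” figure presumably reflects additional savings beyond bit-width.

```python
# Illustrative KV-cache sizing. The model configuration below is an
# assumption for the sake of arithmetic, not Google's benchmark setup.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bits_per_value):
    """Bytes for keys plus values across all layers of a decoder-only model."""
    values = 2 * layers * kv_heads * head_dim * seq_len * batch  # 2 = K and V
    return values * bits_per_value / 8

# Hypothetical 70B-class configuration with a long context (assumed).
cfg = dict(layers=80, kv_heads=8, head_dim=128, seq_len=128_000, batch=1)

fp16 = kv_cache_bytes(**cfg, bits_per_value=16)
q3 = kv_cache_bytes(**cfg, bits_per_value=3)

print(f"fp16 KV cache:  {fp16 / 2**30:.1f} GiB")  # ~39.1 GiB
print(f"3-bit KV cache: {q3 / 2**30:.1f} GiB")    # ~7.3 GiB
print(f"bit-width ratio: {fp16 / q3:.1f}x")       # ~5.3x
```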
The most important point is what Google did not announce. It did not say that aggregate AI memory demand is falling. It did not say that HBM is no longer strategic. It did not say that hyperscalers will need less infrastructure in total. It announced a breakthrough in compressing one costly part of the inference stack — the KV cache — while the company’s own hardware roadmap still points to very large memory pools at system scale.
That is why the first market read may be too simple. The market seems to have translated:
“one part of inference got materially more efficient”
into:
“memory suppliers now face structurally lower demand.”
That leap remains unproven.
3. The Core Analytical Distinction
The entire report turns on one analytical separation:
| Concept | Meaning | Why It Matters |
|---|---|---|
| Memory intensity per workload | How much memory one inference task consumes | TurboQuant can reduce this directly |
| Aggregate memory demand | How much total advanced memory the AI ecosystem consumes overall | Depends on usage, model scale, deployment breadth, and infrastructure expansion |
TurboQuant clearly improves the first variable. It does not automatically settle the second. In AI infrastructure, lower cost per task can increase adoption, increase context usage, broaden enterprise deployment, and expand inference volume fast enough that total memory demand still rises. This is the core reason a 6x KV-cache reduction should not be lazily translated into a broken Micron thesis.
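To make that separation explicit, here is a minimal sketch of the identity aggregate demand = memory per task × task volume. The per-task figure and the 8x usage-growth multiplier are assumptions chosen only to illustrate the mechanism, not forecasts.

```python
# Minimal sketch of the report's core identity, with assumed numbers:
#   aggregate_memory_demand = memory_per_task * number_of_tasks

def aggregate_demand_gb(memory_per_task_gb, tasks):
    return memory_per_task_gb * tasks

before = aggregate_demand_gb(memory_per_task_gb=60, tasks=1_000_000)

# A 6x efficiency gain cuts per-task memory...
per_task_after = 60 / 6

# ...but if cheaper inference expands usage, volume can grow faster than
# intensity falls. The 8x volume multiplier is purely illustrative.
after = aggregate_demand_gb(memory_per_task_gb=per_task_after, tasks=8_000_000)

print(f"before: {before / 1e6:.0f} PB")            # 60 PB
print(f"after:  {after / 1e6:.0f} PB")             # 80 PB
print(f"aggregate change: {after / before:.2f}x")  # 1.33x: total still rises

# Break-even: volume growth equal to the efficiency gain (6x) keeps
# aggregate demand flat; anything above it raises total demand.
```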
4. Why Google’s Own Hardware Roadmap Still Supports the Memory Thesis
Google’s Ironwood stack is the strongest counterargument to the simplistic bearish interpretation.
The disclosed Ironwood memory architecture is summarized below. Google Cloud said Ironwood has 1.77 PB of directly accessible HBM, and that each chip has 192 GiB of HBM with peak HBM bandwidth of 7.4 TB/s. The system can scale to 9,216 chips in a superpod.
| Ironwood Metric | Value |
|---|---|
| HBM per chip | 192 GiB |
| HBM bandwidth per chip | 7.4 TB/s |
| Directly accessible HBM | 1.77 PB |
| Pod scale | 9,216 chips |
Google Cloud also explicitly described the decode phase of large generative models as memory-bandwidth-intensive and said Ironwood was engineered for both the large-batch prefill phase and that decode phase. That is a crucial signal: Google is not acting as though memory has become secondary. It is still optimizing around memory as a major system bottleneck.
This leads to a very important conclusion:
Google is optimizing memory because memory still matters.
If HBM had become structurally less important, Google’s own flagship AI infrastructure would not still be built around such large memory pools and such extreme bandwidth targets.
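One way to see why decode stresses memory bandwidth: each generated token re-reads the entire KV cache. The sketch below uses assumed model parameters; only the 7.4 TB/s per-chip bandwidth figure comes from Google’s disclosure.

```python
# Why decode is memory-bandwidth-bound: each new token attends over the
# full KV cache, so the cache is re-read once per generated token.
# Model parameters are illustrative assumptions.

def kv_bytes_read_per_token(layers, kv_heads, head_dim, context_len, bytes_per_value):
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_value

per_token = kv_bytes_read_per_token(
    layers=80, kv_heads=8, head_dim=128, context_len=128_000, bytes_per_value=2
)
print(f"KV bytes read per token: {per_token / 2**30:.1f} GiB")  # ~39 GiB

# At Ironwood's disclosed 7.4 TB/s per-chip HBM bandwidth, KV reads alone
# cap single-stream decode speed (ignoring weight reads and any overlap):
bandwidth = 7.4e12  # bytes per second
print(f"decode upper bound: {bandwidth / per_token:.0f} tokens/s")  # ~176

# A 6x KV compression would lift that bound roughly 6x, which is exactly
# why compression research and bandwidth engineering coexist.
```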
5. The “Fear Math” Behind the Selloff
The bearish instinct can be reconstructed with a simple piece of arithmetic.
If Ironwood offers 1.77 PB of directly accessible HBM, and if TurboQuant cuts KV-cache memory by 6x, investors can loosely treat that same physical pool as holding what previously required:
1.77 PB × 6 = 10.62 PB of KV-cache-equivalent capacity
That number explains the emotional logic of the selloff. It makes the breakthrough sound like a major demand shock for memory suppliers. The problem is that this is workload-specific equivalence, not proof of a system-wide 6x reduction in memory demand across the AI stack. The market’s shortcut is understandable. It is not yet analytically complete.
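The same arithmetic, spelled out. Both inputs are disclosed figures; the output is a workload-level equivalence, not a demand forecast.

```python
# The "fear math" behind the selloff, reproduced from the disclosed figures.
physical_hbm_pb = 1.77  # Ironwood directly accessible HBM
kv_compression = 6      # TurboQuant's minimum claimed KV-cache reduction

effective = physical_hbm_pb * kv_compression
print(f"{effective:.2f} PB of KV-cache-equivalent capacity")  # 10.62 PB
```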
6. The CapEx Contradiction Is the Strongest Counterargument
Alphabet’s own capital-spending behavior is the strongest evidence against a simplistic “less memory needed” reading.
On its February 2026 earnings call, Alphabet said it lowered Gemini serving unit costs by 78% over 2025 through model optimization, efficiency, and utilization improvements. Yet on the same call, Alphabet guided 2026 CapEx to $175 billion to $185 billion, versus $91.4 billion in 2025. Alphabet also said Google Cloud demand remains strong despite operating in a tight supply environment.
| CapEx Bridge | Value |
|---|---|
| 2025 CapEx | $91.4B |
| 2026 CapEx guide (low) | $175B |
| 2026 CapEx guide (high) | $185B |
| Low-end multiple vs 2025 | 1.91x |
| High-end multiple vs 2025 | 2.02x |
The arithmetic is straightforward:
175 / 91.4 = 1.91x
185 / 91.4 = 2.02x
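For completeness, the same bridge as a reproducible check against the cited figures:

```python
# Alphabet's CapEx bridge, using the figures cited on the earnings call.
capex_2025 = 91.4                     # $B, actual
guide_low, guide_high = 175.0, 185.0  # $B, 2026 guidance range

print(f"low-end multiple:  {guide_low / capex_2025:.2f}x")   # 1.91x
print(f"high-end multiple: {guide_high / capex_2025:.2f}x")  # 2.02x
```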
That is the contradiction the market needs to explain. If lower memory intensity were genuinely translating into lower total infrastructure need, Alphabet’s capital plan should be flattening, not nearly doubling. The more coherent interpretation is:
unit economics improved, but demand scale still forced the system to expand.
That supports a “usage expansion” interpretation, not a “demand collapse” interpretation.
7. Micron’s Position Is Stronger Than the Price Action Suggests
Micron’s fiscal Q1 2026 prepared remarks are extremely important in this context.
Management said Micron had completed agreements on price and volume for its entire calendar 2026 HBM supply, including HBM4. It also forecast HBM TAM CAGR of approximately 40% through calendar 2028, from approximately $35 billion in 2025 to around $100 billion in 2028, and said the $100 billion milestone is now projected to arrive two years earlier than in its prior outlook. Micron also said tight demand-and-supply conditions in DRAM and NAND should persist through and beyond calendar 2026.
| Micron Strategic Data Point | Value / Commentary |
|---|---|
| Calendar 2026 HBM supply | Committed on price and volume |
| HBM TAM 2025 | ~$35B |
| HBM TAM 2028 | ~$100B |
| Implied TAM multiple | 2.86x |
| Management framing | ~40% CAGR through 2028 |
| Supply conditions | Tight through and beyond 2026 |
The raw math is powerful:
100 / 35 = 2.86x
CAGR implied by the raw figures is about 41.9%, which is consistent with Micron’s “approximately 40%” framing.
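The check in code, using the figures from Micron’s prepared remarks:

```python
# Micron's HBM TAM framing, verified against the raw figures.
tam_2025, tam_2028 = 35.0, 100.0  # $B

multiple = tam_2028 / tam_2025
cagr = multiple ** (1 / 3) - 1    # three years: 2025 -> 2028

print(f"TAM multiple: {multiple:.2f}x")  # 2.86x
print(f"implied CAGR: {cagr:.1%}")       # 41.9%, consistent with "~40%"
```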
A company that has already committed its next year of HBM supply and still expects this level of market expansion is not describing a business on the edge of structural demand failure. It is describing a business with unusually strong visibility into a constrained, high-value category.
8. Micron’s Earnings Curve Still Signals Strength
Micron’s official fiscal Q1 2026 results were already strong:
| FQ1 2026 | Value |
|---|---|
| Revenue | $13.64B |
| Non-GAAP EPS | $4.78 |
| Non-GAAP Gross Margin | 56.8% |
| CapEx | $4.5B |
| Free Cash Flow | $3.9B |
Micron also guided fiscal Q2 2026 to revenue of $18.70B ± $0.4B, non-GAAP EPS of $8.42 ± $0.20, and non-GAAP gross margin of 68.0% ± 1.0%.
Then, after Micron reported its March-quarter results, Reuters said the company delivered $23.86B of second-quarter revenue and guided third-quarter revenue to $33.5B ± $0.75B. MarketWatch also reported that Micron’s adjusted gross margin reached 74.9% in the February quarter and that the company was targeting 81% gross margin for the May quarter.
That creates an important earnings profile:
| Metric | FQ1 2026 | FQ2 2026 | FQ3 2026 Guide |
|---|---|---|---|
| Revenue ($B) | 13.64 | 23.86 | 33.5 |
| Non-GAAP EPS | $4.78 | — | — |
| Adjusted Gross Margin | 56.8% | 74.9% | 81% target |
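A quick sketch of the sequential growth implied by those figures (the FQ3 column uses the guided $33.5B midpoint):

```python
# Sequential revenue trajectory from the reported and guided figures ($B).
revenue = {"FQ1 2026": 13.64, "FQ2 2026": 23.86, "FQ3 2026 guide": 33.5}

quarters = list(revenue)
for prev, curr in zip(quarters, quarters[1:]):
    growth = revenue[curr] / revenue[prev] - 1
    print(f"{prev} -> {curr}: {growth:+.0%}")
# FQ1 -> FQ2: +75%;  FQ2 -> FQ3 guide: +40%
```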
Even without forcing a price multiple into this report, the trend is obvious:
- Revenue is accelerating sharply
- Margins are expanding sharply
- HBM demand is already contract-backed
- Supply remains constrained
The market is therefore not really debating whether Micron has earning power today. It is debating how long that earning power can persist. That is a much more nuanced question than “Google broke Micron.”
9. The Broader Industry Context Still Points to Constraint
This is not just a Google-versus-Micron story.
Reuters reported in March that Micron’s sharp jump in second-quarter revenue was driven by booming demand for AI memory: customers were committing to long-term data-center investments, and the resulting growth in capacity needs was fueling a sharp rise in demand for advanced memory and storage, creating a supply crunch and pushing up prices. Reuters also described Micron as one of the three major suppliers of HBM essential to AI systems.
The Wall Street Journal likewise framed the market move as a selloff in Micron and other chip names after Google unveiled the new memory technology. That is exactly why this report matters: the market’s reaction was real, but the leap from “memory-compression breakthrough” to “structural demand impairment” still looks too fast.
10. Scenario Analysis
Bear Case
TurboQuant-like methods spread rapidly across production inference, memory intensity falls faster than overall AI usage rises, and investors conclude that current HBM margins and pricing are close to peak. In that case, Micron would likely re-rate lower as the market moves from “structural AI enabler” back to “cyclical memory producer.” For this case to hold, compression would need to reduce aggregate advanced-memory demand, not just improve one bottleneck. That evidence is not yet visible in Alphabet’s CapEx plans or Micron’s contracted HBM supply.
Base Case
Memory per workload becomes more efficient, but lower inference cost expands usage enough that total HBM demand remains firm. This is the most balanced scenario. It fits Google’s efficiency gains, Alphabet’s still-rising infrastructure budget, and Micron’s still-strong HBM visibility at the same time.
Bull Case
TurboQuant becomes an adoption accelerator. Cheaper inference, longer context, more multimodal use, more agents, and wider enterprise deployment push total memory demand higher despite lower memory intensity per task. In that case, Google’s breakthrough becomes memory-positive at the ecosystem level, because it helps AI usage scale faster than memory intensity falls. This is still an inference, but it is consistent with Alphabet’s spending behavior and the broader supply-constrained market described by Reuters and Micron.
11. What Would Actually Break the Micron Thesis
A real break in the Micron thesis would require more than a technical research release.
| Thesis Breaker | Why It Matters |
|---|---|
| HBM pricing weakens sharply | Scarcity premium and margins would compress |
| Hyperscalers cut advanced-memory orders | Demand destruction would become real |
| Alphabet materially reduces AI CapEx | Infrastructure expansion thesis would weaken |
| Micron’s forward HBM visibility stops extending | Contracted supply would no longer support premium assumptions |
Right now, the public evidence still points the other way: Alphabet is still investing heavily, Google is still describing tight supply, and Micron has already committed its calendar 2026 HBM book while guiding strong market growth through 2028.
12. Neutral Conclusion
The cleanest neutral conclusion is this:
Google did not publish proof that AI needs less memory in aggregate.
Google published proof that AI is still memory-constrained enough that breakthrough compression matters.
That is not a semantic distinction. It is the entire investment distinction. Google improved a real bottleneck. But Google’s own hardware roadmap, Alphabet’s own CapEx guidance, and Micron’s own HBM visibility all argue that advanced memory remains strategic rather than obsolete.
13. Opinion
My view is that the market likely over-compressed the conclusion.
A fair reading is:
- TurboQuant is real
- Some memory-intensive AI workflows will become more efficient
- The market should revisit peak-unit-economics assumptions
An overreaction is:
- Micron’s structural case is broken
- HBM demand is collapsing
- One compression breakthrough invalidates the AI memory build-out
That is not what the evidence says. The deeper interpretation is:
When saving memory becomes frontier research, memory is not irrelevant — it is scarce and valuable enough to optimize aggressively.
That is the version of the story I would tell.
Disclosure
This report is for informational and analytical purposes only. It is not personalized investment advice.
