In Today's Issue:
  • 🔥 Meta announced 4 generations of custom AI chips — what it means for the rest of us

  • 🧠 GTC wrap: inference is officially the main event

  • 💰 An AI accounting startup hit $1.15B valuation in 2 years

  • 🤖 81% of doctors now use AI. That number was 38% two years ago.

  • 👀 NVIDIA's Feynman 2028 roadmap — and what it means for GPU buyers

Hey —

GTC 2026 just wrapped, and the picture is clear: the AI hardware landscape is entering a new phase.

NVIDIA laid out a roadmap that puts inference front and center. Meta made its biggest move yet toward chip independence. And a two-year-old startup proved that agentic AI is already generating real revenue.

Here's what you need to know.

🔥 Meta's Custom Silicon Bet

Meta just unveiled four generations of custom AI chips: MTIA 300, 400, 450, and 500. All shipping between now and 2027. The goal is to optimize inference costs across their 3 billion users.

What's interesting: Meta isn't trying to replace NVIDIA. They're still buying GPUs for training and just signed an AMD deal for additional capacity. The MTIA line is purely inference — starting with ranking and recommendation models, then expanding to generative AI with the 450 and 500.

The MTIA chips use a modular chiplet architecture, letting Meta iterate on a roughly six-month cycle. That's fast for custom silicon. But the real question is timing — the 450 and 500 won't hit mass production until 2027, by which point NVIDIA's Vera Rubin and Groq 3 LPUs will already be in the field.

What it means for your stack: Custom silicon makes sense at Meta's scale — billions of daily predictions across a handful of well-understood model architectures. For most companies, the economics still favor renting GPU capacity. The break-even point for building your own chips is somewhere north of 100,000 units, and very few organizations will ever get there.
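To make that break-even concrete, here's a back-of-the-envelope sketch in Python. All the dollar figures are illustrative assumptions, not real pricing:

```python
# Hedged sketch: break-even math for custom silicon vs. renting GPU capacity.
# Every number below is an illustrative assumption, not real pricing.

def breakeven_units(nre_cost: float, unit_cost: float, rental_equiv: float) -> float:
    """Units at which amortized custom-chip cost matches rented capacity.

    nre_cost:     one-time design/tape-out cost (NRE)
    unit_cost:    per-chip manufacturing cost
    rental_equiv: cost of renting equivalent compute over one chip's lifetime
    """
    if rental_equiv <= unit_cost:
        raise ValueError("renting is always cheaper under these assumptions")
    return nre_cost / (rental_equiv - unit_cost)

# Assumed: $500M NRE, $2k/chip to build, $7k to rent equivalent capacity.
units = breakeven_units(500e6, 2_000, 7_000)
print(round(units))  # 100000
```

Under those (made-up) inputs you need roughly 100,000 chips before building beats renting, which is why the economics only close at hyperscaler volume.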

🧠 GTC Wrap: The Inference Inflection Point

The biggest theme from San Jose wasn't a chip launch. It was a number: 80-85% of AI compute is now inference, not training.

That shift changes everything about how infrastructure decisions get made. Training is a one-time investment. Inference runs 24/7/365. And the optimization targets are completely different — training is about throughput, inference is about latency and cost-per-query.
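A quick back-of-the-envelope comparison shows why the split matters. The rates and volumes here are illustrative assumptions only:

```python
# Hedged sketch: one-time training spend vs. always-on inference spend.
# All figures are illustrative assumptions, not real pricing.

def training_cost(gpu_hours: float, hourly_rate: float) -> float:
    """One-time training bill: GPU-hours times hourly rate."""
    return gpu_hours * hourly_rate

def annual_inference_cost(queries_per_sec: float, cost_per_query: float) -> float:
    """Inference runs around the clock, so cost scales with the calendar."""
    seconds_per_year = 365 * 24 * 3600
    return queries_per_sec * seconds_per_year * cost_per_query

train = training_cost(1_000_000, 2.0)          # assumed: 1M GPU-hours at $2/hr
serve = annual_inference_cost(1_000, 0.0005)   # assumed: 1k QPS at $0.0005/query
print(f"training: ${train/1e6:.1f}M once, inference: ${serve/1e6:.1f}M/year")
```

Even with modest assumed traffic, the recurring inference bill dwarfs the one-time training spend within the first year.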

This is the strategic logic behind NVIDIA's $20B Groq acquisition. Rather than forcing GPUs to do everything, NVIDIA now has purpose-built silicon for the decode phase of inference — the part that's fundamentally memory-bandwidth bound. Vera Rubin handles the heavy lifting, Groq 3 LPUs handle fast token generation. It's a smart architectural split.
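The bandwidth-bound claim can be sanity-checked with simple arithmetic: during decode, each generated token streams the model weights from memory, so per-sequence token rate is capped at roughly bandwidth divided by model size. The numbers below are illustrative assumptions:

```python
# Hedged sketch: why decode throughput is memory-bandwidth bound.
# Each decoded token reads the full weight set, so (without batching)
# tokens/sec per sequence ~= memory bandwidth / model size in bytes.
# Bandwidth and model-size figures are illustrative assumptions.

def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Assumed: a 70B-parameter model at 8-bit weights is ~70 GB,
# served from HBM with ~3,350 GB/s of bandwidth.
print(round(decode_tokens_per_sec(3_350, 70)))  # 48
```

No amount of extra FLOPS raises that ceiling; only more memory bandwidth (or batching) does, which is the case for decode-specialized silicon.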

The takeaway for infrastructure buyers: if you're still planning your AI budget around training costs alone, you're optimizing for a shrinking share of total spend. The real cost center is inference — and it's growing fast.

💰 2 Years to Unicorn: Basis Makes Accounting Interesting

Basis AI hit a $1.15B valuation after a $100M Series B led by Accel and GV. They build agentic AI for accounting — financial statements, tax returns, expense tracking. Not a copilot that suggests things. The AI handles the work end-to-end.

Founded in 2023, the company already counts 30% of the top 25 US accounting firms as customers.

This is the pattern worth watching: find an industry drowning in repetitive, high-value manual work, deploy AI agents that actually complete tasks (not just assist), and scale across enterprise customers who measure ROI in headcount savings.

The infrastructure angle: Running agentic AI at this scale — across hundreds of firms processing sensitive financial data — is a serious compute challenge. The inference costs alone for a system that needs to be accurate, auditable, and always-on are significant. We'll dig deeper into the economics of agentic AI infrastructure in an upcoming deep dive.

🤖 81% of Doctors Now Use AI

An AMA survey shows 81% of physicians now use AI in their practice, up from 38% in 2023. The killer app isn't diagnosis — it's documentation. AI handles clinical notes so doctors can focus on patients instead of screens.

Amazon is leaning in too, launching a Health AI agent for Prime members: 24/7 virtual care, lab result interpretation, prescription renewals.

Healthcare went from cautious experimentation to mass adoption in two years. No other industry has moved this fast. The lesson: AI adoption accelerates fastest when it removes the most hated part of someone's job.

👀 Looking Ahead: Feynman 2028

NVIDIA previewed its next-generation Feynman architecture at GTC. Shipping 2028. Key details:

  • 3D die stacking

  • Custom HBM4E or HBM5 memory

  • TSMC A16 1.6nm process

  • New Rosa CPU

The big signal: NVIDIA is now on an annual refresh cycle for data center hardware. Each generation brings meaningful performance-per-watt improvements, which means the useful life of any GPU purchase is getting shorter.

For infrastructure planners, this accelerates the rent-vs-buy calculation. Cloud and managed GPU services look increasingly attractive when hardware generations turn over this quickly. The companies with the most flexibility in their infrastructure commitments will have a real advantage.
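One way to see the shift is a toy amortization comparison: a shorter useful life raises the annualized cost of owned hardware while rental rates stay put. All costs below are illustrative assumptions:

```python
# Hedged sketch: how a faster refresh cycle tilts rent-vs-buy.
# Every figure is an illustrative assumption, not real pricing.

def buy_cost_per_year(capex: float, useful_life_years: float,
                      opex_per_year: float) -> float:
    """Annualized cost of owned hardware: straight-line capex plus opex."""
    return capex / useful_life_years + opex_per_year

def rent_cost_per_year(hourly_rate: float, hours: float = 8760) -> float:
    """Cost of renting the same capacity year-round."""
    return hourly_rate * hours

# Assumed: $250k GPU server, $30k/yr power and ops, $12/hr to rent equivalent.
print(buy_cost_per_year(250_000, 4, 30_000))  # 92500.0  -- 4-year life: buying wins
print(buy_cost_per_year(250_000, 2, 30_000))  # 155000.0 -- 2-year life: buying loses
print(rent_cost_per_year(12.0))               # 105120.0
```

Same hardware, same rental rate; halving the useful life is enough to flip the answer, which is exactly what an annual refresh cycle threatens to do.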

That's it for today. If this was useful, forward it to someone making AI infrastructure decisions. We're covering this daily.

— Steven, ClusterMind
