In Today's Issue:
🔥 Meta announced 4 generations of custom AI chips — what it means for the rest of us
🧠 GTC wrap: inference is officially the main event
💰 An AI accounting startup hit $1.15B valuation in 2 years
🤖 81% of doctors now use AI. That number was 38% two years ago.
✨ NVIDIA's Feynman 2028 roadmap — and what it means for GPU buyers
Hey —
GTC 2026 just wrapped, and the picture is clear: the AI hardware landscape is entering a new phase.
NVIDIA laid out a roadmap that puts inference front and center. Meta made its biggest move yet toward chip independence. And a two-year-old startup proved that agentic AI is already generating real revenue.
Here's what you need to know.
🔥 Meta's Custom Silicon Bet
Meta just unveiled four generations of custom AI chips: MTIA 300, 400, 450, and 500, all shipping between now and 2027. The goal is to cut inference costs across their roughly 3 billion users.
What's interesting: Meta isn't trying to replace NVIDIA. They're still buying GPUs for training and just signed an AMD deal for additional capacity. The MTIA line is purely inference — starting with ranking and recommendation models, then expanding to generative AI with the 450 and 500.
The MTIA chips use a modular chiplet architecture, letting Meta iterate on a roughly six-month cycle. That's fast for custom silicon. But the real question is timing — the 450 and 500 won't hit mass production until 2027, by which point NVIDIA's Vera Rubin and Groq 3 LPUs will already be in the field.
What it means for your stack: Custom silicon makes sense at Meta's scale — billions of daily predictions across a handful of well-understood model architectures. For most companies, the economics still favor renting GPU capacity. The break-even point for building your own chips is somewhere north of 100,000 units, and very few organizations will ever get there.
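That break-even logic fits in a few lines. A minimal sketch, where every number (NRE cost, per-unit costs) is an illustrative assumption rather than a figure from this issue:

```python
# Back-of-the-envelope: build custom inference silicon vs. rent GPU capacity.
# All dollar figures below are illustrative assumptions.

NRE_COST = 500_000_000     # one-time design + tape-out cost (assumed)
UNIT_COST_CUSTOM = 3_000   # per-chip cost once designed (assumed)
UNIT_COST_RENTED = 8_000   # equivalent rented-GPU cost per chip's worth of work (assumed)

def break_even_units(nre: float, unit_custom: float, unit_rented: float) -> float:
    """Units at which total custom-silicon cost matches renting."""
    return nre / (unit_rented - unit_custom)

print(round(break_even_units(NRE_COST, UNIT_COST_CUSTOM, UNIT_COST_RENTED)))  # 100000
```

With these assumed inputs the crossover lands right around the 100,000-unit mark; the real driver is the enormous fixed cost up front, which only amortizes at Meta-like volumes.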
🧠 GTC Wrap: The Inference Inflection Point
The biggest theme from San Jose wasn't a chip launch. It was a number: 80-85% of AI compute is now inference, not training.
That shift changes everything about how infrastructure decisions get made. Training is a largely one-time cost; inference runs 24/7 for the life of the product. And the optimization targets are completely different: training is about throughput, inference is about latency and cost-per-query.
This is the strategic logic behind NVIDIA's $20B Groq acquisition. Rather than forcing GPUs to do everything, NVIDIA now has purpose-built silicon for the decode phase of inference — the part that's fundamentally memory-bandwidth bound. Vera Rubin handles the heavy lifting, Groq 3 LPUs handle fast token generation. It's a smart architectural split.
The takeaway for infrastructure buyers: if you're still planning your AI budget around training costs alone, you're optimizing for a shrinking share of total spend. The real cost center is inference — and it's growing fast.
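A toy model makes the point concrete: a one-time training cost gets swamped by per-query inference spend once a model is in production. All numbers here are assumptions for illustration, not figures from the survey data above:

```python
# Toy cost model: one-time training spend vs. cumulative inference spend.
# Every number is an illustrative assumption.

TRAINING_COST = 10_000_000     # one-time training run (assumed)
COST_PER_QUERY = 0.002         # inference cost per query (assumed)
QUERIES_PER_DAY = 50_000_000   # production traffic (assumed)

def inference_share(days: int) -> float:
    """Fraction of total AI spend that is inference after `days` in production."""
    inference = COST_PER_QUERY * QUERIES_PER_DAY * days
    return inference / (inference + TRAINING_COST)

print(f"{inference_share(365):.1%}")  # → 78.5%
```

Even with a $10M training run, a year of moderate traffic puts inference at nearly 80% of total spend, and the share only grows from there.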
💰 2 Years to Unicorn: Basis Makes Accounting Interesting
Basis AI hit a $1.15B valuation after a $100M Series B led by Accel and GV. They build agentic AI for accounting — financial statements, tax returns, expense tracking. Not a copilot that suggests things. The AI handles the work end-to-end.
Roughly 30% of the top 25 US accounting firms already use it, and the company was only founded in 2023.
This is the pattern worth watching: find an industry drowning in repetitive, high-value manual work, deploy AI agents that actually complete tasks (not just assist), and scale across enterprise customers who measure ROI in headcount savings.
The infrastructure angle: Running agentic AI at this scale — across hundreds of firms processing sensitive financial data — is a serious compute challenge. The inference costs alone for a system that needs to be accurate, auditable, and always-on are significant. We'll dig deeper into the economics of agentic AI infrastructure in an upcoming deep dive.
🤖 81% of Doctors Now Use AI
An AMA survey shows 81% of physicians now use AI in their practice, up from 38% in 2023. The killer app isn't diagnosis — it's documentation. AI handles clinical notes so doctors can focus on patients instead of screens.
Amazon is leaning in too, launching a Health AI agent for Prime members: 24/7 virtual care, lab result interpretation, prescription renewals.
Healthcare went from cautious experimentation to mass adoption in two years. No other industry has moved this fast. The lesson: AI adoption accelerates fastest when it removes the most hated part of someone's job.
👀 Looking Ahead: Feynman 2028
NVIDIA previewed its next-generation Feynman architecture at GTC. Shipping 2028. Key details:
3D die stacking
Custom HBM4E or HBM5 memory
TSMC A16 1.6nm process
New Rosa CPU
The big signal: NVIDIA is now on an annual refresh cycle for data center hardware. Each generation brings meaningful performance-per-watt improvements, which means the useful life of any GPU purchase is getting shorter.
For infrastructure planners, this accelerates the rent-vs-buy calculation. Cloud and managed GPU services look increasingly attractive when hardware generations turn over this quickly. The companies with the most flexibility in their infrastructure commitments will have a real advantage.
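A quick sketch of why faster refresh cycles push the math toward renting: amortizing the same purchase over a shorter competitive life raises the effective cost of owned hardware. Prices and lifespans below are assumed for illustration:

```python
# Toy amortization model: what annual hardware refreshes do to owned GPUs.
# Purchase price and useful-life figures are illustrative assumptions.

def owned_cost_per_unit(price: float, useful_life_years: int) -> float:
    """Amortized cost per unit of work for purchased hardware."""
    return price / useful_life_years

# Same $30k GPU, but newer generations shorten its competitive life:
print(owned_cost_per_unit(30_000, 4))  # 7500.0  (4-year useful life)
print(owned_cost_per_unit(30_000, 2))  # 15000.0 (2-year useful life)
```

Halving the useful life doubles the effective annual cost of ownership, while renting lets you step onto each new generation without eating that depreciation.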
That's it for today. If this was useful, forward it to someone making AI infrastructure decisions. We're covering this daily.
— Steven, ClusterMind

