By Nick Farrell
Publication Date: 2025-12-29 10:08:00
Hybrid bonding, SRAM dies, and a possible CUDA headache
Nvidia wants to own inference, and word on the street is that it is lining up its Feynman GPUs to do it.
The dark satanic rumour mill has spun a hell-on-earth yarn claiming that Nvidia could integrate LPUs (language processing units) into next-gen Feynman GPUs, using an IP licensing deal for Groq's LPU tech as the entry point.
GPU expert AGF reckons the LPUs could be stacked on Feynman using TSMC’s hybrid bonding, a move aimed at stuffing more low-latency memory close to compute.
The comparison is AMD's X3D play, where extra cache gets bonded on top, except here the "extra" would be LPU dies packed with SRAM banks rather than plain cache.
AGF argues that building the SRAM as a monolithic block on a leading-edge node makes little sense: SRAM bit cells have barely shrunk for several generations now (TSMC's N3E delivered essentially no bit-cell shrink over N5), so it would burn pricey wafer area for minimal density gain.
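For the sake of argument, here is a rough sketch of those economics. The bit-cell sizes are TSMC's published N5 and N3E figures; the wafer prices are loose assumptions of our own, since the foundry does not publish them and A16 numbers do not exist publicly at all. The point survives either way: density barely improves while the wafer bill jumps.

```python
# Back-of-envelope: what a gigabyte of raw SRAM cells costs on different nodes.
# Bit-cell sizes are published TSMC figures; wafer prices are loose assumptions.

BIT_CELL_UM2   = {"N5": 0.021, "N3E": 0.0199}    # high-density SRAM bit cell, um^2
WAFER_COST_USD = {"N5": 13_000, "N3E": 20_000}   # assumed 300 mm wafer prices

WAFER_AREA_MM2 = 3.14159 * (300 / 2) ** 2        # ~70,700 mm^2, ignoring edge loss

for node, cell in BIT_CELL_UM2.items():
    mm2_per_gb = cell * 8e9 / 1e6                # 8e9 bits, um^2 -> mm^2
    usd_per_gb = WAFER_COST_USD[node] * mm2_per_gb / WAFER_AREA_MM2
    print(f"{node}: ~{mm2_per_gb:.0f} mm^2 and ~${usd_per_gb:.0f} per GB of cells")
```

Under these assumptions the newer node needs almost the same silicon area per gigabyte but charges roughly half as much again for it, which is exactly the mismatch AGF is pointing at, and it only gets uglier on A16.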
Instead, the idea is a main Feynman compute die on something like A16 (1.6nm) handling tensor blocks and control logic, with separate LPU dies carrying the SRAM.
Hybrid bonding would do the joining, its fine pad pitch promising a far wider interface and lower energy per bit than off-package memory, which sounds lovely on a slide.
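To put some illustrative numbers on the "fat interface" claim, here is a minimal sketch. The pad pitch, per-pad data rate, and energy figures below are ballpark assumptions for hybrid bonding, not anything Nvidia or TSMC has disclosed about Feynman.

```python
# Rough sketch of the hybrid-bonding pitch: bandwidth density and energy per bit.
# Pad pitch, signalling rate, and pJ/bit figures are illustrative assumptions.

BOND_PITCH_UM   = 9.0                            # assumed bond-pad pitch
PADS_PER_MM2    = (1000 / BOND_PITCH_UM) ** 2    # ~12,300 pads per mm^2
SIGNAL_FRACTION = 0.5                            # say half the pads carry data
RATE_GBPS       = 2.0                            # assumed per-pad data rate

gbs_per_mm2 = PADS_PER_MM2 * SIGNAL_FRACTION * RATE_GBPS / 8
print(f"~{gbs_per_mm2:.0f} GB/s per mm^2 of bonded area")

# Order-of-magnitude energy comparison for moving one bit (assumed values).
for link, pj_per_bit in {"hybrid bond": 0.05, "HBM": 3.5, "GDDR7": 7.0}.items():
    print(f"{link}: ~{pj_per_bit} pJ/bit")
```

Even with conservative assumptions that works out to terabytes per second from a few square millimetres of bonded area, at a fraction of the energy per bit of going off package, which is why the slide looks so good.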
If A16 really comes with backside power delivery, that frees up the front side for vertical SRAM connections, pushing latency down where inference actually hurts.
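And here is why the inference crowd obsesses over this. In single-stream decoding, every model weight has to be streamed in for each generated token, so memory bandwidth sets a hard ceiling on tokens per second. The model size and bandwidth numbers in this sketch are hypothetical, purely for illustration.

```python
# Why memory is where inference "actually hurts": at batch size 1, every weight
# is read once per generated token, so bandwidth caps tokens per second.
# Model size and bandwidth figures are hypothetical.

PARAMS          = 70e9     # assumed 70B-parameter model
BYTES_PER_PARAM = 1        # assumed 8-bit quantised weights

bytes_per_token = PARAMS * BYTES_PER_PARAM

BANDWIDTH_GBS = {"off-package HBM": 1_200, "stacked SRAM (assumed)": 10_000}
for memory, bw in BANDWIDTH_GBS.items():
    print(f"{memory}: ~{bw * 1e9 / bytes_per_token:.0f} tokens/s upper bound")
```

Under those assumptions the stacked-SRAM ceiling is nearly an order of magnitude higher, which is the whole Groq pitch in one line of arithmetic.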
The neat-looking diagram doing the rounds shows TSVs, vertical SRAM connections, LPU dies as SRAM banks, and a hybrid bonding interface, all stacked as if…