
AI giveth and AI taketh CPU
Recorded on the floor of HumanX, Ryan is joined by AMD CTO Mark Papermaster to discuss AMD’s silicon strategy for AI, born of their long history of heterogeneous CPU/GPU computing, how chipmakers are dealing with the wide range of AI workloads from training to inference, and the paradox of agents both...
Want to learn more about the topics Mark and Ryan discussed in this episode? Check out the AMD Advanced Insights podcast, a monthly show hosted by Mark.
Connect with Mark on LinkedIn.
TRANSCRIPT
[Intro Music]
Ryan Donovan: I'm Ryan Donovan, host of the Stack Overflow Podcast, and I'm here at HumanX to talk about the silicon aspect of AI. I'm here with Mark Papermaster, who's CTO of AMD. We're gonna talk all about it. So, welcome to the show.
Mark Papermaster: Ryan, thanks for having me.
Ryan Donovan: I wanted to start off, I'm sure you've seen this, I saw an April Fool's article about AMD buying Intel, and a lot of people were a little taken aback, but also felt like this was an interesting turn of events. You all seem to have made better choices than Intel, at least in terms of your market cap. What do you think your strategy has been in terms of AI? Do you think your approach to AI has been rewarded?
Mark Papermaster: Well, Ryan, what we've done at AMD is have a very, very focused development effort to drive our roadmap to leadership across everything from supercomputing, the cloud, the edge, and endpoint devices like PCs. So, when you think about that, it drives us to be laser-focused on, 'what do customers need? What really gives customer value?' Because that's what we're focused on. So, [that goes for] every product line [and] the way we run our processes. And over the last decade plus, we really reinvented how we develop products, and it starts with a keen ear in listening to our customers. [When Lisa] Su became CEO, she highlighted three mantras for the company: build great products, really listen to customers and have delighted customers, and simplify everything we do.
So, that focus is really powerful, and there's a culture of collaboration [and] innovation. My role as CTO is to drive innovation, to drive our roadmap, and to make sure that we never get complacent, and so that led us to AI. I mean, what attracted me to AMD 14 years ago was the fact that AMD was really the only company that had a deep footprint in CPUs [and] a deep footprint in GPUs. And when you look at AI, it needs heterogeneous computing across both. And so, that's what we've been able to do: develop an engineering culture that really derives that value out of heterogeneous computing. And now we've further expanded our portfolio all the way into embedded devices and adaptive computing. So, AI is incredibly dependent on high-performance computing, and that's what our company is all about.
Ryan Donovan: Yeah, and it's interesting, you talk about the CPU and the GPU. I think my long-term understanding of AMD was mostly on the CPU side, right?
Mark Papermaster: Yes.
Ryan Donovan: As an x86 alternative.
Mark Papermaster: You bet. That's our heritage.
Ryan Donovan: And obviously with AI, everybody is sort of focused on the GPU/TPU side of the house. How do you fit those together in a– I don't know if you fit them together in a single chip or in a single combined piece of silicon. How do you get those to work together?
Mark Papermaster: Well, what people don't realize is that we've been combining CPU and GPU since 2011. It's been 15 years. We started with PCs. And it wasn't about AI, because that was 15 years ago. It was about how you get the best experience when you're running a combination of computing and graphics, whether that's gaming, visualization on the PC you're running, or workstation applications. So, we were really the first to create that type of tight integration of x86 with GPUs with a completely shared memory.
So, it's very, very power efficient, because you don't have to take something you're processing on the CPU and send it over to a separate GPU, with all the power that gets burned in doing that–
Ryan Donovan: So, they sort of share the, like, L1, L2 cache?
Mark Papermaster: Yeah, it's fully coherent. It's the same memory. So, it's not at the level-one cache, but it's a shared memory, and that is incredibly efficient. So, [we have] a lot of experience in heterogeneous computing, and we started over a dozen years ago. We started the Heterogeneous System Architecture [Foundation] with other companies, because we've not only been committed to heterogeneous computing, we've been committed to doing it in an open ecosystem. And that really differentiates us from our big GPU competitor, Nvidia. We've always been about being open. What you find is, as we then brought GPUs and CPUs into the data center, by then we had innovated on chiplets. So, you asked, is it always on the same piece of silicon? Well, for the data center, where you need massive CPU and GPU compute, we use chiplets to be able to offer different configurations. The number one and number two supercomputers in the world very tightly combine CPU and GPU chiplets on one carrier, connected over silicon, but they're actually different chiplets that we put together. Incredibly energy efficient.
Ryan Donovan: Yeah.
Mark Papermaster: So, we're all about how to deliver value and innovation in how you put the computing elements together.
Ryan Donovan: So, talk about chiplets. What's the difference between a chip and a chiplet?
Mark Papermaster: Yeah. So, when you say it's one chip, a homogeneous chip, that means you're creating one design that's gonna go to one semiconductor technology node. It's all built on the same node. And for building a chip, think about photography, where you create an image. Well, that's what you do.
You're creating a chip on a given technology, creating images of multiple layers. That's all the transistors, all the wiring, and how you put it together. And it turns out that as you build bigger and bigger chips, they're harder to manufacture. So, we innovated with chiplets, meaning we partitioned the design. [We] created a hierarchical, partitioned organization. In our first heterogeneous implementation, we broke out the CPU compute elements from the chiplet that connects to all the memory and the IO out to storage and other devices. That way, we could put the CPU on the latest bleeding-edge semiconductor technology node, and the chip that docks to all the IO and memory doesn't need to be on there. It can be on a much more cost-efficient, older node. It also gave us agility. With just a CPU chiplet and an IO interface, we could create different combinations of CPU chiplets. You can take the same chiplet that you use for the data center and use it in desktop PC applications. That's how we build our workstations, and we ended up doing the same thing in our GPUs. When we launched our big GPU to take on Nvidia in December of 2023, it was chiplet-based.
Ryan Donovan: I've heard there's a manufacturing bottleneck around the high-end chips, that [there are] basically two places that can make those. Does this get around the manufacturing bottleneck?
Mark Papermaster: It makes it easier, because these chiplets are easier to manufacture. They yield better, so we're more efficient, and we work very closely with our supply chain. We've worked with TSMC for decades. We plan our supply dependencies on them [for] years in advance. We've already locked in 2026 [and] 2027. We do the same thing with our memory partners: long-term relationships we've had with them, locking in our supply, and then how you package the parts and put them together. We're heavily invested in that industry, too. So, people do think it's all about the design.
It's much more than just the design. It's how you architect putting it together. It's how you architect your supply chain and build those relationships to avoid bottlenecks in delivering to your customers.
Ryan Donovan: So, in architecting the chiplets and the chiplet combinations, how are you adjusting to changes in workload needs, like the increased importance of inference over the last year or so? And I'm sure there are gonna be other shifts.
Mark Papermaster: Chiplets are a piece of our agility there, Ryan. Before inferencing took off, it was all about training, 'cause we had to train all these models, and we had a divergence of needs between [AI and] the traditional high-performance computing that all the national labs run on: weather forecasting, designing new molecules and enzymes. They need high precision. Some of the work, some of the analysis they do, can't use the approximations of AI. So, we created– using chiplets, we've had almost identical versions of our GPUs: one oriented for high-performance computing that integrated CPU and GPU together on the same carrier, and another one for largely inferencing tasks that's all GPU. So, it gave us a lot of flexibility. Just as our CPU chiplets gave us flexibility on how many CPU cores people need, we were able to tailor to what type of AI you're running. And now we're seeing more and more diversity, because as inferencing takes off, it has many different flavors of workloads. Are you vibe-coding, where you need low latency and you're using our caches for low latency? Do you need a large context window? Your prompts are huge, you're running agentic flows, so you're dependent on leveraging all of that memory that hangs off the CPU, and we have that flexibility with very high-speed buses between our CPUs and GPUs.
So, our approach of modularity and of partitioning out how we implement gives us a lot of flexibility to tailor as workloads evolve.
Ryan Donovan: Yeah. So in managing [those] workloads, where does that management happen? Does it happen at the silicon level, or is there a software level that sends things to the GPU, to the memory bus, or whatever it is?
Mark Papermaster: Yeah, that c
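A note on Mark's point that chiplets "yield better": a common first-order way to see why is the Poisson die-yield model, Y = exp(-D0 × A). In this model, D0 is the defect density and A is the die area. The Python sketch below is an editorial illustration, not AMD's model. The defect density, die area, and function names are hypothetical. It also assumes each chiplet is tested before packaging (known-good die) and ignores packaging yield, die-to-die interconnect overhead, and wafer-edge losses.

```python
import math


def poisson_yield(area_cm2: float, defect_density: float) -> float:
    """Fraction of dies with zero defects under the Poisson model Y = exp(-D0 * A)."""
    return math.exp(-defect_density * area_cm2)


def silicon_per_good_product(total_area_cm2: float, n_chiplets: int,
                             defect_density: float) -> float:
    """Expected wafer area (cm^2) consumed per working product.

    Hypothetical simplification: each chiplet is tested before packaging
    (known-good die), so a defect scraps only the chiplet it lands on.
    Ignores packaging yield, interconnect overhead, and wafer-edge losses.
    """
    chiplet_area = total_area_cm2 / n_chiplets
    y = poisson_yield(chiplet_area, defect_density)
    # On average, each good chiplet costs chiplet_area / y of wafer; we need n of them.
    return n_chiplets * chiplet_area / y


if __name__ == "__main__":
    D0 = 0.1    # defects per cm^2 (illustrative assumption)
    AREA = 6.0  # 600 mm^2 of total compute silicon (illustrative assumption)
    for n in (1, 2, 4, 8):
        y = poisson_yield(AREA / n, D0)
        cost = silicon_per_good_product(AREA, n, D0)
        print(f"{n} die(s): per-die yield {y:.1%}, "
              f"wafer area per good product {cost:.2f} cm^2")
```

With these made-up numbers, a single 600 mm² die yields about 55%, while each of eight 75 mm² chiplets yields about 93%. The wafer area spent per working product therefore drops from roughly 10.9 cm² to roughly 6.5 cm². The model does not capture the other benefit Mark describes, which is building the IO die on a cheaper, older node.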
📰Originally published at stackoverflow.blog
Staff Writer