Today, we're bringing you something a little different. The Code Conference was this week, and we had a great time talking live onstage with all of our guests. We'll be sharing a lot of these conversations here in the coming days, and the first one we're sharing is my chat with Dr. Lisa Su, the CEO of AMD.
AMD CEO Lisa Su on the AI revolution and competing with Nvidia
AMD CEO Lisa Su discusses the chip giant’s rivalry with Nvidia, how AI is revolutionizing the GPU business, government regulation, and how AMD is using AI in the workforce.
Lisa and I spoke for half an hour, and we covered an incredible number of topics, especially about AI and the chip supply chain. These past few years have seen a global chip shortage, exacerbated by the pandemic, and now, coming out of it, there's suddenly another big spike in demand thanks to everyone wanting to run AI models. The balance of supply and demand is overall in a pretty good place right now, Lisa told us, with the notable exception of the high-end GPUs powering all of the large AI models that everyone's running.
The hottest GPU in the game is Nvidia's H100 chip. But AMD is working to compete with a new chip Lisa told us about called the MI300 that should be as fast as the H100. There's also a lot of work being done in software to make it so that developers can move easily between Nvidia and AMD. So we got into that.
You'll also hear Lisa talk about what companies are doing to increase manufacturing capacity. The CHIPS and Science Act that recently passed is a great step toward building chip manufacturing here in the United States, but Lisa told us it takes a long time to bring up that supply. So I wanted to know how AMD is looking to diversify this supply chain and make sure it has enough capacity to meet all of this new demand.
Finally, Lisa answered questions from the amazing Code audience and talked a lot about how much AMD is using AI inside the company right now. It's more than you think, although Lisa did say AI is not going to be designing chips all by itself anytime soon.
Okay, Dr. Lisa Su, CEO of AMD. Here we go.
This transcript has been lightly edited for length and clarity.
Hello, hello. Nice to see you.
Nice to see you.
Thank you for having me.
I have a ton to talk about — 500 cards' worth of questions. We're going to be here all night. But let's start with something exciting. AMD made some news today in the AI market. What's going on?
Well, I can say, first of all, the theme of this whole conference, AI, is the theme of everything in tech these days. And when we look at all of the opportunities for computing to really advance AI, that's really what we're working on. So yes, today, we did have an announcement this morning from a startup called Lamini, a great company that we've been working with, some of the top researchers in large language models.
And the key for everyone is, when I talk to CEOs, people are all asking, "I know I need to pay attention to AI. I know I need to do something. But what do I do? It's so complicated. There are so many different factors." And with these foundational models like Llama, which are great foundational models, many enterprises actually want to customize those models with their own data and ensure that you can do that in your private environment and for your application. And that's what Lamini does.
They actually customize models, fine-tune models for enterprises, and they operate on AMD GPUs. And so that was a cool thing. And we spent a bit of time with them, quite a bit of time with them, really optimizing the software and the applications to make it as easy as possible to develop these enterprise, fine-tuned models.
I want to talk about that software in depth. I think it's very interesting where we're abstracting the different levels of software development away from the hardware. But I want to come back to that.
I want to begin broadly with the chip market. We're exiting a period of pretty incredible constraint in chips across every process node. Where do you think we are now?
It's interesting. I've been in the semiconductor business for, I don't know, the last 30 years, and for the longest time, people didn't really even understand what semiconductors were or where they fit in the overall supply chain and where they were necessary in applications. I think the last few years, especially with the pandemic-driven demand and everything that we're doing with AI, people now are really focused on semiconductors.
I think there has been a tremendous cycle. One, a cycle where we needed a lot more chips than we had, and then a cycle where we had too many of some. But at the end of the day, I think the fact is semiconductors are essential to so many applications. And particularly for us, what we're focused on are the most complex, the highest performance, the bleeding edge of semiconductors. And I would say that there's tremendous growth in the market.
What do you think the bottleneck is now? Is it the cutting edge? Is it at the older process nodes, which is what we were hearing in the middle of the chip shortage?
I think the industry as a whole has really come together as an ecosystem to put a lot of capacity on for the purposes of ensuring that we do satisfy overall demand. So in general, I would say that the supply / demand balance is in a pretty good place, with perhaps the exception of GPUs. If you need GPUs for large language model training and inference, they're probably tight right now. A little bit tight.
Lisa's got some in the back if you need some.
But look, the truth is we absolutely are putting a tremendous amount of effort into getting the entire supply chain ramped up. These are some of the most complex devices in the world — hundreds of billions of transistors, lots of advanced technology. But absolutely ramping up supply overall.
The CHIPS and Science Act passed last year, a massive investment in this country in fabs. AMD is obviously the largest fabless semiconductor company in the world. Has that had a noticeable effect yet, or are we still waiting for that to come to fruition?
I do think that if you look at the CHIPS and Science Act and what it's doing for the semiconductor industry in the United States, it's really a fantastic thing. I have to say, hats off to Gina Raimondo and everything that the Commerce Department is doing with industry. These are long lead time things. The semiconductor ecosystem in the US needed to be built five years ago. It is expanding now, especially at the leading edge, but it's going to take some time.
So I don't know that we feel the effects right now. But one of the things that we always believe is the more you invest over the long term, the more you'll see those effects. So I'm excited about onshore capacity. I'm also really excited about some of the investments in our national research infrastructure because that's also extremely important for long-term semiconductor strength and leadership.
AMD's results speak for themselves. You're selling a lot more chips than you were a few years ago. Where have you found that supply? Are you still relying on TSMC while you wait for these new fabs to come up?
Again, when you look at the business that we're in, it's pushing the bleeding edge of technology. So we're always on the most advanced node and trying to get the next big innovation out there. And there's a combination of both process technology, manufacturing, design, design systems. We are very happy with our partnership with TSMC. They are the best in the world with advanced and leading-edge technologies.
They're it, right? Can you diversify away from them?
I think the key is geographical diversity, Nilay. So when you think about geographical diversity, and by the way, this is true no matter what. Nobody wants to be in the same place because there are just natural risks that happen. And that's where the CHIPS and Science Act has actually been helpful because there are now significant numbers of manufacturing plants being built in the US. They're actually going to start production over the next number of quarters, and we will be active in having some of our manufacturing here in the United States.
I talked to Intel CEO Pat Gelsinger when he broke ground in Ohio. They're trying to become a foundry. He said very confidently to me, "I would love to have an AMD logo on the side of one of these fabs." How close is he to making that a reality?
Well, I would say this. I would say that on onshore manufacturing, we are certainly looking at lots and lots of opportunities. I think Pat has a very ambitious plan, and I think that's there. I think we always look at who are the best manufacturing partners, and what's most important to us is someone who's really dedicated to the bleeding edge of technology.
Is there a competitor in the market to TSMC on that front?
Thereâs always competition in the market. TSMC is certainly very good. Samsung is certainly making a lot of investments. You mentioned Intel. I think there are some activities in Japan as well to bring up advanced manufacturing. So there are lots of different options.
Last question on this thread, and then I do want to talk to you about AI. There has been a lot of noise recently about Huawei. They put out a seven-nanometer chip. This is either an earth-shattering geopolitical event or it's bullshit. What do you think it is?
Let's see. I don't know that I would call it an earth-shattering geopolitical event. Look, I think there's no question that technology is considered a matter of national security importance. And from a US standpoint, I think we want to ensure that we keep that lead. Again, I think the US government has spent a lot of time on this aspect.
The way I look at these things is we are a global company. China's an important market for us. We do sell to China more consumer-related goods versus other things, and there's an opportunity there for us to really have a balanced approach to how we deal with some of these geopolitical matters.
Do you think that there was more supply available at TSMC because Huawei got kicked out of the game?
I think TSMC has put a tremendous amount of supply on the table. I mean, if you think about the CapEx that's happened over the last three or four years, it's there because we all need more chips. And when we need more chips, the investment is there. Now chips are more expensive as a result, and that's part of the ecosystem that we've built out.
Let's talk about that part of it. So you mentioned GPUs are constrained. The Nvidia H100: there's effectively a black market for access to these chips. You have some chips, and you're coming out with some new ones. You just announced Lamini's training fully on your chips. Have you seen an opportunity to disrupt this market because Nvidia supply is so constrained?
I would take a step back, Nilay, and just talk about what's happening in the AI market because it's incredible what's happening. If you think about the technology trends that we've seen over the last 10 or 20 years — whether you're talking about the internet or the mobile phone revolution or how PCs have changed things — AI is 10 times, 100 times, more than that in terms of how it's impacting everything that we do.
So if you talk about enterprise productivity, if you talk about personal productivity or society, what we can do from a productivity standpoint, it's that big. So the fact that there's a shortage of GPUs, I think it's not surprising because people recognize how important the technology is. Now, we're in such early innings of how AI and especially generative AI is coming to market that I view this as a 10-year cycle that we're talking about, not how many GPUs can you get in the next two to four quarters.
We are excited about our road map. I think with high-performance computing, I would call generative AI the killer app for high-performance computing. You need more and more and more. And as good as today's large language models are, they can still get better if you continue to increase the training performance and the inference performance.
And so that's what we do. We build the most complex chips. We do have a new one coming out. It's called MI300 if you want the code name there, and it's going to be fantastic. It's targeted at large language model training as well as large language model inference. Do we see opportunity? Yes. We see significant opportunity, and it's not just in one place. The idea that the cloud guys are the only users, that's not true. There's going to be a lot of enterprise AI. A lot of startups have tremendous VC backing around AI as well. And so we see opportunity across all those spaces.
So MI300?
MI300, you got it.
Performance-wise, is this going to be competitive with the H100 or exceed the H100?
It is definitely going to be competitive for training workloads, and in the AI market, there's no one-size-fits-all as it relates to chips. There are some that are going to be exceptional for training. There are some that are going to be exceptional for inference, and that depends on how you put it together.
What we've done with MI300 is we've built an exceptional product for inference, especially large language model inference. So when we look going forward, much of the work being done right now is companies training and deciding what their models are going to be. But going forward, we actually think inference is going to be a larger market, and that plays well into some of what we've designed MI300 for.
If you look at what Wall Street thinks Nvidia's moat is, it's CUDA, it's the proprietary software stack, it's the long-running relationships with developers. You have ROCm, which is a little different. Do you think that's a moat you can overcome with better products or with a more open approach? How are you going about attacking that?
I'm not a believer in moats when the market is moving as fast as it is. When you think about moats, it's more mature markets where people are not really wanting to change things a lot. When you look at generative AI, it's moving at an incredible pace. The progress that we're making in a few months might've taken a few years in a regular development environment. And for software in particular, our approach is an open software approach.
There's actually a dichotomy. If you look at people who have developed software over the last five, seven, or eight years, they've tended to use... let's call it, more hardware-specific software. It was convenient. There weren't that many choices out there, and so that's what people did. When you look going forward, actually what you find is everyone's looking for the ability to build hardware-agnostic software because people want choice. Frankly, people want choice. People want to use their older infrastructure. People want to ensure that they're able to move from one infrastructure to another. And so they're building on these higher levels of software: things like PyTorch, for example, which tends to be that hardware-agnostic capability.
So I do think the next 10 years are going to be different from the last 10 as it relates to how you develop within AI. And I think we're seeing that across the industry and the ecosystem. And the benefit of an open approach is that there's no one company that has all of the ideas. So the more we're able to bring the ecosystem together, the more we get to take advantage of all of those really, really smart developers who want to accelerate AI learning.
PyTorch is a big deal, right? This is the framework that all these models are actually coded in. I talk to a bunch of cloud CEOs. They don't love their dependency on Nvidia, as much as anybody doesn't love being dependent on any one vendor. Is this a place where you can go work with those cloud providers and say, "We're going to optimize our chips for PyTorch and not CUDA," and developers can just run on PyTorch and pick whichever is best optimized?
That's exactly it. So if you think about what PyTorch is trying to do — and it really is trying to be that sort of hardware-agnostic layer — one of the major milestones is that on PyTorch 2.0, AMD was qualified on day one. And what that means is anybody who runs CUDA on PyTorch right now, it will run on AMD out of the box because we've done the work there. And frankly, it'll run on other hardware as well.
But our goal is "may the best chip win." And the way you do that is to make the software much more seamless. And it's PyTorch, but it's also JAX. It's also some of the tools that OpenAI is bringing in with Triton. There are lots of different tools and frameworks that people are bringing forward that are hardware-agnostic. There are a bunch of people who are also doing "build your own" types of things. So I do think this is the wave of the future for AI software.
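The hardware-agnostic layer Lisa describes is concrete in practice: ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` API that CUDA builds use, so typical model code needs no vendor-specific branches. A minimal sketch of what that looks like (the toy model and tensor shapes here are purely illustrative):

```python
import torch

# The same code path works on an Nvidia (CUDA) or AMD (ROCm) build of PyTorch:
# ROCm builds surface AMD GPUs through the torch.cuda API, so model code does
# not need to branch on the GPU vendor. Falls back to CPU if no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(128, 10).to(device)    # toy model, illustrative only
x = torch.randn(32, 128, device=device)        # batch of 32 feature vectors
logits = model(x)                              # runs on whichever device was picked

print(logits.shape)  # torch.Size([32, 10])
```

The same script runs unchanged against an Nvidia GPU, an AMD GPU, or a CPU-only machine, which is the "out of the box" portability being described here.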
Are you building custom chips for any of these companies?
We have the capability of building custom chips. And the way I think about it is the time to build custom chips is actually when you get very high volume applications going forward. So I do believe there will be custom chips over the next number of years. The other piece that's also interesting is you need all different types of engines for AI. So we spend a lot of time talking about big GPUs because that's what's needed for training large language models. But you're also going to see ASICs for some... let's call it, more narrow applications. You're also going to see AI in client chips. So I'm pretty excited about that as well in terms of just how broadly AI will be incorporated into chips across all of the market segments.
I've got Kevin Scott, CTO of Microsoft, here tomorrow. So I'll ask you this question so I can chase him down with it. If, say, Microsoft wanted to diversify Azure and put more AMD in there and be invisible to customers, is that possible right now?
Well, first of all, I love Kevin Scott. He's a great guy, and we have a tremendous partnership with Microsoft across both the cloud as well as the Windows environment. I think you should ask him the question. But I think if you were to ask him or if you were to ask a bunch of other cloud providers, they would say it's absolutely possible. Yes, it takes work. It takes work that we each have to put in, but it's much less work than you might have imagined because people are actually writing code at the higher-level frameworks. And we believe that this is the wave of the future for AI programming.
Let me connect this to an end-user application just for a second. We're talking about things that are very much raising the cost curve: a lot of smart people doing a lot of work to develop for really high-end GPUs on the cutting-edge process nodes. Everything's just getting more expensive, and you see how the consumer applications are expensive: $25 a month, $30 a seat for Microsoft Office with Copilot. When do you come down the cost curve in a way that brings those consumer prices down?
It's a great, great question. I do believe that the value that you get with gen AI in terms of productivity will absolutely be proven out. So yes, the cost of these infrastructures is high right now, but the productivity that you get on the other side is also exciting. We're deploying AI internally within AMD, and it's such a high priority because, if I can get chips out faster, that's huge productivity.
Do you trust it? Do you have your people checking the work that AI is doing, or do you trust it?
Sure. Look, we're all experimenting, right? We're in the very, very early stages of building the tools and the infrastructure so that we can deploy. But the fact is it saves us time — whether we're designing chips, testing chips, or validating chips — it saves us time, and time is money in our world.
But back to your question about when you get to the other side of the curve. I think that's why it's so important to think about AI broadly and not just in the cloud. So if you think about how the ecosystem will look a few years from now, you would imagine a place where, yes, you have the cloud infrastructures training the largest foundational models, but you're also going to have a bunch of AI at the edge. And whether it's in your PC or in your phone, you're going to be able to do local AI. And there, it is cheaper, it is faster, and it is actually more private when you do that. And so, that's this idea of AI everywhere and how it can really enhance the way we're deploying.
That brings me to open source and, honestly, to the idea of how we will regulate this. So there's a White House meeting, everyone participates, great. Everyone's very proud of each other. Then you think about how you will actually enforce AI regulation. And it's okay, you can probably tell AWS or Azure not to run certain work streams. "Don't do these things." And that seems fine. Can you tell AMD to not let certain things happen on the chips for somebody running an open-source model on Linux on their laptop?
I think it is something that we all take very seriously. The technology has so much upside in terms of what it can do from a productivity and a discovery standpoint, but there's also safety in AI. And I do think that, as large companies, we have a responsibility. Think about the two things: data privacy, as well as overall ensuring, as these models are developed, that they're developed to the best of our ability without too much bias. We're going to make mistakes. The industry as a whole will not be perfect here. But I think there is clarity around its importance and that we need to do it together and that there needs to be a public / private partnership to make it happen.
I can't remember anyone's name, so I'd be a horrible politician. But let's pretend I'm a regulator. I'm going to do it. And I say, "Boy, I really don't want these kids using any model to develop chemical weapons. And I need to figure out where to land that enforcement." I can definitely tell Azure, "Don't do that." But for a kid with an AMD chip in a Dell laptop running Linux, I have no mechanism of enforcement except to tell you to make the chip not do it. Would you accept that regulation?
I don't think there's a silver bullet. It's not, "I can make the chip not do it." It's "I can make the combination of the chip and the model and have some safeguards in place." And we're absolutely willing to be at that table to help that happen.
You would accept that kind of regulation, that the chip will be constrained?
Yes, I would accept an opportunity for us to look at what are the safeguards that we would need to put in place.
I think this is going to be one of the most complicated... I don't think we expect our chips to be limited in what they can do, and it feels like this is a question we have to ask and answer.
Let me say it again: it's not the chip by itself, because in general, chips have broad capability. It's the chips plus the software and the models, and particularly on the model side, what you do in terms of safeguards.
We could start lining up for questions. I've just got a couple more for you. You're in the PS5; you're in the Xbox. There's a view of the world that says cloud gaming is the future of all things. That might be great for you because you'll be in their data centers, too. But do you see that shift underway? Is that for real, or are we still doing console generations?
It's so interesting. Gaming is everywhere. Gaming is everywhere in every form factor. There's been this long conversation about: is this the end of console gaming? And I don't see it. I see PC gaming strong, I see console gaming strong, and I see cloud gaming also having legs. And they all need similar types of technology, but they obviously use it in different ways.
Audience Q&A
Nilay Patel: Please introduce yourself.
Alan Lee: Hi, Lisa. Alan Lee, Analog Devices. One and a half years after the Xilinx acquisition, how do you see adaptive computing playing out in AI?
Lisa Su: First of all, it's nice to see you, Alan. The Xilinx acquisition was an acquisition we completed about 18 months ago — a fantastic acquisition. It brought a lot of high-performance IP, including adaptive computing IP. And I do see that particularly on these AI engines, engines that are optimized for data flow architectures, that's one of the things that we were able to bring in as part of Xilinx. That's actually the IP that is now going into PCs.
And so we see significant IP usage there. And together, as we go forward, I have this belief that there's no one computer that is the right one. You actually need the right computing for the right applications. So whether it's CPUs or GPUs or FPGAs or adaptive SoCs, you need all of those. And that's the ecosystem that we're bringing together.
NP: This tall gentleman over here.
Casey Newton: Hi, Casey Newton from Platformer. I wanted to return to Nilay's question about regulation. Someday, it's sad to say, but somebody might try to acquire a bunch of your GPUs for the express purpose of doing harm — training a large language model for that purpose. And so I wonder what sort of regulations, if any, do you think government should place around who gets access to large numbers of GPUs and what size training runs they're allowed to do.
LS: That's a good question. I don't think we know the answer to that, particularly in terms of how to regulate. Our goal, again, is to follow all of the export control regulations that are out there, because GPUs are export controlled. There are the biggest GPUs and then the next level of GPUs. I think the key, again, as I said, is that it's a combination of both chip and model development. And we're active at those tables, talking about how to do those things. I think we want to ensure that we are very protective of the highest-performing GPUs. But also, it's an important market where lots of people want access.
Daniel Vestergaard: Hi, I'm Daniel from DR [Danmarks Radio]. To return to something you talked about earlier, because everyone here is thinking about implementing AI in their internal workflows, and it's just so interesting to hear your thoughts because you have access to the chips and deep machine learning knowledge. Can you specify a bit what you are using AI for internally in the chip-making process? Because this might point us in the right direction.
LS: Thanks for the question. I think every business is looking at how to implement AI. So for us, for example, there are the engineering functions and the non-engineering: sales, marketing, data analytics, lead generation. Those are all places where AI can be very useful. On the engineering side, we look at it in terms of how can we build chips faster. So they help us with design, they help us with test generation, they help us with manufacturing diagnostics.
Back to Nilay's question: do I trust it to build a chip with no humans involved? No, of course not. We have lots of engineers. I think copilot functions in particular are actually fairly easy to adopt. With pure generative AI, we need to check and make sure that it works. But it's a learning process. And the key, I would say, is there's lots of experimentation, and fast cycles of learning are important. So we actually have dedicated teams that are spending their time looking at how we bring AI into our company development processes as fast as possible.
Jay Peters: Hi, Jay Peters with The Verge. Apple seems to be making a much bigger push in how its devices, and particularly its M-series chips, are really good for AAA gaming. Are you worried about Apple on that front at all?
NP: They told me the iPhone 15 Pro is the world's best game console. And that's why it's "Pro." It's a very confusing situation.
LS: I don't know about that. I would say, look, as I said earlier, gaming is such an important application when you think about entertainment and what we're doing with it. I always think about all competition. But from my standpoint, it's not just the hardware; it's really how do we get the gaming ecosystem. People want to be able to take their games wherever and play with their friends and on different platforms. Those are options that we have with the gaming ecosystem today. We're going to continue to push the envelope on the highest-performing PC and console chips. And I think we're going to be pretty good.
NP: I have one more for you. If you listen to Decoder, you know I love asking people about decisions. Chip CEOs have to make the longest-range decisions of basically anybody I can think of. What's the longest-term bet you're making right now?
LS: We are definitely designing for the five-plus-year cycle. I talked to you today about MI300. We made some of those architectural decisions four or five years ago. And the thought process there was, "Hey, where's the world going? What kind of computing do you need?" Being very ambitious in our goals and what we were trying to do. So we're pretty excited about what we're building for the next five years.
NP: What's a bet you're making right now?
LS: We're betting on what the next big thing in AI is.
NP: Okay. Thank you, Lisa.
LS: Alright.
NP: I did my best.
Decoder with Nilay Patel: A podcast about big ideas and other problems.