How Great Storage Enables AI Performance and Efficiency at Scale

Solidigm co-CEO Dave Dixon and SVP, Head of Marketing and Products Greg Matson discuss how the AI boom is creating massive global scalability challenges, the role of efficient, high-capacity NAND storage in solving them, and how the company’s unique capabilities are accelerating AI development and deployment.

Six Five hosts: Patrick Moorhead, Daniel Newman

Transcript

Patrick Moorhead:
Welcome back to the Six Five Summit 2024. It’s day one, and we are talking all things AI. Regardless of the track, AI is on everybody’s mind. And this year is about two things. It’s about enterprises capturing the AI value, but also the extended build out of AI in the data center, and the data center edge. Dan, we’re talking AI again. Isn’t that great?

Daniel Newman:
Yeah. Well, you’ll remember when we were coming up with a theme for this year, it was all about really being able to help companies bring value, right?

Patrick Moorhead:
Yes.

Daniel Newman:
There was so much talk about this CapEx, this big capture. All the dollars were going towards GPUs, GPUs, GPUs. Behind this, there needed to be so much more, Pat. And that’s what this event, AI Unleashed, here at the Six Five Summit is really all about.

Patrick Moorhead:
No, that’s right. And Dan, I know I use this a lot, but I talk about the quadrangle of compute, which says you really have to have a balanced system between compute, memory, storage, and networking. And if any one of those gets out of whack, you can’t bring the full value that the system, that the quadrangle, brings.

And data center storage is tremendously valuable, with the amount of data that comes in for ingest. You’re training on it, you’re performing RAG on it, you’re sending results back, you’re storing it for long-term storage. It becomes very, very important. We happen to have with us two folks from Solidigm, which I hear is the market share leader in data center SSDs. Great to see you, both of you.

Daniel Newman:
Hey, morning guys. Yeah, thanks for coming in.

Patrick Moorhead:
Yeah, so Dave and Greg, great to have you on the show. Maybe we can hit first off, tell us what Solidigm does? I talked about it a little bit, but let’s hear a little bit more.

David Dixon:
Yeah, well, a little bit of background on Solidigm. We were acquired by SK hynix, back in 2020. We were originally the NAND products group at Intel, and so we came over about that timeframe. We closed the deal a little bit more than two years ago now. One thing that does make us unique, kind of like what you were saying, is that, as opposed to everybody else in the NAND market, we are laser focused on data center storage. We see that as the biggest market in all of NAND, the fastest growing market in all of NAND, and it’s going to keep going, so it’s really our focus for that reason.

Daniel Newman:
Well, David, let’s talk about something that’s probably near and dear to the hearts of just about everybody that’s involved in AI, and that’s scalability. And this can come through a couple of different lenses. Of course, one is you have companies, our data is showing, massively involved in POCs right now. So there’s the idea-to-implementation scale. And then you’ve got scale-related things, like infrastructure and power. And that’s another thing that’s going on. Talk about the challenges, though, that you’re seeing here at Solidigm, as it relates to scaling AI.

David Dixon:
Well, yeah, so obviously, the scalability of continuing the AI boom, and AI development through the rest of this decade, is going to be a huge challenge. Now, at Solidigm, we’re really looking at that challenge as more of an opportunity, for us specifically, which is kind of cool. But whether we’re talking about the power limitations that you talked about, the performance limitations, or the size of the data sets that are going to be used in training, all of those are really seen as scale challenges for being able to continue this AI development whirlwind that we’ve all been on for the past two years or so.

Patrick Moorhead:
Let’s talk a little bit about the role that storage plays in this. It’s just storage, right?

David Dixon:
Yeah. Well, we started with these huge scalability challenges. We’re talking about, okay, you’re going to build 50 nuclear power plants in the next couple of years to fix the problem? It’s not going to be a single answer that’s really going to do this, right? This has to be a multi-faceted solution. And transitioning from HDDs to data center SSDs is going to be a key part of that solution.

So as an example, if we look at power, the latest projections are that, I think by the end of 2030, data centers are going to take up something like 20% of the overall global power grid.

Patrick Moorhead:
It’s like off the charts.

David Dixon:
Off the charts, right?

Patrick Moorhead:
Yes.

David Dixon:
And you could solve that by building a bunch of nuclear power plants, but we’re probably not going to be able to get that done. And that data’s pretty well known. But something that’s not as well known, you talked about, why is storage important?

Patrick Moorhead:
Right.

David Dixon:
There are more and more papers being written showing that about 30% of that data center power is taken up by the storage. So it’s not just these big heavy-

Patrick Moorhead:
And these are hard drives, to be clear?

David Dixon:
Yeah.

Patrick Moorhead:
Okay.

David Dixon:
Yeah, they’re hard drives today. They’re about 90%, right? 90% HDDs in the data centers today. But it’s not just these big power-hungry GPUs. People don’t realize that they’re being fed by a massive amount of storage to keep them going. And so, if we can make a dent in that huge storage bucket, we’re really talking about a major impact to this power and scalability challenge.

Patrick Moorhead:
Yeah, I think we’ve seen the growth of the storage, and pick your adventure. We can talk about parameters, we can talk about data points. I think the latest out of ChatGPT is 1.6 trillion data points. I’m sure there’s a way to convert that into parameters. But it’s funny, in the green room, in the run-up to this interview, you showed me a piece of data (and Dan and I get hit with power stuff day in and day out) that there are actual grids out there in the United States that have either no power left, or some absurdly small percentage left for data centers. Can you talk a little bit about that?

Greg Matson:
Yeah, yeah, absolutely. There’s an article almost every other day in the big publications about it. But take Northern Virginia for example, which is a huge hub for data centers.

Patrick Moorhead:
Huge. Yes.

Greg Matson:
They have like 0.2% headroom left in their grid. And the data center build out is so strong and so fast that now it’s bumped up against that limit. So now the grid is the limiter to the build out of those data centers. And right now they’re primarily being built out for these large AI deployments.

Patrick Moorhead:
And just so everybody understands, that’s just not the data centers competing with that. That’s also potentially power required for EVs in the future.

Greg Matson:
Power required for EVs.

Patrick Moorhead:
For homes, for new buildings that are built here, and…

Daniel Newman:
For charging your iPhone.

Patrick Moorhead:
Yeah, a little bit of that.

Daniel Newman:
Got to be a little bit of [inaudible 00:07:24].

Patrick Moorhead:
A little bit of that, yeah. But yeah, so a lot of choices there. And then you look at how long it takes to spin up a new power source, whether it’s solar, nuclear, coal, gas, something else. Typically, those are five-year projects.

David Dixon:
Exactly.

Patrick Moorhead:
So something has to give here.

David Dixon:
That’s right.

Patrick Moorhead:
And you get into some pretty interesting spreadsheet exercises on what they do, but it’s clear that moving from hard drives to low power, high performance SSDs, in those areas, it’s almost a no-brainer.

David Dixon:
Yeah, but it’s also the high density aspect of it too. That’s the main driver. If you…

Patrick Moorhead:
How big?

David Dixon:
Well, so we’ll maybe talk about it in just a sec, but just to close on this power angle. If you really go look at the data on rack level replacements, et cetera, for a 60 petabyte AI storage rack, you can get an 80% power reduction by converting from HDDs to high density, high performance QLC SSDs.
So now you’re talking about 20% of the overall grid being taken by data centers, 30% of that data center power going to storage, and now you can reduce that number by 80%. We’re talking about really big, impactful numbers now.
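The back-of-the-envelope math behind that claim can be sketched as follows. The figures are the ones quoted in the conversation (20% of the grid, 30% of data center power in storage, 80% reduction), not independent measurements:

```python
# Back-of-the-envelope grid impact of converting HDD storage to QLC SSDs.
# All three inputs are the figures quoted in the discussion above.
grid_share_datacenters = 0.20      # projected data center share of the global grid
storage_share_of_dc_power = 0.30   # share of data center power drawn by storage
ssd_power_reduction = 0.80         # quoted HDD -> high-density QLC SSD reduction

# Storage's slice of the whole grid, and the slice a full conversion could free up:
storage_grid_share = grid_share_datacenters * storage_share_of_dc_power
potential_grid_savings = storage_grid_share * ssd_power_reduction

print(f"Storage share of grid: {storage_grid_share:.1%}")    # 6.0%
print(f"Potential savings:     {potential_grid_savings:.1%}")  # 4.8%
```

On those quoted numbers, storage accounts for roughly 6% of the projected grid load, and an 80% reduction in that slice frees up nearly 5% of the entire grid, which is why the speakers call the impact grid-scale rather than incremental.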

Patrick Moorhead:
So there could potentially be big advantages from not just those new AI servers, but also potentially looking at your entire fleet of servers out there.

David Dixon:
Yeah, exactly. And that’s just power. I think the other key driver of the scalability is going to be the performance requirements. Because what we’re finding, what we’re hearing from customers, right Greg, is that a lot of the GPUs they’ve all been building out are really starting to get underutilized, because the under-appreciated storage feeding data into the GPUs is effectively starving them right now. So there’s an underutilization problem that’s happening, and that’s really slowing down the benefits we’re getting out of AI.

Patrick Moorhead:
Great point.

Greg Matson:
The GPUs, as well as the data scientists who use the GPUs, who as everyone knows are the hottest commodity employees out there. And so it’s almost too expensive not to convert from hard drives to SSDs. Even though at a CapEx level SSDs might look a little bit higher than HDDs, when you put the total solution together, all the biggest new AI data centers are designing in these high capacity SSDs.

Daniel Newman:
You’ll hear me talking to no end about the math, the economics. And we have some real world problems we’re going to try to solve, and of course, we will figure those out first. Meaning, we’ve spent a lot of time here talking about power and availability, and the first thing we’ll solve, at any cost, is going to be enough power to make sure that we can continue this build out.

You’ve seen what the market for GPUs has grown to. We’ve heard numbers as big as $400 billion on the TAM. It’s huge. And of course, every company, I always talk about the deflationary aspects of tech, right now as companies need to figure out how to keep hitting their numbers, they need to keep doing it more and more efficiently.

AI is an enabler of that. But in the end, the economics come down to things like what you’re doing with SSD. It comes down to saying, “How do we get down from four racks to two racks? How do we optimize power in those racks? How do we maximize utilization of GPUs? Or more efficiency from that same GPU?”

So this question, this is the moneymaker. Dave, I’ll throw it your way. Oftentimes this particular category can be treated a bit as a commodity. What you’ve said today doesn’t sound anything like a commodity, but what’s your winning formula for being able to grab the AI market right now? What’s the Solidigm winning formula competitively, performance wise, and of course, spend a little time on economics?

David Dixon:
Well, I’ve definitely seen a transition. I think Greg and I have both been in this industry, going back in flash for 30 years. And there have been times where it comes and goes with a commodity look and feel to the market, of course. And we’re coming out of a pretty heavy downturn over the last couple of years.

But we’re seeing an excitement now that we’ve never seen before. And this is really just over the past couple of quarters, because people are now really appreciating the importance of the bottlenecks that are created by data center storage. And as that data is transitioning from being cold and just sitting there not being used, not being read, to now being more and more data getting generated, more and more of it wanting to be read and utilized all the time, we’re getting tremendous pull from our customers already.

So we’ve really got the products in the right place at the right time right now, with our high-performance, high-density QLC. And we talked about the 80% power reduction. Just to finish that thought on performance, one way we look at it is this GPU utilization factor.

Patrick Moorhead:
Right.

David Dixon:
It takes one of our D5-P5336s to keep five GPUs fully utilized at more than 90%. The inverse: it takes, I think, eight HDDs to keep one H100 fully utilized. So we’re talking about 40 to one factors. These are not incremental improvements we’re talking about, in power and performance. At the data center level, these are monumental changes that really are going to be the key driver.
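The "40 to one" factor follows directly from the two quoted ratios. A minimal sketch, using the speaker's numbers (one SSD per five GPUs, eight HDDs per GPU), which are quoted claims rather than benchmarked figures:

```python
# Drive-count ratio behind the "40 to one" claim.
# Both inputs are the figures quoted in the conversation; real ratios vary by workload.
gpus_per_qlc_ssd = 5   # one high-density QLC SSD keeps ~5 GPUs >90% utilized
hdds_per_gpu = 8       # quoted HDD count needed to keep a single H100 fed

ssds_per_gpu = 1 / gpus_per_qlc_ssd              # 0.2 SSDs per GPU
drive_count_ratio = hdds_per_gpu / ssds_per_gpu  # HDDs needed vs. SSDs needed

print(f"Drive-count ratio (HDD to SSD): {drive_count_ratio:.0f} to 1")
```

Eight HDDs per GPU against one-fifth of an SSD per GPU works out to forty times as many hard drives as SSDs for the same GPU utilization, which is the factor cited above.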

Daniel Newman:
Well, it’s very interesting. Greg will appreciate this, I’ll play marketer for a moment, but I’ll throw one more thing.

Greg Matson:
Do you have to?

Daniel Newman:
Yeah, no, I do. I can’t help myself. So focus. A lot of times, and one of the things that in a recent briefing I attended from Solidigm, I thought was very interesting, is pretty much everyone that you compete with has a broad focus across consumer data center. You’ve chosen to completely focus on this problem. And in the AI era, and the problem related to data, and enterprise data, and data centers, it’s unique to have a company that’s saying, this is the one thing and we’re all in. How big is that to your ability to execute on AI?

Greg Matson:
I think it’s huge. We have, like you said, all of our engineers working on, with our customers, deeply with our customers, understanding their workloads, understanding what they see in the future. And it allows us to tune our drives to meet their workloads specifically.

Daniel Newman:
Right.

Greg Matson:
But it’s also enabled us to focus on this next wave of what I would call data center evolution, which is not hard drives going away ever, but some of them giving way to flash. And so we’ve been investing in the highest capacity flash drives. So we have 30 and 60 terabyte SSDs. We launched them last year. We’re way ahead of our competition, from that perspective. And we’ve been investing in QLC technology, to make those drives very affordable, compared to existing flash technologies. We’re actually in our fourth generation of QLC as we speak.

David Dixon:
Yeah, probably just to close on that thought, when we talk about high performance, high density, the key driver is this QLC, the multi-level cell technology. This is where we’re storing more logical bits in every single physical flash cell. When we get to QLC, we have 16 levels, 16 different states that we’re storing in the physical flash cell. That’s the only way to really bring this value to the customers. And that has been our focus. That’s why we’re on our fourth gen right now in the data center, and the competition is really still just trying to commercialize their first generation.
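The relationship between bits per cell and the 16 states mentioned here is simply states = 2^bits. A quick sketch of the standard NAND cell types (the naming is industry-standard, not specific to any one vendor):

```python
# NAND cell types: storing b logical bits per cell requires the cell to hold
# 2**b distinguishable voltage states. QLC (4 bits) is the "16 levels" above.
cell_types = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}  # bits per physical cell
states = {name: 2 ** bits for name, bits in cell_types.items()}

for name, n in states.items():
    print(f"{name}: {cell_types[name]} bits/cell -> {n} states")
```

Each added bit per cell doubles the number of voltage levels the cell must reliably distinguish, which is why each QLC generation is harder to engineer but also why it delivers the density and cost advantage discussed above.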

Daniel Newman:
Well, Greg and David, I want to thank you so much for being part of this year’s Six Five Summit. I hope to have you back. And we’ll be watching closely. We’ll be watching Solidigm, and we’ll be watching this AI evolution and revolution. So stay tuned and join us again, if you wouldn’t mind.

Greg Matson:
Well, thanks for having us. We appreciate it.

Daniel Newman:
Absolutely.

David Dixon:
Yeah, thanks, guys.

Greg Matson:
It’s been fun.

Daniel Newman:
All right, everyone, we are here. It is Six Five Summit 2024, day one. We are in the cloud and infrastructure track. You just heard from Solidigm. But stay with us for all of our content here. This year’s Six Five Summit is bigger than ever. See you soon.
