SSD Innovation: Store More Data in Less Space to Improve Storage Sustainability
Solidigm co-CEO Dave Dixon discusses the positive impacts of hyper-dense storage in designing more sustainable data centers, and how that density helps offset the extreme environmental challenges inherent to ever-increasing AI compute space and energy needs.
Transcript
Keith:
Welcome back to the Six Five Summit. We’re here at the beautiful campuses of Solidigm. I’m joined by Greg and Dave again from Solidigm. Guys, welcome to the show. All right, we’re talking sustainability in this track, and we’ve had both Greg and Dave on earlier talking about AI and the role of storage in AI. How is Solidigm helping to ease my sustainability concerns and issues?
Dave:
All right. Well, hey, Keith, good to see you, number one. So similar to the scalability discussion we had earlier, data center storage is really going to be impacted by power, performance, and the infrastructure build-out. And one of the key things data center storage is going to enable there is QLC: high-performance, high-density SSDs. I’ve been thinking about scalability and sustainability as two sides of the same coin. We can get power improvements and performance improvements, and they can either be applied to scalability, meaning adding more GPU compute, more utilization, higher performance, better training, or you can apply those savings directly to addressing some of these environmental challenges around sustainability. So I think we’ll get into the details, but I really think of it as two sides of the same coin.
Keith:
So explain to me this concept and how you’re helping with the sustainability imperative and how it relates to our overall infrastructure challenges, whether it’s the enterprise or cloud service providers.
Dave:
Yeah, well, obviously there are multiple vectors to the imperative. It could be coming from regulatory agencies like the EU, or even from the industry with OCP, the Open Compute Project. There are going to be more carbon-free, carbon-neutral requirements coming into the data center. So obviously there will be regulatory imperatives, but it’s also going to be driven by the fundamental power grid. And we know, I think we talked about this earlier, but if you believe the projections, the data center is going to take up 20% of global power within five years, by the end of the decade.
And it’s already starting down that trend now, and it’s not sustainable. I don’t think the idea of sustainability is building 50 new coal plants or 50 new nuclear power plants. It just isn’t going to happen in that time. So we have to address the environmental challenges head on, and we have to help with that. I don’t think a single item is going to be the solution for all of this. It’s going to be multifaceted. And data center storage, with the power savings, performance improvements, density, and rack space that we’ll talk about, is going to be a big part of the solution.
Keith:
So I’m still trying to wrap my head around this argument that density helps with reducing overall carbon footprint, especially in a world of AI and growing data. I’m looking at requirements of 3X the average wattage going into a rack. Help me understand again how storage is helping ease my carbon footprint.
Greg:
Well, there are a couple of different ways. One of them is actual footprint: compared to the most dense hard drive deployments, we can reduce the rack space, the square footage taken up by storage, by 4 to 5X. And right now, one of the biggest contributors to carbon is actually concrete, so just densifying the storage is in itself more sustainable. There’s also a power advantage: 90% of a data center’s greenhouse gas emissions occur during its runtime, its active use. So anything you can do that’s lower power, whether it’s moving from hard drives to SSDs or anything else, helps reduce that 90% to a lower number.
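A rough sketch of the footprint arithmetic Greg is describing; only the 4 to 5X density advantage comes from the conversation, while the rack count and floor-space figures below are hypothetical, purely for illustration:

```python
# Illustrative footprint math: a 4-5X denser storage layout needs proportionally
# fewer racks, and therefore less concrete-heavy building floor space.
hdd_storage_racks = 100      # hypothetical HDD-based storage footprint (racks)
density_improvement = 4.5    # midpoint of the 4-5X density advantage cited
sq_ft_per_rack = 30          # hypothetical floor space per rack, incl. aisle share

ssd_storage_racks = hdd_storage_racks / density_improvement
floor_space_saved = (hdd_storage_racks - ssd_storage_racks) * sq_ft_per_rack

print(f"Racks needed with dense SSDs: {ssd_storage_racks:.0f}")      # ~22 racks
print(f"Floor space avoided: {floor_space_saved:.0f} sq ft")         # ~2,300 sq ft
```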
Keith:
So let’s put some meat behind this. Nvidia’s latest GPU, 1,200 watts of power. As I think about the GPU and needing to feed data to that, I can understand how efficiency can play a role, but 1,200 watts is 1,200 watts. What are you doing to help lighten the burden of that 1,200 watts?
Dave:
If you go look, there’s a lot more data coming out now, papers coming out of universities, showing that, yeah, the GPUs are burning that much power, but the storage component within the data center is taking about one third of data center power. And that part’s not as well known. You think of it as 1,200 watts versus 40 watts or something like that, but when you really look at the complete rack level and the amount of storage that you have to attach to a GPU, you’re talking about racks and racks and racks of storage. And that’s where Greg was going with the rack reduction. So if we can reduce the footprint by three quarters, we can get down from four racks of storage to one rack for a 60 petabyte AI storage solution, for example. But the power consumption is even more impressive.
You’re going to be able to transition from HDDs to QLC SSDs and get about an 80% total power reduction. So now we’re talking about 20% of the global grid going into the data center, one third of that going into data center storage, and we can have an 80% impact on reducing that power consumption. Now we’re talking big numbers.
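Taking Dave’s figures at face value, the back-of-the-envelope math looks roughly like this; a sketch only, since the inputs are the projections and claims cited in the conversation, not measured data:

```python
# Back-of-the-envelope sketch using the figures cited in the conversation.
datacenter_share_of_grid = 0.20    # projected data center share of global power by ~2030
storage_share_of_dc_power = 1 / 3  # storage's share of data center power, per the papers cited
qlc_vs_hdd_power_reduction = 0.80  # claimed power reduction moving from HDDs to QLC SSDs

storage_share_of_grid = datacenter_share_of_grid * storage_share_of_dc_power
potential_grid_savings = storage_share_of_grid * qlc_vs_hdd_power_reduction

print(f"Storage share of global power: {storage_share_of_grid:.1%}")     # ~6.7%
print(f"Potential savings from HDD->QLC: {potential_grid_savings:.1%}")  # ~5.3% of global power
```

Under those assumptions, the storage conversion alone would be worth roughly 5% of projected global power, which is why Dave calls them big numbers.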
Keith:
So, like every other data center manager in the world, I hear the drive for efficiency, efficiency, efficiency, but I’m not getting a break on my performance requirements. I’m asked to deliver better performance year after year after year, and AI has accelerated that. How are you folks helping me meet this insatiable demand for performance while still meeting my sustainability goals?
Greg:
Well, SSDs are inherently much more performant than hard disk drives, so we can actually replace between five and 20 hard drives with one SSD. In fact, one SSD can keep eight NVIDIA GPUs 90% utilized, where on the flip side it takes five HDDs to keep one Nvidia GPU fully utilized. So that’s a 40-to-one ratio, all driven by performance.
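Greg’s 40-to-one ratio follows from simple arithmetic on the utilization figures he quotes; a minimal sketch, assuming those figures hold:

```python
# Illustrative arithmetic behind the 40:1 claim, using the figures Greg quotes.
gpus = 8
ssds_to_feed_8_gpus = 1   # one SSD keeps eight GPUs ~90% utilized
hdds_per_gpu = 5          # five HDDs to keep one GPU fully utilized

hdds_to_feed_8_gpus = gpus * hdds_per_gpu               # 40 HDDs
ratio = hdds_to_feed_8_gpus / ssds_to_feed_8_gpus

print(f"HDDs replaced per SSD at equal GPU utilization: {ratio:.0f}:1")  # 40:1
```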
Keith:
If I’m not an engineer, a data center engineer, help me understand the scale. When we say a 1,200 watt GPU, how can I relate that to a more common household utility, a neighborhood, et cetera? How much data are we talking about? How much power are we talking about?
Dave:
Yeah, I mean, we definitely have done some calculations. Again, you take the 20% of the global grid, 30% storage, 80% reduction, and you can calculate how many houses you could basically power for years in that situation. And it’s basically small-sized cities with the savings that you can sustainably generate by making that conversion. But you also hit upon something interesting: it’s not just about saving. We actually want to keep the performance going with AI. We want to keep AI going because AI itself is probably going to be needed to solve these environmental and sustainability problems.
Keith:
Right? It gets circular.
Dave:
Right. Because AI is going to touch power generation, power consumption, everything we do throughout the entire environmental network, and the idea is to make everything much more efficient. So we can help with the concrete savings and the power savings, but also, by being the enabler for AI, we help drive efficiency throughout the end-to-end system as well.
Keith:
So I understand the theory. If I get more data into higher-performance systems, then I gain efficiency. That efficiency can be spent either on shrinking my footprint, thus driving sustainability, or on additional performance without sacrificing my sustainability goals. I get the theory. Help me understand where you’re seeing this put into practice. Where are some examples? Whether they’re direct customers, I’d love to hear that, or just anecdotally, customers who have implemented these technologies and experienced these types of wins.
Greg:
Well, we have a lot of customers that I can’t mention by name, but all the biggest AI deployments are actually adopting our high-capacity QLC drives today. You can read about them in the papers, and they talk about the GPU wins, but they don’t talk about the SSD wins, and we’re in nearly all of them. We do have one customer where it’s on their website, so it’s easy to go see. They have actually been able to reduce the power of storage in their customers’ environments by over 90%. That customer is called Ocient, and a big part of that power savings comes from moving from hard drive or hybrid-based storage to our high-capacity QLC SSDs. So that’s a 90% reduction. We can’t claim all 90% of it, but a very significant portion.
Keith:
All right, so we’ve thrown around this term QLC a couple of times. One, what’s the acronym, and why should I care about QLC, TLC, MLC, all the Cs?
Greg:
That’s a good point. Yeah, I like that.
Dave:
Yeah, it is super important, because we talk about high-performance, high-density SSDs, and QLC is actually the enabler that makes that happen. So QLC stands for quad-level cell, number one, and it’s basically where we store more logical bits in the same physical flash cell. So we’re increasing the density without adding more silicon. In a way, it’s kind of a sustainability play in and of itself. And it goes way back; it’s really our secret sauce. At Solidigm we developed our first multi-level cell product back in the mid ’90s, where you can basically keep increasing the storage size of the flash either by throwing CapEx at it and scaling the silicon, or you can get more creative and say, I want to store more data in every physical cell. And it’s very difficult to do, because quad-level cell means you’re storing four bits of data in a physical cell, and to store four bits takes two to the power of four.
Now you’re talking about 16 different states that you have to read individually in a physical flash cell, and each one of those has to be delineated by the number of what we call electrons on a floating gate. We are manipulating tens of electrons at that level to be able to manage quad-level cell. And it takes a full team effort between NAND development, silicon development, and system development, working with the firmware team, to actually make this happen, because quality and reliability cannot be compromised. You still have to have 100% reliable drives going into the data center because of the applications that we’re going into. So our job is to make QLC work. We’re on our fourth generation of QLC in the data center. Our competition is still working on commercializing basically the first generation.
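For anyone tracking the cell-level math: the number of charge states a flash cell must distinguish grows as two to the number of bits stored, which is why each additional bit per cell is so much harder to manage. A minimal illustration:

```python
# Bits stored per cell -> distinct charge states the controller must resolve.
cell_types = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}

for name, bits_per_cell in cell_types.items():
    states = 2 ** bits_per_cell  # 2^n voltage levels per physical cell
    print(f"{name}: {bits_per_cell} bit(s)/cell -> {states} states")
# QLC: 4 bits/cell -> 16 states, each separated by only tens of electrons
```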
Keith:
There’s been this MLC, TLC, QLC debate, and QLC has always been looked upon as the poor stepbrother: we don’t use QLC in the data center. What’s changed? Reliability has been a huge question mark around QLC in the past. So I’m not sacrificing reliability for sustainability?
Dave:
Right. Well, again, the trick, the secret sauce, is how you get the advantages without the disadvantages. And the way you do that is by managing it at the product level, managing it at the SSD, managing it with firmware. And you do that by understanding the underlying reliability mechanisms, understanding how the NAND is going to behave at the flash cell level, and ensuring that you’re going to have reliable QLC at the product level in the SSD. We go through a ton of design and validation and partnership between these teams to make that happen, and it is what makes us successful.
Keith:
I’ve been around this for a long time, and we’ve always worried about some type of solar flare, some type of cosmic ray hitting a solid state drive and corrupting data without us knowing. It is one of the worst things that can happen in enterprise IT, to lose data and not understand how you lost it. So kudos on the testing. Guys, I really appreciate you taking the time. I know I’m not the friendliest interviewer and I asked some kind of tough questions, but you did a great job. I think I really do understand the connection between sustainability, efficiency, and performance when it comes to storage, and we ended up going deeper into AI than I expected. Kudos.
If you want to learn more about Solidigm, you can follow the information in this session below. I highly encourage you to watch some of the other Six Five Summit sessions with this team; really intriguing conversations around performance and AI that go deeper into the technology. Stay tuned for more great coverage from the Six Five Summit 2024.