Sustainable Computing in the AI Era: The Path to a More Energy Efficient Data Center
Data centers are the backbone of our IT infrastructure. With the boom in generative AI, such as large language models, and with data growing at an exponential rate, data centers now consume around 2% of global electricity, a figure that is reportedly increasing 12% annually.
It’s crucial for organizations to demand greater efficiency and sustainability from how their data centers are powered. Join this session to learn about Lenovo’s energy efficient solutions and how they support organizations along their journeys to becoming more sustainable.
Transcript
Cory Johnson:
We’re joined right now by Scott Tease. He’s with Lenovo as general manager of high performance computing and AI infrastructure, and we’re glad to have you here to talk about this issue of sustainability. So data centers, Scott, are the center of the world right now, and there are so many issues around building them. Sustainability doesn’t always seem to be the first thing we think of.
Scott Tease:
Yeah. This huge amount of AI that’s being installed out there in high performance computing is using a lot of power. The things that researchers are doing with that IT are pretty incredible, it’s game changing research, it’s game changing capabilities, but it is consuming a lot of power and it’s making the data center a really hot commodity right now, and I mean hot in a couple of different ways. One, it’s hard to get into. And it’s hot.
Cory Johnson:
Yeah, it is hot. I’ve seen some studies out there that say that for every watt of power a regular piece of data uses, once it’s interacting with AI models and the vectorization of large language models, suddenly you’re looking at 10X the power consumption for the same piece of data.
Scott Tease:
Yeah. I mean, we’re gathering so much data together to get the insight out of this data. It’s truly incredible. We couldn’t even contemplate doing some of the things we’re doing today even just five years ago. So while we sometimes give AI a hard time for the amount of power it’s consuming, the things we’re able to do now with it and the insight we’re able to gather are things we just weren’t even physically capable of doing a few years ago. It does happen at this point in time to be pretty power intensive. We’re brute forcing answers out of the AI. It’ll get more finesse to it and it’ll get smarter over time, but right now it’s a pretty heavy power lift to do these tasks.
Cory Johnson:
Yeah. I’ve been thinking a lot lately about just the actual physical size of the semiconductors we’re talking about, when Jensen Huang comes out and shows us on stage a chip that looks like a tortilla. You’re suddenly in a different world.
Scott Tease:
Yeah. It’s really amazing. What’s even more amazing is when I look at… I own two parts of the business for Lenovo. I own high performance computing and AI. The two are pretty similar, but what used to take us hundreds of server racks, I can now do in a single compute rack, thanks to technologies like what Nvidia is giving us to work with. So you’re packaging a great deal of technology in a very, very small amount of space. It’s using a lot of power and it’s generating a lot of heat, and to us, that’s the bigger problem for the data center: how do you package all this stuff together, keep it nice and tight and concise, and then deal with the heat that’s coming off of these systems? That’s the biggest challenge these days for the data center operator.
Cory Johnson:
Well, and I think that there’s a parallel with how we use AI. AI might get rid of some of our annoying tasks or make things go faster. It doesn’t mean we’re going to leave work at noon. It means we’re going to get more done and work just as many hours, if not more. And the data center is quite the same. If you’re using one tenth of a rack to get a task done, it means you’re going to fill the rest of the rack and get 10 times more tasks done.
Scott Tease:
Exactly. Yeah. It’s a never ending consumption of IT capability. This is not a question of, “I’ve got to run a certain amount of workload, what IT do I need for it?” It’s almost like, “What does my budget allow us to get?” And then I’ll figure out the maximum amount of research I can do with that, the maximum number of models I can create simultaneously, what have you. On your point about whether AI is going to allow us to take off at 12:00 every day: a lot of people are talking about whether AI is going to replace people, replace jobs. Our firm opinion is AI is not going to replace jobs.
In fact, it’ll likely create a lot more high-skill jobs than we have today. But one thing we’re also confident of is that a person that’s applying AI to their role might replace somebody that’s not applying AI to their role. So again, a doctor applying AI is more powerful than a doctor not applying AI. Same with a civil engineer, an architect, whatever. Again, the application of that technology is going to make us better at everything that we do. That’s what’s so great.
Cory Johnson:
Yeah. I’d like to believe, for obvious selfish reasons, that people who know how to ask the right questions will be more valuable in a world where we can get those answers. But let’s get into the weeds here a little bit about this issue of heat and cooling. It might be illustrative to talk about what a data center looked like 10 years ago in terms of cooling and maybe power consumption, and what it’s going to look like 10 years from now.
Scott Tease:
Yeah. Oh, man. Massive changes. So 10 years ago you were mostly installing CPUs, not graphics processors. And if you were really pushing the envelope, you could build a rack that might have consumed about 25 kilowatts, and that was doing something. Most of our enterprise users were running something like an 8 to 12 kilowatt per rack load on their systems back a decade ago.
Cory Johnson:
Before we get off that, what in the rack was using most of the power?
Scott Tease:
Most of the power was the CPU; probably 60% of the power was the CPU itself. But all the other components, the memory, the networking cards, all those kinds of things, work together to consume the rest. So back in the day, you were looking at an 8 to 12 kilowatt rack on average. Today, quite easily, you’re at 40 to 50 kilowatts in a normal environment. Some of these AI racks, later this year, are going to be approaching 100 kilowatts per rack. We used to measure data centers, and we still do, in terms of megawatts: how many megawatts is your data center? When you’ve got a 100 kilowatt rack, that megawatt does not go very far. It used to get you as many as 100 racks in a megawatt. Now we’re talking less than 10 racks.
Yeah, really amazing.
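The racks-per-megawatt math Scott walks through here is easy to check back of the envelope. A minimal sketch in Python, using the per-rack power levels quoted in the conversation; the rounding to whole racks is an illustrative simplification:

# Back-of-the-envelope: how many racks fit in a 1 MW facility power budget
# at the per-rack power levels mentioned in the conversation.

DATA_CENTER_BUDGET_KW = 1_000  # one megawatt

# roughly: enterprise rack a decade ago, HPC rack then, typical rack today, AI rack soon
rack_power_levels_kw = [10, 25, 50, 100]

for rack_kw in rack_power_levels_kw:
    racks = DATA_CENTER_BUDGET_KW // rack_kw  # whole racks only
    print(f"{rack_kw:>3} kW per rack -> about {racks} racks per megawatt")

# 10 kW racks -> about 100 racks per megawatt; 100 kW racks -> about 10,
# before any cooling overhead is even counted.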
Cory Johnson:
And so, how was the cooling of those racks done 10 years ago? I’m sure we’re going to get to liquid cooling in a minute, but what were we looking at 10 years ago for cooling of racks?
Scott Tease:
Yeah. 10 years ago it was nearly all air cooling. We at Lenovo were starting to do water cooling for our densest, most high performance users, our HPC clients. We were doing water cooling for them to try to unlock the most performance possible. But the vast majority, 99% of all that IT, was cooled by air: fans inside of servers moving air out of the server, and then air conditioning and air handlers dealing with that heat once it entered the data center room itself.
Cory Johnson:
All right. So let’s go into the future here. When we look 10 years forward, what do you think we’re looking at?
Scott Tease:
Yeah. So today, the drive towards liquid cooling has really been an amazing journey to watch happen, and it’s happening all over the world. One of the things people have realized is that when you’re talking about very high power devices that have to be kept at a very cool temperature, the amount of air you’ve got to move is a huge volume, and it’s actually pretty power intensive to move air. Fans take a lot of power, air handlers in the data center take a lot of power. So quite easily you could be seeing 35% to 40% of your power at the data center level not going to IT, but going to the air conditioning and the air handlers themselves. When you’ve got a 100 kilowatt rack, the thought of burning another 40 to 50 kilowatts just to do air conditioning and air movement… the economics are not going to work for that. Thus the push towards liquid cooling, which allows us to do that much, much more efficiently.
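A rough sketch of how that overhead squeezes a fixed facility budget. The 35% to 40% air-cooling share comes from the conversation; the 10% figure for direct liquid cooling is an illustrative assumption, not a quoted number:

# Minimal sketch: how much of a fixed facility power budget actually reaches IT
# when a large share goes to fans and air handlers. The 40% air-cooling overhead
# is from the conversation; the 10% liquid-cooling overhead is an assumption.

FACILITY_BUDGET_KW = 1_000  # 1 MW facility

def it_power_available(cooling_share: float) -> float:
    """Power (kW) left for servers after cooling takes its share of the budget."""
    return FACILITY_BUDGET_KW * (1.0 - cooling_share)

for label, share in [("air cooling, ~40% overhead", 0.40),
                     ("direct liquid cooling, assumed ~10% overhead", 0.10)]:
    it_kw = it_power_available(share)
    print(f"{label}: {it_kw:.0f} kW for IT, roughly {it_kw / 100:.0f} racks at 100 kW each")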
Cory Johnson:
But there’s an environmental impact there too if you’re using water, and we’ll get beyond that. But just on the use of the water: talk to me about how that is unfolding and how the technology’s evolving.
Scott Tease:
Yeah. So what we are doing at Lenovo is we’re actually bringing liquid directly to the components. We’re putting a manifold on the rack and we’re putting, basically, pipes through the systems themselves, and we’re bringing liquid right over the top of the CPU, the memory, the networking, the SSD drive, the NVMe device. We’re pulling that heat away from the device directly and putting it into the water loop. So our designs really don’t need any fans. All the heat is being transferred into that water loop itself. And instead of having a loud system with all these fans blowing air around, what we’ve got is a very small flow of liquid being pumped through the server. It’s very, very quiet and all that heat is being taken away. Our goal is to achieve as close to 100% transfer of heat into that liquid loop as possible, so the data center has no need for any kind of specialized air conditioning. It’s going to save on power costs.
Cory Johnson:
Can we try to describe one?
Scott Tease:
Yeah.
Cory Johnson:
In the absence of animations and graphics, let’s try to describe the loop. Describe it to me: what is the loop? Where does it go?
Scott Tease:
Think of it like this. We use pure water. So inside of that loop is water. If I ever have a spill, I can mop it up. We treat water the same all over the world. You may not be able to drink the water in every country, but if you spill it, you can mop it up and put it in the trash. So we recycle that loop over and over again, and what’s going to happen is we’re going to have a small device called a CDU, a coolant distribution unit. That distribution unit is going to pump the liquid through our servers, taking the heat away as it goes. It’s a small number of liters per minute per device, but it’s enough to pull all the heat away from that server. Once we get the heat into that water loop, then we’ve got to do something with it.
A lot of times the data centers just send it up to the roof, they send it through a dry cooler, they take about five to 10 degrees of heat out of it and they send it back through again. So we’re recycling that same loop of water for months and months and months without it having to be changed. Some of our more progressive users are looking at ways to take the heat that has been transferred into that water loop and make use of it. There’s actually a lot of stored energy in that water loop that we can unlock. And some of our really forward-thinking clients are trying to find ways to do that now: heating buildings, supplementing hot water, running physics reactions to create cold water out of hot water. So it’s pretty amazing stuff.
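The flow rates and reusable energy Scott describes follow from the basic heat-balance relation Q = m_dot x c x delta_T. A minimal sketch assuming a 100 kilowatt rack (a figure quoted earlier) and the 5 to 10 degree temperature drop he mentions at the dry cooler; the water properties are standard textbook values:

# Rough heat-balance sketch for a direct liquid cooling loop: Q = m_dot * c * dT.
# The ~100 kW rack figure and the 5-10 degree C drop at the dry cooler come from
# the conversation; water properties are standard values.

WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0   # c for liquid water
WATER_DENSITY_KG_PER_L = 1.0              # about 1 kg per liter

def required_flow_lpm(heat_kw: float, delta_t_c: float) -> float:
    """Liters per minute of water needed to carry heat_kw away at a given delta-T."""
    mass_flow_kg_s = (heat_kw * 1000.0) / (WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_c)
    return mass_flow_kg_s / WATER_DENSITY_KG_PER_L * 60.0

for delta_t in (5.0, 10.0):
    print(f"100 kW rack, {delta_t:.0f} C rise: ~{required_flow_lpm(100, delta_t):.0f} L/min of water")

# Roughly 287 L/min at a 5 C rise, or 143 L/min at a 10 C rise, for the whole rack.
# That loop is also carrying ~100 kW of thermal power available for reuse.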
Cory Johnson:
It’s super interesting. Preparing for an interview, I’ll tell you about my process… I used AI, I went to ChatGPT or Claude or something, and asked it to find metaphors for the use of liquid cooling in a data center. And finding that metaphor with the help of AI, interestingly, probably burned more power than this conversation will. But the metaphor was: you’ve got a stove and you’ve got this really hot flame in the stove, so you put a pot of water on it, and if it gets so hot that the water starts to boil, the steam hits the lid on the top, which takes the heat out of the water, which allows the water to cool, but you’re twice removed from that hot flame on the stove.
Scott Tease:
Yeah, that’s interesting. I’ve not really thought about it like that. But again, if you look at how well water or liquid transfers heat versus how air does it, water’s 5,000 times better at transferring heat. To move a small amount of heat with air, you need a pretty big volume of air to get that heat away from the device. With liquid, very, very small amounts of liquid and very small amounts of movement allow us to get that heat away from the parts. And that is the goal. All the vendors, Intel, AMD, Nvidia, what they want us to do is package these devices into the server and build them really densely so they don’t take up a lot of room.
But one of the key things is we’ve got to get the heat away from the part before it overheats, and every watt of energy that that part consumes is going to end up in that server in the form of a watt of heat. It’s the law of conservation of energy: that watt of electricity gets converted to a watt of heat. I’ve got to move the heat away from the part before it overheats and causes thermal damage. And liquid is just beautiful in its ability to pull that heat away quickly and efficiently. And we’re using that.
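One way to see why liquid wins is to compare the volume of air versus water needed to absorb the same heat at the same temperature rise. A minimal sketch: the property values are standard approximations, and the 1 kW load and 10 degree rise are illustrative assumptions, not quoted figures:

# Sketch of why liquid carries heat so much better than air: compare the volume
# of each fluid needed to move the same heat at the same temperature rise,
# using Q = rho * V * c * dT. Property values are standard approximations; the
# 1 kW load and 10 C rise are illustrative assumptions.

AIR   = {"density_kg_m3": 1.2,    "specific_heat_j_kg_k": 1005.0}
WATER = {"density_kg_m3": 1000.0, "specific_heat_j_kg_k": 4186.0}

def volume_needed_m3_per_s(fluid: dict, heat_w: float, delta_t_c: float) -> float:
    """Cubic meters of fluid per second required to absorb heat_w at delta_t_c."""
    volumetric_heat_capacity = fluid["density_kg_m3"] * fluid["specific_heat_j_kg_k"]
    return heat_w / (volumetric_heat_capacity * delta_t_c)

heat_w, delta_t = 1000.0, 10.0  # a 1 kW component, 10 C allowable rise
air_flow = volume_needed_m3_per_s(AIR, heat_w, delta_t)
water_flow = volume_needed_m3_per_s(WATER, heat_w, delta_t)

print(f"air:   ~{air_flow * 1000:.1f} liters per second")     # roughly 83 L/s of air
print(f"water: ~{water_flow * 1000:.3f} liters per second")   # roughly 0.024 L/s of water
print(f"ratio: ~{air_flow / water_flow:,.0f}x more air by volume")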
Cory Johnson:
And it’s a complex problem because you don’t know where the heat’s going to happen. It’s not that the whole semiconductor gets hot. There are little hotspots within the semiconductor as it’s doing different types of processing, and you don’t know where they’re going to be, but that dissipation of the heat is so very important.
Scott Tease:
Yeah. So it’s interesting. In a server we can predict what the high power parts are. It’s the CPU, the GPU, the memory, the networking adapter, things like that. Those are pretty easy for us to predict. What’s complicating matters is that in addition to the power going up on the device, the devices are getting smaller than they ever were before, going from 14 nanometer to 10 nanometer, seven down to four. So the parts are getting smaller and the power is going up, which means we’re going to have to dissipate even more heat in a smaller space than ever before. And if you’re doing air cooling, that means a lot more air movement to get rid of that heat. Whereas with water, I just turn up the flow a tiny little bit and I’m able to take care of that heat. So the problem is getting worse as power goes up, so you just let-
Cory Johnson:
There’s just a little bit of time left. I wonder if we can also talk about liquid nitrogen, what we might see in the future beyond water.
Scott Tease:
Oh, man. I hope we don’t-
Cory Johnson:
… the next frontier.
Scott Tease:
… see liquid nitrogen. Actually, I hope I’m retired by the time we see liquid nitrogen. So I hope. There are a lot of different technologies that we’re looking at that could take us beyond what we can do today with liquid. But liquid just as it is, single-phase liquid like water or something like that, has got a lot of longevity, and with today’s current technologies it’s going to take us a pretty far distance into the future. As you go past that, we might be looking at things like two-phase liquids that, once they get in contact with the heat, change from a liquid to a gas. And that transformation from liquid to gas actually carries the heat away really, really efficiently. It’s a little bit harder to manage that transition from the liquid to the gas, but it’s incredibly efficient at heat removal. So we may be looking at that stuff in the future.
Cory Johnson:
So you think we’re still in the world of water for a very long time?
Scott Tease:
I think there are a lot of customers that are going to try their best to stay in an air-cooled environment, and we do all we can to optimize air cooling in those systems. More and more customers today are looking at moving to liquid for the very first time. We like to remind them that we’ve been doing that for over 10 years. We installed our first liquid-cooled supercomputer in 2012 at LRZ in Munich, Germany. It was 9,700 servers back in the day, and it was the first warm water liquid-cooled supercomputer. We’ve been doing it ever since. So as customers move to water, we like to remind them we’ve been doing it a very long time and have a very good handle on what it takes to do it right.
Cory Johnson:
Fascinating stuff. All right. Scott Tease is the Vice President and GM of High Performance Computing and AI Infrastructure Solutions at Lenovo. Thank you for your time.
Scott Tease:
Hey, thanks. Great being with you today. Great conversation.