Dell AI Data Management Services
Beth Williams, Global Portfolio Lead at Dell Technologies, shares her insights on overcoming AI data management hurdles with Dell's advanced solutions and services. This conversation explores the essential techniques and technologies vital for AI development and deployment success.
There is no AI without data. Host David Nicholson is joined by Beth Williams, Dell Technologies' Global Portfolio Lead for AI, Apps & Data Services, on this episode of Six Five On The Road at SC24. Beth and David discuss the importance of data in virtually any AI use case and how Dell is addressing the challenges of AI Data Management.
Tune in for more on ⤵️
- Data-related challenges organizations face when adopting AI, including: ✅ Managing massive data volumes ✅ Ensuring data quality ✅ Meeting AI-specific regulatory standards ✅ Adapting to the demands of AI development and deployment.
- Techniques and technologies organizations can adopt to address AI Data Management challenges
- Dell's Data Management Services, including optimization and implementation services for data cataloging and pipelines, and Dell's collaboration with leading technology providers to offer advanced data management for AI
David Nicholson:
 Welcome back to Six Five On The Road’s continuing coverage of SuperComputing ’24. I’m Dave Nicholson, and I’ve got a very, very special guest from Dell Technologies today. Beth Williams, welcome to the program. How are you?
Beth Williams:
 I’m doing great, thank you. Thanks for having me.
David Nicholson:
 So data is at the center of all things AI. In fact, without data, I would argue that AI stands for absence of intelligence. Would you agree that data plays a critical role in successfully deploying AI? Are you seeing that with your customers?
Beth Williams:
 Absolutely. I think it’s fundamental actually, in terms of implementing any kind of AI use case, making sure that you’re getting your data right, getting the quality right, getting the right sort of data to the AI model, all of that is imperative to get a use case implemented correctly.
David Nicholson:
 And what are you seeing in terms of challenges that are out there? First of all, what is your role at Dell Technologies? What’s the perspective that you bring to this?
Beth Williams:
 Yeah, so I’m in the consulting team, so I’m the global portfolio lead for all of the consulting offers around AI and applications and data. And they’re obviously all very linked.
David Nicholson:
 Okay. So you’re engaging with folks outside of what we would think of as the ivory tower in tech actually, where the ROI rubber meets the road, I guess, but so what are some of the specific challenges that you are seeing?
Beth Williams:
 So there’s lots of different things. I’ve mentioned quality already. I mean, I think that’s probably one of the biggest challenges, especially when we’re talking about things like generative AI, whether or not you’re training a model or you’re trying to feed a vector database via RAG, making sure that the quality of the data that you’re using in the models through those mechanisms is right, is fundamental, and it’s a big challenge. I think part of the reason for that is now we’re opening up different data sources that we’ve not really used before. So unstructured data has suddenly become really important to generative AI models. In fact, it’s probably the richest source of data that we’re seeing for gen AI. So things like SharePoint sites are a big example of that. In fact, we’ve got probably about 500,000 if not more SharePoint sites in Dell, and we are using that as a mechanism to feed a lot of the models that we’re using.
But obviously not all SharePoint sites are correct. Some of them might have a bit of stale data. And so quality is really, really important. Making sure that you’re cleaning that data, making sure that whatever you are feeding the model is right, but also the amount. I mentioned, we’ve got over 500,000 SharePoint sites, and that’s just SharePoint. So there’s so much data to deal with, that whole volume of data is a big challenge. And the different types of data, I’ve mentioned unstructured data, but you can break that down even further. If you think of, say, PDFs, within a PDF, you’ve obviously got text, but you’ve also got things like images and graphs, and you’re trying to take all of those different types of data, that kind of multimodal data, and feed it into your model. And that’s really, really tricky. In fact, we as Dell have found that quite challenging in the past, especially around things like images and graphs.
David Nicholson:
 One of the concerns that we hear about constantly is concerns about governance, privacy, things like that. Are those concerns completely warranted or do we have adequate solutions for those issues at this point? What do you say?
Beth Williams:
 I think the concerns are warranted for sure. Part of the reason it’s getting quite interesting, certainly around security, is that we now have a lot of different attack vectors that we didn’t have before. So people have been doing AI for quite a while, but they had data security in place. But now when we’re starting to talk about using models, there are all these new opportunities to get to the data in a nefarious way. So you could get to the data sources that you’re pulling in and start to poison those. You could even get to the point where you’re talking to the model and poison the prompts. So there are loads of new attack vectors that have suddenly opened up all of this opportunity for nefarious characters to start accessing and polluting your data and your models.
And then governance itself has always been a concern around data, PII data and so on. But now that’s even more important as well, because we’re starting to see a whole shift in the types of data being consumed as well. I mentioned things like SharePoint sites. If you think about that, if you think about the way that we currently use SharePoint, for example, you have this inbuilt role-based access. So I can say, you can see my site and they can’t see my site. But the minute you start to suck all that in to say, a vector database, that’s gone. So all of a sudden you’ve just got all of this lovely data and you have no idea who’s meant to see what. And so governance becomes really important at that point to make sure that that role-based access still prevails after you’ve flattened all the data.
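The point about role-based access disappearing once data is flattened can be illustrated with a minimal Python sketch. It is not Dell's implementation: the `Chunk` type, the `allowed_groups` field, and the `ingest` and `retrieve` functions are all hypothetical names, and a plain keyword match stands in for vector similarity search. The idea is only that each chunk keeps a copy of its source's access list, so the permission check can still run at retrieval time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One flattened piece of a document, carrying its source's access list."""
    text: str
    source: str
    allowed_groups: frozenset  # hypothetical field: groups permitted to read the source


def ingest(documents):
    """Split each document into paragraph chunks, copying the access list onto every chunk."""
    chunks = []
    for doc in documents:
        groups = frozenset(doc["allowed_groups"])
        for paragraph in doc["text"].split("\n\n"):
            if paragraph.strip():
                chunks.append(Chunk(paragraph.strip(), doc["source"], groups))
    return chunks


def retrieve(chunks, query, user_groups):
    """Return matching chunks the user may see.

    Keyword matching is a stand-in for vector similarity; the permission
    filter is applied before any matching happens.
    """
    user_groups = set(user_groups)
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    return [c for c in visible if query.lower() in c.text.lower()]
```

With this shape, a user who belongs to no permitted group gets nothing back, even if the flattened store contains matching text.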
David Nicholson:
 So that’s interesting because the traditional thought process behind data hygiene is the idea that you want to avoid as much of the garbage in, garbage out outcome as possible, and then you have to balance how much time and money we spend making it perfect against when it’s good enough. What you just described is actually inducing additional error, if you will, because you’re stripping away the permissions that have been managed in a traditional way. So when you go in and you engage with a client, how do you sort through that? How do you figure out how much more time and money should be thrown at making the data perfect? And then how would you address something like that, where you’ve stripped away the traditional permissions? The data lakehouse next to the lake full of data sounds great, but maybe you’re not supposed to see my data. How do you manage all of that?
Beth Williams:
 So not surprisingly, there isn’t a silver bullet. There’s lots of different mechanisms that you need to put in place. We often talk about things as becoming a data swamp these days, because of all of that data going in without the right kind of lineage being attached to it. So one of the things that we always suggest, well, first of all, don’t try and boil the ocean. I mentioned we’ve got hundreds of thousands worth of SharePoint sites. We don’t need to necessarily clean up all of those sites. So first of all, focus on the data sources that are important to you. And quite often the best way of doing that is looking at the use cases that you’re going to implement first. So prioritizing what you’re going to use the data for first, that’ll give you a subset of the data sources to look at.
And then once you’ve decided on that subset, it’s a case of going through some of the processes that we use. For example, we use data catalogs. And so what we can do there is look at the data sources and tag the sources with metadata. So we can say, for example, this is the lineage of this data source. We know exactly where it’s come from and this is the role-based access we would like. These are the people that are allowed to see it, these are the people that aren’t, this has got PII in it. So having that kind of metadata tagging to the data sources in a catalog is a really good way of getting started. Because what you can do with that then is even though you might not be using that straight away, when you start to consume the data with say, data pipelines, you can actually access that metadata and using policies, you can say, well, actually, that data’s allowed, that data isn’t. And you can do that repeatedly, automatically on the fly if you like.
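As a rough illustration of the catalog-and-policy approach described above, the following Python sketch tags data sources with metadata and has a pipeline consult those tags before passing records on. It is a sketch under stated assumptions, not a real catalog product's API: `CatalogEntry`, `is_allowed`, `run_pipeline`, and the specific policy (block PII unless explicitly permitted, then check the role) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata attached to one data source in a (hypothetical) catalog."""
    source: str
    lineage: str              # where the data came from
    allowed_roles: frozenset  # roles permitted to consume it
    contains_pii: bool


def is_allowed(entry, role, pii_permitted=False):
    """Example policy: PII sources need explicit permission, then the role must match."""
    if entry.contains_pii and not pii_permitted:
        return False
    return role in entry.allowed_roles


def run_pipeline(records, catalog, role, pii_permitted=False):
    """Yield only records whose source is catalogued and passes the policy.

    Records from uncatalogued sources are dropped, since nothing is known
    about their lineage or permitted audience.
    """
    for record in records:
        entry = catalog.get(record["source"])
        if entry is not None and is_allowed(entry, role, pii_permitted):
            yield record
```

Because the policy reads the catalog on every run, the same check applies automatically each time the pipeline executes, which is the "repeatedly, automatically, on the fly" behavior mentioned in the conversation.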
David Nicholson:
 I’m curious to hear what your experience has been working with folks who are thinking about fine-tuning a model with their own bespoke, proprietary, crown jewels of data. Do you think this is pushing folks more in the direction of hybrid cloud? The hyperscale cloud providers would’ve said at some point that, nah, hybrid cloud, it’s just a bridge until everything can be in our clouds. I’m hearing a lot of folks say, “Well, hold on a minute. No, no, no, no, no. We want a core of what we’re doing to be somewhere where we feel like we have more control.” Are you seeing the same thing? What are your thoughts? I know, look, Dell does hybrid cloud better than anyone, but what’s your perspective?
Beth Williams:
 It’s always going to be a hybrid cloud world, and it comes down to the data, because as we say in Dell, there’s a certain amount of data that will never ever leave Dell. Under no circumstances will we let that data go. It’s really important to our business. And so it will always stay on-prem, and there’s other data that we already put out to, say, public cloud, which is okay because it’s customer-facing. So there’s lots of different ways to achieve the same goal, but at the same time, you’ve got to stick to your core principles of this is my IP, this is my data, and this is really important for my customers that I keep this data safe. So we see that across many, many different customers. Obviously, from a Dell perspective, our customer base is predominantly those people that are already thinking in that space anyway. They’re already looking at hybrid cloud and have been for a very long time because there are certain workloads that will never ever leave their premises. But as we say, there’s always good use cases for public cloud as well.
David Nicholson:
 Yeah, in the classes that I teach in AI, the big question from CIOs and CTOs is always, how do I get to the nirvana of a positive ROI from AI? Specifically from generative AI, but from all things AI. Dell has been in this business of helping people “manage their data” for a long time, and I use the big air quotes because managing data is all-encompassing. So this isn’t all net new for Dell. Can you walk us through from the perspective of what Dell has been doing in the past, what you’re doing in the present, and what you’ll be doing in the future, and how that’s changed? Because some of the stuff you’re talking about is stuff you’ve been doing. 10 years ago it would’ve been the same thing, but how have things really, really changed recently? And then what can we expect on the horizon?
Beth Williams:
 Yep, you’re absolutely right. So Dell and obviously Heritage EMC, we’ve been focused on storing customers’ data for many, many years. So our data storage and data management solutions have been front and center for a very long time. What we’ve started to move into now are things like, as you mentioned before, the lakehouse concept: the Dell Data Lakehouse, using the Starburst partnership. So we’ve started to move up on top of that storage and management layer and look at, well, actually, what are you going to do with this data? How are you going to manage it effectively? How are you going to be able to extract metadata? So we have products that are now evolving so that metadata can be automatically extracted from the storage products that we’ve got as well. So that evolution is happening. So we’re slowly moving up the stack of, I would say, data products into that kind of data management space.
But we’re not trying to be all things to all people. So very clearly, we are focusing on data management that, in this context, will accelerate AI and gen AI. We’re not going to do everything in data management that the world could need; it’s going to be very much focused on storing your data and then managing it with the focus of accelerating AI. And what you’re going to see, I think, coming down the line is more of that. More solutions using our ecosystem of partners, using the best-of-breed technologies where we can go and actually help customers achieve those goals, look more at the kind of AI use case world, the AI solution world, and help the data feed those solutions. And again, using the ecosystem of partners that we’ve got as well as our fundamental data products.
David Nicholson:
 I love the fact that you referred to Heritage EMC. Thanks for making me feel very, very old.
Beth Williams:
 I’m Heritage EMC, so there you go.
David Nicholson:
 I actually, yeah, full disclosure, I was at EMC for 16 years prior to Dell acquiring EMC. So Heritage, wow. I think I get to get those special license plates now for my cars, classic.
Beth Williams:
 Absolutely.
David Nicholson:
 So what are some of the things that maybe people would be surprised by when they’re initially coming at this? If the board, I’m a CIO, I’m pretending to be a CIO, and my CEO calls me and says, “Dave, the board just asked me what our AI strategy is, what do I tell them?” And so I’ve got to come up with an answer, and I reach out to you and I say, “Hey, I want to kick off an AI pilot program.” And that’s literally all I tell you. What am I going to be surprised by when you say, “Okay, okay, Dave, first, let’s start with step zero or step one.” Anything that people are shocked by that you can’t just flip a light switch on? What does that look like?
Beth Williams:
 Yeah, no, I think there’s obviously people always wanting the panacea. They’re always wanting the guaranteed solution, the killer use case that’s going to make their ROI. And like anything, it takes work and it takes some dissemination to work out exactly what it is that we need to be doing. And we had to do that as Dell. So we had hundreds of use cases, over 800 use cases that we were booting around in terms of thinking about what to do with AI, some of which were actually in flight. And we realized really quickly that actually, yeah, we can’t have 800 in-flight AI use cases. We need to come down to a small set, focus on those things that are important to us. And so we went through that process of looking at what people were suggesting and starting to cluster these use cases together to see where the biggest bang for the buck would be, what was most technically feasible, where the data was.
So as we said before, data readiness for use cases is really important. If this use case relies on data that we know is really bad at the moment, it’s going to take a long time to clean up. Maybe that’s a lower priority than say, other use cases that we know the data’s pretty good and it’s very fast in terms of getting it to the model. So what we try and do with our customers is help them understand that you do have to go through that process. It doesn’t have to be a long process though. I think the surprising thing for a lot of people is they think that’s like a six month strategy exercise and it’s not. We can do something very quickly in a couple of weeks to come in and help you disseminate what those high priority use cases are in the same way that we have, and then focus on the data sources that are relevant to those use cases and then start to incrementally implement them.
And again, we are not saying go in and build a massive data-as-a-service, data product, data mesh platform out of the gate. We’re saying go and get those good use cases to start with, go and implement those. You can do them tactically in terms of data. There can be some manual steps in there as well, as long as you’re safe, and you start to see those getting some ROI. And then once you’ve got a few use cases under your belt and you’re starting to get into some scale, that’s when you start putting the automation in. That’s when you start putting the Dell Data Lakehouse in. That’s when you start to look for scale, and you are starting to look at things like data as a service, because you’re going to need that with more use cases coming down the line. So I guess the big surprise is, don’t buy everything that we sell out of the gate. Just do things to start with that are small, incrementally add to those, make sure you’re getting ROI, and then when you get to a point where you really need to scale, then we can help you with our own experience.
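The prioritization exercise described above, which weighs business value, technical feasibility, and data readiness to cut a long list of use cases down to a few, can be sketched in Python. This is an illustrative assumption rather than Dell's actual method: the 1-to-5 scales, the weights, the `min_data_readiness` cutoff, and the names `UseCase`, `score`, and `shortlist` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    business_value: int   # 1 (low) to 5 (high), assumed scale
    feasibility: int      # 1 to 5
    data_readiness: int   # 1 (data needs heavy cleanup) to 5 (ready to use)


def score(use_case, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three criteria; the weights are illustrative only."""
    w_value, w_feasibility, w_data = weights
    return (w_value * use_case.business_value
            + w_feasibility * use_case.feasibility
            + w_data * use_case.data_readiness)


def shortlist(use_cases, top_n=3, min_data_readiness=2):
    """Drop use cases whose data is not ready enough, then keep the top scorers."""
    ready = [u for u in use_cases if u.data_readiness >= min_data_readiness]
    return sorted(ready, key=score, reverse=True)[:top_n]
```

A use case with high value but poor data readiness is filtered out here, which mirrors the point that it may be a lower priority until its data has been cleaned up.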
David Nicholson:
 No, it makes a lot of sense, and I think it’s always interesting talking to folks who are involved in the real services side of the tech business because you have a lot of effort that gets put into productizing things, but at a certain point, every single one of your engagements is bespoke. I guess the best you can hope is sort of 80 to 90%, yeah, we’ve done this before. Don’t worry, we’ve got you. But there’s always going to be that leading edge where you’re working in collaboration with a client. Is that a fair statement? And are you sometimes jealous of your product friends who have three SKUs and that’s it?
Beth Williams:
 So I’ve been in consulting for a very long time, so if I didn’t like variance, I’d be in the wrong job, quite frankly. So I think that’s what makes it fun. But you’re right, I think the Pareto principle is a fairly good one. It is pretty much an 80-20. As things start to evolve, we can start to see patterns emerging. For example, we know what the kind of top solutions are that people are going after. We know that things like RAG are really important. So we can boilerplate a lot of this stuff to a point where we can take it to the environment, take it to the customer, get it deployed, and then after that, there’s that extra configuration layer, which is just integration and configuration with the customer and what the customer wants to do. And that is normally about 20%. So yeah, that’s the bit.
Every customer’s different. Everybody wants something different. Everybody’s got a slightly different goal, but we try and take a bit of weight off. Not everything’s a snowflake. We’ve got good patterns that we use. We’ve got our validated designs that we follow. That gives a really good start, and we know those things work. So it’s not like the old days where we used to go in and say, well, what do you want? Now we go in and say, look, this is how we would do it. How much would you like of this? And where would you like the variance? So it’s a lot better than it used to be.
David Nicholson:
 Well, AI is certainly filled with excitement and drama, but to the extent that Beth Williams and her teams can go in and eliminate that drama, go get your adrenaline somewhere else if you’re an adrenaline junkie. What you don’t want is to be terrified about what the next day brings in your AI deployment. Beth Williams from Dell Technologies, thanks so much for joining us here on Six Five On The Road’s continuing coverage of SuperComputing 2024.
Beth Williams:
 Thanks very much.