AI’s Role in Balancing Potential and Risk: Insights from Six Five On The Road at AWS re:Invent
An AI tool that lets you CHAT with your data? Host Patrick Moorhead is with Cohesity’s CTO, Dr. Craig Martell, for a conversation about the hype vs. reality of AI in data management and the surprising ways AI is ALREADY being used to protect your data (Think ransomware detection and sensitive data identification)!
Catch Six Five On The Road at AWS re:Invent for more on:
- The impact of AI on Data Management and Cohesity’s role in driving Data Insights
- The latest developments around Cohesity GAIA and what it means for enterprises
- Dr. Martell’s predictions for AI in 2025, focusing on the importance of a pragmatic approach to AI development and application, especially concerning cybersecurity
Learn more at Cohesity.
Watch the video at Six Five Media at AWS re:Invent, and be sure to subscribe to our YouTube channel, so you never miss an episode.
Transcript
Patrick Moorhead: The Six Five is On The Road here in Las Vegas at AWS re:Invent 2024. There’s no surprise the big discussions, the big announcements and all of the conversations are about generative AI and enterprises lighting up those capabilities. And as we’ve discussed on the show, and I’ve published in my company’s research, one of the biggest impediments to scaling AI is data management, and not only the data management, but also getting a handle around the governance related to AI. And that’s what we’re going to talk about today with the CTO at Cohesity, Dr. Craig Martell. Welcome to The Six Five.
Dr. Craig Martell: Thanks, Pat. Appreciate it.
Patrick Moorhead: You have a very illustrious background. We were talking about that in the virtual green room, all about AI, and it’s pretty impressive. You were doing AI before it was cool. I mean, I know the algorithms started in the 1960s, but 10 years ago it really started to light up there. So I’d like to drill down a little bit into what I said in the lead-in, which was asking you how is AI used in data management and how are you extending it to data insights?
Dr. Craig Martell: That’s a great question, Pat. Thanks. My career may or may not be illustrious, but I have it just by getting old. So if you’re around long enough, you’ve done enough things.
Patrick Moorhead: 35 years. Guilty.
Dr. Craig Martell: And I tend to say that I won the lottery because I started doing AI back in the late ’90s before anybody thought it would do anything. And so then suddenly I wake up and it’s pretty effective. So it’s an exciting time to be working in this field.
Patrick Moorhead: Oh, the ’90s. That’s impressive. It’s funny, I was thinking, okay, the algorithms came out in the ’60s, we didn’t really have the compute power and storage until maybe seven or eight years ago when ML took off. That’s cool, in the ’90s. That’s wonderful.
Dr. Craig Martell: Yeah. It was slow and didn’t work at all because nobody was working in it, and so I did it because I thought it was cool. So back to the original question, how is AI used for data management and how are we extending that for data insights? Think about how backup actually works. Data protection. The technical term is data protection, but it’s backup. Cohesity and a company together figure out the important data, we back that data up and we protect it. And part of that protection, and this is part of the data management, is figuring out if it’s being attacked. So we can see all of your data and if you’re having a ransomware attack in the front, we actually notice that in the back. We can see changes in… we say changes in entropy, but we can see that more and more things are becoming encrypted that shouldn’t be encrypted or that historically weren’t encrypted. And we also have a product that allows you to have an immutable copy of that.
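To make the "changes in entropy" idea concrete, here is a minimal sketch, not Cohesity's implementation. It assumes two backup snapshots are available as in-memory mappings from path to bytes, and the function names and both thresholds are hypothetical. Encrypted data looks close to uniformly random, so its Shannon entropy approaches 8 bits per byte, while ordinary text sits far lower.

```python
import math
import os
from collections import Counter
from typing import Dict, List


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0 for constant data, up to 8 for uniform random bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_alerts(previous: Dict[str, bytes], current: Dict[str, bytes],
                   jump_threshold: float = 2.0, high_entropy: float = 7.0) -> List[str]:
    """Return paths whose entropy rose sharply into the range typical of encrypted data.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    alerts = []
    for path, new_bytes in current.items():
        old_bytes = previous.get(path)
        if old_bytes is None:
            continue  # no history for this file, so nothing to compare against
        old_h = shannon_entropy(old_bytes)
        new_h = shannon_entropy(new_bytes)
        if new_h >= high_entropy and (new_h - old_h) >= jump_threshold:
            alerts.append(path)
    return alerts


# Simulate one file being encrypted between two backups.
text = b"quarterly revenue report, customer list, meeting notes. " * 200
previous = {"report.txt": text, "notes.txt": text}
current = {"report.txt": os.urandom(len(text)), "notes.txt": text}
print(entropy_alerts(previous, current))
```

Comparing against the file's own history, rather than using a fixed cutoff alone, matters because compressed archives and media files are legitimately high-entropy from the start.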
So if we notice that that encryption is taking place or that change in entropy is taking place, we can alert you to say, “Hey, we think there’s a particular danger here.” That’s AI, right? I mean, today we would call that machine learning, but to me, they’re all of a piece. I don’t think modern AI is that different than old-fashioned machine learning. We can talk about that if you want. So that’s machine learning. Another part of the machine learning to protect your data is to figure out the sensitivity of that data. Does that data have credit card information? Does that data have PII that might be really valuable? So we can actually alert you. It looks like you’re having an attack and it looks like they’re attacking this kind of sensitive data, and we can prevent that data from being backed up so it doesn’t overwrite the prior version, and we can instantaneously help you restore it if that’s the case. So AI is all throughout that. So that’s classic AI for data protection.
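The sensitivity question ("does that data have credit card information?") can be sketched with simple pattern rules. This is an illustrative assumption about one way to do it, not the classifier discussed in the interview: the regular expressions, labels, and function names are hypothetical, and production systems typically combine many more detectors with learned models. The Luhn checksum is the standard check digit scheme for payment card numbers.

```python
import re
from typing import Set

# Hypothetical patterns: 13 to 19 digits with optional space or hyphen separators, and a basic email shape.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, and check divisibility by 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_sensitivity(text: str) -> Set[str]:
    """Return the set of sensitivity labels found in the text."""
    labels = set()
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            labels.add("credit_card")
    if EMAIL_PATTERN.search(text):
        labels.add("email")
    return labels


print(classify_sensitivity("Card on file: 4111 1111 1111 1111, contact pat@example.com"))
print(classify_sensitivity("Order 1234 5678 9012 3456 shipped"))
```

The checksum step is what keeps arbitrary 16-digit order numbers from being flagged as card numbers.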
Patrick Moorhead: And by the way, just quick comment, what I find fascinating is this virtuous cycle. My lead-in, one of the biggest impediments is data management to get to AI, but you’re actually using AI to help-
Dr. Craig Martell: To do data management.
Patrick Moorhead: To do data management.
Dr. Craig Martell: A hundred percent. So we call that pillar one internally. And pillar one is the AI that we do to protect your data, right? Pillar two is the AI that we do to help give you insights into your data, which I think traditionally has been overlooked. You have the CIOs over here protecting your data, and then you have the CDOs, and their job is to make that data accessible and available to provide business insight, but they don’t really talk that much, and we really need to get them talking because we already have that data. So if we can allow particular kinds of access to that data, then the CIO has done a lot of the CDO’s job and can provide real value in that direction.
So currently we have a product called GAIA, stands for Generative AI Agent or Generative AI Application, depending upon who you ask. The history was before me. But GAIA allows you to have a ChatGPT-like interaction with your data. So you can select a data set, you can select the permissions, it’s really important, select the permissions on who can actually have the conversation with that data, and you can ask it questions, like, “What was the thread I had with Pat two years ago about the following thing?” And it will summarize that for you. Now, part of the issue with generative AI, we can talk about this when we talk about the dangers or the potential problems, is that if it doesn’t have the right information, it can hallucinate. So it’s extremely important that we have the links back to the original emails or the original documents so that you can actually look yourself and say, “Okay, right. This is the right question.”
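The point about linking back to the original emails or documents can be illustrated with a small sketch. It is not GAIA's API: the `Document` type, the `mail://` links, and the word-overlap ranking are all hypothetical stand-ins, and the summary string is a placeholder where a language model call would go. What it shows is the contract: every answer carries the links it was built from, and no answer is produced when nothing relevant was retrieved.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    link: str   # hypothetical link back to the original email or file
    text: str


def retrieve(query: str, corpus: List[Document], top_k: int = 2) -> List[Document]:
    """Rank documents by shared words with the query (a stand-in for real search)."""
    query_words = set(query.lower().split())
    scored = []
    for doc in corpus:
        overlap = len(query_words & set(doc.text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def answer_with_sources(query: str, corpus: List[Document]) -> Dict[str, object]:
    """Return a placeholder summary plus the links it was built from, or no answer at all."""
    hits = retrieve(query, corpus)
    if not hits:
        # Declining to answer is safer than letting a model improvise without sources.
        return {"answer": None, "sources": []}
    context = " ".join(doc.text for doc in hits)
    return {"answer": f"[summary of: {context}]", "sources": [doc.link for doc in hits]}


corpus = [
    Document("1", "mail://thread/1", "Thread with Pat about the storage budget"),
    Document("2", "mail://thread/2", "Lunch menu for Friday"),
]
print(answer_with_sources("budget thread with Pat", corpus))
print(answer_with_sources("zebra", corpus))
```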
Patrick Moorhead: Yeah, the metadata. So what I find fascinating, I think GAIA was announced maybe a year ago.
Dr. Craig Martell: I think it was a year ago, yeah.
Patrick Moorhead: And what I really thought was just so pragmatic and awesome was the fact that the data’s there. And typically it’s just sitting there. It is sitting there.
Dr. Craig Martell: Why not use it?
Patrick Moorhead: And typically in the workflow, you’re ETL-ing the data off to maybe a data lake, pulling in data warehouse, and then you’re activating that data. Here, it’s there, right? It’s going to be there and everybody has to have data protection and backup. Can you catch us up though? What are the recent developments around GAIA?
Dr. Craig Martell: Yeah, that’s great. In my opinion, there’s two big recent developments, and there’s some smaller ones as well, and I might not remember all because it’s been a very productive year. The two big ones for me is that we’re not just doing cloud-based stuff. We’re moving to on-prem as well. So your Dell EMC Isilon, your NetApp NAS, our own SmartFiles or any on-prem filer that you have, you can start selecting files from there to be able to do that. And that’s huge to be able to do GAIA on top of it.
Patrick Moorhead: Well, it is huge, I mean, because 80% of the enterprise data on average is sitting on-prem at the enterprise edge or even maybe sitting on devices.
Dr. Craig Martell: And it also depends upon the industry. If you’re a new startup in Silicon Valley, you’re probably all in the cloud. If you’re a bank or you’re a hospital or an insurance company, you likely have requirements to have it protected and on-prem. And so being able to ask these sorts of questions, not just to the cloud, but also to on-prem is going to be a big win.
Patrick Moorhead: For sure. So you’re CTO. I know you know that. Hopefully you do. I’m just joking. And you have this really interesting job that you have to have a future state. It’s kind of the art of the possible, right, the technology, and then you have to intersect that with the needs of what people think they need and the product teams can… You’re all working together to put that out there. But I have to ask, what kind of predictions do you have for 2025?
Dr. Craig Martell: Let me start by saying that I’m actually a professional AI skeptic, and I’ll tell you what I mean by that. At the same time, this is the most exciting time in AI ever. It’s also probably the most… The last two years have probably been the most overhyped ever.
Patrick Moorhead: Sure.
Dr. Craig Martell: And so when I say I’m professional AI skeptic, what I mean is I really want to be a realist about these things. And so one of the things we’re doing in the office of the CTO is absolutely forward-looking. And I’ll tell you that forward-looking is around this sort of GAIA notion that this data is sitting there, what insight… remember, we call that pillar two… What insights can we bring to your business from that data? Almost all of our energy is spent there, because the backup team are excellent. They’re experts. They know what they’re doing. What my team is focused on is how can we bring the data that’s just sitting there, how can that value be immediately useful for your business? How much of that is going to be magical AI is really the battle we have internally, right?
Two years ago, the notion of gen AI was you buy this box or you subscribe to some service and then every one of your problems is solved. Well, we’ve all seen that that actually doesn’t work anymore, right? We’ve heard stories about the lawyer that had 10 made-up cases or whatever the number was, right? So the hallucination problem’s real, the selection of the files that you’re going to summarize over is real. So there’s a lot of nuts and bolts work that we also have to do to make sure that we deliver on that promise. The way that I say it is, look, the waters of hype are going to recede. They’ve already started to, and a lot of sand is going to be washed away. We want to make sure there’s rocks left over, and that’s really where we’re focusing our energy.
Patrick Moorhead: No, I think that’s a great place to put energy. I mean, our research suggests that the big boom in enterprise AI is probably 18 months, two years away. And one of the bigger challenges is in the old world, it was really specific to a certain area. For instance, it’s ERP data, it’s productivity data, it’s PLM, it’s CRM data, and you can activate on that, HR. But now the thesis here is you should be able to have this data and be able to activate it across and get better insights. So for instance, you’re trying to connect the front end of the house, which is, let’s just call it CRM, the back end of the house, which is ERP, to basically energize people. And then you have to figure out who really should have access. Should the frontline worker have access to the pay compensation packages for the top a hundred people in the company?
Dr. Craig Martell: Probably not.
Patrick Moorhead: Probably not. But those things have happened with the early instantiations, and that’s key. So hopefully you’re keeping your eye on how only people who should get access, get access once that data is activated.
Dr. Craig Martell: So a lot of things to say there. One, for sure on the permissions, one of the benefits of backed up data is we can actually back up the permissions as well. So we actually know the permissioning structure of your company to a large degree, and we have other layers on top of it that allow say a compliance team to say, “No, no, no, no. Only legal gets to look at this,” right?
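A minimal sketch of layered permissions follows, under stated assumptions: the backed-up permissions are modeled as a set of groups per document, and a compliance override, when present, replaces that set entirely ("only legal gets to look at this"). The function names, group names, and file names are hypothetical, and the interview does not say how the real layering is implemented.

```python
from typing import Dict, List, Optional, Set


def can_read(user_groups: Set[str], backed_up_acl: Set[str],
             compliance_override: Optional[Set[str]] = None) -> bool:
    """Allow access if the user shares a group with the effective allow-list.

    Assumption: an override replaces the backed-up ACL instead of adding to it.
    """
    allowed = backed_up_acl if compliance_override is None else compliance_override
    return not user_groups.isdisjoint(allowed)


def readable_docs(user_groups: Set[str], acls: Dict[str, Set[str]],
                  overrides: Dict[str, Set[str]]) -> List[str]:
    """Filter the document set down to what this user may query."""
    return [doc_id for doc_id, acl in acls.items()
            if can_read(user_groups, acl, overrides.get(doc_id))]


acls = {
    "shift_schedule.pdf": {"operations", "hr"},
    "exec_compensation.xlsx": {"hr", "finance"},
    "contract_dispute.docx": {"hr", "legal"},
}
overrides = {"contract_dispute.docx": {"legal"}}

print(readable_docs({"operations"}, acls, overrides))
print(readable_docs({"hr"}, acls, overrides))
print(readable_docs({"legal"}, acls, overrides))
```

Applying this filter before retrieval, so that restricted documents never reach the model's context, is the safer ordering; filtering only the final answer would still let restricted content shape the summary.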
Patrick Moorhead: Yes, yes.
Dr. Craig Martell: So we have really tight constraints on that, and that’s extremely important, and we don’t want to release a product that’s going to allow the dock worker to know the CEO’s salary. I’m also really excited, and I’m going to say a buzzword, and I kind of hate myself for it, about agentic AI, because what is an AI agent? Let’s actually be clear about that. It’s really just a mapping from a query to a plan. It does something, right? Well, yeah, it’s a plan, right? And so that makes it sound a little bit less magic. It’s just this query means do these n steps, do these five steps.
Patrick Moorhead: It’s not a search result. It’s something that is actually acted on based on a search. I know it’s not a search.
Dr. Craig Martell: That’s right. But to normal people, it is. It’s a plan. And part of that plan might be to do something, or part of that plan might be to do some more searches across these disparate data sets, like you mentioned. But I do want to say, AI is not monolithic. We can’t think about buying a magical solution, throwing at it, and suddenly it works perfectly. You have to think about the use case you want. Even though it’s disparate data, there’s a particular use case and you have to gather the data for that, build the model, and train and evaluate for that use case. So that’s good old-fashioned machine learning engineering that hasn’t changed, and I really want that message out there.
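The description of an agent as "a mapping from a query to a plan" can be sketched literally. Everything here is a hypothetical illustration: the step functions return canned strings instead of calling real CRM or ERP systems, and the keyword lookup stands in for whatever model would choose the plan in practice.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Step = Callable[[State], State]


def search_crm(state: State) -> State:
    return {**state, "crm": f"CRM records matching {state['query']!r}"}


def search_erp(state: State) -> State:
    return {**state, "erp": f"ERP records matching {state['query']!r}"}


def summarize(state: State) -> State:
    return {**state, "summary": f"{state['crm']}; {state['erp']}"}


# The "mapping from a query to a plan": a keyword selects an ordered list of steps.
PLANS: Dict[str, List[Step]] = {
    "account review": [search_crm, search_erp, summarize],
}


def plan_for(query: str) -> List[Step]:
    """Return the steps for the first matching keyword, or an empty plan."""
    lowered = query.lower()
    for keyword, steps in PLANS.items():
        if keyword in lowered:
            return steps
    return []


def run(query: str) -> State:
    """Execute the plan step by step, threading the accumulated state through."""
    state: State = {"query": query}
    for step in plan_for(query):
        state = step(state)
    return state


print(run("Account review for Acme")["summary"])
print(run("hello"))
```

Each plan is tied to one use case, which is the interview's larger point: the steps, data, and evaluation still have to be chosen per use case.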
Patrick Moorhead: Yeah, great conversation. Craig, I really appreciate you coming on the show here.
Dr. Craig Martell: Thanks so much, Pat.
Patrick Moorhead: I love your pragmatic realism as well. I think we all need a wake-up call. I think the initial investment cycle was required, otherwise we wouldn’t be here. And if you look at the last-
Dr. Craig Martell: It’s an exciting place to be.
Patrick Moorhead: Well, if you look at the last five waves of technology going back 40 years, they always had a hype cycle, and I think it is required and it figures it out. I’m feeling really good though, because when it works, it works and it works really well. So thanks again.
Dr. Craig Martell: Thanks so much, Pat. I appreciate it.
Patrick Moorhead: So thanks for tuning in to this discussion on AI and data. Tune in to all of the AWS re:Invent content and check out all of the videos that we’ve done for Cohesity. Thank you very much. Hit that subscribe button. Take care.