The Hidden Value of Numbers: Generative AI for Tabular and Time Series Data
Tabular and time series data is the heartbeat of modern business, containing invaluable insights into trends, patterns, and anomalies across sales transactions, product data, financial records, and more. Unfortunately, current approaches to generative AI are not well suited to use cases that operate on numerical data, such as prediction and forecasting, planning and optimization, resulting in high investments in AI projects that are doomed to fail.
In this session, we’ll talk about:
• New approaches to AI applied to time series data and structured business data
• Actionable strategies for organizations across all industries to adapt and thrive in this evolving AI landscape
• Guidelines for industry leaders and policymakers to responsibly navigate the transformative potential of Gen AI
Transcript
Daniel Newman:
Hey everyone. Welcome back to The Six Five Summit. Daniel Newman here, CEO of The Futurum Group. Excited for this next session here in our data DevOps observability track. We’ve got Devavrat Shah. He is the CEO of Ikigai. We’re going to be talking about AI of course, because that’s what the world wants to hear about. But Devavrat, I want to thank you so much for taking some time to join. I know you’re busy building, starting up, making things happen. I love it as an entrepreneur, one to another. How’s it going today?
Devavrat Shah, PhD:
Excellent, Daniel, thank you for having me here.
Daniel Newman:
Of course. It’s great to have you here. So listen, this event has companies like Amazon and Microsoft and Google, and we also have exciting startups like Ikigai. Give me the quick one minute for everyone out there that isn’t familiar with the really interesting and exciting work that you’re doing.
Devavrat Shah, PhD:
Absolutely. So I’m CEO and founder of Ikigai, and also a professor of AI at MIT, where I’ve been teaching for 20 years. At Ikigai, we bring generative AI to tabular, time-series data, structured data. Our mission is to enable everybody to thrive in the AI era rather than get drowned by it. And our focus is enterprises.
Daniel Newman:
Well listen, first of all, nice resume. Look, one of the most esteemed schools on the planet, one that someday if I work really hard, maybe I’ll have the chance to come and stop by and say hi to your students. I love it. But listen, LLMs are all the rage. Everybody’s talking about LLMs, LLMs, LLMs. We’ve got small models, we’ve got bigger models, we’ve got gigantic models, you’ve got trillions, billions, millions of parameters, but it’s not the only AI that’s out there. LLMs are good for some AI. Talk a little bit about just your general viewpoint on what LLMs and AI are. And then what are the rest of the four decades of AI capabilities that we’ve been building, and when are those approaches more appropriate?
Devavrat Shah, PhD:
Absolutely. LLMs, diffusion models for images, multimodal models, these are great models for unstructured data. See, with unstructured data, for decades we’ve been spending a lot of time trying to learn what the right mathematical representations for those things are, and we could not figure it out. And the modern take is, let’s just take all the world’s data and find how it’s represented in a succinct manner, so that when a new data example comes in, we effectively look up the right matching answer to it and produce that answer. That’s what these things do. On the other hand, when you think about tabular time-series data, four plus decades of research, actually I would say a hundred years plus of research, has provided us methods, approaches, models, representations, so that when those data points come in, we understand the structure within them quickly and we can extrapolate from it.
And if you think about enterprise data, core enterprise data, whether it’s about people, whether it’s about product, whether it’s about finances, whether it’s about sales, customers and all that, it’s primarily tabular time-series data. And those are the things where we don’t need large language models or diffusion models. We need a very good mathematical representation that allows you to extract patterns and insights from it and make decisions with it, do scenario analysis and all those nice things.
Daniel Newman:
Yeah, so there’s a big opportunity in businesses actually. You’re kind of hearing the tide turn a bit on conversations. We all know that LLMs are exciting. They’re also becoming a bit of table stakes. Of course, there’s some definite nuance from company to company on the training and the quality and the outputs, but at the same time it’s become largely available to everyone for almost nothing. I call it search 3.0, it’s the next generation of search. It’s a new format, but essentially it’s Google’s next thing: you search and now you get generative responses, but it’s still in many ways search. The whole market says the value of it comes when a company can actually use proprietary, unique data that sits inside one of their various systems of record, or that sits on the devices of people who have built out endless PowerPoints and Excel spreadsheets and have written millions of emails and done presentations with graphics and representations. That’s the money data, as I like to call it. That’s where companies differentiate.
You’ve built a novel approach, you’re trying to help companies handle this and some of what I mentioned. But that data, the things that you talked about, the large graphical model: just give me the quick distinction between an LGM (we’re going to make new names today) and an LLM.
Devavrat Shah, PhD:
Absolutely. Very nicely put, by the way. In terms of LGMs, what do we do? We think of data as tabular, timestamped data. As a mental model, think of a giant spreadsheet that’s changing over time. For example, the demand for your product, in terms of what sales you have made in a given channel, in a given location, for a given product, is changing over time; it’s a time series kept in a nice structured manner. For this type of data, what you want to do is learn the relationship between every pair of cells, so that if there is a cell whose value is missing, you can fill it in. If there’s a cell that has a value, but one that is unusual, you can detect anomalies. If there’s a part of the data that was following some type of model and now, post COVID, the model has changed, you want to understand change points.
You want to understand what the relationships are, what the similarities are between multiple behaviors, things like: that product in that region behaves like this product in that region. Or that product is 40% this and 60% that. Those kinds of embeddings, what I would call a calculus of structured time-series data, are what the large graphical model enables in a domain-agnostic manner. Large graphical models are primarily based on mathematical representations. And for that reason, as you present a small amount of data, the model tries to look for the right representation for it so that you can do all sorts of extrapolation very quickly, in a very efficient manner. These models are primarily representations; they’re not learned from the world’s data. For that reason, they’re very computationally efficient, and they scale well. We are proud that we run on CPUs, not GPUs. And once this is at the core, you can do all sorts of interesting things with it, from harmonizing data when it sits in multiple data environments, to simulating, forecasting and predicting, to doing scenario analysis on top. All of that you can do with this.
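Illustration (not from the session): the three ideas above, filling a missing cell from a related series, flagging an unusual cell, and expressing one series as "40% this and 60% that", can be sketched with plain least squares. This is a toy under stated assumptions and not Ikigai’s large graphical model. The table, the series names, and the helper functions `scale_between`, `fill_missing`, `flag_anomalies` and `mixture_weight` are all hypothetical, and the anomaly rule is a simple standard-deviation threshold chosen for brevity.

```python
from statistics import mean, pstdev

# Hypothetical toy table: one row per product-region series, one column per week.
# None marks a missing cell.
TABLE = {
    "A_north": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "A_south": [20.0, 24.0, None, 32.0, 36.0, 40.0],
    "B_north": [30.0, 29.0, 31.0, 30.0, 90.0, 30.0],
}


def scale_between(target, reference):
    """Least-squares scale s with target ~ s * reference, using cells both rows have."""
    pairs = [(t, r) for t, r in zip(target, reference)
             if t is not None and r is not None]
    num = sum(t * r for t, r in pairs)
    den = sum(r * r for _, r in pairs)
    return num / den  # assumes the reference row is not all zeros


def fill_missing(target, reference):
    """Fill None cells in target with the scaled value of a related reference row."""
    s = scale_between(target, reference)
    return [s * r if t is None else t for t, r in zip(target, reference)]


def flag_anomalies(series, threshold=2.0):
    """Indices whose value is more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(series) if abs(v - mu) > threshold * sigma]


def mixture_weight(target, a, b):
    """Weight w in [0, 1] minimising the squared error of target ~ w*a + (1-w)*b."""
    num = sum((t - y) * (x - y) for t, x, y in zip(target, a, b))
    den = sum((x - y) ** 2 for x, y in zip(a, b))
    return min(1.0, max(0.0, num / den))  # assumes a and b are not identical


if __name__ == "__main__":
    print(fill_missing(TABLE["A_south"], TABLE["A_north"]))
    print(flag_anomalies(TABLE["B_north"]))
    b_typical = [30.0, 29.0, 31.0, 30.0, 28.0, 30.0]
    blend = [0.4 * x + 0.6 * y for x, y in zip(TABLE["A_north"], b_typical)]
    print(mixture_weight(blend, TABLE["A_north"], b_typical))
```

On this toy data the missing week in `A_south` is filled with 28.0 because that row is exactly twice `A_north`, week index 4 of `B_north` is flagged, and the blended series recovers a weight of approximately 0.4. A real system would learn such relationships across all rows at once rather than from a hand-picked reference row.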
Daniel Newman:
Okay. Okay. So I had a question but now I have a different question because you just opened up… The chips are cool again, by the way. Remember when semiconductors were not cool? They are cool. You just said something and by the way, I’ve said this on the record many times. I get some backlash from the Nvidia fanboys. And I’m a big fan of Jensen, I’ve gotten to know him, incredible, incredible leader. But a lot of AI is still done on CPUs. Talk to me just really quickly, because I don’t want to go too far down this rabbit hole, but why did you choose that? I think the world would love to hear why a company like yours has gone down the path of using a CPU as opposed to a GPU.
Devavrat Shah, PhD:
So at the end of the day, we are using CPUs because we can, and that’s sufficient. And the reason that’s sufficient is that we are not taking what I would call an approach where you go through the entire world’s data to learn the patterns. We are just focusing on your data. And to identify patterns, we are using the universal mathematical representation that we have identified over years of research, through my research at MIT and our collaborations.
Daniel Newman:
No, that makes a ton of sense. It’s okay sometimes to just be like, “Look, it does the job.” And it’s way more cost-efficient right now, way more power-efficient right now. There’s way more access. You’re not going to wait nine months or 18 months to get access to them, and you’re not going to pay a huge tax in the cloud to get access to them. It does the work. A lot of inference is done on a CPU. A ton of inference, and a ton of inference will be done. We’re not just going to shut down all these data centers that we’ve been building over the years, we’re going to put them to use. It’s kind of like how older storage technologies became cold storage, and then we had warmer storage, and then we had HBM, and we’re seeing this happen. So talk a little bit, just give me a couple of the really clear use cases for LGMs, if you could just put them in context for people.
Devavrat Shah, PhD:
Absolutely. So back to the use of AI in enterprises. There are two things that enterprises want to do really well with data and AI. One, figure out opportunities to grow. Two, run their operations efficiently. In either case it’s about a form of balancing act. Think of supply chain: supply and demand are the two sides of the equation. As you think about demand, you want to forecast demand really well because that’s uncertain. As you plan for supply, you want to figure out what the right planning and controlling actions are among the choices you have available. If you go to finance, you’re trying to figure out how to reconcile data that sits in multiple environments so that you can unify it. After that, you can do a bunch of things on top, whether it’s forecasting your cash, forecasting your consumption if you’re a consumption-based company, or planning and doing scenario analysis from an outcome-driven perspective.
If you are thinking about insurance, in insurance you’re trying to do claims auditing. How do you use AI to do that? If you’re thinking about financial services, you want to do fraud detection and fraud management, with a risk score that is comprehensive across customers and transactions, across all sorts of data together. So these are the types of use cases that we have been seeing across supply chain, banking and financial services, insurance and the traditional CFO’s office.
Daniel Newman:
Excellent. And by the way, it sounds like you’re hitting the highly regulated industries. And it also sounds like you can help with some of what I call persona-based, role-based work, which is super important. Just ask our CFO about the questions that I’m asking him and how much AI could help if it was properly implemented. And we’re getting there, every company’s getting there. It’s a process. So I’ve got about a couple of minutes left with you here and I really appreciate you spending the time. I’m going to ask you to put your professor hat on for a minute here in the last… I’m going to ask you kind of a big question, hopefully you can answer it in a somewhat condensed manner. What are the key considerations for the development of a long-term AI strategy? And I’m going to couple this: considerations for businesses developing a strategy, and at the same time, how do you see policymakers and business leaders making sure that we manage it? So how do we deploy, and then how do we manage it responsibly?
Devavrat Shah, PhD:
Great. So there are a few things here. One is, as we use AI, whether we use it internally or externally: with AI, exception is the norm, so plan for it rather than thinking of it as a surprise. Going back to compliance-heavy, regulation-heavy industries, you want to keep the expert in the loop, not out of the loop. The best way, in my mind, to think about bringing in AI is to have experts within the loop, not out of the loop. Let machines do a lot of work for you, but make sure that there is a spoonful of intelligence that is brought in by experts constantly. So that’s one part. Second is, data is where AI starts and data is where AI ends. So be very mindful of how you utilize data. How do you mix it up? Or how do you avoid mixing data? It’s very, very important to have attribution in terms of where your AI insights are generated from.
And then finally, the third thing is, unlike other things, AI is rapidly progressing. With regulation, it’s the norms that matter, and those norms are not yet defined. So what you need to do is start defining those norms along with your fellow industry community and try to start getting into self-regulation. For example, even though we are a small startup, we decided to follow this ourselves and we have set up our own AI council. This AI council is a collection of awesome academics from the social sciences, law and policy, and AI. They meet quarterly, and they help us work out what are the things we should do or should not do, and then implement them.
Daniel Newman:
Yeah, listen, you did a good job and you played the professor role perfectly there, Devavrat. I would love to sit in on one of your lectures at some point. This is a big problem. I remember just recently I was on CNBC and they were asking me basically, “Can this be regulated?” And I think it’s going to be people like you, building companies like yours, working with policymakers and business leaders to figure out how we use this technology to keep innovating and lead the world, but at the same time, do it safely, do it securely, do it responsibly. It will be a tug of war. Industry will always move faster than regulators. There’s really nothing we can do about that except work closely, stay informed, try to open the box, be transparent whenever possible. And it sounds like this is exactly what you’re doing.
Devavrat Shah, CEO at Ikigai. Thank you so much for joining me here at this year’s Six Five Summit. Let’s talk again soon.
Devavrat Shah, PhD:
Thank you Daniel, thank you for having me.
Daniel Newman:
All right everybody, we appreciate you tuning in to this segment. It was a fascinating one. LGMs, we’ve got a new acronym now. Check out Ikigai and stay with us for all of our Six Five coverage. Studio, sending it back to you.