How to Control Cloud Costs for GenAI Tools and Reinvest for Growth
Can AI be used to control cloud expenses? Host Mitch Ashley is joined by DoiT’s Eduardo Mota and Weaviate’s Jobi George on this episode of DevOps Dialogues, for a conversation on how companies can manage their cloud expenditures in the context of GenAI tool utilization and strategize reinvestment for growth.
Their discussion covers:
- The intricate cost relationship between cloud architecture and GenAI, highlighting the importance of expert partnerships in developing effective AI solutions
- Utilizing data as a competitive advantage, from leveraging existing datasets to enhancing them through transformations or graph database conversions
- A comparative analysis of RAG (Retrieval-Augmented Generation) versus agents in reducing hallucinations and costs while optimizing efficiency
- Evaluating ROI for AI-driven projects, considering aspects like end-user payment, competitive differentiation, and operational efficiencies
- The role of managed services in the initial stages of cloud transition, and advanced strategies such as fine-tuning, distillation, or quantizing a model for growth
Learn more at DoiT and Weaviate.
Watch the video below, and be sure to subscribe to our YouTube channel, so you never miss an episode.
Transcript
Mitch Ashley: Hey everybody. Mitch Ashley here with DevOps Dialogues. I’m VP and Practice Lead of DevOps and Application Development with The Futurum Group. DevOps Dialogues is all about conversations on creating software in the era of AI, cloud, security, and all the other aspects we deal with in building the best kinds of applications, the ones that deliver the results our customers, our business, and our partners will be most fulfilled by and receive the greatest outcomes from. We’re going to be talking about GenAI and taking on AI projects, with some tips and information that will help you take those on, or, if you’ve already started a project, some good advice and insights. We’re going to do that with the two folks joining me on the podcast today. First is Eduardo Mota from DoiT, who’s a Senior Cloud Architect; and Jobi George, who’s Global Head of Partnerships at Weaviate. Welcome, gentlemen. Eduardo, would you introduce yourself, tell us a bit more about you, and also tell us about DoiT?
Eduardo Mota: Absolutely. Thank you. Well, I’m in Canada, so the weather over here gets very cold. I’m a Senior Cloud Data Architect, and I’d been working with AI/ML for quite a few years before GenAI. At DoiT, we help our customers really unlock the true value of the cloud: we help them strategize, create architectures, and really become efficient when using the cloud.
Mitch Ashley: Fantastic. Jobi?
Jobi George: I’m Jobi George. I run global partnerships at Weaviate and manage all our cloud, technology, and system integrator partners. Weaviate is an open source vector database for building AI-native applications. With our developer-first approach, developers are able to use LLMs, embedding providers, and frameworks to build their AI-native applications. They’re using it to build applications like hybrid search, semantic search, RAG applications, newer things like recommender systems and generative feedback loops, and now agentic RAG. We have around a thousand-plus customers, like Cisco, Morningstar, and Bunk, who are building all kinds of GenAI and search applications. I’m glad to be here. Thank you for inviting me.
Mitch Ashley: Excellent. Thank you both for being here, and thanks to your companies as well. Let’s start out by talking about taking on GenAI projects. It’s an exciting time to be in the industry; technology is changing every day. There are announcements seemingly every week, every day, whether it’s Microsoft or AWS or you name it, with a new model coming out. There are so many technologies to learn. I know from my own experience leading software teams, you don’t always want to venture into this alone, for fear of it becoming too much of a science project rather than really getting value out of it. I’d love to hear your thoughts. Eduardo, maybe you want to start first, since you work with so many customers building these applications: what are some of the best ideas about how to get started, and how do you decide what kinds of applications are best to take on with generative AI?
Eduardo Mota: There is so much going on. There are just so many things, as you were mentioning. If we take AWS as an example, there are so many services available to our customers. A lot of the time we hear, “I want to use GenAI. I really want to leverage what is possible with the technology, but I don’t know where to start.” Really, it starts from the question: what is a good use case for personalizing the customer’s journey? How can you hyper-personalize that experience to get the most value? Everybody’s collecting all this data, very powerful data that allows us to get into the nitty-gritty of that personalization. Now, with GenAI, we’re able to unlock that. Before, we were creating personas, creating buckets that would put a customer in a certain category, but with GenAI you can really dive into that custom, personal experience.
To take AWS as the example again, they have a great service called Bedrock where you don’t have to create infrastructure; you can start utilizing it on demand and send text. You only pay for the number of tokens, the amount of words, that you send, and off you go. It’s a really great way to start testing what is possible. Now, whenever I talk with my customers, I do have to mention that GenAI is not a silver bullet; nothing in technology is. It is great at certain things, and when you are doing this test with Bedrock or with any other cloud provider, make sure that you test the limits. Where does it start breaking? Because that’s the iteration where you’re going to start making progress and adjusting things so it fits your use case. To start, think about the hyper-personalized customer experience you want to deliver, then start testing with the managed services available to you in the cloud. That’s where I would start.
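To make that concrete, here is a minimal sketch of on-demand inference against Bedrock using boto3’s Converse API. The model ID, region, and prompt are illustrative choices, not details from the conversation; swap in whichever model your account has access to.

```python
import boto3

# On-demand inference with Amazon Bedrock, as Eduardo describes: no
# infrastructure to provision, billed per input/output token.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{
        "role": "user",
        "content": [{"text": "Suggest a personalized greeting for a returning customer."}],
    }],
    inferenceConfig={"maxTokens": 256, "temperature": 0.5},
)

print(response["output"]["message"]["content"][0]["text"])
# The usage block reports token counts -- the unit you are actually billed on.
print(response["usage"])  # e.g. {'inputTokens': ..., 'outputTokens': ..., ...}
```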
Jobi George: That’s a great point. I’ll just add to that. For example, Weaviate is one of the companies that integrates with Bedrock. Just to roll back a bit: generative AI is a totally new way of building. It almost feels like the internet-browser-coming-to-market type of landscape. Everybody wants to build all of these applications; everybody wants to build an application that looks like ChatGPT or the equivalent. But they might have a skills gap, or they might not have a real understanding of all the use cases they could tackle. So taking a more selective approach to how and what they can build, and which platform fits them as they’re building these applications, becomes super, super critical. But the possibilities are endless. As Eduardo said, there are all kinds of applications people are building, starting from simple search-type applications and RAG applications.
We’ve always had search and database application architectures to build these applications, but machine learning and generative AI have dramatically changed information retrieval and the quality of the information being retrieved. Now we are able to get to relevant information. Eduardo talked about personalization, about being able to do personalized apps; we have an approach we call a generative feedback loop, which allows you to build these personalized applications at scale, and that’s where the momentum is. Some of the things I would highlight: be very careful about which use cases you’re trying to approach, ask whether you have the skills to do them, and quantify and approach them in a way that lets you deploy and get ROI on these applications quickly.
Mitch Ashley: Go ahead. Sorry. No, go ahead.
Eduardo Mota: You mentioned a lot of great things, and you start getting a sense of them when you begin testing, because you see where the gap is on skills and where the gap is on data. Data is the motor, and the oil, that gets all of this going. Every company out there has access to the same models; you have access to Bedrock and to all the models from the other cloud providers, but the differentiator becomes the data. When you start testing, you’ll start identifying, “Can I start using this other data this way, or that way?” I think those are great points.
Mitch Ashley: I was just going to bring up the data, because it’s really the center of the universe; oftentimes it decides where you put your generative AI application, where in the cloud, and where you need that data. With so many things happening in the AI field, particularly generative AI, we hear a lot about RAG, Retrieval-Augmented Generation. Let’s talk a little bit about what that is, why we use it, and the importance of it. And then, of course, we hear a lot about agents, being able to create your own agents, and a lot of different capabilities being launched in the market. I suspect we’ll see a lot more over time. Anybody want to take on RAG? Jobi, I think maybe you were mentioning that earlier.
Jobi George: RAG is a very logical evolution from a search-based application to a retrieval-based approach to building applications. Think of it this way: a native RAG pipeline consists of two parts. One is the retrieval component, typically composed of an embedding model and a vector database, and Weaviate is one of the providers there. Then there is the generative component, which is the LLM or the models you use; you can use foundation models, you can use your own custom models, whatever you want. At inference time, when you do a query, what you do is a similarity search: you search the index for the subset of documents closest to the query, which allows you to retrieve the most similar documents and provide them as context to the LLM to build its response, a generative response back to you.
This is fundamentally different from a plain search or SQL-query type of approach. Now we are able to build some interesting applications on top of it. We are able to take the plain old chatbots, think of them four or five years ago, and now they really look like they’re answering your questions in a more relevant way. You’re also hitting information that is very relevant to the context, because that similarity search has provided the context that lets the model give a response that’s very relevant to the information you’re trying to retrieve. That’s the RAG approach, and I see RAG now transforming into what we call agentic RAG, where you take both the LLM and the database and add memory and workflow tools as elements. So it’s a very logical evolution of generative AI applications, from RAG to agentic RAG and eventually to more complex workflow-based applications.
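As an illustration of the two-part pipeline Jobi describes, here is a minimal sketch using the Weaviate Python client (v4) and its generative search API. The cluster URL, API key, and the “Document” collection are placeholders, and the sketch assumes a collection that already has a vectorizer and a generative module configured.

```python
import weaviate
from weaviate.classes.init import Auth

# Connect to a hosted Weaviate cluster (URL and key are placeholders).
client = weaviate.connect_to_weaviate_cloud(
    cluster_url="https://YOUR-CLUSTER.weaviate.network",
    auth_credentials=Auth.api_key("YOUR-WEAVIATE-KEY"),
)

docs = client.collections.get("Document")  # hypothetical collection

# Retrieval: a similarity search narrows the corpus to the closest documents.
# Generation: those documents are handed to the LLM as context for the answer.
response = docs.generate.near_text(
    query="How do I rotate my API credentials?",
    limit=3,  # only the closest documents become context
    grouped_task="Answer the user's question using only the retrieved documents.",
)

print(response.generated)  # the grounded, generative answer
client.close()
```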
Eduardo Mota: To add to this: there is a race to identify the best content for the question, for the task at hand. Organizations are looking for ways to optimize that search, and vector databases like Weaviate allow those searches to be more efficient. The longer the text that you send to the model, the longer it will take, the higher the cost will be, and the more noise you will introduce into the inference. There are a lot of repercussions there. We are trying to identify the right amount of data, the minimum amount of data, required to answer the task, and there is a huge field of work going into that. Now, on the agent side, there is also great work being done. In a way, I don’t like the word agent too much; I like the word tools better, just because it is more like a tool the LLM can use to retrieve data or take other actions.
There is the combination of vectors and embeddings that you have in a vector database, but a lot of organizations also have data in relational databases like MySQL or Postgres, and they want to access this data as well. It’s another way of using tools to access data, get the specific points they want, and enhance the prompt being sent to the model while keeping it as short as possible. That’s the sweet spot everybody’s looking at: how can I get there? And these vector databases are making a big difference in doing that.
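A minimal sketch of that “tools, not agents” idea, again using Bedrock’s Converse API: the model is handed one narrow, hypothetical tool that fetches a single customer row from a relational store, so only the rows that matter end up in the prompt. The tool name, schema, and model ID are all illustrative.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# One narrow tool the model may call; your code runs the actual SQL query.
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "get_customer_record",  # hypothetical tool
            "description": "Fetch one customer's profile from Postgres by ID.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"customer_id": {"type": "string"}},
                "required": ["customer_id"],
            }},
        }
    }]
}

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "What plan is customer 42 on?"}]}],
    toolConfig=tool_config,
)

# If the model decides it needs the tool, stopReason is "tool_use"; you then
# run the query and return only the rows needed -- the "minimum amount of
# data" Eduardo describes, keeping the prompt short.
print(response["stopReason"])
```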
Mitch Ashley: A couple of points here, too. RAG is a way of leveraging data as part of the front end to the model without exposing the model to your data, or vice versa, exposing your data to the model, and a way to augment or add additional content. To your point, it could be SQL databases, it could be documents; there are all kinds of different places we can pull data from. You mentioned vector databases, Jobi, one of the ways we can do high-performance retrieval of information for generative AI models, correct?
Jobi George: That’s absolutely right. One of the points I want to come back to is around data silos and the data we’re talking about. I come from the data space; I lived through the big data wave, and seeing this wave come in, I think we are in a very early inning of the GenAI landscape from that point of view. Back then there was also a promise of taking unstructured data and making it available and relevant for building applications, insights, and analytics on top of it. I think we are doing the same iterative loop, but the game has changed a lot. Earlier, it was text-based, structured data you could work with. Now you can take each individual data element, datasets and tables with their metadata, schema, and context captured as structured data, along with PDFs and images, bring all of those multimodal datasets into the same vector space, and then do a relevance-based search. That makes it super intuitive and super useful for creating very natural-feeling applications, which are very different from typical database applications.
I think that is the big jump we are seeing. And again, it’s a journey. Look at the progress around new things like ColBERT and ColPali, and some of the implementations we have done around ACORN. This is a journey, and I think we are just in the early innings; a lot of innovation will come that makes a vector database a core component of building these applications. That’s how we tell our story: we talk about the AI-native stack, where the LLM is the core foundation alongside a vector database, so you can build applications on top of them. And the underlying layer, the architecture, has to be right: it has to be containerized and scalable in the cloud, like on AWS. Getting that scaling is critical as your datasets grow and you bring all your data into one vector space.
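As a sketch of that “same vector space” idea: with a multimodal, CLIP-style vectorizer configured on a Weaviate collection, an image can be the query and the nearest neighbors can be text, documents, or other images. The collection and file names below are hypothetical, and the collection is assumed to already exist with such a vectorizer.

```python
import base64
import weaviate

client = weaviate.connect_to_local()
catalog = client.collections.get("ProductCatalog")  # hypothetical collection

# Encode a local image as base64 to use it as the query.
with open("sneaker.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# near_image embeds the picture and returns the closest objects in the shared
# vector space -- which may be text descriptions, documents, or other images.
results = catalog.query.near_image(near_image=image_b64, limit=3)
for obj in results.objects:
    print(obj.properties)

client.close()
```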
Mitch Ashley: Very good. One of the things about working with new technology is keeping costs in mind. These are new services we might be using from a cloud hyperscaler like AWS, but we don’t have experience with them yet. I mean, we had that happen early in the cloud: what does our cost profile look like as we start to consume these resources? What are some of the things to consider in terms of costs, either anticipated by learning from experts like yourselves, or that you’ll experience once you start to develop an application and see what its performance and consumption profile looks like? Eduardo.
Eduardo Mota: Test a lot, iterate fast, and fail fast. It seems like a lot of entrepreneurs out there do this to get a business up and running, and when it comes to GenAI and these workloads, it’s the same thing. There is no one-solution-fits-all, so you need to iterate fast and understand the limits of the technology. Right-sizing these workloads is very, very important to reduce costs, and that’s true across the entire cloud, not only GenAI but every workload. Simply going to the cloud doesn’t mean your bill is going to be cheaper. Rather, you need to understand the architecture and make important choices. For example, on the data side we’ve been talking about, you can easily put a lot of data in S3, but if you don’t have the right format and strategy, you can be paying 90% or a hundred percent more to access that data than if you have the right strategy and the right format and know how to access it. That has an implication not only on cost but on the latency of the model, and ultimately you’re paying for how long that model runs. All of this has a cascading effect if you do not understand the building blocks well enough to optimize every layer of the workload.
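A small sketch of that S3 point, under the assumption (not stated in the conversation) that the data is queried by an engine such as Athena: the same records stored as partitioned, columnar Parquet can be scanned far more cheaply than one raw JSON blob, because queries only read the columns and partitions they need. The bucket, paths, and columns are illustrative, and writing to s3:// paths requires the s3fs package.

```python
import pandas as pd  # assumes pandas, pyarrow, and s3fs are installed

# Illustrative event data; in practice this would be your collected telemetry.
events = pd.DataFrame({
    "event_date": ["2024-05-01", "2024-05-01", "2024-05-02"],
    "customer_id": ["a1", "b2", "a1"],
    "action": ["view", "purchase", "view"],
})

# Costly pattern: one big JSON blob -- every query downloads everything.
events.to_json("s3://example-bucket/events/all.json", orient="records")

# Cheaper pattern: columnar and partitioned -- query engines prune both
# partitions and columns, so you pay to scan only a fraction of the data.
events.to_parquet(
    "s3://example-bucket/events/",
    partition_cols=["event_date"],
    engine="pyarrow",
)
```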
That goes for tools as well. We were talking about RAG and agents; you can create a lot of tools for the model to access many different things. Great, fantastic, but does it have to? Does the model need them to do the task? Just a quick tip for those listening who are working with agents: when you start adding four or more tools to the model, there is a diminishing return on the quality of tool selection, and there is a tendency for the model to use the first tool presented to it. If you have four or more, that fifth tool is not being utilized, and you are keeping that code there, and paying for it, for nothing. All these architectures need to be revisited. That’s something I love to do. I love to drive it with my customers and say, “What is it that you’re trying to accomplish? How do we architect this to fit your current estate, reduce your bill, and put you on a path where you’re able to build for the future?” It’s a long answer, because there is no real silver bullet to all of this, but, Jobi, you experience that too; you’re in the database space.
Jobi George: Yeah, I can certainly add to that. I break it down into two parts: one is development and developers, the other is the deployment side. If you look at the developer and development side of things, everybody wants to build ChatGPT; everybody wants to build the most complex generative AI applications. Choosing the tool, looking at the use cases, and quantifying them by segment so you know what the ROI is going to look like matters, because nothing comes for free, as Eduardo said. You can build applications as fancy as you want, but if you’re doing an e-commerce application, trying to put out a personalized shopping cart, and you have a certain SLA on QPS for that serving, you cannot use every aspect of generative AI to bring that application to life.
So you have to make those trade-offs; the standard practices that apply across any workload apply here, too. Costs do add up very fast, in the sense that this technology is still evolving; it’s at that stage of maturation where a lot of innovation is happening and optimization is still arriving as part of the natural maturation curve. We have started focusing a lot on cost versus performance. Among the features we have introduced are things like multi-tenancy, where you can break your workload into multiple tenants; then you can offload the quiet tenants, all the way down to disk, and save cost. Being conscious of the cost of these deployment models is super critical.
Coming back to the developer side: we built it on Kubernetes, on AWS, basically to provide choice. Customers can start on a multi-tenant SaaS offering, a $25-a-month type of offering, and build an experiment. As they go bigger, they can move to more of a managed hosting model: we are able to take the same environment and move them into a single-tenant offering from our end, or they can start with a bring-your-own-cloud offering and experiment with that. The funny thing is that as people deploy these use cases, the cost of, and their ability to manage, these deployments isn’t there, because, again, it’s an evolving ecosystem, so they come back to us in some form to host and run it for them, because that’s a lot more cost-efficient than trying to figure it out and run it themselves. A lot of these pieces are happening because of the state the market is in.
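Here is a minimal sketch of the multi-tenancy lever Jobi mentions, using the Weaviate Python client (v4). The collection and tenant names are placeholders, and the exact activity-status names have shifted across Weaviate versions, so treat this as an assumption to verify against the client you run.

```python
import weaviate
from weaviate.classes.config import Configure
from weaviate.classes.tenants import Tenant, TenantActivityStatus

client = weaviate.connect_to_local()

# Create a collection with multi-tenancy enabled; each tenant gets its own shard.
docs = client.collections.create(
    "CustomerDocs",  # hypothetical collection
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
)
docs.tenants.create([Tenant(name="acme"), Tenant(name="globex")])

# Offload a quiet tenant: its shard leaves memory, trimming the bill while
# the data stays durable and can be reactivated on demand.
docs.tenants.update([
    Tenant(name="globex", activity_status=TenantActivityStatus.INACTIVE)
])

client.close()
```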
Mitch Ashley: That’s a really good point, because oftentimes the architectural decisions you make, not just the services you choose, can create very high-consumption applications, particularly in AI, without you realizing you’re racking up a lot of cost. That’s of course the fastest way to get your second project canceled: have the first one go way, way out of control. You want to think about those costs upfront. Even if you don’t know what they are yet, understand what’s driving them, and maybe you can make adjustments in your design and architecture along the way to better utilize the money you’ll be spending to operate these kinds of applications.
Eduardo Mota: I think that’s where the cloud gives another benefit: all these managed services where you don’t have to spend human hours setting up infrastructure, but can instead start getting a sense of how much things are going to cost at a smaller scale. Then you can start saying, “If I send this prompt and get this output, this is how much it’s going to cost me,” extrapolating from there and putting thresholds in place.
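A back-of-envelope sketch of that extrapolate-and-threshold advice. The per-token prices below are placeholders, not real rates; check your provider’s current price sheet before relying on numbers like these.

```python
# Illustrative per-1K-token prices in USD -- placeholders, not real rates.
PRICE_PER_1K_INPUT = 0.00025
PRICE_PER_1K_OUTPUT = 0.00125

def monthly_cost(requests_per_day: int, in_tokens: int, out_tokens: int) -> float:
    """Extrapolate a monthly bill from per-request token counts measured at small scale."""
    per_request = (in_tokens / 1000) * PRICE_PER_1K_INPUT \
                + (out_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return requests_per_day * 30 * per_request

estimate = monthly_cost(requests_per_day=50_000, in_tokens=800, out_tokens=300)
print(f"~${estimate:,.2f}/month")  # set a budget alert or cap below this threshold
```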
Mitch Ashley: Let’s talk a little bit about something I mentioned earlier from my own experience with new technologies: it’s fun to learn it yourself, but that isn’t always the most effective way to get it done. You make a lot of the same mistakes other people have already made, and you can gain those learnings from working with others instead. Let’s talk about the value of working with organizations like yours, people who have been down this path multiple times, in different scenarios and different kinds of applications, and how that can help a new customer not only deliver their project on time and successfully, but accelerate their own learning.
Eduardo Mota: Absolutely. I’ve been in the trenches. I was a DevOps engineer before, trying to figure out AWS and find the most optimized way to use it, spending hours going through documentation and, at the end of the day, wondering, am I doing this right? That’s what I love about working at DoiT: we are a group of 300 engineers across the globe with expertise in everything around AWS. We collaborate with one another to get the customer the right answer, understand the requirements and what the customer is trying to solve and build for, cut out all the noise of what’s not necessary, and hyper-focus on the best architecture you can build today, one that will grow with your business and keep costs in balance. That helps the customer reinvest those savings back into the cloud and into growing the business. For us, DoiT is an extension of every team, every engineering team, every organization.
Jobi George: That’s true, and I can add to that. We are creating a new distributed database, and we are at an early stage where a lot of experimentation is going on, with much of it now moving into production. There are a lot of moving parts at this point, so having help from AWS and others, building the database the right way and economizing wherever we can, optimizing performance and cost for the customer, is super critical. As we were saying before, costs start adding up pretty fast, and being able to go under the hood and find out how a particular optimization at the Kubernetes level, or for S3, will get you some of those cost advantages is critical. We also do things like, based on your QPS, your expected query speeds and so on, finding the cheapest machine configuration for you to run.
For example, we run a lot of our production workloads for our customers on Graviton. When people talk about generative AI, they talk about GPUs, but we are seeing a lot of the action on the low-cost side of things, because cost is a big factor, and as more and more deployments happen, it’ll become a bigger and bigger one. What the performance and cost curves are going to look like, and how you deploy things in production, becomes very, very important.
Mitch Ashley: Very good. I would just add my general advice in this area: things are changing so fast, with so many things being introduced seemingly on a daily basis, that it’s tough to say what the best technology to use is, because it’s changing that rapidly. I think you’re on a better track by picking the right partners to work with, because the technology will change, our understanding will evolve, and our learning will certainly increase as the technology changes, too. So you want to pick the right horses to ride, if you will, not just necessarily the best technology, because there are going to be a lot of really good things, not just available today but coming out real soon. Just open up your browser the next day and you’ll probably see a new announcement.
Well, let’s wrap up. For folks who are interested and want to find out more, how can you help them on the journey they’re on, whether they’re already down the path on an app or looking to get started and wanting to accelerate their own learning? What’s a good way for folks to engage you and find out more?
Eduardo Mota: For DoiT, you can go to doit.com. We have a wide range of services available, and I’ll point out the GenAI Accelerator, my favorite program that we offer; I hope you’ll check it out to get started on your GenAI journey.
Jobi George: I’ll say, come to Weaviate.io, where there are a lot of ways in and a lot of ways to get information: documentation, Slack forums, and open source. You can go to GitHub and play with the code if you’re that type. Or, if you want to just get started, go to our Weaviate Cloud, where you can get a 15-day free trial, run your sandboxes, provision your clusters, and get going in a few minutes. So test it out.
Mitch Ashley: Get started, jump in. AI is a contact sport; get engaged. Well, thanks to you both, Eduardo and Jobi, it’s been a pleasure talking with you today. And folks, please check out their respective websites. Thanks to both your companies, DoiT and Weaviate; we appreciate having you on DevOps Dialogues. And thanks to our listeners; we wish you the best on your journey into generative AI and applications and beyond. Take care, everybody.