Video: Fireside Chat with LaunchDarkly, AWS & Gravity9: Scaling AI and Feature Experiments | Duration: 45:24 | Chapters: Introduction and Overview (1:03), Team Introductions (3:17), AI Risks and Impacts (6:29), Choosing AI Models (10:37), Bedrock Evaluation Features (14:20), Experimentation and Speed (21:03), Building Experimental Culture (30:32), Experimentation and Deployment (34:20), AI Configuration Features (35:42), Enterprise AI Adoption (37:09), Building Trust in AI (39:10), AI Chaos Experimentation (41:25), Closing Thoughts and Thanks (43:45)
Transcript for "Fireside Chat with LaunchDarkly, AWS & Gravity9: Scaling AI and Feature Experiments":
Hey, everyone. I wanted to say good evening to those in the European time zone, and good morning to anybody calling in from the States. I saw at least one person in the chat say hello from Illinois, so hello again; I'm local with you. My name is Scott Schindelbecker. I'm an experimentation solutions specialist here at LaunchDarkly, and I'll be the host for today's fireside chat. A few quick housekeeping notes to begin with: let me share the agenda and the overall plan of what we're looking to cover, and then we'll get going. We'll start by introducing ourselves; I've got some amazing colleagues here who are going to share their experience with you. I'll kick us off by telling a little bit of an AI horror story to get our brains going on the space we'll be talking about today, which is the burgeoning world of generative AI and how experimentation ties in with it. Then we'll talk to Ayan about scaling generative AI in AWS Bedrock, Josh will talk to us about scaling experimentation, and at the end we'll save some room for Q&A. That's the general plan.

A little more introduction of myself: like I said, my name is Scott Schindelbecker, and I'm in the Chicago, Illinois area right now. I've spent about twenty-five years in professional IT, and for the last six or so I've led experimentation programs, both in the private sector and here at LaunchDarkly, where I've been for a little over a year. Ayan, I'll have you introduce yourself.

Sure. My name is Ayan Ray. I'm a senior generative AI partner solutions architect with AWS, with more than a decade of experience in AI and machine learning.

Awesome. And then Josh.

Hey, nice to meet you. I'm Josh from Gravity9. I'm a product owner who works largely with healthcare, utilities, and insurance. I've been in the product game for about five years, with a background in project management before that. AI is at the heart of quite a lot of what we do, so I'm super excited to talk about that today.

Amazing. Alright, everybody get your questions ready. We'll have some dedicated time at the end for Q&A, but feel free to use the Q&A section throughout; we'll try to monitor it for questions as well. Let's go ahead and get started.

So I'm going to tell you a terrifying tale of AI gone wrong. We see these stories often these days; in fact, my feed tends to have them popping up all the time because I love to read about them. I'm using a stateside story here, an article I ran across a little over a year ago, but these same kinds of problems are the ones facing us today. The article I pulled this from was quoting from a press release.
The press release described a first-of-its-kind plan for responsible artificial intelligence in New York City government, and the intention was to connect business owners and aspiring entrepreneurs to content that would help them start, operate, and grow businesses in New York City. They rolled out a new system that was supposed to crawl through all of New York City's laws and regulations and answer people's questions about the legalities of setting up businesses and what is and isn't feasible. And since this is a story about responsible AI, you can all probably guess what happened next. It went wrong. It started to give answers that weren't correct, and often weren't legal: telling owners they could take a cut of their workers' tips, or telling people they could discriminate against low-income and other individuals in the housing market. That is the opposite of what we want as we roll out all of this new generative AI technology.

So here's one of the things we like to talk about at LaunchDarkly, and I know it's big in the industry as well: artificial intelligence has real-world impact. We're trying to leverage AI for the positive impacts, all the operational efficiencies and the gains in velocity. But the negative consequences can be just as bad or worse. Reputational damage is easily associated with these kinds of problems when your generative AI goes wrong: if I read that story, I'm probably not going to trust anything the New York City government puts out in the generative AI space for a while afterward, and I imagine the public feels the same. Broken customer trust is another: it's hard for a company to say, "I'm sorry our chatbot told you this was the process, but the process is actually over here, and it's going to cost you five times as much." That's a bad customer experience that leads to churn. Financial loss is obviously huge, and one piece of financial loss I like to talk about isn't just something the LLM did by accident. It can be computational cost, because a small change to a temperature or a prompt setting can have unintended consequences, and that can become a spiraling cost problem too; that will get you in just as much trouble as losing the company money any other way. There are obviously legal ramifications as well: if I'm building a chatbot for my insurance company and it tells a customer it's okay to commit some light insurance fraud, that's probably bad for me and my company. And at the end of the day, operational disruptions are going to become a more prevalent part of this space.
Because as we build more complicated processes, with agents and RAG pipelines feeding into each other, and as some of these processes start to take the place of jobs or processes that currently work, any downtime or problem in any of those agents or systems immediately becomes a problem for your operational efficiency and effectiveness. All of these things give us a real reason to be careful about our development and execution of generative AI. But I don't think that means we have to be less innovative. We have plenty of opportunities, and plenty of tools coming out in the marketplace every day, that are going to help us solve for this new world of generative AI. And experimentation is going to play a major role, because experimentation is how we start to see the actual customer impact of these new LLMs and agents.

So let me take a moment, stop sharing my screen, and open this up. I've got some questions here; we'll start with Ayan and go from there. Alright, let's get going. How are you doing today, Ayan?

Doing good. How are you, Scott?

I am doing well. One of the first things I want to ask you about: every time I open a new browser tab or go into Bedrock, it feels like there are five or ten new models I could possibly choose from. How do you see teams navigating this pace of change while still making smart, scalable choices?

That's actually very interesting, because just yesterday I was helping one of our customers, a product manager at a healthcare startup. She was trying to build an AI assistant that could summarize lab reports and get insights out of them. She was very excited about the power of generative AI, but she asked me the same question: there are so many foundation models to choose from, how do I pick one? It's like walking into a gelato store with twelve amazing flavors when you don't know what you're in the mood for. That happens with my wife all the time, by the way; I am the worst at that. The bigger the menu, the longer it takes me to figure out what I want to eat.

So I asked her what this assistant is supposed to do really well. She took a pause, reflected on it, and came back with: first, the assistant should be very good at summarizing documents. It should be able to translate documents into simple language so that nontechnical people can understand them. Next, it should avoid hallucinations, like the example you shared, Scott; it's so important to avoid hallucinations. And it should respond in near real time; the responses should be fast. Now that's a well-defined goal. So that's the mantra: you choose a model based not on hype, but on fit for purpose. You don't buy a sports car to drive off-road. And that's also one of the reasons customers choose Amazon Bedrock: you need access to many foundation models.
With Amazon Bedrock, you can access foundation models from Anthropic, Meta, Mistral, our own Amazon Nova models, and many more through a single API endpoint. That gives users a wide choice to experiment with. So that's what I'm seeing: a lot of customers use Amazon Bedrock to play around, experiment, and choose the foundation models to build their generative AI applications.

Yeah, I know for myself, when I first got my hands on LaunchDarkly's new AI Configs product, having never implemented anything in GenAI before, the first thing I did was connect to AWS Bedrock, because I had a whole list of models to choose from. I started with some of the baseline Nova models, and they did great things. So that makes a lot of sense: go to trusted places, and then be very fit-for-purpose about the choices you make among the available models. Awesome. So a lot of customers are probably playing around with generative AI these days. Do you have any examples of how customers are using Bedrock to experiment with new models and prompts and figure out what actually delivers value, rather than just playing around with fun chatbots, which is mostly what I've done?

Yeah. Like I said, in AWS you get a choice of high-performing foundation models from leading AI companies, but you also get a broad set of other capabilities that help you build and scale your generative AI applications. One feature I'd like to highlight is Amazon Bedrock Evaluations, where you can test models side by side. You can change the models while keeping the prompt, data, and inference configuration parameters the same, or you can do it the other way around: keep the model the same and try to improve quality against the goal you've set for the assistant by changing the prompt, the data, or the inference configuration parameters. And that all happens through a very intuitive interface. Also, one interesting thing: customers mostly use these foundation models in conjunction with their own data, a technique known as retrieval-augmented generation, or RAG. So you also have the capability to evaluate your RAG applications. You can evaluate the retrieval alone, whether it retrieves the right context from your knowledge base, and you can also evaluate the overall response, which is a combination of retrieval and generation. A lot of customers these days are using Amazon Bedrock Evaluations to evaluate their generative AI applications.

Cool. It sounds like LaunchDarkly's AI Configs product is playing in that same space, where we want to help people work fast and increase velocity, but also protect against the risk. The ability to set up different models, test them against each other, and run actual experiments is something that's truly going to be necessary as we go along. So it's really cool to hear Bedrock already building that into its product structure as well.
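For readers who want to see what that single-endpoint, side-by-side pattern looks like in code, here is a minimal sketch, not an official AWS sample: it assumes boto3 credentials are configured and that the illustrative model IDs are enabled in your account, and it holds the prompt and inference settings constant while swapping only the model.

```python
# A minimal sketch: calling two Bedrock models with the same prompt and
# inference settings via the single Converse API, so their answers can be
# compared side by side. Model IDs below are illustrative.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

prompt = "Summarize this lab report in plain language: ..."  # placeholder input
models = [
    "amazon.nova-lite-v1:0",
    "anthropic.claude-3-haiku-20240307-v1:0",
]

for model_id in models:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        # Holding inference parameters constant isolates the model as the variable.
        inferenceConfig={"temperature": 0.2, "maxTokens": 512},
    )
    print(model_id, "->", response["output"]["message"]["content"][0]["text"])
```

The same loop could instead hold the model constant and vary the prompt or temperature, which is the other comparison direction Ayan describes.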
When you say evaluations, what are you evaluating against? Do you pick a set of metrics to measure across a set of models? Can you tell me a little more about how that works?

Sure. We have a set of metrics in the Bedrock Evaluations feature, and they span four key areas. Number one is quality assessment, with metrics like correctness, completeness, and faithfulness. The next area is user experience, with metrics like helpfulness, coherence, and relevance. The third area is instruction compliance, where we evaluate whether the generative AI application is able to execute or follow the instructions, and whether it follows a professional style. And the last area, but in my opinion the most important one, is safety monitoring: we evaluate against harmfulness, stereotyping, and refusal handling. So that's what we test against. Most of these metrics can be evaluated with or without ground truth, and that matters because most organizations don't have ground truth; apart from one or two metrics, most of them also work without it. That's the basic flow of evaluation.

That makes total sense. Awesome. I was going to ask if there's a Bedrock feature you wish more people knew about and used, something flying under the radar. Is this that thing, or is there something else out there?

I wish more people knew about evaluation, but I'll also highlight one feature within Bedrock Evaluations. I've talked a lot about evaluation, but evaluating generative AI applications isn't easy, because the output of the models is stochastic: two outputs might be similar in meaning but textually very different. If you try to evaluate the response of a generative AI application against your ground truth with an apples-to-apples comparison, that strategy might not work, because the response might be semantically similar but not textually similar. That's the reason we have a feature called LLM-as-a-judge, where you can use one of the curated high-performing foundation models available in Amazon Bedrock to evaluate the output of your generative AI or RAG applications. That's one of the key features we launched at last re:Invent, and I'd encourage people to use it more. I was also one of the architects behind that feature, so shameless plug.

That's awesome. I was just talking with one of our AI Configs specialists last week, and he had implemented basically what you're describing: an evaluator model that would check the accuracy of a RAG application he had built. It was taking a small percentage of responses and running them through an experiment to determine whether the model was hallucinating or still providing accurate answers. I thought that was a mind-blowing thing, and yet so simple. Really cool. Awesome, thank you. Alright, I think I've taken up the time I've got with you.
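To make the LLM-as-a-judge idea concrete, here is a rough sketch of the pattern, not the managed Bedrock Evaluations API itself: a second model grades a candidate answer for semantic faithfulness against the retrieved context, so textually different but equivalent answers can still pass. The judge model ID and the rubric wording are illustrative.

```python
# A hand-rolled sketch of the LLM-as-a-judge pattern: ask a judge model
# whether an answer is supported by its retrieved context.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

JUDGE_MODEL = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # illustrative judge choice

def judge_faithfulness(question: str, context: str, answer: str) -> dict:
    """Rate how faithful the answer is to the context, on a 1-5 scale."""
    rubric = (
        "You are an evaluator. Rate the answer's faithfulness to the context "
        "from 1-5 (5 = every claim is supported). Reply with bare JSON only, "
        'e.g. {"faithfulness": 4, "reason": "..."}.\n\n'
        f"Question: {question}\nContext: {context}\nAnswer: {answer}"
    )
    response = bedrock.converse(
        modelId=JUDGE_MODEL,
        messages=[{"role": "user", "content": [{"text": rubric}]}],
        inferenceConfig={"temperature": 0.0},  # keep the judge as deterministic as possible
    )
    text = response["output"]["message"]["content"][0]["text"]
    return json.loads(text)  # sketch assumes bare JSON; real code should validate
```

Run on a small sample of production responses, as in the anecdote above, this kind of judge can feed an experiment metric for hallucination rate.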
Of course, as I talk to Josh, feel free to jump in and add flavor as you want, especially at the end when we answer questions. But Josh, let me pivot over to you and Gravity9. You're at the forefront of helping companies that are attempting to implement these things and get them in front of customers, so you're working with teams that are under pressure to ship fast all the time. And there's this misconception, or at least this notion, that you have to slow down in order to start experimenting. How do you work with teams under that pressure to deliver fast and convince them that the rigor of an experiment is worth any "additional time"? I'll tell you why I'm throwing air quotes at that in a minute.

Absolutely, and it comes up all the time, 100% of the time. What I always say is that it's definitely a case of slowing down to speed up, and that's probably the hardest thing about experimentation and validating experiments. At the start it can seem like a lot of navel-gazing around a Miro board to identify those assumptions early on, and nobody wants an extra meeting, especially if it results in a Miro board no one's going to come back to, because then it's just an artifact. It's also really difficult because assumptions themselves are kind of a bad word. In our field particularly there's a lot of imposter syndrome, in tech and especially in product, and admitting that you've assumed something is almost admitting that you don't know everything, which feels bad; you feel like you've exposed yourself as a massive fraud, which is awful. But everybody makes assumptions. Even if you've done a hundred user interviews and have reams and reams of data, eventually you're going to have to make a bet that something will work. And that's why it's super important to slow down, have these conversations, and draw a straight line between your assumptions, your hypothesis, and your experiments. If you can spend time getting your team's assumptions about the product, the users, and the technical capabilities out in the open, and then identify the riskiest assumptions most in need of testing, then, going back to slowing down, it saves you an unbelievable amount of time later in retroactive pivoting and development time spent chasing the wrong features for the wrong users at the wrong time. Ultimately you speed up as a compounding effect, because your team more openly admits when they're making a leap of faith, and you give them the confidence to voice their assumptions rather than trying to bluff their way through incomplete data for fear of being exposed. I'll use the air quotes this time. If anything, you make them more confident to say, "I don't know anything about this," or "I don't know enough, but let's go out there and try an experiment," or "let's put that on the board." I'd be more concerned about a team that didn't have any assumptions about the success of their product. Does that make sense?

It makes a lot of sense to me. I'll say the reason I throw air quotes around "slowing down" is because I'm not going to pretend there isn't a slowdown. There is.
But oftentimes I find that it's at the beginning. As your program gets up and running, as people start to digest the change in thinking and the change in process, that first piece can be a little slower; you might need a little more time. But once you've got the process, it actually speeds things up, because you're testing and rolling out the things that actually cause impact, prioritizing based on what drives business value, and deprioritizing what doesn't. Does that sound like a little of what you've seen as well?

Yeah, I agree. The effect compounds on itself. It feels clunky at first and takes longer to set up initially, but when you embed this within teams, they speed up and increase their time to value. There are exercises you can do to speed that process up. David Bland has a great one that I love: you take a four-quadrant grid, and in the top right of that grid go your riskiest assumptions, the ones you know the least about. You sit with your team for at least ten minutes or half an hour a week, get those on the board, and visualize them. The ones in the top right corner are the ones where you say, okay, cool, how can we experiment around that?

Right, that's really good: focus on the high-priority, high-unknown, high-risk areas first, and shore up those assumptions with learnings. So one thing that seems important, and that I'm sure Gravity9 runs into all the time, is that you work at the intersection where engineering, data teams, AI, and product all come together. What does it look like when you come into a company and help it scale its experimentation program?

That's a great question, and a key thing here is democratizing experimentation and democratizing assumptions mapping, because keeping that conversation open is super critical. In a lot of cases, what I've seen is that the optimization team can be totally separate from the product team, and often product teams don't have any visibility into the experiments being run, or any A/B testing, until the code is somewhere in production and controlled by somebody else. But experimentation needs to be front and center of product development. The more product teams feel empowered to run experiments without going through layers and layers of governance and steering routes, the quicker they can make decisions and deliver that value. So what I encourage teams to do, whether you're planning the next sprint, the next two sprints, or a quarter in SAFe or other scaled-agile ways, is to drop experiment placeholders into your sprint plan and hold yourself and your team accountable: you know they're coming up, you know you have to experiment, you do the assumptions-mapping exercise, and you make it part of your delivery cadence. And one key source of resistance to experiments is that people think they need to be huge, sprawling feature releases that impact, say, 50% of all users. They don't. They can be stubs.
They can be low-fidelity prototypes, fake-door tests, anything that gives you the most learning with the smallest audience. "A/B test" often implies 50/50, when in fact it can be 99 to 1. So long as that 1% delivers the most learning for you, you've made an impact and you're able to make a decision you couldn't make before. And as we mentioned, AI is so important at this stage, and I really feel it's going to matter in experiment culture, because all of us in tech have been talking about it for years, but there are still millions of users who are just getting their heads around the concept. We've talked about RAG and agentic workflows and everything, but experiments are critical to understanding: does that resonate with users? Does it actually deliver value? Do they care? We really need to be validating our assumptions and continuously experimenting here. What we find with a lot of teams now is: if you deploy AI in this space, does it deliver any value? Do users care that it's AI? Do they want to see that it's AI in the first place? Because you might find users are actually more inclined to use a model if they don't know what's going on behind the scenes.

Right. The idea that the information coming back is the same whether it's produced by a model or a human being should be all the customer cares about at the end of the day. But we're still in a place where the population at large is building trust in AI, and rightfully so, because, as we talked about, hallucinations can happen. So it makes sense that there may be circumstances where you don't really say whether this is an agent or a person. I imagine that's a good experiment to run as well: telling somebody versus not telling somebody that this is model-generated, and seeing whether that leads to decreases or increases in purchases.

Absolutely. The example I'd drop in there is Apple with Apple Intelligence. There's tons of great machine learning Apple does that it doesn't publicize as AI, but the features they've tagged as being powered by Apple Intelligence seem to have resonated less with their users. It's the stuff under the hood, with the Photos app and Notes and things like that, all the optimization features, that seems more powerful and makes the iPhone a valuable proposition.

I hadn't even considered that, and it's actually really true. All the geotagging that happens to my photos under the covers is something I use all the time, and I don't think I've even tried Apple Intelligence on my phone yet. Good point. I want to ask maybe one more question, and then I'll move on to make sure we have time for our audience. I think I'll go with this one. You and I often get brought into meetings where a company says, "I want to build a culture of experimentation." I get into that meeting all the time, and I find almost everybody has a different definition. They all include most of the same things, but everyone defines a true culture of experimentation a bit differently.
In your opinion, what does that actually mean or look like in a modern company or product team? What is a culture of experimentation?

Allowing time for those experiments is absolutely key: building them into your delivery cadence, and making it almost a corporate culture that we will test often and experiment frequently. It's part of the continuous discovery framework from Teresa Torres; she's written a great book about it that I really like. Keep yourself honest, talk to your customers at least once a week, run experiments as often as you can, and map your assumptions as frequently as possible. Empowering your product team, at least to start with, or the wider team, to stand up and admit when they don't know something is probably the most powerful first step you can take. Instead of treating "I don't know, I've made an assumption" as an accusation, answer someone who says "I've made an assumption on this" with: cool, okay, how do we prove that? If you can't get the data at this stage, how do we take action to generate data to validate this decision? A lot of product teams wait for that data before making a decision, but you won't get it without action. So pull the plaster off, get an experiment out there, make it lightweight, make it a stub. It's easier these days with vibe coding: you can get something in front of a very small number of users that can at least validate your hypothesis. And like I said earlier, we think A/B testing has to involve half of your user population, whereas it shouldn't. So instill the question: what is the smallest number of users I can expose to get the maximum amount of learning? That is so powerful for de-risking experimentation and seeing it as a natural step in product development, almost as important as discovery, retrospectives, and working with your teams; just build it into the delivery cadence. And this is going to become way more important now that AI is here, because truthfully, none of us knows everything about AI. It's more important now than ever to validate our assumptions, because all of us are making assumptions about where AI is going to be effective. I guarantee all of us will be slightly wrong in a couple of years' time; we'll have made some bets that didn't land and some that did. It's important to experiment with them and get them out in front of the general public to prove: does this work?

That's really good. My answer tends to be along those lines: it's whatever you need to do to create that safe space where people, as you said, can admit "I'm making an assumption here" or "I want to validate this assumption with data," and giving them the framework, which is the experiment, to actually do data-driven decisioning, to gather that information and make those decisions. If you don't have the safe space, everything goes back to the HiPPO, the highest-paid person's opinion, being the driver of everything. Awesome. Excellent. Okay, I want to take a few minutes here; I know there are some questions in the chat.
So let me see if I can jump in on some of these questions. Here's one: "How can AWS or LaunchDarkly support the activation or deactivation of tools or resources for AI agents without changing or redeploying code?" Ayan, that's probably for you to answer regarding some of the Bedrock capabilities, and I can talk a little about what LaunchDarkly is doing in this space.

Sure. So AWS mostly supports offline evaluations, and I think LaunchDarkly is more about online experimentation, tracking the experiments. We suggest you experiment and evaluate your generative AI applications across the metrics we have before moving the code to production. You test it out, and only once you're sure do you move the code to production. And when the code is in production, I'd encourage you to use LaunchDarkly to continuously experiment and see how your generative AI application is responding.

I like that aspect of it: there's continuous evaluation that happens throughout the development, deployment, and release cycle. It's not a one-time thing you do either before or after you release; it has to happen throughout. From a LaunchDarkly perspective, I can briefly mention the AI Configs product I brought up earlier. If you know LaunchDarkly and the idea of feature flagging, and what feature flags can do in terms of decoupling the deployment of code from the release of code: our AI Configs product has a similar kind of flag template built in for working with generative AI models. You can create a flag with multiple variations that change different models, different settings, different prompts or prompt techniques, then set up and run A/B tests or experiments across them, measuring both technical and business value. Hopefully that answers the question. If you haven't seen the AI Configs product, I highly recommend going out and taking a look at it.
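As a minimal sketch of that decoupling idea, here is what flag-driven model selection might look like with LaunchDarkly's Python server SDK. This uses a plain JSON flag rather than the AI Configs-specific SDK, and the flag key and variation payload are hypothetical.

```python
# A sketch: a feature flag picks the model and prompt at runtime, so variations
# can be switched or experimented on without redeploying code.
import ldclient
from ldclient import Context
from ldclient.config import Config

ldclient.set_config(Config("sdk-key-goes-here"))  # your server-side SDK key
client = ldclient.get()

context = Context.builder("user-123").kind("user").build()

# Hypothetical JSON flag whose variations each carry a model id, prompt, and
# temperature; this default is served if the flag can't be evaluated.
default = {
    "model": "amazon.nova-lite-v1:0",
    "temperature": 0.2,
    "prompt": "You are a helpful assistant.",
}
ai_config = client.variation("ai-assistant-config", context, default)

# Hand the served variation to your Bedrock call; flipping the flag, or letting
# an experiment assign 1% of traffic to a new variation, changes behavior live.
print(ai_config["model"], ai_config["temperature"])
```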
Let me see, other questions. I have another one here, from Brad Johnson: "Enterprise production adoption of AI can have steep barriers. What concrete advice do you have to get AI into use in critical business applications?" This is a really good one; I think both of you can answer.

I can go first, Josh, if that works. So, a couple of things. As we mentioned: evaluate, evaluate, evaluate, across your development life cycle and also after deploying to production. The other very important thing is to have guardrails for your AI application, because you just don't want to open Pandora's box; you want the right controls on the generative AI application. We have Amazon Bedrock Guardrails, which you can use to say: I don't want my AI application to emit sensitive PII, or I don't want the generative AI application to give a response that can harm or impact a particular section of the community. For that we have Amazon Bedrock Guardrails, and I definitely encourage people to make the best use of it, because that's extremely important before moving into production. The other thing is that you might also want to provision sufficient throughput, because you don't want your users to experience throttling in production. That's another feature we have, called Provisioned Throughput in Amazon Bedrock, where you can purchase provisioned throughput to reserve a certain capacity, so that your generative AI applications and the foundation models perform more reliably.

Awesome, you hit those two. Amazing.

I think we'd largely take the same approach. You cannot take a scattershot approach with an enterprise application and AI. You need to do some retrospection and look at the largest points of friction within the application, but then also zoom in and ask: is there enough data in this space, is there enough for the AI to grab on to, to generate value and impact for users? Because if you throw AI into a gap where it hasn't got data whose outcomes you can trust, you may end up making your problem worse. So you really need to talk to your users, get in front of the application, see what they're trying to achieve, and ask: is a generated outcome here going to produce anything meaningful for my users, and can I trust its output? And as Ayan rightly said: is that output going to be safe? Is it going to give away PII? Is it going to cause any issues there? Take a targeted approach, talk to your users, and use AI almost as a scalpel, in the right place at the right time.

That's really good. I was going to key in first on the idea of trust, because I feel that's where most large companies' leadership is right now. When I talk to people about AI, they say, "We're trying all of this stuff all over the place in the lower environments, but we haven't taken anything to production yet," and the constant refrain is around trust in the data. I think both of you have shared some really good tips on how to build that trust as you develop, execute, and roll out your generative AI capabilities, so that it's not "turn it on, throw every single customer at it, and see what happens," which businesses will probably never do, but rather a small, data-driven, iterative approach that develops trust and makes sure you're solving problems customers want solved. Does that sound right?

Absolutely. Awesome. Amazing. I'm going to try to get maybe one more question in here. This one probably falls on you, Josh, and maybe a little on me: "What are a few key best practices for conducting chaos experiments alongside CI/CD for AI services?"

Oh, wow. Chaos experimenting in AI? Aren't chaos and AI synonymous?

It depends on what we mean by chaos experimentation, doesn't it? Because AI is essentially pattern recognition, bringing order to chaos. If we're looking at chaos experimentation as simulating disaster recovery, then of course you can ask: can we knock down the application, and does that affect our AI responses? Can we simulate an outage of some of the data sources the AI pulls from? Say we have an agentic workflow that uses some fabric data or data from another API. If I take that out, does it affect the output of my LLM in a meaningful way that damages the outcome for my users? And if it does, what is the business continuity for that? What do we do in those scenarios? Does the user even still get to interact with the AI? Can the business still function without it? Those are the sorts of things you need to look into. So if we're looking at chaos experimentation in a business-resilience sort of capacity, then it is super important to have the two coexist to make sure you've got failover. And, actually, I know we're almost out of time, but the ChatGPT outage from the other day makes this super relevant: can we switch models at the drop of a hat? Because we may lose connection to the API we need to generate the outputs for our users, and if that falls over and your entire workflow is contingent on it, you need a failover.
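A sketch of that failover idea, under the assumption that both models are reachable through the same Bedrock Converse API and with illustrative model IDs: try the primary model, and on a service error fall back to a secondary one so the workflow keeps functioning during an outage.

```python
# A minimal failover sketch: primary model first, fallback on service errors.
import boto3
from botocore.exceptions import ClientError

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

PRIMARY = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # illustrative IDs
FALLBACK = "amazon.nova-pro-v1:0"

def converse_with_failover(prompt: str) -> str:
    for model_id in (PRIMARY, FALLBACK):
        try:
            response = bedrock.converse(
                modelId=model_id,
                messages=[{"role": "user", "content": [{"text": prompt}]}],
            )
            return response["output"]["message"]["content"][0]["text"]
        except ClientError:
            # Primary unavailable or throttled; a chaos experiment would inject
            # this failure deliberately to verify the fallback path works.
            continue
    raise RuntimeError("All models failed; trigger business-continuity path")
```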
That's right. That gets into that operational piece I was talking about earlier: as these systems get more complex, and more LLMs depend on the output of previous ones and on the data you're pulling in, an outage or downtime brings everything down. Makes sense. Awesome. Alright, I think we're close to time, so I want to take the last minute or two we have to thank everybody for coming and for listening. If we didn't get to your questions, we can follow up after the fact; I think we've got a note of all the questions here. I want to thank Ayan and Josh so much for your time and your wisdom. From my perspective as an experimenter, I'm thinking about the impacts of these AI agents and systems on the customer experience, and a lot of it seems scary to me right now: a big black box full of scary things I can't see into. I think you've really helped give me a little more confidence that we're not necessarily on the doorstep of the Terminator revolution, about to lose our lives to AI, or our jobs, maybe; we'll see. Anyway, thank you so much to everybody who jumped on, and have a great evening or rest of your day.

Have a great one.

Have a great one, guys. Thank you so much.

Thank you.