From the course: Agentic AI Design Patterns for GenAI and Predictive AI
Escalating evaluation
- Sometimes, a request or prompt sent to an AI system is relatively simple and straightforward. But if the AI system's model is complex and capable of performing a wide range of processing tasks that go well beyond what this one particular prompt needs, then using that AI system can be wasteful and unnecessarily slow. Here we have a human user issuing a simple prompt to a predictive AI system with a relatively complex model. The processing power, and perhaps also the time, required to handle the simple request is unnecessarily high. The user gets the result they wanted, but at a greater cost than necessary. This may not seem like a big deal when we look at this one simple scenario. So what if the system uses a bit more energy and time to process that request? But imagine the system receiving hundreds or thousands of these simpler requests within a day, or even a few hours. All of that extra, wasteful processing really adds up, and so does the cost, which is especially important if we are paying for resource usage in a cloud environment.

This pattern introduces an architecture in which we have several AI systems available, each capable of performing similar tasks, but some designed with simpler models and others with more complex ones. A special router agent becomes the point of contact for these AI systems. It analyzes a prompt it receives from a user or an application and then determines which of the available AI systems has the model most suitable to process that request. This results in optimal utilization of the available infrastructure and resources.

To accomplish this, the router agent can apply different types of logic. With rule-based logic, the router uses a set of predefined rules to make decisions. Heuristic logic is for when an exact, perfect rule isn't feasible. For example, a heuristic might say that if the input contains medical terms, it's likely a complex case, so skip the simple model and go straight to the specialized one. Context-aware logic incorporates context from the current interaction and from past interactions. For example, a router might note that a user has already gone through the simple model twice without a solution, and it therefore automatically routes the next request to a more complex model. Finally, there's predictive logic. This one is particularly interesting because here the router itself encompasses a small, lightweight model trained to predict the most appropriate model to use based on the input it receives.

Note that there are two ways this pattern can be applied. The first, which is the one we've been focusing on so far, is the smart router approach, whereby the router is smart enough to determine which model to send the request to. Alternatively, we can use a sequential chain approach, whereby the router applies sequential evaluation logic: it submits a given request to a simple model first and then, based on the results, may submit it to one or more complex models in the chain until it receives the results it needs.
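Before walking through a sequential example, here is a minimal Python sketch of what a smart router's decision logic could look like, combining rule-based, heuristic, and context-aware checks. The tier names, keyword list, and thresholds are hypothetical placeholders chosen for illustration, not part of any specific framework.

```python
# Minimal sketch of a smart router agent. The tier names, keyword list, and
# thresholds are hypothetical placeholders, not part of any real library.

def estimate_complexity(prompt: str) -> int:
    """Heuristic scoring: long prompts and domain-specific terms
    (e.g., medical vocabulary) suggest a more complex request."""
    score = 0
    if len(prompt.split()) > 100:
        score += 1
    medical_terms = {"diagnosis", "dosage", "pathology"}  # illustrative only
    if any(term in prompt.lower() for term in medical_terms):
        score += 2
    return score

def route(prompt: str, history: list[str]) -> str:
    """Combine rule-based, heuristic, and context-aware logic to pick a tier."""
    # Context-aware rule: two prior attempts on the simple model -> escalate.
    if history.count("simple") >= 2:
        return "complex"
    score = estimate_complexity(prompt)
    if score == 0:
        return "simple"        # cheap, fast model
    if score == 1:
        return "intermediate"  # mid-sized model
    return "complex"           # specialized, expensive model

# Each returned tier name would map to an actual AI system endpoint.
print(route("Summarize this short note.", history=[]))                # simple
print(route("Explain this pathology report in plain language.", []))  # complex
```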
Let's have a quick look at a simple example of the sequential chain approach. The router agent might first analyze the prompt it receives to determine its overall complexity in relation to the available AI system models. If it's a simple request, it hands it to the AI system with the simple model, but if the result it gets back isn't good enough, it then involves the model with the next higher level of complexity, and so on.

So what could go wrong when applying this pattern? One possible risk with the sequential chain approach we just described is that if the first, simple model misinterprets the prompt, it might return results that lead the more complex models to analyze or process the wrong things, ultimately producing completely flawed output. With the smart router approach, there is always the risk that we don't design the routing logic correctly, which would lead the agent to route requests to the wrong models.
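To make the sequential chain concrete, here is a minimal sketch of that escalation loop. The model functions and the quality gate are placeholders assumed for illustration; a real chain would call actual AI systems and use a task-specific evaluator, and it would still be exposed to the misinterpretation risk just described.

```python
# Minimal sketch of the sequential chain (escalation) variant. The three
# model functions and the quality gate are placeholders standing in for
# real AI systems and a real evaluator.

def simple_model(prompt: str) -> str:
    return f"[simple model answer to: {prompt}]"

def intermediate_model(prompt: str) -> str:
    return f"[intermediate model answer to: {prompt}]"

def complex_model(prompt: str) -> str:
    return f"[complex model answer to: {prompt} with detailed reasoning]"

def good_enough(result: str) -> bool:
    """Placeholder quality gate; a real check might score confidence,
    completeness, or validate the output against the original request."""
    return "detailed" in result  # illustrative criterion only

def escalate(prompt: str) -> str:
    """Try the cheapest model first and only escalate when the result
    fails the quality gate, mirroring the chain described above."""
    result = ""
    for model in (simple_model, intermediate_model, complex_model):
        result = model(prompt)
        if good_enough(result):
            return result
    return result  # best effort from the most capable model

print(escalate("Draft a one-line status update."))
```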