From the course: Agentic AI Design Patterns for GenAI and Predictive AI

Shadow model deployment

- So the proactive retraining pattern we just covered in the previous video provides a very effective way of detecting model drift issues early on. It allows us to automate this detection process so that we don't have to rely only on humans to notice drift issues. It also automates part of the challenger model retraining process, which speeds up the more manual steps of traditional retraining. But for some organizations or requirements, this is just not good enough.

When the results being produced by a champion model with drift issues are used in production environments for important business activity or decision making, it can be dangerous to continue using that model until a replacement challenger model is available. On the other hand, it can also be damaging to shut down the AI system and wait until the new challenger model has been retrained and is ready for deployment. That's one issue, but the greater concern is that there may simply not have been enough time to verify the retrained challenger model to ensure that it does not contain flaws and that it is doing what it's supposed to do. In other words, to ensure that it is actually ready for production use.

The shadow model deployment pattern addresses these concerns by building upon the architecture established by the proactive retraining pattern. What we basically do is build a second AI system for the challenger model, which we deploy in parallel with our main one for the champion model. So in this figure, model version one is the champion, and version two is the challenger model, which we'll refer to as the shadow model in this scenario. We feed the shadow model the exact same production data that we provide to the champion, which means predictive AI system B with the shadow model is treated like a real-world production system on par with predictive AI system A.

So how does this all relate to the proactive retraining architecture? Let's have a look. For the AI system with the shadow model, we introduce a new shadow deployment agent that oversees its performance with the production data. An AI engineer, of course, is also involved to check the model's performance and behavior. The output produced by system B is not actually used for production purposes, but is instead stored and analyzed so that it can be compared with what system A is doing, and so that the shadow model can be verified to ensure that it is ready to become the new champion model when required.

For our AI system with the current champion model, we use the same type of drift monitor agent that we did in the proactive retraining pattern. This agent is important because of its ability to detect when drift begins happening. As soon as the drift monitor agent detects drift, it notifies the shadow deployment agent, which then signals a separate pipeline. This pipeline automates the process of promoting the already validated shadow or challenger model to the production AI system, where it replaces the champion model that had the drift issues. So this is clearly a sophisticated architecture, but one that, when designed correctly, can help solve drift issues quickly and effectively, which can be a huge benefit for many businesses relying on AI systems.
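To make that handoff more concrete, here is a minimal sketch in Python of how the drift monitor's alert might reach the shadow deployment agent and trigger the separate promotion pipeline. The class names, the DriftAlert event, and the promote call are illustrative assumptions for this sketch, not part of the course material.

```python
# Minimal sketch of the drift-detection -> promotion handoff described above.
# All class, field, and method names here are illustrative, not prescriptive.

from dataclasses import dataclass


@dataclass
class DriftAlert:
    """Event emitted by the drift monitor agent when it detects drift."""
    champion_version: str
    drift_score: float
    threshold: float


class PromotionPipeline:
    """Stand-in for the separate pipeline that actually swaps the models."""

    def promote(self, shadow_version: str) -> None:
        print(f"Promoting shadow model {shadow_version} to champion.")


class ShadowDeploymentAgent:
    """Receives drift alerts and signals the promotion pipeline."""

    def __init__(self, promotion_pipeline: PromotionPipeline) -> None:
        # In practice this would wrap a CI/CD trigger (webhook, job API, etc.).
        self.promotion_pipeline = promotion_pipeline

    def on_drift_alert(self, alert: DriftAlert, shadow_version: str) -> None:
        # The agent itself does not redeploy anything; it only decides whether
        # the already-validated shadow model should be promoted.
        if alert.drift_score > alert.threshold:
            self.promotion_pipeline.promote(shadow_version)


# Example: drift is detected in champion v1, so shadow model v2 is promoted.
agent = ShadowDeploymentAgent(PromotionPipeline())
agent.on_drift_alert(DriftAlert("v1", drift_score=0.31, threshold=0.2), "v2")
```

Notice that the agent only signals the pipeline; as discussed next, the agent itself never performs the deployment.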
To get some insight into how all of this is accomplished, let's take a look at the logic commonly found in the shadow deployment agent. A shadow deployment agent will primarily need live testing and validation logic. Despite its name, this agent does not actually contain the logic to put the shadow model into the actual production environment. That is usually handled by a separate continuous integration and delivery (CI/CD) pipeline. The shadow deployment agent is focused on the following types of logic, which are sketched in code at the end of this section. Request duplication logic, whereby the agent intercepts incoming live requests and then duplicates them. Parallel routing logic, where the agent sends the duplicated requests to both the champion and the challenger or shadow model. Prediction logging, where it captures and logs the predictions from both models along with the original request data. And then there's performance comparison logic, where the agent analyzes the logged data to compare the challenger model's performance against the champion model's performance.

As explained, if we set up this type of architecture in our solution, it can be extremely effective at addressing problems with model drift. The greatest drawback, however, is the cost and responsibility of implementing and running the additional computational resources and infrastructure we need to run two models in parallel with production data. We need to ensure that this benefit of risk mitigation really does outweigh the added expense and maintenance of the duplicate system. Otherwise, this type of environment could become unnecessarily burdensome for the enterprise's IT operation.
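Tying those four responsibilities together, here is a minimal sketch of what the shadow deployment agent's live testing and validation logic might look like, this time focusing on the request-handling side. The ShadowDeploymentAgent class, its handle_request and compare_performance methods, and the use of simple accuracy as the comparison metric are illustrative assumptions rather than the course's actual implementation.

```python
# Minimal sketch of the four responsibilities described above: request
# duplication, parallel routing, prediction logging, and performance
# comparison. Model objects and the accuracy metric are illustrative.

from statistics import mean


class ShadowDeploymentAgent:
    def __init__(self, champion, shadow):
        self.champion = champion   # current production model
        self.shadow = shadow       # challenger / shadow model
        self.log = []              # prediction log for both models

    def handle_request(self, request: dict):
        # Request duplication: copy the live request so both models see it.
        duplicate = dict(request)

        # Parallel routing: send the request to champion and shadow alike.
        champion_pred = self.champion.predict(request)
        shadow_pred = self.shadow.predict(duplicate)

        # Prediction logging: keep both predictions with the original input.
        self.log.append(
            {"request": request, "champion": champion_pred, "shadow": shadow_pred}
        )

        # Only the champion's prediction is returned for production use.
        return champion_pred

    def compare_performance(self, get_ground_truth):
        # Performance comparison: once ground truth is available, score both
        # models on the logged traffic (plain accuracy, as an example).
        champion_hits, shadow_hits = [], []
        for entry in self.log:
            truth = get_ground_truth(entry["request"])
            champion_hits.append(entry["champion"] == truth)
            shadow_hits.append(entry["shadow"] == truth)
        return {"champion_accuracy": mean(champion_hits),
                "shadow_accuracy": mean(shadow_hits)}


# Example usage with trivial stand-in models (illustrative only).
class ConstantModel:
    def __init__(self, value):
        self.value = value

    def predict(self, request):
        return self.value


agent = ShadowDeploymentAgent(ConstantModel("approve"), ConstantModel("deny"))
agent.handle_request({"loan_amount": 12000})
print(agent.compare_performance(lambda request: "approve"))
```

In a real deployment, the prediction log would live in durable storage and the comparison would use metrics appropriate to the model's task, but the division of responsibilities stays the same.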
