From the course: Building RAG Solutions with Azure AI Foundry (Formerly Azure AI Studio)

The basics of RAG: Adding custom data to your LLM

Large language models are trained on a large set of data, mainly from the internet. However, they do have limitations. First, if you ask one questions about current events, it will not be able to respond accurately. Each model has a knowledge cutoff, a specific date beyond which it has no training data. The free version of ChatGPT, for example, was trained on data up to January 2022. So it will reply that Queen Elizabeth II is still alive, when we already know she has passed away. Second, if you ask questions about your own domain data, it may also not respond accurately. And worse, it may even make up a fabricated answer. In the given example, the model is providing an answer, but the source links given, when clicked, do not match the actual product.

RAG, in the context of LLMs, is a popular acronym for retrieval-augmented generation. It is the technique of adding data to an LLM from an external data source. This data can be your legal contracts, product manuals, customer information sheets, software designs, and even your code.

A good analogy for RAG is taking an open book exam as a student. In an open book exam, you can refer to any books you have brought into the classroom to answer questions. Imagine your brain as the LLM, but you need to open the books you brought with you to get the information needed to answer the questions.

To further understand RAG, let us discuss the workflow. First, every time a user makes a query, the system retrieves the relevant information that will answer that query from an external data source. Second, the user's query and the retrieved content are augmented, or added together. This becomes the new prompt. Third, the new prompt is fed into the LLM to generate a response.

To simplify, the main difference between RAG and a typical LLM system is that a typical LLM system answers user queries based on its training data set, while RAG provides answers to queries from an external source you have provided. How the relevant data is retrieved based on the user's initial prompt is best explained by discussing two other concepts, tokens and embeddings, in the next chapters.
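To make the three-step workflow concrete, here is a minimal sketch of a RAG pipeline in plain Python. It is not the course's implementation: the retriever is a toy keyword-overlap scorer over a small in-memory corpus (in Azure AI Foundry you would typically use a vector index such as Azure AI Search with an embedding model), the sample documents are invented, and call_llm is a hypothetical placeholder for whatever chat-completion API you deploy.

import re

# Toy corpus standing in for your own domain data (product manuals, contracts, etc.).
CORPUS = [
    "The ACME X200 drill ships with a 2-year limited warranty.",
    "ACME support hours are Monday to Friday, 9am to 5pm Eastern.",
    "The X200 battery takes 45 minutes to fully charge.",
]

def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Step 1 (Retrieve): rank documents by word overlap with the query.
    A real system would use embeddings and a vector search service instead."""
    query_words = set(re.findall(r"\w+", query.lower()))
    def score(doc: str) -> int:
        return len(query_words & set(re.findall(r"\w+", doc.lower())))
    return sorted(corpus, key=score, reverse=True)[:top_k]

def augment(query: str, context: list[str]) -> str:
    """Step 2 (Augment): combine the retrieved content and the user's
    query into a new prompt."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}"
    )

def call_llm(prompt: str) -> str:
    """Step 3 (Generate): hypothetical placeholder for a real chat-completion
    call (for example, an Azure OpenAI deployment). Here it just echoes the
    augmented prompt so the script runs with no external services."""
    return f"[LLM would generate a response from this prompt]\n{prompt}"

query = "How long is the X200 warranty?"
print(call_llm(augment(query, retrieve(query, CORPUS))))

Running this prints the augmented prompt that would be sent to the model; swapping call_llm for a real chat-completion client turns the sketch into a working end-to-end pipeline.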