From the course: Oracle Cloud Infrastructure Generative AI Professional


Process documents

(light instrumental music) - [Instructor] In the previous lesson, we discussed that a RAG pipeline consists of ingestion, retrieval, and generation. Now let us discuss each of these in detail, beginning with ingestion. The first step in ingestion is to load documents. The documents can come from a variety of sources and in multiple formats: PDFs, comma-separated values (CSV) files, HTML, JSON, and many other types. Most LLM frameworks, including LangChain, offer classes to load different types of documents, and these loader classes support loading either a single document or all the documents in a given directory.

Once the documents are loaded, the next step is to split them into smaller pieces, also referred to as chunks. There are a few things to consider while splitting the documents. Let us understand each of these. The first consideration is the size of the chunk, that is, how big or small each chunk should be. Most LLMs have a maximum input size…
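The loading and splitting steps of ingestion can be sketched in plain Python. This is a minimal, standard-library-only sketch, not LangChain's API: the `Document` type and the `load_file`, `load_directory`, `split_text`, and `split_documents` functions are hypothetical names chosen for illustration, whereas in practice you would use the framework's own classes (LangChain provides, for example, `CSVLoader`, `DirectoryLoader`, and `RecursiveCharacterTextSplitter`). Two assumptions are worth flagging: chunk size is measured here in characters, while an LLM's maximum input size is counted in tokens, so characters are only a rough proxy; and the `chunk_overlap` parameter is an added illustration that this part of the lesson does not cover. PDF loading is omitted because it needs a third-party parser.

```python
import csv
import io
import json
import tempfile
from dataclasses import dataclass, field
from html.parser import HTMLParser
from pathlib import Path


@dataclass
class Document:
    """Loaded text plus metadata such as the source path (hypothetical type)."""
    text: str
    metadata: dict = field(default_factory=dict)


class _HTMLTextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page (does not skip <script>/<style>)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def load_file(path: Path) -> Document:
    """Load one file, converting CSV, JSON, and HTML into plain text."""
    raw = path.read_text(encoding="utf-8")
    suffix = path.suffix.lower()
    if suffix == ".csv":
        rows = csv.reader(io.StringIO(raw))
        text = "\n".join(", ".join(row) for row in rows)
    elif suffix == ".json":
        text = json.dumps(json.loads(raw), indent=2)
    elif suffix in (".html", ".htm"):
        extractor = _HTMLTextExtractor()
        extractor.feed(raw)
        extractor.close()  # flush any buffered trailing text
        text = " ".join(" ".join(extractor.parts).split())  # collapse whitespace
    else:  # treat anything else (e.g. .txt) as plain text
        text = raw
    return Document(text, {"source": str(path)})


def load_directory(
    directory, suffixes=(".txt", ".csv", ".json", ".html", ".htm")
) -> list[Document]:
    """Load every supported file directly inside `directory` (no recursion)."""
    return [
        load_file(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file() and p.suffix.lower() in suffixes
    ]


def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 20) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Sizes are in characters (an assumption); LLM input limits are in tokens.
    Consecutive chunks share `chunk_overlap` characters (illustrative parameter).
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this chunk reached the end of the text
            break
    return chunks


def split_documents(docs, chunk_size=200, chunk_overlap=20) -> list[Document]:
    """Split each document, copying its metadata and recording the chunk index."""
    return [
        Document(chunk, {**doc.metadata, "chunk": i})
        for doc in docs
        for i, chunk in enumerate(split_text(doc.text, chunk_size, chunk_overlap))
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "notes.txt").write_text("RAG pipelines ingest, retrieve, and generate. " * 10, encoding="utf-8")
        Path(tmp, "people.csv").write_text("name,role\nAda,engineer\n", encoding="utf-8")
        chunks = split_documents(load_directory(tmp), chunk_size=100, chunk_overlap=10)
        for c in chunks:
            print(c.metadata, repr(c.text[:40]))
```

Running the script loads a text file and a CSV file from a temporary directory, splits them into chunks of at most 100 characters with a 10-character overlap, and prints each chunk's metadata. Keeping the source path and chunk index on every chunk means any chunk can be traced back to the document it came from.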
