From the course: Hands-On AI: Build a RAG Model from Scratch with Open Source
Uploading vectors, text, and filenames to the database
- [Instructor] We're now ready to start creating embeddings of our text and storing them in the vector database we created. As part of the process, we'll have to break our text into concept- or idea-sized chunks, because each vector can only hold one concept or idea. We'll be using sentences as our default chunk size. So the first thing we're gonna do is install a package called NLTK, or Natural Language Toolkit, which is helpful for a lot of natural language processing tasks. We're gonna do that directly in the terminal, and then run Python in here as well, because we need to download a few things. But before we do that, let's run pip install nltk. Once we have that, we can go ahead and enter python. Now that we're in Python, we need to run two commands to download some tokenizers that we'll be using to break our articles up into sentences. So the first is punkt, P-U-N-K-T. And…
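The sentence-chunking step described above can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the function name `split_into_sentences` and the sample text are hypothetical, and the regex fallback is an assumption added only so the sketch still runs where NLTK or its punkt data is not installed. The setup the instructor describes appears in the comments; the second download is cut off in the transcript, so it is left out here.

```python
# Setup described in the transcript (run once):
#   In the terminal:  pip install nltk
#   Then in Python:   import nltk; nltk.download("punkt")
# Depending on your NLTK version, other tokenizer resources may also be needed.
import re


def split_into_sentences(text):
    """Split text into sentence-sized chunks, one chunk per sentence.

    Uses NLTK's sent_tokenize when NLTK and its tokenizer data are available.
    Otherwise falls back to a naive regex split (an assumption for this sketch,
    far less accurate than punkt on abbreviations and similar cases).
    """
    try:
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(text)
    except (ImportError, LookupError):
        # LookupError is what NLTK raises when the tokenizer data is missing.
        parts = re.split(r"(?<=[.!?])\s+", text.strip())
        return [p for p in parts if p]


# Hypothetical sample text; in practice this would be an article's full text.
sample = (
    "Vectors capture meaning. Each chunk holds one idea. "
    "We store them in a database."
)
chunks = split_into_sentences(sample)
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

Each resulting chunk would then be embedded and uploaded to the vector database along with its text and source filename, in line with the one-idea-per-vector reasoning above.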