Overview
In this guide, you can learn how to use Django MongoDB Backend to perform MongoDB Vector Search queries. This feature allows you to perform a semantic search on your documents. A semantic search is a type of search that locates information that is similar in meaning, but not necessarily identical, to your provided search term or phrase.
Sample Data
The examples in this guide use the MovieWithEmbeddings model, which represents
the sample_mflix.embedded_movies collection from the Atlas sample datasets. The MovieWithEmbeddings model class has the following definition:
from django.db import models
from django_mongodb_backend.fields import ArrayField

class MovieWithEmbeddings(models.Model):
    title = models.CharField(max_length=200)
    runtime = models.IntegerField(default=0)
    plot_embedding = ArrayField(models.FloatField(), size=1536, null=True, blank=True)

    class Meta:
        db_table = "embedded_movies"
        managed = False

    def __str__(self):
        return self.title
The MovieWithEmbeddings model includes an inner Meta class, which specifies
model metadata, and a __str__() method, which defines the
model's string representation. To learn about these
model features, see Define a Model in the
Create Models guide.
Run Code Examples
You can use the Python interactive shell to run the code examples. To enter the shell, run the following command from your project's root directory:
python manage.py shell
After entering the Python shell, ensure that you import the following models and modules:
from <your application name>.models import MovieWithEmbeddings
from django_mongodb_backend.expressions import SearchVector
To learn how to create a Django application that uses the Movie
model and the Python interactive shell to interact with MongoDB documents,
visit the Get Started tutorial.
Perform a Vector Search
Important
Query Requirements
Before you can perform MongoDB Vector Search queries, you must create a Vector Search index on your collection. To learn how to use Django MongoDB Backend to create a Vector Search index, see Vector Search Indexes in the Create Indexes guide.
You can use Django MongoDB Backend to query your data based on its semantic meaning. A MongoDB Vector Search query returns results based on a query vector, which is an array of numbers that represents the meaning of your search term or phrase. MongoDB compares this query vector to the vectors stored in your documents' vector fields.
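To make the comparison concrete, the following sketch scores a few toy vectors against a query vector by using cosine similarity in plain Python. It is only an illustration: the function names and the three-dimensional vectors are hypothetical, and the similarity function that MongoDB actually applies depends on how your Vector Search index is defined.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, documents, limit):
    """Brute-force ranking: score every document, keep the top `limit`."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in documents.items()
    ]
    # Highest similarity first, mirroring how search results are ordered.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical three-dimensional embeddings; the embeddings in this guide
# have 1536 dimensions.
toy_documents = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}

top_two = rank_by_similarity([1.0, 0.1, 0.0], toy_documents, limit=2)
for name, score in top_two:
    print(f"{name}: {score:.3f}")
```

Scoring every stored vector, as this sketch does, is the idea behind an exact search. An approximate search trades some accuracy for speed by scoring only a subset of candidates.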
To specify your Vector Search criteria, create an instance of the SearchVector
expression class provided by the django_mongodb_backend.expressions module.
This expression corresponds to the $vectorSearch MongoDB pipeline
stage. Pass the following arguments to the SearchVector() constructor:
- path: The field to query.
- query_vector: An array of numbers that represents your search criteria. To learn more about query vectors, see vectors in the MongoDB Atlas documentation.
- limit: The maximum number of results to return.
- num_candidates: (Optional) The number of documents to consider for the query.
- exact: (Optional) A boolean value that indicates whether to perform an Exact Nearest Neighbor (ENN) search. The default value is False. To learn more about ENN searches, see ENN (Exact Nearest Neighbor) Search in the MongoDB Atlas documentation.
- filter: (Optional) A filter to apply to the query results.
Then, run your Vector Search query by passing your SearchVector instance to
the annotate method from Django's
QuerySet API. The following code shows the syntax for performing a Vector
Search query:
from django_mongodb_backend.expressions import SearchVector

Model.objects.annotate(
    score=SearchVector(
        path="<field name>",
        query_vector=[<vector values>],
        limit=<number>,
        num_candidates=<number>,
        exact=<boolean>,
        filter=<filter expression>
    )
)
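Because several of these arguments interact, you might find it useful to assemble them in one place before constructing the expression. The following sketch uses a hypothetical helper, build_search_vector_kwargs(), which is not part of Django MongoDB Backend. It assumes, based on the Atlas $vectorSearch stage documentation, that an ANN query needs a candidate count at least as large as limit and that an ENN query takes no candidate count. Verify these assumptions against your deployment before relying on them.

```python
def build_search_vector_kwargs(path, query_vector, limit,
                               num_candidates=None, exact=False):
    """Hypothetical helper: collect keyword arguments for SearchVector()."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")

    kwargs = {
        "path": path,
        "query_vector": list(query_vector),
        "limit": limit,
        "exact": exact,
    }

    if exact:
        # Assumption: an ENN query does not take a candidate count,
        # so num_candidates is left out of the arguments.
        return kwargs

    # Assumption: an ANN query needs a candidate count that is
    # not smaller than the number of results requested.
    if num_candidates is None:
        raise ValueError("num_candidates is needed when exact is False")
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least limit")

    kwargs["num_candidates"] = num_candidates
    return kwargs


ann_kwargs = build_search_vector_kwargs(
    "plot_embedding", [0.1, 0.2, 0.3], limit=5, num_candidates=150
)
enn_kwargs = build_search_vector_kwargs(
    "plot_embedding", [0.1, 0.2, 0.3], limit=5, exact=True
)
```

In the Python shell, you could then unpack the result into the expression, for example SearchVector(**ann_kwargs), and pass that expression to annotate().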
Basic Vector Search Example
This example runs a MongoDB Vector Search query on the sample_mflix.embedded_movies
collection. The query performs the following actions:
- Queries the plot_embedding vector field.
- Limits the results to 5 documents.
- Specifies an Approximate Nearest Neighbor (ANN) vector search that considers 150 candidates. To learn more about ANN searches, see ANN (Approximate Nearest Neighbor) Search in the MongoDB Atlas documentation.
vector_values = [float(i % 10) * 0.1 for i in range(1536)]

MovieWithEmbeddings.objects.annotate(
    score=SearchVector(
        path="plot_embedding",
        query_vector=vector_values,
        limit=5,
        num_candidates=150,
        exact=False,
    )
)
<QuerySet [<MovieWithEmbeddings: Berserk: The Golden Age Arc I - The Egg of the King>, <MovieWithEmbeddings: Rollerball>, <MovieWithEmbeddings: After Life>, <MovieWithEmbeddings: What Women Want>, <MovieWithEmbeddings: Truth About Demons>]>
Tip
The preceding code example passes an arbitrary vector to the
query_vector argument. To learn how to generate a vector that
represents the meaning of a search term or phrase, see
How to Create Vector Embeddings in the MongoDB Atlas documentation.
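Whichever way you generate the vector, it must have the same number of dimensions as the indexed field, which is 1536 for plot_embedding in this guide. The following sketch shows one way to check a vector before you pass it to query_vector. The check_query_vector() helper and the EXPECTED_DIMENSIONS constant are hypothetical names introduced for this example.

```python
# Matches size=1536 on the plot_embedding field in this guide.
EXPECTED_DIMENSIONS = 1536


def check_query_vector(vector, expected_dimensions=EXPECTED_DIMENSIONS):
    """Return the vector as a list of floats, or raise ValueError."""
    values = [float(v) for v in vector]
    if len(values) != expected_dimensions:
        raise ValueError(
            f"expected {expected_dimensions} dimensions, got {len(values)}"
        )
    return values


# The same arbitrary vector that the preceding example uses.
vector_values = check_query_vector(
    float(i % 10) * 0.1 for i in range(1536)
)
print(len(vector_values))
```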
Vector Search Score Example
MongoDB assigns a relevance score to every document returned in a Vector Search query. The documents included in a result set are ordered from highest to lowest relevance score.
To include this score in your query results, you can use the values() method
from Django's QuerySet API. Pass the score field as an argument
to the values() method.
The following example shows how to run the same vector search query as the preceding example and print the documents' vector search relevance scores:
vector_values = [float(i % 10) * 0.1 for i in range(1536)]

MovieWithEmbeddings.objects.annotate(
    score=SearchVector(
        path="plot_embedding",
        query_vector=vector_values,
        limit=5,
        num_candidates=150,
        exact=False,
    )
).values("title", "score")
<QuerySet [{'title': 'Berserk: The Golden Age Arc I - The Egg of the King', 'score': 0.47894009947776794}, {'title': 'Rollerball', 'score': 0.45006513595581055}, {'title': 'After Life', 'score': 0.42825883626937866}, {'title': 'What Women Want', 'score': 0.4211753308773041}, {'title': 'Truth About Demons', 'score': 0.4194544851779938}]>
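Because values() returns dictionaries, you can process the scores with ordinary Python. The following sketch formats each row and confirms that the scores are in descending order. To stay self-contained, it uses rows copied from the preceding output. In the shell, you would iterate over the QuerySet instead. The format_row() helper is a hypothetical name.

```python
# Rows copied from the preceding output.
rows = [
    {"title": "Berserk: The Golden Age Arc I - The Egg of the King",
     "score": 0.47894009947776794},
    {"title": "Rollerball", "score": 0.45006513595581055},
    {"title": "After Life", "score": 0.42825883626937866},
    {"title": "What Women Want", "score": 0.4211753308773041},
    {"title": "Truth About Demons", "score": 0.4194544851779938},
]


def format_row(row):
    """Render one result as '<score to 3 decimals>  <title>'."""
    return f"{row['score']:.3f}  {row['title']}"


lines = [format_row(row) for row in rows]

# Check that each score is at least as large as the one after it.
scores = [row["score"] for row in rows]
is_descending = all(a >= b for a, b in zip(scores, scores[1:]))

for line in lines:
    print(line)
```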
Additional Information
To learn more about MongoDB Vector Search, see the following resources from the MongoDB Atlas documentation: