You can use the vectorSearch type to index fields for running
$vectorSearch queries. You can define the index for the
vector embeddings that you want to query and any additional fields
that you want to use to pre-filter your data.
Filtering your data is useful to narrow the scope of your semantic search
and ensure that certain vector embeddings are not considered for comparison,
such as in a multi-tenant environment.
You can use the Atlas UI, Atlas Administration API,
Atlas CLI, mongosh, or a supported MongoDB Driver
to create your MongoDB Vector Search index.
Note
You can't use the deprecated knnBeta operator to query
fields indexed using the vectorSearch type index definition.
Considerations
In a vectorSearch type index definition, you can index arrays with
only a single element. You can't index embedding fields inside arrays of documents
or embedding fields inside arrays of objects. You can index embedding fields inside
documents by using dot notation. The same embedding field can't be indexed
multiple times in the same index definition.
Before indexing your embeddings, we recommend converting your embeddings
to BSON BinData vectors with
subtype float32, int1, or int8 for efficient storage
in your cluster. To learn more, see how to convert
your embeddings to BSON vectors.
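A BSON BinData vector (binary subtype 9) stores a one-byte dtype marker, a one-byte padding count, and then the packed elements. The following plain-JavaScript sketch shows that layout for float32 and int8 vectors so you can see what a conversion produces. It assumes the dtype codes 0x27 (float32) and 0x03 (int8) from the BSON binary vector specification, and the function names are hypothetical. In application code, prefer your driver's helpers: recent versions of the js-bson library expose methods such as Binary.fromFloat32Array(), but check the version you use.

```javascript
// Sketch of the BSON vector (BinData subtype 9) payload layout:
//   byte 0: dtype marker, byte 1: padding (unused bits in the last byte),
//   remaining bytes: the vector elements (float32 values are little-endian).
const DTYPE_FLOAT32 = 0x27;
const DTYPE_INT8 = 0x03;

function toFloat32VectorBytes(values) {
  const bytes = new Uint8Array(2 + values.length * 4);
  const view = new DataView(bytes.buffer);
  bytes[0] = DTYPE_FLOAT32;
  bytes[1] = 0; // whole bytes only, so no padding
  values.forEach((v, i) => view.setFloat32(2 + i * 4, v, true));
  return bytes;
}

function toInt8VectorBytes(values) {
  const bytes = new Uint8Array(2 + values.length);
  const view = new DataView(bytes.buffer);
  bytes[0] = DTYPE_INT8;
  bytes[1] = 0;
  values.forEach((v, i) => {
    if (!Number.isInteger(v) || v < -128 || v > 127) {
      throw new RangeError(`int8 value out of range at index ${i}: ${v}`);
    }
    view.setInt8(2 + i, v); // stored as two's complement
  });
  return bytes;
}

const f32 = toFloat32VectorBytes([0.5, -1.25, 3]);
console.log(f32.length); // 14 = 2 header bytes + 3 * 4 bytes
const i8 = toInt8VectorBytes([-128, 0, 127]);
console.log(Array.from(i8)); // [ 3, 0, 128, 0, 127 ]
```

The int8 path rejects out-of-range values instead of silently wrapping them, because a wrapped value would change the meaning of the embedding.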
When you use MongoDB Vector Search indexes, you might experience elevated resource consumption on an idle node for your Atlas cluster. This is due to the underlying mongot process, which performs various essential operations for MongoDB Vector Search. The CPU utilization on an idle node can vary depending on the number, complexity, and size of the indexes.
To learn more about sizing considerations for your indexes, see Memory Requirements for Indexing Vectors.
If you make changes to the collection for which you defined a MongoDB Vector Search
index, the latest data might not be available immediately for queries.
However, mongot monitors the change streams and updates its stored
copies of the data, which makes MongoDB Vector Search indexes eventually consistent. You can
view the number of indexed documents in the Atlas UI
to verify that changes to the collection are reflected in the index.
Alternatively, you can create a new index after adding new documents to your collection and wait for the index to become queryable. You can also implement polling logic similar to the following to ensure that the index is ready for querying before you use it.
Example
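```javascript
// Poll until the MongoDB Vector Search index named `indexName` is queryable.
// `collection` is a Node.js driver Collection; `indexName` is the name that
// collection.createSearchIndex() returned.
async function waitUntilQueryable(collection, indexName, intervalMs = 5000) {
  console.log("Polling to check if the index is ready. This may take up to a minute.");
  while (true) {
    // listSearchIndexes(name) returns a cursor over the matching index descriptions.
    const indexes = await collection.listSearchIndexes(indexName).toArray();
    if (indexes.some((index) => index.queryable)) {
      console.log(`${indexName} is ready for querying.`);
      return;
    }
    // Wait before the next check, including when the index isn't listed yet.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```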
Supported Clients
You can create and manage MongoDB Vector Search indexes through the Atlas UI,
mongosh, Atlas CLI, Atlas Administration API, and the following
MongoDB Drivers:
Syntax
The following syntax defines the vectorSearch index type:
```
{
  "fields": [
    {
      "type": "vector",
      "path": "<field-to-index>",
      "numDimensions": <number-of-dimensions>,
      "similarity": "euclidean | cosine | dotProduct",
      "quantization": "none | scalar | binary",
      "hnswOptions": {
        "maxEdges": <number-of-connected-neighbors>,
        "numEdgeCandidates": <number-of-nearest-neighbors>
      }
    },
    {
      "type": "filter",
      "path": "<field-to-index>"
    },
    ...
  ]
}
```
MongoDB Vector Search Index Fields
The MongoDB Vector Search index definition takes the following fields:
| Option | Type | Necessity | Purpose |
|---|---|---|---|
| fields | Array of field definition documents | Required | Definitions for the vector and filter fields to index, one definition per document. Each field definition document specifies the type, path, and other configuration options for the field to index. The fields array must contain at least one vector-type field definition; you can add filter-type field definitions to pre-filter your data. |
| fields.type | String | Required | Field type to use to index fields for $vectorSearch. You can specify one of the following values: vector, for fields that contain vector embeddings, or filter, for additional fields to pre-filter on. To learn more, see About the vector Type and About the filter Type. |
| fields.path | String | Required | Name of the field to index. For nested fields, use dot notation to specify the path to embedded fields. |
| fields.numDimensions | Int | Required | Number of vector dimensions that MongoDB Vector Search enforces at index-time and query-time. You can set this field only for vector-type fields. The value must be less than or equal to 8192. For quantized vectors or BinData, the allowed values also depend on the vector subtype and the quantization type. The embedding model you choose determines the number of dimensions in your vector embeddings, with some models having multiple options for how many dimensions are output. To learn more, see Choosing a Method to Create Embeddings. |
| fields.similarity | String | Required | Vector similarity function to use to search for top K-nearest neighbors. You can set this field only for vector-type fields. You can specify one of the following values: euclidean, cosine, or dotProduct. To learn more, see About the Similarity Functions. |
| fields.quantization | String | Optional | Type of automatic vector quantization for your vectors. Use this setting only if your embeddings are float or double vectors, not pre-quantized int1 or int8 vectors. You can specify one of the following values: none, scalar, or binary. To learn more, see Vector Quantization. |
| fields.hnswOptions | Object | Optional | Parameters to use for Hierarchical Navigable Small Worlds graph construction. If omitted, uses the default values for the maxEdges and numEdgeCandidates parameters. IMPORTANT: This is available as a Preview feature. Modifying the default values might negatively impact your MongoDB Vector Search index and queries. |
| fields.hnswOptions.maxEdges | Int | Optional | Maximum number of edges (or connections) that a node can have in the Hierarchical Navigable Small Worlds graph. A higher number improves recall (accuracy of search results) because the graph is better connected. However, a higher number also slows down queries, because there are more neighbors to evaluate per graph node; increases the memory that the graph uses, because each node stores more connections; and slows down indexing, because MongoDB Vector Search evaluates and adjusts more neighbors for every new node added to the graph. |
| fields.hnswOptions.numEdgeCandidates | Int | Optional | Maximum number of nearest-neighbor candidates to evaluate when choosing the neighbors to connect to a new node during graph construction. A higher number provides a graph with higher-quality connections, which can improve search quality (recall), but it can also negatively affect query latency. |
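Before you submit a definition, you might want to catch obvious mistakes locally. The following sketch checks one vector-type field definition against the constraints described in the table above: a non-empty path, a numDimensions value from 1 to 8192, one of the three similarity values, and, when present, one of the three quantization values. The function name validateVectorField is hypothetical and not part of any MongoDB driver; Atlas remains the authority on what it accepts, and this check deliberately ignores hnswOptions.

```javascript
// Hypothetical client-side check for one vector-type field definition.
// It mirrors the documented options only; Atlas performs the real validation.
const SIMILARITIES = ["euclidean", "cosine", "dotProduct"];
const QUANTIZATIONS = ["none", "scalar", "binary"];
const MAX_DIMENSIONS = 8192;

function validateVectorField(field) {
  const errors = [];
  if (field.type !== "vector") {
    errors.push('type must be "vector"');
  }
  if (typeof field.path !== "string" || field.path.length === 0) {
    errors.push("path must be a non-empty string (use dot notation for nested fields)");
  }
  if (!Number.isInteger(field.numDimensions) ||
      field.numDimensions < 1 || field.numDimensions > MAX_DIMENSIONS) {
    errors.push(`numDimensions must be an integer from 1 to ${MAX_DIMENSIONS}`);
  }
  if (!SIMILARITIES.includes(field.similarity)) {
    errors.push(`similarity must be one of: ${SIMILARITIES.join(", ")}`);
  }
  // quantization is optional, so only check it when it's present.
  if (field.quantization !== undefined && !QUANTIZATIONS.includes(field.quantization)) {
    errors.push(`quantization must be one of: ${QUANTIZATIONS.join(", ")}`);
  }
  return errors;
}

console.log(validateVectorField({
  type: "vector", path: "plot_embedding", numDimensions: 1536, similarity: "dotProduct",
})); // []
console.log(validateVectorField({
  type: "vector", path: "plot_embedding", numDimensions: 10000, similarity: "l2",
}).length); // 2
```

Returning a list of messages rather than throwing on the first problem lets you report every mistake in a definition at once.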
About the vector Type
Your index definition's vector field must contain vector embeddings stored as
one of the following types:
- BSON double (an array of numbers)
- BSON BinData vector subtype float32
- BSON BinData vector subtype int1
- BSON BinData vector subtype int8
Note
To learn more about generating BSON BinData vectors with subtype float32,
int1, or int8 for your data, see
How to Ingest Pre-Quantized Vectors.
You must index the vector field as the vector type inside the
fields array.
The following syntax defines the vector field type:
```
{
  "fields": [
    {
      "type": "vector",
      "path": "<field-to-index>",
      "numDimensions": <number-of-dimensions>,
      "similarity": "euclidean | cosine | dotProduct",
      "quantization": "none | scalar | binary",
      "hnswOptions": {
        "maxEdges": <number-of-connected-neighbors>,
        "numEdgeCandidates": <number-of-nearest-neighbors>
      }
    },
    ...
  ]
}
```
About the Similarity Functions
MongoDB Vector Search supports the following similarity functions:
- euclidean: measures the distance between ends of vectors. This value allows you to measure similarity based on varying dimensions. To learn more, see Euclidean.
- cosine: measures similarity based on the angle between vectors. This value allows you to measure similarity that isn't scaled by magnitude. You can't use zero-magnitude vectors with cosine. To measure cosine similarity, we recommend that you normalize your vectors and use dotProduct instead.
- dotProduct: measures similarity like cosine, but takes into account the magnitude of the vector. If you normalize the magnitude, cosine and dotProduct are almost identical in measuring similarity. To use dotProduct, you must normalize the vector to unit length at index-time and query-time.
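To make the three definitions concrete, the following sketch computes each measure on plain JavaScript arrays. It illustrates the underlying math only; it isn't MongoDB's implementation, and the raw values it returns can differ from the scores that $vectorSearch reports. The helper names are hypothetical.

```javascript
// Raw similarity measures on plain number arrays (illustration only).
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function magnitude(v) {
  return Math.sqrt(dot(v, v));
}

// euclidean: straight-line distance between the ends of the two vectors.
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// cosine: depends only on the angle; undefined for zero-magnitude vectors.
function cosineSimilarity(a, b) {
  const denom = magnitude(a) * magnitude(b);
  if (denom === 0) throw new RangeError("cosine is undefined for zero-magnitude vectors");
  return dot(a, b) / denom;
}

// Scale a vector to unit length, as dotProduct requires at index and query time.
function normalize(v) {
  const m = magnitude(v);
  if (m === 0) throw new RangeError("cannot normalize a zero-magnitude vector");
  return v.map((x) => x / m);
}

const a = [3, 4];
const b = [4, 3];
console.log(euclideanDistance(a, b));         // 1.4142135623730951
console.log(cosineSimilarity(a, b));          // 0.96
console.log(dot(normalize(a), normalize(b))); // approximately 0.96, matching cosine
```

The last line shows why normalized vectors make cosine and dotProduct nearly interchangeable: once both magnitudes are 1, the cosine denominator is 1 and only the dot product remains.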
The following table shows the similarity functions for the various types:
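| Vector Embeddings Type | euclidean | cosine | dotProduct |
|---|---|---|---|
| BSON BinData vector subtype int1 ¹ | √ | | |
| BSON BinData vector subtype int8 ¹ | √ | √ | √ |
| BSON BinData vector subtype float32 ² | √ | √ | √ |
| BSON double array ² | √ | √ | √ |

¹ For vector ingestion.
² For automatic scalar or binary quantization.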
For best performance, check your embedding model to determine which
similarity function aligns with your embedding model's training
process. If you don't have any guidance, start with dotProduct.
Setting fields.similarity to the dotProduct value allows you
to efficiently measure similarity based on both angle and magnitude.
dotProduct consumes fewer computational resources than cosine
and is efficient when vectors are of unit length. However, if your
vectors aren't normalized, run a sample query with euclidean and
with cosine, and compare the similarity scores in the results to
determine which produces reasonable results.
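Because dotProduct assumes unit-length vectors, a quick check over a sample of your embeddings can tell you whether it's a safe starting point. The following sketch reports whether every sampled vector has a magnitude within a tolerance of 1. The function name allUnitLength and the 1e-3 default tolerance are assumptions for illustration, not values that MongoDB prescribes.

```javascript
// Hypothetical helper: do all sampled embeddings already have unit length?
function allUnitLength(vectors, tolerance = 1e-3) {
  return vectors.every((v) => {
    const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
    return Math.abs(norm - 1) <= tolerance;
  });
}

const normalizedSample = [[0.6, 0.8], [1, 0], [0, -1]];
const rawSample = [[3, 4], [0.6, 0.8]];

console.log(allUnitLength(normalizedSample)); // true  -> dotProduct is a reasonable start
console.log(allUnitLength(rawSample));        // false -> compare euclidean and cosine results
```

A tolerance is needed because embeddings stored as floating-point values rarely have a magnitude of exactly 1, even when the model normalizes them.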
About the filter Type
You can optionally index additional fields to pre-filter your data. You can filter on boolean, date, objectId, numeric, string, and UUID values, including arrays of these types. Filtering your data is useful to narrow the scope of your semantic search and ensure that not all vectors are considered for comparison. It reduces the number of documents against which to run similarity comparisons, which can decrease query latency and increase the accuracy of search results.
You must index the fields that you want to filter by using the
filter type inside the fields array.
The following syntax defines the filter field type:
```
{
  "fields": [
    {
      "type": "vector",
      ...
    },
    {
      "type": "filter",
      "path": "<field-to-index>"
    },
    ...
  ]
}
```
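A $vectorSearch stage's filter option can reference only fields that you indexed with the filter type. The following sketch builds such an aggregation pipeline as a plain object. The index name vector_index, the embedding path plot_embedding, the filter fields genres and year, and the projected title field are illustrative assumptions, loosely modeled on the sample_mflix.embedded_movies data; each filter field would need its own filter entry in the index definition. The numCandidates multiplier of 10 is also an assumption for illustration.

```javascript
// Build a $vectorSearch pipeline that pre-filters on indexed filter fields.
// All index and field names here are illustrative.
function buildFilteredVectorSearch(queryVector, { genre, minYear, limit = 10 }) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",
        path: "plot_embedding",
        queryVector,
        numCandidates: limit * 10, // nearest-neighbor candidates; must be >= limit
        limit,
        // Each field referenced here must be indexed with "type": "filter".
        filter: {
          $and: [{ genres: { $eq: genre } }, { year: { $gte: minYear } }],
        },
      },
    },
    {
      // Pre-filtering doesn't change this score.
      $project: { _id: 0, title: 1, score: { $meta: "vectorSearchScore" } },
    },
  ];
}

const pipeline = buildFilteredVectorSearch([0.1, 0.2, 0.3], { genre: "Action", minYear: 2000 });
console.log(JSON.stringify(pipeline[0].$vectorSearch.filter));
// {"$and":[{"genres":{"$eq":"Action"}},{"year":{"$gte":2000}}]}
```

With the Node.js driver, you would pass the result to collection.aggregate(pipeline). In a real query, queryVector must have exactly numDimensions elements.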
Note
Pre-filtering your data doesn't affect the score that MongoDB Vector Search returns
using $vectorSearchScore for $vectorSearch queries.
Create a MongoDB Vector Search Index
You can create a MongoDB Vector Search index for any collection that contains vector
embeddings of up to 8192 dimensions, generated from any kind of data and stored
alongside the other data on your cluster. You can create the index through the
Atlas UI, Atlas Administration API, Atlas CLI, mongosh,
or a supported MongoDB Driver.
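With a supported driver, creating the index takes a single call. The following Node.js sketch assumes a driver version recent enough that createSearchIndex() accepts a type field; without it, the index defaults to a search index rather than a vectorSearch index. The helper names and field values are illustrative, loosely modeled on the sample_mflix.embedded_movies collection.

```javascript
// Build the description object that createSearchIndex() expects.
function vectorIndexDescription(name, vectorPath, numDimensions, filterPaths = []) {
  return {
    name,
    type: "vectorSearch", // omit this and the index is created as type "search"
    definition: {
      fields: [
        { type: "vector", path: vectorPath, numDimensions, similarity: "dotProduct" },
        ...filterPaths.map((path) => ({ type: "filter", path })),
      ],
    },
  };
}

// Create the index on a Node.js driver Collection.
// createSearchIndex() resolves to the new index's name.
async function createVectorIndex(collection) {
  const description = vectorIndexDescription(
    "vector_index", "plot_embedding", 1536, ["genres", "year"],
  );
  const indexName = await collection.createSearchIndex(description);
  return indexName; // poll until the index is queryable before you query it
}
```

The index build runs asynchronously on the cluster, so the returned name is available right away even though the index isn't queryable yet.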
Prerequisites
To create a MongoDB Vector Search index, you must have a cluster with the following prerequisites:
- MongoDB version 6.0.11, 7.0.2, or higher
- A collection for which to create the MongoDB Vector Search index
Note
You can use the mongosh command or driver helper methods to create
MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver
versions, see Supported Clients.
Required Access
You need the Project Data Access Admin or higher role to create
and manage MongoDB Vector Search indexes.
Index Limitations
You cannot create more than:
- 3 indexes (regardless of the type, search or vector) on M0 clusters.
- 10 indexes on Flex clusters.
We recommend that you create no more than 2,500 search indexes on a
single M10+ cluster.
Procedure
Note
The procedure includes index definition examples for the
embedded_movies collection in the sample_mflix database. If
you load the sample data on your
cluster and create the example MongoDB Vector Search indexes for this collection,
you can run the sample $vectorSearch queries against this
collection. To learn more about the sample queries that you can run,
see $vectorSearch Examples.
View a MongoDB Vector Search Index
You can view MongoDB Vector Search indexes for all collections from the
Atlas UI, Atlas Administration API, Atlas CLI, mongosh,
or a supported MongoDB Driver.
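From the Node.js driver, listSearchIndexes() returns a cursor over index descriptions. The following sketch keeps only vectorSearch indexes and reports each one's name, status, and queryable flag. It assumes that each description includes the type, status, and queryable fields, which recent cluster versions return; the helper name listVectorIndexes is hypothetical.

```javascript
// Summarize the vectorSearch indexes on a Node.js driver Collection.
async function listVectorIndexes(collection) {
  // Without a name argument, listSearchIndexes() lists every search index
  // on the collection, including search-type indexes.
  const indexes = await collection.listSearchIndexes().toArray();
  return indexes
    .filter((index) => index.type === "vectorSearch")
    .map(({ name, status, queryable }) => ({ name, status, queryable }));
}
```

For example, console.table(await listVectorIndexes(collection)) prints one row per vectorSearch index.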
Required Access
You need the Project Search Index Editor or higher role to view
MongoDB Vector Search indexes.
Note
You can use the mongosh command or driver helper methods to retrieve
MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver
versions, see Supported Clients.
Procedure
Edit a MongoDB Vector Search Index
You can change the index definition
of an existing MongoDB Vector Search index from the Atlas UI, Atlas Administration API,
Atlas CLI, mongosh, or a supported MongoDB Driver.
You can't rename an index or change the index type. If you need to
change an index name or type, you must create a new index and delete the old one.
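In the Node.js driver, updateSearchIndex(name, definition) replaces an existing index's definition in place, while a rename or type change needs a new index followed by a drop. The following sketch shows both paths. The helper names are hypothetical, and the replacement helper waits until the new index reports queryable before it drops the old one, so that queries always have an index to use.

```javascript
// Change only the definition: the index keeps its name and type, and
// MongoDB Vector Search rebuilds it in the background.
async function updateVectorIndex(collection, name, newDefinition) {
  await collection.updateSearchIndex(name, newDefinition);
}

// Change the name or type: create a new index, wait until it's queryable,
// then drop the old one.
async function replaceVectorIndex(collection, oldName, newDescription, intervalMs = 5000) {
  const newName = await collection.createSearchIndex(newDescription);
  for (;;) {
    const [index] = await collection.listSearchIndexes(newName).toArray();
    if (index && index.queryable) break;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  await collection.dropSearchIndex(oldName);
  return newName;
}
```

Dropping the old index only after the new one is queryable trades some extra storage and build time during the switch for uninterrupted query availability.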
Important
After you edit an index, MongoDB Vector Search rebuilds it. While the index rebuilds, you can continue to run vector search queries by using the old index definition. When the index finishes rebuilding, the old index is automatically replaced. This process is similar to MongoDB Search indexes. To learn more, see Creating and Updating a MongoDB Search Index.
Required Access
You must have the Project Search Index Editor or higher role to
edit a MongoDB Vector Search index.
Note
You can use the mongosh command or driver helper methods to edit
MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver
versions, see Supported Clients.
Procedure
Delete a MongoDB Vector Search Index
You can delete a MongoDB Vector Search index at any time from the
Atlas UI, Atlas Administration API, Atlas CLI, mongosh,
or a supported MongoDB Driver.
Required Access
You must have the Project Search Index Editor or higher role to
delete a MongoDB Vector Search index.
Note
You can use the mongosh command or driver helper methods to delete
MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver
versions, see Supported Clients.
Procedure
Index Status
When you create the MongoDB Vector Search index, the Status column shows the current state of the index on the primary node of the cluster. Click the View status details link below the status to view the state of the index on all the nodes of the cluster.
When the Status column reads Active, the index is ready to use. In other states, queries against the index may return incomplete results.
| Status | Description |
|---|---|
| Not Started | Atlas has not yet started building the index. |
| Initial Sync | Atlas is building the index or re-building the index after an edit. While an edited index is in this state, you can continue to query it by using the old index definition until the rebuild completes. |
| Active | Index is ready to use. |
| Recovering | Replication encountered an error. This state commonly occurs when the current replication point is no longer available. |
| Failed | Atlas could not build the index. Use the error in the View status details modal window to troubleshoot the issue. To learn more, see Fix Issues. |
| Delete in Progress | Atlas is deleting the index from the cluster nodes. |
While Atlas builds the index and after the build completes, the Documents column shows the percentage and number of documents indexed. The column also shows the total number of documents in the collection.