How to Index Fields for Vector Search

You can use the vectorSearch type to index fields for running $vectorSearch queries. You can define the index for the vector embeddings that you want to query and any additional fields that you want to use to pre-filter your data. Filtering your data is useful to narrow the scope of your semantic search and ensure that certain vector embeddings are not considered for comparison, such as in a multi-tenant environment.

You can use the Atlas UI, Atlas Administration API, Atlas CLI, mongosh, or a supported MongoDB Driver to create your MongoDB Vector Search index.

Note

You can't use the deprecated knnBeta operator to query fields indexed using the vectorSearch type index definition.

Considerations

In a vectorSearch type index definition, you can index arrays with only a single element. You can't index embedding fields inside arrays of documents or arrays of objects. You can index embedding fields inside documents by using dot notation. You can't index the same embedding field multiple times in the same index definition.

Before indexing your embeddings, we recommend converting your embeddings to BSON BinData vectors with subtype float32, int1, or int8 for efficient storage in your cluster. To learn more, see how to convert your embeddings to BSON vectors.

When you use MongoDB Vector Search indexes, you might experience elevated resource consumption on an idle node for your Atlas cluster. This is due to the underlying mongot process, which performs various essential operations for MongoDB Vector Search. The CPU utilization on an idle node can vary depending on the number, complexity, and size of the indexes.

To learn more about sizing considerations for your indexes, see Memory Requirements for Indexing Vectors.

If you make changes to the collection for which you defined a MongoDB Vector Search index, the latest data might not be available immediately for queries. However, mongot monitors the change streams and updates stored copies of data, making MongoDB Vector Search indexes eventually consistent. You can view the number of indexed Documents in the Atlas UI to verify that changes to the collection are reflected in the index.

Alternatively, you can create a new index after adding new documents to your collection and wait for the index to become queryable. You can also implement polling logic similar to the following to ensure that the index is ready for querying before attempting to use it.

Example

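// Assumes `collection` is a Collection from the MongoDB Node.js driver and
// `result` is the index name returned by collection.createSearchIndex().
console.log("Polling to check if the index is ready. This may take up to a minute.");
let isQueryable = false;
while (!isQueryable) {
  const cursor = collection.listSearchIndexes();
  for await (const index of cursor) {
    if (index.name === result && index.queryable) {
      console.log(`${result} is ready for querying.`);
      isQueryable = true;
    }
  }
  if (!isQueryable) {
    // Wait 5 seconds before checking again, even if the index isn't listed yet.
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
}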
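
A polling loop without a deadline can run indefinitely if the index never becomes queryable, for example when the build fails. The following is a minimal sketch of a bounded variant. It assumes that `collection` is a Collection object from the MongoDB Node.js driver. The helper name `waitUntilQueryable` and the `timeoutMs` and `intervalMs` defaults are hypothetical choices for illustration, not documented values.

```javascript
// Sketch: poll until a search index is queryable, or give up after a deadline.
// `collection` is assumed to be a Collection from the MongoDB Node.js driver.
// `timeoutMs` and `intervalMs` are hypothetical defaults; tune them as needed.
async function waitUntilQueryable(
  collection,
  indexName,
  { timeoutMs = 60000, intervalMs = 5000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    for await (const index of collection.listSearchIndexes()) {
      if (index.name === indexName && index.queryable) {
        return true;
      }
    }
    if (Date.now() + intervalMs > deadline) {
      return false; // The next check would fall after the deadline.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The function resolves to `true` when the index reports that it is queryable and to `false` when the deadline passes first, so the caller can decide whether to retry or to inspect the index status.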

Supported Clients

You can create and manage MongoDB Vector Search indexes through the Atlas UI, mongosh, Atlas CLI, Atlas Administration API, and the following MongoDB Drivers:

MongoDB Driver	Version
C	1.28.0 or higher
C++	3.11.0 or higher
C#	3.1.0 or higher
Go	1.16.0 or higher
Java	5.2.0 or higher
Kotlin	5.2.0 or higher
Node	6.6.0 or higher
PHP	1.20.0 or higher
Python	4.7 or higher
Rust	3.1.0 or higher
Scala	5.2.0 or higher

Syntax

The following syntax defines the vectorSearch index type:

{
  "fields":[
    {
      "type": "vector",
      "path": "<field-to-index>",
      "numDimensions": <number-of-dimensions>,
      "similarity": "euclidean | cosine | dotProduct",
      "quantization": "none | scalar | binary",
      "hnswOptions": {
        "maxEdges": <number-of-connected-neighbors>,
        "numEdgeCandidates": <number-of-nearest-neighbors>
      }
    },
    {
      "type": "filter",
      "path": "<field-to-index>"
    },
    ...
  ]
}
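
As an illustration of the syntax above, the following sketch fills in the placeholders with concrete values. The field names (`plot_embedding`, `genres`, `year`) and the 1536-dimension count are assumptions about the sample_mflix.embedded_movies collection. Replace them with the field names and the dimension count that match your own data and embedding model.

```javascript
// Sketch: a concrete vectorSearch index definition.
// Field names and the dimension count are assumptions for illustration.
const indexDefinition = {
  fields: [
    {
      type: "vector",
      path: "plot_embedding",   // assumed field that stores the embeddings
      numDimensions: 1536,      // must match the embedding model's output size
      similarity: "dotProduct", // one of: euclidean, cosine, dotProduct
      quantization: "scalar",   // optional; one of: none, scalar, binary
    },
    { type: "filter", path: "genres" }, // assumed fields used for pre-filtering
    { type: "filter", path: "year" },
  ],
};

console.log(JSON.stringify(indexDefinition, null, 2));
```

The optional `hnswOptions` object is omitted here, so the index would use the default values for `maxEdges` and `numEdgeCandidates`.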

MongoDB Vector Search Index Fields

The MongoDB Vector Search index definition takes the following fields:

Option	Type	Necessity	Purpose
`fields`	Array of field definition documents	Required	Definitions for the vector and filter fields to index, one definition per document. Each field definition document specifies the `type`, `path`, and other configuration options for the field to index. The `fields` array must contain at least one `vector`-type field definition. You can add additional `filter`-type field definitions to your array to pre-filter your data.
`fields.type`	String	Required	Field type to use to index fields for `$vectorSearch`. You can specify one of the following values: `vector` - for fields that contain vector embeddings. `filter` - for additional fields to filter on. You can filter on boolean, date, objectId, numeric, string, and UUID values, including arrays of these types. To learn more, see About the `vector` Type and About the `filter` Type.
`fields.path`	String	Required	Name of the field to index. For nested fields, use dot notation to specify the path to embedded fields.
`fields.numDimensions`	Int	Required	Number of vector dimensions that MongoDB Vector Search enforces at index-time and query-time. You can set this field only for `vector`-type fields. You must specify a value less than or equal to `8192`. For indexing quantized vectors or BinData, you can specify one of the following values: `1` to `8192` for `int8` vectors for ingestion. Multiple of `8` for `int1` vectors for ingestion. `1` to `8192` for `binData(float32)` and `array(float32)` vectors for automatic scalar quantization. Multiple of `8` for `binData(float32)` and `array(float32)` vectors for automatic binary quantization. The embedding model you choose determines the number of dimensions in your vector embeddings, with some models having multiple options for how many dimensions are output. To learn more, see Choosing a Method to Create Embeddings.
`fields.similarity`	String	Required	Vector similarity function to use to search for top K-nearest neighbors. You can set this field only for `vector`-type fields. You can specify one of the following values: `euclidean` - measures the distance between ends of vectors. `cosine` - measures similarity based on the angle between vectors. `dotProduct` - measures similarity like `cosine`, but takes into account the magnitude of the vector. To learn more, see About the Similarity Functions.
`fields.quantization`	String	Optional	Type of automatic vector quantization for your vectors. Use this setting only if your embeddings are `float` or `double` vectors. You can specify one of the following values: `none` - Indicates no automatic quantization for the vector embeddings. Use this setting if you have pre-quantized vectors for ingestion. If omitted, this is the default value. `scalar` - Indicates scalar quantization, which transforms values to 1 byte integers. `binary` - Indicates binary quantization, which transforms values to a single bit. To use this value, `numDimensions` must be a multiple of 8. If precision is critical, select `none` or `scalar` instead of `binary`. To learn more, see Vector Quantization.
`fields.hnswOptions`	Object	Optional	Parameters to use for Hierarchical Navigable Small Worlds graph construction. If omitted, uses the default values for the `maxEdges` and `numEdgeCandidates` parameters. IMPORTANT: This is available as a Preview feature. Modifying the default values might negatively impact your MongoDB Vector Search index and queries.
`fields.hnswOptions.maxEdges`	Int	Optional	Maximum number of edges (or connections) that a node can have in the Hierarchical Navigable Small Worlds graph. Value can be between `16` and `64`, both inclusive. If omitted, defaults to `16`. For example, for a value of `16`, each node can have a maximum of sixteen outgoing edges at each layer of the Hierarchical Navigable Small Worlds graph. A higher number improves recall (accuracy of search results) because the graph is better connected. However, this slows down query speed because of the number of neighbors to evaluate per graph node, increases the memory for the Hierarchical Navigable Small Worlds graph because each node stores more connections, and slows down indexing because MongoDB Vector Search evaluates more neighbors and adjusts for every new node added to the graph.
`fields.hnswOptions.numEdgeCandidates`	Int	Optional	Analogous to `numCandidates` at query-time, this parameter controls the maximum number of nodes to evaluate to find the closest neighbors to connect to a new node. Value can be between `100` and `3200`, both inclusive. If omitted, defaults to `100`. A higher number provides a graph with high-quality connections, which can improve search quality (recall), but it can also negatively affect query latency.
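
The constraints in the table above can be checked on the client before you submit an index definition. The following is a minimal sketch under that assumption. `validateIndexDefinition` is a hypothetical helper, not part of any MongoDB driver, and it covers only the rules stated in the table. The server remains the authority on whether a definition is valid.

```javascript
// Sketch: check a vectorSearch index definition against the documented limits.
// validateIndexDefinition is a hypothetical helper, not a MongoDB driver API.
const SIMILARITIES = ["euclidean", "cosine", "dotProduct"];
const QUANTIZATIONS = ["none", "scalar", "binary"];
const inRange = (value, min, max) =>
  Number.isInteger(value) && value >= min && value <= max;

function validateIndexDefinition(definition) {
  const errors = [];
  const fields = Array.isArray(definition.fields) ? definition.fields : [];
  if (!fields.some((field) => field.type === "vector")) {
    errors.push("fields must contain at least one vector-type field definition");
  }
  fields.forEach((field, i) => {
    const at = `fields[${i}]`;
    if (typeof field.path !== "string" || field.path === "") {
      errors.push(`${at}.path is required`);
    }
    if (field.type === "filter") return; // filter fields take only type and path
    if (field.type !== "vector") {
      errors.push(`${at}.type must be "vector" or "filter"`);
      return;
    }
    if (!inRange(field.numDimensions, 1, 8192)) {
      errors.push(`${at}.numDimensions must be an integer from 1 to 8192`);
    }
    if (!SIMILARITIES.includes(field.similarity)) {
      errors.push(`${at}.similarity must be one of ${SIMILARITIES.join(", ")}`);
    }
    const quantization = field.quantization ?? "none"; // "none" is the default
    if (!QUANTIZATIONS.includes(quantization)) {
      errors.push(`${at}.quantization must be one of ${QUANTIZATIONS.join(", ")}`);
    } else if (quantization === "binary" && field.numDimensions % 8 !== 0) {
      errors.push(`${at}.numDimensions must be a multiple of 8 for binary quantization`);
    }
    const { maxEdges, numEdgeCandidates } = field.hnswOptions ?? {};
    if (maxEdges !== undefined && !inRange(maxEdges, 16, 64)) {
      errors.push(`${at}.hnswOptions.maxEdges must be from 16 to 64`);
    }
    if (numEdgeCandidates !== undefined && !inRange(numEdgeCandidates, 100, 3200)) {
      errors.push(`${at}.hnswOptions.numEdgeCandidates must be from 100 to 3200`);
    }
  });
  return errors; // an empty array means no documented rule was violated
}
```

For example, a definition that sets `quantization` to `binary` with a `numDimensions` value that isn't a multiple of 8 would produce one error message for that field.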

About the `vector` Type

Your index definition's vector field must contain an array of numbers of one of the following types:

BSON double
BSON BinData vector subtype float32
BSON BinData vector subtype int1
BSON BinData vector subtype int8

Note

To learn more about generating BSON BinData vectors with subtype float32, int1, or int8 for your data, see How to Ingest Pre-Quantized Vectors.

You must index the vector field as the vector type inside the fields array.

The following syntax defines the vector field type:

{
  "fields":[
    {
      "type": "vector",
      "path": "<field-to-index>",
      "numDimensions": <number-of-dimensions>,
      "similarity": "euclidean | cosine | dotProduct",
      "quantization": "none | scalar | binary",
      "hnswOptions": {
        "maxEdges": <number-of-connected-neighbors>,
        "numEdgeCandidates": <number-of-nearest-neighbors>
      }
    },
    ...
  ]
}

About the Similarity Functions

MongoDB Vector Search supports the following similarity functions:

euclidean - measures the distance between ends of vectors. This value allows you to measure similarity based on varying dimensions. To learn more, see Euclidean.
cosine - measures similarity based on the angle between vectors. This value allows you to measure similarity that isn't scaled by magnitude. You can't use zero magnitude vectors with cosine. To measure cosine similarity, we recommend that you normalize your vectors and use dotProduct instead.
dotProduct - measures similarity like cosine, but takes into account the magnitude of the vector. If you normalize the magnitude, cosine and dotProduct are almost identical in measuring similarity.
To use dotProduct, you must normalize the vector to unit length at index-time and query-time.
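
To make the relationship between the three functions concrete, the following sketch computes each measure with its textbook definition for two small vectors. The helper names are hypothetical, and the values are raw mathematical results, not the scores that MongoDB Vector Search returns. The last line illustrates why normalizing vectors to unit length makes dotProduct agree with cosine.

```javascript
// Sketch: textbook definitions of the three similarity measures.
// Helper names are hypothetical; these are not MongoDB Vector Search scores.
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const magnitude = (a) => Math.sqrt(dot(a, a));
const euclidean = (a, b) =>
  Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
const cosine = (a, b) => dot(a, b) / (magnitude(a) * magnitude(b));

// Scale a vector to unit length; zero-magnitude vectors can't be normalized.
const normalize = (a) => {
  const m = magnitude(a);
  if (m === 0) throw new Error("Cannot normalize a zero-magnitude vector");
  return a.map((x) => x / m);
};

const a = [3, 4];
const b = [4, 3];

console.log("euclidean:", euclidean(a, b)); // distance between the vector ends
console.log("cosine:", cosine(a, b));       // depends only on the angle
console.log("dotProduct:", dot(a, b));      // also grows with magnitude
console.log("dotProduct (normalized):", dot(normalize(a), normalize(b)));
```

For these inputs the raw dot product is 24, while the cosine and the normalized dot product both come out at approximately 0.96.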

The following table shows the similarity functions for the various types:

Vector Embeddings Type	`euclidean`	`cosine`	`dotProduct`
`binData(int1)`	√
`binData(int8)`	√	√	√
`binData(float32)`	√	√	√
`array(float32)`	√	√	√

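In this table, `binData(int1)` and `binData(int8)` are for vector ingestion, and `binData(float32)` and `array(float32)` are for automatic scalar or binary quantization.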

For best performance, check your embedding model to determine which similarity function aligns with your embedding model's training process. If you don't have any guidance, start with dotProduct. Setting fields.similarity to the dotProduct value allows you to efficiently measure similarity based on both angle and magnitude. dotProduct consumes less computational resources than cosine and is efficient when vectors are of unit length. However, if your vectors aren't normalized, evaluate the similarity scores in the results of a sample query for euclidean distance and cosine similarity to determine which corresponds to reasonable results.

About the `filter` Type

You can optionally index additional fields to pre-filter your data. You can filter on boolean, date, objectId, numeric, string, and UUID values, including arrays of these types. Filtering your data is useful to narrow the scope of your semantic search and ensure that not all vectors are considered for comparison. It reduces the number of documents against which to run similarity comparisons, which can decrease query latency and increase the accuracy of search results.

You must index the fields that you want to filter by using the filter type inside the fields array.

The following syntax defines the filter field type:

{
  "fields":[
    {
      "type": "vector",
      ...
    },
    {
      "type": "filter",
      "path": "<field-to-index>"
    },
    ...
  ]
}
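
Fields indexed with the filter type can then be referenced in the `filter` option of a $vectorSearch stage. The following is a minimal sketch of such a pipeline. The index name `vector_index`, the field names `plot_embedding`, `genres`, `year`, and `title`, and the ratio used for `numCandidates` are assumptions for illustration. Both `genres` and `year` would need to be indexed as filter fields for this query to work.

```javascript
// Sketch: build a $vectorSearch pipeline that pre-filters on indexed fields.
// Index name, field names, and the numCandidates ratio are assumptions.
function buildFilteredVectorSearch(queryVector, { genre, minYear, limit = 10 }) {
  return [
    {
      $vectorSearch: {
        index: "vector_index",     // assumed index name
        path: "plot_embedding",    // assumed vector field
        queryVector,               // same dimensions as the indexed vectors
        numCandidates: limit * 20, // hypothetical ratio; tune for your data
        limit,
        // Only documents that match the filter are compared with queryVector.
        filter: {
          $and: [{ genres: { $eq: genre } }, { year: { $gte: minYear } }],
        },
      },
    },
    {
      $project: { _id: 0, title: 1, score: { $meta: "vectorSearchScore" } },
    },
  ];
}

const pipeline = buildFilteredVectorSearch([0.1, 0.2, 0.3], {
  genre: "Action",
  minYear: 2000,
});
console.log(JSON.stringify(pipeline, null, 2));
```

You would pass the resulting array to the driver's `aggregate` method on the collection. The three-element query vector is a placeholder and would not match a real index.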

Note

Pre-filtering your data doesn't affect the score that MongoDB Vector Search returns using $vectorSearchScore for $vectorSearch queries.

Create a MongoDB Vector Search Index

You can create a MongoDB Vector Search index for any collection that contains vector embeddings of 8192 dimensions or fewer. The embeddings can represent any kind of data and can be stored alongside other data on your cluster. You can create the index through the Atlas UI, Atlas Administration API, Atlas CLI, mongosh, or a supported MongoDB Driver.

Prerequisites

To create a MongoDB Vector Search index, you must have a cluster with the following prerequisites:

MongoDB version 6.0.11, 7.0.2, or higher
A collection for which to create the MongoDB Vector Search index

Note

You can use the mongosh command or driver helper methods to create MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver versions, see Supported Clients.

Required Access

You need the Project Data Access Admin or higher role to create and manage MongoDB Vector Search indexes.

Index Limitations

You cannot create more than:

3 indexes (regardless of the type, search or vector) on M0 clusters.
10 indexes on Flex clusters.

We recommend that you create no more than 2,500 search indexes on a single M10+ cluster.

Procedure

Note

The procedure includes index definition examples for the embedded_movies collection in the sample_mflix database. If you load the sample data on your cluster and create the example MongoDB Vector Search indexes for this collection, you can run the sample $vectorSearch queries against this collection. To learn more about the sample queries that you can run, see $vectorSearch Examples.
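
As one possible shape for the driver route, the following sketch creates an index with the Node.js driver's `createSearchIndex` helper. It assumes that `collection` is the Collection object for embedded_movies. The index name `vector_index`, the field names, and the 1536-dimension count are assumptions for illustration.

```javascript
// Sketch: create a vectorSearch index with the Node.js driver helper.
// `collection` is assumed to be a Collection from the MongoDB Node.js driver.
// The index name, field names, and dimension count are assumptions.
async function createVectorIndex(collection) {
  const index = {
    name: "vector_index",
    type: "vectorSearch",
    definition: {
      fields: [
        {
          type: "vector",
          path: "plot_embedding",
          numDimensions: 1536,
          similarity: "dotProduct",
        },
        { type: "filter", path: "genres" },
      ],
    },
  };
  // createSearchIndex resolves to the name of the new index.
  return collection.createSearchIndex(index);
}
```

The call returns as soon as the request is accepted. The index still needs time to build before it can serve queries.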

View a MongoDB Vector Search Index

You can view MongoDB Vector Search indexes for all collections from the Atlas UI, Atlas Administration API, Atlas CLI, mongosh, or a supported MongoDB Driver.

Required Access

You need the Project Search Index Editor or higher role to view MongoDB Vector Search indexes.

Note

You can use the mongosh command or driver helper methods to retrieve MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver versions, see Supported Clients.

Procedure
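
With the Node.js driver, one way to view indexes is the `listSearchIndexes` helper. The following sketch assumes that `collection` is a Collection object from the driver and that each returned document carries `name`, `type`, `status`, and `queryable` fields. The function name is hypothetical.

```javascript
// Sketch: list the vectorSearch indexes defined on a collection.
// `collection` is assumed to be a Collection from the MongoDB Node.js driver.
// The `type`, `status`, and `queryable` output fields are assumed to be present.
async function listVectorIndexes(collection) {
  const indexes = await collection.listSearchIndexes().toArray();
  return indexes
    .filter((index) => index.type === "vectorSearch")
    .map(({ name, status, queryable }) => ({ name, status, queryable }));
}
```

The filter on `type` leaves out MongoDB Search indexes on the same collection, because the helper returns both kinds.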

Edit a MongoDB Vector Search Index

You can change the index definition of an existing MongoDB Vector Search index from the Atlas UI, Atlas Administration API, Atlas CLI, mongosh, or a supported MongoDB Driver. You can't rename an index or change the index type. If you need to change an index name or type, you must create a new index and delete the old one.

Important

After you edit an index, MongoDB Vector Search rebuilds it. While the index rebuilds, you can continue to run vector search queries by using the old index definition. When the index finishes rebuilding, the old index is automatically replaced. This process is similar to MongoDB Search indexes. To learn more, see Creating and Updating a MongoDB Search Index.

Required Access

You must have the Project Search Index Editor or higher role to edit a MongoDB Vector Search index.

Note

You can use the mongosh command or driver helper methods to edit MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver versions, see Supported Clients.

Procedure
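
With the Node.js driver, an edit amounts to sending a complete replacement definition through the `updateSearchIndex` helper. The following sketch adds one filter field to an existing definition. It assumes that `collection` is a Collection object from the driver and that `currentDefinition` is the definition you used when you created the index. The function name is hypothetical.

```javascript
// Sketch: add a filter field to an existing vectorSearch index definition.
// `collection` is assumed to be a Collection from the MongoDB Node.js driver.
// `currentDefinition` is assumed to be the definition currently in use.
async function addFilterField(collection, indexName, currentDefinition, filterPath) {
  const alreadyIndexed = currentDefinition.fields.some(
    (field) => field.type === "filter" && field.path === filterPath
  );
  // Build a new object instead of mutating the caller's definition.
  const definition = alreadyIndexed
    ? currentDefinition
    : {
        fields: [...currentDefinition.fields, { type: "filter", path: filterPath }],
      };
  // The update replaces the whole definition and triggers an index rebuild.
  await collection.updateSearchIndex(indexName, definition);
  return definition;
}
```

Because the update replaces the definition as a whole, the vector field definitions must be included again, not only the new filter field.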

Delete a MongoDB Vector Search Index

You can delete a MongoDB Vector Search index at any time from the Atlas UI, Atlas Administration API, Atlas CLI, mongosh, or a supported MongoDB Driver.

Required Access

You must have the Project Search Index Editor or higher role to delete a MongoDB Vector Search index.

Note

You can use the mongosh command or driver helper methods to delete MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver versions, see Supported Clients.

Procedure
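
With the Node.js driver, deletion goes through the `dropSearchIndex` helper. The following sketch drops an index by name and then checks whether it is still listed. It assumes that `collection` is a Collection object from the driver. The function name is hypothetical.

```javascript
// Sketch: drop a vectorSearch index by name and report whether it is gone.
// `collection` is assumed to be a Collection from the MongoDB Node.js driver.
async function dropVectorIndex(collection, indexName) {
  await collection.dropSearchIndex(indexName);
  // The index can still be listed for a while during deletion.
  const remaining = await collection.listSearchIndexes(indexName).toArray();
  return remaining.length === 0;
}
```

A `false` result doesn't necessarily indicate a failure, because the index can remain listed while Atlas removes it from the cluster nodes.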

Index Status

When you create the MongoDB Vector Search index, the Status column shows the current state of the index on the primary node of the cluster. Click the View status details link below the status to view the state of the index on all the nodes of the cluster.

When the Status column reads Active, the index is ready to use. In other states, queries against the index may return incomplete results.

Status	Description
Not Started	Atlas has not yet started building the index.
Initial Sync	Atlas is building the index or re-building the index after an edit. When the index is in this state: For a new index, MongoDB Vector Search doesn't serve queries until the index build is complete. For an existing index, you can continue to use the old index for existing and new queries until the index rebuild is complete.
Active	Index is ready to use.
Recovering	Replication encountered an error. This state commonly occurs when the current replication point is no longer available on the `mongod` oplog. You can still query the existing index until it updates and its status changes to Active. Use the error in the View status details modal window to troubleshoot the issue. To learn more, see Fix Issues.
Failed	Atlas could not build the index. Use the error in the View status details modal window to troubleshoot the issue. To learn more, see Fix Issues.
Delete in Progress	Atlas is deleting the index from the cluster nodes.

While Atlas builds the index and after the build completes, the Documents column shows the percentage and number of documents indexed. The column also shows the total number of documents in the collection.
