
How to Index Fields for Vector Search

You can use the vectorSearch type to index fields for running $vectorSearch queries. You can define the index for the vector embeddings that you want to query and any additional fields that you want to use to pre-filter your data. Filtering your data is useful to narrow the scope of your semantic search and ensure that certain vector embeddings are not considered for comparison, such as in a multi-tenant environment.

You can use the Atlas UI, Atlas Administration API, Atlas CLI, mongosh, or a supported MongoDB Driver to create your MongoDB Vector Search index.

Note

You can't use the deprecated knnBeta operator to query fields indexed using the vectorSearch type index definition.

In a vectorSearch type index definition, you can index arrays with only a single element. You can't index embedding fields inside arrays of documents or arrays of objects. You can index embedding fields inside documents by using dot notation. You can't index the same embedding field multiple times in the same index definition.

Before indexing your embeddings, we recommend converting your embeddings to BSON BinData vectors with subtype float32, int1, or int8 for efficient storage in your cluster. To learn more, see how to convert your embeddings to BSON vectors.

When you use MongoDB Vector Search indexes, you might experience elevated resource consumption on an idle node for your Atlas cluster. This is due to the underlying mongot process, which performs various essential operations for MongoDB Vector Search. The CPU utilization on an idle node can vary depending on the number, complexity, and size of the indexes.

To learn more about sizing considerations for your indexes, see Memory Requirements for Indexing Vectors.

If you make changes to the collection for which you defined a MongoDB Vector Search index, the latest data might not be available immediately for queries. However, mongot monitors the change streams and updates stored copies of data, making MongoDB Vector Search indexes eventually consistent. You can view the number of indexed Documents in the Atlas UI to verify that changes to the collection are reflected in the index.

Alternatively, you can create a new index after adding new documents to your collection and wait for the index to become queryable. You can also implement polling logic similar to the following to ensure that the index is ready for querying before attempting to use it.

Example

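// Assumes `collection` is a Collection from the Node.js driver and `result`
// holds the name of the index that you created. Run this inside an async
// function or an ES module, because it uses await.
console.log("Polling to check if the index is ready. This may take up to a minute.");
let isQueryable = false;
while (!isQueryable) {
  const cursor = collection.listSearchIndexes();
  for await (const index of cursor) {
    if (index.name === result && index.queryable) {
      console.log(`${result} is ready for querying.`);
      isQueryable = true;
    }
  }
  if (!isQueryable) {
    // Wait five seconds between checks, even if the index isn't listed yet,
    // so that the loop doesn't spin.
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
}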

You can create and manage MongoDB Vector Search indexes through the Atlas UI, mongosh, Atlas CLI, Atlas Administration API, and the following MongoDB Drivers:

MongoDB Driver
Version

1.28.0 or higher

3.11.0 or higher

3.1.0 or higher

1.16.0 or higher

5.2.0 or higher

5.2.0 or higher

6.6.0 or higher

1.20.0 or higher

4.7 or higher

3.1.0 or higher

5.2.0 or higher

The following syntax defines the vectorSearch index type:

{
  "fields": [
    {
      "type": "vector",
      "path": "<field-to-index>",
      "numDimensions": <number-of-dimensions>,
      "similarity": "euclidean | cosine | dotProduct",
      "quantization": "none | scalar | binary",
      "hnswOptions": {
        "maxEdges": <number-of-connected-neighbors>,
        "numEdgeCandidates": <number-of-nearest-neighbors>
      }
    },
    {
      "type": "filter",
      "path": "<field-to-index>"
    },
    ...
  ]
}
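
As an illustration of the syntax above, the following Node.js sketch fills in the placeholders and wraps the definition in the kind of index model that the Node.js driver's collection.createSearchIndex() helper accepts. The helper buildVectorSearchIndexModel is hypothetical, and the index name vector_index, the field names plot_embedding and genres, and the 1536 dimensions are assumptions chosen for the example, not values taken from this page.

```javascript
// Hypothetical helper: builds a search index model for a vectorSearch index.
// Every concrete value passed in below is an illustrative assumption.
function buildVectorSearchIndexModel({ name, path, numDimensions, similarity, filterPaths = [] }) {
  return {
    name,
    type: "vectorSearch",
    definition: {
      fields: [
        { type: "vector", path, numDimensions, similarity },
        // One filter-type definition per field that you want to pre-filter on.
        ...filterPaths.map((filterPath) => ({ type: "filter", path: filterPath })),
      ],
    },
  };
}

const indexModel = buildVectorSearchIndexModel({
  name: "vector_index",
  path: "plot_embedding",
  numDimensions: 1536,
  similarity: "dotProduct",
  filterPaths: ["genres"],
});

console.log(JSON.stringify(indexModel, null, 2));

// With a connected driver (not run here), you would pass the model on:
//   const result = await collection.createSearchIndex(indexModel);
```

Keeping the definition in a plain object makes it easy to review or validate before you send it to the cluster.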

The MongoDB Vector Search index definition takes the following fields:

Option | Type | Necessity | Purpose

fields

Type: Array of field definition documents

Necessity: Required

Definitions for the vector and filter fields to index, one definition per document. Each field definition document specifies the type, path, and other configuration options for the field to index.

The fields array must contain at least one vector-type field definition. You can add additional filter-type field definitions to your array to pre-filter your data.

fields.type

Type: String

Necessity: Required

Field type to use to index fields for $vectorSearch. You can specify one of the following values:

  • vector - for fields that contain vector embeddings.

  • filter - for additional fields to filter on. You can filter on boolean, date, objectId, numeric, string, and UUID values, including arrays of these types.

To learn more, see About the vector Type and About the filter Type.

fields.path

Type: String

Necessity: Required

Name of the field to index. For nested fields, use dot notation to specify the path to embedded fields.

fields.numDimensions

Type: Int

Necessity: Required

Number of vector dimensions that MongoDB Vector Search enforces at index-time and query-time. You can set this field only for vector-type fields. You must specify a value less than or equal to 8192.

For indexing quantized vectors or BinData, you can specify one of the following values:

  • 1 to 8192 for int8 vectors for ingestion.

  • Multiple of 8 for int1 vectors for ingestion.

  • 1 to 8192 for binData(float32) and array(float32) vectors for automatic scalar quantization.

  • Multiple of 8 for binData(float32) and array(float32) vectors for automatic binary quantization.

The embedding model you choose determines the number of dimensions in your vector embeddings, with some models having multiple options for how many dimensions are output. To learn more, see Choosing a Method to Create Embeddings.
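
The dimension rules above can be sketched as a small client-side check that runs before you submit an index definition. This is only an illustration of the limits stated on this page. The function name checkNumDimensions and the vectorType labels are assumptions made for the example, and the server performs its own validation regardless.

```javascript
const MAX_DIMENSIONS = 8192;

// vectorType: "float32" | "int8" | "int1" (labels assumed for this sketch).
// quantization: "none" | "scalar" | "binary".
// Returns a list of problems; an empty list means the value passes these checks.
function checkNumDimensions(numDimensions, { vectorType = "float32", quantization = "none" } = {}) {
  const problems = [];
  if (!Number.isInteger(numDimensions) || numDimensions < 1) {
    problems.push("numDimensions must be a positive integer");
    return problems;
  }
  if (numDimensions > MAX_DIMENSIONS) {
    problems.push(`numDimensions must be less than or equal to ${MAX_DIMENSIONS}`);
  }
  // int1 vectors and automatic binary quantization both pack 8 values per byte.
  const needsMultipleOf8 = vectorType === "int1" || quantization === "binary";
  if (needsMultipleOf8 && numDimensions % 8 !== 0) {
    problems.push("numDimensions must be a multiple of 8 for int1 vectors or binary quantization");
  }
  return problems;
}

console.log(checkNumDimensions(1536));
console.log(checkNumDimensions(1001, { quantization: "binary" }));
```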

fields.similarity

Type: String

Necessity: Required

Vector similarity function to use to search for top K-nearest neighbors. You can set this field only for vector-type fields.

You can specify one of the following values:

  • euclidean - measures the distance between ends of vectors.

  • cosine - measures similarity based on the angle between vectors.

  • dotProduct - measures similarity like cosine, but takes into account the magnitude of the vector.

To learn more, see About the Similarity Functions.

fields.quantization

Type: String

Necessity: Optional

Type of automatic vector quantization for your vectors. Use this setting only if your embeddings are float or double vectors.

You can specify one of the following values:

  • none - Indicates no automatic quantization for the vector embeddings. Use this setting if you have pre-quantized vectors for ingestion. If omitted, this is the default value.

  • scalar - Indicates scalar quantization, which transforms values to 1 byte integers.

  • binary - Indicates binary quantization, which transforms values to a single bit. To use this value, numDimensions must be a multiple of 8.

    If precision is critical, select none or scalar instead of binary.

To learn more, see Vector Quantization.
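
To get a rough sense of what each quantization setting means for the size of the vector values, you can work through the arithmetic. The sketch below assumes that full-fidelity values are 4-byte float32 numbers and counts only the raw values. It doesn't model index overhead or how MongoDB Vector Search actually lays out data in memory or on disk, and the function name approxVectorBytes is hypothetical.

```javascript
// Approximate bytes needed for the values of one vector, ignoring all overhead.
// Assumption: unquantized values are float32 (4 bytes each).
function approxVectorBytes(numDimensions, quantization = "none") {
  switch (quantization) {
    case "none":
      return numDimensions * 4; // 4 bytes per float32 value
    case "scalar":
      return numDimensions; // 1 byte per value
    case "binary":
      if (numDimensions % 8 !== 0) {
        throw new RangeError("binary quantization requires numDimensions to be a multiple of 8");
      }
      return numDimensions / 8; // 1 bit per value
    default:
      throw new TypeError(`unknown quantization: ${quantization}`);
  }
}

for (const quantization of ["none", "scalar", "binary"]) {
  console.log(quantization, approxVectorBytes(1024, quantization));
}
```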

fields.hnswOptions

Type: Object

Necessity: Optional

Parameters to use for Hierarchical Navigable Small Worlds graph construction. If omitted, uses the default values for the maxEdges and numEdgeCandidates parameters.

IMPORTANT: This is available as a Preview feature. Modifying the default values might negatively impact your MongoDB Vector Search index and queries.

fields.hnswOptions.maxEdges

Type: Int

Necessity: Optional

Maximum number of edges (or connections) that a node can have in the Hierarchical Navigable Small Worlds graph. Value can be between 16 and 64, both inclusive. If omitted, defaults to 16. For example, for a value of 16, each node can have a maximum of sixteen outgoing edges at each layer of the Hierarchical Navigable Small Worlds graph.

A higher number improves recall (accuracy of search results) because the graph is better connected. However, it also has costs: queries slow down because there are more neighbors to evaluate per graph node; the Hierarchical Navigable Small Worlds graph uses more memory because each node stores more connections; and indexing slows down because MongoDB Vector Search evaluates more neighbors and adjusts the graph for every new node.

fields.hnswOptions.numEdgeCandidates

Type: Int

Necessity: Optional

Analogous to numCandidates at query-time, this parameter controls the maximum number of nodes to evaluate to find the closest neighbors to connect to a new node. Value can be between 100 and 3200, both inclusive. If omitted, defaults to 100.

A higher number provides a graph with high-quality connections, which can improve search quality (recall), but it can also negatively affect query latency.
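
The ranges and defaults for the two hnswOptions parameters can be captured in a short helper that fills in omitted values and rejects out-of-range ones. This is a sketch under the limits stated above. The names HNSW_LIMITS and resolveHnswOptions are hypothetical, and you don't need such a helper to create an index.

```javascript
// Ranges and defaults as described for maxEdges and numEdgeCandidates.
const HNSW_LIMITS = {
  maxEdges: { min: 16, max: 64, fallback: 16 },
  numEdgeCandidates: { min: 100, max: 3200, fallback: 100 },
};

// Returns a complete hnswOptions object, applying defaults for omitted values
// and throwing a RangeError for values outside the documented ranges.
function resolveHnswOptions(options = {}) {
  const resolved = {};
  for (const [key, { min, max, fallback }] of Object.entries(HNSW_LIMITS)) {
    const value = options[key] ?? fallback;
    if (!Number.isInteger(value) || value < min || value > max) {
      throw new RangeError(`${key} must be an integer between ${min} and ${max}, got ${value}`);
    }
    resolved[key] = value;
  }
  return resolved;
}

console.log(resolveHnswOptions());
console.log(resolveHnswOptions({ maxEdges: 32 }));
```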

Your index definition's vector field must contain an array of numbers of one of the following types:

  • BSON double

  • BSON BinData vector subtype float32

  • BSON BinData vector subtype int1

  • BSON BinData vector subtype int8

Note

To learn more about generating BSON BinData vectors with subtype float32, int1, or int8 for your data, see How to Ingest Pre-Quantized Vectors.

You must index the vector field as the vector type inside the fields array.

The following syntax defines the vector field type:

{
  "fields": [
    {
      "type": "vector",
      "path": "<field-to-index>",
      "numDimensions": <number-of-dimensions>,
      "similarity": "euclidean | cosine | dotProduct",
      "quantization": "none | scalar | binary",
      "hnswOptions": {
        "maxEdges": <number-of-connected-neighbors>,
        "numEdgeCandidates": <number-of-nearest-neighbors>
      }
    },
    ...
  ]
}

MongoDB Vector Search supports the following similarity functions:

  • euclidean - measures the distance between ends of vectors. This value allows you to measure similarity based on varying dimensions. To learn more, see Euclidean.

  • cosine - measures similarity based on the angle between vectors. This value allows you to measure similarity that isn't scaled by magnitude. You can't use zero magnitude vectors with cosine. To measure cosine similarity, we recommend that you normalize your vectors and use dotProduct instead.

  • dotProduct - measures similarity like cosine, but takes into account the magnitude of the vector. If you normalize the magnitude, cosine and dotProduct are almost identical in measuring similarity.

    To use dotProduct, you must normalize the vector to unit length at index-time and query-time.
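
To see how the three functions relate, you can compute them directly on small vectors. The sketch below uses the textbook definitions and shows that, after normalizing to unit length, the dot product matches the cosine similarity. It computes raw mathematical values only, not the scores that $vectorSearch returns, and all function names are chosen for the example.

```javascript
// Textbook definitions; both vectors are assumed to have the same length.
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const magnitude = (v) => Math.sqrt(dot(v, v));
const euclideanDistance = (a, b) =>
  Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));

function cosineSimilarity(a, b) {
  const denominator = magnitude(a) * magnitude(b);
  if (denominator === 0) {
    throw new RangeError("cosine is undefined for zero-magnitude vectors");
  }
  return dot(a, b) / denominator;
}

// Scales a vector to unit length so that dotProduct behaves like cosine.
function normalize(v) {
  const m = magnitude(v);
  if (m === 0) throw new RangeError("cannot normalize a zero-magnitude vector");
  return v.map((x) => x / m);
}

const first = [3, 4];
const second = [4, 3];

console.log("euclidean distance:", euclideanDistance(first, second));
console.log("cosine similarity:", cosineSimilarity(first, second));
console.log("raw dot product:", dot(first, second));
console.log("dot product after normalizing:", dot(normalize(first), normalize(second)));
```

The raw dot product differs from the cosine value because it still carries the magnitudes of both vectors, which is why normalization matters when you choose dotProduct.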

The following table shows the similarity functions for the various types:

Vector Embeddings Type: supported similarity functions

  • binData(int1): euclidean

  • binData(int8): euclidean, cosine, dotProduct

  • binData(float32): euclidean, cosine, dotProduct

  • array(float32): euclidean, cosine, dotProduct

For vector ingestion.

For automatic scalar or binary quantization.

For best performance, check your embedding model to determine which similarity function aligns with your embedding model's training process. If you don't have any guidance, start with dotProduct. Setting fields.similarity to the dotProduct value allows you to efficiently measure similarity based on both angle and magnitude. dotProduct consumes fewer computational resources than cosine and is efficient when vectors are of unit length. However, if your vectors aren't normalized, evaluate the similarity scores in the results of a sample query for euclidean distance and cosine similarity to determine which corresponds to reasonable results.

You can optionally index additional fields to pre-filter your data. You can filter on boolean, date, objectId, numeric, string, and UUID values, including arrays of these types. Filtering your data is useful to narrow the scope of your semantic search and ensure that not all vectors are considered for comparison. It reduces the number of documents against which to run similarity comparisons, which can decrease query latency and increase the accuracy of search results.

You must index the fields that you want to filter by using the filter type inside the fields array.

The following syntax defines the filter field type:

{
  "fields": [
    {
      "type": "vector",
      ...
    },
    {
      "type": "filter",
      "path": "<field-to-index>"
    },
    ...
  ]
}
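
Once a field is indexed with the filter type, you can reference it in the filter option of a $vectorSearch stage. The following sketch builds such a pipeline as a plain object. The index name, the field names plot_embedding, genres, and title, and the three-element query vector are placeholders assumed for the example. A real query vector must have the same number of dimensions as the indexed field.

```javascript
// Hypothetical helper: builds an aggregation pipeline with a pre-filter.
function buildFilteredVectorSearch({ index, path, queryVector, filter, limit = 10, numCandidates = 100 }) {
  return [
    {
      $vectorSearch: { index, path, queryVector, numCandidates, limit, filter },
    },
    {
      // Return the title and the similarity score for each match.
      $project: { _id: 0, title: 1, score: { $meta: "vectorSearchScore" } },
    },
  ];
}

const pipeline = buildFilteredVectorSearch({
  index: "vector_index",
  path: "plot_embedding",
  queryVector: [0.12, -0.03, 0.57], // placeholder values only
  filter: { genres: { $eq: "Drama" } }, // genres must be indexed as a filter field
});

console.log(JSON.stringify(pipeline, null, 2));

// With a connected driver (not run here):
//   const results = await collection.aggregate(pipeline).toArray();
```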

Note

Pre-filtering your data doesn't affect the score that MongoDB Vector Search returns using $vectorSearchScore for $vectorSearch queries.

You can create a MongoDB Vector Search index for any collection that contains vector embeddings of 8192 dimensions or fewer, for any kind of data, alongside the other data on your cluster. You can create the index through the Atlas UI, Atlas Administration API, Atlas CLI, mongosh, or a supported MongoDB Driver.

To create a MongoDB Vector Search index, you must have a cluster with the following prerequisites:

  • MongoDB version 6.0.11, 7.0.2, or higher

  • A collection for which to create the MongoDB Vector Search index

Note

You can use the mongosh command or driver helper methods to create MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver versions, see Supported Clients.

You need the Project Data Access Admin or higher role to create and manage MongoDB Vector Search indexes.

You cannot create more than:

  • 3 indexes (regardless of the type, search or vector) on M0 clusters.

  • 10 indexes on Flex clusters.

We recommend that you create no more than 2,500 search indexes on a single M10+ cluster.

Note

The procedure includes index definition examples for the embedded_movies collection in the sample_mflix database. If you load the sample data on your cluster and create the example MongoDB Vector Search indexes for this collection, you can run the sample $vectorSearch queries against this collection. To learn more about the sample queries that you can run, see $vectorSearch Examples.

You can view MongoDB Vector Search indexes for all collections from the Atlas UI, Atlas Administration API, Atlas CLI, mongosh, or a supported MongoDB Driver.

You need the Project Search Index Editor or higher role to view MongoDB Vector Search indexes.

Note

You can use the mongosh command or driver helper methods to retrieve MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver versions, see Supported Clients.
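
With the Node.js driver, collection.listSearchIndexes() returns a cursor over the index documents for a collection. The sketch below shows one way to pick out the vector search indexes from that output. It assumes that each returned document carries name, type, and queryable fields, and it runs against hand-written sample documents rather than a live cluster. The function name summarizeVectorIndexes is hypothetical.

```javascript
// Keeps only vectorSearch-type indexes and reports whether each is queryable.
function summarizeVectorIndexes(indexDocs) {
  return indexDocs
    .filter((index) => index.type === "vectorSearch")
    .map(({ name, queryable }) => ({ name, queryable: Boolean(queryable) }));
}

// Hand-written sample documents standing in for driver output.
const sampleIndexDocs = [
  { name: "vector_index", type: "vectorSearch", queryable: true },
  { name: "default", type: "search", queryable: true },
  { name: "vector_index_v2", type: "vectorSearch", queryable: false },
];

console.log(summarizeVectorIndexes(sampleIndexDocs));

// With a connected driver (not run here):
//   const indexDocs = await collection.listSearchIndexes().toArray();
//   console.log(summarizeVectorIndexes(indexDocs));
```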

You can change the index definition of an existing MongoDB Vector Search index from the Atlas UI, Atlas Administration API, Atlas CLI, mongosh, or a supported MongoDB Driver. You can't rename an index or change the index type. If you need to change an index name or type, you must create a new index and delete the old one.

Important

After you edit an index, MongoDB Vector Search rebuilds it. While the index rebuilds, you can continue to run vector search queries by using the old index definition. When the index finishes rebuilding, the old index is automatically replaced. This process is similar to MongoDB Search indexes. To learn more, see Creating and Updating a MongoDB Search Index.

You must have the Project Search Index Editor or higher role to edit a MongoDB Vector Search index.

Note

You can use the mongosh command or driver helper methods to edit MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver versions, see Supported Clients.
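
As one possible way to edit an index from the Node.js driver, the sketch below rewrites the similarity function of every vector field in a definition and sends the result through collection.updateSearchIndex(). The helper changeSimilarity is hypothetical, and the example assumes that you already hold the current definition, for instance from your own configuration.

```javascript
// Builds a new definition with a different similarity function for each
// vector field, then submits it. `collection` is expected to be a Node.js
// driver Collection. The index name and type stay the same.
async function changeSimilarity(collection, indexName, currentDefinition, similarity) {
  const definition = {
    fields: currentDefinition.fields.map((field) =>
      field.type === "vector" ? { ...field, similarity } : field
    ),
  };
  await collection.updateSearchIndex(indexName, definition);
  return definition;
}

// Example call (not run here), inside an async function:
//   await changeSimilarity(collection, "vector_index", currentDefinition, "cosine");
```

Because MongoDB Vector Search rebuilds the index after an edit, you may want to poll for the queryable state before you rely on the new definition.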

You can delete a MongoDB Vector Search index at any time from the Atlas UI, Atlas Administration API, Atlas CLI, mongosh, or a supported MongoDB Driver.

You must have the Project Search Index Editor or higher role to delete a MongoDB Vector Search index.

Note

You can use the mongosh command or driver helper methods to delete MongoDB Vector Search indexes on all Atlas cluster tiers. For a list of supported driver versions, see Supported Clients.

When you create the MongoDB Vector Search index, the Status column shows the current state of the index on the primary node of the cluster. Click the View status details link below the status to view the state of the index on all the nodes of the cluster.

When the Status column reads Active, the index is ready to use. In other states, queries against the index may return incomplete results.

Status
Description

Not Started

Atlas has not yet started building the index.

Initial Sync

Atlas is building the index or re-building the index after an edit. When the index is in this state:

  • For a new index, MongoDB Vector Search doesn't serve queries until the index build is complete.

  • For an existing index, you can continue to use the old index for existing and new queries until the index rebuild is complete.

Active

Index is ready to use.

Recovering

Replication encountered an error. This state commonly occurs when the current replication point is no longer available on the mongod oplog. You can still query the existing index until it updates and its status changes to Active. Use the error in the View status details modal window to troubleshoot the issue. To learn more, see Fix Issues.

Failed

Atlas could not build the index. Use the error in the View status details modal window to troubleshoot the issue. To learn more, see Fix Issues.

Delete in Progress

Atlas is deleting the index from the cluster nodes.

While Atlas builds the index and after the build completes, the Documents column shows the percentage and number of documents indexed. The column also shows the total number of documents in the collection.
