Multi-Vector Semantic Search: Advanced Video Search with TwelveLabs and Amazon OpenSearch

How TwelveLabs AI Models and Amazon OpenSearch Serverless enable multi-vector semantic and hybrid search for video content.


Introduction

The rapid growth of enterprise video content has created a new challenge: how can organizations efficiently search and analyze massive video libraries to uncover meaningful insights? Traditional keyword-based approaches can’t capture the full richness of video, which encompasses visual, audio, speech, and contextual elements. In this post, we’ll demonstrate how combining TwelveLabs’ advanced multimodal AI models with Amazon OpenSearch Serverless’s scalable vector search capabilities enables powerful multi-vector semantic and hybrid search for video content. This makes it possible to find and understand information across every dimension of your video data.

Nested Field Search

According to OpenSearch’s documentation, nested fields in a vector index let you store multiple vectors in a single OpenSearch document. For example, if your content consists of various components, such as chapters of a book or segments of a video, you can generate a vector value for each component and store each vector in a nested field. As demonstrated in this post, this OpenSearch feature is crucial for storing the vector embeddings generated by TwelveLabs’ multi-vector approach, which utilizes their video foundation model, Marengo.

[Image: Multi-vector nested field OpenSearch documents]
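
To make this concrete, here is a simplified sketch of the document shape used later in this post: one OpenSearch document per video, with each clip-level embedding stored as an entry in a nested segments field. The values are abbreviated placeholders; real Marengo embeddings have 1,024 dimensions.

# Simplified shape of a multi-vector OpenSearch document (values abbreviated).
# One parent document per video; one nested "segments" entry per clip.
video_document = {
    "video_id": "6853656f3e86ee22b3162cbf",
    "summary": "A bustling urban intersection with vehicles and pedestrians...",
    "segments": [
        {
            "start_offset_sec": 0.0,
            "end_offset_sec": 9.36,
            "embedding_scope": "clip",
            "embedding_option": "visual-text",
            "segment_embedding": [0.0343, -0.0011, 0.0182],  # truncated
        },
        {
            "start_offset_sec": 9.36,
            "end_offset_sec": 18.73,
            "embedding_scope": "clip",
            "embedding_option": "visual-text",
            "segment_embedding": [0.0269, 0.0185, 0.0268],  # truncated
        },
    ],
}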

TwelveLabs Multimodal Video Intelligence

TwelveLabs is a leader in AI-driven video understanding, moving beyond traditional approaches that treat video as a collection of separate images or audio tracks. The company has developed multimodal foundation models designed explicitly for video-first applications. The platform’s core innovation lies in its ability to understand video content holistically, processing multiple modalities simultaneously rather than analyzing them separately and attempting to piece them together. This approach mirrors human cognition, where we naturally integrate visual, auditory, and contextual information to understand what we’re seeing.

Model Architecture

TwelveLabs offers two primary AI models that work in tandem to provide comprehensive video understanding:

Marengo serves as the multimodal embedding foundation model, designed to generate rich vector representations using a multi-vector approach, creating separate specialized vectors for different aspects of video content rather than compressing everything into a single embedding.

Pegasus employs an encoder-decoder architecture optimized for comprehensive video understanding. This model enables sophisticated video-to-text generation, powering features like summarization, analysis, and content generation based on video understanding.

Amazon Bedrock

Good news: in April, AWS announced that TwelveLabs is coming soon to Amazon Bedrock. With Marengo and Pegasus available in Amazon Bedrock, you will be able to use TwelveLabs’ models to build and scale generative AI applications without having to manage underlying infrastructure or access external APIs.

Source Code

All of the open-source code for this blog post is available on GitHub as part of a Jupyter Notebook.

Building a Video Index

To demonstrate the platform’s capabilities, we’ll create a TwelveLabs index using stock footage from Pexels. Pexels offers thousands of free, high-resolution videos covering a diverse range of subjects, making it an ideal source for building a representative video dataset for this demonstration. An index is the fundamental unit for organizing and storing video data, comprising video embeddings and metadata; indexes facilitate information retrieval and processing.

[Image: Pexels free stock video content]

Creating the Index

To begin, establish a TwelveLabs free or paid account and create a private API key from within the TwelveLabs Dashboard’s Settings.

[Image: Creating your TwelveLabs private API key]

Free accounts provide up to 600 minutes of video indexing and up to 50 API calls per day. If you exceed the API call limit, you will see an error similar to the following:

RateLimitError: Error code: 429 - {'code': 'too_many_requests', 'message': 'You have exceeded the rate limit (50req/1day). Please try again later after 2025-06-12T01:16:22Z.'}        
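
If you hit this limit mid-run, a small guard around SDK calls can fail gracefully instead of aborting the whole notebook. Below is a minimal sketch (not part of the TwelveLabs SDK); it matches on the 'too_many_requests' code shown in the error above, and the usage line is hypothetical.

def call_with_rate_limit_guard(fn, *args, **kwargs):
    """Invoke a TwelveLabs SDK call and handle daily rate-limit errors gracefully.

    The free tier enforces a daily quota, so rather than retrying immediately,
    this guard detects the 429 response (its message contains 'too_many_requests')
    and reports it so you can resume after the reset time.
    """
    try:
        return fn(*args, **kwargs)
    except Exception as e:
        if "too_many_requests" in str(e):
            print(f"Daily API quota exceeded; try again after the reset time: {e}")
            return None
        raise


# Hypothetical usage, once tl_client is instantiated below:
# index_list = call_with_rate_limit_guard(tl_client.index.list, name="pexels_sample_index")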

Next, instantiate a TwelveLabs client with your private API key:

import json
import os

from twelvelabs import TwelveLabs
from twelvelabs.models import Video
from twelvelabs.exceptions import NotFoundError

# Set the API key for TwelveLabs from environment variable
TL_API_KEY = os.getenv("TL_API_KEY")

tl_client = TwelveLabs(api_key=TL_API_KEY)        

Using the TwelveLabs platform, creating an index involves providing an index name, selecting the models and their options, and choosing any optional add-ons. The TwelveLabs Python SDK streamlines this process, providing methods for index management. The SDK supports authentication and asynchronous operations.

For this post, I created a visual-only index, as the Pexels videos do not contain audio. Adjust the model options in the code below to match your video content: visual only, or visual and audio:

def create_index(index_name: str) -> str:
    """Create a new index for embeddings.
    
    Args:
        index_name (str): The name of the index to create.
    
    Returns:
        str: The ID of the created index.
    """
    # Check if the index already exists
    index_list = tl_client.index.list(
        name=index_name,
        sort_option="asc",
        page_limit=1,
    )
    
    # If the index exists, return its ID
    if index_list:
        for index in index_list:
            print(f"Index '{index.name}' already exists.")
            return index.id

    # If the index does not exist, create a new one
    print(f"Creating index '{index_name}'...")
    models = [
        {"name": "marengo2.7", "options": ["visual", "audio"]},
        {"name": "pegasus1.2", "options": ["visual", "audio"]},
    ]

    created_index = tl_client.index.create(
        name=index_name, models=models, addons=["thumbnail"]
    )

    return created_index.id



TL_INDEX_NAME = "pexels_sample_index"

tl_index_id = create_index(TL_INDEX_NAME)
print(f"New index ID: {tl_index_id}")        

Uploading Videos

Once you have successfully created your index, you can upload your videos. The code assumes your videos meet TwelveLabs’ video requirements and are located in the videos/pexels directory:

def upload_video(tl_index_id: str, video_path: str) -> None:
    """Upload a video to the TwelveLabs index.
    
    Args:
        tl_index_id (str): The ID of the TwelveLabs index.
        video_path (str): The path to the video file to upload.
    
    Returns:
        None
    """
    try:
        task = tl_client.task.create(index_id=tl_index_id, file=video_path)
        print(f"Task id={task.id}")
        print(f"Video '{video_path}' uploaded successfully!")
    except Exception as e:
        print(f"Failed to upload video '{video_path}': {e}")


video_directory = "videos/pexels"
if not os.path.exists(video_directory):
    print(f"Video directory '{video_directory}' does not exist. Creating it.")
    os.makedirs(video_directory)

for video in os.listdir(video_directory):
    if video.endswith(".mp4"):
        video_path = os.path.join(video_directory, video)
        upload_video(tl_index_id, video_path)        

[Image: Indexing the uploaded videos into the TwelveLabs index]

After a few minutes of indexing, you should see your videos in the index. For this post’s quick demonstration, I’ve used just 25 videos.

[Image: TwelveLabs index with Pexels videos]

Exploring TwelveLabs’ Core Features

The TwelveLabs platform offers three primary capabilities that demonstrate the power of multimodal video understanding: Search, Analyze, and Embed.

[Image: Examples of the Analyze and Search features of TwelveLabs]

Search: Natural Language Video Retrieval

The Search feature enables users to find specific moments within videos using natural language queries. Unlike traditional keyword matching, this semantic search understands context and intent, allowing queries like “when the person in the red shirt enters the restaurant” to return precise video segments even without exact keyword matches.

Users can search by entering text-based queries or uploading reference images to find similar content. This flexibility makes the platform especially powerful for content discovery and analysis workflows. The search functionality includes both ‘View by Clips,’ which displays video segments matching your query, and ‘View by Video,’ which highlights the relevant clips within each video. In my view, these advanced search features, combined with a well-designed user interface, set TwelveLabs apart from the competition.

[Image: View by Clips option on the TwelveLabs Search Playground]
[Image: View by Video option on the TwelveLabs Search Playground]
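
The same natural-language search is available programmatically through the Python SDK’s search.query method (also used later in this post to list the videos in the index). Below is a minimal sketch, assuming the tl_client and tl_index_id created earlier; the result fields shown (score, start, end) follow TwelveLabs’ Search API response and may vary slightly by SDK version.

# Minimal sketch: text-based semantic search against the TwelveLabs index.
# Assumes `tl_client` and `tl_index_id` from the index-creation code above.
result = tl_client.search.query(
    index_id=tl_index_id,
    query_text="when the person in the red shirt enters the restaurant",
    options=["visual"],
    page_limit=5,
)

# Each item represents a matching clip within a video.
for clip in result.data:
    print(f"video_id={clip.video_id} score={clip.score} "
          f"start={clip.start}s end={clip.end}s")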

Analyze: Comprehensive Video Intelligence

The Analyze feature leverages the Pegasus model to generate detailed video understanding outputs, including:

  • Summaries: Comprehensive overviews of video content, capturing key themes and information
  • Chapter Analysis: Automatic segmentation of videos into logical chapters with timestamps and descriptions
  • Highlights: Identification of the most critical or engaging moments within the video content
  • Gist Generation: Creation of concise titles, topics, and relevant hashtags that capture the essence of the video
  • Open-ended Analysis: Perform open-ended analysis on video content, generating tailored text outputs based on your prompts

These analysis capabilities transform raw video content into structured, searchable metadata that enhances discoverability and understanding.

[Image: Analyze Playground on the TwelveLabs platform]

Embed: Vector Representation Generation

The Embed function produces rich multimodal embeddings that capture the semantic meaning of video content. These embeddings serve as the foundation for advanced applications, including semantic search, content recommendation, and similarity analysis.

With Marengo’s multi-vector architecture, the embeddings provide remarkable granularity, storing visual and audio information in separate vector representations. This approach enables more precise matching and retrieval compared to single-vector methods. To generate embeddings, the platform divides videos into clips between 2 and 10 seconds in length. Marengo creates dense vector embeddings with 1,024 dimensions (a list of floats).

Below is an example of embeddings generated for a 20-second video. The video has been divided into three equal-sized clips as indicated by the start and end offsets. There is a visual (visual-text) and an audio embedding for each clip, totaling six embeddings. The vector embeddings have been truncated for brevity.

Video ID: 68029cb8037ae3f16ac260e9

Embeddings:
  embedding_scope=clip embedding_option=visual-text start_offset_sec=0.0 end_offset_sec=6.7777777
  embeddings: [-0.008768515, 0.021398513, -0.011509235, ...]

  embedding_scope=clip embedding_option=visual-text start_offset_sec=6.7777777 end_offset_sec=13.555555
  embeddings: [-0.013005765, 0.008057533, -0.0063977824, ...]

  embedding_scope=clip embedding_option=visual-text start_offset_sec=13.555555 end_offset_sec=20.333334
  embeddings: [-0.01812577, 0.00537808, -0.02521663, ...]

  embedding_scope=clip embedding_option=audio start_offset_sec=0.0 end_offset_sec=6.7777777
  embeddings: [-0.040039062, -0.024536133, 0.004852295, ...]

  embedding_scope=clip embedding_option=audio start_offset_sec=6.7777777 end_offset_sec=13.555555
  embeddings: [-0.040771484, -0.036621094, 0.0047912598, ...]

  embedding_scope=clip embedding_option=audio start_offset_sec=13.555555 end_offset_sec=20.333334
  embeddings: [-0.02734375, -0.036376953, 0.006011963, ...]        

Using the TwelveLabs Embed Playground, you can perform a semantic search by entering a search query, such as “sporting event in a stadium,” which is converted into a text vector embedding. The vector representation of your search query is displayed in relative proximity to the vector representations of the videos in your index. Note that each video is shown as multiple clips, each with its own embedding(s).

[Image: Embed Playground on the TwelveLabs platform]

Batch Retrieval of Embeddings from the Index

The TwelveLabs platform allows you to retrieve embeddings for videos. You can create embeddings from a new video without creating an index using the Python SDK’s client.embed.task.create method, or retrieve embeddings from an existing index using the client.index.video.retrieve method. The TwelveLabs Python SDK provides straightforward access to the platform’s embedding capabilities.
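
For completeness, here is a minimal sketch of the index-free path, following the embed task workflow in TwelveLabs’ documentation (create the task, wait for it to finish, then retrieve the result); exact method names and arguments may differ slightly between SDK versions, and the file path is illustrative.

# Minimal sketch: generate embeddings for a single video without an index.
# Assumes `tl_client` from earlier.
task = tl_client.embed.task.create(
    model_name="Marengo-retrieval-2.7",
    video_file="videos/pexels/sample.mp4",
)

# Poll until the embedding task completes (per the TwelveLabs SDK docs).
task.wait_for_done(sleep_interval=5)

# Retrieve the finished task and print the clip-level embeddings.
task = tl_client.embed.task.retrieve(task.id)
if task.video_embedding is not None:
    for segment in task.video_embedding.segments:
        print(
            f"{segment.start_offset_sec}-{segment.end_offset_sec}s: "
            f"{segment.embeddings_float[:3]}..."
        )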

To perform batch retrieval of embeddings from the index, I’ve created two functions: one to extract a list of video IDs from the index, and a second to write the embeddings to an interim JSON file. We iterate over each video in the index based on the list of video IDs, retrieve the embeddings, and write them to disk.

def save_embeddings_to_json(video: Video, output_path: str) -> None:
    """Save the embedding task details to a JSON file.

    Args:
        video (Video): The video object containing embedding details.
        output_path (str): The path where the JSON file will be saved.
    
    Returns:
        None
    """
    # Serialize the video object to a dictionary
    video_data = video.model_dump_json()
    video_data = json.loads(video_data)
    video_data["video_id"] = video.id

    # Determine the output filename using the video ID
    input_filename = video_data["video_id"]
    output_filename = f"{output_path}/{input_filename}_embeddings.json"
    if os.path.exists(output_filename):
        print(f"Embeddings already exist for video ID {video.id}. Skipping...")
        return

    # Write the dictionary to a JSON file
    with open(output_filename, "w") as json_file:
        json.dump(video_data, json_file, indent=4)
    print(f"Embeddings saved to {output_filename}")


def get_videos_from_index(index_id: str, page_limit: int = 25) -> list:
    """Retrieve video IDs from the specified index.

    Args:
        index_id (str): The ID of the index to query.
        page_limit (int): The maximum number of results to return.

    Returns:
        list: A list of video IDs retrieved from the index.
    """
    result = tl_client.search.query(
        index_id=index_id,
        query_text="*",
        options=["visual"],
        page_limit=page_limit,
    )

    print(f"Total count of videos in index {index_id}: {result.pool.total_count}")
    if result.pool.total_count == 0:
        raise NotFoundError(f"No videos found in index {index_id}")
    print(result)
    video_ids = [item.video_id for item in result.data]
    return video_ids


# Retrieve the video IDs from the index
video_ids = get_videos_from_index(tl_index_id)

# Retrieve the video embeddings from the index and save to JSON
for video_id in video_ids:
    video = tl_client.index.video.retrieve(
        index_id=tl_index_id, id=video_id, embedding_option=["visual-text"]
    )

    output_directory = "output/pexels"
    if not os.path.exists(output_directory):
        print(f"Output directory '{output_directory}' does not exist. Creating it.")
        os.makedirs(output_directory)

    print(f"Processing video ID: {video.id}")
    save_embeddings_to_json(video, output_directory)        

The interim JSON files with the embeddings will be saved to the ./output/pexels/ directory. You could also choose to persist this data in memory.
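
For example, a minimal in-memory variant of the loop above (same SDK calls, no interim files) might look like this; the attribute path on the video object mirrors the JSON structure shown later in this post.

# Keep the clip-level embeddings in memory, keyed by video ID,
# instead of writing interim JSON files to disk.
embeddings_by_video = {}

for video_id in get_videos_from_index(tl_index_id):
    video = tl_client.index.video.retrieve(
        index_id=tl_index_id, id=video_id, embedding_option=["visual-text"]
    )
    embeddings_by_video[video_id] = [
        {
            "start_offset_sec": segment.start_offset_sec,
            "end_offset_sec": segment.end_offset_sec,
            "segment_embedding": segment.embeddings_float,
        }
        for segment in video.embedding.video_embedding.segments
    ]

print(f"Holding embeddings for {len(embeddings_by_video)} videos in memory.")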

Video Analysis: Summaries, Chapters, Highlights, Gists, and Open-ended analysis

The TwelveLabs Analyze function transforms video content into structured textual intelligence. This process leverages the Pegasus model’s ability to generate human-quality descriptions and analysis. There are several options, including summaries, chapters, highlights, gist (with titles, topics, and hashtags), and open-ended analysis.

Summaries

Video summaries provide comprehensive overviews that capture the essential information and themes within video content. The summarization process analyzes both visual and audio elements, creating coherent narratives that reflect the video’s key messages and content structure.

summary = tl_client.summarize(
    video_id=video_id,
    prompt="Summarize the video in a concise manner.",
    temperature=0.4,
    type="summary",
)        

Example summary from 112-second video:

"summary": "The video captures a bustling intersection from behind a blue metal fence at a crosswalk, showcasing the continuous flow of traffic and pedestrians. Various vehicles, including scooters, a green truck, two cars, a pink bus, a white vehicle, and a flatbed trailer, move through the scene, followed by several motorcycles. Amidst the vehicular traffic, a man on a bicycle enters the frame from the bottom and rides towards the top. Pedestrian activity is also prominent, with a person in dark pants crossing the street from left to right, and another individual holding an umbrella navigating the pedestrian crossing while multiple motorbikes approach. The video concludes as all the pedestrians have crossed, leaving the intersection momentarily quiet. The creator likely aims to depict the dynamic and varied movements within a typical urban intersection."        

Chapters

Automatic chapter generation identifies logical breakpoints within videos, creating timestamped segments with descriptive titles. This feature is particularly valuable for long-form content, educational materials, and presentations where users need to navigate to specific sections.

chapters = tl_client.summarize(
    video_id=video_id,
    prompt="List the chapters of the video.",
    temperature=0.4,
    type="chapter",
)        

Example chapters from 112-second video:

"chapters": [
    {
        "chapter_number": 0,
        "start": 0.0,
        "end": 45.0,
        "chapter_title": "Traffic and Bicycle Rider",
        "chapter_summary": "The video begins with a view from behind a blue metal fence at a crosswalk. Several scooters, a green truck, two cars, a pink bus, and two more vehicles (one white and another with a flatbed trailer) pass through the intersection. More traffic continues moving through, including several motorcycles. A man riding a bicycle enters the scene from the bottom edge of the screen and rides towards the top."
    },
    {
        "chapter_number": 1,
        "start": 71.0,
        "end": 112.0,
        "chapter_title": "Pedestrians Crossing",
        "chapter_summary": "A person wearing dark pants walks into the frame from the left side and crosses the street. Another individual holding an umbrella walks across the pedestrian crossing as multiple motorbikes approach them. After all pedestrians have crossed, the clip ends showing no further movement within this segment."
    }
]        

Highlights

The highlight detection capability identifies the most engaging or essential moments within video content. This analysis considers factors such as visual activity, audio intensity, and content significance to surface key segments that represent the video’s most valuable information.

highlights = tl_client.summarize(
    video_id=video_id,
    prompt="List the highlights of the video.",
    temperature=0.4,
    type="highlight",
)        

Example highlights from a 112-second video:

"highlights": [
    {
        "start": 0.0,
        "end": 5.0,
        "highlight": "Scooters Passing",
        "highlight_summary": "Several scooters pass through the intersection behind a blue metal fence."
    },
    {
        "start": 71.0,
        "end": 85.0,
        "highlight": "Person Walking",
        "highlight_summary": "A person in dark pants walks into the frame from the left and crosses the street."
    },
    {
        "start": 86.0,
        "end": 105.0,
        "highlight": "Umbrella Pedestrian",
        "highlight_summary": "An individual holding an umbrella walks across the pedestrian crossing as multiple motorbikes approach."
    },
    {
        "start": 106.0,
        "end": 112.0,
        "highlight": "End of Movement",
        "highlight_summary": "After all pedestrians have crossed, the clip ends with no further movement."
    }
]        

Gist: Title, Topic, and Hashtags

Gist generation produces concise metadata, including titles, topics, and hashtags, that encapsulates the video’s essence. This structured metadata enhances searchability and enables efficient content categorization and discovery.

gist = tl_client.gist(
  video_id=video_id, 
  types=["title", "topic", "hashtag"]
)        

Example gist from a 112-second video:

"gist": {
    "title": "A Day at the Intersection: Capturing the Flow of Traffic and Pedestrians",
    "topics": [
        "Urban Traffic and Pedestrian Movement"
    ],
    "hashtags": [
        "StreetTraffic",
        "Vehicles",
        "Pedestrians",
        "Intersection",
        "UrbanScenes"
    ]
}        

Open-ended Analysis

Open-ended analysis of video content allows you to generate tailored text outputs based on your prompt. This feature provides more customization options than the summarization feature. TwelveLabs provides both streaming and non-streaming methods for retrieving the results of open-ended analysis.

res_analyze = tl_client.analyze(
    video_id=video_id,
    prompt="Describe what is happening in the video.",
    temperature=0.4,
)

Example open-ended analysis from a 112-second video:

"analysis": "The video captures a busy intersection in Taiwan, where various vehicles such as scooters, cars, a green truck, a pink bus, and a white vehicle with a flatbed trailer move through the crosswalk. Motorcycles and a bicycle also pass by. Pedestrians, including one holding an umbrella, cross the street amidst the ongoing traffic. The scene is set against a backdrop of a construction site with blue and yellow barriers."        

To perform batch generation of a complete analysis from the index, I’ve created one additional function that generates the analyses and writes them to an interim JSON file. We iterate over each video in the index, based on the list of video IDs, and generate the analyses.

def summarize_video(index_id: str, video_id: str, output_path: str) -> None:
    """Summarize a video and save the analysis to a JSON file if it doesn't already exist.

    Args:
        index_id (str): The ID of the index where the video is stored.
        video_id (str): The ID of the video to summarize.
        output_path (str): The path where the JSON file will be saved.

    Returns:
        None
    """
    # Check if the analysis already exists
    filename = f"{output_path}/{video_id}_analysis.json"
    if os.path.exists(filename):
        print(f"Analysis already exists for video ID {video_id}. Skipping...")
        return
    print(f"Analyzing video ID: {video_id}")

    # Get the video summary
    res_summary = tl_client.summarize(
        video_id=video_id,
        prompt="Summarize the video in a concise manner.",
        temperature=0.4,
        type="summary",
    )

    # Get the chapters of the video
    res_chapters = tl_client.summarize(
        video_id=video_id,
        prompt="List the chapters of the video.",
        temperature=0.4,
        type="chapter",
    )

    # Get the highlights of the video
    res_highlights = tl_client.summarize(
        video_id=video_id,
        prompt="List the highlights of the video.",
        temperature=0.4,
        type="highlight",
    )

    # Get open-ended text analysis of the video
    res_analyze = tl_client.analyze(
        video_id=video_id,
        prompt="Describe what is happening in the video.",
        temperature=0.4,
    )

    # Get the gist of the video
    res_gist = tl_client.gist(video_id=video_id, types=["title", "topic", "hashtag"])

    # Combined responses
    analyses = {}

    analyses.update(
        {
            "gist": res_gist.model_dump(),
            "video_id": video_id,
            "index_id": index_id,
            "summary": res_summary.summary,
            "analysis": res_analyze.data,
            "chapters": res_chapters.chapters.model_dump(),
            "highlights": res_highlights.highlights.model_dump(),
        }
    )

    # save to file
    with open(filename, "w") as f:
        f.write(json.dumps(analyses))


# Retrieve the video IDs from the index
video_ids = get_videos_from_index(tl_index_id)

# Retrieve the video analysis from the index and save to JSON
for video_id in video_ids:
    print(f"Processing video ID: {video_id}")
    summarize_video(tl_index_id, video_id, output_directory)        

The interim JSON files with the analyses will be saved to the ./output/pexels/ directory alongside the embeddings.

Combining TwelveLabs’ Embeddings and Analyses

Before moving on to Amazon OpenSearch, the final step is to combine the embeddings with the analyses and any other metadata into a single document, one for each video, which will be indexed in Amazon OpenSearch.

def extract_video_ids(output_path: str) -> list:
    """Extract video IDs from analysis filenames in the specified directory.

    Args:
        output_path (str): Directory containing the analysis JSON files

    Returns:
        list: List of extracted video IDs
    """
    video_ids = []

    # Check if the output directory exists
    if not os.path.exists(output_path):
        print(f"Directory {output_path} doesn't exist")
        return video_ids

    for filename in os.listdir(output_path):
        # Check if it's an analysis file
        if filename.endswith("_analysis.json"):
            # Extract the ID part from the filename
            # The ID is everything before "_analysis.json"
            video_id = filename.split("_analysis.json")[0]
            video_ids.append(video_id)

    return video_ids


# Extract video IDs from the analysis files
video_ids = extract_video_ids(output_directory)
print(f"Found {len(video_ids)} video IDs: {video_ids}")


def combine_segments_to_documents(
    output_path: str, document_path: str, video_ids: list
) -> None:
    """Combine embeddings and analyses into single documents and save them to a local directory.

    Args:
        output_path (str): Directory containing the analysis and embeddings JSON files
        document_path (str): Directory to save the combined document files
        video_ids (list): List of video IDs to process

    Returns:
        None
    """
    for video_id in video_ids:
        # Open the corresponding analyses and embeddings documents and combine them
        with open(f"{output_path}/{video_id}_embeddings.json", "r") as f:
            embeddings = json.load(f)

        with open(f"{output_path}/{video_id}_analysis.json", "r") as f:
            analyses = json.load(f)

        # Combine the two documents
        document = {}
        document.update(analyses)
        document.update(embeddings)

        # Remove unneeded keys
        document["gist"].pop("id", None)
        document["gist"].pop("usage", None)
        
        # Segments of video
        segments = document["embedding"]["video_embedding"]["segments"]

        # Write documents to local directory for each segment
        filename = f"{document_path}/{document['video_id']}_document.json"
        document.pop("embedding", None)
        document["segments"] = segments
        for segment in document["segments"]:
            segment["segment_embedding"] = segment["embeddings_float"].copy()
            segment.pop("embeddings_float", None)

        with open(filename, "w") as f:
            f.write(json.dumps(document, indent=4))


document_directory = "documents/pexels"
if not os.path.exists(document_directory):
    print(f"Document directory '{document_directory}' does not exist. Creating it.")
    os.makedirs(document_directory)

combine_segments_to_documents(output_directory, document_directory, video_ids)        

The JSON files representing Amazon OpenSearch documents, with both the embeddings and analyses, will be saved to the ./documents/pexels/ directory.

The final structure of the documents is similar to the example below. The embeddings have been truncated in the example for brevity:

{
    "gist": {
        "title": "A Day at the Intersection: Capturing the Flow of Traffic and Pedestrians",
        "topics": [
            "Urban Traffic and Pedestrian Movement"
        ],
        "hashtags": [
            "StreetTraffic",
            "Vehicles",
            "Pedestrians",
            "Intersection",
            "UrbanScenes"
        ]
    },
    "video_id": "6853656f3e86ee22b3162cbf",
    "index_id": "68481c835b5b47d9b345c12d",
    "summary": "The video captures a bustling urban intersection through the lens of a stationary camera positioned behind a blue metal fence. The scene opens with scooters zipping through the crosswalk, setting the tone for the continuous flow of traffic. A green truck and two cars soon follow, adding to the dynamic street life. A pink bus then crosses the frame from right to left, drawing attention with its vibrant color. The traffic continues with a white vehicle and one with a flatbed trailer, followed by several motorcycles passing by, illustrating the diverse modes of transportation. Amidst the vehicular traffic, a man on a bicycle enters from the bottom and rides towards the top, weaving through the crowd. Pedestrian activity is also highlighted, with a person in dark pants crossing the street from left to right, and another individual holding an umbrella walking across the pedestrian crossing, narrowly avoiding multiple approaching motorbikes. The video concludes with the pedestrians safely crossing, leaving the intersection momentarily quiet, emphasizing the rhythm and flow of urban life.",
    "analysis": "The video captures a bustling intersection in Taiwan, where various vehicles such as scooters, cars, a green truck, a pink bus, and a white vehicle with a flatbed trailer move through the crosswalk. Motorcycles also pass by, adding to the traffic flow. Early in the video, a man on a bicycle enters from the bottom edge and rides towards the top. Midway through, a pedestrian wearing dark pants crosses the street from the left side, followed by another individual holding an umbrella walking across the pedestrian crossing as multiple motorbikes approach. The scene is framed by a construction site with blue and yellow barriers, and towards the end, the intersection becomes quieter with no further movement shown.",
    "chapters": [
        {
            "chapter_number": 0,
            "start": 0.0,
            "end": 45.0,
            "chapter_title": "Traffic and Bicycle Rider",
            "chapter_summary": "The video begins with a view from behind a blue metal fence at a crosswalk. Several scooters, a green truck, two cars, a pink bus, and two more vehicles (one white and another with a flatbed trailer) pass through the intersection. More traffic continues moving through, including several motorcycles. A man riding a bicycle enters the scene from the bottom edge of the screen and rides towards the top."
        },
        {
            "chapter_number": 1,
            "start": 71.0,
            "end": 112.0,
            "chapter_title": "Pedestrians Crossing",
            "chapter_summary": "A person wearing dark pants walks into the frame from the left side and crosses the street. Another individual holding an umbrella walks across the pedestrian crossing as multiple motorbikes approach them. After all pedestrians have crossed, the clip ends showing no further movement within this segment."
        }
    ],
    "highlights": [
        {
            "start": 0.0,
            "end": 5.0,
            "highlight": "Scooters Passing",
            "highlight_summary": "Several scooters pass through the intersection behind a blue metal fence."
        },
        {
            "start": 71.0,
            "end": 85.0,
            "highlight": "Person Walking",
            "highlight_summary": "A person in dark pants walks into the frame from the left and crosses the street."
        },
        {
            "start": 86.0,
            "end": 105.0,
            "highlight": "Umbrella Pedestrian",
            "highlight_summary": "An individual holding an umbrella walks across the pedestrian crossing as multiple motorbikes approach."
        },
        {
            "start": 106.0,
            "end": 112.0,
            "highlight": "End of Movement",
            "highlight_summary": "After all pedestrians have crossed, the clip ends with no further movement."
        }
    ],
    "id": "6853656f3e86ee22b3162cbf",
    "created_at": "2025-06-19T01:18:39Z",
    "updated_at": null,
    "system_metadata": {
        "filename": "13239132-hd_1920_1080_50fps.mp4",
        "duration": 112.361667,
        "fps": 50.0,
        "width": 1920,
        "height": 1080,
        "size": 72844586
    },
    "user_metadata": null,
    "hls": {
        "video_url": "https://deuqpmn4rs7j5.cloudfront.net/88d47efb7556bg3d95ce9441/796gg67g5f97ff3d5h63e73c/stream/ec888844-742g-6ac8-a3hc-c66d368ce0c0.m3u8",
        "thumbnail_urls": [
            "https://deuqpmn4rs7j5.cloudfront.net/88d47efb7556bg3d95ce9441/796gg67g5f97ff3d5h63e73c/thumbnails/ec888844-742g-6ac8-a3hc-c66d368ce0c0.0000001.jpg"
        ],
        "status": "COMPLETE",
        "updated_at": "2025-06-19T01:19:08.904Z"
    },
    "source": null,
    "segments": [
        {
            "start_offset_sec": 0.0,
            "end_offset_sec": 9.363472,
            "embedding_scope": "clip",
            "embedding_option": "visual-text",
            "segment_embedding": [
                0.034297787,
                -0.0010638289,
                0.01824182
            ]
        },
        {
            "start_offset_sec": 9.363472,
            "end_offset_sec": 18.726944,
            "embedding_scope": "clip",
            "embedding_option": "visual-text",
            "segment_embedding": [
                0.02694066,
                0.018537117,
                0.026755061
            ]
        },
        {
            "start_offset_sec": 18.726944,
            "end_offset_sec": 28.090416,
            "embedding_scope": "clip",
            "embedding_option": "visual-text",
            "segment_embedding": [
                0.026583178,
                0.014603035,
                0.022988498
            ]
        },
        ...
        {
            "start_offset_sec": 102.99819,
            "end_offset_sec": 112.361664,
            "embedding_scope": "clip",
            "embedding_option": "visual-text",
            "segment_embedding": [
                0.02268905,
                0.0030316901,
                0.021492222
            ]
        }
    ]
}        

Amazon OpenSearch Serverless

Amazon OpenSearch Serverless is an on-demand, serverless option for Amazon OpenSearch Service that automatically scales compute resources based on application needs, eliminating the need to provision or manage clusters. The service utilizes a cloud-native architecture that separates indexing from search operations, with Amazon S3 serving as the primary data storage, enabling both functions to scale independently.

OpenSearch Serverless measures capacity in OpenSearch Compute Units (OCUs), automatically scaling up and down based on demand to ensure you only pay for resources consumed while maintaining fast performance. Amazon OpenSearch Serverless provides a robust foundation for storing and searching multimodal video data. The platform’s vector engine supports up to 16,000 dimensions and can accommodate billions of vectors with millisecond response times.

Creating the OpenSearch Serverless Index

Setting up OpenSearch Serverless for multi-vector nested field search requires a specific index mapping configuration. The vector engine supports nested field structures, allowing for the storage of multiple embeddings within a single document while maintaining efficient search performance. The nested field configuration allows OpenSearch to treat each embedded vector as a separate searchable entity while maintaining the relationship to the parent document. This approach enables sophisticated search scenarios where multiple vectors from the same document can contribute to relevance scoring.

Assuming you have already created an Amazon OpenSearch Serverless collection, create the index within the collection to hold the documents using the client.indices.create method. This process begins by authenticating with your AWS account and instantiating an OpenSearch client instance. We then create the vector search index, enabling k-nearest neighbors (k-NN) in the index settings. A k-NN search finds the k neighbors closest to a query point across an index of vectors. To determine the neighbors, you specify the space (the distance function) used to measure the distance between points.

The index’s method defines the algorithm used to organize vector data at indexing time and to search it at query time using approximate k-NN. We will use the Hierarchical Navigable Small World (HNSW) algorithm. The engine is the library that implements this method; we will use the Faiss (Facebook AI Similarity Search) engine. Lastly, we will select l2 (Euclidean distance) as the space type used to calculate the distance between vectors.

Make sure to update the aws_region and aoss_host variables before continuing:

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
from opensearchpy.exceptions import NotFoundError

# *** Change values to match your environment ***
aws_region = "us-east-1"
aoss_host = "abcdefg12345.us-east-1.aoss.amazonaws.com"
aoss_index = "video-search-nested"

# Create opensearch client (your authentication method may vary)
service = "aoss"
credentials = boto3.Session(
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    aws_session_token=aws_session_token,
    region_name=aws_region,
).get_credentials()
auth = AWSV4SignerAuth(credentials, aws_region, service)

aoss_client = OpenSearch(
    hosts=[{"host": aoss_host, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    pool_maxsize=20,
)

# Create new nested field search index (multiple vector fields)
try:
    response = aoss_client.indices.delete(index=aoss_index)
except NotFoundError as e:
    print(f"Index {aoss_index} not found, skipping deletion.")
except Exception as e:
    print(f"Error deleting index: {e}")

index_body = {
    "settings": {
        "index": {
            "knn": True,
            "number_of_shards": 2,
        }
    },
    "mappings": {
        "properties": {
            "segments": {
                "type": "nested",
                "properties": {
                    "segment_embedding": {
                        "type": "knn_vector",
                        "dimension": 1024,
                        "method": {
                            "engine": "faiss",
                            "name": "hnsw",
                            "space_type": "l2",
                        },
                    }
                },
            }
        }
    },
}

try:
    response = aoss_client.indices.create(index=aoss_index, body=index_body)
    print(json.dumps(response, indent=4))
except Exception as ex:
    print(ex)        

Note that the Amazon OpenSearch Serverless Management Console does not appear to display the correct Vector fields information for the nested field configuration.

[Image: Amazon OpenSearch Serverless Management Console]

We can check the index’s configuration programmatically using the OpenSearch Python client:

try:
    response = aoss_client.indices.get(index=aoss_index)
    print(json.dumps(response, indent=4))
except NotFoundError as ex:
    print(f"Index not found: {ex}")
except Exception as ex:
    print(ex.error)        

You should observe that the configuration mirrors the client.indices.create method’s parameters, above:

{
    "video-search-nested": {
        "aliases": {},
        "mappings": {
            "properties": {
                "segments": {
                    "type": "nested",
                    "properties": {
                        "segment_embedding": {
                            "type": "knn_vector",
                            "dimension": 1024,
                            "method": {
                                "engine": "faiss",
                                "space_type": "l2",
                                "name": "hnsw",
                                "parameters": {}
                            }
                        }
                    }
                }
            }
        },
        "settings": {
            "index": {
                "number_of_shards": "2",
                "provided_name": "video-search-nested",
                "knn": "true",
                "creation_date": "1750000421047",
                "number_of_replicas": "0",
                "uuid": "o1IndJcBdo3sdID2SPl7",
                "version": {
                    "created": "136327827"
                }
            }
        }
    }
}        

Bulk Indexing

With the index created, we can bulk index our documents:

def load_and_index_documents(document_path: str) -> None:
    """Load documents from JSON files in the specified directory and index them in OpenSearch.

    Args:
        document_path (str): Directory containing the document JSON files

    Returns:
        None
    """
    payload = ""
    put_command = f'{{ "create": {{ "_index": "{aoss_index}" }} }}\n'

    for file in os.listdir(document_path):
        if file.endswith("_document.json"):
            with open(os.path.join(document_path, file), "r") as f:
                tmp = json.load(f)
                payload += f"{put_command}{json.dumps(tmp)}\n"

    try:
        response = aoss_client.bulk(
            index=aoss_index,
            body=payload,
        )
        print(json.dumps(response, indent=4))
        row_count = int(len(payload.splitlines()) / 2)
        return row_count
    except Exception as ex:
        print(f"Error indexing documents: {ex}")
        return 0


row_count = load_and_index_documents(document_directory)
print(f"Total rows to index: {row_count}")        

We should see a [truncated] response similar to the following:

Total rows to index: 25
{
    "took": 547,
    "errors": false,
    "items": [
        {
            "create": {
                "_index": "video-search-nested",
                "_id": "1%3A0%3AlUTLepcBdThdSBWVDFT5",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 0,
                    "successful": 0,
                    "failed": 0
                },
                "_seq_no": 0,
                "_primary_term": 0,
                "status": 201
            }
        },
        {
            "create": {
                "_index": "video-search-nested",
                "_id": "1%3A0%3AlkTLepcBdThdSBWVDFT5",
...
            }
        }
    ]
}        

Since the Amazon OpenSearch Serverless index can take up to 60 seconds to refresh, we can poll the index until our documents are available for searching:

from time import sleep

# Wait for indexing to complete and refresh
response = aoss_client.count(index=aoss_index)
while response["count"] != row_count:
    response = aoss_client.count(index=aoss_index)
    print(f"Current indexed documents: {response['count']}")
    sleep(10)
print(f"Indexing completed. Total indexed documents: {response['count']}")        

Searching for Videos with OpenSearch Serverless

Amazon OpenSearch Serverless’s hybrid search capabilities combine the precision of vector similarity search with the flexibility of traditional text-based retrieval. This dual approach ensures comprehensive coverage of user intent while maintaining high relevance in search results.

Semantic Search

Semantic search leverages the multimodal embeddings generated by TwelveLabs to find videos based on conceptual similarity rather than exact keyword matches. Users can search for “people celebrating outdoors” and retrieve videos of parties, festivals, and gatherings even if these exact terms aren’t present in the metadata.

The vector search process involves:

  1. Query Encoding: Converting user queries into embeddings using the same TwelveLabs models used for indexing
  2. Similarity Calculation: Computing cosine similarity or other distance metrics between query and document vectors (see the sketch below)
  3. Result Ranking: Ordering results based on semantic similarity scores
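
To make the similarity step concrete, here is a small NumPy illustration comparing a query embedding with three clip embeddings using both cosine similarity and Euclidean (l2) distance, the space type configured for the OpenSearch index later in this post. The vectors are toy values; OpenSearch performs the real scoring at query time.

import numpy as np

# Illustrative 4-dimensional vectors; real Marengo embeddings have 1,024 dims.
query_vec = np.array([0.2, -0.1, 0.4, 0.05])
clip_vecs = np.array([
    [0.19, -0.12, 0.41, 0.06],   # clip A1
    [0.05, 0.30, -0.20, 0.10],   # clip A2
    [-0.40, 0.22, 0.01, -0.15],  # clip B1
])

# Cosine similarity: higher is more similar.
cosine = clip_vecs @ query_vec / (
    np.linalg.norm(clip_vecs, axis=1) * np.linalg.norm(query_vec)
)

# Euclidean (l2) distance: lower is more similar.
l2 = np.linalg.norm(clip_vecs - query_vec, axis=1)

for i, (c, d) in enumerate(zip(cosine, l2)):
    print(f"clip {i}: cosine={c:.3f}, l2 distance={d:.3f}")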

Nested field search

According to OpenSearch documentation, nested fields in a vector index let you store multiple vectors in a single document. For example, if your document consists of various components, such as video segments, you can generate a vector value for each component and store each vector in a nested field.

A vector search operates at the field level. For a document with nested fields, OpenSearch examines only the vector nearest to the query vector to decide whether to include the document in the results. For example, consider an index containing documents A and B. Document A is represented by vectors A1 and A2, and document B is represented by vector B1. Further, the similarity order for a query Q is A1, A2, B1. If you search using query Q with a k value of 2, the search returns both documents A and B instead of only document A. This is key for our videos, which have been divided into clips, each with its own embeddings. Without nested field search, a query would most likely return multiple clips of the same video rather than distinct videos.

This concept is further explained in the article, Enhanced multi-vector support for OpenSearch k-NN search with nested fields.
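
The short Python sketch below (an illustration of the behavior, not OpenSearch code) shows why this matters: a naive flat top-2 over clip vectors returns two clips from the same video, whereas per-document scoring, which nested field search provides, surfaces the best clip from each of two different videos.

import numpy as np

# Query Q and three clip vectors: A1 and A2 belong to video A, B1 to video B.
q = np.array([1.0, 0.0])
clips = {
    "A1": np.array([0.9, 0.1]),
    "A2": np.array([0.8, 0.2]),
    "B1": np.array([0.1, 0.9]),
}
video_of = {"A1": "A", "A2": "A", "B1": "B"}

# Rank clips by l2 distance to the query (smaller distance = better match).
ranked = sorted(clips, key=lambda c: np.linalg.norm(clips[c] - q))
print("Flat top-2 clips:", ranked[:2])  # ['A1', 'A2'] -- the same video twice

# Nested-style behavior: keep only each video's best clip, then take the top 2 videos.
best_clip_per_video = {}
for clip in ranked:
    best_clip_per_video.setdefault(video_of[clip], clip)
print("Top-2 videos:", list(best_clip_per_video)[:2])  # ['A', 'B'] -- two distinct videos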

User’s Search Query

To perform a semantic search, we must convert the user’s search query into a vector embedding using Marengo. This is the same model used to produce the video vector embeddings.

def get_text_embedding_from_query(query: str) -> list:
    """Convert a text query to an embedding using TwelveLabs.

    Args:
        query (str): The text query to convert.

    Returns:
        list: The embedding vector.
    """
    res = tl_client.embed.create(
        model_name="Marengo-retrieval-2.7",
        text_truncate="start",
        text=query,
    )
    print(res)
    if res.text_embedding is not None and res.text_embedding.segments is not None:
        return res.text_embedding.segments[0].embeddings_float
    else:
        raise ValueError("Failed to retrieve embedding from the response.")

query = "bustling street scene from a low-angle perspective"
text_embedding = get_text_embedding_from_query(query)
print(f"Embedding: {text_embedding[:5]}...")        

The user’s query, “bustling street scene from a low-angle perspective,” is converted to a vector embedding:

Embedding: [0.0028839111, -0.029296875, -0.0079956055, -0.011047363, -0.0019454956, ...]        

Sample Image

To perform a semantic search, we can also use an image instead of a text-based query.

[Image: Sample image from a different Pexels video, not part of the index]

Similar to the text-based query, we can convert the image to a dense vector embedding:

def get_image_embedding_from_query(image_file: str) -> list:
    """Convert an image file to an embedding using TwelveLabs.

    Args:
        image_file (str): The path to the image file to convert.

    Returns:
        list: The embedding vector.
    """
    res = tl_client.embed.create(
        model_name="Marengo-retrieval-2.7",
        image_file=image_file,
    )
    print(res)
    if res.image_embedding is not None and res.image_embedding.segments is not None:
        return res.image_embedding.segments[0].embeddings_float
    else:
        raise ValueError("Failed to retrieve embedding from the response.")


image_embedding = get_image_embedding_from_query("sample_image.jpg")        

Semantic Search

Before writing any Python code, we can test our semantic search in the collection’s OpenSearch Dashboards Dev Tools UI using the index’s _search endpoint. For example (the embedding of the user’s search query is truncated):

GET /video-search-nested/_search
{
  "size": 5,
  "fields": [
    "system_metadata.filename",
    "title",
    "segments.start_offset_sec",
    "segments.end_offset_sec"
  ],
  "_source": false,
  "query": {
    "nested": {
      "path": "segments",
      "query": {
        "knn": {
          "segments.segment_embedding": {
            "vector": [
              0.02905363,
              -0.00882094,
              0.0012452933,
              ...
            ],
            "k": 5
          }
        }
      }
    }
  }
}        

Query and results as displayed in the OpenSearch Dashboard’s Dev Tools view:

[Image: Semantic search executed in the collection's OpenSearch Dashboards Dev Tools]

We will now perform the semantic search using the OpenSearch Python SDK. Since the video embeddings are large, we will exclude them from the search results:

def semantic_search(aoss_index: str, embedding: list) -> dict:
    """Query the OpenSearch index using a text embedding.

    Args:
        aoss_index (str): The ID of the Amazon OpenSearch index.
        embedding (list): The embedding vector to use for the query.

    Returns:
        dict: The search response from OpenSearch.
    """
    query = {
        "fields": [
          "system_metadata.filename",
          "segment.start_offset_sec",
          "segment.end_offset_sec"
        ],
        "query": {
            "nested": {
                "path": "segments",
                "query": {
                    "knn": {
                        "segments.segment_embedding": {
                            "vector": embedding,
                            "k": 5,
                        }
                    }
                },
            }
        },
        "size": 5,
        "_source": {"excludes": ["segments.segment_embedding"]},
    }

    try:
        search_results = aoss_client.search(body=query, index=aoss_index)
        return search_results
    except Exception as ex:
        print(f"Error querying index: {ex}")
        return {}


# Query the index with the embedding
search_results_1 = semantic_search(aoss_index, query_embedding)

for hit in search_results_1["hits"]["hits"]:
    print(f"Video ID: {hit['_source']['video_id']}")
    print(f"Title: {hit['_source']['gist']['title']}")
    print(f"Score: {hit['_score']}")
    print(f"Duration: {hit['_source']['system_metadata']['duration']:.2f} seconds")
    print("\r")        

We see results that we would expect based on the indexed videos and the query, “bustling street scene from a low-angle perspective”:

Video ID: 684820715b5b47d9b345c2dd
Title: Street-Level Footage Captures the Flow of Daily Commuters and Traffic
Score: 0.45485634
Duration: 15.72 seconds

Video ID: 684ee29f3577b13b3f41c51a
Title: Bustling City Square: A Day in the Life of Pedestrians and Passersby
Score: 0.4479842
Duration: 28.07 seconds

Video ID: 6853656ba060f609bc07af8e
Title: Night Drive: Wet City Streets
Score: 0.4384909
Duration: 10.69 seconds

Video ID: 684820717de046cf7b90f4e6
Title: Exploring London's Busy Streets with Red Double-Decker Buses
Score: 0.4327768
Duration: 22.55 seconds

Video ID: 6853656a70cc831146386b4b
Title: Busy Intersection: A Blend of Cyclists and Car Traffic
Score: 0.42975047
Duration: 9.97 seconds        
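
The image embedding generated earlier can be passed to the same semantic_search helper, since Marengo embeds text, images, and video clips into a shared vector space. A brief usage sketch:

# Query-by-image: reuse the semantic_search helper with the image embedding.
search_results_img = semantic_search(aoss_index, image_embedding)

for hit in search_results_img["hits"]["hits"]:
    print(f"Video ID: {hit['_source']['video_id']}")
    print(f"Title: {hit['_source']['gist']['title']}")
    print(f"Score: {hit['_score']}\n")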

Semantic Search with Filter

With OpenSearch, we can also apply filters to our semantic search. In the k-NN query clause, include the point of interest that is used to search for nearest neighbors, the number of nearest neighbors to return (k), and a filter with the restriction criteria. Let’s perform a semantic search, using the same user query (“bustling street scene from a low-angle perspective”), but this time add additional search criteria, filtering the total duration of the video to be between 20 and 60 seconds. The filter should eliminate some of the shorter videos returned from the previous unfiltered search:

def semantic_search_with_filter(aoss_index: str, embedding: list) -> dict:
    """Query the OpenSearch index using a text embedding.

    Args:
        aoss_index (str): The ID of the Amazon OpenSearch index.
        embedding (list): The embedding vector to use for the query.

    Returns:
        dict: The search response from OpenSearch.
    """
    query = {
        "query": {
            "nested": {
                "path": "segments",
                "query": {
                    "knn": {
                        "segments.segment_embedding": {
                            "vector": embedding,
                            "k": 5,
                            "filter": {
                                "bool": {
                                    "must": [
                                        {
                                            "range": {
                                                "system_metadata.duration": {
                                                    "gte": 20,
                                                    "lte": 60,
                                                }
                                            }
                                        },
                                    ]
                                }
                            },
                        }
                    }
                },
            }
        },
        "size": 5,
        "_source": {"excludes": ["segments.segment_embedding"]},
    }

    try:
        search_results = aoss_client.search(body=query, index=aoss_index)
        return search_results
    except Exception as ex:
        print(f"Error querying index: {ex}")
        return {}


# Query the index with the embedding
search_results_2 = semantic_search_with_filter(aoss_index, query_embedding)

for hit in search_results_2["hits"]["hits"]:
    print(f"Video ID: {hit['_source']['video_id']}")
    print(f"Title: {hit['_source']['gist']['title']}")
    print(f"Score: {hit['_score']}")
    print(f"Duration: {hit['_source']['system_metadata']['duration']:.2f} seconds")
    print("\r")        

We see results that we would expect based on the indexed videos, a query of “bustling street scene from a low-angle perspective,” and the additional search criteria of a duration between 20 and 60 seconds:

Video ID: 684ee29f3577b13b3f41c51a
Title: Bustling City Square: A Day in the Life of Pedestrians and Passersby
Score: 0.4479842
Duration: 28.07 seconds

Video ID: 684820717de046cf7b90f4e6
Title: Exploring London's Busy Streets with Red Double-Decker Buses
Score: 0.43277684
Duration: 22.55 seconds

Video ID: 6848207002481ab373c6de17
Title: Warehouse Encounter: A Cloak-and-Dagger Money Exchange
Score: 0.3978317
Duration: 30.12 seconds

Video ID: 6848207202481ab373c6de19
Title: Orange Uniformed Basketball Team Perfects Their Shooting Drills in Indoor Stadium
Score: 0.39230272
Duration: 23.51 seconds

Video ID: 6853656b70cc831146386b4d
Title: Productive Study Session in a Lecture Hall
Score: 0.38804084
Duration: 28.36 seconds        

Semantic Search: Include Inner Hits

As we’ve discussed, using nested fields in a vector index allows us to store multiple vectors in a single document. When you retrieve documents based on matches in nested fields, by default the response does not contain information about which inner objects matched the query, so it is not apparent why the document is a match. According to OpenSearch documentation, to include information about the matching nested fields in the response, you can provide the inner_hits object in your query:

def semantic_search_inner_hits(aoss_index: str, embedding: list) -> dict:
    """Query the OpenSearch index using a text embedding.

    Args:
        aoss_index (str): The ID of the Amazon OpenSearch index.
        embedding (list): The embedding vector to use for the query.

    Returns:
        dict: The search response from OpenSearch.
    """
    query = {
        "query": {
            "nested": {
                "path": "segments",
                "query": {
                    "knn": {
                        "segments.segment_embedding": {
                            "vector": embedding,
                            "k": 5,
                        }
                    }
                },
                "inner_hits": {
                    "_source": False,
                    "fields": ["segments.start_offset_sec", "segments.end_offset_sec"],
                },
            }
        },
        "size": 5,
        "_source": {"excludes": ["segments.segment_embedding"]},
    }

    try:
        search_results = aoss_client.search(body=query, index=aoss_index)
        return search_results
    except Exception as ex:
        print(f"Error querying index: {ex}")
        return {}


# Query the index with the embedding
search_results_3 = semantic_search_inner_hits(aoss_index, query_embedding)

for hit in search_results_3["hits"]["hits"]:
    print(f"Video ID: {hit['_source']['video_id']}")
    print(f"Title: {hit['_source']['gist']['title']}")
    print(f"Score: {hit['_score']}")
    print(f"Duration: {hit['_source']['system_metadata']['duration']:.2f} seconds")
    print("Matching Segment:")
    for segment in hit["inner_hits"]["segments"]["hits"]["hits"]:
        print(f"  Segment: {segment['_nested']['offset']}")
        print(f"    Score: {segment['_score']}")
        print(f"    Start: {segment['fields']['segments.start_offset_sec'][0]} seconds")
        print(f"    End: {segment['fields']['segments.end_offset_sec'][0]} seconds")
    print("\r")        

We see the results we would expect given the indexed videos, the query “bustling street scene from a low-angle perspective,” and the use of the inner_hits object:

Video ID: 684820715b5b47d9b345c2dd
Title: Street-Level Footage Captures the Flow of Daily Commuters and Traffic
Score: 0.45485628
Duration: 15.72 seconds
Matching Segment:
  Segment: 0
    Score: 0.45485628
    Start: 0.0 seconds
    End: 7.86 seconds

Video ID: 684ee29f3577b13b3f41c51a
Title: Bustling City Square: A Day in the Life of Pedestrians and Passersby
Score: 0.4479842
Duration: 28.07 seconds
Matching Segment:
  Segment: 1
    Score: 0.4479842
    Start: 9.355556 seconds
    End: 18.711111 seconds

Video ID: 6853656ba060f609bc07af8e
Title: Night Drive: Wet City Streets
Score: 0.43849096
Duration: 10.69 seconds
Matching Segment:
  Segment: 1
    Score: 0.43849096
    Start: 5.3433337 seconds
    End: 10.686667 seconds

Video ID: 684820717de046cf7b90f4e6
Title: Exploring London's Busy Streets with Red Double-Decker Buses
Score: 0.43277684
Duration: 22.55 seconds
Matching Segment:
  Segment: 0
    Score: 0.43277684
    Start: 0.0 seconds
    End: 7.5183334 seconds

Video ID: 6853656a70cc831146386b4b
Title: Busy Intersection: A Blend of Cyclists and Car Traffic
Score: 0.4297504
Duration: 9.97 seconds
Matching Segment:
  Segment: 0
    Score: 0.4297504
    Start: 0.0 seconds
    End: 9.966667 seconds        

Semantic Search: Retrieving all Nested Hits

By default, only the highest-scoring nested document is considered when you query nested fields. According to the OpenSearch documentation, to retrieve the scores for all nested field documents within each parent document, set expand_nested_docs to true in your query. The parent document’s score is then calculated as the average of its nested field documents’ scores. To instead use the highest score among the nested field documents as the parent document’s score, set score_mode to max:

def semantic_search_all_inner_hits(aoss_index: str, embedding: list) -> dict:
    """Query the OpenSearch index using a text embedding.

    Args:
        aoss_index (str): The ID of the Amazon OpenSearch index.
        embedding (list): The embedding vector to use for the query.

    Returns:
        dict: The search response from OpenSearch.
    """
    query = {
        "query": {
            "nested": {
                "path": "segments",
                "query": {
                    "knn": {
                        "segments.segment_embedding": {
                            "vector": embedding,
                            "k": 5,
                            "expand_nested_docs": True,
                        }
                    }
                },
                "inner_hits": {
                    "_source": False,
                    "fields": ["segments.start_offset_sec", "segments.end_offset_sec"],
                },
                "score_mode": "max",
            }
        },
        "size": 5,
        "_source": {"excludes": ["segments.segment_embedding"]},
    }

    try:
        search_results = aoss_client.search(body=query, index=aoss_index)
        return search_results
    except Exception as ex:
        print(f"Error querying index: {ex}")
        return {}

# Query the index with the embedding
search_results_4 = semantic_search_all_inner_hits(aoss_index, query_embedding)
for hit in search_results_4["hits"]["hits"]:
    print(f"Video ID: {hit['_source']['video_id']}")
    print(f"Title: {hit['_source']['gist']['title']}")
    print(f"Score: {hit['_score']}")
    print(f"Duration: {hit['_source']['system_metadata']['duration']:.2f} seconds")
    print("Matching Segment(s):")
    for segment in hit["inner_hits"]["segments"]["hits"]["hits"]:
        print(f"  Segment: {segment['_nested']['offset']}")
        print(f"    Score: {segment['_score']}")
        print(f"    Start: {segment['fields']['segments.start_offset_sec'][0]} seconds")
        print(f"    End: {segment['fields']['segments.end_offset_sec'][0]} seconds")
    print("\r")        

We see the results we would expect given the indexed videos, the query “bustling street scene from a low-angle perspective,” and the options to retrieve the scores for all nested field documents:

Video ID: 684820715b5b47d9b345c2dd
Title: Street-Level Footage Captures the Flow of Daily Commuters and Traffic
Score: 0.45485628
Duration: 15.72 seconds
Matching Segment(s):
  Segment: 0
    Score: 0.45485628
    Start: 0.0 seconds
    End: 7.86 seconds
  Segment: 1
    Score: 0.4526655
    Start: 7.86 seconds
    End: 15.72 seconds

Video ID: 684ee29f3577b13b3f41c51a
Title: Bustling City Square: A Day in the Life of Pedestrians and Passersby
Score: 0.4479842
Duration: 28.07 seconds
Matching Segment(s):
  Segment: 1
    Score: 0.4479842
    Start: 9.355556 seconds
    End: 18.711111 seconds
  Segment: 2
    Score: 0.4462172
    Start: 18.711111 seconds
    End: 28.066668 seconds
  Segment: 0
    Score: 0.44612914
    Start: 0.0 seconds
    End: 9.355556 seconds

Video ID: 6853656ba060f609bc07af8e
Title: Night Drive: Wet City Streets
Score: 0.4384909
Duration: 10.69 seconds
Matching Segment(s):
  Segment: 1
    Score: 0.4384909
    Start: 5.3433337 seconds
    End: 10.686667 seconds
  Segment: 0
    Score: 0.43790838
    Start: 0.0 seconds
    End: 5.3433337 seconds

Video ID: 684820717de046cf7b90f4e6
Title: Exploring London's Busy Streets with Red Double-Decker Buses
Score: 0.43277684
Duration: 22.55 seconds
Matching Segment(s):
  Segment: 0
    Score: 0.43277684
    Start: 0.0 seconds
    End: 7.5183334 seconds
  Segment: 1
    Score: 0.43107942
    Start: 7.5183334 seconds
    End: 15.036667 seconds
  Segment: 2
    Score: 0.43025485
    Start: 15.036667 seconds
    End: 22.555 seconds

Video ID: 6853656a70cc831146386b4b
Title: Busy Intersection: A Blend of Cyclists and Car Traffic
Score: 0.42975047
Duration: 9.97 seconds
Matching Segment(s):
  Segment: 0
    Score: 0.42975047
    Start: 0.0 seconds
    End: 9.966667 seconds        

Radial Search

According to the OpenSearch documentation, radial search extends vector search capabilities beyond approximate top-k searches. With radial search, you can find all points within a vector space that reside within a specified maximum distance or minimum score threshold from a query point, providing increased flexibility and utility in search operations. Radial search is compatible with the Faiss engine, which we are using for our OpenSearch index.

The max_distance parameter specifies a physical distance within the vector space, identifying all points within that distance from the query point. This approach is beneficial for applications that require spatial proximity or precise distance measurements.

def radial_search(aoss_index: str, embedding: list) -> dict:
    """Query the OpenSearch index using a text embedding.

    Args:
        aoss_index (str): The ID of the Amazon OpenSearch index.
        embedding (list): The embedding vector to use for the query.

    Returns:
        dict: The search response from OpenSearch.
    """
    query = {
        "query": {
            "nested": {
                "path": "segments",
                "query": {
                    "knn": {
                        "segments.segment_embedding": {
                            "vector": embedding,
                            "max_distance": 2
                        }
                    }
                },
            }
        },
        "size": 5,
        "_source": {"excludes": ["segments.segment_embedding"]},
    }

    try:
        search_results = aoss_client.search(body=query, index=aoss_index)
        return search_results
    except Exception as ex:
        print(f"Error querying index: {ex}")
        return {}


# Query the index with the embedding
search_results_5 = radial_search(aoss_index, query_embedding)

for hit in search_results_5["hits"]["hits"]:
    print(f"Video ID: {hit['_source']['video_id']}")
    print(f"Title: {hit['_source']['gist']['title']}")
    print(f"Score: {hit['_score']}")
    print(f"Duration: {hit['_source']['system_metadata']['duration']:.2f} seconds")
    print("\r")        

All documents containing a segment within a squared Euclidean distance (l2²) of 2 of the query vector are returned for the user query “bustling street scene from a low-angle perspective”:

Video ID: 684820715b5b47d9b345c2dd
Title: Street-Level Footage Captures the Flow of Daily Commuters and Traffic
Score: 0.45485628
Duration: 15.72 seconds

Video ID: 684ee29f3577b13b3f41c51a
Title: Bustling City Square: A Day in the Life of Pedestrians and Passersby
Score: 0.4479842
Duration: 28.07 seconds

Video ID: 6853656ba060f609bc07af8e
Title: Night Drive: Wet City Streets
Score: 0.43849096
Duration: 10.69 seconds

Video ID: 684820717de046cf7b90f4e6
Title: Exploring London's Busy Streets with Red Double-Decker Buses
Score: 0.43277684
Duration: 22.55 seconds

Video ID: 6853656a70cc831146386b4b
Title: Busy Intersection: A Blend of Cyclists and Car Traffic
Score: 0.4297504
Duration: 9.97 seconds        
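
The documentation also mentions a minimum score threshold as an alternative to max_distance. Below is a minimal sketch of a score-based variant, assuming the same aoss_client, index, and query embedding used above; the function name radial_search_min_score and the 0.43 threshold are illustrative choices, not part of the original notebook:

def radial_search_min_score(aoss_index: str, embedding: list, min_score: float = 0.43) -> dict:
    """Query the OpenSearch index using a minimum score threshold (radial search).

    Args:
        aoss_index (str): The ID of the Amazon OpenSearch index.
        embedding (list): The embedding vector to use for the query.
        min_score (float): Only segments scoring at or above this value are returned.

    Returns:
        dict: The search response from OpenSearch.
    """
    query = {
        "query": {
            "nested": {
                "path": "segments",
                "query": {
                    "knn": {
                        "segments.segment_embedding": {
                            "vector": embedding,
                            # min_score replaces max_distance (or k) for score-based radial search
                            "min_score": min_score,
                        }
                    }
                },
            }
        },
        "size": 5,
        "_source": {"excludes": ["segments.segment_embedding"]},
    }

    try:
        return aoss_client.search(body=query, index=aoss_index)
    except Exception as ex:
        print(f"Error querying index: {ex}")
        return {}


# Example usage with an illustrative threshold
# search_results_6 = radial_search_min_score(aoss_index, query_embedding, min_score=0.43)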

Visualizing Search Results

We can visualize the results using the thumbnail URL supplied by TwelveLabs when we retrieved the embeddings, which is now included in the OpenSearch document as hit["_source"]["hls"]["thumbnail_urls"][0]. Here is a simple example using Pillow and matplotlib.

from matplotlib import pyplot as plt
from PIL import Image
from urllib import request
import io


def load_image_from_url(url):
    """Load an image from a URL.

    Args:
        url (str): The URL of the image to load.

    Returns:
        PIL.Image.Image: The loaded image, or None if loading fails.
    """
    try:
        with request.urlopen(url) as response:
            image_data = response.read()
            return Image.open(io.BytesIO(image_data))
    except Exception as e:
        print(f"Error loading video thumbnail from URL: {e}")
        return None


# Arrange the thumbnails in a 3x3 grid, one subplot per search hit
index = 1
rows = 3
columns = 3

fig = plt.figure(figsize=(10, 7), dpi=300)

for hit in search_results_1["hits"]["hits"]:
    image_url = hit["_source"]["hls"]["thumbnail_urls"][0]
    image = load_image_from_url(image_url)
    if image is None:
        continue  # skip hits whose thumbnail could not be loaded
    fig.add_subplot(rows, columns, index)
    plt.axis("off")
    plt.imshow(image)
    plt.title(
        f'Video: {hit["_source"]["system_metadata"]["filename"]}\nScore: {hit["_score"]}',
        fontdict=dict(family="Arial", size=8),
        color="black",
    )
    index += 1

Now, we see a keyframe preview for each video in the search results:

Article content
Preview of videos alongside semantic search results

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) can be used to visualize how closely a user’s search query aligns with the results of a semantic search. PCA is a statistical method that reduces the dimensionality of large datasets by transforming a high number of variables into a smaller set that preserves most of the original information. In this context, we apply PCA to compress the 1,024-dimensional dense vector embeddings — representing both the video clips in the result set and the user query — into two dimensions. These reduced representations are then plotted in a two-dimensional space using Plotly, allowing for an intuitive visualization of their relative positions.
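
As an illustration, here is a minimal sketch of that reduction using scikit-learn and Plotly Express. It assumes query_embedding is the 1,024-dimensional query vector generated earlier and that result_embeddings (a hypothetical variable) holds the segment embeddings from the result set:

import numpy as np
import plotly.express as px
from sklearn.decomposition import PCA

# Hypothetical inputs: the query vector plus the segment embeddings from the result set,
# each a 1,024-dimensional dense vector
vectors = np.array([query_embedding] + result_embeddings)
labels = ["user query"] + [f"result {i + 1}" for i in range(len(result_embeddings))]

# Reduce the 1,024-dimensional embeddings to two principal components
points_2d = PCA(n_components=2).fit_transform(vectors)

# Plot the query and results in the same two-dimensional space
fig = px.scatter(
    x=points_2d[:, 0],
    y=points_2d[:, 1],
    text=labels,
    labels={"x": "PC 1", "y": "PC 2"},
    title="Search results vs. user query (PCA, 2 components)",
)
fig.update_traces(textposition="top center")
fig.show()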

Article content
Two-dimensional scatter plot of search results vs. user query using PCA

With a relatively small index of just 25 videos, our analysis shows that, out of nine search results, five are in close visual proximity to the user’s search query. This closeness is reflected in their higher similarity scores. In OpenSearch, these vector search scores quantify the similarity between the query vector and the indexed vectors, indicating how closely related the corresponding documents are.

Additionally, by applying PCA, we can reduce the dimensionality of the semantic search results to three dimensions. This allows us to visualize their relative positions with greater detail using a 3D scatter plot.
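
Under the same assumptions as the two-dimensional sketch above, setting n_components=3 and switching to px.scatter_3d produces the three-dimensional view:

# Same inputs as the 2D sketch above, now reduced to three principal components
points_3d = PCA(n_components=3).fit_transform(vectors)

fig = px.scatter_3d(
    x=points_3d[:, 0],
    y=points_3d[:, 1],
    z=points_3d[:, 2],
    text=labels,
    title="Search results vs. user query (PCA, 3 components)",
)
fig.update_traces(textposition="top center", marker=dict(size=4))
fig.show()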

Three-dimensional scatter plot of search results vs. user query using PCA

Performance and Scalability

OpenSearch Serverless’s Vector Engine is designed for enterprise-scale deployments, supporting billions of vectors with consistent millisecond response times. The service automatically scales based on demand, eliminating the need for manual capacity planning and infrastructure management.

The platform utilizes OpenSearch Compute Units (OCUs) as its scaling mechanism, with each OCU capable of handling up to 2 million vectors for 128 dimensions or 500,000 vectors for 768 dimensions at a 99% recall rate. This scalability ensures that video search applications can grow from prototype to production without architectural changes.

Conclusion

The combination of TwelveLabs’ multimodal AI models with Amazon OpenSearch Serverless creates a potent foundation for next-generation video search applications. By leveraging TwelveLabs’ multi-vector embeddings and comprehensive video analysis capabilities alongside OpenSearch Serverless’s scalable vector search infrastructure, organizations can build sophisticated video understanding systems that operate at enterprise scale.


This blog represents my viewpoints and not those of my employer, Amazon Web Services (AWS). All product names, images, logos, and brands are the property of their respective owners.
