1 | © Copyright 8/16/23 Zilliz
Stephen Batifol | Zilliz
Milvus: Scaling Vector Data
Solutions for Gen AI
Speaker
Stephen Batifol
Developer Advocate, Zilliz / Milvus
stephen.batifol@zilliz.com
linkedin.com/in/stephen-batifol/
@stephenbtl
29K GitHub Stars
25M Downloads
250 Contributors
2,600+ Forks
Milvus is an open-source vector database for GenAI projects. pip install on your
laptop, plug into popular AI dev tools, and push to production with a single line of
code.
Easy Setup
pip install pymilvus to start coding in a notebook within seconds.
Reusable Code
Write once, and deploy with one line of code into the production environment.
Integration
Plug into OpenAI, LangChain, LlamaIndex, and many more.
Feature-rich
Dense & sparse embeddings, filtering, reranking, and beyond.
Well-connected in LLM infrastructure to enable RAG
use cases
Framework
Hardware
Infrastructure
Embedding Models LLMs
Software Infrastructure
Vector Database
Common AI Use Cases
Retrieval Augmented Generation (RAG)
Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications.
Recommender System
Match user behavior or content features with other similar ones to make effective recommendations.
Text / Semantic Search
Search for semantically similar texts across vast amounts of natural language documents.
Image Similarity Search
Identify and search for visually similar images or objects from a vast collection of image libraries.
Video Similarity Search
Search for similar videos, scenes, or objects from extensive collections of video libraries.
Audio Similarity Search
Find similar audio clips in large datasets for tasks like genre classification or speech recognition.
Molecular Similarity Search
Search for similar substructures, superstructures, and other structures for a specific molecule.
Anomaly Detection
Detect data points, events, and observations that deviate significantly from the usual pattern.
Multimodal Similarity Search
Search over multiple types of data simultaneously, e.g. text and images.
01
Introduction to Vector DB
and Vector Search
Traditional databases were built on exact search
…which misses context, semantic meaning, and user intent
[Figure: pairs compared side by side, with labels "Apple", "Rising dough", "Change car tire", "Rising Dough", and "Proofing Bread"; one pair is marked ✔ and one ❌]
…and cannot process the ever-growing volume of unstructured data
80% of newly generated data in 2025 will be unstructured data (20%: Other)
Data Source: The Digitization of the World by IDC
As Easy as a Numpy KNN?
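For small datasets the answer is close to yes. The sketch below is a minimal exact k-nearest-neighbor search, written in plain Python rather than NumPy so that it has no dependencies; the function name `knn` and the toy vectors are illustrative assumptions. Every query scans every stored vector, so the cost grows linearly with the collection size.

```python
import heapq
import math


def knn(query, vectors, k):
    """Return the k (distance, index) pairs closest to `query` by L2 distance."""
    scored = (
        (math.dist(query, vec), idx)  # exact Euclidean distance to every vector
        for idx, vec in enumerate(vectors)
    )
    return heapq.nsmallest(k, scored)


# Toy data: four 2-dimensional vectors (hypothetical values).
vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
neighbors = knn((1.0, 1.0), vectors, k=2)
print(neighbors)
```

With four vectors this is instant; with billions, the full scan per query is what stops being practical.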
Scale is a problem
Why Not Vector Search Libraries?
• Search Quality - Hybrid Search? Filtering?
• Scalability - Billions of vectors?
• Multi-tenancy - Isolating multi-tenant data
• Cost - Memory, disk, S3?
• Security - Data Safety and Privacy
TL;DR: Vector search libraries lack the infrastructure to help you scale, deploy, and manage your apps in production.
Milvus
Easy to start
● Pip-install on your laptop
● Plug into your favorite AI dev tools
● Push to production with a single line of code
2024
Ready to scale 🚀
Write your code once, and run it everywhere, at scale!
● API and SDK are the same
Milvus Lite
● Ideal for prototyping and small-scale experiments
● Easy to set up and use: pip install pymilvus
● Scale to ≈1M vectors
Milvus Standalone
● Single-Node Deployment
● Bundled in a single Docker Image
● Supports Primary/Secondary
● Scale up to 100M vectors
Milvus Distributed
● Run on K8s
● Load balancer and Multi-Node Management
● Scaling of each component independently
● Scale to 100B vectors
We've built technologies for various types of use cases
Search Types
● Support multiple types such as top-K ANN, Range ANN, Sparse & Dense, Multi-vector, Grouping, and Metadata Filtering
● Enable query flexibility and accuracy, allowing developers to tailor their information retrieval needs
Compute Types
● Designed for various compute capabilities, such as AVX512 and Neon for SIMD, quantization, cache-aware optimization, and GPU
● Leverage the strengths of each hardware type, ensuring high-speed processing and cost-effective scalability for different application needs
Multi-tenancy
● Enable multi-tenancy through collection and partition management
● Allow for efficient resource utilization and customizable data segregation, ensuring secure and isolated data handling for each tenant
Index Types
● Support a wide range of 15 index types, including popular ones like HNSW, PQ, Binary, Sparse, DiskANN, and GPU indexes
● Empower developers with tailored search optimizations, catering to performance, accuracy, and cost needs
2024
But at what Scale?
● 10B vectors of 1536 dimensions in a single Milvus/Zilliz Cloud instance
● 100B vectors in one of the largest deployments running on K8s
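To get a feel for these numbers, the arithmetic below estimates the raw size of 10B vectors of 1536 dimensions. It assumes 4-byte float32 values, which the slide does not state, and it ignores index structures, replicas, and metadata; `raw_vector_bytes` is an illustrative helper name.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for the vectors alone, before any index or replica overhead."""
    return num_vectors * dim * bytes_per_value


# 10B vectors x 1536 dimensions x 4 bytes (float32 is an assumption).
total = raw_vector_bytes(10_000_000_000, 1536)
print(f"{total / 1e12:.2f} TB")  # prints "61.44 TB"
```

Under that assumption the vectors alone are far larger than the memory of a single machine, which is why storage tiering and distribution matter at this scale.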
Vector Databases
Where do Vectors Come From?
Vector Embedding
Vector Space
02
How do Vector Databases
Work?
How Similarity Search Works
[Diagram with five numbered steps: Unstructured Data (Images, User Generated Content, Video, Documents, Audio) → Transform into Vectors → Vector Embeddings → Store in Vector Database → Perform Query → Perform Approximate Nearest Neighbor Similarity Search → Get Results]
03
Achieving Billion+ Scale
Vector Search with K8s
Milvus 🤝 Open-Source
MinIO
● Stores vectors and indexes
● Enables Milvus' stateless architecture
Kafka / Pulsar
● Handles the data insertion stream
● Internal component communications
● Real-time updates to Milvus
Prometheus / Grafana
● Collects metrics from Milvus
● Provides real-time monitoring dashboards
Kubernetes
● Milvus Operator CRDs
Fully Distributed Architecture
[Diagram of the Milvus vector database. Labels:
● SDK and Load Balancer in front of the Access Layer (Proxy)
● Coordinator Service: Root, Query, Data, Index
● Worker Nodes: Query Node, Data Node, Index Node
● Meta Storage: etcd
● Message Storage: Log Broker
● Object Storage: MinIO / S3 / AzureBlob (Log Snapshot, Delta File, Index File)
● Arrow labels: DDL/DCL, DML, NOTIFICATION, CONTROL SIGNAL, QUERY, DATA]
Distributed
Architecture
Milvus Data Structures
Shard
• Boosts the ingestion rate
Segment
• A single unit of data in Milvus. Segment < Partition < Collection
Growing Segment
• Retrieves data directly from the message queue for rapid service. Uses a brute-force index and prioritizes data freshness.
Sealed Segment
• An immutable segment that uses indexing methods to guarantee efficiency.
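The growing/sealed distinction can be sketched with a toy class. This is not Milvus code: the `Segment` class, its `max_rows` threshold, and the sealing trigger are hypothetical, chosen only to show a segment that accepts writes and is searched by brute force until it is frozen.

```python
import math


class Segment:
    """Toy segment: mutable while growing, immutable once sealed."""

    def __init__(self, max_rows):
        self.max_rows = max_rows      # hypothetical seal threshold
        self.rows = []                # list of (primary_key, vector)
        self.sealed = False

    def insert(self, pk, vector):
        if self.sealed:
            raise RuntimeError("sealed segments are immutable")
        self.rows.append((pk, vector))
        if len(self.rows) >= self.max_rows:
            self.seal()

    def seal(self):
        # A real system would hand the segment to an index builder here;
        # this sketch only freezes the rows.
        self.rows = tuple(self.rows)
        self.sealed = True

    def search(self, query, k):
        # Brute-force scan: fresh data stays searchable before any index exists.
        scored = sorted((math.dist(query, vec), pk) for pk, vec in self.rows)
        return scored[:k]


segment = Segment(max_rows=3)
segment.insert(1, (0.0, 0.0))
segment.insert(2, (1.0, 0.0))
hits_while_growing = segment.search((0.9, 0.0), k=1)
segment.insert(3, (5.0, 5.0))   # third row reaches the threshold and seals
```

After the third insert the segment rejects further writes, which is the property that makes building a one-off index over it worthwhile.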
Compaction
Async Compaction
● DataNode merges segments into bigger ones and requests IndexNode to construct new indexes for them.
● QueryNode then loads these new big indexes to replace the initial small ones.
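As a rough illustration of the merge step, the sketch below groups small segments so that each merged segment stays under a target row count. The greedy smallest-first policy, the `plan_compaction` name, and the sizes are assumptions for illustration, not the policy Milvus actually uses.

```python
def plan_compaction(segment_sizes, target_size):
    """Greedily group small segments so each merged segment stays <= target_size.

    `segment_sizes` maps a segment id to its row count. Returns a list of
    groups; each group of two or more ids would be merged into one segment.
    """
    groups, current, current_rows = [], [], 0
    for seg_id, rows in sorted(segment_sizes.items(), key=lambda item: item[1]):
        if current and current_rows + rows > target_size:
            groups.append(current)
            current, current_rows = [], 0
        current.append(seg_id)
        current_rows += rows
    if current:
        groups.append(current)
    # A group with a single segment has nothing to merge with.
    return [group for group in groups if len(group) > 1]


sizes = {"s1": 100, "s2": 150, "s3": 900, "s4": 200}
plan = plan_compaction(sizes, target_size=500)
print(plan)  # [['s1', 's2', 's4']]
```

The three small segments become one merge job, while the large segment is left alone; a new index would then be built for the merged result.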
Optimizing for Performance & Cost Efficiency
Unified Object Storage
All data in Milvus (hot and cold) is stored in Object Storage.
Data Temperature
Uses metadata and access patterns to classify data as “hot” or “cold” without moving it.
Optimized Data Retrieval
“Hot” data is stored in memory; “cold” data is fetched from Object Storage when needed.
Query Routing
Queries are optimized based on whether they're likely to hit hot or cold data.
Segment Management
Newer or frequently accessed segments are more likely to be kept in the memory cache.
Adaptive Resource Allocation
Dynamically adjusts the resources allocated to components, particularly the memory allocated for caching in query nodes.
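One way to picture the hot/cold split is a small in-memory cache in front of an object store that always holds everything. The sketch below uses least-recently-used eviction, which is an assumption (the slide only mentions metadata and access patterns); `SegmentCache` and the dictionary standing in for MinIO/S3 are hypothetical.

```python
from collections import OrderedDict


class SegmentCache:
    """LRU cache of segments in memory, backed by an object store."""

    def __init__(self, object_store, capacity):
        self.object_store = object_store   # dict standing in for MinIO/S3
        self.capacity = capacity           # max segments held in memory
        self.hot = OrderedDict()
        self.cold_reads = 0

    def get(self, segment_id):
        if segment_id in self.hot:
            self.hot.move_to_end(segment_id)      # mark as recently used
            return self.hot[segment_id]
        self.cold_reads += 1                      # cache miss: fetch from storage
        data = self.object_store[segment_id]
        self.hot[segment_id] = data
        if len(self.hot) > self.capacity:
            self.hot.popitem(last=False)          # evict least recently used
        return data


store = {"seg-a": [1, 2], "seg-b": [3, 4], "seg-c": [5, 6]}
cache = SegmentCache(store, capacity=2)
for seg in ["seg-a", "seg-b", "seg-a", "seg-c", "seg-b"]:
    cache.get(seg)
```

Eviction only drops the in-memory copy. The object store still holds every segment, so "cold" data is never lost, only slower to reach.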
Milvus 🤝 Kubernetes
Deployments
● Milvus is stateless. Deployments allow us to scale up and down easily.
Horizontal Pod Autoscaler (HPA)
● Automatically scales up and down based on custom metrics (e.g., query latency, throughput)
Node Affinity
● Specific nodes for Query/Data Nodes
● GPU nodes if needed
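The core rule the Horizontal Pod Autoscaler applies is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below implements only that rule plus min/max clamping; it leaves out the tolerance band and stabilization windows of the real controller, and the latency figures and replica bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Core HPA scaling rule, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical numbers: 4 query nodes, latency 180 ms against a 100 ms target.
scaled_up = desired_replicas(4, current_metric=180, target_metric=100)
# Same deployment when latency drops to 40 ms.
scaled_down = desired_replicas(4, current_metric=40, target_metric=100)
print(scaled_up, scaled_down)  # 8 2
```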
Vector Search at Scale
Indexing Strategies
• Cluster-based
• Graph-based
• Hash-based
• Tree-based
FLAT
FLAT Index
Inverted File FLAT
IVFFLAT
IVFFLAT Index
Hierarchical Navigable Small World (HNSW)
HNSW
Picking an Index
● 100% Recall – Use FLAT search if you need 100% accuracy
● 10MB < index_size < 2GB → Standard IVF
● 2GB < index_size < 20GB → Consider PQ and HNSW
● 20GB < index_size < 200GB → Composite Index, IVF_PQ or HNSW_SQ
● Disk-based indexes
zilliz.com/learn/choosing-right-vector-index-for-your-project
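These rules of thumb translate directly into a lookup function. The sketch below encodes the slide's brackets; two parts are assumptions not stated on the slide: sizes under 10MB fall back to FLAT, and disk-based indexes are suggested above 200GB. The function name is illustrative.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def suggest_index(index_size_bytes, need_exact_recall=False):
    """Map an estimated index size to the slide's rule-of-thumb suggestion."""
    if need_exact_recall:
        return "FLAT"
    if index_size_bytes < 10 * MB:
        return "FLAT"   # assumption: below the slide's smallest bracket
    if index_size_bytes < 2 * GB:
        return "IVF"
    if index_size_bytes < 20 * GB:
        return "PQ or HNSW"
    if index_size_bytes < 200 * GB:
        return "IVF_PQ or HNSW_SQ"
    return "disk-based index"   # assumption: applies above 200GB


print(suggest_index(5 * GB))  # PQ or HNSW
```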
Filtering
Filtering on Metadata
● Search Space Reduction w/ Pre-Filtering
● Bitset Wizardry 🧙
○ Use Compact Bitsets to represent Filter Matches
○ Low-level CPU operations for speed
● Scalar Indexing
○ Bloom Filter
○ Hash
○ Tree-based
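The bitset idea can be shown with Python integers acting as bit arrays: bit i is set when row i passes a filter, combining filters is a single bitwise AND, and the vector search only scores rows whose bit is set. The field names, rows, and helper functions below are hypothetical.

```python
import math


def build_bitset(rows, predicate):
    """Pack filter matches into one int: bit i is set when rows[i] passes."""
    bits = 0
    for i, row in enumerate(rows):
        if predicate(row):
            bits |= 1 << i
    return bits


def filtered_search(query, rows, bitset, k):
    """Score only rows whose bit is set (pre-filtering), then take the top k."""
    allowed = [i for i in range(len(rows)) if (bitset >> i) & 1]
    allowed.sort(key=lambda i: math.dist(query, rows[i]["vector"]))
    return allowed[:k]


rows = [
    {"vector": (0.0, 0.0), "lang": "en", "year": 2024},
    {"vector": (0.1, 0.0), "lang": "fr", "year": 2024},
    {"vector": (0.2, 0.1), "lang": "en", "year": 2021},
    {"vector": (3.0, 3.0), "lang": "en", "year": 2024},
]
english = build_bitset(rows, lambda r: r["lang"] == "en")
recent = build_bitset(rows, lambda r: r["year"] >= 2023)
combined = english & recent          # AND of two filters is one bitwise op
hits = filtered_search((0.1, 0.0), rows, combined, k=2)
```

Row 1 is the closest vector to the query but fails the language filter, so it is never scored. That is the search-space reduction pre-filtering provides.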
Scalable Search
• Distributed Search across shards
• Parallel Processing
• Query Optimization
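Distributed search across shards follows a scatter-gather pattern: each shard computes its own top-k in parallel, and the partial results are merged into a global top-k. The sketch below uses threads and in-memory lists as stand-ins for query nodes; the shard layout and function names are assumptions for illustration.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, query, k):
    """Local top-k on one shard: sorted list of (distance, primary_key)."""
    return heapq.nsmallest(k, ((math.dist(query, vec), pk) for pk, vec in shard))


def distributed_search(shards, query, k):
    """Fan the query out to every shard in parallel, then merge the partial lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, query, k), shards))
    # Each partial list is already sorted, so a k-way merge yields global order.
    return list(heapq.merge(*partials))[:k]


# Hypothetical shards holding (primary_key, vector) pairs.
shards = [
    [(1, (0.0, 0.0)), (2, (4.0, 4.0))],
    [(3, (1.0, 1.0)), (4, (0.5, 0.5))],
    [(5, (9.0, 9.0))],
]
top = distributed_search(shards, (0.4, 0.4), k=3)
```

Because each shard returns at most k candidates, the merge step handles at most k × (number of shards) items no matter how large the shards are.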
Indexing + Filtering + Vector
Search = 🏎 🚀
RAG (Retrieval Augmented Generation)
Basic Idea
Use RAG to force the LLM to work with your data
by injecting it via a vector database like Milvus
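A minimal sketch of that injection step: retrieve the passages closest to the question's embedding, then place them in the prompt ahead of the question. The embeddings here are hand-written placeholders (a real pipeline would call an embedding model and a vector database), and the prompt wording and function names are assumptions.

```python
import math


def retrieve(query_vec, store, k):
    """Return the k stored texts whose embeddings are closest to the query."""
    ranked = sorted(store, key=lambda item: math.dist(query_vec, item["vector"]))
    return [item["text"] for item in ranked[:k]]


def build_prompt(question, contexts):
    """Inject the retrieved passages ahead of the question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


# Hypothetical pre-computed embeddings standing in for a real embedding model.
store = [
    {"text": "Milvus Lite runs from a pip install.", "vector": (0.9, 0.1)},
    {"text": "Sealed segments are immutable.", "vector": (0.1, 0.9)},
    {"text": "Milvus Distributed runs on K8s.", "vector": (0.8, 0.3)},
]
contexts = retrieve((1.0, 0.0), store, k=2)
prompt = build_prompt("How do I start with Milvus?", contexts)
print(prompt)
```

The resulting prompt would then be sent to the LLM, which is left out here.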
Why RAG?
RAG vs. LLM
- An LLM's knowledge is out of date
- An LLM cannot access your private knowledge
- Helps reduce hallucinations
- Transparency and interpretability
RAG vs. Fine-tune
- Fine-tuning is expensive
- Fine-tuning takes a long time
- RAG is pluggable
Basic RAG Architecture
5 lines starter
RAG at Scale
● Pre-Filtering on Metadata
● Vector Search/ Hybrid Search
● Multi-Vector Search
○ Store Multiple vector embeddings per document
○ Easy Multi-modal RAG
● GPU Search?
● Dynamic updates (real-time inserts & updates)
● Scale up w/ K8s
○ Auto-scaling based on Query Load and Data Size
Demo!
github.com/stephen37/talks/blob/main/search_at_scale/OSS_EU.ipynb
09 Agentic RAG
General Ideas
milvus.io
github.com/milvus-io/
@milvusio
@stephenbtl
/in/stephen-batifol
Thank you