Milvus: Scaling Vector Data Solutions for Gen AI

1 | © Copyright 8/16/23 Zilliz
Stephen Batifol | Zilliz
Milvus: Scaling Vector Data
Solutions for Gen AI

Speaker
Stephen Batifol
Developer Advocate, Zilliz / Milvus
stephen.batifol@zilliz.com
linkedin.com/in/stephen-batifol/
@stephenbtl

29K GitHub Stars
25M Downloads
250 Contributors
2,600+ Forks
Milvus is an open-source vector database for GenAI projects. pip install on your
laptop, plug into popular AI dev tools, and push to production with a single line of
code.
Easy Setup
pip install pymilvus to start coding in a notebook within seconds.
Reusable Code
Write once, and deploy with one line of code into the production environment.
Integration
Plug into OpenAI, LangChain, LlamaIndex, and many more.
Feature-rich
Dense & sparse embeddings, filtering, reranking, and beyond.

Well-connected in LLM infrastructure to enable RAG use cases
[Diagram labels: Framework, Hardware Infrastructure, Embedding Models, LLMs, Software Infrastructure, Vector Database]

Common AI Use Cases
Retrieval Augmented Generation (RAG)
Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications.
Recommender System
Match user behavior or content features with other similar ones to make effective recommendations.
Text/Semantic Search
Search for semantically similar texts across vast amounts of natural language documents.
Image Similarity Search
Identify and search for visually similar images or objects from a vast collection of image libraries.
Video Similarity Search
Search for similar videos, scenes, or objects from extensive collections of video libraries.
Audio Similarity Search
Find similar audio clips in large datasets for tasks like genre classification or speech recognition.
Molecular Similarity Search
Search for similar substructures, superstructures, and other structures for a specific molecule.
Anomaly Detection
Detect data points, events, and observations that deviate significantly from the usual pattern.
Multimodal Similarity Search
Search over multiple types of data simultaneously, e.g. text and images.

01
Introduction to Vector DB
and Vector Search

Traditional databases were built upon exact search

…which misses context, semantic meaning, and user intent
[Figure: comparison pairs for "Apple", "Rising dough", and "Change car tire"; for example, "Rising Dough" vs. "Proofing Bread" (✔ / ❌)]

…and cannot process increasingly growing unstructured data
80% of newly generated data in 2025 will be unstructured data (20%: other).
Data Source: The Digitization of the World by IDC

As Easy as a Numpy KNN?
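For a small dataset, exact nearest-neighbor search really is only a few lines. The following is a minimal pure-Python sketch of brute-force k-NN (NumPy would vectorize the same distance computation). The function name `knn` and the toy vectors are illustrative assumptions, not code from the talk.

```python
import heapq
import math


def knn(query, vectors, k=3):
    """Return the indices of the k vectors closest to `query` (Euclidean distance)."""
    # Score every stored vector against the query: O(N * d) work per search.
    distances = [(math.dist(query, v), i) for i, v in enumerate(vectors)]
    # Keep only the k smallest distances.
    return [i for _, i in heapq.nsmallest(k, distances)]


# Toy 2-D data, made up for illustration.
vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2), (4.0, 4.5)]
print(knn((1.0, 1.0), vectors, k=2))  # indices of the two nearest vectors
```

Every query scans all N vectors, which is fine for thousands of vectors and impractical for billions.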

Scale is a problem

Why Not Vector Search Libraries?
• Search Quality - Hybrid search? Filtering?
• Scalability - Billions of vectors?
• Multi-tenancy - Isolating multi-tenant data
• Cost - Memory, disk, S3?
• Security - Data safety and privacy
TL;DR: Vector search libraries lack the infrastructure to help you scale, deploy, and manage your apps in production.

Milvus

Easy to start
● pip install on your laptop
● Plug into your favorite AI dev tools
● Push to production with a single line of code

2024
Ready to scale 🚀
Write your code once, and run it everywhere, at scale!
● API and SDK are the same
Milvus Lite
● Ideal for prototyping and small-scale experiments
● Easy to set up and use: pip install pymilvus
● Scale to ≈1M vectors
Milvus Standalone
● Single-node deployment
● Bundled in a single Docker image
● Supports primary/secondary
● Scale up to 100M vectors
Milvus Distributed
● Run on K8s
● Load balancer and multi-node management
● Scaling of each component independently
● Scale to 100B vectors

We've built technologies for various types of use cases
Search Types
Support multiple search types such as top-K ANN, range ANN, sparse & dense, multi-vector, grouping, and metadata filtering.
Enable query flexibility and accuracy, allowing developers to tailor retrieval to their needs.
Compute Types
Designed for various compute capabilities, such as AVX512 and Neon for SIMD, quantization, cache-aware optimization, and GPU.
Leverage the strengths of each hardware type, ensuring high-speed processing and cost-effective scalability for different application needs.
Multi-tenancy
Enable multi-tenancy through collection and partition management.
Allow for efficient resource utilization and customizable data segregation, ensuring secure and isolated data handling for each tenant.
Index Types
Offer a wide range of 15 supported indexes, including popular ones like HNSW, PQ, Binary, Sparse, DiskANN, and GPU indexes.
Empower developers with tailored search optimizations, catering to performance, accuracy, and cost needs.

But at what Scale?
2024
10B vectors of 1536 dimensions in a single Milvus/Zilliz Cloud instance
100B vectors in one of the largest deployments running on K8s

Vector
Databases
Where do Vectors Come From?

Vector Embedding

Vector Space

02
How do Vector Databases
Work?

How Similarity Search Works
● Unstructured Data: Images, User Generated Content, Video, Documents, Audio
● Transform into Vectors (Vector Embeddings)
● Store in Vector Database
● Perform Query
● Perform Approximate Nearest Neighbor Similarity Search
● Get Results

03
Achieving Billion+ Scale
Vector Search with K8s

Milvus 🤝 Open-Source
MinIO
● Stores vectors and indexes
● Enables Milvus' stateless architecture
Kafka / Pulsar
● Handles the data insertion stream
● Internal component communications
● Real-time updates to Milvus
Prometheus / Grafana
● Collects metrics from Milvus
● Provides real-time monitoring dashboards
Kubernetes
● Milvus Operator CRDs

Fully Distributed Architecture
[Architecture diagram. Access Layer: SDK, Load Balancer, Proxy. Coordinator Service: Root, Query, Data, Index. Worker Nodes: Query Node, Data Node, Index Node. Meta Storage: etcd. Message Storage: Log Broker. Object Storage (MinIO / S3 / Azure Blob): Log Snapshot, Delta File, Index File. Signals: DDL/DCL, DML, Notification, Control Signal.]

Distributed
Architecture

Milvus Data Structures
Shard
• Boosts the ingestion rate
Segment
• A single unit of data in Milvus. Segment < Partition < Collection
Growing Segment
• Retrieves data directly from the message queue for rapid service. Uses a brute-force index and prioritizes data freshness.
Sealed Segment
• An immutable segment that uses indexing methods to guarantee efficiency.
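The growing/sealed lifecycle can be pictured with a toy model. This is a minimal sketch, not Milvus code: the `Segment` and `Collection` classes are hypothetical, and sealing after a fixed row count is an assumed stand-in for whatever sealing policy Milvus actually applies.

```python
class Segment:
    """Toy segment: grows by appending rows, becomes immutable once sealed."""

    def __init__(self, max_rows):
        self.max_rows = max_rows
        self.rows = []
        self.sealed = False

    def insert(self, row):
        if self.sealed:
            raise RuntimeError("sealed segments are immutable")
        self.rows.append(row)
        # Assumed policy: seal as soon as the row limit is reached.
        if len(self.rows) >= self.max_rows:
            self.sealed = True


class Collection:
    """Routes inserts to the current growing segment and rolls over when it seals."""

    def __init__(self, max_rows_per_segment=3):
        self.max_rows = max_rows_per_segment
        self.segments = [Segment(self.max_rows)]

    def insert(self, row):
        if self.segments[-1].sealed:
            self.segments.append(Segment(self.max_rows))
        self.segments[-1].insert(row)


collection = Collection(max_rows_per_segment=3)
for i in range(7):
    collection.insert({"id": i})

states = [(len(s.rows), s.sealed) for s in collection.segments]
print(states)  # (row count, sealed?) per segment
```

Only the last segment is still growing; the sealed ones can be indexed because they will never change again.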

Compaction
Async Compaction
● DataNode merges segments into bigger ones and requests IndexNode to construct new indexes for them.
● QueryNode then loads these new big indexes to replace the initial small ones.
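The merge step of compaction can be sketched as a greedy packing of small segments into larger ones. This is an illustrative assumption about the policy, not the actual DataNode algorithm; `compact` and `target_rows` are hypothetical names, and index rebuilding is left out.

```python
def compact(segments, target_rows):
    """Greedily merge small sealed segments into ones of up to `target_rows` rows."""
    merged, current = [], []
    for segment in segments:
        # Start a new output segment when adding this one would overflow the target.
        if current and len(current) + len(segment) > target_rows:
            merged.append(current)
            current = []
        current = current + segment
    if current:
        merged.append(current)
    return merged


small_segments = [[1, 2], [3], [4, 5, 6], [7], [8, 9]]
big_segments = compact(small_segments, target_rows=4)
print(big_segments)
```

Five small segments become three larger ones, so a query touches fewer segments and fewer indexes.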

Optimizing for Performance & Cost Efficiency
Unified Object Storage
All data in Milvus (hot & cold) is stored in Object Storage.
Data Temperature
Uses metadata and access patterns to classify data as “hot” or “cold” without moving it.
Optimized Data Retrieval
“Hot” data is stored in memory; “cold” data is fetched from Object Storage when needed.
Query Routing
Queries are optimized based on whether they're likely to hit hot or cold data.
Segment Management
Newer or frequently accessed segments are more likely to be kept in the memory cache.
Adaptive Resource Allocation
Dynamically adjusts the resources allocated to components, particularly the memory allocated for caching in query nodes.
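The hot/cold split can be pictured as a memory cache in front of object storage. The sketch below assumes a plain least-recently-used (LRU) eviction policy, which is a simplification: the slide only says that metadata and access patterns drive the classification. `SegmentCache` and the segment IDs are hypothetical.

```python
from collections import OrderedDict


class SegmentCache:
    """LRU cache of segments in memory, backed by an object store that holds everything."""

    def __init__(self, object_store, capacity):
        self.object_store = object_store  # all segments, hot and cold
        self.capacity = capacity          # number of segments that fit in memory
        self.memory = OrderedDict()       # hot segments, least recently used first
        self.cold_reads = 0

    def get(self, segment_id):
        if segment_id in self.memory:
            # Hot hit: served from memory, mark as most recently used.
            self.memory.move_to_end(segment_id)
            return self.memory[segment_id]
        # Cold miss: fetch from object storage and keep it in memory.
        self.cold_reads += 1
        data = self.object_store[segment_id]
        self.memory[segment_id] = data
        if len(self.memory) > self.capacity:
            # Evict the least recently used segment; it still lives in the store.
            self.memory.popitem(last=False)
        return data


store = {"seg-1": [0.1, 0.2], "seg-2": [0.3, 0.4], "seg-3": [0.5, 0.6]}
cache = SegmentCache(store, capacity=2)
for segment_id in ["seg-1", "seg-2", "seg-1", "seg-3", "seg-2"]:
    cache.get(segment_id)
print(list(cache.memory), cache.cold_reads)
```

Eviction never loses data because the object store remains the single source of truth; only latency changes.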

Milvus 🤝 Kubernetes
Deployments
Milvus is stateless. Deployments allow us to scale up and down easily.
Horizontal Pod Autoscaler (HPA)
Automatically scales up and down based on custom metrics (e.g., query latency, throughput).
Node Affinity
Specific nodes for Query/Data Nodes. GPU nodes if needed.
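The autoscaling decision itself is simple arithmetic. The sketch below applies the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current replicas × current metric / target metric), and leaves out the tolerance and stabilization behavior of the real controller. The latency figures and replica bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """HPA-style scaling rule: ceil(current * currentMetric / targetMetric), clamped."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical numbers: 4 query nodes, latency of 150 ms against a 100 ms target.
print(desired_replicas(4, current_metric=150, target_metric=100))  # scale out
print(desired_replicas(4, current_metric=40, target_metric=100))   # scale in
```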

Vector Search at Scale

Indexing
Strategies
• Cluster based
• Graph based
• Hash based
• Tree based

FLAT

FLAT Index

Inverted File FLAT
(IVF_FLAT)

IVFFLAT Index

Hierarchical Navigable
Small World (HNSW)

HNSW

Picking an Index
● 100% Recall – Use FLAT search if you need 100% accuracy
● 10MB < index_size < 2GB → Standard IVF
● 2GB < index_size < 20GB → Consider PQ and HNSW
● 20GB < index_size < 200GB → Composite Index, IVF_PQ or HNSW_SQ
● Disk-based indexes
zilliz.com/learn/choosing-right-vector-index-for-your-project
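The rule of thumb above can be written down as a small lookup. This is a sketch under stated assumptions: the raw float32 footprint is used as a rough proxy for index size, sizes below 10MB are assumed to be fine with FLAT, and the disk-based bullet is assumed to apply above 200GB. Neither helper is part of any Milvus API.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def suggest_index(index_size_bytes, need_exact_recall=False):
    """Map an estimated index size to the rule of thumb from the slide."""
    if need_exact_recall:
        return "FLAT"
    if index_size_bytes < 10 * MB:
        # Below the slide's smallest bracket; brute force is assumed to be fine here.
        return "FLAT"
    if index_size_bytes < 2 * GB:
        return "IVF"
    if index_size_bytes < 20 * GB:
        return "PQ or HNSW"
    if index_size_bytes < 200 * GB:
        return "IVF_PQ or HNSW_SQ"
    # Assumption: the slide's last bullet covers indexes larger than 200GB.
    return "disk-based index"


def estimate_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw float32 footprint of the vectors, ignoring index overhead."""
    return num_vectors * dim * bytes_per_value


size = estimate_size_bytes(num_vectors=1_000_000, dim=768)
print(round(size / GB, 2), suggest_index(size))
```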

Filtering

Filtering on Metadata
● Search Space Reduction w/ Pre-Filtering
● Bitset Wizardry 🧙
○ Use Compact Bitsets to represent Filter Matches
○ Low-level CPU operations for speed
● Scalar Indexing
○ Bloom Filter
○ Hash
○ Tree-based
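Pre-filtering with bitsets can be sketched with plain Python integers, where bit i records whether row i passes a filter. This illustrates the idea only and is not how Milvus implements it internally; the rows, field names, and helper functions are made up for the example.

```python
import math


def build_bitset(rows, predicate):
    """Pack filter matches into one integer: bit i is set when row i passes."""
    bits = 0
    for i, row in enumerate(rows):
        if predicate(row):
            bits |= 1 << i
    return bits


def filtered_search(query, rows, bitset, k=2):
    """Pre-filtering: only rows whose bit is set are ever compared to the query."""
    candidates = [i for i in range(len(rows)) if (bitset >> i) & 1]
    return sorted(candidates, key=lambda i: math.dist(query, rows[i]["vector"]))[:k]


rows = [
    {"vector": (0.0, 0.0), "year": 2021, "lang": "en"},
    {"vector": (0.1, 0.1), "year": 2024, "lang": "fr"},
    {"vector": (0.2, 0.1), "year": 2024, "lang": "en"},
    {"vector": (3.0, 3.0), "year": 2023, "lang": "en"},
]
recent = build_bitset(rows, lambda r: r["year"] >= 2023)
english = build_bitset(rows, lambda r: r["lang"] == "en")
both = recent & english  # combining filters is a single bitwise AND

print(bin(both), filtered_search((0.0, 0.0), rows, both))
```

Row 0 is the closest vector overall but is never scored, because its bit is cleared by the year filter.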

Scalable Search
• Distributed search across shards
• Parallel processing
• Query optimization
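Searching across shards follows a scatter-gather pattern: each shard computes its own top-k in parallel, and the partial results are merged into a global top-k. In this sketch threads stand in for separate query nodes, which is an assumption for illustration; the shard contents and function names are hypothetical.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, query, k):
    """Local top-k on one shard: returns (distance, id) pairs."""
    scored = [(math.dist(query, vec), vec_id) for vec_id, vec in shard.items()]
    return heapq.nsmallest(k, scored)


def distributed_search(shards, query, k):
    """Scatter the query to every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, query, k), shards))
    # Each shard returned its own top-k; the global top-k is among those candidates.
    merged = heapq.nsmallest(k, (hit for partial in partials for hit in partial))
    return [vec_id for _, vec_id in merged]


shards = [
    {"a": (0.0, 0.0), "b": (4.0, 4.0)},
    {"c": (1.0, 0.0), "d": (9.0, 9.0)},
    {"e": (0.0, 2.0), "f": (7.0, 1.0)},
]
print(distributed_search(shards, query=(0.0, 0.0), k=3))
```

Because every shard returns at most k candidates, the merge step stays cheap no matter how large each shard is.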

Indexing + Filtering + Vector
Search = 🏎 🚀

RAG
(Retrieval Augmented Generation)

Basic Idea
Use RAG to force the LLM to work with your data
by injecting it via a vector database like Milvus
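The injection step can be sketched without any external service: retrieve the closest chunks, then place them in the prompt. This is a minimal illustration, with an in-memory list standing in for the vector database and hand-made 2-D vectors standing in for real embeddings; `retrieve` and `build_prompt` are hypothetical helpers.

```python
import math


def retrieve(query_vector, store, k=2):
    """Return the k stored text chunks closest to the query vector."""
    ranked = sorted(store, key=lambda item: math.dist(query_vector, item["vector"]))
    return [item["text"] for item in ranked[:k]]


def build_prompt(question, context_chunks):
    """Inject the retrieved chunks into the prompt so the LLM answers from them."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hand-made 2-D "embeddings" for illustration; a real system would call an embedding model.
store = [
    {"text": "Milvus Lite runs from a pip install.", "vector": (0.9, 0.1)},
    {"text": "Milvus Distributed runs on K8s.", "vector": (0.1, 0.9)},
    {"text": "Sealed segments are immutable.", "vector": (0.5, 0.5)},
]
question = "How do I run Milvus on my laptop?"
question_vector = (1.0, 0.0)  # assumed output of the same embedding model

prompt = build_prompt(question, retrieve(question_vector, store, k=2))
print(prompt)  # this string would be sent to the LLM of your choice
```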

Why RAG?
RAG vs. LLM
- LLM knowledge is out-of-date
- LLMs cannot access your private knowledge
- Helps reduce hallucinations
- Transparency and interpretability
RAG vs. Fine-tune
- Fine-tuning is expensive
- Fine-tuning takes a lot of time
- RAG is pluggable

Basic RAG Architecture

5 lines starter

RAG at Scale
● Pre-Filtering on Metadata
● Vector Search/ Hybrid Search
● Multi-Vector Search
○ Store Multiple vector embeddings per document
○ Easy Multi-modal RAG
● GPU Search?
● Dynamic updates (real-time inserts & updates)
● Scale up w/ K8s
○ Auto-scaling based on Query Load and Data Size

Demo!
github.com/stephen37/talks/blob/main/search_at_scale/OSS_EU.ipynb

09 Agentic RAG

General Ideas

milvus.io
github.com/milvus-io/
@milvusio
@stephenbtl
/in/stephen-batifol
Thank you
