Google Cloud Presentation GraphSummit Melbourne 2024: Building Generative AI on Google Cloud

Building
Generative AI with
Google
Abhi
Lead Customer Engineering Data Analytics and AI, MERC
AuNZ

Proprietary + Confidential
A deep history of research and innovation at Google
Responsible AI at the foundation
Built & Tested for Safety
Privacy in design
Upholds high scientific standards
Accountable to People
Socially Beneficial
Avoid creating unfair bias

Google Cloud
We see 3 productivity pillars driven by ML and GenAI
Enhance Employee Productivity | Modernise Customer Service | Streamline & Automate Business Processes
Employee & Developer Productivity
Document, Email & Analysis Assist | Improve code development | Simplify DevOps | Automate Non-Coding Processes
Customer Service Modernisation
Boost Agent & Employee Productivity | Improve Self-Service & Deflection Rates | Enhance customer insights & predictions
Back Office of the Future
Procurement Contract Management & Compliance | HR Help Desk & Internal Travel Bookings | Sales and Marketing & Accounts Payable
Digital Commerce & Website Modernisation
Enrich Catalogs & Streamline Content Generation | Conversation Commerce & Enhanced Web Navigation | Improve Self-Service & Deflection Rates
Marketing
Creative & Content Generation | Personalisation & Media Performance | Insights & Measurement

The Stack

Vertex AI
Gemini Models
AI Hypercomputer
Gemini for Google Cloud
Your Agents
Gemini for Workspace

AI Hypercomputer: next generation AI supercomputing architecture
Flexible Consumption
Dynamic Workload Scheduler | On Demand | CUD | Spot
Open Software
JAX, TensorFlow, PyTorch
Multislice Training, Multihost Inference, XLA
Google Kubernetes Engine & Compute Engine
Performance-Optimized Hardware
Compute (GPUs, TPUs) | Storage (Block, File, Object) | Networking (OCS, Jupiter)

Since 2015 Google has been rapidly enhancing its TPUs
Roadmap, 2023 Q3 to 2024 Q4 (Jul 2023 to Dec 2024), with rows for TPU, GPU, SW and Key milestones:
● TPU GKE/GCE Integration GA
● A3 GA
● TPU v5p GA
● HPC Toolkit Support for A3 & NeMo
● TPU v5e GA
● A3 Mega Private Preview
● Single Host Inference GA
● TPU Multislice Training GA
● TPU v5e Public Preview
● Single Host Inference Public Preview
● Multislice Training Public Preview
● A3 Mega GA
● A4 Private Preview (Q1'25)
● Multi Host Inference Public Preview
● QRM support for GPUs in public preview
● Future Reservations in Public Preview
● Ops Agent monitoring for GPUs on GCE GA
● TPU v5p Public Preview
● TPU v6e Public Preview
● TPU v6e GA
● gSC Foundations Preview
● DWS public preview
● MLPerf 4.0 Inference
● MLPerf 4.0 Training
● DWS GA
● A3 Ultra (H200) Private Preview
● A3 Edge (H100) Seoul

Gemini offers the world’s largest context window

Models compared: Gemini 1.0 Pro | GPT-4 Turbo | Claude 3.5 Sonnet | Gemini 1.5 Pro
Gemini 1.5 Pro: 2M tokens
2 hour video | 22 hour audio | >60k lines of code | >1.4m words

Gemini on Vertex AI Gemma Open Models
Now available
GA
Now
GA
Now
Gemini 1.5 Flash
Fastest and most
cost-efficient model yet
Multimodality
Low Latency
Quality comparable to 1.5 Pro
(on common tasks)
Gemini 1.5 Pro
Native reasoning over enormous
amounts of data
2M Context Window
Multimodality
Versatile & top-tier quality
As of August 2024, the number of languages Gemini supports jumped from nearly 40 to over
100. This is important for us in APAC, and it can be paired with our Translation tools to
cover a much larger set of languages.

Mistral
Small | Large | Codestral
Claude 3.5 Sonnet
Open ecosystem that gives customers choice
Meta Llama 3.1
405B Model
GA
Now
State of the art 3rd party and open source models
are first-class citizens on Vertex
GA
July
Preview
July

Higher quality
Imagen 3 quality exceeds all leading
external competitors in aesthetics,
fewer defects, prompt adherence,
and text on images
Aspect ratios: 1:1, 9:16, 16:9, 3:4, 4:3
Safety built in
Digital watermark and safety
framework built in
Guardrails to limit reproduction
of people, scenes and much more
Prompt: a family of four sitting at the couch watching tv with their dog
Imagen 3 Fast
Imagen 3
Imagen 3: our latest image generation foundation model
Two new higher-quality model variants to help customers
optimise around quality and latency goals

AIOps represents a suite of technologies across the data lifecycle; however, most customers
don’t “see” this end-to-end view and are building their stacks ad hoc
AIOps Capability Map
Lifecycle stages: Prepare | Develop | Validate | Prompt | Deploy | Infer | Automate | Monitor
Data Collection | Model Selection | Benchmarking | Prompt Deconstruction | Model Hosting (Inference / Serving) | RLHF Tooling | Agent Design & Orchestration | Logging & Analytics
Data Preprocessing (e.g., Chunking) | Model Pre-training | Performance Evaluation | Prompt Libraries & Templates | Model Caching | Prompt Reconstruction | Connector Tooling (Tool Aggregation) | Error & Usage Analysis
Data Retrieval (incl. RAG tooling) | Model Fine-Tuning | Model Resilience Testing | Prompt Chaining | Model Orchestration | Infrastructure Provisioning | LLM Chaining | App / Model Debugging
Data Labeling & Annotation | Hyperparameter Tuning | Model Efficiency Tracking | Prompt Embedding & Context Aug. | Distributed Computing | Human-in-the-Loop Tooling | Agent Memory Management | Performance Monitoring
Data Versioning & Auditing | Model Hub (Registry) & Version Control | Experiment Tracking | Automated Prompt Testing | API & Service Integrations | Agent Self-Eval Tooling | Output & Drift Monitoring
Model Distillation & Quantization | Model Explainability | Prompt A/B Testing (Comparison, Merge) | Load Balancing | CI/CD Pipelines
Feature Store | Grounding | Autoscaling | Real-time Agent Debugging
Govern: Security | Compliance | Data Privacy | Bias Detection | Transparency | Guardrails | Sustainability | Disaster Recovery
Legend: Model Building | Model Monitoring | Model Deployment | Native to LLMOps

Open Framework Support on Vertex AI
Ray on Vertex AI
Scale AI & Data with Ray
Developers face several major challenges when
scaling AI/ML workloads, such as:
1. Access to a sufficient number of CPUs/GPUs
2. Diverse patterns and programming interfaces
3. Running the workload securely in production
With Ray on Vertex AI, OSS Ray users can run
securely on Vertex AI while enjoying both Ray’s
ergonomic APIs and Vertex’s scalable, secure, and
elastic infra.
& Saxml
Multi-host TPU with Saxml
● Saxml pre-built container
● Serve Llama 3 open models using multi-host
Cloud TPUs
PyTorch
● Co-host PyTorch models on the same VM
● Multiple endpoints can be deployed on the
same VM within a DeploymentResourcePool

Google Cloud
Model
“What is
a Pixel Tablet?”
“The Pixel Tablet was designed
by Google and contains a
Google Tensor G2 chip...”
With the latest
external knowledge
Fewer hallucinations
Vector DBs
Query: Pixel Tablet
A Brief History of LLM Applications
In the early days Retrieval Augmented Generation (RAG) fueled GenAI
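
The retrieval step on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how Vertex AI implements it: the bag-of-words `embed` function stands in for a real embedding model, the `DOCS` list stands in for a vector database, and `retrieve` and `build_prompt` are hypothetical helpers that show how retrieved text is placed in front of the question before it reaches the model.

```python
import math
import re
from collections import Counter

# Toy corpus standing in for a vector database (assumption for illustration).
DOCS = [
    "The Pixel Tablet was designed by Google and contains a Google Tensor G2 chip.",
    "Cloud Bigtable: low latency, scalable wide column store.",
    "Firestore: serverless, scalable document store.",
]


def embed(text: str) -> Counter:
    """Bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(question: str, docs: list[str]) -> str:
    """Prepend the retrieved context so the model answers from it."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("What is a Pixel Tablet?", DOCS))
```

In a production setup the word counts would be replaced by dense embeddings and the sorted list by an approximate nearest-neighbour index, but the flow (embed, retrieve, augment the prompt) stays the same.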

Context caching
First provider to offer
context caching API
75%
Lower input price with
context caching*
Take advantage of millions-of-tokens
context windows
Available across both 1.5 Pro and 1.5 Flash
*with >=32K context window
Without caching ($$$$): Input (Context + Prompt) → Input Prefill → Response Generation → Output
With caching ($): Cache (Context) + Input (Prompt) → Input Prefill → Response Generation → Output
Q/A and Summarisation
Vertex
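
To make the pricing claim concrete, the sketch below estimates input cost for repeated questions against one large shared context. Only the 75% discount and the 32K threshold come from the slide. Everything else is an assumption for illustration: the function name `input_cost`, the unit price, reading "32K" as 32,000 tokens, the first request paying full price to populate the cache, and cache storage fees being ignored.

```python
CACHE_DISCOUNT = 0.75          # "75% lower input price" from the slide
MIN_CACHEABLE_TOKENS = 32_000  # ">=32K context window" from the slide, read here as 32,000 tokens


def input_cost(context_tokens, prompt_tokens, queries, price_per_token, use_cache):
    """Total input cost of asking `queries` questions against one shared context.

    Simplifying assumptions: the first request pays full price to populate the
    cache, later requests pay the discounted rate on the cached context, and
    cache storage fees are ignored.
    """
    full = (context_tokens + prompt_tokens) * price_per_token
    if not use_cache or context_tokens < MIN_CACHEABLE_TOKENS or queries == 0:
        return full * queries
    cached = (context_tokens * (1 - CACHE_DISCOUNT) + prompt_tokens) * price_per_token
    return full + cached * (queries - 1)


if __name__ == "__main__":
    # Hypothetical workload: 100,000-token context, 100-token questions, 10 questions.
    for flag in (False, True):
        print("cache" if flag else "no cache", input_cost(100_000, 100, 10, 1.0, flag))
```

The saving grows with the number of questions asked against the same context, which is why caching suits Q/A and summarisation over long documents.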

Grounding with Google Search (GENERALLY AVAILABLE)
Only provider to offer grounding with Google Search (with Gemini)
Grounding with 3P data (Coming Soon)
Currently working with premier providers such as
Grounding on your data (GENERALLY AVAILABLE)
Ground on private documents and data in Vertex AI Search
Provide context to Grounding API directly
Grounding with high-fidelity (Experimental)
Ensures high levels of factuality in response
Dynamic retrieval (Coming Soon)
Smartly decide if retrieval is needed
Optimizes cost while ensuring factuality
Q/A and Summarisation
Vertex
Grounding brings the world’s knowledge to find
the relevant information for GenAI

Grounding with High Fidelity: Introducing grounding scores and sourcing from provided context
Given context/Input: Alphabet quarterly and annual reports
Prompt: What was Google’s Q1 2024 revenue? What was YoY growth?
Response: Google's revenue in Q1 2024 was $80.5 billion, which represents a 15% year-over-year growth.
Grounding Score: 99.2%, Source: 2024q1-alphabet-earnings-release-pdf (Page 1)
Prompt: What was Google’s Q4 2024 revenue?
Response: The provided sources only contain financial information for Alphabet Inc. for Q3 2024 and previous quarters, but do not include any information about Google's revenue for Q4 2024.
Grounding Score: 3%
Q/A and Summarisation
Vertex
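
One way an application might act on a grounding score is to show an answer only when the score clears a threshold. The sketch below is an assumption-laden illustration, not the Vertex AI API: `GroundedAnswer`, `present`, and the 0.6 threshold are hypothetical, and the two scores are the ones shown on the slide (99.2% and 3%).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroundedAnswer:
    text: str
    grounding_score: float          # 0.0 to 1.0, as reported by the grounding service
    source: Optional[str] = None    # document the answer was drawn from, if any


def present(answer: GroundedAnswer, threshold: float = 0.6) -> str:
    """Show the answer only when its grounding score clears the (hypothetical) threshold."""
    if answer.grounding_score >= threshold:
        cite = f" (source: {answer.source})" if answer.source else ""
        return answer.text + cite
    return "The provided sources do not contain enough information to answer."


if __name__ == "__main__":
    q1 = GroundedAnswer(
        "Google's revenue in Q1 2024 was $80.5 billion.",
        0.992,
        "2024q1-alphabet-earnings-release-pdf",
    )
    q4 = GroundedAnswer("No Q4 2024 figure is available.", 0.03)
    print(present(q1))
    print(present(q4))
```

The right threshold depends on the use case; a stricter value trades coverage for factuality.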

Google Cloud
“What is a
Pixel Tablet?”
“The Pixel Tablet was
designed by Google
and contains a Google
Tensor G2 chip…”
Reasoning and
orchestration
with the Tools
Letting the LLM
call the
functions
of the tools
Search Vector DB
Wikipedia Other APIs
Deployment
Model: Let's query with Wikipedia...
Tool: Query "Pixel Tablet" on Wikipedia
Tool: Let's query with Wikipedia...
Model: Summarize the relevant part...
Tools
Model
Orchestration
A Brief History of LLM Applications
…and evolved to Generative AI Agents with reasoning and orchestration
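
The reasoning-and-orchestration loop on this slide can be sketched as follows. It is a toy under stated assumptions: `scripted_model` is a hard-coded stand-in for an LLM, `wikipedia_lookup` is a stub rather than a real Wikipedia client, and the dictionary-based step format is hypothetical. The point is the control flow: the model picks a tool, the orchestrator calls it, and the result is fed back until the model produces a final answer.

```python
from typing import Callable


def wikipedia_lookup(query: str) -> str:
    """Hypothetical tool: a stub standing in for a real Wikipedia or search API."""
    pages = {
        "Pixel Tablet": "The Pixel Tablet was designed by Google and contains a Google Tensor G2 chip."
    }
    return pages.get(query, "No page found.")


# Registry of tools the orchestrator is allowed to call.
TOOLS: dict[str, Callable[[str], str]] = {"wikipedia": wikipedia_lookup}


def scripted_model(question: str, observations: list[str]) -> dict:
    """Stand-in for an LLM: first requests a tool call, then answers from its result."""
    if not observations:
        return {"action": "call_tool", "tool": "wikipedia", "input": "Pixel Tablet"}
    return {"action": "final", "answer": observations[-1]}


def run_agent(question: str, model=scripted_model, max_steps: int = 3) -> str:
    """Orchestration loop: alternate model decisions and tool calls."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = model(question, observations)
        if step["action"] == "final":
            return step["answer"]
        tool = TOOLS[step["tool"]]
        observations.append(tool(step["input"]))
    return "Stopped: step limit reached."


if __name__ == "__main__":
    print(run_agent("What is a Pixel Tablet?"))
```

The step limit matters in practice: without it, a model that keeps requesting tools would loop indefinitely.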

Building GenAI Agents on Vertex AI
Model & Grounding
Orchestrate & Plan
Create, launch, and manage
your agents at scale
Google, 3rd Party & Open
Source Models
Taking Action (Tools)
Ground with Google
Search to access fresh, high
quality information
Ground on your own enterprise
data quickly with out-of-the-box
RAG in Vertex AI Search
Build DIY RAG providers with
LlamaIndex on Vertex
High-Fidelity / 3rd Party
Grounding
Connect LLMs to external tools;
call APIs and Services
Build at any level: no code, low
code, or full code options in
Vertex AI Agent Builder &
Agent API
Create your own actions with
Function Calling
accessing custom or private APIs
Deploy and orchestrate
custom agents with
LangChain on Vertex
Access pre-built reusable modules
with Extensions
Agentic Solutions
Vertex

Transparency and Trust
with your GenAI solutions
● Side-by-side Evaluation
● Prompt Evaluation
● Explainability & Inspection
Google ShieldGemma
ShieldGemma is a suite of tools designed to
detect and mitigate harmful content in AI model
inputs and outputs.
ShieldGemma specifically targets hate speech,
harassment, sexually explicit, and dangerous
content.
Google provides rich tools to build safety and
trust into your experiences

For creators (UI), app developers (API), and AI practitioners (fine-tuning)
Search across Cloud
View data and machine learning artifacts across Google
Cloud products in a single place.
Discover key ML assets
Find champion models and golden datasets across projects
and regions, while still respecting IAM boundaries.
Augment with business metadata
Use Dataplex to document asset owners and additional
business metadata.
At a low price
Propagation and storage of Vertex AI technical metadata in
Dataplex is free. Pay only for Dataplex API usage and any
business metadata added via Dataplex.
Preview
Data & ML Discovery with
Google Dataplex

Neo4j is a huge
unlock for RAG

Neo4j is Google’s Graph Database
DBaaS
Document: Firestore (serverless, scalable document store)
Key-value: Cloud Bigtable (low latency, scalable wide column store)
In-memory: MemoryStore (managed Redis)
Wide Column: Cloud Bigtable (low latency, scalable wide column store)
Graph: Fastest Path to Graph
Time-series: Cloud Bigtable (low latency, scalable wide column store)
Relational: Cloud SQL (managed MySQL, PostgreSQL, & SQL Server); Cloud Spanner (scalable relational database)
Neo4j Graph Database fills a gap in Google Cloud Platform.

Graph Augmented LLMs in action
Benefits of Using Graph Databases in RAG
● Enhanced Contextual Understanding: Graph databases
excel at capturing complex relationships between entities,
allowing the RAG system to better understand the context of
a query.
● Improved Retrieval Accuracy: Graph traversal algorithms
can efficiently traverse the knowledge graph to retrieve
relevant information, leading to more accurate responses.
● Explainable AI: The knowledge graph provides a
transparent and interpretable representation of the
information used to generate responses, making the AI
system more explainable.
Don’t rely on documents, bring relationships
between entities into the context
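
The idea of bringing relationships into the context can be sketched with a tiny in-memory graph. This is an illustration under stated assumptions, not Neo4j's API: the triples are made-up placeholder facts, and `neighbourhood` and `graph_context` are hypothetical helpers. In a real deployment the traversal would be a query against the graph database.

```python
from collections import deque

# Toy knowledge graph as (subject, relation, object) triples. All facts are made-up placeholders.
TRIPLES = [
    ("Acme Pty Ltd", "OWNED_BY", "Acme Holdings"),
    ("Acme Holdings", "OWNED_BY", "Jane Citizen"),
    ("Acme Pty Ltd", "SUPPLIES", "Widget Co"),
    ("Other Corp", "SUPPLIES", "Widget Co"),
]


def neighbourhood(entity: str, triples, max_hops: int = 2) -> list[tuple]:
    """Breadth-first traversal: collect triples reachable from `entity` along outgoing edges."""
    seen, facts = {entity}, []
    queue = deque([(entity, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for s, r, o in triples:
            if s == node:
                facts.append((s, r, o))
                if o not in seen:
                    seen.add(o)
                    queue.append((o, depth + 1))
    return facts


def graph_context(entity: str, triples, max_hops: int = 2) -> str:
    """Render the traversed facts as plain text to prepend to an LLM prompt."""
    return "\n".join(f"{s} {r} {o}" for s, r, o in neighbourhood(entity, triples, max_hops))


if __name__ == "__main__":
    print(graph_context("Acme Pty Ltd", TRIPLES))
```

Because each line of context is an explicit relationship, the facts behind an answer can be traced back to specific edges, which is the explainability benefit listed above.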

RAG layer
Graph DB
Applications for
knowledge
consumption
Knowledge
extraction and
ingestion
Structured
Unstructured
Ontologies
Data sources GenAI layer
Customer Service
Ticket Triaging
Recommendations
News Content & Discovery
Enterprise Knowledge
Search
Patient Prioritization
Clinical Decision Support
Systems
Pharmacovigilance
Health Assistants
FAQ Bots
Bloom
APIs
VertexAI
with Generative AI
Neo4j Aura
VertexAI
with Generative AI
Knowledge Graph with Semantic Search
Vector DB
Prompt
Engineering

Solution and Benefits
● Provider of commercial data, analytics and insights
for businesses spanning various sizes and sectors
internationally
● Use Neo4j to support our identity insights
business, including linked and matched data
● Answer questions that span connected data in
real-time, including Ultimate Beneficial Ownership
(UBO) information
Why Neo4j
● Needed a graph solution that aligned with their
cloud strategy
● Helps D&B focus on client needs rather than
database management
Large logistics network in
Australia
Solution and Benefits
● They needed a solution to better understand the
complex relationships within their logistics
network. They knew details about network
endpoints, but getting visibility across the network
of what happened in between was not possible.
● The lack of visibility means they cannot make
real-time decisions about asset flows, and their
ability to make strategic decisions about the
network is constrained due to a lack of
understanding of where bottlenecks are occurring.
Why Neo4j
● Needed a solution that could scale up to 32TB
and be mission critical for the organisation.
● Needed a graph solution that aligned with their
cloud strategy on GCP

Select Neo4j and Google Cloud Joint Customers

How will AI help you
run your business?
Abhi
Lead Customer Engineering Data Analytics and AI, MERC
AuNZ
