GitLab Copyright
Talk to Your Data:
Ved Prakash
Staff Data Engineer, GitLab
Implementing Natural Language Analytics with
Snowflake Cortex
Disclaimer
The information provided in this presentation represents the
personal opinion of the presenter(s) and does not represent
the opinion of GitLab Inc., or its affiliates. GitLab Inc. is not
responsible for the accuracy or reliability of the information
presented.
About me
© 2023 GitLab Inc.
GitLab company
Our mission: Everyone Can Contribute
GitLab CREDIT values
Collaboration
Results for
customers
Iteration Transparency
Efficiency
inclusion & belonging
Diversity,
© 2023 GitLab Inc.
Plan
Create
Verify
Package
Secure
Deploy
Monitor
Govern
What GitLab is doing?
The DevSecOps
Platform
delivered as a
single application
to help you
iterate faster and
innovate together
© 2023 GitLab Inc.
The Data Platform Team
Part of the central data team
Central Data Team
Data organisation
Hub & Spoke model
●The central data team supports
distributed analysts in their respective
functions
●Data Leadership forum to align on
prioritisation
●The Data Platform team responsible for a
reliable, scalable data platform and
ingesting the data
Analytics Engineering
Team
The Data Platform
The Enterprise Insights
& Data Science Team
Sales and
Marketing
Product Finance
Engineering People
© 2023 GitLab Inc.
Let’s
Start
Why This Talk?
From Dashboards → Dialogue
Aspect Traditional BI Conversational (Cortex)
Interface Charts & Filters Natural Language
Dependency Analysts Self-Service
Latency Hours/Days Seconds
Flexibility Predefined Open-ended
© 2023 GitLab Inc.
How to
solve?
What Is Snowflake Cortex?
A suite of built-in AI & ML
capabilities in Snowflake.
Natively runs within Snowflake’s
secure compute plane.
The Cortex Family
Cortex
Function
Description Example Use
Cortex Complete LLM interface for text, SQL, summaries “Summarize quarterly sales”
Cortex Search
(Unstructured Data)
Vector search for embeddings Semantic search on documents
Cortex Analyst
(Structured Data)
Natural-language analytics on data “Show me revenue by region”
Cortex Classify Text classification Label support tickets
Cortex Embed Text Generate embeddings Build vector DB for retrieval
Meet Cortex Analyst
● Provides a chat interface to query structured data
● Uses semantic models (YAML) for trusted text-to-
SQL generation
● Integrates with RBAC, Governance, and
Monitoring
● Available via Snowsight & REST API
TheVision
Business users ask natural language
questions. Cortex Analyst translates
them into governed SQL, executes
securely, and returns trusted insights.
© 2023 GitLab Inc.
Let’s
Dive in
© 2023 GitLab Inc.
Cortex Analyst: Bridging Data and Language
GitLab Copyright
GitLab Copyright
GitLab Copyright
GitLab Copyright
Phase 1: The Proof of Concept
● Simple text-to-SQL using Cortex LLM functions.
● Schema context injection.
● Basic natural language interface using Streamlit
Application.
● Limited to few tables for testing.
Phase 1: Early Wins
● 60% accuracy on simple queries
● Stakeholder excitement high
● Quick validation of concept
© 2023 GitLab Inc.
Technical Challenges
● Ambiguous questions
led to wrong queries
● Model struggled with
complex joins
● No way to handle
follow-up questions
● Performance issues
with large result sets
© 2023 GitLab Inc.
Business Challenges
● Users expected
perfection
immediately
● Lack of trust when
results looked "off"
● Need for result
explanations
GitLab Copyright
Phase 2: Learning & Iterating
Technical Enhancements
● Prompt engineering with examples (few-shot
learning)
● Semantic layer for business logic
● Feed in verified query from previous run
● Business specific guideline for query
execution.
GitLab Copyright
Phase 2: Learning & Iterating
Improved Accuracy
● Simple queries: 85% → 95% accuracy
● Complex queries: 40% → 75% accuracy
GitLab Copyright
Handling the Unexpected
Common Edge Cases
● Vague questions: "Show me sales" → Which period?
Region?
● Impossible requests: Data not in warehouse
● Security: User asking for data they can't access
● Ambiguous metrics: Different definitions of "revenue"
GitLab Copyright
Handling the Unexpected
Our Solutions
● Clarifying questions before query generation
● Graceful error messages
● Row-level security enforcement
● Business glossary integration
Monitoring In Cortex Analyst
Cortex Analyst continuously logs user interactions to help administrators improve
the quality and accuracy of model responses.
These logs are stored in an event table in Snowflake for visibility and analysis.
Logged information includes:
● 👤 User who asked the question
● 👤 The question text
● 👤 Generated SQL
● 👤 👤 Errors and warnings
● 👤 Request & response bodies + metadata
Log updates have a
short delay (≈1–2
minutes).
Monitoring In Cortex Analyst
Logs can be viewed in Snowsight under the Monitoring tab of the semantic model.
© 2023 GitLab Inc.
👤 Snowsight
conversational UI
👤 REST API for
embedding in apps
👤 Streamlit / React
chatbots
👤 Role-based access via
RBAC
👤 End-to-end
observability pipeline
Business Adoption Journey
1. Concept Validation – Small internal prototype
2. Pilot Rollout – Limited business users
3. Semantic Model Refinement – Improve
accuracy
4. Full Deployment – Broader access
5. Feedback & Training – Drive adoption
Lessons Learned
Technical Lessons
● Start simple, iterate based on real usage
● Invest heavily in prompt engineering
● Semantic layer is essential, not optional
● Monitor everything - queries, errors, user
satisfaction
Lessons Learned
Business Lessons
● Set realistic expectations early
● User training is as important as the tech
● Celebrate wins, learn from failures publicly
✅ Democratized analytics for all roles
✅ Reduced analyst backlog
✅ Faster time to insight
✅ Consistent, governed metrics
✅ Improved collaboration between data & business
✅ “Now our teams talk to data, not tickets.”
Outcomes & Benefits
Key Takeaways
● Conversational analytics is the next evolution of BI
● Snowflake Cortex = AI-native + Governed + Secure
● Start with one domain (Revenue, Sales) and expand
● Build a trusted semantic layer, not just an LLM wrapper
✅ The best interface to data is language itself.
About me
Find me, ping me, ask me
© 2023 GitLab Inc.
Thank you

[DSC DACH 25] Ved Prakash - Talk to Your Data - Implementing Natural Language Analytics with Snowflake Cortex.pdf

  • 1.
    GitLab Copyright Talk toYour Data: Ved Prakash Staff Data Engineer, GitLab Implementing Natural Language Analytics with Snowflake Cortex
  • 2.
    Disclaimer The information providedin this presentation represents the personal opinion of the presenter(s) and does not represent the opinion of GitLab Inc., or its affiliates. GitLab Inc. is not responsible for the accuracy or reliability of the information presented.
  • 3.
  • 4.
    © 2023 GitLabInc. GitLab company Our mission: Everyone Can Contribute
  • 5.
    GitLab CREDIT values Collaboration Resultsfor customers Iteration Transparency Efficiency inclusion & belonging Diversity,
  • 6.
    © 2023 GitLabInc. Plan Create Verify Package Secure Deploy Monitor Govern What GitLab is doing? The DevSecOps Platform delivered as a single application to help you iterate faster and innovate together
  • 7.
    © 2023 GitLabInc. The Data Platform Team Part of the central data team
  • 8.
    Central Data Team Dataorganisation Hub & Spoke model ●The central data team supports distributed analysts in their respective functions ●Data Leadership forum to align on prioritisation ●The Data Platform team responsible for a reliable, scalable data platform and ingesting the data Analytics Engineering Team The Data Platform The Enterprise Insights & Data Science Team Sales and Marketing Product Finance Engineering People
  • 9.
    © 2023 GitLabInc. Let’s Start
  • 10.
  • 12.
  • 14.
    Aspect Traditional BIConversational (Cortex) Interface Charts & Filters Natural Language Dependency Analysts Self-Service Latency Hours/Days Seconds Flexibility Predefined Open-ended
  • 15.
    © 2023 GitLabInc. How to solve?
  • 17.
  • 18.
    A suite ofbuilt-in AI & ML capabilities in Snowflake.
  • 19.
    Natively runs withinSnowflake’s secure compute plane.
  • 21.
    The Cortex Family Cortex Function DescriptionExample Use Cortex Complete LLM interface for text, SQL, summaries “Summarize quarterly sales” Cortex Search (Unstructured Data) Vector search for embeddings Semantic search on documents Cortex Analyst (Structured Data) Natural-language analytics on data “Show me revenue by region” Cortex Classify Text classification Label support tickets Cortex Embed Text Generate embeddings Build vector DB for retrieval
  • 22.
    Meet Cortex Analyst ●Provides a chat interface to query structured data ● Uses semantic models (YAML) for trusted text-to- SQL generation ● Integrates with RBAC, Governance, and Monitoring ● Available via Snowsight & REST API
  • 24.
  • 25.
    Business users asknatural language questions. Cortex Analyst translates them into governed SQL, executes securely, and returns trusted insights.
  • 26.
    © 2023 GitLabInc. Let’s Dive in
  • 27.
  • 28.
    Cortex Analyst: BridgingData and Language
  • 29.
  • 30.
  • 32.
  • 33.
    GitLab Copyright Phase 1:The Proof of Concept ● Simple text-to-SQL using Cortex LLM functions. ● Schema context injection. ● Basic natural language interface using Streamlit Application. ● Limited to few tables for testing.
  • 34.
    Phase 1: EarlyWins ● 60% accuracy on simple queries ● Stakeholder excitement high ● Quick validation of concept
  • 35.
    © 2023 GitLabInc. Technical Challenges ● Ambiguous questions led to wrong queries ● Model struggled with complex joins ● No way to handle follow-up questions ● Performance issues with large result sets
  • 36.
    © 2023 GitLabInc. Business Challenges ● Users expected perfection immediately ● Lack of trust when results looked "off" ● Need for result explanations
  • 37.
    GitLab Copyright Phase 2:Learning & Iterating Technical Enhancements ● Prompt engineering with examples (few-shot learning) ● Semantic layer for business logic ● Feed in verified query from previous run ● Business specific guideline for query execution.
  • 38.
    GitLab Copyright Phase 2:Learning & Iterating Improved Accuracy ● Simple queries: 85% → 95% accuracy ● Complex queries: 40% → 75% accuracy
  • 39.
    GitLab Copyright Handling theUnexpected Common Edge Cases ● Vague questions: "Show me sales" → Which period? Region? ● Impossible requests: Data not in warehouse ● Security: User asking for data they can't access ● Ambiguous metrics: Different definitions of "revenue"
  • 40.
    GitLab Copyright Handling theUnexpected Our Solutions ● Clarifying questions before query generation ● Graceful error messages ● Row-level security enforcement ● Business glossary integration
  • 41.
    Monitoring In CortexAnalyst Cortex Analyst continuously logs user interactions to help administrators improve the quality and accuracy of model responses. These logs are stored in an event table in Snowflake for visibility and analysis. Logged information includes: ● 👤 User who asked the question ● 👤 The question text ● 👤 Generated SQL ● 👤 👤 Errors and warnings ● 👤 Request & response bodies + metadata Log updates have a short delay (≈1–2 minutes).
  • 42.
    Monitoring In CortexAnalyst Logs can be viewed in Snowsight under the Monitoring tab of the semantic model.
  • 43.
    © 2023 GitLabInc. 👤 Snowsight conversational UI 👤 REST API for embedding in apps 👤 Streamlit / React chatbots 👤 Role-based access via RBAC 👤 End-to-end observability pipeline
  • 44.
    Business Adoption Journey 1.Concept Validation – Small internal prototype 2. Pilot Rollout – Limited business users 3. Semantic Model Refinement – Improve accuracy 4. Full Deployment – Broader access 5. Feedback & Training – Drive adoption
  • 45.
    Lessons Learned Technical Lessons ●Start simple, iterate based on real usage ● Invest heavily in prompt engineering ● Semantic layer is essential, not optional ● Monitor everything - queries, errors, user satisfaction
  • 46.
    Lessons Learned Business Lessons ●Set realistic expectations early ● User training is as important as the tech ● Celebrate wins, learn from failures publicly
  • 47.
    ✅ Democratized analyticsfor all roles ✅ Reduced analyst backlog ✅ Faster time to insight ✅ Consistent, governed metrics ✅ Improved collaboration between data & business ✅ “Now our teams talk to data, not tickets.” Outcomes & Benefits
  • 48.
    Key Takeaways ● Conversationalanalytics is the next evolution of BI ● Snowflake Cortex = AI-native + Governed + Secure ● Start with one domain (Revenue, Sales) and expand ● Build a trusted semantic layer, not just an LLM wrapper ✅ The best interface to data is language itself.
  • 49.
    About me Find me,ping me, ask me
  • 50.
    © 2023 GitLabInc. Thank you