Open Source SQL - beyond parsers: ZetaSQL & Apache Calcite
Northwest Database Society Annual Meeting
2021/01/20
Mosha Pasumansky & Julian Hyde (Google)
Apache Calcite goals
Make it easier to write a simple DBMS
Advance the state of the art for complex DBMS
Bring database approaches to new areas (e.g. streaming, geospatial, federation, data science)
Composition + evolution (framework + open source)
Apache license & governance
Calcite evolution - origins as an SMP DB
LucidDB
C++
JDBC server
JDBC client
Physical operators
Rewrite rules
Catalog
Storage & data structures
SQL parser & validator
Query planner
Relational algebra
Java
Calcite evolution - pluggable components
Optiq
JDBC server
JDBC client
Physical operators
Rewrite rules
SQL parser & validator
Query planner
Relational algebra
Calcite evolution - pluggable components
Optiq
JDBC server
JDBC client
SQL parser & validator
Query planner
Adapter
Pluggable rewrite rules
Pluggable stats / cost
Pluggable catalog
Physical operators
Storage
Relational algebra
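To make "pluggable rewrite rules" concrete, here is a minimal sketch of a rule-driven rewriter over a toy relational algebra. It is not Calcite's API: the node classes `Scan`, `Filter`, `Project`, the two rules, and the `rewrite` driver are all hypothetical names invented for this illustration, and conditions are plain strings rather than expression trees.

```python
from dataclasses import dataclass


# Toy relational algebra nodes (hypothetical, not Calcite classes).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    input: object
    condition: str


@dataclass(frozen=True)
class Project:
    input: object
    columns: tuple


# A rule takes one node and returns a replacement, or None if it does not apply.
def drop_true_filter(node):
    """Filter(x, TRUE) -> x"""
    if isinstance(node, Filter) and node.condition.upper() == "TRUE":
        return node.input
    return None


def merge_filters(node):
    """Filter(Filter(x, a), b) -> Filter(x, (a) AND (b))"""
    if isinstance(node, Filter) and isinstance(node.input, Filter):
        inner = node.input
        return Filter(inner.input, f"({inner.condition}) AND ({node.condition})")
    return None


def rewrite(node, rules):
    """Rewrite children first, then apply rules to this node until none fires."""
    if isinstance(node, Filter):
        node = Filter(rewrite(node.input, rules), node.condition)
    elif isinstance(node, Project):
        node = Project(rewrite(node.input, rules), node.columns)
    for rule in rules:
        replacement = rule(node)
        if replacement is not None:
            return rewrite(replacement, rules)
    return node


plan = Project(
    Filter(Filter(Filter(Scan("emp"), "deptno = 10"), "TRUE"), "sal > 1000"),
    ("ename",),
)
# The rule list is the plug-in point: callers decide which rules take part.
optimized = rewrite(plan, [drop_true_filter, merge_filters])
print(optimized)
```

The driver knows nothing about individual rules, so adding a rule means appending a function to the list rather than changing the planner.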
Calcite evolution - separate JDBC stack
Apache Calcite
Avatica
JDBC server
JDBC client
ODBC client
Pluggable rewrite rules
Pluggable stats / cost
Pluggable catalog
Adapter
Physical operators
Storage
SQL parser & validator
Query planner
Relational algebra
Calcite evolution - federation via adapters
Apache Calcite
Pluggable rewrite rules
Pluggable stats / cost
Pluggable catalog
Adapter
Physical operators
Storage
SQL parser & validator
Query planner
Relational algebra
SQL
Calcite evolution - federation via adapters
Apache Calcite
JDBC adapter
Enumerable adapter
MongoDB adapter
File adapter (CSV, JSON, Http)
Apache Kafka adapter
Apache Spark adapter
Pluggable rewrite rules
Pluggable stats / cost
Pluggable catalog
SQL
SQL parser & validator
Query planner
Relational algebra
Calcite evolution - federation via adapters
Apache Calcite
Enumerable adapter
Pluggable rewrite rules
Pluggable stats / cost
Pluggable catalog
SQL
SQL parser & validator
Query planner
Relational algebra
Calcite evolution - federation via adapters
Apache Calcite
JDBC adapter
Pluggable rewrite rules
Pluggable stats / cost
Pluggable catalog
SQL
SQL parser & validator
Query planner
Relational algebra
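The federation idea in the slides above can be sketched as a catalog that maps schema names to adapters, each of which only has to know how to scan its own storage. This is a toy model, not Calcite's adapter API: `InMemoryAdapter`, `CsvTextAdapter`, `Catalog` and `hash_join` are hypothetical names, and the CSV "file" is a string held in memory so the example runs anywhere.

```python
import csv
import io


class InMemoryAdapter:
    """Serves rows from Python lists (stand-in for an in-process source)."""

    def __init__(self, tables):
        self.tables = tables

    def scan(self, table):
        return iter(self.tables[table])


class CsvTextAdapter:
    """Serves rows parsed from CSV text (stand-in for a file-based source)."""

    def __init__(self, files):
        self.files = files

    def scan(self, table):
        return csv.DictReader(io.StringIO(self.files[table]))


class Catalog:
    """Maps a schema name to the adapter that owns it."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema, adapter):
        self._schemas[schema] = adapter

    def scan(self, qualified_name):
        schema, table = qualified_name.split(".", 1)
        return self._schemas[schema].scan(table)


def hash_join(left, right, key):
    """Equi-join two row streams on `key`, building a hash table on `right`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**row, **match}


catalog = Catalog()
catalog.register("files", CsvTextAdapter(
    {"emp": "ename,deptno\nAlice,10\nBob,20\nCarol,10\n"}))
catalog.register("mem", InMemoryAdapter(
    {"dept": [{"deptno": "10", "dname": "Sales"},
              {"deptno": "20", "dname": "Marketing"}]}))

# One join spanning two different storage systems.
rows = list(hash_join(catalog.scan("files.emp"), catalog.scan("mem.dept"), "deptno"))
for row in rows:
    print(row["ename"], row["dname"])
```

The join operator never learns where the rows came from; that separation is what lets one engine federate several sources.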
Calcite evolution - SQL dialects
Apache Calcite
Pluggable rewrite rules
Pluggable parser, lexical, conformance, operators
Pluggable SQL dialect
SQL
SQL
SQL parser & validator
Query planner
Relational algebra
JDBC adapter
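As a rough illustration of a pluggable SQL dialect, the sketch below renders one logical query into three SQL flavours. The `Dialect` and `Query` classes and the `unparse` function are hypothetical and much simpler than a real dialect layer; the three dialect objects only capture identifier quoting and row-limit syntax in the style of MySQL, standard SQL and SQL Server.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dialect:
    name: str
    quote_open: str
    quote_close: str
    limit_style: str  # one of "LIMIT", "FETCH", "TOP"


@dataclass(frozen=True)
class Query:
    table: str
    columns: tuple
    limit: Optional[int] = None


def unparse(query, dialect):
    """Render the same logical query as SQL text for the given dialect."""

    def quote(identifier):
        return f"{dialect.quote_open}{identifier}{dialect.quote_close}"

    columns = ", ".join(quote(c) for c in query.columns)
    top = ""
    if query.limit is not None and dialect.limit_style == "TOP":
        top = f"TOP {query.limit} "
    sql = f"SELECT {top}{columns} FROM {quote(query.table)}"
    if query.limit is not None:
        if dialect.limit_style == "LIMIT":
            sql += f" LIMIT {query.limit}"
        elif dialect.limit_style == "FETCH":
            sql += f" FETCH FIRST {query.limit} ROWS ONLY"
    return sql


# Simplified stand-ins for three real dialect families.
MYSQL_STYLE = Dialect("mysql-style", "`", "`", "LIMIT")
STANDARD_STYLE = Dialect("standard-style", '"', '"', "FETCH")
MSSQL_STYLE = Dialect("mssql-style", "[", "]", "TOP")

query = Query("emp", ("ename", "sal"), limit=5)
for dialect in (MYSQL_STYLE, STANDARD_STYLE, MSSQL_STYLE):
    print(dialect.name, "->", unparse(query, dialect))
```

Because the query is held as a structure rather than text, translating between dialects reduces to parsing with one set of rules and unparsing with another.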
Calcite evolution - other front-end languages
Apache Calcite
SQL
Adapter
Physical operators
Storage
SQL parser & validator
Query planner
Relational algebra
Calcite evolution - other front-end languages
Pig
Morel
Datalog
SQL
RelBuilder
SQL parser & validator
Adapter
Physical operators
Storage
Query planner
Relational algebra
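The point of this slide is that any front end can target the same relational algebra, as long as it can drive a builder. The sketch below uses a hypothetical stack-based `PlanBuilder` (loosely in the spirit of a RelBuilder, but not its API) and a made-up pipe-style mini-language, `from_pipeline`, as a second front end; plans are plain nested tuples.

```python
class PlanBuilder:
    """Builds plan trees on a stack; each call consumes and pushes nodes."""

    def __init__(self):
        self._stack = []

    def scan(self, table):
        self._stack.append(("scan", table))
        return self

    def filter(self, condition):
        child = self._stack.pop()
        self._stack.append(("filter", condition, child))
        return self

    def project(self, *columns):
        child = self._stack.pop()
        self._stack.append(("project", columns, child))
        return self

    def join(self, condition):
        # The two most recently pushed nodes become the join inputs.
        right = self._stack.pop()
        left = self._stack.pop()
        self._stack.append(("join", condition, left, right))
        return self

    def build(self):
        return self._stack.pop()


def from_pipeline(text):
    """Front end for a made-up syntax: 'table | filter cond | project a, b'."""
    builder = PlanBuilder()
    stages = [stage.strip() for stage in text.split("|")]
    builder.scan(stages[0])
    for stage in stages[1:]:
        op, _, arg = stage.partition(" ")
        if op == "filter":
            builder.filter(arg.strip())
        elif op == "project":
            builder.project(*[c.strip() for c in arg.split(",")])
        else:
            raise ValueError(f"unknown stage: {op}")
    return builder.build()


# Front end 1: direct builder calls.
direct = PlanBuilder().scan("emp").filter("sal > 1000").project("ename", "sal").build()
# Front end 2: the pipe-style language, producing the same algebra.
piped = from_pipeline("emp | filter sal > 1000 | project ename, sal")
print(direct == piped)
```

Everything downstream of the builder (rules, cost model, adapters) is shared, which is why adding a language does not require a new planner.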
Calcite architecture
Apache Calcite
Avatica
JDBC server
JDBC client
ODBC client
Pluggable rewrite rules
Pluggable stats / cost
Pluggable catalog
Adapter
Physical operators
Storage
SQL parser & validator
Query planner
Relational algebra
RelBuilder
Core – Operator expressions (relational algebra) and planner (based on Cascades)
External – Data storage, algorithms and catalog
Optional – SQL parser, JDBC & ODBC drivers
Extensible – Planner rewrite rules, statistics, cost model, algebra, UDFs
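A small sketch of how pluggable statistics and a pluggable cost model might steer a planner. It enumerates join orders exhaustively and picks the cheapest, which is a deliberate simplification and not a Cascades-style memoized search. All names (`intermediate_rows`, `total_intermediate_cost`, `cheapest_join_order`), the row counts and the fixed join selectivity are assumptions made for the example.

```python
from itertools import permutations


def intermediate_rows(order, row_counts, selectivity):
    """Estimated size of each intermediate result in a left-deep join order."""
    sizes = []
    current = row_counts[order[0]]
    for table in order[1:]:
        current = current * row_counts[table] * selectivity
        sizes.append(current)
    return sizes


def total_intermediate_cost(order, row_counts, selectivity=0.001):
    """Default cost model: total rows produced by all joins in the order."""
    return sum(intermediate_rows(order, row_counts, selectivity))


def cheapest_join_order(tables, row_counts, cost_fn=total_intermediate_cost):
    """Try every left-deep order and keep the one the cost function likes best."""
    return min(permutations(tables), key=lambda order: cost_fn(order, row_counts))


# Statistics supplied from outside the planner (made-up numbers).
row_counts = {"orders": 1_000_000, "customers": 10_000, "nation": 25}

best = cheapest_join_order(["orders", "customers", "nation"], row_counts)
print(best, total_intermediate_cost(best, row_counts))
```

Swapping in different `row_counts` or a different `cost_fn` changes the chosen plan without touching the search code, which is the sense in which statistics and cost are extension points.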
Lessons learned
Decompose the database into components
SQL is standard but also allows innovation
Relational algebra intermediate language
Calcite has many uses, including:
● Embedded within DBMS (e.g. Apache Hive, OmniSciDB)
● Lightweight DBMS
● Platform for research
● Sandbox for relational algebra
● Toolkit for translating between SQL dialects
ZetaSQL
SQL
Parser
Catalog
AST
Resolver
Resolved AST
BigQuery
Spanner
F1
DataFlow
Test Harness
Corpus of compliance tests
Reference implementation
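The parser, resolver and catalog stages on this slide can be sketched as two functions: one that turns SQL text into a syntax-only AST, and one that consults a catalog to produce a resolved AST with table bindings and types. This is a toy for a single `SELECT ... FROM ...` shape and does not use ZetaSQL's API; `SelectAst`, `ResolvedColumn`, `ResolvedSelect`, `parse` and `resolve` are hypothetical names, and the catalog is a plain dict.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectAst:
    """Syntax only: names as written, no types, no validation."""
    columns: tuple
    table: str


@dataclass(frozen=True)
class ResolvedColumn:
    table: str
    name: str
    type: str


@dataclass(frozen=True)
class ResolvedSelect:
    """Semantics: every column is bound to a table and has a type."""
    columns: tuple


_SELECT = re.compile(r"^\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*;?\s*$", re.IGNORECASE)


def parse(sql):
    """Text -> AST. Needs no catalog, so it can run anywhere."""
    match = _SELECT.match(sql)
    if not match:
        raise SyntaxError(f"cannot parse: {sql!r}")
    columns = tuple(c.strip() for c in match.group(1).split(","))
    return SelectAst(columns, match.group(2))


def resolve(ast, catalog):
    """AST + catalog -> resolved AST, or an error for unknown names."""
    if ast.table not in catalog:
        raise LookupError(f"unknown table: {ast.table}")
    schema = catalog[ast.table]
    resolved = []
    for name in ast.columns:
        if name not in schema:
            raise LookupError(f"unknown column: {ast.table}.{name}")
        resolved.append(ResolvedColumn(ast.table, name, schema[name]))
    return ResolvedSelect(tuple(resolved))


# Each engine would supply its own catalog; this one is made up.
catalog = {"emp": {"ename": "STRING", "sal": "INT64"}}

ast = parse("SELECT ename, sal FROM emp")
resolved = resolve(ast, catalog)
print(resolved)
```

Keeping the catalog as an input to `resolve` is what would let several engines share one analyzer while each brings its own metadata.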
Thank you!
Questions?
#ZetaSQL
https://github.com/google/zetasql
@ApacheCalcite
https://calcite.apache.org
