AgensGraph: a Multi-Model Graph Database
Based on PostgreSQL
Kisung Kim (kskim@bitnine.net)
Bitnine R&D Center
2017-1-14
Who am I
• Kisung Kim, Ph.D., Chief Technology Officer of Bitnine Global Inc.
• Researched query optimization for graph-structured data during my
doctoral studies
• Developed a distributed relational database engine at TmaxSoft
• Leading the development of a new graph database, AgensGraph, at
Bitnine Global
What Is a Graph Database?
Images from http://www.slideshare.net/debanjanmahata/an-introduction-to-nosql-graph-databases-and-neo4j
What Is a Graph Database?
• Relationships are first-class citizens in a graph database
• A graph database keeps your data connected
              Relational Database   Graph Database
Entity        Row                   Node (Vertex)
Relationship  Row                   Relationship (Edge)
What Is a Graph Database?
• Handles data from a different point of view
• Its data model is similar to the entity-relationship model
• Gartner says it represents a radical change in how data is
organized and processed
Cypher Query Language
• Declarative query language for the property graph model
• Inspired by SQL and SPARQL
– Designed to be a human-readable query language
• Developed by Neo Technology Inc. since 2011
• Current version is 3.0
• openCypher (http://opencypher.org)
– Anyone can participate in developing the query language
Cypher Query Example
Make two nodes
CREATE (:person {id: 1, name: 'Kisung Kim', birthday: '1980-01-05'});
CREATE (:company {id: 1, name: 'Bitnine Global'});
Make a relationship between the two nodes
MATCH (p:person {id: 1}), (c:company {id: 1})
CREATE (p)-[:workFor {title: 'CTO', since: 2014}]->(c);
[Diagram: (Kisung Kim)-[workFor]->(Bitnine Global)]
Cypher Query Example
Querying
MATCH (p:person {name: 'Kisung Kim'})-[:workFor]->(c:company)
RETURN p, c;
No Table Definitions and No Joins
Query with variable-length relationships
MATCH (p:person {name: 'Kisung Kim'})-[:knows*..3]->(f:person)
RETURN f;
[Diagram: (Kisung Kim)-[workFor]->(?)]
[Diagram: (Kisung Kim)-[knows]->(?)-[knows]->(?)-[knows]->(?)]
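The variable-length pattern above can be read as "follow one to three knows edges". The following Python sketch shows one way to enumerate such matches, assuming each edge may be used at most once per path and that one result is produced per matching path. The function name, the edge list, and the people in it are hypothetical, and this is not how AgensGraph evaluates the pattern internally.

```python
from collections import defaultdict


def variable_length_matches(edges, start, max_hops):
    """Return the end vertex of every path of 1..max_hops edges from start.

    Each edge is used at most once per path, and one result is produced per
    path, so a vertex reachable in several ways appears several times.
    """
    outgoing = defaultdict(list)
    for edge_id, (src, dst) in enumerate(edges):
        outgoing[src].append((edge_id, dst))

    results = []

    def extend(vertex, used_edges, hops):
        if hops == max_hops:
            return
        for edge_id, neighbor in outgoing[vertex]:
            if edge_id in used_edges:
                continue
            results.append(neighbor)
            extend(neighbor, used_edges | {edge_id}, hops + 1)

    extend(start, frozenset(), 0)
    return results


# Hypothetical "knows" edges, given as (start, end) pairs.
knows = [
    ("Kisung", "Ann"),
    ("Ann", "Bob"),
    ("Bob", "Cho"),
    ("Cho", "Dee"),
    ("Kisung", "Bob"),
]
friends = variable_length_matches(knows, "Kisung", 3)
print(friends)
```

Bob and Cho each appear twice in the result because two different paths lead to them, which mirrors how a pattern match yields one row per path rather than one row per distinct vertex.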
GraphDB to PostgreSQL Case
• From Hipolabs
http://engineering.hipolabs.com/graphdb-to-postgresql/
Graph Database and Hybrid Database
Magic Quadrant for Operational Database Management Systems, Gartner, 2016
So, What Do We Want to Make?
• A hybrid database engine supporting both the graph and relational models
• Cypher query processing on PostgreSQL
• An online transactional graph database
• Disk-based persistent graph storage
(PostgreSQL)-[:processes]->(Cypher)
Why Did We Choose PostgreSQL?
• Fully featured, enterprise-ready open-source database
• Graph processing actually uses relational algebra
– A graph is serialized as tables on disk
– Every graph traversal step is in principle a join
(from the LDBC documentation)
• It is therefore important to optimize joins and speed up join processing
– PostgreSQL has an excellent query optimizer
• And... PostgreSQL's rich ecosystem
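To make the "every traversal step is a join" point concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in relational engine. The table names, column names, and data are assumptions chosen for illustration; they are not AgensGraph's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE knows (start_id INTEGER, end_id INTEGER);
    INSERT INTO person VALUES (1, 'Kisung'), (2, 'Ann'), (3, 'Bob');
    INSERT INTO knows VALUES (1, 2), (2, 3);
    """
)

# Pattern (p)-[:knows]->()-[:knows]->(f): each hop adds one join on the edge table.
two_hop_sql = """
    SELECT f.name
    FROM person AS p
    JOIN knows AS k1 ON k1.start_id = p.id
    JOIN knows AS k2 ON k2.start_id = k1.end_id
    JOIN person AS f ON f.id = k2.end_id
    WHERE p.name = ?
"""
two_hop_friends = [row[0] for row in conn.execute(two_hop_sql, ("Kisung",))]
print(two_hop_friends)
```

A longer pattern adds one more join per hop, which is why the quality of the join optimizer matters so much for graph workloads.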
Challenges
• How to store graph data
– Efficient structure for graph pattern matching
– At the same time, efficient for transaction processing
• How to process graph queries
– Processing complex graph pattern matching: variable-length paths,
shortest paths
– Mismatches between the graph data model and the relational data model
– Graph query optimization
Graph Storage
• Graph data is decomposed into vertexes and edges and stored
on disk
• When processing graph pattern matching, it is essential to
find adjacent vertexes or edges efficiently
– Given a start vertex, find end vertexes
– Given an end vertex, find start vertexes
Two Graph Databases
Solution  Company         Latest Version  Features
Neo4j     Neo Technology  3.1             Most famous graph database; Cypher; O(1) access using fixed-size arrays
Titan     DataStax        -               Distributed graph system based on Cassandra
Graph Storage - Neo4j
• Fixed-size arrays for nodes and relationships
• The relationships of a node are organized as a doubly linked list
• Index-free adjacency
• O(1) access to adjacent edges: follow the pointer
From Graph Databases 2nd ed. O’Reilly, 2015
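The layout described above can be sketched in a few lines of Python. This is a simplified illustration only: Python lists stand in for fixed-size record arrays in which a record is found directly from its ID, the field names are hypothetical, self-loops are not handled, and real Neo4j records contain more fields than shown here.

```python
from dataclasses import dataclass

NIL = -1  # marks "no record"


@dataclass
class NodeRecord:
    first_rel: int = NIL  # ID of the first relationship in this node's chain


@dataclass
class RelRecord:
    start: int
    end: int
    start_prev: int = NIL  # neighbors in the start node's chain
    start_next: int = NIL
    end_prev: int = NIL    # neighbors in the end node's chain
    end_next: int = NIL


nodes = []  # record ID == list position, so lookup by ID is O(1)
rels = []


def add_node():
    nodes.append(NodeRecord())
    return len(nodes) - 1


def _link_at_head(rel_id, node_id):
    """Insert the relationship at the head of the node's doubly linked chain."""
    rel = rels[rel_id]
    old_head = nodes[node_id].first_rel
    if rel.start == node_id:
        rel.start_next = old_head
    else:
        rel.end_next = old_head
    if old_head != NIL:
        head = rels[old_head]
        if head.start == node_id:
            head.start_prev = rel_id
        else:
            head.end_prev = rel_id
    nodes[node_id].first_rel = rel_id


def add_rel(start, end):
    assert start != end, "self-loops are left out of this sketch"
    rels.append(RelRecord(start, end))
    rel_id = len(rels) - 1
    _link_at_head(rel_id, start)
    _link_at_head(rel_id, end)
    return rel_id


def relationships_of(node_id):
    """Follow the pointers from the node's first relationship; no index is used."""
    rel_id = nodes[node_id].first_rel
    while rel_id != NIL:
        rel = rels[rel_id]
        yield rel_id
        rel_id = rel.start_next if rel.start == node_id else rel.end_next


a, b, c = add_node(), add_node(), add_node()
add_rel(a, b)
add_rel(a, c)
add_rel(c, a)
print(list(relationships_of(a)))
```

Each relationship sits in two chains at once, one per endpoint, which is why finding the edges adjacent to a node is a matter of pointer chasing.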
Graph Storage – Titan (DSE Graph)
• Titan stores graphs in adjacency list format
• Each edge is stored twice
• Vertexes and their edge lists are stored in a backend store such as HBase,
Cassandra, or BerkeleyDB
From http://s3.thinkaurelius.com/docs/titan/1.0.0/data-model.html
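A minimal sketch of the adjacency-list idea, assuming one "row" per vertex that holds all of that vertex's incident edges. The dictionary stands in for the backend store, and the direction tags and function names are hypothetical rather than Titan's actual encoding.

```python
from collections import defaultdict

# vertex ID -> list of (direction, label, other vertex ID); one "row" per vertex
adjacency = defaultdict(list)


def add_edge(src, label, dst):
    # The edge is written twice: once in each endpoint's row.
    adjacency[src].append(("OUT", label, dst))
    adjacency[dst].append(("IN", label, src))


def neighbors(vertex, direction):
    return [other for d, _label, other in adjacency[vertex] if d == direction]


add_edge("v1", "knows", "v2")
add_edge("v3", "knows", "v1")

out_of_v1 = neighbors("v1", "OUT")
into_v1 = neighbors("v1", "IN")
stored_entries = sum(len(row) for row in adjacency.values())
print(out_of_v1, into_v1, stored_entries)
```

Storing every edge twice doubles the number of entries, but both outgoing and incoming neighbors can then be read from a single vertex row.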
Graph Storage - AgensGraph
• Fixed-size arrays are hard to implement in PostgreSQL
– Tuples are moved when they are updated
• Titan’s big-row approach is also inadequate
• We chose B-tree indexes for graph traversal
Graph
  Vertex table: Vertex ID, Properties
    B-tree index on (Vertex ID)
  Edge table: Edge ID, Start Vertex ID, End Vertex ID, Properties
    B-tree index on (Start, End)
    B-tree index on (End, Start)
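The two composite indexes on the edge table serve the two traversal directions. The sketch below emulates an ordered index with a sorted Python list and the bisect module; it is not a real B-tree, and appending the edge ID to each key is an assumption made here so that an entry can point back to its edge.

```python
from bisect import bisect_left, insort

edges_by_start = []  # sorted (start, end, edge_id) keys, like an index on (Start, End)
edges_by_end = []    # sorted (end, start, edge_id) keys, like an index on (End, Start)


def add_edge(edge_id, start, end):
    insort(edges_by_start, (start, end, edge_id))
    insort(edges_by_end, (end, start, edge_id))


def prefix_scan(index, first_key):
    """Yield all entries whose leading column equals first_key (a range scan)."""
    pos = bisect_left(index, (first_key,))
    while pos < len(index) and index[pos][0] == first_key:
        yield index[pos]
        pos += 1


def end_vertexes(start):
    return [end for _start, end, _edge_id in prefix_scan(edges_by_start, start)]


def start_vertexes(end):
    return [start for _end, start, _edge_id in prefix_scan(edges_by_end, end)]


add_edge(100, 1, 2)
add_edge(101, 1, 3)
add_edge(102, 4, 1)
add_edge(103, 2, 3)

print(end_vertexes(1))
print(start_vertexes(3))
```

Because both columns are in the key, the neighbor IDs come straight out of the index entries without touching the edge rows, at the cost of maintaining two sorted structures.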
Index Problems
• The current B-tree index has several disadvantages for our workload
– A composite index is preferable, but it increases the index size
– There are many duplicate keys (vertex IDs) on start_ID or end_ID
– Property updates incur insertions into the B-trees
• We are developing a new index with a bucket structure (like a
GIN index), indirect indexing, and support for index-only scans
for graph traversals
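The duplicate-key problem and the bucket idea can be illustrated with a small comparison. This is only a sketch of the general principle of storing each key once with a list of entries under it; it does not describe the index under development, and the edge data is hypothetical.

```python
from collections import defaultdict

# Hypothetical (start, end) pairs; start IDs 1 and 2 repeat across edges.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]

# Flat composite index: one entry per edge, so each start ID is stored repeatedly.
flat_index = sorted(edges)

# Bucketed index: each start ID is stored once, with its end IDs grouped under it.
buckets = defaultdict(list)
for start, end in flat_index:
    buckets[start].append(end)

flat_start_keys = len(flat_index)
bucket_start_keys = len(buckets)
print(flat_start_keys, bucket_start_keys)
```

With high-degree vertexes the gap between the two counts grows, which is the motivation for grouping entries by key.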
Graph Storage - AgensGraph
• Vertexes and edges are grouped into labels
• Labels are organized into a label hierarchy
• We use PostgreSQL’s table inheritance feature
[Diagram: label tables ag_vertex, Person, Message, Comment, and Post, each with columns Vertex ID and Properties]
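The effect of a label hierarchy on matching can be sketched as follows. The parent links are an assumption made for illustration (Person and Message under ag_vertex, Comment and Post under Message), and the dictionaries only imitate the behavior of a parent table whose scan also covers its child tables.

```python
# Parent of each label; None marks the root. This hierarchy is an assumption.
parent = {
    "ag_vertex": None,
    "Person": "ag_vertex",
    "Message": "ag_vertex",
    "Comment": "Message",
    "Post": "Message",
}

rows = {label: [] for label in parent}  # one "table" of (vertex ID, properties) per label


def create_vertex(label, vertex_id, properties):
    rows[label].append((vertex_id, properties))


def is_a(label, ancestor):
    """True if label equals ancestor or descends from it."""
    while label is not None:
        if label == ancestor:
            return True
        label = parent[label]
    return False


def match(label):
    """Vertex IDs of the label and all of its sub-labels."""
    return [vid for lbl in parent if is_a(lbl, label) for vid, _props in rows[lbl]]


create_vertex("Person", 1, {"name": "Kisung Kim"})
create_vertex("Comment", 2, {"content": "hello"})
create_vertex("Post", 3, {"content": "first post"})

print(match("Message"))
print(match("ag_vertex"))
```

Matching on a parent label returns the vertexes of every sub-label, while matching on a leaf label touches only that label's own rows.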
Current Status
• AgensGraph v0.9
(https://github.com/bitnine-oss/agens-graph or http://bitnine.net/downloads/)
– Graph data model and DDL on PostgreSQL 9.6
– Cypher query processing (70% of the openCypher spec.)
– Integrated query processing (Cypher + SQL)
– Client libraries (JDBC, ODBC, Python)
– Monitoring and development using Tadpole DB Hub
Tadpole for AgensGraph
• Tadpole DB Hub is an open-source project for managing unified
infrastructure (https://github.com/hangum/TadpoleForDBTools)
• Supports various databases, including PostgreSQL and AgensGraph
• Features of Tadpole for AgensGraph
– Monitoring of the AgensGraph server
– Cypher query browser and graph visualization
Tadpole for AgensGraph
Future Roadmap
• Distributed graph database
– Plan to exploit Postgres-XL
• Specialized storage and index for graph traversals
• Dictionary compression for JSONB (ZSON)
• Graph query optimization using graph statistics
• Integration with big data systems
– HDFS Storage
– Graph analysis using GraphX
Join Us
• AgensGraph is an open-source project: https://github.com/bitnine-oss/agens-graph
• We also wish to contribute to the PostgreSQL community
• Graph database meetup in Silicon Valley
– http://www.meetup.com/Graph-Database-in-Silicon-Valley/
Thank You
kskim@bitnine.net
