Mapping Graph Queries to PostgreSQL
Gábor Szárnyas, József Marton,
Márton Elekes, János Benjamin Antal
Budapest DB Meetup — 2018/Nov/13
PROPERTY GRAPH DATABASES
NoSQL family
Data model:
 nodes
 edges
 properties
#1 query approach:
graph pattern matching
RANKINGS: POPULARITY CHANGES PER CATEGORY
CYPHER AND OPENCYPHER
Cypher: query language of the Neo4j graph database.
„Cypher is a declarative, SQL-inspired language for describing
patterns in graphs visually using an ascii-art syntax.”
MATCH
(d:Database)<-[:RELATED_TO]-(:Talk)-[:PRESENTED_AT]->(m:Meetup)
WHERE m.date = 'Tuesday, November 13, 2018'
RETURN d
„The openCypher project aims to deliver a full and open
specification of the industry’s most widely adopted graph
database query language: Cypher.” (late 2015)
OPENCYPHER SYSTEMS
 Increasing adoption
 Relational databases:
o SAP HANA
o AGENS Graph
 Research prototypes:
o Graphflow (University of Waterloo, Canada)
o ingraph (incremental graph engine @ BME)
(Source: Keynote talk @ GraphConnect NYC 2017)
PROPERTY GRAPHS
 Textbook graph: 𝐺 = (𝑉, 𝐸)
o Dijkstra, Ford-Fulkerson, etc.
o Homogeneous nodes
o Homogeneous edges
 Extensions
o Labelled nodes
o Typed edges
o Properties
 The schema is implicit
 Very intuitive
o Things and connections
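[Figure: example property graph. Nodes labelled Person (one of them also Student) with name and age properties: Bob, 53; Carol, 38; David, 47; Erin, 30. A node labelled Company with a name property: A Ltd. Edge types: COLLEAGUE, FRIEND and WORKS_AT, with a since property (2007, 2018).]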
GRAPH VS. RELATIONAL DATABASES
 Graph databases
o Graph-based modelling is intuitive
o Concise query language
 Relational databases
o Most common
o Many legacy systems
o Efficient and mature
 No tools available to bridge the two
o i.e. query data in RDBs as a graph
o first you have to wrangle the graph out of the RDB
[Figure: Tables (columns Col1, Col2; rows 1 A and 2 B) next to a Graph: two :Course nodes (name: 'Phys1', name: 'Phys2') linked by a REQUIRES edge, and a :Student node (name: 'John') with two ENROLLED edges.]
PROPOSED APPROACH
To get the best of both worlds, map Cypher queries to SQL:
1. Formulate queries in Cypher
2. Execute inside an existing RDB
3. Return results as a graph relation
GRAPH QUERIES IN CYPHER
 Subgraph matching and graph traversals
 Example: Alice’s colleagues and their colleagues
MATCH (p1:Person {name: 'Alice'})-[c:COLL*1..2]-(p2:Person)
RETURN p2.name
[Figure: the example property graph from the PROPERTY GRAPHS slide.]
Data and Query Mapping
MAPPING BETWEEN DATA MODELS
[Figure: three data models and the mappings between them.]
Objects: :Class1 and :Class2, each with attr1: String and attr2: int
Graph: :Course nodes (name: 'Phys1', name: 'Phys2') with a REQUIRES edge, and a :Student node (name: 'John') with ENROLLED edges
Tables: columns Col1, Col2; rows 1 A and 2 B
Objects and Graph: object-graph mapping
Objects and Tables: object-relational mapping (ORM), e.g. JPA, Entity Framework
Graph and Tables: graph-relational mapping
DATA MAPPING #1: GENERIC SCHEMA
 Useful for representing schema-free data sets
 Hopelessly slow
[Figure: the example property graph (Bob, 53; Carol, 38; A Ltd.; David, 47; Erin, 30; since values 2007 and 2018).]
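A minimal sketch of what a generic schema could look like, run on an in-memory SQLite database for convenience. The names vertex, vertex_property, parent, key and value come from the SQL on the A SIMPLE EXAMPLE slide; the edge table, the vertex ids and the sample edge are assumptions made for illustration. The query shows one likely reason for the poor performance: every property a query touches costs another join on the key/value table.

```python
import sqlite3

# Generic schema: one table for all vertices, one for all edges,
# and a key/value table for vertex properties.
# The edge table layout is an assumption, it does not appear in the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertex (vertex_id INTEGER PRIMARY KEY);
CREATE TABLE vertex_property (parent INTEGER, key TEXT, value TEXT);
CREATE TABLE edge (edge_id INTEGER PRIMARY KEY, source INTEGER, target INTEGER, type TEXT);
""")

# Names and ages are taken from the figure; the ids are made up.
people = {1: ("Bob", "53"), 2: ("Carol", "38")}
for vertex_id, (name, age) in people.items():
    conn.execute("INSERT INTO vertex VALUES (?)", (vertex_id,))
    conn.execute("INSERT INTO vertex_property VALUES (?, 'name', ?)", (vertex_id, name))
    conn.execute("INSERT INTO vertex_property VALUES (?, 'age', ?)", (vertex_id, age))

# Illustrative edge only, not read off the figure.
conn.execute("INSERT INTO edge VALUES (1, 1, 2, 'COLLEAGUE')")

# Reading two properties already needs two joins on vertex_property.
rows = conn.execute("""
SELECT n.value AS name, a.value AS age
FROM vertex v
JOIN vertex_property n ON n.parent = v.vertex_id AND n.key = 'name'
JOIN vertex_property a ON a.parent = v.vertex_id AND a.key = 'age'
ORDER BY v.vertex_id
""").fetchall()
print(rows)
```

All property values are stored as text here, which is another simplification of this sketch.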
DATA MAPPING #2: CONCRETE SCHEMA
QUERY MAPPING
Cypher query → (compiler) → graph relational algebra → (mapping) → SQL algebra → (code generation) → SQL query
Cypher query: MATCH (p:Person)-[:COLL|:FRIEND]-() … RETURN p
Graph relational algebra: 𝜋 𝑝, 𝜎 𝑎=5 and ⋈ over ◯ 𝑃𝑒𝑟𝑠𝑜𝑛 and ◯ → ◯ COLL, FRIEND
SQL algebra: 𝜋 𝑝, 𝜎 𝑎=5 and ⋈ 𝑐𝑜𝑙1,𝑐𝑜𝑙2 over ◯ person and the union (⋃) of ◯ → ◯ colleague and ◯ → ◯ friend
SQL query: nested SELECT … FROM … blocks
Gábor Szárnyas, József Marton, Dániel Varró:
Formalising openCypher Graph Queries in Relational Algebra.
ADBIS 2017
A SIMPLE EXAMPLE
Cypher query:
MATCH (node)
WITH node.name AS name
RETURN name
Relational algebra tree (compiled by ingraph), from leaf to root:
◯(𝑛𝑜𝑑𝑒,𝑛𝑜𝑑𝑒.𝑛𝑎𝑚𝑒)
𝜋 𝑛𝑜𝑑𝑒.𝑛𝑎𝑚𝑒→𝑛𝑎𝑚𝑒
𝜋 𝑛𝑎𝑚𝑒
SQL query (produced by the mapping):
-- PostgreSQL releases before 16 require an alias for every subquery in FROM
SELECT "name"
FROM
(
  SELECT "node.name" AS "name"
  FROM
  (
    SELECT
      vertex_id AS "node",
      (SELECT value
       FROM vertex_property
       WHERE parent = vertex_id AND key = 'name') AS "node.name"
    FROM vertex
  ) AS subquery
) AS subquery
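The nested query of this slide can be tried on a tiny generic-schema database. The sketch below uses an in-memory SQLite database instead of PostgreSQL so that it is self-contained; the two sample vertices and the subquery aliases (get_vertices, projection) are assumptions made for illustration. Each nesting level corresponds to one operator of the algebra tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertex (vertex_id INTEGER PRIMARY KEY);
CREATE TABLE vertex_property (parent INTEGER, key TEXT, value TEXT);
INSERT INTO vertex VALUES (1), (2);
INSERT INTO vertex_property VALUES (1, 'name', 'Bob'), (2, 'name', 'Carol');
""")

# Innermost SELECT: get-vertices operator, fetching node and node.name.
# Middle SELECT:    projection that renames node.name to name (WITH clause).
# Outermost SELECT: projection to name (RETURN clause).
SIMPLE_EXAMPLE_SQL = """
SELECT "name"
FROM (
  SELECT "node.name" AS "name"
  FROM (
    SELECT
      vertex_id AS "node",
      (SELECT value
       FROM vertex_property
       WHERE parent = vertex_id AND key = 'name') AS "node.name"
    FROM vertex
  ) AS get_vertices
) AS projection
"""

names = sorted(row[0] for row in conn.execute(SIMPLE_EXAMPLE_SQL))
print(names)
```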
CHALLENGES #1
 Bounded variable length paths: union of multiple subqueries
 Unbounded: WITH RECURSIVE (fixpoint-based evaluation)
 WITH RECURSIVE was introduced in SQL:1999, but systems were slow to support it:
o PostgreSQL 8.4+ (since 2009)
o SQLite 3.8.3+ (since 2014)
o MySQL 8.0.1+ (since 2017)
MATCH (p1:Person {name: 'Alice'})-[c:COLL*1..2]-(p2:Person)
RETURN p2.name
MATCH (p1:Person {name: 'Alice'})-[c:COLL*]-(p2:Person)
RETURN p2.name
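Both cases can be sketched on a concrete schema. The example below is an assumption-laden illustration, not the SQL that C2S generates: it runs on in-memory SQLite, the person and colleague tables follow the layout shown on the CHALLENGES #3 slide, the rows are made up, and edges are followed only in their stored direction for brevity. Note also that the recursive query returns the set of reachable persons, whereas Cypher returns one row per matching path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE colleague (p1 INTEGER, p2 INTEGER);
-- Illustrative chain: Alice -> Bob -> Carol -> David
INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'David');
INSERT INTO colleague VALUES (1, 2), (2, 3), (3, 4);
""")

# [:COLL*1..2]: one subquery per path length, combined with UNION ALL.
BOUNDED_SQL = """
SELECT b.name
FROM person a
JOIN colleague c1 ON c1.p1 = a.id
JOIN person b ON b.id = c1.p2
WHERE a.name = 'Alice'
UNION ALL
SELECT b.name
FROM person a
JOIN colleague c1 ON c1.p1 = a.id
JOIN colleague c2 ON c2.p1 = c1.p2
JOIN person b ON b.id = c2.p2
WHERE a.name = 'Alice'
"""

# [:COLL*]: fixpoint evaluation. UNION (not UNION ALL) drops duplicates,
# so the recursion also terminates on cyclic data.
UNBOUNDED_SQL = """
WITH RECURSIVE reachable(id) AS (
  SELECT c.p2
  FROM person a JOIN colleague c ON c.p1 = a.id
  WHERE a.name = 'Alice'
  UNION
  SELECT c.p2
  FROM reachable r JOIN colleague c ON c.p1 = r.id
)
SELECT p.name FROM reachable r JOIN person p ON p.id = r.id
"""

bounded = sorted(row[0] for row in conn.execute(BOUNDED_SQL))
unbounded = sorted(row[0] for row in conn.execute(UNBOUNDED_SQL))
print(bounded)
print(unbounded)
```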
CHALLENGES #2
 Edges are directed
o Undirectedness is modelled in the query
o Union of both directions
MATCH (p1:Person …)-[:FRIEND]-(p2:Person),
(p2)-[:WORKS_AT]->(c:Company)
RETURN p2.name, c.name
[Figure: the example property graph from the PROPERTY GRAPHS slide.]
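A rough sketch of the "union of both directions" idea for the undirected [:FRIEND] pattern, assuming a friend(p1, p2) table that stores each edge once. It runs on in-memory SQLite; the rows and the friends_of helper are hypothetical, and the person filter, which is elided on the slide, is passed in as a parameter rather than guessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE friend (p1 INTEGER, p2 INTEGER);
INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
-- Stored direction: Bob -> Alice and Alice -> Carol (illustrative rows)
INSERT INTO friend VALUES (2, 1), (1, 3);
""")

# Each stored edge is exposed twice: once as stored and once reversed.
UNDIRECTED_SQL = """
SELECT b.name
FROM person a
JOIN (
  SELECT p1 AS src, p2 AS dst FROM friend   -- stored direction
  UNION ALL
  SELECT p2 AS src, p1 AS dst FROM friend   -- reversed direction
) f ON f.src = a.id
JOIN person b ON b.id = f.dst
WHERE a.name = ?
"""

def friends_of(name):
    """Return the names connected to `name` by a FRIEND edge in either direction."""
    return sorted(row[0] for row in conn.execute(UNDIRECTED_SQL, (name,)))

print(friends_of("Alice"))
print(friends_of("Bob"))
```

Without the reversed branch, Alice would only find Carol, because the edge to Bob is stored in the opposite direction.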
CHALLENGES #3
 Multiple tables as sources
MATCH (p1:Person …)-[:COLL|:FRIEND]-(p2:Person)
RETURN p2.name
[Figure: Person nodes (Bob, 53; Carol, 38; David, 47; Erin, 30) with name and age properties, one of them also labelled Student, connected by COLLEAGUE and FRIEND edges.]
Source tables:
person
id  name
1   Alice
2   Bob
colleague
p1  p2
1   2
1   3
friend
p1  p2
2   1
5   4
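One way to read "multiple tables as sources": an edge pattern with several types, such as [:COLL|:FRIEND], becomes a union over the tables that store those types. The sketch below runs on in-memory SQLite and uses the person, colleague and friend tables from the slide; the person rows with ids 3 to 5 and the edge_type column are assumptions added for illustration. For brevity it follows edges only in their stored direction and applies no filter on p1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE colleague (p1 INTEGER, p2 INTEGER);
CREATE TABLE friend (p1 INTEGER, p2 INTEGER);
-- Persons 1 and 2 are from the slide; ids 3 to 5 are assumed for illustration.
INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'David'), (5, 'Erin');
INSERT INTO colleague VALUES (1, 2), (1, 3);
INSERT INTO friend VALUES (2, 1), (5, 4);
""")

# One branch per edge table; the literal column records where a row came from.
MULTI_TYPE_SQL = """
SELECT a.name AS src, e.edge_type, b.name AS dst
FROM (
  SELECT p1, p2, 'COLLEAGUE' AS edge_type FROM colleague
  UNION ALL
  SELECT p1, p2, 'FRIEND' AS edge_type FROM friend
) e
JOIN person a ON a.id = e.p1
JOIN person b ON b.id = e.p2
ORDER BY src, edge_type, dst
"""

matches = conn.execute(MULTI_TYPE_SQL).fetchall()
for match in matches:
    print(match)
```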
CHALLENGES #1 #2 #3
Simple graph patterns turn into many subqueries
 Querying edges as undirected:
• Enumerate edges in both directions (2 ×)
 Variable length paths (1..L, *):
• Limited: enumerate 1, 2, … , 𝐿 → 𝐿 ×
• Unlimited: use WITH RECURSIVE
 Multiple node labels/edge types:
• Enumerate all source tables (𝑁 ×)
Total: union of 2 × 𝐿 × 𝑁 subqueries in SQL
[Figure: Person nodes (Bob, 53; Carol, 38; David, 47) with name and age properties, one of them also labelled Student, connected by COLLEAGUE and FRIEND edges.]
MATCH (p1:Person …)-[:COLL|:FRIEND*1..2]-(p2:Person)
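The count on this slide can be checked with a few lines of code. The sketch below only enumerates the combinations behind the 2 × 𝐿 × 𝑁 figure for the pattern above (𝐿 = 2, 𝑁 = 2); it does not emit SQL, and the function name and the direction labels are made up for illustration.

```python
from itertools import product

def enumerate_subqueries(edge_tables, max_length):
    """List the (direction, path length, source table) combinations
    that the slide counts as separate subqueries."""
    directions = ("stored", "reversed")       # 2 x: undirected edges
    lengths = range(1, max_length + 1)        # L x: bounded path lengths
    return list(product(directions, lengths, edge_tables))  # N x: edge tables

subqueries = enumerate_subqueries(["colleague", "friend"], max_length=2)
for subquery in subqueries:
    print(subquery)
print(len(subqueries))
```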
A COMPLEX EXAMPLE
WITH
q0 AS
(-- GetVerticesWithGTop
SELECT
ROW(0, p_personid)::vertex_type AS "_e186#0",
"p_personid" AS "_e186.id#0"
FROM person),
q1 AS
(-- Selection
SELECT * FROM q0 AS subquery
WHERE ("_e186.id#0" = :personId)),
q2 AS
(-- GetEdgesWithGTop
SELECT ROW(0, edgeTable."k_person1id")::vertex_type AS "_e186#0", ROW(0, edgeTable."k_person1id", edgeTable."k_person2id")::edge_type AS
"_e187#0", ROW(0, edgeTable."k_person2id")::vertex_type AS "friend#2",
toTable."p_personid" AS "friend.id#2", toTable."p_firstname" AS "friend.firstName#1", toTable."p_lastname" AS "friend.lastName#2"
FROM "knows" edgeTable
JOIN "person" toTable ON (edgeTable."k_person2id" = toTable."p_personid")),
q3 AS
(-- GetEdgesWithGTop
SELECT ROW(0, edgeTable."k_person1id")::vertex_type AS "friend#2", ROW(0, edgeTable."k_person1id", edgeTable."k_person2id")::edge_type AS
"_e187#0", ROW(0, edgeTable."k_person2id")::vertex_type AS "_e186#0",
fromTable."p_personid" AS "friend.id#2", fromTable."p_firstname" AS "friend.firstName#1", fromTable."p_lastname" AS "friend.lastName#2"
FROM "knows" edgeTable
JOIN "person" fromTable ON (fromTable."p_personid" = edgeTable."k_person1id")),
q4 AS
(-- UnionAll
SELECT "_e186#0", "_e187#0", "friend#2", "friend.id#2", "friend.firstName#1", "friend.lastName#2" FROM q2
UNION ALL
SELECT "_e186#0", "_e187#0", "friend#2", "friend.id#2", "friend.firstName#1", "friend.lastName#2" FROM q3),
q5 AS
(-- EquiJoinLike
SELECT left_query."_e186#0", left_query."_e186.id#0", right_query."friend#2", right_query."friend.id#2", right_query."_e187#0",
right_query."friend.lastName#2", right_query."friend.firstName#1" FROM
q1 AS left_query
INNER JOIN
q4 AS right_query
ON left_query."_e186#0" = right_query."_e186#0"),
q6 AS
(-- GetEdgesWithGTop
SELECT ROW(6, fromTable."m_messageid")::vertex_type AS "message#17", ROW(8, fromTable."m_messageid", fromTable."m_creatorid")::edge_type AS
"_e188#0", ROW(0, fromTable."m_creatorid")::vertex_type AS "friend#2",
fromTable."m_messageid" AS "message.id#2", fromTable."m_content" AS "message.content#2", fromTable."m_ps_imagefile" AS
"message.imageFile#0", fromTable."m_creationdate" AS "message.creationDate#13"
FROM "message" fromTable
WHERE (fromTable."m_c_replyof" IS NULL)),
q7 AS
(-- GetEdgesWithGTop
SELECT ROW(6, fromTable."m_messageid")::vertex_type AS "message#17", ROW(8, fromTable."m_messageid", fromTable."m_creatorid")::edge_type AS
"_e188#0", ROW(0, fromTable."m_creatorid")::vertex_type AS "friend#2",
fromTable."m_messageid" AS "message.id#2", fromTable."m_content" AS "message.content#2", fromTable."m_creationdate" AS
"message.creationDate#13"
FROM "message" fromTable
WHERE (fromTable."m_c_replyof" IS NOT NULL)),
q8 AS
(-- UnionAll
SELECT "message#17", "_e188#0", "friend#2", "message.id#2", "message.content#2", "message.imageFile#0", "message.creationDate#13" FROM q6
UNION ALL
SELECT "message#17", "_e188#0", "friend#2", "message.id#2", "message.content#2", NULL AS "message.imageFile#0", "message.creationDate#13"
FROM q7),
q9 AS
(-- EquiJoinLike
SELECT left_query."_e186#0", left_query."_e186.id#0", left_query."_e187#0", left_query."friend#2", left_query."friend.id#2",
left_query."friend.firstName#1", left_query."friend.lastName#2", right_query."message#17", right_query."message.id#2",
right_query."message.imageFile#0", right_query."_e188#0", right_query."message.creationDate#13", right_query."message.content#2" FROM
q5 AS left_query
INNER JOIN
q8 AS right_query
ON left_query."friend#2" = right_query."friend#2"),
q10 AS
(-- AllDifferent
SELECT * FROM q9 AS subquery
WHERE is_unique(ARRAY[]::edge_type[] || "_e188#0" || "_e187#0")),
q11 AS
(-- Selection
SELECT * FROM q10 AS subquery
WHERE ("message.creationDate#13" <= :maxDate)),
q12 AS
(-- Projection
SELECT "friend.id#2" AS "personId#0", "friend.firstName#1" AS "personFirstName#0", "friend.lastName#2" AS "personLastName#0",
"message.id#2" AS "postOrCommentId#0", CASE WHEN ("message.content#2" IS NOT NULL = true) THEN "message.content#2"
ELSE "message.imageFile#0"
END AS "postOrCommentContent#0", "message.creationDate#13" AS "postOrCommentCreationDate#0"
FROM q11 AS subquery),
q13 AS
(-- SortAndTop
SELECT * FROM q12 AS subquery
ORDER BY "postOrCommentCreationDate#0" DESC NULLS LAST, ("postOrCommentId#0")::BIGINT ASC NULLS FIRST
LIMIT 20)
SELECT "personId#0" AS "personId", "personFirstName#0" AS "personFirstName", "personLastName#0" AS "personLastName", "postOrCommentId#0" AS
"postOrCommentId", "postOrCommentContent#0" AS "postOrCommentContent", "postOrCommentCreationDate#0" AS "postOrCommentCreationDate"
FROM q13 AS subquery
Cypher query (the input that produced the SQL above):
MATCH (:Person {id:$personId})-[:KNOWS]-(friend:Person)<-[:HAS_CREATOR]-(message:Message)
WHERE message.creationDate <= $maxDate
RETURN
friend.id AS personId,
friend.firstName AS personFirstName,
friend.lastName AS personLastName,
message.id AS postOrCommentId,
CASE exists(message.content)
WHEN true THEN message.content
ELSE message.imageFile
END AS postOrCommentContent,
message.creationDate AS postOrCommentCreationDate
ORDER BY postOrCommentCreationDate DESC, toInteger(postOrCommentId) ASC
LIMIT 20
Benchmarks
BENCHMARKS: LINKED DATA BENCHMARK COUNCIL
LDBC is a non-profit organization dedicated to establishing benchmarks,
benchmark practices and benchmark results for graph data
management software.
The Social Network Benchmark is an industrial and academic initiative,
formed by principal actors in the field of graph-like data management.
LDBC IN A NUTSHELL
Peter Boncz, Thomas Neumann, Orri Erling,
TPC-H Analyzed: Hidden Messages and Lessons Learned
from an Influential Benchmark,
TPCTC 2013
Gábor Szárnyas, József Marton, János Benjamin Antal et al.:
An early look at the LDBC Social Network Benchmark’s BI Workload.
GRADES-NDA at SIGMOD, 2018
Orri Erling et al.,
The LDBC Social Network Benchmark: Interactive Workload,
SIGMOD 2015
PERFORMANCE EXPERIMENTS
 LDBC Interactive workload
 Tools
o PostgreSQL (reference implementation)
o Cypher-to-SQL queries on PostgreSQL
o Semantic database (anonymized)
 Geometric mean of 20+ executions
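For reference, a geometric mean of response times can be computed as below. This is a generic sketch, not the benchmark driver's code, and the sample runtimes are invented. Compared with the arithmetic mean, the geometric mean is less dominated by a single slow execution.

```python
import math

def geometric_mean(times):
    """Geometric mean of positive response times, computed via logarithms
    to avoid overflow when multiplying many values."""
    if not times:
        raise ValueError("at least one measurement is required")
    if any(t <= 0 for t in times):
        raise ValueError("response times must be positive")
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Invented runtimes in milliseconds, including one slow outlier.
sample_runtimes_ms = [10.0, 10.0, 10.0, 10000.0]
print(geometric_mean(sample_runtimes_ms))
print(sum(sample_runtimes_ms) / len(sample_runtimes_ms))
```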
BENCHMARK RESULTS ON LDBC QUERIES
RELATED PROJECTS
 Cytosm
o Cypher to SQL Mapping
o HP Labs for Vertica
o Project abandoned in 2017
o gTop (graph topology) reused
 Cypher for Apache Spark
o Neo4j’s project
o Executes queries in Spark
o Read-only
Tool Source Target OSS Updates Paths
CAPS Cypher SparkSQL   
Cytosm Cypher Vertica SQL   
Cypher-to-SQL Cypher PostgreSQL   
SUMMARY
 Mapping property graph queries to SQL is challenging
o Similar to ORM
o + edge properties
o + reachability
 Initial implementation: C2S
o Moderate feature coverage
o Poor performance
o Needs some tweaks, e.g. work around CTE optimization fences
Gábor Szárnyas, József Marton, János Maginecz, Dániel Varró:
Incremental View Maintenance on Property Graphs.
arXiv preprint 2018
RELATED RESOURCES
ingraph and C2S github.com/ftsrg/ingraph
Cypher for Apache Spark github.com/opencypher/cypher-for-apache-spark
Cytosm github.com/cytosm/cytosm
LDBC github.com/ldbc/
Thanks to the whole ingraph team for their contributions.


Editor's Notes

  • #6 https://www.youtube.com/watch?v=nCnR6wRo8x4
  • #25 For the performance measurements, we implemented all queries of the Interactive workload and the related software components for 3 tools. However, we only measured the complex queries; measuring the short queries and the updates is among our plans for the near future.
    We measured the performance of 5 different tools:
    - PostgreSQL: this was the reference implementation in our work. We used it to validate our queries, ensuring that the queries written in different languages really return the same results on every tool.
    - C2S: by the time the report was completed, we had managed to transform 6 queries from Cypher to SQL. We could not measure 1 of them because of a further technical problem, so we can only present the results of 5 queries.
    In addition, we measured the performance of an unnamed property graph database and 2 semantic databases. As the results are not audited (that is, they have not been reviewed and accepted by the developers), we report them in anonymized form, as is customary.
    For each tool, we measured the response time of every query with at least 20 different substitution parameters and then took the geometric mean of these values.
  • #33 More papers at http://ldbcouncil.org/publications (30+ in total)