Graph Modeling
Tips and Tricks
Mark Needham @markhneedham
Introducing our data set...
meetup.com’s recommendations
Recommendation queries
‣ Several different types
• groups to join
• topics to follow
• events to attend
‣ As a user of meetup.com trying to find
groups to join and events to attend
How will this talk be structured?
Find similar groups to Neo4j
As a member of the Neo4j London group
I want to find other similar meetup groups
So that I can join those groups
What makes groups similar?
As a member of the Neo4j London group
I want to find other similar meetup groups
So that I can join those groups
Find similar groups to Neo4j
As a member of the Neo4j London group
I want to find other similar meetup groups
So that I can join those groups
Nodes
As a member of the Neo4j London group
I want to find other similar meetup groups
So that I can join those groups
Relationships
As a member of the Neo4j London group
I want to find other similar meetup groups
So that I can join those groups
Labels
As a member of the Neo4j London group
I want to find other similar meetup groups
So that I can join those groups
Properties
Find similar groups to Neo4j
MATCH (group:Group {name: "Neo4j - London User Group"})
-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup)
RETURN otherGroup.name,
COUNT(topic) AS topicsInCommon,
COLLECT(topic.name) AS topics
ORDER BY topicsInCommon DESC, otherGroup.name
LIMIT 10
Find similar groups to Neo4j
Tip: Model incrementally
‣ Build the model for the question we need
to answer now then move onto the next
question
I’m already a member of these!
What other data can we get?
Exclude groups I’m a member of
As a member of the Neo4j London group
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
Exclude groups I’m a member of
As a member of the Neo4j London group
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
Exclude groups I’m a member of
MATCH (group:Group {name: "Neo4j - London User Group"})
-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup:Group)
RETURN otherGroup.name,
COUNT(topic) AS topicsInCommon,
EXISTS((:Member {name: "Mark Needham"})
-[:MEMBER_OF]->(otherGroup)) AS alreadyMember,
COLLECT(topic.name) AS topics
ORDER BY topicsInCommon DESC
LIMIT 10
Exclude groups I’m a member of
Exclude groups I’m a member of
MATCH (group:Group {name: "Neo4j - London User Group"})
-[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup:Group)
WHERE NOT( (:Member {name: "Mark Needham"})
-[:MEMBER_OF]->(otherGroup) )
RETURN otherGroup.name,
COUNT(topic) AS topicsInCommon,
COLLECT(topic.name) AS topics
ORDER BY topicsInCommon DESC
LIMIT 10
Exclude groups I’m a member of
Find my similar groups
As a member of several meetup groups
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
Find my similar groups
As a member of several meetup groups
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
Find my similar groups
MATCH (member:Member {name: "Mark Needham"})
-[:INTERESTED_IN]->(topic),
(member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)
RETURN otherGroup.name,
COLLECT(topic.name),
SUM(score) as score
ORDER BY score DESC
Find my similar groups
Find Jonny’s similar groups
Oops...Jonny has no interests!
What is Jonny interested in?
As a member of several meetup groups
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
INTERESTED_IN?
What is Jonny interested in?
There’s an implicit INTERESTED_IN relationship
between the topics of groups I belong to but
don’t express an interest in. Let’s make it explicit
P
G
T
MEMBER_OF
HAS_TOPIC
P
G
T
MEMBER_OF
HAS_TOPIC
INTERESTED_IN
What is Jonny interested in?
MATCH (m:Member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH m, topic, COUNT(*) AS times
WHERE times > 3
MERGE (m)-[:INTERESTED_IN]->(topic)
What is Jonny interested in?
Tip: Make the implicit explicit
‣ Fill in the missing links in the graph
‣ You could run this type of query once a day
during a quiet period
‣ On bigger graphs we’d run it in
batches to avoid loading the
whole database into memory
Find next group people join
As a member of a meetup group
I want to find out which meetup groups
other people join after this one
So that I can join those groups
Find next group people join
MATCH (group:Group {name: "Neo4j - London User Group"})
<-[membership:MEMBER_OF]-(member),
(member)-[otherMembership:MEMBER_OF]->(otherGroup)
WHERE membership.joined < otherMembership.joined
WITH member, otherGroup
ORDER BY otherMembership.joined
WITH member, COLLECT(otherGroup)[0] AS nextGroup
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
Find next group people join
It feels a bit clunky...
MATCH (group:Group {name: "Neo4j - London User Group"})
<-[membership:MEMBER_OF]-(member),
(member)-[otherMembership:MEMBER_OF]->(otherGroup)
WHERE membership.joined < otherMembership.joined
WITH member, otherGroup
ORDER BY otherMembership.joined
WITH member, COLLECT(otherGroup)[0] AS nextGroup
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
‣ We have to scan through all the MEMBER_OF
relationships to find the one we want
‣ It might make our lives easier if we made
membership a first class citizen of the domain
Facts can become nodes
Facts can become nodes
Refactor to facts
MATCH (member:Member)-[rel:MEMBER_OF]->(group)
MERGE (membership:Membership {id: member.id + "_" + group.id})
SET membership.joined = rel.joined
MERGE (member)-[:HAS_MEMBERSHIP]->(membership)
MERGE (membership)-[:OF_GROUP]->(group)
Refactor to facts
MATCH (member:Member)-[:HAS_MEMBERSHIP]->(membership)
WITH member, membership ORDER BY member.id, membership.joined
WITH member, COLLECT(membership) AS memberships
UNWIND RANGE(0,SIZE(memberships) - 2) as idx
WITH memberships[idx] AS m1, memberships[idx+1] AS m2
MERGE (m1)-[:NEXT]->(m2)
Find next group people join
MATCH (group:Group {name: "Neo4j - London User Group"})
<-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership),
(membership)<-[:HAS_MEMBERSHIP]-(member:Member)
-[:HAS_MEMBERSHIP]->(nextMembership),
(nextMembership)-[:OF_GROUP]->(nextGroup)
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
Comparing the approaches
vs
MATCH (group:Group {name: "Neo4j - London User Group"})
<-[membership:MEMBER_OF]-(member),
(member)-[otherMembership:MEMBER_OF]->(otherGroup)
WITH member, membership, otherMembership, otherGroup
ORDER BY member.id, otherMembership.joined
WHERE membership.joined < otherMembership.joined
WITH member, membership, COLLECT(otherGroup)[0] AS nextGroup
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
MATCH (group:Group {name: "Neo4j - London User Group"})
<-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership),
(membership)<-[:HAS_MEMBERSHIP]-(member:Member)
-[:HAS_MEMBERSHIP]->(nextMembership),
(nextMembership)-[:OF_GROUP]->(nextGroup)
RETURN nextGroup.name, COUNT(*) AS times
ORDER BY times DESC
How do I profile a query?
‣ EXPLAIN
• shows the execution plan without actually
executing it or returning any results.
‣ PROFILE
• executes the statement and returns the results
along with profiling information.
45
Neo4j’s longest plan (so far…)
46
Neo4j’s longest plan (so far…)
47
Neo4j’s longest plan (so far…)
48
What is our goal?
At a high level, the goal is
simple: get the number of
db hits down.
49
an abstract unit of storage
engine work.
What is a database hit?
“
”
50
Comparing the approaches
Cypher version: CYPHER 2.3,
planner: COST.
111656 total db hits in 330 ms.
vs
Cypher version: CYPHER 2.3,
planner: COST.
23650 total db hits in 60 ms.
Tip: Profile your queries
‣ Spike the different models and see which
one performs the best
Should we keep both models?
We could but when we add, edit or remove a membership
we’d have to keep both graph structures in sync.
Adding a group membership
WITH "Mark Needham" AS memberName,
"Neo4j - London User Group" AS groupName,
timestamp() AS now
MATCH (group:Group {name: groupName})
MATCH (member:Member {name: memberName})
MERGE (member)-[memberOfRel:MEMBER_OF]->(group)
ON CREATE SET memberOfRel.time = now
MERGE (membership:Membership {id: member.id + "_" + group.id})
ON CREATE SET membership.joined = now
MERGE (member)-[:HAS_MEMBERSHIP]->(membership)
MERGE (membership)-[:OF_GROUP]->(group)
Removing a group membership
WITH "Mark Needham" AS memberName,
"Neo4j - London User Group" AS groupName,
timestamp() AS now
MATCH (group:Group {name: groupName})
MATCH (member:Member {name: memberName})
MATCH (member)-[memberOfRel:MEMBER_OF]->(group)
MATCH (membership:Membership {id: member.id + "_" + group.id})
MATCH (member)-[hasMembershipRel:HAS_MEMBERSHIP]->(membership)
MATCH (membership)-[ofGroupRel:OF_GROUP]->(group)
DELETE memberOfRel, hasMembershipRel, ofGroupRel, membership
Let’s delete MEMBER_OF then...
...not so fast!
As a member of several meetup groups
I want to find other similar meetup groups
that I’m not already a member of
So that I can join those groups
Why not delete MEMBER_OF?
MATCH (member:Member {name: "Mark Needham"})
-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)
RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score
ORDER BY score DESC
MATCH (member:Member {name: "Mark Needham"})
-[:HAS_MEMBERSHIP]->()-[:OF_GROUP]->(group:Group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT (member)-[:HAS_MEMBERSHIP]->(:Membership)-[:OF_GROUP]->(otherGroup:Group)
RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score
ORDER BY score DESC
Why not delete MEMBER_OF?
MATCH (member:Member {name: "Mark Needham"})
-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT (member)-[:MEMBER_OF]->(otherGroup)
RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score
ORDER BY score DESC
MATCH (member:Member {name: "Mark Needham"})
-[:HAS_MEMBERSHIP]->()-[:OF_GROUP]->(group:Group)-[:HAS_TOPIC]->(topic)
WITH member, topic, COUNT(*) AS score
MATCH (topic)<-[:HAS_TOPIC]-(otherGroup)
WHERE NOT (member)-[:HAS_MEMBERSHIP]->(:Membership)-[:OF_GROUP]->(otherGroup:Group)
RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score
ORDER BY score DESC
433318 total db hits in 485 ms.
83268 total db hits in 117 ms.
Tip: Maintaining multiple models
‣ Different models perform better for
different queries but worse for others
‣ Optimising for reads may mean we pay a
write and maintenance penalty
What about events?
Events in my groups
As a member of several meetup groups who
has previously attended events
I want to find other events hosted by those
groups
So that I can attend those events
Events in my groups
As a member of several meetup
groups who has previously
attended events
I want to find other events
hosted by those groups
So that I can attend those events
WITH 24.0*60*60*1000 AS oneDay
MATCH (member:Member {name: "Mark Needham"}),
(member)-[:MEMBER_OF]->(group),
(group)-[:HOSTED_EVENT]->(futureEvent)
WHERE futureEvent.time >= timestamp()
AND NOT (member)-[:RSVPD]->(futureEvent)
RETURN group.name, futureEvent.name,
round((futureEvent.time - timestamp()) / oneDay) AS days
ORDER BY days
LIMIT 10
Events in my groups
Events in my groups
+ previous events attended
WITH 24.0*60*60*1000 AS oneDay
MATCH (member:Member {name: "Mark Needham"})
MATCH (futureEvent:Event)
WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent)
MATCH (futureEvent)<-[:HOSTED_EVENT]-(group)
WITH oneDay, group, futureEvent, member, EXISTS((group)<-[:MEMBER_OF]-(member)) AS isMember
OPTIONAL MATCH (member)-[rsvp:RSVPD {response: "yes"}]->(pastEvent)<-[:HOSTED_EVENT]-(group)
WHERE pastEvent.time < timestamp()
RETURN group.name,
futureEvent.name,
isMember,
COUNT(rsvp) AS previousEvents,
round((futureEvent.time - timestamp()) / oneDay) AS days
ORDER BY days, previousEvents DESC
+ previous events attended
RSVPD_YES vs RSVPD
I was curious whether refactoring
RSVPD {response: "yes"} to RSVPD_YES would have
any impact as Neo4j is optimised for querying
by unique relationship types.
Refactor to specific relationships
MATCH (m:Member)-[rsvp:RSVPD {response:"yes"}]->(event)
MERGE (m)-[rsvpYes:RSVPD_YES {id: rsvp.id}]->(event)
ON CREATE SET rsvpYes.created = rsvp.created,
rsvpYes.lastModified = rsvp.lastModified;
MATCH (m:Member)-[rsvp:RSVPD {response:"no"}]->(event)
MERGE (m)-[rsvpYes:RSVPD_NO {id: rsvp.id}]->(event)
ON CREATE SET rsvpYes.created = rsvp.created,
rsvpYes.lastModified = rsvp.lastModified;
RSVPD_YES vs RSVPD
RSVPD {response: "yes"}
vs
RSVPD_YES
Cypher version: CYPHER 2.3,
planner: COST.
688635 total db hits in 232 ms.
Cypher version: CYPHER 2.3,
planner: COST.
559866 total db hits in 207 ms.
Why would we keep RSVPD?
MATCH (m:Member)-[r:RSVPD]->(event)<-[:HOSTED_EVENT]-(group)
WHERE m.name = "Mark Needham"
RETURN event, group, r
MATCH (m:Member)-[r:RSVPD_YES|:RSVPD_NO|:RSVPD_WAITLIST]->(event),
(event)<-[:HOSTED_EVENT]-(group)
WHERE m.name = "Mark Needham"
RETURN event, group, r
Tip: Specific relationships
‣ Neo4j is optimised for querying by unique
relationship types…
‣ ...but sometimes we pay a query
maintenance cost to achieve this
+ events my friends are attending
There’s an implicit FRIENDS relationship
between people who attended the same events.
Let’s make it explicit.
M
E
M
RSVPD
RSVPD
FRIENDS
M
E
M
RSVPD
RSVPD
+ events my friends are attending
MATCH (m1:Member)
WHERE NOT m1:Processed
WITH m1 LIMIT {limit}
MATCH (m1)-[:RSVP_YES]->(event:Event)<-[:RSVP_YES]-(m2:Member)
WITH m1, m2, COLLECT(event) AS events, COUNT(*) AS times
WHERE times >= 5
WITH m1, m2, times, [event IN events | SIZE((event)<-[:RSVP_YES]-())] AS attendances
WITH m1, m2, REDUCE(score = 0.0, a IN attendances | score + (1.0 / a)) AS score
MERGE (m1)-[friendsRel:FRIENDS]-(m2)
SET friendsRel.score = row.score
Bidirectional relationships
‣ You may have noticed that we didn’t specify a
direction when creating the relationship
MERGE (m1)-[:FRIENDS]-(m2)
‣ FRIENDS is a bidirectional relationship. We only
need to create it once between two people.
‣ We ignore the direction when querying
+ events my friends are attending
WITH 24.0*60*60*1000 AS oneDay
MATCH (member:Member {name: "Mark Needham"})
MATCH (futureEvent:Event)
WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent)
MATCH (futureEvent)<-[:HOSTED_EVENT]-(group)
WITH oneDay, group, futureEvent, member, EXISTS((group)<-[:MEMBER_OF]-(member)) AS isMember
OPTIONAL MATCH (member)-[rsvp:RSVPD {response: "yes"}]->(pastEvent)<-[:HOSTED_EVENT]-(group)
WHERE pastEvent.time < timestamp()
WITH oneDay, group, futureEvent, member, isMember, COUNT(rsvp) AS previousEvents
OPTIONAL MATCH (futureEvent)<-[:HOSTED_EVENT]-()-[:HAS_TOPIC]->(topic)<-[:INTERESTED_IN]-(member)
WITH oneDay, group, futureEvent, member, isMember, previousEvents, COUNT(topic) AS topics
OPTIONAL MATCH (member)-[:FRIENDS]-(:Member)-[rsvpYes:RSVP_YES]->(futureEvent)
RETURN group.name, futureEvent.name, isMember, round((futureEvent.time - timestamp()) / oneDay) AS days,
previousEvents, topics, COUNT(rsvpYes) AS friendsGoing
ORDER BY days, friendsGoing DESC, previousEvents DESC
LIMIT 15
+ events my friends are attending
Tip: Bidirectional relationships
‣ Some relationships are bidirectional in nature
‣ Neo4j always stores relationships with a
direction but we can choose to ignore that
when we query
tl;dr
‣ Model incrementally
‣ Always profile your queries
‣ Consider making the implicit explicit…
• ...but beware the maintenance cost
‣ Be specific with relationship types
‣ Ignore direction for bidirectional relationships
That’s all for today!
Questions? :-)
Mark Needham @markhneedham
https://github.com/neo4j-meetups/modeling-worked-example

Tips and Tricks for Graph Data Modeling

  • 1.
    Graph Modeling Tips andTricks Mark Needham @markhneedham
  • 2.
  • 3.
  • 4.
    Recommendation queries ‣ Severaldifferent types • groups to join • topics to follow • events to attend ‣ As a user of meetup.com trying to find groups to join and events to attend
  • 5.
    How will thistalk be structured?
  • 6.
    Find similar groupsto Neo4j As a member of the Neo4j London group I want to find other similar meetup groups So that I can join those groups
  • 7.
  • 8.
    As a memberof the Neo4j London group I want to find other similar meetup groups So that I can join those groups Find similar groups to Neo4j
  • 9.
    As a memberof the Neo4j London group I want to find other similar meetup groups So that I can join those groups Nodes
  • 10.
    As a memberof the Neo4j London group I want to find other similar meetup groups So that I can join those groups Relationships
  • 11.
    As a memberof the Neo4j London group I want to find other similar meetup groups So that I can join those groups Labels
  • 12.
    As a memberof the Neo4j London group I want to find other similar meetup groups So that I can join those groups Properties
  • 13.
    Find similar groupsto Neo4j MATCH (group:Group {name: "Neo4j - London User Group"}) -[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup) RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, COLLECT(topic.name) AS topics ORDER BY topicsInCommon DESC, otherGroup.name LIMIT 10
  • 14.
  • 15.
    Tip: Model incrementally ‣Build the model for the question we need to answer now then move onto the next question
  • 16.
    I’m already amember of these!
  • 17.
    What other datacan we get?
  • 18.
    Exclude groups I’ma member of As a member of the Neo4j London group I want to find other similar meetup groups that I’m not already a member of So that I can join those groups
  • 19.
    Exclude groups I’ma member of As a member of the Neo4j London group I want to find other similar meetup groups that I’m not already a member of So that I can join those groups
  • 20.
    Exclude groups I’ma member of MATCH (group:Group {name: "Neo4j - London User Group"}) -[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup:Group) RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, EXISTS((:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(otherGroup)) AS alreadyMember, COLLECT(topic.name) AS topics ORDER BY topicsInCommon DESC LIMIT 10
  • 21.
  • 22.
    Exclude groups I’ma member of MATCH (group:Group {name: "Neo4j - London User Group"}) -[:HAS_TOPIC]->(topic)<-[:HAS_TOPIC]-(otherGroup:Group) WHERE NOT( (:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(otherGroup) ) RETURN otherGroup.name, COUNT(topic) AS topicsInCommon, COLLECT(topic.name) AS topics ORDER BY topicsInCommon DESC LIMIT 10
  • 23.
  • 24.
    Find my similargroups As a member of several meetup groups I want to find other similar meetup groups that I’m not already a member of So that I can join those groups
  • 25.
    Find my similargroups As a member of several meetup groups I want to find other similar meetup groups that I’m not already a member of So that I can join those groups
  • 26.
    Find my similargroups MATCH (member:Member {name: "Mark Needham"}) -[:INTERESTED_IN]->(topic), (member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic) WITH member, topic, COUNT(*) AS score MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:MEMBER_OF]->(otherGroup) RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score ORDER BY score DESC
  • 27.
  • 28.
  • 29.
  • 30.
    What is Jonnyinterested in? As a member of several meetup groups I want to find other similar meetup groups that I’m not already a member of So that I can join those groups INTERESTED_IN?
  • 31.
    What is Jonnyinterested in? There’s an implicit INTERESTED_IN relationship between the topics of groups I belong to but don’t express an interest in. Let’s make it explicit P G T MEMBER_OF HAS_TOPIC P G T MEMBER_OF HAS_TOPIC INTERESTED_IN
  • 32.
    What is Jonnyinterested in? MATCH (m:Member)-[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic) WITH m, topic, COUNT(*) AS times WHERE times > 3 MERGE (m)-[:INTERESTED_IN]->(topic)
  • 33.
    What is Jonnyinterested in?
  • 34.
    Tip: Make theimplicit explicit ‣ Fill in the missing links in the graph ‣ You could run this type of query once a day during a quiet period ‣ On bigger graphs we’d run it in batches to avoid loading the whole database into memory
  • 35.
    Find next grouppeople join As a member of a meetup group I want to find out which meetup groups other people join after this one So that I can join those groups
  • 36.
    Find next grouppeople join MATCH (group:Group {name: "Neo4j - London User Group"}) <-[membership:MEMBER_OF]-(member), (member)-[otherMembership:MEMBER_OF]->(otherGroup) WHERE membership.joined < otherMembership.joined WITH member, otherGroup ORDER BY otherMembership.joined WITH member, COLLECT(otherGroup)[0] AS nextGroup RETURN nextGroup.name, COUNT(*) AS times ORDER BY times DESC
  • 37.
    Find next grouppeople join
  • 38.
    It feels abit clunky... MATCH (group:Group {name: "Neo4j - London User Group"}) <-[membership:MEMBER_OF]-(member), (member)-[otherMembership:MEMBER_OF]->(otherGroup) WHERE membership.joined < otherMembership.joined WITH member, otherGroup ORDER BY otherMembership.joined WITH member, COLLECT(otherGroup)[0] AS nextGroup RETURN nextGroup.name, COUNT(*) AS times ORDER BY times DESC ‣ We have to scan through all the MEMBER_OF relationships to find the one we want ‣ It might make our lives easier if we made membership a first class citizen of the domain
  • 39.
  • 40.
  • 41.
    Refactor to facts MATCH(member:Member)-[rel:MEMBER_OF]->(group) MERGE (membership:Membership {id: member.id + "_" + group.id}) SET membership.joined = rel.joined MERGE (member)-[:HAS_MEMBERSHIP]->(membership) MERGE (membership)-[:OF_GROUP]->(group)
  • 42.
    Refactor to facts MATCH(member:Member)-[:HAS_MEMBERSHIP]->(membership) WITH member, membership ORDER BY member.id, membership.joined WITH member, COLLECT(membership) AS memberships UNWIND RANGE(0,SIZE(memberships) - 2) as idx WITH memberships[idx] AS m1, memberships[idx+1] AS m2 MERGE (m1)-[:NEXT]->(m2)
  • 43.
    Find next grouppeople join MATCH (group:Group {name: "Neo4j - London User Group"}) <-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership), (membership)<-[:HAS_MEMBERSHIP]-(member:Member) -[:HAS_MEMBERSHIP]->(nextMembership), (nextMembership)-[:OF_GROUP]->(nextGroup) RETURN nextGroup.name, COUNT(*) AS times ORDER BY times DESC
  • 44.
    Comparing the approaches vs MATCH(group:Group {name: "Neo4j - London User Group"}) <-[membership:MEMBER_OF]-(member), (member)-[otherMembership:MEMBER_OF]->(otherGroup) WITH member, membership, otherMembership, otherGroup ORDER BY member.id, otherMembership.joined WHERE membership.joined < otherMembership.joined WITH member, membership, COLLECT(otherGroup)[0] AS nextGroup RETURN nextGroup.name, COUNT(*) AS times ORDER BY times DESC MATCH (group:Group {name: "Neo4j - London User Group"}) <-[:OF_GROUP]-(membership)-[:NEXT]->(nextMembership), (membership)<-[:HAS_MEMBERSHIP]-(member:Member) -[:HAS_MEMBERSHIP]->(nextMembership), (nextMembership)-[:OF_GROUP]->(nextGroup) RETURN nextGroup.name, COUNT(*) AS times ORDER BY times DESC
  • 45.
    How do Iprofile a query? ‣ EXPLAIN • shows the execution plan without actually executing it or returning any results. ‣ PROFILE • executes the statement and returns the results along with profiling information. 45
  • 46.
    Neo4j’s longest plan(so far…) 46
  • 47.
    Neo4j’s longest plan(so far…) 47
  • 48.
    Neo4j’s longest plan(so far…) 48
  • 49.
    What is ourgoal? At a high level, the goal is simple: get the number of db hits down. 49
  • 50.
    an abstract unitof storage engine work. What is a database hit? “ ” 50
  • 51.
    Comparing the approaches Cypherversion: CYPHER 2.3, planner: COST. 111656 total db hits in 330 ms. vs Cypher version: CYPHER 2.3, planner: COST. 23650 total db hits in 60 ms.
  • 52.
    Tip: Profile yourqueries ‣ Spike the different models and see which one performs the best
  • 53.
    Should we keepboth models? We could but when we add, edit or remove a membership we’d have to keep both graph structures in sync.
  • 54.
    Adding a groupmembership WITH "Mark Needham" AS memberName, "Neo4j - London User Group" AS groupName, timestamp() AS now MATCH (group:Group {name: groupName}) MATCH (member:Member {name: memberName}) MERGE (member)-[memberOfRel:MEMBER_OF]->(group) ON CREATE SET memberOfRel.time = now MERGE (membership:Membership {id: member.id + "_" + group.id}) ON CREATE SET membership.joined = now MERGE (member)-[:HAS_MEMBERSHIP]->(membership) MERGE (membership)-[:OF_GROUP]->(group)
  • 55.
    Removing a groupmembership WITH "Mark Needham" AS memberName, "Neo4j - London User Group" AS groupName, timestamp() AS now MATCH (group:Group {name: groupName}) MATCH (member:Member {name: memberName}) MATCH (member)-[memberOfRel:MEMBER_OF]->(group) MATCH (membership:Membership {id: member.id + "_" + group.id}) MATCH (member)-[hasMembershipRel:HAS_MEMBERSHIP]->(membership) MATCH (membership)-[ofGroupRel:OF_GROUP]->(group) DELETE memberOfRel, hasMembershipRel, ofGroupRel, membership
  • 56.
  • 57.
    ...not so fast! Asa member of several meetup groups I want to find other similar meetup groups that I’m not already a member of So that I can join those groups
  • 58.
    Why not deleteMEMBER_OF? MATCH (member:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic) WITH member, topic, COUNT(*) AS score MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:MEMBER_OF]->(otherGroup) RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score ORDER BY score DESC MATCH (member:Member {name: "Mark Needham"}) -[:HAS_MEMBERSHIP]->()-[:OF_GROUP]->(group:Group)-[:HAS_TOPIC]->(topic) WITH member, topic, COUNT(*) AS score MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:HAS_MEMBERSHIP]->(:Membership)-[:OF_GROUP]->(otherGroup:Group) RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score ORDER BY score DESC
  • 59.
    Why not deleteMEMBER_OF? MATCH (member:Member {name: "Mark Needham"}) -[:MEMBER_OF]->(group)-[:HAS_TOPIC]->(topic) WITH member, topic, COUNT(*) AS score MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:MEMBER_OF]->(otherGroup) RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score ORDER BY score DESC MATCH (member:Member {name: "Mark Needham"}) -[:HAS_MEMBERSHIP]->()-[:OF_GROUP]->(group:Group)-[:HAS_TOPIC]->(topic) WITH member, topic, COUNT(*) AS score MATCH (topic)<-[:HAS_TOPIC]-(otherGroup) WHERE NOT (member)-[:HAS_MEMBERSHIP]->(:Membership)-[:OF_GROUP]->(otherGroup:Group) RETURN otherGroup.name, COLLECT(topic.name), SUM(score) as score ORDER BY score DESC 433318 total db hits in 485 ms. 83268 total db hits in 117 ms.
  • 60.
    Tip: Maintaining multiplemodels ‣ Different models perform better for different queries but worse for others ‣ Optimising for reads may mean we pay a write and maintenance penalty
  • 61.
  • 62.
    Events in mygroups As a member of several meetup groups who has previously attended events I want to find other events hosted by those groups So that I can attend those events
  • 63.
    Events in mygroups As a member of several meetup groups who has previously attended events I want to find other events hosted by those groups So that I can attend those events
  • 64.
    WITH 24.0*60*60*1000 ASoneDay MATCH (member:Member {name: "Mark Needham"}), (member)-[:MEMBER_OF]->(group), (group)-[:HOSTED_EVENT]->(futureEvent) WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent) RETURN group.name, futureEvent.name, round((futureEvent.time - timestamp()) / oneDay) AS days ORDER BY days LIMIT 10 Events in my groups
  • 65.
  • 66.
    + previous eventsattended WITH 24.0*60*60*1000 AS oneDay MATCH (member:Member {name: "Mark Needham"}) MATCH (futureEvent:Event) WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent) MATCH (futureEvent)<-[:HOSTED_EVENT]-(group) WITH oneDay, group, futureEvent, member, EXISTS((group)<-[:MEMBER_OF]-(member)) AS isMember OPTIONAL MATCH (member)-[rsvp:RSVPD {response: "yes"}]->(pastEvent)<-[:HOSTED_EVENT]-(group) WHERE pastEvent.time < timestamp() RETURN group.name, futureEvent.name, isMember, COUNT(rsvp) AS previousEvents, round((futureEvent.time - timestamp()) / oneDay) AS days ORDER BY days, previousEvents DESC
  • 67.
  • 68.
    RSVPD_YES vs RSVPD Iwas curious whether refactoring RSVPD {response: "yes"} to RSVPD_YES would have any impact as Neo4j is optimised for querying by unique relationship types.
  • 69.
    Refactor to specificrelationships MATCH (m:Member)-[rsvp:RSVPD {response:"yes"}]->(event) MERGE (m)-[rsvpYes:RSVPD_YES {id: rsvp.id}]->(event) ON CREATE SET rsvpYes.created = rsvp.created, rsvpYes.lastModified = rsvp.lastModified; MATCH (m:Member)-[rsvp:RSVPD {response:"no"}]->(event) MERGE (m)-[rsvpYes:RSVPD_NO {id: rsvp.id}]->(event) ON CREATE SET rsvpYes.created = rsvp.created, rsvpYes.lastModified = rsvp.lastModified;
  • 70.
    RSVPD_YES vs RSVPD RSVPD{response: "yes"} vs RSVPD_YES Cypher version: CYPHER 2.3, planner: COST. 688635 total db hits in 232 ms. Cypher version: CYPHER 2.3, planner: COST. 559866 total db hits in 207 ms.
  • 71.
    Why would wekeep RSVPD? MATCH (m:Member)-[r:RSVPD]->(event)<-[:HOSTED_EVENT]-(group) WHERE m.name = "Mark Needham" RETURN event, group, r MATCH (m:Member)-[r:RSVPD_YES|:RSVPD_NO|:RSVPD_WAITLIST]->(event), (event)<-[:HOSTED_EVENT]-(group) WHERE m.name = "Mark Needham" RETURN event, group, r
  • 72.
    Tip: Specific relationships ‣Neo4j is optimised for querying by unique relationship types… ‣ ...but sometimes we pay a query maintenance cost to achieve this
  • 73.
    + events myfriends are attending There’s an implicit FRIENDS relationship between people who attended the same events. Let’s make it explicit. M E M RSVPD RSVPD FRIENDS M E M RSVPD RSVPD
  • 74.
    + events myfriends are attending MATCH (m1:Member) WHERE NOT m1:Processed WITH m1 LIMIT {limit} MATCH (m1)-[:RSVP_YES]->(event:Event)<-[:RSVP_YES]-(m2:Member) WITH m1, m2, COLLECT(event) AS events, COUNT(*) AS times WHERE times >= 5 WITH m1, m2, times, [event IN events | SIZE((event)<-[:RSVP_YES]-())] AS attendances WITH m1, m2, REDUCE(score = 0.0, a IN attendances | score + (1.0 / a)) AS score MERGE (m1)-[friendsRel:FRIENDS]-(m2) SET friendsRel.score = row.score
  • 75.
    Bidirectional relationships ‣ Youmay have noticed that we didn’t specify a direction when creating the relationship MERGE (m1)-[:FRIENDS]-(m2) ‣ FRIENDS is a bidirectional relationship. We only need to create it once between two people. ‣ We ignore the direction when querying
  • 76.
    + events myfriends are attending WITH 24.0*60*60*1000 AS oneDay MATCH (member:Member {name: "Mark Needham"}) MATCH (futureEvent:Event) WHERE futureEvent.time >= timestamp() AND NOT (member)-[:RSVPD]->(futureEvent) MATCH (futureEvent)<-[:HOSTED_EVENT]-(group) WITH oneDay, group, futureEvent, member, EXISTS((group)<-[:MEMBER_OF]-(member)) AS isMember OPTIONAL MATCH (member)-[rsvp:RSVPD {response: "yes"}]->(pastEvent)<-[:HOSTED_EVENT]-(group) WHERE pastEvent.time < timestamp() WITH oneDay, group, futureEvent, member, isMember, COUNT(rsvp) AS previousEvents OPTIONAL MATCH (futureEvent)<-[:HOSTED_EVENT]-()-[:HAS_TOPIC]->(topic)<-[:INTERESTED_IN]-(member) WITH oneDay, group, futureEvent, member, isMember, previousEvents, COUNT(topic) AS topics OPTIONAL MATCH (member)-[:FRIENDS]-(:Member)-[rsvpYes:RSVP_YES]->(futureEvent) RETURN group.name, futureEvent.name, isMember, round((futureEvent.time - timestamp()) / oneDay) AS days, previousEvents, topics, COUNT(rsvpYes) AS friendsGoing ORDER BY days, friendsGoing DESC, previousEvents DESC LIMIT 15
  • 77.
    + events myfriends are attending
  • 78.
    Tip: Bidirectional relationships ‣Some relationships are bidirectional in nature ‣ Neo4j always stores relationships with a direction but we can choose to ignore that when we query
  • 79.
    tl;dr ‣ Model incrementally ‣Always profile your queries ‣ Consider making the implicit explicit… • ...but beware the maintenance cost ‣ Be specific with relationship types ‣ Ignore direction for bidirectional relationships
  • 80.
    That’s all fortoday! Questions? :-) Mark Needham @markhneedham https://github.com/neo4j-meetups/modeling-worked-example