Application Modeling with Graph Databases



              http://joind.in/6694
@josh_adell

• Software developer: PHP, JavaScript, SQL
• http://www.dunnwell.com
• http://blog.everymansoftware.com

• http://github.com/jadell/neo4jphp
• http://frostymug.herokuapp.com
The Problem
The Solution?

> -- First degree
> SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title
FROM cast WHERE actor_name='Kevin Bacon')

> -- Second degree
> SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title
FROM cast WHERE actor_name IN (SELECT actor_name FROM cast WHERE movie_title IN
(SELECT DISTINCT movie_title FROM cast WHERE actor_name='Kevin Bacon')))

> -- Third degree
> SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title
FROM cast WHERE actor_name IN (SELECT actor_name FROM cast WHERE movie_title IN
(SELECT DISTINCT movie_title FROM cast WHERE actor_name IN (SELECT
actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title FROM
cast WHERE actor_name='Kevin Bacon')))))
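Each degree wraps the previous query in one more pair of subqueries: look up the movies for the current set of actors, then the actors in those movies. Below is a minimal Python sketch of the same level-by-level set expansion. It assumes an in-memory list of (actor_name, movie_title) rows standing in for the `cast` table, with placeholder names rather than real filmographies. The degree becomes a loop counter instead of a new query shape.

```python
def degree_of(cast, source, target, max_degree=6):
    """Degree of separation between two actors, expanded one level at a time.

    cast: iterable of (actor_name, movie_title) rows, like the cast table.
    Each loop iteration corresponds to one more nesting level in the SQL.
    Returns None if target is not found within max_degree levels.
    """
    movies_of, actors_in = {}, {}
    for actor, movie in cast:
        movies_of.setdefault(actor, set()).add(movie)
        actors_in.setdefault(movie, set()).add(actor)

    actors = {source}
    for degree in range(1, max_degree + 1):
        # SELECT DISTINCT movie_title FROM cast WHERE actor_name IN (...)
        movies = set().union(*(movies_of.get(a, set()) for a in actors))
        # SELECT actor_name FROM cast WHERE movie_title IN (...)
        actors = set().union(*(actors_in[m] for m in movies))
        # Stop when the actor we're looking for is in the list.
        if target in actors:
            return degree
    return None


# Hypothetical rows; all names are placeholders.
cast = [("Actor S", "Movie 1"), ("Actor A", "Movie 1"), ("Actor A", "Movie 2"),
        ("Actor E", "Movie 2"), ("Actor S", "Movie 3"), ("Actor B", "Movie 3")]
print(degree_of(cast, "Actor E", "Actor S"))  # 2
```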
The Truth

Relational databases aren't very good with relationships


[Illustration labeled "Data" and "RDBMSs"]
RDBs Use Set Math
Try again?
Right Tool for the Job



Warning: Computer Science Ahead

A graph is an ordered pair G = (V, E), where V is a set of vertices and E is a set of edges, which are pairs of vertices in V.
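The definition translates almost word for word into Python sets. Below is a minimal sketch. It assumes undirected edges (unordered pairs, stored as two-element frozensets) and placeholder vertex names. With ordered pairs (tuples) instead, the same structure describes a directed graph, which is closer to Neo4j, where every relationship has a direction.

```python
# G = (V, E): V is a set of vertices; E is a set of edges. This sketch
# treats edges as unordered pairs (undirected), stored as 2-element frozensets.
V = {"A", "B", "C", "D"}
E = {frozenset(pair) for pair in [("A", "B"), ("B", "C"), ("C", "D")]}


def is_graph(vertices, edges):
    """True if every edge is a pair of distinct vertices taken from V."""
    return all(len(edge) == 2 and edge <= vertices for edge in edges)


def adjacency(vertices, edges):
    """Neighbour sets per vertex: the shape traversal code usually works on."""
    adj = {v: set() for v in vertices}
    for edge in edges:
        u, w = tuple(edge)
        adj[u].add(w)
        adj[w].add(u)
    return adj


G = (V, E)
print(is_graph(*G))                 # True
print(sorted(adjacency(*G)["B"]))   # ['A', 'C']
```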
Graphs are Everywhere
Relational Databases are Graphs!
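The speaker notes spell out the mapping: make each record a node and every foreign key a relationship. Below is a sketch of that conversion. It assumes hypothetical `customers` and `orders` tables where `orders.customer_id` is the foreign key, every row has an `id` column, and `MADE_BY` is an invented relationship type.

```python
def rows_to_graph(tables, foreign_keys):
    """Turn relational rows into a property graph.

    tables:       {table_name: [row_dict, ...]}; every row has an "id" column.
    foreign_keys: {(table_name, fk_column): (target_table, relationship_type)}
    Returns (nodes, relationships): nodes keyed by (table, id), and
    relationships as (from_node, type, to_node) triples.
    """
    # Each record becomes a node; its columns become node properties.
    nodes = {(table, row["id"]): dict(row)
             for table, rows in tables.items() for row in rows}
    # Each non-null foreign key value becomes a relationship.
    relationships = []
    for (table, column), (target, rel_type) in foreign_keys.items():
        for row in tables[table]:
            if row.get(column) is not None:
                relationships.append(((table, row["id"]), rel_type, (target, row[column])))
    return nodes, relationships


# Hypothetical tables and relationship type.
tables = {
    "customers": [{"id": 1, "name": "Customer 1"}],
    "orders": [{"id": 100, "customer_id": 1}, {"id": 101, "customer_id": 1}],
}
nodes, rels = rows_to_graph(tables, {("orders", "customer_id"): ("customers", "MADE_BY")})
print(len(nodes))  # 3
print(rels[0])     # (('orders', 100), 'MADE_BY', ('customers', 1))
```

After the conversion, the relationship is stored as data in its own right instead of being recomputed by a join, which is the sense in which the notes call relationships "first-class citizens" in a graph database.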
Everything is connected
Some Graph Use Cases

•   Social networking
•   Manufacturing
•   Map directions
•   Geo-spatial algorithms
•   Fraud detection
•   Multi-tenancy
•   Dependency mapping
•   Bioinformatics
•   Natural language processing
Graphs are "Whiteboard-Friendly"




   Nouns => nodes, Verbs => relationships
Back to Bacon




START s=node:actors(name="Keanu Reeves"),
      e=node:actors(name="Kevin Bacon")

MATCH p = shortestPath( s-[*]-e )

RETURN p, length(p)
                                            http://tinyurl.com/c65d99w
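On an unweighted graph, the result of `shortestPath( s-[*]-e )` can be pictured as a breadth-first search. The sketch below is not Neo4j's implementation. It assumes a made-up cast list (placeholder actor and movie names) modeled as the speaker notes describe: actors and movies are both nodes, joined by an "in" relationship, and traversed in either direction, like `-[*]-`.

```python
from collections import defaultdict, deque


def shortest_path(adj, start, end):
    """Breadth-first search: one shortest path as a node list, or None."""
    if start == end:
        return [start]
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == end:
                # Walk the predecessor links back to start, then reverse.
                path = [end]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical (actor, movie) pairs; each pair is an undirected edge.
cast = [("Actor S", "Movie 1"), ("Actor A", "Movie 1"), ("Actor A", "Movie 2"),
        ("Actor E", "Movie 2"), ("Actor S", "Movie 3"), ("Actor B", "Movie 3")]
adj = defaultdict(set)
for actor, movie in cast:
    adj[actor].add(movie)
    adj[movie].add(actor)

path = shortest_path(adj, "Actor S", "Actor E")
print(path)           # ['Actor S', 'Movie 1', 'Actor A', 'Movie 2', 'Actor E']
print(len(path) - 1)  # 4
```

Like `length(p)`, the length counts relationships. Every actor-to-actor hop passes through a movie node, so the degree of separation is half that number.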
ACL

• Users can belong to groups
• Groups can belong to groups
• Groups and users have permissions on objects
  o read
  o write
  o denied
START u=node:users(name="User 3")
MATCH u-[:belongs_to*]->g
RETURN g

                                    http://tinyurl.com/cyn3rkx
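`belongs_to*` asks for a transitive closure: the user's direct groups, their parent groups, and so on up the hierarchy. Below is a depth-first sketch of that idea. It assumes a hypothetical membership map with placeholder group names, not the data behind the linked console.

```python
def groups_of(belongs_to, member):
    """All groups reachable from member via one or more belongs_to edges.

    belongs_to: {member: [group, ...]}, directed like u-[:belongs_to]->g.
    """
    found, stack = set(), [member]
    while stack:
        node = stack.pop()
        for group in belongs_to.get(node, ()):
            if group not in found:  # skip groups already seen (also guards against cycles)
                found.add(group)
                stack.append(group)
    return found


# Hypothetical memberships: User 3 is in Group B, which is in Group A.
belongs_to = {"User 3": ["Group B"], "Group B": ["Group A"], "User 2": ["Group A"]}
print(sorted(groups_of(belongs_to, "User 3")))  # ['Group A', 'Group B']
```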
START u=node:users(name="User 2"),
      o=node:objects(name="Home")
MATCH u-[:belongs_to*0..]->g,
      g-[:can_read]->o
RETURN g
                                     http://tinyurl.com/dx7onro
START u=node:users(name="User 3"),
      o=node:objects(name="Users 1 Blog")
MATCH u-[:belongs_to*0..]->g,
      g-[:can_read]->o,
      u-[d?:denied*]->o
WHERE d is null
RETURN g
                                            http://tinyurl.com/bwtyhvt
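The last two queries add two refinements. First, `*0..` lets the user node itself match `g`, so a permission granted directly to the user is also found. Second, the optional `denied` match vetoes access when present. Below is a sketch of the same check over hypothetical maps (placeholder names, not the linked console data). As a simplifying assumption, it checks only a direct user-to-object denied edge rather than a chain of `denied` relationships.

```python
def self_and_groups(belongs_to, user):
    """user plus every group reachable via belongs_to, like -[:belongs_to*0..]->."""
    seen, stack = {user}, [user]
    while stack:
        for group in belongs_to.get(stack.pop(), ()):
            if group not in seen:
                seen.add(group)
                stack.append(group)
    return seen


def granting_groups(belongs_to, can_read, denied, user, obj):
    """Groups (or the user itself) with a can_read edge to obj.

    Returns an empty set when the user has a denied edge to obj, mirroring
    the WHERE d IS NULL filter. Assumption: only a direct user -> obj denied
    edge is checked here, not a longer chain of denied edges.
    """
    if obj in denied.get(user, ()):
        return set()
    return {g for g in self_and_groups(belongs_to, user)
            if obj in can_read.get(g, ())}


# Hypothetical ACL data.
belongs_to = {"User 2": ["Group A"], "User 3": ["Group A"]}
can_read = {"Group A": ["Home", "Users 1 Blog"]}
denied = {"User 3": ["Users 1 Blog"]}
print(granting_groups(belongs_to, can_read, denied, "User 2", "Home"))          # {'Group A'}
print(granting_groups(belongs_to, can_read, denied, "User 3", "Users 1 Blog"))  # set()
```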
Real Life Example

• Companies have brands, locations, location groups
• Brands have locations, location groups
• Location groups have locations
START c=node:companies(name="Company 1")
MATCH c-[:HAS*]->l
WHERE l.type = 'location'
RETURN l ORDER BY l.name
                                           http://tinyurl.com/cxm4heh
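Because `HAS*` follows any depth of nesting, the query does not need to know whether a location hangs directly off the company, off a brand, or off a location group. Below is a sketch of the same traversal. It assumes a hypothetical hierarchy in which node names double as the `name` property and a separate map plays the role of the `type` property.

```python
def reachable(has, start):
    """Every node reachable from start via one or more HAS edges."""
    seen, stack = set(), [start]
    while stack:
        for child in has.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def company_locations(has, types, company):
    """Sketch of c-[:HAS*]->l WHERE l.type = 'location' ... ORDER BY l.name."""
    return sorted(n for n in reachable(has, company) if types[n] == "location")


# Hypothetical hierarchy: locations at three different depths.
has = {
    "Company 1": ["Brand 1", "Location 3"],
    "Brand 1": ["Group 1", "Location 2"],
    "Group 1": ["Location 1"],
}
types = {"Company 1": "company", "Brand 1": "brand", "Group 1": "location group",
         "Location 1": "location", "Location 2": "location", "Location 3": "location"}
print(company_locations(has, types, "Company 1"))  # ['Location 1', 'Location 2', 'Location 3']
```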
START b=node:brands(name="Brand 1")
MATCH b<-[:HAS*]-c-[:HAS*]->l<-[h?:HAS*]-b
WHERE h IS NULL AND l.type='location'
RETURN l ORDER BY l.name
                                         http://tinyurl.com/cl537w6
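The brand query amounts to a set difference. It walks up to every node that owns the brand, collects all locations under those owners, and then drops any location the brand itself reaches (`h IS NULL`). Below is a sketch under the same assumptions as before: a hypothetical hierarchy, names standing in for `name`, and a map standing in for `type`.

```python
def descendants(has, start):
    """Nodes reachable from start via one or more HAS edges (start-[:HAS*]->n)."""
    seen, stack = set(), [start]
    while stack:
        for child in has.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def ancestors(has, start):
    """Nodes that reach start via one or more HAS edges (n-[:HAS*]->start)."""
    parents = {}
    for parent, children in has.items():
        for child in children:
            parents.setdefault(child, []).append(parent)
    return descendants(parents, start)


def locations_outside_brand(has, types, brand):
    """Sketch of b<-[:HAS*]-c-[:HAS*]->l<-[h?:HAS*]-b WHERE h IS NULL ..."""
    candidates = set()
    for owner in ancestors(has, brand):        # b<-[:HAS*]-c
        candidates |= descendants(has, owner)  # c-[:HAS*]->l
    candidates -= descendants(has, brand)      # h IS NULL: no HAS path from b
    return sorted(n for n in candidates if types[n] == "location")


# Hypothetical hierarchy with two brands under one company.
has = {"Company 1": ["Brand 1", "Brand 2", "Location 3"],
       "Brand 1": ["Location 1"], "Brand 2": ["Location 2"]}
types = {"Company 1": "company", "Brand 1": "brand", "Brand 2": "brand",
         "Location 1": "location", "Location 2": "location", "Location 3": "location"}
print(locations_outside_brand(has, types, "Brand 1"))  # ['Location 2', 'Location 3']
```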
Tweet

    @chicken_tech
we should be using graph dbs!
But Wait...There's More!

•   Mutating Cypher (insert, update)
•   Indexing (auto, full-text, spatial)
•   Batches and Transactions
•   Embedded (for JVM) or REST
Wherefore art thou, RDB?

•   Aggregation
•   Ordered data
•   Truly tabular data
•   Few or clearly defined relationships
Questions?
Resources

• http://joind.in/6694

• http://neo4j.org
• http://docs.neo4j.org
• http://www.youtube.com/watch?v=UodTzseLh04

• http://github.com/jadell/neo4jphp

•   http://joshadell.com
•   josh.adell@gmail.com
•   @josh_adell
•   Google+, Facebook, LinkedIn

Editor's Notes

  • #2 * Goal here is to inspire further investigation * Not going to go into nuts & bolts * Docs are amazing!
  • #3 * graph db usage poll
  • #4 * Six degrees game * Relational databases can't easily answer certain types of questions * arbitrary path query * the basic unit of social networking
  • #5 * Each degree adds a join * Increases complexity * Decreases performance * Stop when the actor you're looking for is in the list
  • #6 * this problem highlights the ugly truth about RDBs * they weren't designed to handle these types of problems. * RDB relationships join data, but are not data in themselves * arbitrary path query * RDB does "query", not "path" * certainly not "arbitrary"
  • #7 * Gather everything in the set that matches these criteria, then tell me if this thing is in the set * 1 set, no problem * 2nd set no problem * 3rd set not related to 1st * 4th not related to 2nd * 5th related to 1st and 4th * etc. * Relationships are only available between overlapping sets
  • #8 * avoid schema lock-in * intuitive * ditch digger's dilemma
  • #9 * Neo4j * AGPL for community * ACID compliant * High Availability mode * Embedded and REST
  • #10 * Neo4j * AGPL for community * ACID compliant * High Availability mode * Embedded and REST * Bindings for every language
  • #11 * graph theory * edges can be ordered or unordered pairs * vocab: - vertex -> node - edge -> relationship
  • #12 * Tree data-structures * Networks * Maps * vehicles on streets == packets through network * social networking * manufacturing * fraud detection * supply chain
  • #13 * Make each record a node * Make every foreign key a relationship * RDB indexes are usually stored in a tree structure * Trees are graphs * Why not use RDBs? * The trouble with RDBs is how they are stored in memory and queried * Require a translation step from memory blocks to graph structure * ORMs hide the problem, but do not solve it * Relationships not first-class citizens * Many problem domains map poorly to rows/tables
  • #14 The zen of graph databases
  • #15 * Social networking - friends of friends of friends of friends * Assembly/Manufacturing - 1 widget contains 3 gadgets each contain 2 gizmos * Map directions - starting at my house find a route to the office that goes past the pub * Multi-tenancy - root node per tenant * all queries start at root * No overlap between graphs = no accidental data spillage * Fraud: track transactions back to origination * Pretty much anything that can be drawn on a whiteboard
  • #16 * Example: retail system * Customer makes Order * Store sells Order * Order contains Items * Supplier supplied Items * Customer rates Items * Did this customer rank supplier X highly? * Which suppliers sell the highest rated items? * Does item A get rated higher when ordered with Item B? * All can be answered with RDBs as well * Not as elegant * Not as performant
  • #17 * Actors are nodes * Movies are nodes * Relationship: Actor is IN a movie * Compare to degree selection join queries
  • #19 * all groups user 3 is a member of directly or inherited
  • #20 * does user 2 have permission to read the home page?
  • #21 * does user 3 have permission to read user 1's blog?
  • #23 * Find all locations in company
  • #24 * For a given brand, find all locations not under that brand
  • #27 * RDBs are really good at data aggregation * Set math, duh * Have to traverse the whole graph in order to do aggregation * Truly tabular means not a lot of relationships between the data types * Neo4j guys say: rdb will tell you the salary of everyone in the room; graph db will tell you who will buy you a beer
  • #29 * Emil Eifrem (Neo Tech CEO) webinar * Check out 54 minute mark