Copyright © Objectivity, Inc. 2013
Using A Distributed Graph Database To Make Sense Of Disparate Data Stores
Leon Guzenda
Dataweek
San Francisco – October 2, 2013
• Current Big Data Analytics
• Graph Analytics
• InfiniteGraph
• The ETL & Discovery Process
Objectivity Inc.
• Objectivity, Inc. is headquartered in Sunnyvale, CA.
• Objectivity has over two decades of Big Data and NoSQL experience
• We develop NoSQL platforms for managing and discovering relationships and
patterns in complex data:
– Objectivity/DB - an object database that manages localized, centralized or
distributed databases
– InfiniteGraph - a massively scalable graph database built on Objectivity/DB that
enables organizations to find, store and exploit the relationships in their data
• Millions of deployments - Our technology is embedded in hundreds of enterprise
and government systems and commercial products
A Typical Objectivity Deployment - Sensor Data Fusion
Network Centric Collaborative Targeting
A Typical InfiniteGraph Deployment - GraphMyLife
A Typical “Big Data” Analytics Setup
Data Aggregation and Analytics Applications
Commodity Linux Platforms and/or High Performance Computing Clusters
[Diagram: data stores spanning Structured, Semi-Structured and Unstructured data: RDBMS, Data W/H, Column Store, Hadoop, K-V Store, Doc DB, Object DB, Graph DB]
Incremental Analytics Improvements Aren’t Enough
All current solutions use the same basic architectural model
• None of the popular solutions have an efficient way to store connections
between entities in different silos
• Most analytic technology focuses on the content of the data nodes, rather
than the many kinds of connections between the nodes and the data in those
connections
• Why? Because traditional databases and earlier NoSQL solutions handle
relationships poorly.
• Graph databases can efficiently store, manage and query the many kinds of
relationships hidden in the data.
Graph Analytics
Graph (Relationship) Analytics...
A SQL Shortcoming
Think about the SQL query for finding all links between the two “blue” rows... it's hard!!
[Diagram: Table_A through Table_G, with two “blue” rows highlighted]
There are some kinds of complex relationship handling problems that SQL
wasn't designed for.
...Graph Analytics
InfiniteGraph - The solution can be found with a few lines of code
[Diagram: Table_A through Table_G, with rows A3 and G4 marked]
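As a rough illustration of why this kind of question is short to express once the rows are treated as a graph, here is a minimal Python sketch. It is not InfiniteGraph code. The row names other than A3 and G4, the links between them, and the function names are all invented for the example, since the slide does not list them.

```python
from collections import defaultdict

# Hypothetical links between rows of Table_A..Table_G. Only A3 and G4 appear
# on the slide; every other row and every link here is an assumption.
LINKS = [
    ("A3", "B1"), ("B1", "C2"), ("C2", "G4"),
    ("A3", "D5"), ("D5", "E2"), ("E2", "F7"), ("F7", "G4"),
    ("B1", "E2"),
]


def build_adjacency(links):
    """Turn undirected row-to-row links into an adjacency map."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def all_simple_paths(adjacency, start, goal, path=None):
    """Yield every path from start to goal that visits no row twice."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for neighbour in sorted(adjacency[start]):
        if neighbour not in path:
            yield from all_simple_paths(adjacency, neighbour, goal, path)


adjacency = build_adjacency(LINKS)
for found in all_simple_paths(adjacency, "A3", "G4"):
    print(" -> ".join(found))
```

The equivalent SQL would need a join (or a recursive query) for every possible hop count and table ordering, which is the shortcoming the slide points at.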
Applications for Graph Analytics
• LOGISTICS
• HEALTHCARE INFORMATICS
• MARKET ANALYSIS
• SOCIAL NETWORK ANALYSIS
Representing the Graph...
The existing COMINT and HUMINT data might look like this:
Events/Places and People/Orgs: Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Bank X, Situation X, Situation Y, Target T, Cafe C
Facts:
• A Seen Near X
• A Seen At Y
• A Banks At X
• A Called P
• A Eats At C
• P Emailed S
• P Called Q
• P Called R
• Q Seen Near T
• R Seen Near T
• S Seen Near T
• X Paid S
Representing the Graph...
We start by identifying the nodes (Vertices) and the connections (Edges)
NODES (Events/Places, People/Orgs): Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Bank X, Situation X, Situation Y, Target T, Cafe C
CONNECTIONS (Facts):
• A Seen Near X
• A Seen At Y
• A Banks At X
• A Called P
• A Eats At C
• P Emailed S
• P Called Q
• P Called R
• Q Seen Near T
• R Seen Near T
• S Seen Near T
• X Paid S
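The split into nodes and connections can be sketched in a few lines of Python. This is an illustration, not InfiniteGraph code. The triples are the facts from the slide, with two assumptions: "Seen Near X" is read as Situation X while "Banks At X" and "X Paid" are read as Bank X, and Cafe C is taken as the object of "Eats At". The function name `build_graph` is made up for the example.

```python
from collections import defaultdict

# (source, relationship, target) triples taken from the slide's facts.
# Assumptions: the split of "X" into Situation X and Bank X, and Cafe C
# as the target of "Eats At".
FACTS = [
    ("Combatant A", "Seen Near", "Situation X"),
    ("Combatant A", "Seen At", "Situation Y"),
    ("Combatant A", "Banks At", "Bank X"),
    ("Combatant A", "Called", "Civilian P"),
    ("Combatant A", "Eats At", "Cafe C"),
    ("Civilian P", "Emailed", "Civilian S"),
    ("Civilian P", "Called", "Civilian Q"),
    ("Civilian P", "Called", "Civilian R"),
    ("Civilian Q", "Seen Near", "Target T"),
    ("Civilian R", "Seen Near", "Target T"),
    ("Civilian S", "Seen Near", "Target T"),
    ("Bank X", "Paid", "Civilian S"),
]


def build_graph(facts):
    """Collect the vertices and the labelled outgoing edges of each vertex."""
    vertices = set()
    out_edges = defaultdict(list)
    for source, label, target in facts:
        vertices.update((source, target))
        out_edges[source].append((label, target))
    return vertices, out_edges


vertices, out_edges = build_graph(FACTS)
print(sorted(vertices))
print(out_edges["Civilian P"])
```

Keeping the relationship label on each edge is what later lets a traversal follow only certain kinds of connection.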
...Representing the Graph..
[Diagram: a VERTEX (“Node”) and an EDGE (“Connection”), annotated 2 and N]
...Representing the Graph..
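VERTEX = “Nodes”, EDGE = “Connections”
• Combatant A -[Seen Near]-> Situation X
• Combatant A -[Seen At]-> Situation Y
• Combatant A -[Banks At]-> Bank X
• Combatant A -[Called]-> Civilian P
• Combatant A -[Eats At]-> Cafe C
• Civilian P -[Emailed]-> Civilian S
• Civilian P -[Called]-> Civilian Q
• Civilian P -[Called]-> Civilian R
• Civilian Q -[Seen Near]-> Target T
• Civilian R -[Seen Near]-> Target T
• Civilian S -[Seen Near]-> Target T
• Bank X -[Paid]-> Civilian S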
...Analyzing the Graph...
[Diagram: the graph of nodes (Combatant A, Civilians P, Q, R and S, Bank X, Situations X and Y, Target T, Cafe C) and their labelled connections (Seen Near, Seen At, Called, Emailed, Banks At, Paid, Eats At)]
...Threat Analysis
[Diagram: the graph without Cafe C, with one group of nodes labelled SUSPECTS and another labelled NEEDS PROTECTION]
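The slide text does not say which rule produces the SUSPECTS grouping. One plausible rule, assumed here purely for illustration, is: anyone reachable from Combatant A through communication edges (Called, Emailed) who was also Seen Near Target T. The Python sketch below applies that rule to the facts from the earlier slides. The function names and the rule itself are assumptions, not the presenter's method.

```python
from collections import deque

# Facts from the earlier slides, limited to the edge types this rule uses.
FACTS = [
    ("Combatant A", "Called", "Civilian P"),
    ("Civilian P", "Emailed", "Civilian S"),
    ("Civilian P", "Called", "Civilian Q"),
    ("Civilian P", "Called", "Civilian R"),
    ("Civilian Q", "Seen Near", "Target T"),
    ("Civilian R", "Seen Near", "Target T"),
    ("Civilian S", "Seen Near", "Target T"),
]

COMMUNICATION = {"Called", "Emailed"}


def contacts_of(start, facts, labels=COMMUNICATION):
    """Breadth-first search along communication edges; returns hop counts."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for source, label, target in facts:
            if source == current and label in labels and target not in hops:
                hops[target] = hops[current] + 1
                queue.append(target)
    del hops[start]  # the starting node is not its own contact
    return hops


def seen_near(place, facts):
    """Everyone with a Seen Near edge to the given place."""
    return {s for s, label, t in facts if label == "Seen Near" and t == place}


contacts = contacts_of("Combatant A", FACTS)
flagged = sorted(set(contacts) & seen_near("Target T", FACTS))
print(contacts)
print(flagged)
```

The point of the sketch is that the rule is expressed directly in terms of edge types and hops, which is the style of query graph analytics is built around.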
Visual Analytics
Graphs Can Scale Very Quickly
We often hear about the “trillion row” database. Amazon S3 has reached 2 trillion objects,
but one Objectivity site:
• Processes tens of trillions of objects per day
• Supports over 1000 analysts around the clock.
Consider a graph where each node has 10 connections:
• At 6 degrees of separation, finding a path between two nodes may require traversing
a million links.
• 9 degrees of separation may require a billion traversals
• 12 degrees of separation may require a trillion traversals
• 15 degrees of separation may require a quadrillion traversals...
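These figures follow from simple exponentiation: with a fan-out of 10, the number of links at depth d is 10 to the power d. The short Python sketch below reproduces them. It treats the graph as a tree with no shared neighbours, which is a simplifying assumption, so the values are worst-case estimates; the function name is invented for the example.

```python
def links_at_depth(branching, depth):
    """Links on the outermost level of a tree with the given fan-out."""
    return branching ** depth


for depth in (6, 9, 12, 15):
    print(f"depth {depth}: {links_at_depth(10, depth):,} links")
```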
THE ETL & DISCOVERY PROCESS
Not Only SQL – A group of 4 primary technologies
[Diagram: the four technologies arranged on a scale from Simple to Highly Interconnected]
InfiniteGraph - The Enterprise Graph Database
• A high-performance distributed database engine that supports analyst-time decision
support and actionable intelligence
• Cost-effective link analysis – flexible deployment on commodity resources (hardware
and OS).
• Efficient, scalable, risk-averse technology – enterprise proven.
• High-speed parallel ingest to load graph data quickly.
• Parallel, distributed queries
• Flexible plugin architecture
• Complementary technology
• Fast proof of concept – easy-to-use Graph API.
InfiniteGraph Capabilities
• Parallel Graph Traversal
• Inclusive or Exclusive Selection
• Shortest or All Paths Between Objects
• Computational & Visualization Plug-Ins (Compute Cost To Date, Visualize)
A Powerful InfiniteGraph Query
[Map: San Francisco, Oakland, Pacifica, Hillsboro, Half Moon Bay, Palo Alto, Cupertino and San Jose, linked by Water, Rail and Road edges]
Problem: Find the cheapest route for moving a 200 ton load from San Francisco to San Jose
// Policies: Depth_First, Exclude Railway_Edge, Exclude_Road_Edge
// Calculate: Cost_To_This_City()
// Navigate: From “San Francisco” To “San Jose”
// Visualizer: Map_Cheapest_Route
// Visualizer: List_Cost_Breakdown.
// Note: This is pseudocode, not the actual Java statements.
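To make the idea concrete outside InfiniteGraph, here is a minimal Python sketch of the same kind of query: exclude some edge types, accumulate cost per city, and return the cheapest route. It uses Dijkstra's algorithm rather than the depth-first policy named in the pseudocode, and it is not the InfiniteGraph API. The function name, which city pairs are linked, and every cost figure are invented for illustration.

```python
import heapq

# Hypothetical network: (from, to, mode, cost of moving the load).
# The slide does not give the links or costs; all values are assumptions.
EDGES = [
    ("San Francisco", "Oakland", "Water", 40),
    ("Oakland", "Palo Alto", "Water", 60),
    ("Palo Alto", "San Jose", "Water", 50),
    ("San Francisco", "Palo Alto", "Water", 120),
    ("San Francisco", "Palo Alto", "Rail", 30),
    ("Palo Alto", "San Jose", "Road", 20),
]


def cheapest_route(edges, start, goal, excluded_modes=frozenset()):
    """Dijkstra's algorithm over the edges whose mode is not excluded."""
    graph = {}
    for a, b, mode, cost in edges:
        if mode in excluded_modes:
            continue  # policy: skip this edge type entirely
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, start, [start])]  # (cost to this city, city, route so far)
    settled = set()
    while queue:
        cost_so_far, city, route = heapq.heappop(queue)
        if city == goal:
            return cost_so_far, route
        if city in settled:
            continue
        settled.add(city)
        for neighbour, cost in graph.get(city, []):
            if neighbour not in settled:
                heapq.heappush(
                    queue, (cost_so_far + cost, neighbour, route + [neighbour])
                )
    return None  # no route with the permitted edge types


print(cheapest_route(EDGES, "San Francisco", "San Jose", {"Rail", "Road"}))
print(cheapest_route(EDGES, "San Francisco", "San Jose"))
```

Filtering edges before the search mirrors the "Exclude" policies in the pseudocode: the excluded edge types never enter the traversal, so they cost nothing to skip.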
Recognizing Graphs In Object Models...
[Diagram: object models built from Object Class A, showing Tree Structures, Graph (Network) Structures, 1-to-Many Relationships, Many-to-Many Relationships and Relationship Data]
...Recognizing Graphs In Object Models
[Diagram: the same object models (Tree Structures, Graph (Network) Structures, 1-to-Many and Many-to-Many Relationships, Relationship Data), annotated with VERTEX, EDGE and GRAPH MODEL labels]
The ETL Process
ETL Tools/Applications
Commodity Linux Platforms and/or High Performance Computing Clusters
[Diagram: Structured, Semi-Structured and Unstructured data stores (RDBMS, Data W/H, Column Store, Hadoop, K-V Store, Doc DB, Object DB) alongside the Graph DB, with a flow labelled Nodes & Edges]
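A minimal Python sketch of what such an ETL step might look like: records from two different silos are turned into a common stream of nodes and edges before loading. The record layouts, field names and function names are all hypothetical, and a real pipeline would write to the graph database rather than to in-memory collections.

```python
# Hypothetical records from two silos: rows from a relational call table
# and documents from a document store. Layouts are invented for the example.
CALL_ROWS = [
    {"caller": "A", "callee": "P"},
    {"caller": "P", "callee": "Q"},
]
EMAIL_DOCS = [
    {"from": "P", "to": ["S"], "subject": "(not used)"},
]


def extract_calls(rows):
    """Transform relational rows into (source, label, target) edges."""
    for row in rows:
        yield (row["caller"], "Called", row["callee"])


def extract_emails(docs):
    """Transform documents into one edge per recipient."""
    for doc in docs:
        for recipient in doc["to"]:
            yield (doc["from"], "Emailed", recipient)


def load(edge_streams):
    """Merge edges from every silo into a single node set and edge list."""
    nodes, edges = set(), []
    for stream in edge_streams:
        for source, label, target in stream:
            nodes.update((source, target))
            edges.append((source, label, target))
    return nodes, edges


nodes, edges = load([extract_calls(CALL_ROWS), extract_emails(EMAIL_DOCS)])
print(sorted(nodes))
print(edges)
```

Because both extractors emit the same triple shape, the same person appearing in two silos ends up as a single node, which is where cross-silo connections become visible.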
Commonly Used Graph Algorithms...
• Connectedness
• Node degree
• Shortest Path
• Average path length
• Transitive Closure
• Graph diameter (or Span)
• Centrality (Betweenness, Degree and Closeness)
In the graph below, node D has the highest betweenness centrality
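The slide's graph is not reproduced in the text, so the Python sketch below uses a hypothetical seven-node graph in which D is the only bridge between two triangles. It computes node degree and unnormalized betweenness centrality (using Brandes' algorithm for unweighted graphs) to show how a node can have the highest betweenness without having the highest degree. The edge list and function names are assumptions made for the example.

```python
from collections import deque

# Hypothetical undirected graph: two triangles joined through node D.
EDGES = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "D"), ("D", "E"),
    ("E", "F"), ("E", "G"), ("F", "G"),
]


def adjacency_of(edges):
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degree(adjacency):
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def betweenness(adjacency):
    """Unnormalized betweenness centrality via Brandes' algorithm."""
    score = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        order = []                                    # nodes in BFS order
        predecessors = {v: [] for v in adjacency}
        path_count = dict.fromkeys(adjacency, 0)      # shortest paths from source
        distance = dict.fromkeys(adjacency, -1)
        path_count[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:                   # first visit
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:    # v is on a shortest path to w
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        dependency = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):                     # accumulate from the leaves back
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both ends in an undirected graph.
    return {v: s / 2 for v, s in score.items()}


adjacency = adjacency_of(EDGES)
print(degree(adjacency))
print(betweenness(adjacency))
```

In this example C and E each have three connections while D has only two, yet every shortest path between the two triangles passes through D.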
A Typical Deployment Supplements Traditional or Big Data Systems With Graph Analytics
[Diagram: Conventional & Relationship Analytics. Data Visualization & Analytics tools and ORACLE Big Data Solutions, plus a Big Data Connection Platform; two vendor logos are marked “*Now HP” and “*Now IBM”]
Online Demo - Call Detail Record Analysis
Used in law enforcement, counter-terrorism and Customer Relationship Management
Thank You!
Please take a look at objectivity.com for InfiniteGraph Online Demos, White Papers, Free
Downloads, Samples & Tutorials, and visit our booth for a demonstration.
