Big Data – Hadoop - NoSQL and Graph Database
Ramazan FIRIN
20.11.2012




  This document is intended for only AVEA İletişim Hizmetleri A.Ş.("AVEA"), its dealers, employees and/or others specifically authorised. The contents of this document are
  confidential and any disclosure, copying, distribution and/or taking any action in reliance with the content of this document is prohibited. AVEA is not liable for the transmission
  of this document in any manner to any third parties that are not authorised to receive.
AGENDA

•   Big Data
•   Hadoop
•   NoSQL
•   Graph DB and Neo4j
•   Possible Usage in Telco
•   Demo




Executive Summary

       • Big Data is a new IT trend

       • Hadoop and NoSQL can be used to process Big Data

       • Possible usage areas in telco:
           - Preventing churn
           - Offering customer-specific campaigns
           - Acquiring more customers




AVEA R&D / MW Development
What is Big Data?




   Datasets that are too awkward to work with using traditional,
             hands-on database management tools.




Big Data - 3V Concept




Big Data Sources

1.   Social network profiles - Facebook, LinkedIn, Yahoo, Google
2.   Social influencers - blog comments, user forums, review sites
3.   Activity-generated data - application logs, sensor data
4.   Public - Wikipedia, IMDb, etc.
5.   Data warehouse appliances - transactional data
6.   Network and in-stream monitoring
7.   Legacy documents—




Big Data To Smart Data




 Cover of The Economist
Volume




New Data Sources - Internet


•   2 billion Internet users by 2011
•   Twitter processes 7 terabytes of data every day
•   Facebook processes 10 terabytes of data every day
•   4.6 billion mobile phones
•   Google processes 24 petabytes of data every day




Big Data Approach




Big Data Design




Big Data Usage Sector




Sample Usage - 360° View of the Customers




Sample Usage – Customer Sentiment




Sample Usage – Detect Churn Pattern




Sample Usage - Health




Big Data Market




Big Data Solutions – Oracle Big Data Appliance




Big Data Solutions – IBM PureData




Top 10 Technology Trends 2012 from CSC




Gartner: Top 10 IT Trends for 2013




Gartner: 10 Critical IT Trends for the Next Five Years

•      The third trend is bigger data and storage.
•      By 2015, big data demand will generate 1 million jobs in the Global 1000,
       but only one-third of those jobs will be filled due to a shortage of talent.
•      Analytics and pattern recognition are key.
•      New specialized ARM-based servers are appearing for specialty analytics.




HADOOP




What is HADOOP?




     The Apache Hadoop software library is a framework that
    allows for the distributed processing of large data sets
  across clusters of computers using simple programming models
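
The best-known of these "simple programming models" is MapReduce. The following is a minimal single-process sketch of its map, shuffle, and reduce phases in plain Python. It does not use Hadoop itself, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # On a real cluster, different lines (or blocks) would be mapped on different nodes.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["big data hadoop", "big data nosql"]))
```

Because each map call sees only its own line and each reduce call sees only one key, both phases can in principle be spread across many machines.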




History




Hadoop Components




HADOOP ARCHITECTURE




Hadoop Ecosystem




Pig - data processing language that simplifies Hadoop programming
Hive - SQL-like queries
HBase - random read/write access to tables with billions of rows and millions of columns (NoSQL)

Other Google Research




NoSQL




RDBMS PERFORMANCE




Joins are the killer...




What is NoSQL?


•       Stands for Not Only SQL
•       Non-relational
•       Cheap, easy to implement
•       Scalability
    –   Vertically - add more resources (CPU, RAM, disk) to a single machine
    –   Horizontally - add more machines to the cluster
•       No predefined schema
•       No join operations
•       Usually not ACID; trade-offs are described by the CAP theorem



NoSQL DB Types


1. Key-Value Stores
2. Document Databases
3. Column Family Stores
4. Graph Databases
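
To make the four types concrete, here is a rough sketch of how the same kind of data might be shaped in each family, using only plain Python structures. The subscriber records, keys, and helper names (`get_value`, `called_by`) are hypothetical and do not reflect the API of any particular product.

```python
import json

# Hypothetical subscriber data in the shape each NoSQL family would hold it.

# 1. Key-value store: an opaque value looked up by a single key.
key_value = {"subscriber:1001": json.dumps({"name": "Ayse", "plan": "postpaid"})}

# 2. Document database: one nested, self-describing document per record.
document = {"_id": 1001, "name": "Ayse", "plan": {"type": "postpaid", "quota_gb": 10}}

# 3. Column family store: row key -> column family -> column -> value.
column_family = {
    "1001": {
        "profile": {"name": "Ayse"},
        "usage": {"2012-10": 4.2, "2012-11": 5.1},
    }
}

# 4. Graph database: nodes plus typed relationships between them.
nodes = {1001: {"name": "Ayse"}, 1002: {"name": "Mehmet"}}
relationships = [(1001, "CALLS", 1002)]

def get_value(store, key):
    """Key-value access: fetch by key, then decode the opaque value."""
    return json.loads(store[key])

def called_by(node_id):
    """Graph access: follow CALLS relationships starting at one node."""
    return [nodes[dst]["name"]
            for src, rel, dst in relationships
            if src == node_id and rel == "CALLS"]
```

The key-value store knows nothing about the value's structure, while the document and column family shapes expose fields directly and the graph shape makes relationships first-class.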




Key-Value Stores




 -   Redis, Voldemort
Document Database




- CouchDB, MongoDB
Column Family Stores




 -   Cassandra, HBase
Graph Database




- Neo4j, InfoGrid, InfiniteGraph
RDBMSs Support ACID



•   Atomicity - a transaction is all or nothing
•   Consistency - only valid data is written to the database
•   Isolation - concurrent transactions behave as if they were executed
    serially
•   Durability - once a transaction is committed, its changes survive failures
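
Atomicity and consistency can be illustrated with a small sketch using Python's built-in `sqlite3` module. The `account` table, its `CHECK` constraint, and the `transfer` helper are assumptions made up for this example: a transfer that would leave a negative balance is rolled back as a whole, including the credit that had already been applied.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move credit between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            # Violates the CHECK constraint if src cannot cover the amount.
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return all balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

if __name__ == "__main__":
    print(transfer(conn, "a", "b", 150), balances(conn))
```

The constraint is what enforces "only valid data is written", and the rollback is what makes the two updates "all or nothing".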




NoSQL Supports CAP Theorem




NoSQL Supports CAP Theorem




•   Consistency - each client always has the same view of the data.
•   Availability - all clients can always read and write.
•   Partition tolerance - the system still works when a network failure splits
    the nodes into groups that cannot reach each other.



                     You can pick only two...
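
The trade-off can be sketched with a toy two-replica store. This is a deliberately simplified model, not how any real database is implemented; the class names and the "CP" / "AP" mode labels are assumptions for illustration. During a partition, the CP store refuses writes to stay consistent, while the AP store accepts them and lets the replicas diverge.

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}

class TwoNodeStore:
    """Toy two-replica store; mode is "CP" or "AP" (hypothetical labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.partitioned = False
        self.a, self.b = Replica(), Replica()

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write: availability is lost.
                return False
            # Stay available by accepting the write locally: replicas diverge.
            replica.data[key] = value
            return True
        # No partition: the write reaches both replicas.
        replica.data[key] = value
        other.data[key] = value
        return True

if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoNodeStore(mode)
        store.write(store.a, "k", 1)
        store.partitioned = True
        accepted = store.write(store.a, "k", 2)
        print(mode, accepted, store.a.data, store.b.data)
```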


Visual Guide to NoSQL Systems




NoSQL Complexity




NoSQL Performance




Job Trends




Graph DB and Neo4j




Graph DB

A graph database uses graph structures with nodes, edges, and properties
  to represent and store data.
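
As a rough in-memory illustration of that definition, the sketch below stores nodes and edges with properties and answers a "who is reachable within N hops" question by traversal rather than by joins. All names and values are made up, and the helpers (`neighbours`, `within_hops`) are hypothetical, not taken from any graph database API.

```python
from collections import deque

# Nodes and edges both carry properties (all names and values are made up).
nodes = {
    "alice": {"city": "Istanbul"},
    "bob": {"city": "Ankara"},
    "carol": {"city": "Izmir"},
    "dave": {"city": "Bursa"},
}
edges = [
    ("alice", "bob", {"type": "KNOWS"}),
    ("bob", "carol", {"type": "KNOWS"}),
    ("carol", "dave", {"type": "KNOWS"}),
]

def neighbours(node):
    """Follow outgoing edges from one node."""
    return [dst for src, dst, _props in edges if src == node]

def within_hops(start, max_hops):
    """Breadth-first traversal: nodes reachable in at most max_hops steps."""
    seen, queue, found = {start}, deque([(start, 0)]), []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found

if __name__ == "__main__":
    print(within_hops("alice", 2))
```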




Graph DB Usage Area



•   Recommendations
•   Business Intelligence
•   Social networking
•   MDM
•   System Management
•   Time Series data
•   Product Catalogue
•   Web Analytics
•   Scientific Computing
•   Indexing your slow RDBMS


Relational Databases are Graphs!




Neo4j


•   Leading Graph Database
•   Transaction support (ACID)
•   Indexing
•   Querying
•   REST support
•   Disk Based
•   Open source
•   Traversal framework
•   High Performance (traverses 1,000,000+ relationships per second)
•   Robust (in 24/7 operation since 2003)
•   Massive scalability
Neo4j Data Model


Neo4j has nodes and relationships.
Nodes and relationships have properties.

   Node1 (properties: name, surname)
      -- Relationship (type: knows; property: date of meeting) -->
   Node2 (properties: name, surname)
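
The model on this slide can be sketched in a few lines of Python. This only mirrors the shape of the data (two nodes with `name` and `surname`, joined by a `knows` relationship that carries a date of meeting); the classes, the sample values, and the `describe` helper are assumptions for illustration and are not the Neo4j API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node holding arbitrary key/value properties."""
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    """A typed, directed link between two nodes, with its own properties."""
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)

# Sample values are invented for the example.
node1 = Node({"name": "Ali", "surname": "Kaya"})
node2 = Node({"name": "Veli", "surname": "Demir"})
knows = Relationship(node1, node2, "knows", {"date_of_meeting": "2012-11-20"})

def describe(rel):
    """Render a relationship as a short human-readable string."""
    return f'{rel.start.properties["name"]} -[{rel.type}]-> {rel.end.properties["name"]}'

if __name__ == "__main__":
    print(describe(knows))
```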




Neo4j Performance




http://www.neotechnology.com/2012/10/20-billion-relationships-imported-into-neo4j-on-ec2/
Who Uses Neo4j?




•    Cisco - Master Data Management
•    Telenor Group - Customer organization structure (203 million
     subscribers)
•    Deutsche Telekom - Social football site (150 million subscribers)
Cypher For Query




Sample Code




Spring Data Neo4j




Neoclipse




Product Catalog




Sample OM Data Model




Hardware Calculating Tool




Hardware Calculating Tool Result


Calculation Result - Prod Environment
                           •   4 physical machines
                           •   3 nodes on every machine
                           •   1024 MHz CPU
                           •   65536 MB RAM




OrientDB


•   The Document-Graph database
•   ACID support
•   SQL and native queries
•   Schema-less, schema-full and schema-mixed modes
•   Roles + Security
•   Functions
•   HTTP / RESTful / JSON / Binary support
•   Hooks
•   Fetch plans
•   Inheritance
•   200,000 inserts per second (6 M node traversals with cache)
FluxGraph

•     Temporal Graph Database
•     Has checkpoint
•     Compatible with Neo4j




Examples for Telcos


•      CDR
•      Routing
•      Social graphs
•      Master Data Management
•      Spatial and LBS
•      Network topology analysis
•      Neo4j and Android




CDR Analysis




Master Data Management




Network Management




Cell Network Analysis




Sample Scenarios



•   Customer-Specific Campaigns
•   Prevent Churn
•   Get More Customers
•   Special Offers for Campaigns




Thanks




