Morning with MongoDB


 wifi: DanAcademy
 #MongoDBIsrael
Agenda


09.30 – Welcome
09.40 – Introduction to MongoDB
10.10 – MongoDB Fundamentals
10.45 – Coffee
11.00 – Uri Cohen – Giga Spaces
11.45 – Yuval Sapir – IMBA Games
12.30 – What’s Next?
12.50 – Prize Draw




Business Development Director, 10gen
10gen Overview

 • 10gen is the company behind MongoDB – the leading NoSQL database
 • 170+ employees
 • 500+ customers
 • $73M in funding from top investors

Leading Organizations Rely on MongoDB

Global MongoDB Community

 • 41,000+ Monthly Unique Downloads
 • 24,000+ Online Education Registrants
 • 12,000+ MongoDB User Group Members
 • 10,000+ Annual MongoDB Days Attendees

MongoDB Adoption

 [Figure: adoption chart; surviving labels: Resource, User Data Management]

Database Industry
Database Evolution #1
Database Evolution #2
Database Evolution #3
Organizations are becoming frustrated using an RDBMS.

 Productivity decreases
 • Needed to add new software layers of ORM, Caching, Sharding, Message Queue
 • Polymorphic, semi-structured and unstructured data not well supported

 Cost of database increases
 • Vertical, not horizontal, scaling
 • High cost of SAN

What Values, For Which Audience?
NoSQL

What Values, For Which Audience?
Databases in the Future

Why MongoDB?
European Clients
MongoDB is a scalable, high-performance NoSQL database.

 • Open source, written in C++
 • Document-oriented Storage
    – Based on JSON Documents
    – Schema-less
 • Full featured indexes, query language
 • Replication & High Availability
 • Auto-sharding

Relational Database Challenges

 Data Types
 • Unstructured data
 • Semi-structured data
 • Polymorphic data

 Agile Development
 • Iterative
 • Short development cycles
 • New workloads

 Volume of Data
 • Petabytes of data
 • Trillions of records
 • Tens of millions of queries per second

 New Architectures
 • Horizontal scaling
 • Commodity servers
 • Cloud computing

Volume of Data

 • Petabytes of data
 • Trillions of records
 • Millions of queries per second

Data Types

 • Unstructured data
 • Semi-structured data
 • Polymorphic data

 {
     _id : ObjectId("4c4ba5e5e8aabf3"),
     employee_name : "Dunham, Justin",
     department : "Marketing",
     title : "Product Manager, Web",
     report_up : "Neray, Graham",
     pay_band : "C",
     benefits : [
         { type : "Health", plan : "PPO Plus" },
         { type : "Dental", plan : "Standard" }
     ]
 }
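
As a rough illustration of what schema-less, polymorphic documents mean in practice, the sketch below models a collection as a plain Python list of dicts. It is not MongoDB and uses no driver API. The second employee record and the helper `find_by_benefit` are hypothetical, added only to show two differently shaped documents living side by side.

```python
# Illustrative only: a "collection" modelled as a plain Python list of dicts.
# No MongoDB driver is used; find_by_benefit is a hypothetical helper.

employees = [
    {
        "employee_name": "Dunham, Justin",
        "department": "Marketing",
        "title": "Product Manager, Web",
        "report_up": "Neray, Graham",
        "pay_band": "C",
        "benefits": [
            {"type": "Health", "plan": "PPO Plus"},
            {"type": "Dental", "plan": "Standard"},
        ],
    },
    # Hypothetical second record with a different shape: no pay_band or
    # benefits, plus an extra field. No schema change is needed to store it.
    {
        "employee_name": "Example, Contractor",
        "department": "Marketing",
        "contract_end": "2013-06-30",
    },
]


def find_by_benefit(collection, benefit_type):
    """Return names of documents whose benefits array has the given type."""
    return [
        doc["employee_name"]
        for doc in collection
        if any(b.get("type") == benefit_type for b in doc.get("benefits", []))
    ]


dental_members = find_by_benefit(employees, "Dental")
print(dental_members)
```

Documents without a `benefits` array are simply skipped rather than causing an error. In MongoDB itself, a query on the dotted field path `benefits.type` serves the same purpose.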




Agile Development

 • Iterative
 • Short development cycles
 • New workloads

MongoDB Use Cases

 • Content Management
 • Operational Intelligence
 • E-Commerce
 • User Data Management
 • High Volume Data Feeds

Problem
 • A need to extract value from existing semi-structured data sources (social networks etc.)
 • A fast-growing customer base required any solution to be easily scalable

Why MongoDB
 • Built around scalability, with auto-sharding features
 • MongoDB deployment architecture prevents any single point of failure
 • Geospatial indexing out-of-the-box enables location-based service delivery

Impact
 • Priority Moments project is a strong success
 • Subsequent adoption of MongoDB by O2 & Telefonica across a large number of projects

“Selecting MongoDB as our database platform was a no brainer as the technology offered us the flexibility and scalability that we knew we’d need for Priority Moments.”
 Andrew Pattinson, Head of Online Delivery

Problem
 • RDBMS architecture constrained their ability to absorb upstream contributions from users
 • New features, competitions needed to log data into user records, requiring schema changes

Why MongoDB
 • Flexible data model allows for heterogeneous structure
 • Rich query language preserves functionality
 • System updates with zero downtime
 • Ease of use, allowing a large development team to adopt the technology quickly

Impact
 • The Guardian has competitive advantage, through enabling social conversations through the site
 • Interactive features can be delivered more quickly, which translates to increased revenues

“Relational databases have a sound approach, but that doesn’t necessarily match the way we see our data. MongoDB gave us the flexibility to store data in the way that we understand it as opposed to somebody’s theoretical view.”
 Philip Wills, Software Architect

New Architectures

 • Horizontal scaling
 • Commodity servers
 • Cloud computing

Summary Solution
MongoDB

          Document-Oriented Database




  Agile            Scalable            Best TCO
Best Total Cost of Ownership (TCO)

Developer and Ops Savings
 • Less code
 • More productive development
 • Easier to maintain

Hardware Savings
 • Commodity servers
 • Internal storage (no SAN)
 • Scale out, not up

Software and Support Savings
 • No upfront license – pay for value over time
 • Cost visibility for usage growth

What Values, For Which Audience?
For Developers / Architects

  • Agility / Flexibility
     – Schema-Free
     – Easy to get started

  • Performance
     – Significant improvement over RDBMS

  • Features
     – Rich Query Language, Aggregation Framework, Map-Reduce

What Values, For Which Audience?
For Operations

  • Automation & Scaling
    – Sharding
    – High Availability

  • Resilience, Disaster Recovery
    – Write concerns, granular control
    – Cross data centre sharding

What Values, For Which Audience?
For Executives


  • Competitive Advantage
    – Faster time-to-market
    – Accessible real-time analytics
    – Flexible (low-risk) deployments


  • Commodity Infrastructure
    – Lower TCO than proprietary RDBMS




What Values, For Which Audience?
Work with us – Services and Support


  • Community Support (free)
    – Mongo User Group Monday 17th Dec, 7pm
    – Shalom Tower – Meet-up
    – Online Education education.10gen.com
  • Commercial Services (not free)
    –   Developer Support
    –   Consulting onsite & remote
    –   Production Support
    –   Managed Hosting, Public/Private Cloud
    –   Training (Developer/DBA)
    –   OEM
Email: dan.harris@10gen.com
LinkedIn: danharris1pgr
Twitter: danharris75

Welcome and Introduction to A Morning with MongoDB Petah Tikvah


Editor's Notes

  • #4 OK, so here are the presenter’s notes. Your first job is to add your name and other useful details so that your students can contact you afterwards. This is a good time to introduce yourself and to create a seating chart: get each student to say their name, company and what they want to learn, and write it on your seating chart.
  • #7 Note: Growth refers to year-to-date revenue based on our fiscal years for 2011 and 2012, i.e., it compares Feb-Oct 2011 (calendar year) to Feb-Oct 2012 (calendar). These figures are unaudited and subject to change.
  • #44 A highlight of some key features in 2.4. We’ll add more details and more items each month as we work towards a winter release.
    Security: SASL is a framework for authentication that helps decouple specific authentication mechanisms from the client/server implementation. This framework will permit working with a variety of authentication mechanisms. Kerberos is quite common, so we’ll build that one in first. We may add others over time, and the SASL implementation will make it much easier for you to add your own without having to implement a new client. Alongside the additional authentication, we want to take a few steps to separate out the activities authorized to various users: read, read/write, security administration, database-specific administration (compact, validate, etc.), and server/cluster administration (fsync, log rotate, shutdown, create database, etc.). This is just an initial step in our authorization work.
    Hash-based sharding: Apply a hash function to a selected key and use the result as the shard key. This evenly spreads documents, and the work associated with queries, across a sharded cluster. It will minimize migrations (they should only happen when growing a cluster). Note: this is something you can do now, but it is not automatic.
    Geospatial index resolution: Talk about the challenge of specifying a polygon and finding its overlap with another polygon in a document. This becomes interesting for location-aware applications and the intelligence community.
    Replica set flapping: Avoid electing a new primary because of falsely detecting that the current primary went down. We are adding mechanisms to reduce false detections. This is good for heavy load and network issues/blips in a data center.
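
The hash-based sharding idea in the note above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MongoDB's implementation: the MD5-modulo mapping, the shard count, and the names `shard_for` and `NUM_SHARDS` are all hypothetical choices made for the example.

```python
# Illustrative only: hash a shard key and map it to one of N shards.
# MD5-modulo is an assumption made for this sketch, not MongoDB's actual
# chunk-assignment scheme; shard_for and NUM_SHARDS are hypothetical names.
import hashlib
from collections import Counter

NUM_SHARDS = 4


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a shard-key value to a shard index via a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


# Sequential keys (e.g. increasing user ids) are adjacent in key order,
# but their hashes are scattered, so they spread across all shards.
placement = Counter(shard_for(user_id) for user_id in range(10_000))
for shard, count in sorted(placement.items()):
    print(f"shard {shard}: {count} documents")
```

Because the mapping depends only on the key, the same key always lands on the same shard. A plain modulo like this would reshuffle most keys if the shard count changed, which is one reason a real system assigns ranges of hash values to shards instead.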