Brig Lamoreaux
Shekhar Vemuri
Apollo Group
12/12/12

brig.lamoreaux@apollogrp.edu
briglamoreaux.wordpress.com

•  Tell our story
•  Teach evaluation principles
•  Share our results
  




•    Company Overview: Who we are
•    Background: Looking beyond relational
•    The Problem: We have no expertise in MongoDB
•    Approaching MongoDB: How do we solve this problem?
•    Results: What did we learn?
  




Company Overview

•  Founded in 1973
•  Leading provider of higher education for working adults
•  Parent company of
     –    University of Phoenix
     –    Apollo Global
     –    Apollo Education Services
     –    Carnegie Learning
     –    College of Financial Planning
     –    Institute for Professional Development
•  Educate over 350 thousand students per year
  



The Problem

We don’t know anything about MongoDB




Methodology / Approach




•  Gain valuable information about terrain
•  Accept Threshold
       –  Viable vs 100%
•  Boyd Loop
       –  Observe
       –  Orient
       –  Decide
       –  Act
  
	
  


•      Problem
•      Objective
•      Timetable
•      Gather Info
  
	
  




The Results

Two-week phased approach

•  Phase 1. Form team, goals, data
•  Phase 2. Develop model, small server
•  Phase 3. Large deployment
•  Phase 4. Performance test
  




Implementing a new repository solution introduces new areas of need, such as:

•  Planning and deploying a solution
•  Operational procedures
•  Designing object models
•  Determining the MongoDB client and frameworks
•  Measuring effectiveness
  
	
  

What do we want to know about MongoDB?

•      Resiliency
•      Stability
•      Adaptability of Data Model
•      Performance
•      Configuration Flexibility
•      Time to Implement
•      Administrator Functionality
•      Training
•      Data Migration
•      Conformity with Standards
•      Quality of Support
  
                         Conference   10gen Training   10gen Consulting   Lab Env.

Run Book (Deploy)            X              X                                 X
Run Book (Maintenance)       X              X                                 X
Object Model                 X              X                 X               X
Measure Effectiveness                                                         X
Java Client                                                                   X

Course Offering System:

•  Manages courses
•  Manages enrolment status of students in course
•  Balances number of students in course
•  Schedules faculty to teach
  




•  Design Data Model
•  Small Server
•  Run Use Case
  	
  




SQL ID   Executions   Percentage
         15,572,099   46%
         4,339,293    13%
         3,232,297    10%
         3,016,176    9%
         2,541,686    8%
         2,485,334    7%
         2,384,839    7%
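
The percentage column can be reproduced from the execution counts. The following is a minimal Python sketch, assuming the seven counts in the table are the complete set being compared. The SQL IDs are blank on the slide, so the labels `q1` to `q7` and the function name `execution_shares` are hypothetical placeholders.

```python
# Execution counts copied from the table above. The keys are
# hypothetical placeholders because the SQL IDs are not shown.
executions = {
    "q1": 15_572_099,
    "q2": 4_339_293,
    "q3": 3_232_297,
    "q4": 3_016_176,
    "q5": 2_541_686,
    "q6": 2_485_334,
    "q7": 2_384_839,
}


def execution_shares(counts):
    """Return each statement's share of total executions, in whole percent."""
    total = sum(counts.values())
    return {sql_id: round(100 * n / total) for sql_id, n in counts.items()}


shares = execution_shares(executions)
for sql_id, pct in shares.items():
    print(f"{sql_id}: {executions[sql_id]:>12,}  {pct}%")
```

Ranking statements this way shows which queries dominate the workload, which is the kind of usage data a document model can be designed around.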




{
    "_id": "8738728763872",
    "role": "Student",
    "user": {
        "id": "b7ed789f198a",
        "firstName": "Rick",
        "lastName": "Matin"
    },
    "course": {
        "dateRange": {
            "startDate": ISODate("2011-12-30T07:00:00Z"),
            "endDate": ISODate("2012-01-30T07:00:00Z")
        },
        "courseId": "734234274",
        "code": "MATH/101",
        "title": "Introduction to Mathematics"
    }
}
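
A document like this embeds the user and course details that a relational design would typically keep in separate tables. The following is a minimal Python sketch of that denormalization. The row layouts (`enrollment_row`, `user_row`, `course_row`) and the function `build_enrollment_document` are assumptions for illustration; the slides do not show the original relational schema.

```python
from datetime import datetime, timezone


def build_enrollment_document(enrollment, user, course):
    """Embed user and course details in a single enrollment document."""
    return {
        "_id": enrollment["id"],
        "role": enrollment["role"],
        "user": {
            "id": user["id"],
            "firstName": user["first_name"],
            "lastName": user["last_name"],
        },
        "course": {
            "dateRange": {
                "startDate": course["start"],
                "endDate": course["end"],
            },
            "courseId": course["id"],
            "code": course["code"],
            "title": course["title"],
        },
    }


# Hypothetical relational rows; the values mirror the sample document.
enrollment_row = {"id": "8738728763872", "role": "Student"}
user_row = {"id": "b7ed789f198a", "first_name": "Rick", "last_name": "Matin"}
course_row = {
    "id": "734234274",
    "code": "MATH/101",
    "title": "Introduction to Mathematics",
    "start": datetime(2011, 12, 30, 7, 0, tzinfo=timezone.utc),
    "end": datetime(2012, 1, 30, 7, 0, tzinfo=timezone.utc),
}

doc = build_enrollment_document(enrollment_row, user_row, course_row)
print(doc["user"]["firstName"], doc["course"]["code"])
```

With everything embedded, one read returns the enrollment together with the user and course fields, at the cost of duplicating those fields across documents.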
  	
•  Analyze our Data*
     –  Application API review
     –  Performance
     –  Call Type
     –  Query/Data Usage
•  Small Scope
  




* One of the pearls discovered

•  Install and Activate. We quickly spun up a blank
   virtual machine on Amazon EC2, and then installed
   MongoDB on it.
•  Populate with Data. We used approximately 300,000
   very simple records. We used a Python script to
   import the data.
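
The import script itself is not shown in the slides. The following is a minimal sketch of what a batched loader could look like, assuming made-up record fields and a generic `insert_batch` callable. All names here (`simple_records`, `batches`, `load`) are hypothetical, and a plain list stands in for the database so the sketch runs on its own.

```python
from itertools import islice


def simple_records(count):
    """Yield `count` very simple records (the field names are hypothetical)."""
    for i in range(count):
        yield {"_id": i, "name": f"record-{i}"}


def batches(iterable, size):
    """Split an iterable into lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load(records, insert_batch, batch_size=1000):
    """Send records to `insert_batch` in batches; return the number loaded."""
    loaded = 0
    for batch in batches(records, batch_size):
        insert_batch(batch)
        loaded += len(batch)
    return loaded


# Stand-in sink so the sketch runs without a database.
stored = []
total = load(simple_records(300_000), stored.extend)
print(total)
```

Against a real server, a PyMongo collection's `insert_many` method could be passed as `insert_batch` in place of `stored.extend`; that substitution is an assumption, not something the slides describe.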
  




•  MongoDB Farm Architecture
•  Chef/Puppet Scripts to
    –  Deploy new farm
    –  Add replica sets
•  Monitor Servers
•  High Availability
•  Disaster Recoverability
  




Configuration                      Results

A: Clients on Same Machine         Typical Response Time: 0-1.7 ms
                                   Maximum Throughput: 9,000 queries/sec (CPU-bound)
                                   Typical CPU Utilization: 100%

B: Clients and MongoDB on          Typical Response Time: 1.2-8.5 ms
   Separate Amazon                 Maximum Throughput: 12,000 queries/sec
                                   Typical CPU Utilization: 80%

C: Clients and MongoDB in          Typical Response Time: 1.2-10.6 ms
   Separate Availability Zones,    Maximum Throughput: 12,200 queries/sec
   but within One Amazon EC2       Typical CPU Utilization: 85%
   Region                          Approximately the same response time, throughput,
                                   and CPU utilization as Configuration B.

D: Clients and MongoDB in          Typical Response Time: 85.6-87.3 ms
   Different Amazon EC2 Regions    Maximum Throughput: 1,600 queries/sec
                                   Typical CPU Utilization: 2% (very low)
                                   The east coast-west coast network was the
                                   bottleneck in this configuration; the EC2
                                   instances were not stressed. Response times were
                                   much higher than when instances were located
                                   within a single Amazon EC2 region
                                   (configurations B & C).
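
The slides do not describe the test harness behind these figures. The following is a minimal single-client sketch of how a response-time range and a throughput number can be collected, assuming a hypothetical `fake_query` stand-in for one MongoDB read. The names `measure` and `fake_query` are illustrative only.

```python
import time


def measure(query, calls):
    """Run `query` `calls` times; report response-time range and throughput."""
    latencies_ms = []
    started = time.perf_counter()
    for _ in range(calls):
        t0 = time.perf_counter()
        query()
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - started
    return {
        "calls": calls,
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        "queries_per_sec": calls / elapsed,
    }


def fake_query():
    """Hypothetical stand-in for one MongoDB read."""
    sum(range(100))


result = measure(fake_query, 1000)
print(result)
```

A single sequential client like this is bounded by round-trip time, so reaching a maximum-throughput figure would generally require running several clients in parallel.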


Local Client

Same Zone

Same Region

Two Regions




Primary
 •    Data-driven data model
 •    Data-driven deployment architecture
 •    Hybrid deployments are possible (cloud, on premise)
 •    High latency between EC2 regions
 •    85% CPU: Mongo behavior changes

Secondary
 •  Operations/Developer/DBA trained
 •  Roadmap Development/operations/
  
Evaluating
  •    Fail fast
  •    Boyd loop
  •    Stand on the shoulders of others
  •    Have a practical use case

Paper vs. real life
  •    Timetable was more organic
  •    Nice list of evaluation items
  •    Weekly changes
  •    Usage data slowly came in
  •    Learned as we went

•  8 applications/services built or being built on top of Mongo
      –  More being discussed
•  Content management system moving to MongoDB
•  No sharding yet
•  Developer experience has been good so far
•  Definitely a learning curve in moving from relational schema
   design to document schema design
      –  Personal experience has been to do some analysis, build an
         end-to-end test, and then iterate
      –  Pay attention to access patterns
•  Plans to move towards cross-region deployments
  

Questions

End

Appendix

Webinar: How We Evaluated MongoDB as a Relational Database Replacement
