BuzzNumbers Presentation: Moving From SQL Server to MongoDB
Today's Presentation
  • Problems faced with Social Media Monitoring/Analytics
  • Why choose NoSQL over SQL
  • Why choose MongoDB
  • NOSQL vs SQL Schema Design
  • Infinite scalability with commodity hardware & .NET
  • Why we still use .NET (why not Ruby/Java/Python)
  • Lessons Learned
NOSQL at BuzzNumbers: About BuzzNumbers
About BuzzNumbers
  • SaaS Web Product Company
      • Web and Social Media Analytics
  • Collect “big data” web content
      • Near-realtime data capture
      • News, Blogs, Social Media, etc.
      • Scraping, APIs, Feeds
  • Analytics & Business Intelligence
      • BI, Text, Sentiment, Locations, NLP, Machine Learning
BuzzNumbers Project Team
  • Nick Holmes a Court - @nickhac
  • Brett Anderson - @brehtt
  • Steve Casey - @stevencasey
  • Jacinto Santamaria
  • Chris Fulstow - @chrisfulstow
  • Josie Kidd - @jose9
NOSQL at BuzzNumbers: Problems Faced at BuzzNumbers
Problems faced at BuzzNumbers
  • Large and fast-growing DB tables
  • Lots of reads/writes from data collection 24/7
  • Massive table scans for user reports (< 3 sec SLA)
  • Large joins (10+ tables) with nested views
  • Complex queries (Aggregates, Where’s, FullText)
  • FullText search indexes needed real-time updates
  • Read/write contention
  • Rapid index fragmentation, slow rebuilds
  • DB locks occurring (with no implicit transactions)
  • Blocking transactions (both small/large tables)
Outgrew SQL Server Enterprise 2008
  • “Free” software from MSFT via BizSpark
  • Tried everything with SQL Enterprise
      • Significant SQL performance tuning
      • Dirty reads (nolock), offline index rebuilds
      • Replication / clustering / multi-instance
  • Problems
      • Schema changes impossible with uptime requirements
      • DBA tasks made the system unavailable for hours/days
      • Hardware / SQL DBA got very expensive
      • Web users experienced annoying, unnecessary waits on simple queries that were blocked because of joins
BuzzNumbers NOSQL Presentation: Why NOSQL over SQL
What is NOSQL
  • New generation of “databases”
  • “Not Only SQL”, mostly open source
  • NOSQL: distributed databases designed to deliver
      • Distributed “big data” storage
      • Distributed processing of queries/calculations
  • NOSQL examples include
      • Google: BigTable
      • Yahoo: Hadoop (30k+ nodes)
      • Facebook: Cassandra
      • FourSquare: MongoDB
Why NoSQL over SQL
SQL
  • Guaranteed consistency
  • Transactions
  • Schemas / DataTypes
  • Joins / Foreign Keys
  • TSQL/PL-SQL (Views, Procs)
  • Scale up (hardware)
  • Many benefits, including
      • Ease of use
      • Many developers skilled in SQL
      • Trusted for decades / proven
NoSQL
  • Eventual consistency
  • No transaction support
  • Key/Value data (mostly)
  • Flat data (no joins)
  • Key lookups / MapReduce / code
  • Scale out (distributed)
  • Many benefits, including
      • Performance / scale
      • Lower license costs
      • Solves Web2 problems
Why NoSQL over SQL: CAP Theorem
  • Consistency
  • Availability
  • Partitioning (partition tolerance)
  • Only 2 of 3 are possible
      • Consistency / Availability: RDBMS
      • Availability / Partitioning: NOSQL
      • Consistency / Partitioning: availability issues (no one wants this)
BuzzNumbers NOSQL Presentation: Why MongoDB for NOSQL?
NOSQL Providers
Who uses Mongo?
Why Mongo
  • Proven for multiple usage scenarios
  • High performance (eventual consistency)
  • Data stored in JSON (not only Key/Value)
  • Supports multiple indexes (anywhere in JSON)
  • Easy to install, easy to use (Linux/Windows)
  • Easy to scale for high-volume writes (Sharding)
  • Easy to scale for high-volume reads (Replica Sets)
  • Automatic failover and redundancy (Replica Sets)
  • REST interface and drivers for Ruby/.NET/Java/etc.
  • Easy to query via multiple techniques
      • Key/Value, Mongo Query, JavaScript, MapReduce
BuzzNumbers NOSQL Presentation: Moving from SQL Schema to No-Schema
RDBMS Schema (Tables) vs Mongo Collection (JSON)
RDBMS Schema vs Mongo JSON Document
  • One document per website per day
  • Pre-aggregate SUM/COUNT/AVG calculations using UPSERT
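The pre-aggregation idea can be sketched roughly as follows. This is a minimal illustration, not the BuzzNumbers implementation: the field names are borrowed from the sample document in the editor's notes, while `buildVisitUpsert` and `applyUpsert` are hypothetical helpers, and `applyUpsert` is only an in-memory stand-in for the server that understands equality filters and `$inc`.

```javascript
// Build the arguments for an upsert that bumps a pre-aggregated counter.
// Assumption: one document per website per day, keyed by WebsiteID + DateSummary.
function buildVisitUpsert(websiteId, dateSummary, pageName) {
  const filter = { WebsiteID: websiteId, DateSummary: dateSummary };
  const update = {
    // Dotted path reaches into the nested summary; $inc creates it if missing.
    $inc: { [`PageVisitSummary.${pageName}.VisitCount`]: 1 },
  };
  return { filter, update, options: { upsert: true } };
}

// Hypothetical in-memory stand-in for the database, for illustration only.
// It supports equality filters and $inc, which is all this sketch needs.
function applyUpsert(collection, { filter, update }) {
  let doc = collection.find((d) =>
    Object.keys(filter).every((key) => d[key] === filter[key])
  );
  if (!doc) {
    // Upsert: no match, so insert a new document seeded from the filter.
    doc = { ...filter };
    collection.push(doc);
  }
  for (const [path, amount] of Object.entries(update.$inc)) {
    const keys = path.split(".");
    let node = doc;
    for (const key of keys.slice(0, -1)) {
      node = node[key] = node[key] || {};
    }
    const leaf = keys[keys.length - 1];
    node[leaf] = (node[leaf] || 0) + amount;
  }
  return doc;
}

const collection = [];
applyUpsert(collection, buildVisitUpsert(12345, "2011-09-22", "Home"));
applyUpsert(collection, buildVisitUpsert(12345, "2011-09-22", "Home"));
applyUpsert(collection, buildVisitUpsert(12345, "2011-09-22", "About"));
console.log(JSON.stringify(collection));
```

Against a live server, the same filter, update and options would be passed to the collection's `updateOne` call instead of the stand-in. The point of the design is that reports read one small, already-summed document rather than scanning raw rows.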
RDBMS Schema vs Mongo JSON Document: store line items with rich data as nested arrays, and use JavaScript or MapReduce to query them.
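To make the nested-array querying concrete, here is a small map/reduce sketch in plain JavaScript. It assumes documents shaped like the sample in the editor's notes, with `PageVisits` held as an array of line items; the function names are hypothetical, and the grouping loop stands in for what a MapReduce engine would do between the two steps.

```javascript
// Sample documents: one per website per day, with line items nested inside.
const docs = [
  {
    WebsiteID: 12345,
    DateSummary: "2011-09-22",
    PageVisits: [
      { UserID: 1, PageName: "Home" },
      { UserID: 2, PageName: "About" },
      { UserID: 3, PageName: "Home" },
    ],
  },
  {
    WebsiteID: 12345,
    DateSummary: "2011-09-23",
    PageVisits: [{ UserID: 1, PageName: "Home" }],
  },
];

// Map step: emit one [key, value] pair per nested line item.
function mapVisits(doc) {
  return doc.PageVisits.map((visit) => [visit.PageName, 1]);
}

// Reduce step: collapse all values emitted for one key into a total.
function reduceCounts(values) {
  return values.reduce((total, value) => total + value, 0);
}

// Group emitted pairs by key, then reduce each group.
function mapReduce(documents) {
  const grouped = new Map();
  for (const doc of documents) {
    for (const [key, value] of mapVisits(doc)) {
      if (!grouped.has(key)) grouped.set(key, []);
      grouped.get(key).push(value);
    }
  }
  const result = {};
  for (const [key, values] of grouped) {
    result[key] = reduceCounts(values);
  }
  return result;
}

const visitsPerPage = mapReduce(docs);
console.log(visitsPerPage);
```

Because each line item travels with its parent document, no join is needed to answer "visits per page"; the cost is that the query logic lives in code rather than in a declarative SQL statement.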
Basic SQL vs Mongo Syntax
  • SQL:   SELECT * FROM Clients
    Mongo: db.clients.find()
  • SQL:   SELECT * FROM Clients WHERE ClientID = 1
    Mongo: db.clients.find({"ClientID": 1})
  • SQL:   INSERT INTO Clients (ClientID, Name) VALUES (1, 'ACME')
    Mongo: db.clients.insert({"ClientID": 1, "Name": "ACME"})
  • SQL:   CREATE TABLE / ALTER TABLE
    Mongo: just start inserting, e.g. db.clients.insert({JSON HERE})
  • SQL:   CREATE INDEX
    Mongo: db.clients.ensureIndex({"ClientID": 1, "Name": 1})
BuzzNumbers NOSQL Presentation: Infinite Scale with .NET and NOSQL
Infinite Scale with .NET
  • Use .NET for rapid product development
      • Web applications (IIS, ASP.NET, user databases)
      • Server applications (scraping, apps, services, data)
      • Scheduled tasks / backend jobs
  • Use open source for infinite scale on Linux
      • MongoDB for big data storage
      • SOLR (distributed Lucene) for full-text indexing
      • .NET drivers available for Mongo/SOLR
Infinite Scale with .NET
  • Cloud hosting for low-cost scale
      • Rackspace Cloud ($200 p/m per 4GB-RAM server)
      • Windows and Ubuntu: image/clone/API support
      • Zabbix monitoring: notify when near capacity
      • Amazon/Heroku/dotCloud as alternatives
  • Tips to deliver fantastic performance at scale
      • Indexes MUST fit in RAM (disk reads are slow)
      • SSDs are worth the extra price
      • 4GB RAM / 160GB disk seems to be the optimum price/performance per node in a distributed system
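The "indexes must fit in RAM" tip lends itself to a quick back-of-envelope check. The sketch below is only an illustration: the entry counts, key size, per-entry overhead and the fraction of RAM reserved for everything else are all assumed numbers, and both function names are hypothetical.

```javascript
// Rough index size: every entry costs its key bytes plus some fixed overhead.
// The overhead figure is an assumption, not a measured MongoDB value.
function estimateIndexBytes(entryCount, avgKeyBytes, perEntryOverheadBytes) {
  return entryCount * (avgKeyBytes + perEntryOverheadBytes);
}

// True when all indexes together fit in the RAM left over after reserving
// a fraction for the OS, connections and the working set of documents.
function indexesFitInRam(indexSizesBytes, ramBytes, reservedFraction) {
  const totalIndexBytes = indexSizesBytes.reduce((sum, size) => sum + size, 0);
  const usableBytes = ramBytes * (1 - reservedFraction);
  return totalIndexBytes <= usableBytes;
}

const GIB = 1024 ** 3;
const nodeRamBytes = 4 * GIB; // the 4GB node size mentioned on the slide

// Assumed workload: 16-byte keys with 24 bytes of overhead per entry.
const smallIndex = estimateIndexBytes(50000000, 16, 24);
const largeIndex = estimateIndexBytes(100000000, 16, 24);

const smallFits = indexesFitInRam([smallIndex], nodeRamBytes, 0.25);
const largeFits = indexesFitInRam([largeIndex], nodeRamBytes, 0.25);
console.log({ smallIndex, largeIndex, smallFits, largeFits });
```

Under these assumptions, an index of 50 million entries fits on a single 4GB node while one of 100 million does not, which is the point at which adding shards becomes the cheaper option. On a live deployment, the shell's `db.collection.totalIndexSize()` reports the real figure to use in place of the estimate.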
BuzzNumbers NOSQL Presentation: Why we stay with .NET?
Why we stay with .NET
  • Visual Studio best IDE!!!
  • SQL Server great database for most data
  • Proven tech stack (low corporate risk)
  • Lots of support (MSFT and consultants)
  • Large online community with code samples
  • Many open source libraries
  • ASP.NET MVC RAZOR is RAD
  • Non-complex sysadmin for Windows servers
  • Drivers/integration available for most OSS projects
  • Lots of Agile/Scrum/TDD/CI/Project Management tools
  • Lots of smart .NET web developers & engineers
BuzzNumbers NOSQL Presentation: Lessons Learned
Lessons Learned
  • “Big Data” is not 100M records, but 1BN+
  • Don’t scale until you need to (premature optimisation costs, big time)
  • SQL RDBMS solves most problems, but scale-up costs are prohibitive for startups, so plan in advance when you might need to switch
  • Mixing SQL for SmallData and NOSQL for BigData delivers both ease/speed of development and performance
  • Mongo/SOLR works well to solve specific performance problems
  • Not all problems are equal: optimise each solution per performance problem
  • Don’t go NOSQL unless you absolutely need to
      • Very early technology with lots of learning overhead, risks and production issues
      • Skilled .NET/Mongo/SOLR engineers are very hard to find
      • If client/data segmentation is possible, multiple SQL instances can deliver
  • Ensure indexes fit in memory
  • Spend time planning your schema in advance, based on query requirements
BuzzNumbers NOSQL Presentation: Interested to learn more?
Thanks for your time
  • Speak with one of the Buzz Team tonight
  • Join our team? We’re hiring!
      • Web Developers
      • Software Engineers
      • UX / Web Designers
      • Immediate and future roles… Talk to us!

Editor's Notes

  • #20
    {
      "WebsiteID": 12345,
      "DomainName": "buzznumbershq.com",
      "DateSummary": "2011-09-22",
      "UserIDSummary": [1, 2, 3, 4, 5, 6, 7, 8],
      "PageVisitSummary": {
        "Home": {"VisitCount": 20000, "Uniques": 55},
        "About": {"VisitCount": 1667, "Uniques": 44},
        "Products": {"VisitCount": 1223, "Uniques": 33},
        "Contact": {"VisitCount": 50, "Uniques": 22}
      },
      "PageVisits": [
        {"UserID": 1, "PageName": "Home"},
        {"UserID": 2, "PageName": "About"},
        {"UserID": 3, "PageName": "Products"},
        {"UserID": 4, "PageName": "Contact"}
      ]
    }
    PageVisits continues in the same pattern (etc.).
      • Proven tech stack (low risk)
      • Lots of smart web developers/engineers
      • Visual Studio best IDE by miles
      • Lots of support (MSFT and consultants)
      • Large online community with code samples
      • Many open source libraries
      • ASP.NET MVC RAZOR is RAD
      • Low levels of sysadmin
      • Drivers/integration available for most OSS
      • Lots of Agile/Scrum/TDD/CI/Project Management tools