NOSQL
Why NOSQL
Aggregate Data Models
More Details on Data Models
WHY NOSQL
A NoSQL database provides much
more flexibility when it comes to
handling data. There is no
requirement to specify a schema before
you start working with the application.
Also, a NoSQL database doesn't
restrict the types of data
you can store together. It allows you
to add new types as your needs
change.
THE VALUE OF RELATIONAL DATABASES
Getting at Persistent Data
Concurrency
Integration
A (Mostly) Standard Model
GETTING AT PERSISTENT DATA
 Need to Store Large Amounts of Data
 Two Ways of Storing Data
 Main Memory – Limited in Space – Data is Lost on
Power Failure
 Backing Store – Large in Size – Slower
 Productivity Apps – Word Processor – File System
 Enterprise Applications – Database
CONCURRENCY
 Multiple Users Accessing the Same Data at a Time
 Mostly Modifying Data
 Transaction Handling
 Transactions Should be Rolled Back if Needed
 Example: Hotel Room Booking
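The hotel room booking case can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original slides: the `rooms` and `bookings` tables, the `book_room` function, and the guest names are all hypothetical. The booking record and the room update happen in one transaction, so when the room is already taken the booking record is rolled back as well.

```python
import sqlite3


def book_room(conn, room_id, guest):
    """Try to book a room; return True on success, False if it is taken."""
    try:
        # "with conn" commits if the block succeeds and rolls back
        # every statement in it if an exception escapes.
        with conn:
            conn.execute(
                "INSERT INTO bookings (room_id, guest) VALUES (?, ?)",
                (room_id, guest),
            )
            cur = conn.execute(
                "UPDATE rooms SET guest = ? WHERE id = ? AND guest IS NULL",
                (guest, room_id),
            )
            if cur.rowcount != 1:
                # Room missing or already booked: undo the INSERT above.
                raise RuntimeError("room unavailable")
        return True
    except RuntimeError:
        return False


# Hypothetical schema with a single free room.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (id INTEGER PRIMARY KEY, guest TEXT)")
conn.execute("CREATE TABLE bookings (room_id INTEGER, guest TEXT)")
conn.execute("INSERT INTO rooms (id, guest) VALUES (101, NULL)")
conn.commit()

first = book_room(conn, 101, "Asha")   # room is free
second = book_room(conn, 101, "Ravi")  # room is taken, transaction rolls back
booking_count = conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0]
```

After both calls only the first guest holds the room, and the failed attempt leaves no stray row in `bookings`.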
INTEGRATION
 Applications Written by Multiple Teams
 Collaboration
 Shared Database Integration
 Concurrency Control of Database handles
Multiple Applications
A (MOSTLY) STANDARD MODEL
 Relational databases have succeeded because
they provide the core benefits we outlined earlier
in a (mostly) standard way
 Vendors Might Differ but not the Benefits
IMPEDANCE MISMATCH
Though an RDBMS provides many advantages, it is
not perfect. One source of dissatisfaction for
developers is the “Impedance Mismatch”
Impedance Mismatch
The difference between the relational model and
the in-memory data structures
IMPEDANCE MISMATCH
 The relational data model organizes data into a
structure of tables and rows, or more properly,
relations and tuples
 The values in a relational tuple have to be
simple—they cannot contain any structure, such
as a nested record or a list
 If you want to use a richer in-memory data
structure, you have to translate it to a relational
representation to store it on disk
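The translation step described above can be illustrated with a small sketch. The `order` structure, its field names, and the `to_relational` function are hypothetical, chosen only to show one nested in-memory object being split into flat tuples for two tables.

```python
# A rich in-memory structure: one order with a nested list of line items.
order = {
    "id": 99,
    "customer": "Ann",
    "line_items": [
        {"product": "widget", "qty": 1},
        {"product": "gadget", "qty": 2},
    ],
}


def to_relational(order):
    """Flatten the nested order into simple tuples, one list per table."""
    # Row for a hypothetical "orders" table: only simple values allowed.
    orders_row = (order["id"], order["customer"])
    # Rows for a hypothetical "line_items" table; the order id acts as
    # a foreign key because a tuple cannot contain the list itself.
    line_item_rows = [
        (order["id"], item["product"], item["qty"])
        for item in order["line_items"]
    ]
    return orders_row, line_item_rows


orders_row, line_item_rows = to_relational(order)
```

Reading the order back requires the reverse step: joining the rows from both tables and rebuilding the nested structure.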
IMPEDANCE MISMATCH
 The Solution Proposed in the Early 2000's was OOP and OOD.
 OOD Offered a Solution to the Impedance Mismatch
 The Major Issue was Integration with RDBMS
 Object-Relational Mapping Frameworks for Integration, like HIBERNATE
 The Solution did not Prove Feasible
APPLICATION AND INTEGRATION
DATABASES
 Integration Database:
multiple applications, usually developed
by separate teams, store their data in a
common database. This improves
communication because all the applications
are operating on a consistent set of persistent
data
 Complexity has Increased
 Managing a Large Number of Applications is a Tedious Task
 In the 2000's the Paradigm Shift was “WEB
SERVICES”
APPLICATION AND INTEGRATION
DATABASES
 HTTP
 Flexibility in Exchanging the Data through HTTP
REQ/RESP
 XML or JSON
 Application-Specific Database instead of
Integration Database
ATTACK OF THE CLUSTERS
 Growth in the New Millennium in Applications
and Databases
 Y2K Problem
 Traffic on Websites Increased
 Social Media
 Log Data
 Mapping of Data
To handle this kind of increase, you have two
choices: up or out
SCALE UP or SCALE OUT
ATTACK OF THE CLUSTERS
 Scaling up implies bigger machines, more
processors, disk storage, and memory. But bigger
machines get more and more expensive, not to
mention that there are real limits as your size
increases. The alternative is to use lots of small
machines in a cluster.
 A cluster of small machines can use commodity
hardware and ends up being cheaper at these
kinds of scales. It can also be more resilient—
while individual machine failures are common,
the overall cluster can be built to keep going
despite such failures, providing high reliability.
ATTACK OF THE CLUSTERS
 Relational databases are not designed to be run
on clusters
 Clustered relational databases, such as
Oracle RAC or Microsoft SQL Server, work on
the concept of a shared disk subsystem
 This mismatch between relational databases and
clusters led some organizations to consider an
alternative route to data storage. Two companies
in particular were influential: Google and Amazon
 BigTable from Google and Dynamo from Amazon.
THE EMERGENCE OF NOSQL
 The term “NoSQL” first appeared in the late 90's
as the name of an open-source relational
database led by Carlo Strozzi
 This database stores its tables as ASCII files,
each tuple represented by a line with fields
separated by tabs
 The name comes from the fact that the database
doesn’t use SQL as a query language
 The database is manipulated through shell
scripts that can be combined into the usual UNIX
pipelines
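The storage layout and pipeline style described above can be imitated in a few lines of Python. This is only a rough sketch of the idea, not Strozzi's actual tooling: the sample table, the header line naming the fields, and the `rows`, `where` and `project` helpers are all assumptions made for illustration. Each stage is a generator, so stages chain together much like commands in a UNIX pipeline.

```python
# A table as plain ASCII text: one tuple per line, fields separated by tabs.
# The header line naming the fields is an assumption of this sketch.
table = "name\tcity\nAnn\tPune\nBob\tDelhi\nCara\tPune\n"


def rows(text):
    """Parse tab-separated lines into dictionaries keyed by field name."""
    lines = text.splitlines()
    header = lines[0].split("\t")
    for line in lines[1:]:
        yield dict(zip(header, line.split("\t")))


def where(records, field, value):
    """Keep only the records whose field equals value (a selection)."""
    return (r for r in records if r[field] == value)


def project(records, field):
    """Keep only one field from each record (a projection)."""
    return (r[field] for r in records)


# Stages are chained like: rows | where city == Pune | project name
names_in_pune = list(project(where(rows(table), "city", "Pune"), "name"))
```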
THE EMERGENCE OF NOSQL
 Relational databases use ACID transactions to
handle consistency across the whole database.
 NoSQL databases offer a range of options for
consistency and distribution
 Graph databases are one style of NoSQL
databases that uses a distribution model similar
to relational databases but offers a different data
model that makes it better at handling data with
complex relationships.
 NoSQL databases operate without a schema
 Useful when dealing with nonuniform data
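What nonuniform data looks like without a schema can be sketched as follows. The records and the `emails` helper are hypothetical. The point of the sketch is that the store accepts records with different fields, and the code that reads them has to cope with fields that may be absent.

```python
# Two records in the same collection with different sets of fields.
# No schema was declared before storing them.
customers = [
    {"id": 1, "name": "Ann", "email": "ann@example.com"},
    {"id": 2, "name": "Bob", "phones": ["555-0100", "555-0101"]},
]


def emails(records):
    """Collect email addresses, skipping records that have none."""
    return [r["email"] for r in records if "email" in r]


# A new kind of field can be added later without altering existing records.
customers.append({"id": 3, "name": "Cara", "email": "cara@example.com",
                  "loyalty_tier": "gold"})

known_emails = emails(customers)
```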
KEY POINTS
 Relational databases have been a successful
technology for twenty years, providing persistence,
concurrency control, and an integration mechanism.
 Application developers have been frustrated with the
impedance mismatch between the relational model
and the in-memory data structures.
 There is a movement away from using databases as
integration points towards encapsulating databases
within applications and integrating through services.
 The vital factor for a change in data storage was the
need to support large volumes of data by running on
clusters. Relational databases are not designed to run
efficiently on clusters.
 NoSQL is an accidental neologism. There is no
prescriptive definition—all you can make is an
observation of common characteristics.
KEY POINTS
 The common characteristics of NoSQL databases
are
 Not using the relational model
 Running well on clusters
 Open-source
 Built for the 21st century web estates
 Schemaless
 The most important result of the rise of NoSQL
is Polyglot Persistence – Various Data Storage
options are available
AGGREGATE DATA MODELS
 A data model is the model through which we
perceive and manipulate our data
 Data Model describes how we interact with the
data in the database
 Distinct from a storage model, which describes
how the database stores and manipulates the
data internally
 A developer might point to an entity-relationship
diagram of their database and refer to that as
their data model containing customers, orders,
products, and the like
AGGREGATE DATA MODELS
 Relational Model
 Consists of Rows and Columns in the form of Tables
 Each NoSQL solution has a different model that it
uses, which we put into four categories widely
used in the NoSQL ecosystem:
 Key-Value
 Document
 Column-Family
 Graph
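The four categories can be contrasted with plain Python structures. This is a rough sketch of the shape of the data in each model, not the API of any particular database; every key, field name and the `neighbors` helper are hypothetical.

```python
# Key-value: the store sees only an opaque value behind each key.
key_value = {"customer:1": b'{"name": "Ann", "city": "Pune"}'}

# Document: the store can see inside the value and query its fields.
document = {"customer:1": {"name": "Ann", "city": "Pune"}}

# Column-family: each row key maps to families of related columns.
column_family = {
    "customer:1": {
        "profile": {"name": "Ann", "city": "Pune"},
        "orders": {"order:99": "widget x1"},
    }
}

# Graph: small nodes connected by labeled edges.
nodes = {"ann": {"type": "customer"}, "widget": {"type": "product"}}
edges = [("ann", "PURCHASED", "widget")]


def neighbors(node, label):
    """Follow edges with the given label out of a node."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


city = document["customer:1"]["city"]            # field lookup in a document
profile = column_family["customer:1"]["profile"]  # read one column family
bought = neighbors("ann", "PURCHASED")            # traverse a relationship
```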
AGGREGATES
 Relational model takes the information that we
want to store and divides it into tuples (rows)
 A tuple is a limited data structure
 You cannot nest one tuple within another to get
nested records, nor can you put a list of values or
tuples within another.
 An aggregate is a collection of related objects that we
wish to treat as a unit
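Treating an aggregate as a unit can be sketched with a tiny in-memory store. The `AggregateStore` class, the `order:99` key and the order fields are hypothetical and stand in for whatever an aggregate-oriented database would provide. The whole order, including its nested address and line items, is written and read in a single operation.

```python
import copy


class AggregateStore:
    """Toy store that saves and loads each aggregate as one unit."""

    def __init__(self):
        self._data = {}

    def put(self, key, aggregate):
        # Copy so that later changes by the caller do not leak into the store.
        self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        return copy.deepcopy(self._data[key])


# One aggregate: an order with a nested record and a nested list,
# exactly the kind of structure a relational tuple cannot hold.
order = {
    "id": 99,
    "customer": "Ann",
    "shipping_address": {"city": "Pune"},
    "line_items": [
        {"product": "widget", "qty": 1},
        {"product": "gadget", "qty": 2},
    ],
}

store = AggregateStore()
store.put("order:99", order)     # one write for the whole aggregate
loaded = store.get("order:99")   # one read brings back every nested part
```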

No SQL databases basics module 1 vtu notes
