NoSQL Databases Basics: Module 1 (VTU Notes)

Why NoSQL
Aggregate Data Models
More Details on Data Models

WHY NOSQL
NoSQL databases provide much more flexibility
when it comes to handling data. There is no
requirement to specify a schema before you
start working with the application. A NoSQL
database also puts no restriction on the types
of data you can store together, and it lets you
add new types as your needs change.

THE VALUE OF RELATIONAL DATABASES
Getting at Persistent Data
Concurrency
Integration
A (Mostly) Standard Model

GETTING AT PERSISTENT DATA
 Need to store large amounts of data.
 Two ways of storing data:
 Main memory: limited in space; data is lost on
power failure
 Backing store: large in size, but slower
 Productivity apps (e.g., a word processor) use the file system
 Enterprise applications use a database

CONCURRENCY
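 Multiple users access the same data at the same time
 Often they are modifying that data
 Transactions coordinate these interactions
 A transaction should be rolled back if an error occurs
 Example: hotel room booking, where two users
must not end up booking the same room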
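
The rollback behaviour in the hotel example can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production booking system: the table names (bookings, booking_log), the helper book_room, and the guest names are all hypothetical, and the two bookings run one after the other on a single connection rather than truly concurrently.

```python
import sqlite3

def book_room(conn, room_no, guest):
    """Try to book a room; undo every write if the room is already taken."""
    try:
        # "with conn" commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute(
                "INSERT INTO booking_log (room_no, guest) VALUES (?, ?)",
                (room_no, guest),
            )
            # The PRIMARY KEY on room_no rejects a second booking of the same room.
            conn.execute(
                "INSERT INTO bookings (room_no, guest) VALUES (?, ?)",
                (room_no, guest),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (room_no INTEGER PRIMARY KEY, guest TEXT)")
conn.execute("CREATE TABLE booking_log (room_no INTEGER, guest TEXT)")
conn.commit()

first = book_room(conn, 101, "Asha")   # succeeds
second = book_room(conn, 101, "Ravi")  # fails, and its log entry is rolled back
log_count = conn.execute("SELECT COUNT(*) FROM booking_log").fetchone()[0]
booked_by = conn.execute(
    "SELECT guest FROM bookings WHERE room_no = 101"
).fetchone()[0]
```

The failed attempt leaves no trace in booking_log, which is the all-or-nothing property the slide refers to.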

INTEGRATION
 Applications Written by Multiple Teams
 Collaboration
 Shared Database Integration
 The database's concurrency control handles
multiple applications accessing it at once

A (MOSTLY) STANDARD MODEL
 Relational databases have succeeded because
they provide the core benefits we outlined earlier
in a (mostly) standard way
 Vendors might differ, but not the benefits

IMPEDANCE MISMATCH
Though an RDBMS provides many advantages, it is
still not perfect. One source of dissatisfaction
for developers is the “impedance mismatch”.
Impedance mismatch:
The difference between the relational model and
the in-memory data structures

IMPEDANCE MISMATCH
 The relational data model organizes data into a
structure of tables and rows, or more properly,
relations and tuples
 The values in a relational tuple have to be
simple—they cannot contain any structure, such
as a nested record or a list
 If you want to use a richer in-memory data
structure, you have to translate it to a relational
representation to store it on disk
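
The translation step described above can be sketched as follows. The order structure, its field names, and the helper to_rows are hypothetical and chosen only for illustration; the point is that one nested in-memory object becomes several flat tuples spread over two tables.

```python
# A rich in-memory structure: one order holding a nested list of line items.
order = {
    "order_id": 99,
    "customer": "Ann",
    "line_items": [
        {"product": "NoSQL Distilled", "qty": 1},
        {"product": "Refactoring", "qty": 2},
    ],
}

def to_rows(order):
    """Flatten the nested order into simple tuples for two relational tables."""
    # One row for the (hypothetical) orders table.
    order_row = (order["order_id"], order["customer"])
    # One row per line item; order_id acts as the foreign key back to orders.
    item_rows = [
        (order["order_id"], item["product"], item["qty"])
        for item in order["line_items"]
    ]
    return order_row, item_rows

order_row, item_rows = to_rows(order)
```

Reading the order back requires the reverse step: joining the rows and rebuilding the nested list.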

IMPEDANCE MISMATCH
 The proposed solution in the early 2000s was OOP and OOD.
 OOD offered a solution to the impedance mismatch
 The major issue was integration with the RDBMS
 Object-relational mapping frameworks such as Hibernate
ease that integration
 The solution did not prove feasible

APPLICATION AND INTEGRATION
DATABASES
 Integration database:
multiple applications, usually developed
by separate teams, store their data in a
common database. This improves
communication because all the applications
are operating on a consistent set of persistent
data
 Complexity increases
 Coordinating a growing number of applications
is a tedious task
 In the 2000s the paradigm shifted to “web
services”

APPLICATION AND INTEGRATION
DATABASES
 HTTP
 Flexibility in exchanging data through HTTP
request/response
 XML or JSON
 Application-specific database instead of an
integration database
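
As a small illustration of why JSON suits service integration, the sketch below serializes a nested structure to the text a service might send in an HTTP response body, then decodes it on the receiving side. The booking payload and its field names are hypothetical, and no actual HTTP call is made.

```python
import json

# A nested structure, as one application might hold it in memory.
booking = {
    "guest": "Asha",
    "rooms": [
        {"room_no": 101, "nights": 2},
        {"room_no": 102, "nights": 1},
    ],
}

# Sender side: turn the structure into text for the response body.
payload = json.dumps(booking)

# Receiver side: rebuild the same nested structure from the text.
decoded = json.loads(payload)
total_nights = sum(room["nights"] for room in decoded["rooms"])
```

The nested list survives the round trip unchanged, so the two applications never need to agree on a shared table layout.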

ATTACK OF THE CLUSTERS
 Growth in the new millennium in the number of
applications and databases
 Y2K Problem
 Traffic on Websites Increased
 Social Media
 Log Data
 Mapping of Data
To handle this kind of increase, you have two
choices: up or out
SCALE UP or SCALE OUT

 Scaling up implies bigger machines, more
processors, disk storage, and memory. But bigger
machines get more and more expensive, not to
mention that there are real limits as your size
increases. The alternative is to use lots of small
machines in a cluster.
 A cluster of small machines can use commodity
hardware and ends up being cheaper at these
kinds of scales. It can also be more resilient—
while individual machine failures are common,
the overall cluster can be built to keep going
despite such failures, providing high reliability.

 Relational databases are not designed to be run
on clusters
 Clustered relational databases, such as the
Oracle RAC or Microsoft SQL Server, work on
the concept of a shared disk subsystem
 This mismatch between relational databases and
clusters led some organizations to consider an
alternative route to data storage. Two companies
in particular, Google and Amazon, have been
very influential
 BigTable from Google and Dynamo from Amazon.

THE EMERGENCE OF NOSQL
 The term “NoSQL” first appeared in the late 90s
as the name of an open-source relational
database led by Carlo Strozzi
 This database stores its tables as ASCII files,
each tuple represented by a line with fields
separated by tabs
 The name comes from the fact that the database
doesn’t use SQL as a query language
 The database is manipulated through shell
scripts that can be combined into the usual UNIX
pipelines
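
To picture the "one tuple per line, fields separated by tabs" layout, here is a short Python sketch. The sample rows are invented, and the code only mimics the general idea of a line-oriented, tab-separated table; it is not Strozzi's actual tooling or exact file layout, which is driven from shell scripts.

```python
import csv
import io

# Hypothetical table contents: one tuple per line, fields separated by tabs.
table_text = "1\tAnn\tBangalore\n2\tBob\tMysore\n"

# Parse each line into a list of fields.
rows = list(csv.reader(io.StringIO(table_text), delimiter="\t"))

# A simple "query" without SQL: cities of people whose name starts with "A".
cities = [row[2] for row in rows if row[1].startswith("A")]
```

Because every record is a plain text line, the same filtering could be done with ordinary UNIX text tools chained in a pipeline.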

THE EMERGENCE OF NOSQL
 Relational databases use ACID transactions to
handle consistency across the whole database.
 NoSQL databases offer a range of options for
consistency and distribution
 Graph databases are one style of NoSQL
databases that use a distribution model similar
to relational databases but offer a different data
model that makes them better at handling data with
complex relationships.
 NoSQL databases operate without a schema
 Useful when dealing with nonuniform data
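
A rough sketch of what "schemaless" means in practice, using a plain Python dictionary as a stand-in for a NoSQL store. The keys, field names, and the helper field_names are hypothetical; real databases expose their own APIs.

```python
# A dictionary standing in for a schemaless store: no table definition needed.
store = {}

# Two records with different fields can be stored side by side.
store["u1"] = {"name": "Ann", "email": "ann@example.com"}
store["u2"] = {
    "name": "Bob",
    "phones": ["555-0100", "555-0101"],
    "loyalty_tier": "gold",
}

def field_names(store):
    """Collect every field name that appears in at least one record."""
    return sorted({field for record in store.values() for field in record})

# The application, not the database, must cope with missing fields.
with_email = [key for key, record in store.items() if "email" in record]
```

The flexibility comes at a cost: code that reads the data carries an implicit schema and has to check which fields are present.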

KEY POINTS
 Relational databases have been a successful
technology for twenty years, providing persistence,
concurrency control, and an integration mechanism.
 Application developers have been frustrated with the
impedance mismatch between the relational model
and the in-memory data structures.
 There is a movement away from using databases as
integration points towards encapsulating databases
within applications and integrating through services.
 The vital factor for a change in data storage was the
need to support large volumes of data by running on
clusters. Relational databases are not designed to run
efficiently on clusters.
 NoSQL is an accidental neologism. There is no
prescriptive definition—all you can make is an
observation of common characteristics.

KEY POINTS
 The common characteristics of NoSQL databases
are
 Not using the relational model
 Running well on clusters
 Open-source
 Built for the 21st century web estates
 Schemaless
 The most important result of the rise of NoSQL
is polyglot persistence: using different data
stores in different circumstances

AGGREGATE DATA MODELS
 A data model is the model through which we
perceive and manipulate our data
 Data Model describes how we interact with the
data in the database
 Distinct from a storage model, which describes
how the database stores and manipulates the
data internally
 A developer might point to an entity-relationship
diagram of their database and refer to that as
their data model containing customers, orders,
products, and the like

AGGREGATE DATA MODELS
 Relational Model
 Consists of Rows and Columns in the form of Tables
 Each NoSQL solution uses a different data model,
which we put into four categories widely
used in the NoSQL ecosystem:
 Key-Value
 Document
 Column-Family
 Graph

AGGREGATES
 The relational model takes the information that we
want to store and divides it into tuples (rows)
 A tuple is a limited data structure
 You cannot nest one tuple within another to get
nested records, nor can you put a list of values or
tuples within another.
 An aggregate is a collection of related objects that we
wish to treat as a unit
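
The idea of treating an aggregate as a unit can be sketched with a tiny in-memory store. The class AggregateStore, its put/get methods, and the customer fields are hypothetical names used only for illustration; they do not correspond to any particular NoSQL product.

```python
import copy

class AggregateStore:
    """Toy key-value store that saves and loads whole aggregates at once."""

    def __init__(self):
        self._data = {}

    def put(self, key, aggregate):
        # Store a private copy so the aggregate is written as one unit.
        self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        # Return the complete aggregate, nested parts included.
        return copy.deepcopy(self._data[key])

# One customer aggregate: nested records and a list, which a tuple cannot hold.
customer = {
    "id": 1,
    "name": "Martin",
    "billing_addresses": [{"city": "Chicago"}, {"city": "Bangalore"}],
}

store = AggregateStore()
store.put("customer:1", customer)
fetched = store.get("customer:1")
address_cities = [addr["city"] for addr in fetched["billing_addresses"]]
```

A single get returns the customer together with its addresses, where a relational design would need a join across two tables.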
