PRESENTATION
     ON
    DATA
WAREHOUSING



        Presented By:
        Jagnesh Chawla
        Manpreet Singh
        Mintu
CONTENTS:
 Meaning of data warehousing
 Benefits of data warehousing

 Problems

 Architecture of data warehouse

 Main components

 Data flows

 Tools and technologies

 Data Mart
MEANING:
   Data warehousing combines data management and data
    analysis




   Goal: to integrate enterprise-wide corporate
    data into a single repository from which users can
    easily run queries
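
The goal above can be illustrated with a minimal sketch, assuming two hypothetical source systems (a sales system and a support system) whose rows are invented for this example. Both are loaded into one in-memory SQLite database, so a single query can span data that originally lived in separate places.

```python
import sqlite3

# Hypothetical rows from two operational systems (invented for illustration).
sales_system = [("C001", "Asha", 120.0), ("C002", "Ravi", 75.5)]
support_system = [("C001", 3), ("C003", 1)]


def build_repository(sales_rows, support_rows):
    """Load both sources into one in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer_id TEXT, name TEXT, amount REAL)")
    conn.execute("CREATE TABLE support (customer_id TEXT, tickets INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales_rows)
    conn.executemany("INSERT INTO support VALUES (?, ?)", support_rows)
    conn.commit()
    return conn


def customer_overview(conn):
    """Run one query across data that came from two different systems."""
    sql = """
        SELECT s.customer_id, s.name, s.amount, COALESCE(t.tickets, 0)
        FROM sales AS s
        LEFT JOIN support AS t ON t.customer_id = s.customer_id
        ORDER BY s.customer_id
    """
    return conn.execute(sql).fetchall()


warehouse = build_repository(sales_system, support_system)
overview = customer_overview(warehouse)
```

A real warehouse would use a dedicated DBMS and a load process rather than in-memory tables; the point here is only the single point of access for queries.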
BENEFITS:
   A major benefit of data warehousing is a high
    return on investment.




   Increased productivity of corporate decision-makers
PROBLEMS:
 Underestimation of resources for data loading
 Hidden problems with source systems
 Required data not captured
 Increased end-user demands
 Data homogenization
 High demand for resources
 Data ownership
 High maintenance
 Long-duration projects
 Complexity of integration
ARCHITECTURE:

   [Figure: Typical architecture of a data warehouse]
   Components shown: operational data sources 1 to n and the
   operational data store (ODS); load manager; warehouse manager;
   DBMS holding meta-data, detailed data, lightly summarized data,
   and highly summarized data; archive/backup data; query manager;
   end-user access tools: reporting, query, application development,
   and EIS (executive information system) tools, OLAP (online
   analytical processing) tools, and data mining tools
MAIN COMPONENTS:
 Operational data sourcesfor the DW is
  supplied from mainframe operational data held in
  first generation hierarchical and network databases,
  departmental data held in proprietary file systems,
  private data held on workstaions and private serves
  and external systems such as the Internet,
  commercially available DB, or DB assoicated with
  and organization’s suppliers or customers
 Operational datastore(ODS)is a
  repository of current and integrated operational data
  used for analysis. It is often structured and supplied
  with data in the same way as the data warehouse, but
  may in fact simply act as a staging area for data to be
  moved into the warehouse
MAIN COMPONENTS:
 Query manager: also called the backend
 component, it performs all the operations
 associated with the management of user queries.
 These operations include directing queries
 to the appropriate tables and scheduling the
 execution of queries
 End-user access tools fall into
 five main groups: data reporting and query tools,
 application development tools, executive
 information system (EIS) tools, online analytical
 processing (OLAP) tools, and data mining tools
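
The query manager's two duties named above, directing queries to the appropriate tables and scheduling them, can be sketched as follows. The table names, the notion of a query "grain", and the cheapest-first ordering rule are all assumptions made for illustration; the slides do not specify how routing or scheduling is done.

```python
# Hypothetical table names for the three levels of data held in the warehouse.
TABLES_BY_GRAIN = {
    "year": "highly_summarized_sales",
    "month": "lightly_summarized_sales",
    "day": "detailed_sales",
}

# Assumed relative cost of querying each level (lower runs first).
COST_BY_GRAIN = {"year": 0, "month": 1, "day": 2}


def route_query(grain):
    """Direct a query to the table whose level of detail matches the request."""
    if grain not in TABLES_BY_GRAIN:
        raise ValueError(f"unknown grain: {grain!r}")
    return TABLES_BY_GRAIN[grain]


def schedule(requests):
    """Order queries so that cheaper, more summarized ones run first.

    This ordering rule is an assumption, not something stated in the slides.
    """
    return sorted(requests, key=lambda grain: COST_BY_GRAIN[grain])


# Build an execution plan of (grain, table) pairs for three incoming queries.
plan = [(grain, route_query(grain)) for grain in schedule(["day", "year", "month"])]
```

Routing yearly questions to the highly summarized table avoids scanning detailed data when an aggregate already exists.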
DATA FLOW:
 Inflow- The processes associated with the
  extraction, cleansing, and loading of the data
  from the source systems into the data warehouse.
 Upflow- The processes associated with adding value
  to the data in the warehouse through
  summarizing, packaging, and
  distribution of the data
 Downflow- The processes associated with
  archiving and backing-up of data in the
  warehouse
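
As a rough sketch of upflow and downflow, the example below summarizes detailed rows at two levels (by month and by year) and then splits off old rows for archiving. The sample rows, the cutoff date, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical detailed rows: (ISO date, amount).
detailed = [
    ("2023-01-05", 10.0),
    ("2023-01-20", 5.0),
    ("2023-02-01", 7.5),
    ("2024-03-09", 2.0),
]


def summarize(rows, key_length):
    """Upflow: total the amounts by a prefix of the ISO date.

    key_length=7 groups by month ("YYYY-MM"); key_length=4 groups by year.
    """
    totals = defaultdict(float)
    for date, amount in rows:
        totals[date[:key_length]] += amount
    return dict(totals)


def archive_before(rows, cutoff):
    """Downflow: split rows into (kept, archived) around a cutoff date.

    ISO-formatted dates compare correctly as plain strings.
    """
    kept = [row for row in rows if row[0] >= cutoff]
    archived = [row for row in rows if row[0] < cutoff]
    return kept, archived


lightly_summarized = summarize(detailed, 7)
highly_summarized = summarize(detailed, 4)
current, archived = archive_before(detailed, "2024-01-01")
```

The two summary levels correspond to the lightly and highly summarized data shown in the architecture figure.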
DATA FLOW:
   Outflow- The processes associated with making the
    data available to the end-users.




   Meta-flow- The processes associated with the
    management of the meta-data
   [Figure: Information flows of a data warehouse]
   Flows labelled: inflow, upflow, downflow, outflow, and meta-flow.
   Components shown: operational data sources 1 to n and the
   operational data store (ODS); load manager; warehouse manager;
   DBMS holding meta-data, detailed data, lightly summarized data,
   and highly summarized data; archive/backup data; query manager;
   end-user access tools, including data mining tools
TOOLS AND TECHNOLOGIES:
   The critical steps in the construction of a data
    warehouse:


a. Extraction

b. Cleansing

c. Transformation
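
The three steps above can be sketched as three small functions chained together. The CSV export, its column names, and the cleansing rules (drop rows with a missing key or a non-numeric amount) are assumptions chosen for illustration, not rules from the slides.

```python
import csv
import io

# Hypothetical export from a source system (invented for illustration).
RAW_EXPORT = """customer_id,name,amount
C001, asha ,120.50
C002,Ravi,not-a-number
,Meena,10
C003,JOHN,75
"""


def extract(text):
    """Extraction: read raw records from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(records):
    """Cleansing: drop records with a missing key or a non-numeric amount."""
    clean = []
    for record in records:
        if not record["customer_id"].strip():
            continue
        try:
            float(record["amount"])
        except ValueError:
            continue
        clean.append(record)
    return clean


def transform(records):
    """Transformation: convert to the target format (trimmed text, float amounts)."""
    return [
        (
            record["customer_id"].strip(),
            record["name"].strip().title(),
            float(record["amount"]),
        )
        for record in records
    ]


rows = transform(cleanse(extract(RAW_EXPORT)))
```

Keeping the steps as separate functions mirrors the slide: each one can be replaced by a dedicated tool without touching the others.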
TOOLS AND TECHNOLOGIES:
   After these critical steps, loading the results into
    the target system can be carried out either by
    separate products or by a single integrated product.
    Such tools fall into the following categories:

   code generators

   database data replication tools

   dynamic transformation engines
MANAGEMENT TOOLS:
   To handle the various types of meta-data and the
    day-to-day operations of the data warehouse, the
    administration and management tools must be
    capable of supporting the following tasks:

   Monitoring data loading from multiple sources

   Data quality and integrity checks

   Managing and updating meta-data

   Monitoring database performance to ensure efficient query
    response times and resource utilization
 Auditing data warehouse usage to provide user
  chargeback information
 Replicating, subsetting, and distributing data

 Maintaining efficient data storage management

 Purging data

 Archiving and backing-up data

 Implementing recovery following failure

 Security management
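
One of the tasks listed above, data quality and integrity checks, might look like the following minimal sketch. The three rules (known customer, non-negative amount, no duplicate rows) and the sample batch are illustrative assumptions.

```python
def check_quality(rows, known_customers):
    """Return a list of (row_index, problem) pairs for a batch of loaded rows.

    Each row is a (customer_id, amount) tuple. The three rules below are
    illustrative assumptions, not rules taken from the slides.
    """
    problems = []
    seen = set()
    for index, (customer_id, amount) in enumerate(rows):
        if customer_id not in known_customers:
            problems.append((index, "unknown customer"))  # referential integrity
        if amount < 0:
            problems.append((index, "negative amount"))  # range check
        if (customer_id, amount) in seen:
            problems.append((index, "duplicate row"))  # uniqueness
        seen.add((customer_id, amount))
    return problems


# Hypothetical batch arriving from the load process.
batch = [("C001", 10.0), ("C009", 5.0), ("C001", 10.0), ("C002", -1.0)]
report = check_quality(batch, known_customers={"C001", "C002"})
```

A management tool would typically log such a report after each load so that problems can be traced back to the source system.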
DATA MART:
   Data mart a subset of a data warehouse that
    supports the requirements of particular
    department or business function

   The characteristics that differentiate data marts
    and data warehouses include:


   A data mart focuses on only the requirements of
    users associated with one department or business
    function
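
A minimal sketch of this idea, assuming a hypothetical `warehouse_sales` table: the data mart is built as a separate, summarized table holding only one department's rows. The table layout, the `mart_` naming scheme, and the sample data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "ads", 40.0),
        ("finance", "audit", 90.0),
        ("marketing", "events", 60.0),
    ],
)


def create_data_mart(conn, department):
    """Copy one department's summarized rows into its own table."""
    # The department name becomes part of a table name, so only accept
    # simple alphabetic names.
    if not department.isalpha():
        raise ValueError("department must be alphabetic")
    table = f"mart_{department}"
    conn.execute(f"CREATE TABLE {table} (product TEXT, total REAL)")
    conn.execute(
        f"INSERT INTO {table} "
        "SELECT product, SUM(amount) FROM warehouse_sales "
        "WHERE department = ? GROUP BY product",
        (department,),
    )
    return table


mart = create_data_mart(conn, "marketing")
mart_rows = conn.execute(
    f"SELECT product, total FROM {mart} ORDER BY product"
).fetchall()
```

Users in the marketing department then query only their own small table rather than the full warehouse.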
   [Figure: Typical data warehouse and data mart architecture]
   First tier: the data warehouse (operational data sources 1 to n,
   operational data store (ODS), load manager, warehouse manager,
   DBMS holding meta-data, detailed data, lightly summarized data,
   and highly summarized data, archive/backup data, query manager)
   Second tier: data marts holding summarized data in a relational
   database or in a multi-dimensional database
   Third tier: end-user access tools
DATA MART ISSUES:
   Data mart functionalitythe capabilities of data marts
    have increased with the growth in their popularity


   Data mart sizethe performance deteriorates as data
    marts grow in size, so need to reduce the size of data marts
    to gain improvements in performance


   Data mart load performancetwo critical components:
    end-user response time and data loading performanceto
    increment DB updating so that only cells affected by the
    change are updated and not the entire MDDB structure
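
The incremental updating described in the last point can be sketched with a plain dictionary standing in for an MDDB. The cell keys (region, month), the sample values, and both function names are assumptions; a real multi-dimensional database would expose its own update mechanism.

```python
# A tiny stand-in for an MDDB: cells keyed by (region, month) holding totals.
cube = {("north", "2024-01"): 100.0, ("south", "2024-01"): 50.0}


def full_rebuild(facts):
    """Recompute every cell from all facts (the expensive approach)."""
    rebuilt = {}
    for region, month, amount in facts:
        key = (region, month)
        rebuilt[key] = rebuilt.get(key, 0.0) + amount
    return rebuilt


def incremental_update(cube, new_facts):
    """Apply only the new facts and return the set of cells that changed."""
    touched = set()
    for region, month, amount in new_facts:
        key = (region, month)
        cube[key] = cube.get(key, 0.0) + amount
        touched.add(key)
    return touched


# Hypothetical new facts arriving in the latest load.
delta = [("north", "2024-01", 25.0), ("east", "2024-02", 10.0)]
changed = incremental_update(cube, delta)
```

Only two cells are touched here, while the untouched `("south", "2024-01")` cell is left alone; the result matches what a full rebuild over all facts would produce.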