Artifacts | Data Dictionary |
Data Modeling | Data Wrangling
Presented By
Md Faisal Akbar
Artifacts
An artifact is one of many kinds of tangible by-products produced during the
development of software.
Some artifacts (e.g., use cases, class diagrams, and other Unified Modeling
Language (UML) models, requirements and design documents) help describe
the function, architecture, and design of software.
Other artifacts are concerned with the process of development itself,
such as project plans, business cases, and risk assessments.
Artifacts are typically living documents that are formally updated to reflect
changes in scope. They exist so that everyone involved in the project has a
shared understanding of all information related to the effort.
Data Dictionary
What is a data dictionary?
◇ It is an integral part of a database.
◇ It holds information about the
database and the data that it stores.
◇ A data dictionary is a “virtual database”
containing metadata (data about data).
Metadata
Metadata is defined as data providing
information about one or more aspects of the
data, such as:
◇ Time and date of creation.
◇ Authorization of the data.
◇ Attribute size.
◇ Purpose of the data.
The data dictionary is where the systems analyst goes to define or look
up information about entities, attributes, and relationships
on the ERD (Entity Relationship Diagram).
Viewing the data dictionary
SELECT * FROM DICT;
--or
SELECT * FROM DICTIONARY;
In Oracle, either query lists all tables and views of the data dictionary that
are accessible to the user. The selected information includes the name and a
short description of each table and view.
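The queries above are Oracle-specific. As a rough analogue that runs without an Oracle instance, the sketch below uses Python's built-in sqlite3 module, where the sqlite_master catalog table and PRAGMA table_info play a similar role. The employee table and its columns are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory database; the table below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# sqlite_master is SQLite's catalog: one row per table, index, view, or trigger.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# PRAGMA table_info returns one row of column metadata per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(employee)").fetchall()
column_types = {row[1]: row[2] for row in columns}

print(catalog)
print(column_types)
```

Other database systems expose the same kind of metadata under different names, so the catalog queries here should be read as SQLite's equivalent rather than a portable interface.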
The data dictionary provides information about the database:
◇ Tables
◇ Indexes
◇ Columns
◇ Constraints
◇ Relationship to other variables
◇ Precision of data
◇ Variable format
◇ Packages
◇ Data type
◇ And more
Importance of the Data Dictionary
◇ Avoid duplication.
◇ Make maintenance straightforward.
◇ Locate errors in the system.
◇ And more.
Structure of Data Dictionary
◇ Relational systems all have some form of integrated data dictionary (e.g., Oracle).
◇ A data dictionary can be integrated with the DBMS or stand-alone.
◇ An integrated data dictionary automatically reflects changes in the database.
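To illustrate how an integrated data dictionary tracks the database, the sketch below again uses SQLite through Python's sqlite3 module as a stand-in for a full DBMS. The table_names helper and the department table are hypothetical names chosen for this example.

```python
import sqlite3

def table_names(conn):
    """Return the table names currently recorded in SQLite's catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
before = table_names(conn)        # empty database: nothing cataloged yet

conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
after_create = table_names(conn)  # the catalog now lists the new table

conn.execute("DROP TABLE department")
after_drop = table_names(conn)    # the entry disappears again

print(before, after_create, after_drop)
```

No separate step updates the catalog; the DBMS maintains it as part of executing the DDL statements. A stand-alone dictionary would need its own update process.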
Disadvantages of Data Dictionary
◇ Creating a new data dictionary is a very big task; it can take years to create one.
◇ It requires management commitment, which is not easy to achieve, particularly where the benefits are intangible and long term.
◇ The cost of a data dictionary can be fairly high, as it includes the initial build and hardware charges as well as the cost of maintenance.
◇ It needs careful planning: defining the exact requirements, designing its contents, testing, implementation, and evaluation.
What is a Data Model?
◇ Graphical representation of tables
◇ Represents relationships between tables
◇ Easily understood
Phases of Data Model
◇ Conceptual
◇ Logical
◇ Physical
Conceptual Data Model
◇ Highly abstract
◇ Easily understood
◇ Easily enhanced
◇ Only entities are visible
◇ Abstract relationships
◇ No attribute is specified.
◇ No primary key is specified.
Logical Data Model
◇ Includes all entities and relationships among them
◇ Key attributes
◇ Non-key attributes
◇ The primary key for each entity is specified.
◇ Foreign keys are specified.
◇ Normalization occurs at this level.
◇ User-friendly attribute names
◇ More detailed than the conceptual model
◇ Database agnostic
The steps for designing the logical data model
are as follows:
1. Specify primary keys for all entities.
2. Find the relationships between different
entities.
3. Find all attributes for each entity.
4. Resolve many-to-many relationships.
5. Normalize the entities.
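As a minimal sketch of step 4, the example below resolves a many-to-many relationship by introducing an associative entity. It uses plain Python dataclasses to stay database agnostic; the Student, Course, and Enrollment entities and the sample data are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Student:
    student_id: int  # primary key
    name: str        # non-key attribute

@dataclass(frozen=True)
class Course:
    course_id: int   # primary key
    title: str       # non-key attribute

@dataclass(frozen=True)
class Enrollment:
    # Associative entity: replaces the Student <-> Course many-to-many
    # relationship with two one-to-many relationships.
    student_id: int  # foreign key -> Student
    course_id: int   # foreign key -> Course

students = [Student(1, "Ana"), Student(2, "Ben")]
courses = [Course(10, "Databases"), Course(20, "Statistics")]
enrollments = [Enrollment(1, 10), Enrollment(1, 20), Enrollment(2, 10)]

def courses_for(student_id):
    """Follow the links through Enrollment to list a student's course titles."""
    titles = {c.course_id: c.title for c in courses}
    return sorted(
        titles[e.course_id] for e in enrollments if e.student_id == student_id
    )

print(courses_for(1))
```

Neither Student nor Course holds a list of the other; the relationship lives entirely in Enrollment, which is what makes the structure straightforward to normalize.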
Physical Data Model
The physical data model represents how the model will be
built in the database.
◇ Entities are referred to as tables.
◇ Attributes are referred to as columns.
◇ Foreign keys are used to identify relationships between tables.
◇ Denormalization may occur based on user requirements.
◇ Database-compatible table names
◇ Database-compatible column names
◇ Database-specific data types (for example, the data type for a column may differ between MySQL and SQL Server)
The steps for physical data model design are
as follows:
1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical
constraints / requirements.
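The first three steps can be sketched with SQLite through Python's sqlite3 module. The student, course, and enrollment tables are hypothetical, and the data types are SQLite's; another DBMS would use its own type names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Entities become tables, attributes become columns,
# and relationships become foreign keys.
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")

conn.execute("INSERT INTO student VALUES (1, 'Ana')")
conn.execute("INSERT INTO course VALUES (10, 'Databases')")
conn.execute("INSERT INTO enrollment VALUES (1, 10)")

# A row that points at a missing course violates the foreign key.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

enrollment_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(rejected, enrollment_count)
```

Step 4 is where a real design would diverge from this sketch, for example by adding indexes or denormalizing tables to meet performance requirements.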
Comparison of Data Model Stages
Feature               Conceptual  Logical  Physical
Entity Names          ✓           ✓
Entity Relationships  ✓           ✓
Attributes                        ✓
Primary Keys                      ✓        ✓
Foreign Keys                      ✓        ✓
Table Names                                ✓
Column Names                               ✓
Column Data Types                          ✓
Data Wrangling
Data wrangling is the process of cleaning, structuring, and enriching raw data into a desired format
so that better decisions can be made in less time.
Key Steps of Data Wrangling:
◇ Data Acquisition: Identify and obtain access to the data within your sources.
◇ Joining Data: Combine the edited data for further use and analysis.
◇ Data Cleansing: Redesign the data into a usable/functional format and correct or remove any bad data.
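A minimal sketch of these steps using only the Python standard library follows. The inline CSV text stands in for real sources, and all names and values are hypothetical. It cleans before joining, since the joining step works on the edited data.

```python
import csv
import io

# Acquisition: in practice these would be files or query results;
# inline CSV text stands in for the raw sources here.
raw_customers = "id,name\n1, Ana \n2,Ben\n3,\n"
raw_orders = "customer_id,amount\n1,19.50\n2,n/a\n1,5.25\n"

customers = list(csv.DictReader(io.StringIO(raw_customers)))
orders = list(csv.DictReader(io.StringIO(raw_orders)))

# Cleansing: trim whitespace and drop customers without a name.
clean_customers = {
    row["id"]: row["name"].strip() for row in customers if row["name"].strip()
}

def to_float(text):
    """Return text as a float, or None when it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None

# Cleansing: drop orders whose amount is not numeric.
clean_orders = [
    (row["customer_id"], amount)
    for row in orders
    if (amount := to_float(row["amount"])) is not None
]

# Joining: combine the cleaned sources into one total per customer name.
totals = {}
for customer_id, amount in clean_orders:
    if customer_id in clean_customers:
        name = clean_customers[customer_id]
        totals[name] = totals.get(name, 0.0) + amount

print(totals)
```

Dropping bad rows is only one possible cleansing policy; depending on the analysis, flagging or imputing them may be more appropriate.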
Thanks!
Any questions?
