From the course: Data Steward Foundations
Protecting data quality
- Master data management programs are also charged with maintaining the quality of the data in those single-source-of-truth data stores. After all, if a master data store contains inaccurate information, that inaccurate information will then propagate to affect business processes all across the organization. We track data quality by evaluating it on six dimensions: accuracy, completeness, consistency, timeliness, validity, and uniqueness. Let's dive into each one of those.

Data accuracy is what we'd often call the correctness of the data. In our master data stores, we want to make sure that our data accurately reflects the reality of the situation. For example, if we have customer telephone numbers in our database, the data is accurate if those numbers actually are the telephone numbers of our customers.

Data completeness means that we have all of the relevant data for a field stored in our master data store. For example, if we have a listing of classes taken by a student, the data is only complete if our master data store includes all of the classes that the student actually took.

Data consistency means that the data stored in multiple locations is the same. Master data management seeks to build a single source of truth in an effort to achieve data consistency. We want to reduce or eliminate duplicate data stores, and in cases where we need them, ensure that they are synchronized with the master data store.

Data timeliness means that our data is current and not out of date. Once again, thinking about telephone numbers, our data is only timely if the telephone number listed in the database remains current. If the customer changes their telephone number and that change isn't reflected in the master data, it's no longer timely.

Data validity means that the data meets our requirements and any attribute limitations. For example, a US ZIP code should be either five digits or nine digits with a dash between the fifth and sixth digits.
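The ZIP code rule above can be expressed as an automated validity check. Here's a minimal sketch in Python; the function name and test values are illustrative, not part of the course:

```python
import re

# Validity rule from the discussion above: a US ZIP code is either
# five digits, or nine digits with a dash between the fifth and sixth.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")

def is_valid_zip(value: str) -> bool:
    """Return True if the value matches the ZIP or ZIP+4 format."""
    return bool(ZIP_PATTERN.match(value))

print(is_valid_zip("02134"))       # True: five digits
print(is_valid_zip("02134-1021"))  # True: ZIP+4 with dash
print(is_valid_zip("021341021"))   # False: nine digits but no dash
```

In practice, a master data management platform would run checks like this on ingest and flag or reject records that fail, rather than silently storing invalid attribute values.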
And data uniqueness means that we have only one record for each entity represented in a data set. If the same customer has two different records, that introduces the possibility of error.

When checking for data quality, consider the full data life cycle and look for places where error might be introduced. This begins at data acquisition: verify that the data source is reputable and that you're receiving high-quality data from that source. As you ingest data into your systems, it often goes through transformations to fit into your data stores, and it may pass through several systems on its way to the final destination. Make sure that this process doesn't mangle your data as it passes through different conversion steps. You may also have other data manipulation operations that take place during your analysis as you summarize, aggregate, and transform your data. Finally, you present data on reports and dashboards. It's up to data stewards and others involved in the reporting process to make sure that this data is high quality and presented accurately.
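Two of the dimensions discussed, completeness and uniqueness, lend themselves to simple automated checks during ingestion. This is a hedged sketch; the field names, sample records, and helper functions are hypothetical examples, not part of the course:

```python
# Hypothetical required fields for a customer master record.
REQUIRED_FIELDS = {"customer_id", "name", "phone"}

# Sample records: one is incomplete, one is a duplicate.
records = [
    {"customer_id": "C001", "name": "Ada", "phone": "555-0100"},
    {"customer_id": "C002", "name": "Grace", "phone": None},      # missing phone
    {"customer_id": "C001", "name": "Ada", "phone": "555-0100"},  # duplicate ID
]

def completeness_issues(rows):
    """Yield (row index, field name) for each missing required field."""
    for i, row in enumerate(rows):
        for field in sorted(REQUIRED_FIELDS):
            if row.get(field) in (None, ""):
                yield (i, field)

def duplicate_ids(rows):
    """Return the set of customer IDs that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        cid = row["customer_id"]
        if cid in seen:
            dupes.add(cid)
        seen.add(cid)
    return dupes

print(list(completeness_issues(records)))  # [(1, 'phone')]
print(duplicate_ids(records))              # {'C001'}
```

A data steward would typically run checks like these at each stage of the life cycle described above, so that incomplete or duplicated records are caught before they reach reports and dashboards.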