From the course: Data-Centric AI: Best Practices, Responsible AI, and More

Best practices

- [Instructor] In this section, I'll discuss the critical importance of data validation and pre-processing, and go over some common data issues that need to be addressed. These are the data issues I showed you initially, when I first introduced data-centric AI. Now, let's dig a little deeper into each of these. Domain expertise gaps can lead to incorrect assumptions in the data labeling and collection process. Hence, having real-world knowledge is key. Biased data distributions skew model performance towards certain subgroups. Hence, representativeness must be monitored. Incorrect or missing labels provide the wrong training signal to the models. Hence, human oversight and auditing are imperative. Inconsistent formats and sparsity can make combining data sources tricky, so consolidation and normalization can help in this scenario. Having duplicates can really bias the model. Hence, having…
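
As a rough illustration of three of the checks mentioned above (missing labels, duplicates, and subgroup representativeness), here is a minimal Python sketch. It is not taken from the course: the function name `audit_records` and the field names `label` and `group` are hypothetical, and it assumes each record is a flat dictionary of hashable values.

```python
from collections import Counter


def audit_records(records, label_key="label", group_key="group"):
    """Return simple data-quality counts for a list of dict records.

    Hypothetical helper for illustration only; the key names are assumptions.
    """
    # A label counts as missing when it is absent, None, or an empty string.
    missing_labels = sum(1 for r in records if r.get(label_key) in (None, ""))

    # Exact duplicates: records whose key/value pairs are all identical.
    seen = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(count - 1 for count in seen.values())

    # Share of records per subgroup, as a first look at representativeness.
    total = len(records)
    group_counts = Counter(r.get(group_key) for r in records)
    group_share = {g: c / total for g, c in group_counts.items()} if total else {}

    return {
        "missing_labels": missing_labels,
        "duplicates": duplicates,
        "group_share": group_share,
    }


if __name__ == "__main__":
    sample = [
        {"text": "good product", "label": "pos", "group": "A"},
        {"text": "good product", "label": "pos", "group": "A"},  # exact duplicate
        {"text": "bad", "label": None, "group": "A"},            # missing label
        {"text": "okay", "label": "neg", "group": "B"},
    ]
    print(audit_records(sample))
```

Counts like these only flag candidates for review. Whether a label is actually incorrect, or whether a subgroup share is acceptable, still depends on the human oversight and domain knowledge described above.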
