How to Learn from Data Analysis Failures


Summary

Learning from data analysis failures means identifying mistakes, understanding why they happened, and using those insights to improve future processes. It's about turning setbacks into opportunities for growth and innovation.

  • Investigate the root cause: Take the time to trace errors back to their origin, whether it's data inconsistencies, training-serving mismatches, or faulty model assumptions.
  • Develop safeguards: Create checks for data quality, such as monitoring record counts, identifying null values, or testing for unexpected changes in formats.
  • Embrace continuous learning: Treat failures as valuable lessons, using them to improve testing, transparency, teamwork, and adaptability in your analysis processes.
Summarized by AI based on LinkedIn member posts
  • Sankar Srinivasan

    Senior Data Scientist @ Red Ventures | M.A. Statistics @ UC Berkeley

    9,142 followers

Silent errors are one of the biggest reasons your machine learning model is underperforming. ❌ And the scariest part? They persist like hidden viruses in many production models today. Let's take a look at some of the silent errors you could have:

    1️⃣ Training-serving skew 📊 Are you 100% sure that the data you use for training comes from the same environment your model will operate in? This "skew" (i.e., a misalignment between training and serving data) is a common mistake that leaves a model unoptimized for its actual production environment. Actually trace the lineage of your training data to confirm alignment!

    2️⃣ Bad values in data ⛔ Here's an example: a duplicate user with conflicting target values. Here's an even more pernicious example: multiple join keys containing an unexpected repeated token, causing an "exploding" join where the resulting rows' columns are effectively joined at random. To avoid these, clearly define your expectations for the data in the form of a "contract". Data engineers should be involved in testing and alerting here.

    3️⃣ Leakage 💧 If your modeling process leaks values from the test set into training, it can report unreasonably high accuracy despite being a poor model. The only way to catch this is to rigorously test your splitting functions and feature engineering for leakage. Add assertions and/or unit tests to act as a safety net (a sketch follows below). An additional guardrail: monitor accuracy metrics and feature importances for values that look "too good to be true".

    There isn't a ton of material out there on ML testing, but Deepchecks has some pretty comprehensive tooling and libraries for this. Follow them and Philip Tannor (the CEO) for more insights!
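    Several of these guards fit in a handful of assertions. Below is a minimal Python/pandas sketch of the checks the post describes; the column names (user_id, label) and function names are hypothetical illustrations, not from the post, and the assertions are meant as pre-training guards or unit tests rather than a complete testing framework.

    ```python
    import pandas as pd

    def check_join_keys(dim: pd.DataFrame, keys: list[str]) -> None:
        # Guard against an "exploding" join: on the dimension side of a join,
        # each join-key combination should identify exactly one row.
        n_dupes = int(dim.duplicated(subset=keys).sum())
        assert n_dupes == 0, f"{n_dupes} duplicated join-key rows for keys {keys}"

    def check_conflicting_targets(df: pd.DataFrame, id_col: str, target_col: str) -> None:
        # Guard against duplicate entities carrying conflicting labels
        # (the "duplicate user with conflicting target values" case).
        n_labels = df.groupby(id_col)[target_col].nunique()
        conflicts = n_labels[n_labels > 1]
        assert conflicts.empty, f"{len(conflicts)} ids have conflicting target values"

    def check_no_split_leakage(train: pd.DataFrame, test: pd.DataFrame, id_col: str) -> None:
        # Guard against leakage: no entity should appear in both splits.
        overlap = set(train[id_col]) & set(test[id_col])
        assert not overlap, f"{len(overlap)} ids appear in both train and test"

    # Example wiring, with hypothetical frames and column names:
    # check_join_keys(users_df, keys=["user_id"])
    # check_conflicting_targets(training_df, id_col="user_id", target_col="label")
    # check_no_split_leakage(train_df, test_df, id_col="user_id")
    ```

    Called right before splitting and fitting, a failed assertion halts the pipeline instead of letting a silent error reach production.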

  • Bill Shube

Gaining better supply chain visibility with low-code/no-code analytics and process automation. Note: views are my own and not necessarily shared by my employer.

    2,693 followers

Want a simple way to earn trust from your stakeholders, analysts? Send them data quality alerts when things go wrong.

    This is data 101 for engineers, but my team and I are citizen developers. We don't have the same kind of training; things like this simply aren't immediately obvious to us.

    Here's an example of why you should do this, from just this week. An analysis that we run depends on a lot of inputs, including some manually uploaded files, so there's plenty of opportunity for things to go wrong. On Monday, I heard from one of the file providers that her upload had been failing for almost two weeks. One of my end users spotted the problem at about the same time that I heard from my file provider. It's not great being the last one to find out about a data quality problem in an analysis that you're responsible for.

    I had been working on some data quality alerts, and sure enough, they would have spotted the problem right away. So I'm eager to finalize them and get them into production. Here are some easy things I'm implementing (sketched in code below):

    1. Record count checks: do today's inputs have roughly the same number of records as yesterday's? This doesn't catch every problem, but it's very easy to implement, and it's all I needed to spot the problem I just described.
    2. Consistency checks: make sure your inputs "look" the way you expect them to. In this case, the file upload was failing because one of the columns changed from numeric to text, and our SQL database didn't like that.
    3. Null checks: you might get the right number of records and the right data types, but the data could all be null.
    4. Automated alerts: you don't want to hear from your stakeholders about data quality issues the way that I did. Put in basic alerts like these with automatic emails when they're triggered, and copy all your stakeholders.

    This will sound remedial to data engineers, but these are habits that we citizen developers don't always have. There's a lot that we can learn from our friends in IT, and simple things like this can go a long way toward earning our stakeholders' trust.

    #citizendevelopment #lowcode #nocode #analytics #supplychainanalytics
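    For concreteness, here is one way those four checks might look in Python with pandas. This is a minimal sketch: the schema (order_id, quantity), the 20% count threshold, the email addresses, and the local SMTP relay are all placeholder assumptions, not details from the post.

    ```python
    import smtplib
    from email.message import EmailMessage

    import pandas as pd

    # Placeholder schema and addresses; swap in your real feed's details.
    EXPECTED_DTYPES = {"order_id": "int64", "quantity": "float64"}
    STAKEHOLDERS = ["analyst@example.com", "file-provider@example.com"]

    def run_checks(today: pd.DataFrame, yesterday: pd.DataFrame) -> list[str]:
        problems: list[str] = []

        # 1. Record count check: flag a day-over-day swing bigger than ~20%.
        if abs(len(today) - len(yesterday)) > 0.2 * max(len(yesterday), 1):
            problems.append(f"Record count moved from {len(yesterday)} to {len(today)}")

        # 2. Consistency check: column types should still match the expected
        #    schema (the numeric-column-turned-text failure from the post).
        for col, dtype in EXPECTED_DTYPES.items():
            if col not in today.columns:
                problems.append(f"Missing column: {col}")
            elif str(today[col].dtype) != dtype:
                problems.append(f"Column {col} is {today[col].dtype}, expected {dtype}")

        # 3. Null check: right row count and right types, but empty values.
        for col in today.columns:
            if today[col].isna().all():
                problems.append(f"Column {col} is entirely null")

        return problems

    def send_alert(problems: list[str]) -> None:
        # 4. Automated alert: email every stakeholder when any check fails.
        msg = EmailMessage()
        msg["Subject"] = "Data quality alert"
        msg["From"] = "alerts@example.com"  # placeholder sender
        msg["To"] = ", ".join(STAKEHOLDERS)
        msg.set_content("\n".join(problems))
        with smtplib.SMTP("localhost") as server:  # assumes a local mail relay
            server.send_message(msg)
    ```

    A daily job would then call run_checks(today_df, yesterday_df) and pass any findings to send_alert, so stakeholders hear about problems from the system rather than discovering them on their own.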

  • Tolga Tarhan

    Technology Executive | Entrepreneur | Cloud & AI Thought Leader

    5,602 followers

Has anybody had projects go wrong? I’ll be the first to raise my hand! In my time working with AI, I’ve had my fair share of projects that didn’t go as planned. And while it wasn’t the experience I thought I wanted, the unexpected outcomes always led to lessons and opportunities for growth. So here are some of the things I’ve learned:

    First, embrace the complexity. AI systems operate in complex, often unpredictable environments. Our experience taught us to embrace this complexity, not shy away from it. It pushed us to develop more robust testing scenarios that mimic real-world conditions more accurately.

    Second, data quality is paramount. It's not just about quantity; the relevance, accuracy, and diversity of data play a pivotal role in the success of AI models.

    Next, algorithmic transparency. Understanding why an AI model makes certain decisions is crucial. Our project's challenges highlighted the need for greater transparency in our algorithms, propelling us toward adopting explainable AI practices (one small illustration follows below).

    Another very important lesson is the power of team resilience. Facing setbacks tested our team's resilience but also brought us closer. It fostered a culture where every member feels valued and empowered to share ideas and concerns openly, driving collective problem-solving and innovation.

    Lastly, continuous learning is key. When projects go wrong, it reinforces the idea that in the AI field, continuous learning and adaptation are not just beneficial but necessary. It's a journey of constant evolution, where each setback can lead to greater achievements.

    In the world of AI, ‘failure’ is not a setback; it’s fuel for growth and innovation! What are some lessons that you’ve learned from unexpected outcomes?

    #innovation #failures #lessonslearned #AI #technology #Kibsi #computervision #growth #development
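    The post doesn't name a specific explainability technique, so purely as an illustration of what "explainable AI practices" can look like in code, here is a short sketch using scikit-learn's permutation importance on a toy model. The dataset and model are stand-ins, not anything from the author's projects.

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    # Toy data standing in for a real project dataset.
    X, y = make_classification(n_samples=500, n_features=8, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Permutation importance: shuffle one feature at a time and measure how much
    # validation accuracy drops; a large drop means the model leans on that feature.
    result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
    for i in result.importances_mean.argsort()[::-1]:
        print(f"feature_{i}: {result.importances_mean[i]:.3f} "
              f"+/- {result.importances_std[i]:.3f}")
    ```

    Reviewing which features actually drive a model's predictions is one simple, model-agnostic way to surface the kind of surprises that otherwise only show up after a project goes wrong.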
