How to Keep Analytics Data Relevant


Summary

Keeping analytics data relevant means ensuring that the information you analyze remains accurate, timely, and valuable for decision-making. This involves implementing processes and checks to maintain data quality and align it with current business needs.

  • Establish data quality checks: Use methods like anomaly detection, row count verification, and data type validation to identify and prevent issues before they impact critical analyses (a minimal sketch follows this list).
  • Automate alerts and communication: Set up automated notifications for stakeholders to ensure immediate awareness of any data inconsistencies or failures, building trust across teams.
  • Prioritize critical data: Focus your efforts on maintaining the integrity of your most important data assets to reduce risks and stay aligned with evolving business goals.
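
To make the first two bullets concrete, here is a minimal Python sketch (not taken from any of the posts below) of a handful of named checks run against today's load, with failures printed where a production job would fire an alert. The `amount` column, the ±20% row-count tolerance, and the $0-$10,000 range are invented for illustration.

```python
from dataclasses import dataclass

import pandas as pd

@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str

def run_checks(df: pd.DataFrame, yesterday_rows: int) -> list[CheckResult]:
    """Run the checks named in the summary against today's load."""
    results = []

    # Row count verification: today's load should be within +/-20% of yesterday's.
    ratio = len(df) / max(yesterday_rows, 1)
    results.append(CheckResult("row_count", 0.8 <= ratio <= 1.2,
                               f"{len(df)} rows vs {yesterday_rows} yesterday"))

    # Data type validation: the amount column must still be numeric.
    results.append(CheckResult("amount_is_numeric",
                               pd.api.types.is_numeric_dtype(df["amount"]),
                               f"dtype is {df['amount'].dtype}"))

    # Crude anomaly detection: flag amounts far outside the expected range.
    outliers = df[(df["amount"] < 0) | (df["amount"] > 10_000)]
    results.append(CheckResult("amount_in_range", outliers.empty,
                               f"{len(outliers)} out-of-range rows"))
    return results

if __name__ == "__main__":
    today = pd.DataFrame({"amount": [12.0, 55.5, 99_999.0]})  # toy data
    for check in run_checks(today, yesterday_rows=3):
        status = "OK  " if check.passed else "FAIL"
        print(f"{status} {check.name}: {check.detail}")
        # In production, any FAIL would trigger an automated alert to stakeholders.
```
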
  • 🎯 Mark Freeman II

    Data Engineer | Tech Lead @ Gable.ai | O’Reilly Author: Data Contracts | LinkedIn [in]structor (28k+ Learners) | Founder @ On the Mark Data

    63,145 followers

    I’ve lost count of the projects that shipped gorgeous features but relied on messy data assets. The cost always surfaces later: inevitable firefights, expensive backfills, and credibility hits to the data team. This is a major reason why I argue we need to incentivize SWEs to treat data as a first-class citizen before they merge code. Here are five ways you can help SWEs make this happen:

    1. Treat data as code, not exhaust. Data is produced by code (whether you are the first-party producer or ingesting from a third party). Many software engineers have minimal visibility into how their logs are used (even the business-critical ones), so you need to make it easy for them to understand their impact.

    2. Automate validation at commit time. Data contracts enable checks during the CI/CD process when a data asset changes. A failing test should block the merge just like any unit test. Developers receive instant feedback instead of hearing their data team complain about the hundredth data issue with minimal context.

    3. Challenge the "move fast and break things" mantra. Traditional approaches often postpone quality and governance until after deployment, because shipping fast feels safer than debating data schemas at the outset. Instead, early negotiation shrinks rework, speeds onboarding, and keeps your pipeline clean when the feature's scope changes six months in. Having a data perspective when creating product requirement documents can be a huge unlock!

    4. Embed quality checks into your pipeline. Track DQ metrics such as null ratios, referential breaks, and out-of-range values on trend dashboards. Observability tools are great for this, but even a set of SQL queries triggered automatically can provide value.

    5. Don't boil the ocean; focus on protecting tier 1 data assets first. Your most critical yet volatile data asset is the top candidate for trying these approaches. Ideally, that asset changes meaningfully as your product or service evolves, but that change can lead to chaos. Making a case for mitigating risk on critical components is an effective way to get SWEs to pay attention.

    If you want to fix a broken system, you start at the source of the problem and work your way forward. Not doing this is why so many data teams I talk to feel stuck. What’s one step your team can take to move data quality closer to SWEs? #data #swe #ai
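
Point 2 is the easiest to underestimate, so here is a deliberately simplified, hand-rolled sketch of the idea rather than any particular tool's API (it is not how Gable.ai works): a contract for a hypothetical `orders` asset pinned in a pytest test, so that a change that removes or retypes a column fails CI and blocks the merge. The column names and types are invented for illustration.

```python
# The agreed contract for the hypothetical `orders` asset, versioned with the code.
ORDERS_CONTRACT = {
    "order_id": "string",
    "customer_id": "string",
    "amount_usd": "float",
    "created_at": "timestamp",
}

def load_current_schema() -> dict[str, str]:
    """Placeholder: in a real pipeline this would be derived from the ORM model,
    protobuf definition, or warehouse table that the code under review produces."""
    return {
        "order_id": "string",
        "customer_id": "string",
        "amount_usd": "float",
        "created_at": "timestamp",
    }

def test_no_columns_removed_or_retyped():
    current = load_current_schema()
    for column, expected_type in ORDERS_CONTRACT.items():
        assert column in current, f"contract violation: `{column}` was removed"
        assert current[column] == expected_type, (
            f"contract violation: `{column}` changed from "
            f"{expected_type} to {current[column]}"
        )
```

Running this as part of the repository's normal test suite is what makes the feedback instant: the engineer sees the contract violation in the failed build, not in a data team ticket weeks later.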

  • Bill Shube

    Gaining better supply chain visibility with low-code/no-code analytics and process automation. Note: views are my own and not necessarily shared with my employer.

    2,693 followers

    Want a simple way to earn trust from your stakeholders, analysts? Send them data quality alerts when things go wrong. This is data 101 for engineers, but my team and I are citizen developers. We don't have the same kind of training, so things like this simply aren't immediately obvious to us.

    Here's an example of why you should do this, from just this week: an analysis that we run depends on A LOT of inputs, including some manually uploaded files. Lots of opportunity for things to go wrong. On Monday, I heard from one of the file providers that her upload had been failing for almost two weeks. One of my end users spotted the problem at about the same time that I heard from my file provider. Not great being the last one to find out about a data quality problem in an analysis that you're responsible for. I had been working on some data quality alerts, and sure enough, they would have spotted the problem right away. So I'm eager to finalize them and get them into production.

    Here are some easy things I'm implementing:

    1. Record count checks: do today's inputs have roughly the same number of records as yesterday's? This doesn't catch every problem, but it's very easy to implement, and it's all I needed to spot the problem I just described.

    2. Consistency checks: make sure your inputs "look" the way you expect them to. In this case, the upload was failing because one of the columns in the file changed from numerical to text, and our SQL database didn't like that.

    3. Null checks: you might get the right number of records and the right data types, but the data could all be null.

    4. Automated alerts: you don't want to hear about data quality issues from your stakeholders the way I did. Put in some basic alerts like these with automatic emails when they're triggered, and copy all your stakeholders.

    This will sound remedial to data engineers, but these are habits that we citizen developers don't always have. There's a lot that we citizen developers can learn from our friends in IT, and simple things like this can go a long way toward earning our stakeholders' trust. #citizendevelopment #lowcode #nocode #analytics #supplychainanalytics
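
A rough Python sketch of checks 1-4 might look like the following; the column name, thresholds, addresses, and SMTP host are all placeholders for whatever your own inputs and mail setup look like, and this illustrates the pattern rather than the author's actual implementation.

```python
import smtplib
from email.message import EmailMessage

import pandas as pd

STAKEHOLDERS = ["analyst@example.com", "planner@example.com"]  # copy everyone

def find_problems(today: pd.DataFrame, yesterday: pd.DataFrame) -> list[str]:
    problems = []

    # 1. Record count check: roughly the same number of rows as yesterday?
    if not 0.5 * len(yesterday) <= len(today) <= 1.5 * len(yesterday):
        problems.append(f"Row count jumped from {len(yesterday)} to {len(today)}")

    # 2. Consistency check: the quantity column must still be numeric,
    #    not silently re-uploaded as text.
    if not pd.api.types.is_numeric_dtype(today["quantity"]):
        problems.append(f"`quantity` arrived as {today['quantity'].dtype}, expected numeric")

    # 3. Null check: the right row count and types can still hide empty data.
    null_share = today["quantity"].isna().mean()
    if null_share > 0.05:
        problems.append(f"{null_share:.0%} of `quantity` values are null")

    return problems

def send_alert(problems: list[str]) -> None:
    # 4. Automated alert: one email to every stakeholder, sent as soon as a check fails.
    msg = EmailMessage()
    msg["Subject"] = "Data quality alert: supply chain inputs"
    msg["To"] = ", ".join(STAKEHOLDERS)
    msg.set_content("\n".join(problems))
    with smtplib.SMTP("smtp.example.com") as server:  # your mail server here
        server.send_message(msg)
```

A small scheduled job that calls find_problems on every refresh and send_alert whenever the list is non-empty covers all four points.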

  • Benjamin Rogojan

    Fractional Head of Data | Tool-Agnostic. Outcome-Obsessed

    181,287 followers

    Data quality is one of the most essential investments you can make when developing your data infrastructure. If your data is "real-time" but it's wrong, guess what: you're gonna have a bad time. So how do you implement data quality into your pipelines? On a basic level you'll likely want to integrate some form of checks, which could be anything from:

    - Anomaly and range checks - These ensure that the data received fits an expected range or distribution. Say you only ever expect transactions of $5-$100 and you get a $999 transaction; that should set off alarms. In fact, I have several cases where the business added new products, or someone made a large business purchase that exceeded expectations, and these checks flagged it.

    - Data type checks - As the name suggests, these ensure that a date field is a date. This is important because if you're pulling files from a 3rd party, they might send you headerless files, and you have to trust that they will keep sending the same data in the same order.

    - Row count checks - A lot of businesses have a pretty steady rate of rows when it comes to fact tables. The number of transactions follows some sort of pattern: often lower on the weekends and perhaps steadily growing over time. Row count checks help ensure you don't see 2x the number of rows because of a bad process or join.

    - Freshness checks - If you've worked in data long enough, you've likely had an executive bring up that your data was wrong. And it's less that the data was wrong, and more that the data was late (which is kind of wrong). Freshness checks make sure you're the first to know the data is late, so you can fix it or at least update those who need to know.

    - Category checks - The first category check I implemented was to ensure that every state abbreviation was valid. I assumed this would be true because they must use a drop-down, right? Well, there were bad state abbreviations entered nonetheless.

    As well as a few others. The next question is how to implement these checks, and the solutions range from automated tasks that run during or after a table lands, to dashboards, to far more developed tools that provide observability into much more than a few data checks.

    If you're looking to dig deeper into the topic of data quality and how to implement it, I have both a video and an article on the topic.

    1. Video - How And Why Data Engineers Need To Care About Data Quality Now - And How To Implement It https://lnkd.in/gjMThSxY

    2. Article - How And Why We Need To Implement Data Quality Now! https://lnkd.in/grWmDmkJ

    #dataengineering #datanalytics
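
As one illustration of the "automated task that runs during or after a table lands" option, here is a small Python sketch that expresses a range check, a category check, and a freshness check as plain SQL. The table, columns, and thresholds are invented, and sqlite3 simply stands in for whatever warehouse connection you actually use.

```python
import sqlite3  # stand-in for your real warehouse connection

CHECKS = {
    # Range / anomaly check: transactions far outside the expected band.
    "out_of_range_amounts":
        "SELECT COUNT(*) FROM transactions WHERE amount NOT BETWEEN 5 AND 100",
    # Category check: state abbreviations missing from the reference list.
    "invalid_states":
        "SELECT COUNT(*) FROM transactions "
        "WHERE state NOT IN (SELECT abbrev FROM valid_states)",
    # Freshness check: nothing loaded in the last day means the data is late.
    "stale_data":
        "SELECT CASE WHEN MAX(loaded_at) < DATETIME('now', '-1 day') "
        "THEN 1 ELSE 0 END FROM transactions",
}

def run_checks(conn: sqlite3.Connection) -> dict[str, int]:
    """Return the offending count for each check; anything non-zero needs attention."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")  # placeholder database
    for name, bad in run_checks(conn).items():
        print(f"{'FAIL' if bad else 'OK  '} {name} ({bad})")
```

Triggering a script like this right after the load step, and alerting when anything comes back non-zero, is usually enough to catch these problems before an executive does.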
