Setting up Data Science for Success: The Data Layer
WE ARE HIRING!
https://www.weightwatchers.com/us/corporate-careers
80% Data Preparation, 10% Modeling, 10% Interpretation & Presentation
76% of data scientists view data preparation as the least enjoyable part of their work (Crowdflower / Figure8, 2016)
Josh Wills (Slack):
“I’m a data janitor. That’s the sexiest job of the 21st century.”
[Figure: timeline split 80% / 10% / 10% into phases A, B, C]
● Collecting: searching, API, web scraping
● Understanding structure: entities, relations, foreign keys, UUIDs
● Understanding data: what fields / metrics mean, categorical variables
● Profiling: EDA, ranges, distributions
● Cleaning & normalizing: date formats, encodings, data types
● Reshaping and re-formatting data
● Filtering and aggregating
● Dealing with scale: waiting for queries, setting indexes & partitions
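As an illustration of the "cleaning & normalizing" step above, here is a minimal sketch using only the Python standard library. The list of date formats and the helper names `normalize_date` and `to_float` are assumptions made for this example, not part of the talk; real data will need its own format list.

```python
from datetime import datetime

# Hypothetical formats observed in the raw data; extend to match your sources.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def normalize_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


def to_float(raw):
    """Convert a numeric string such as '1,234.50' to float; None if empty or invalid."""
    text = raw.strip().replace(",", "")
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None
```

Returning None instead of raising keeps bad values visible for the profiling step rather than silently dropping rows.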
Various facets of data quality:
● Accurate: right types/format
● Coherent: referential integrity, no dupes
● Complete: no missing records & values
● Timely: in order, not late
● Defined: data dictionary, no field stuffing
● ....
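Some of these facets can be checked mechanically. The sketch below counts incomplete records, duplicate keys, and broken references across two in-memory tables. The table shapes (`orders`, `users`) and field names are hypothetical, chosen only to illustrate the "complete" and "coherent" facets.

```python
from collections import Counter


def quality_report(orders, users, required=("order_id", "user_id", "amount")):
    """Summarize completeness, duplicate keys, and referential integrity."""
    known_users = {u["user_id"] for u in users}
    id_counts = Counter(o.get("order_id") for o in orders)
    return {
        # Complete: every required field is present and non-empty.
        "incomplete": sum(
            1 for o in orders if any(o.get(f) in (None, "") for f in required)
        ),
        # Coherent: no duplicate primary keys.
        "duplicate_ids": sorted(
            k for k, n in id_counts.items() if k is not None and n > 1
        ),
        # Coherent: every order points at an existing user.
        "orphaned": sum(
            1
            for o in orders
            if o.get("user_id") is not None and o["user_id"] not in known_users
        ),
    }
```

A report like this can run on every load, so problems surface before a model or dashboard consumes the data.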
[Diagram: Data Engineering, Data Science, Business Intelligence, Internal Apps / Services, External Data / Services]
Not my Job!
Data Quality has to be a shared responsibility
Upstream, defensive programming
● UTC
● Stable schema
● DB constraints: no dupes
● Audit table / change log
● “Our eventing has always been ‘fire and forget’ with no guarantee of delivery.”
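A minimal sketch of what the upstream practices above might look like in code, using an in-memory SQLite database. The table names, columns, and the helpers `open_store` and `record_event` are assumptions for illustration; the point is that the database rejects duplicates, timestamps are written in UTC, and every attempt leaves a trace in an audit table.

```python
import sqlite3
from datetime import datetime, timezone


def open_store():
    """Create a hypothetical event store with a uniqueness constraint and an audit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        " event_id TEXT PRIMARY KEY,"  # duplicates are rejected by the database
        " user_id INTEGER NOT NULL,"
        " created_at_utc TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE audit_log (event_id TEXT, action TEXT, logged_at_utc TEXT)"
    )
    return conn


def record_event(conn, event_id, user_id):
    """Insert an event with a UTC timestamp; return False if it is a duplicate."""
    now = datetime.now(timezone.utc).isoformat()
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events VALUES (?, ?, ?)", (event_id, user_id, now)
            )
            conn.execute(
                "INSERT INTO audit_log VALUES (?, 'insert', ?)", (event_id, now)
            )
        return True
    except sqlite3.IntegrityError:
        with conn:
            conn.execute(
                "INSERT INTO audit_log VALUES (?, 'rejected_duplicate', ?)",
                (event_id, now),
            )
        return False
```

Enforcing uniqueness in the schema means downstream consumers do not each have to write their own deduplication logic.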
Validate with known answers
● Monitoring / Alerts
● Retries
● Data type & range checks
● DB constraints
● Audit logs
● Schema on write
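The checks above can be sketched in a few lines. The field name `weight_kg`, its range of 20 to 400, and the helper names are hypothetical and exist only for this example. `validate_known_answer` shows the "known answers" idea: run a metric on a small fixture whose result was worked out by hand.

```python
import time


def check_row(row):
    """Return a list of problems found in one record (empty list means OK)."""
    problems = []
    weight = row.get("weight_kg")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(weight, bool) or not isinstance(weight, (int, float)):
        problems.append("weight_kg: expected a number")
    elif not 20 <= weight <= 400:
        problems.append("weight_kg: out of range")
    return problems


def with_retries(fn, attempts=3, delay_s=0.0):
    """Call fn, retrying on any exception; re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay_s)


def validate_known_answer(metric_fn, fixture, expected, tol=1e-9):
    """Check a metric against a fixture whose correct answer is known in advance."""
    return abs(metric_fn(fixture) - expected) <= tol
```

For instance, `validate_known_answer(sum, [1, 2, 3], 6)` returns True, and a failing check would be a natural trigger for an alert.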
Own:
● Single Source of Truth
● Metric implementation
● Monitoring / Alerts
● Data Dictionary
“I found the right table and have this field but I don’t know what it means”
https://medium.com/@leapingllamas: Data Dictionary: a how to and best practices
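One lightweight way to keep a table-level data dictionary honest is to store it as data and check it against the actual columns. The sketch below assumes a hypothetical `subscriptions` table; the column names and descriptions are invented for illustration.

```python
# Hypothetical table-level data dictionary: table -> column -> meaning.
DATA_DICTIONARY = {
    "subscriptions": {
        "user_id": "Internal user identifier (UUID), foreign key to users.user_id",
        "started_at_utc": "Subscription start time, stored in UTC",
        "plan": "Categorical: one of 'monthly', 'annual'",
    }
}


def undocumented_columns(table, columns, dictionary=DATA_DICTIONARY):
    """Return the columns of a table that have no dictionary entry, sorted by name."""
    documented = dictionary.get(table, {})
    return sorted(c for c in columns if c not in documented)
```

Running such a check in CI or on schema changes flags new fields before anyone has to ask what they mean.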
“How do we define ARPU?”
“This is my understanding of active user”
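Questions like these go away when each metric has exactly one shared implementation. The sketch below uses one possible ARPU definition (revenue from active users divided by the number of active users); that definition, and the function signature, are assumptions for this example rather than the speaker's. What matters is that everyone imports the same function.

```python
def arpu(revenue_by_user, active_users):
    """Average revenue per user under one assumed definition.

    revenue_by_user: dict mapping user id -> revenue in the period
    active_users: set of user ids counted as active in the period
    """
    if not active_users:
        return 0.0  # assumed convention: no active users means ARPU of zero
    total = sum(revenue_by_user.get(u, 0.0) for u in active_users)
    return total / len(active_users)
```

With revenue `{"a": 10.0, "b": 0.0, "c": 5.0}` and active users `{"a", "b"}`, this returns 5.0: user "c" is excluded because they are not active, while "b" counts despite having no revenue.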
● Data quality is important and a shared responsibility
● Create a single source of truth
● Create table-level and company-level data dictionaries
Your data scientists will thank you!
Thank you!
