A Statistician Walks into a Tech Company
R at a rapidly scaling healthcare technology startup
Sandy Griffith
Twitter: @sgrifter
sgriffith@flatiron.com
www.flatiron.com
My story
Academic biostatistics
© 2016 Flatiron Health, Inc. Proprietary and confidential.
My story
Academic biostatistics → Healthcare tech
Our Mission
Flatiron’s mission is to serve cancer patients and our partners by dramatically improving treatment and accelerating research.
Flatiron Processes EHR Data At Scale
[Diagram: Electronic health record data, plus sources outside the practice (hospital, lab), pass through structured data processing and unstructured data processing to produce research-grade data. Structured data: demographics, diagnosis, visits, labs, e-prescribing. Unstructured data: pathology reports, discharge notes, radiology reports, physician notes.]
Rapidly Scaling
January 2015
Flatiron: ~140
Software Engineers: ~50
Quantitative Sciences team: 1
Now: We are a team of 262
We include…
All Flatiron data and tools are collaboratively built, implemented, and maintained by a cross-disciplinary team that includes oncology, engineering, and quantitative sciences.
We come from…
9 Medical oncologists and nurses
70 Software engineers
10 Quantitative scientists
5 Medical informaticists
+ more!
Primary Language: time of hire
Proficiency with R: time of hire
A decision point early on
Cultivate R culture
1. Internal R Package
2. User group
3. Slack channel
4. Trainings
5. Hiring
Proficiency with R
[Chart panels: Time of hire, Now]
Now we have R users, but when should we use R?
Three scenarios:
1. R for prototyping → !R in production
2. R as a long-term solution
3. R and !R in parallel
R for prototyping → !R in production
Example: Linking external mortality data
Prototype
● One-time linkage
● Small cohort (10s of thousands)
● RecordLinkage R package
● Probabilistic linkage method using EM algorithm
Production
● Repeated daily at scale
● Large cohort (~5 million patients)
● Code maintained by different team
● Deterministic logic in SQL
R for prototyping → !R in production
Example: Linking external mortality data
Why this made sense:
● Stable method -- No longer needed rapid iteration
● Tuning parameters
● Similar performance, more transparency
● No R users on team that would be maintaining code
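To make the contrast between the prototype and production approaches concrete, here is a minimal Python sketch. It is not the RecordLinkage package or Flatiron's SQL logic. It assumes a Fellegi-Sunter style model fitted by EM over binary field-agreement patterns, and compares it with a simple all-fields-agree deterministic rule. The field names, the agreement data, and every function name are hypothetical.

```python
# Hypothetical agreement patterns for candidate record pairs:
# 1 = field agrees, 0 = field disagrees, for (last name, birth date, sex).
PAIRS = (
    [(1, 1, 1)] * 5 + [(1, 1, 0)] + [(1, 0, 1)]
    + [(0, 0, 0)] * 14 + [(1, 0, 0)] * 3 + [(0, 1, 0)] * 2
)


def clamp(x, eps=1e-6):
    """Keep probabilities strictly inside (0, 1) to avoid division by zero."""
    return min(max(x, eps), 1 - eps)


def likelihood(pattern, probs):
    """P(pattern | class), assuming fields agree independently within a class."""
    out = 1.0
    for agree, p in zip(pattern, probs):
        out *= p if agree else (1 - p)
    return out


def posterior(pattern, p_match, m, u):
    """Posterior probability that a pair with this pattern is a true match."""
    a = p_match * likelihood(pattern, m)
    b = (1 - p_match) * likelihood(pattern, u)
    return a / (a + b)


def fit_em(pairs, n_iter=50):
    """Estimate the match rate and per-field agreement probabilities by EM."""
    n, k = len(pairs), len(pairs[0])
    # Starting values (an assumption): matches mostly agree, non-matches mostly do not.
    p_match, m, u = 0.1, [0.9] * k, [0.1] * k
    for _ in range(n_iter):
        # E-step: posterior match probability for every pair.
        g = [posterior(x, p_match, m, u) for x in pairs]
        total = sum(g)
        # M-step: re-estimate parameters from the weighted pairs.
        p_match = clamp(total / n)
        m = [clamp(sum(gi * x[j] for gi, x in zip(g, pairs)) / total)
             for j in range(k)]
        u = [clamp(sum((1 - gi) * x[j] for gi, x in zip(g, pairs)) / (n - total))
             for j in range(k)]
    return p_match, m, u


def deterministic_match(pattern):
    """Production-style rule: link only when every field agrees."""
    return all(pattern)


p_match, m, u = fit_em(PAIRS)
for pattern in sorted(set(PAIRS), reverse=True):
    print(pattern, round(posterior(pattern, p_match, m, u), 3),
          deterministic_match(pattern))
```

The probabilistic model gives a graded score to partial agreements, while the deterministic rule is a single yes/no condition that is easy to read and to express in SQL.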
R as a long-term solution
Example: Rmarkdown QA report
Early version (Jan 2015)
● bash commands for extracting data run from R script using ETL tool
● R script run via command line
● parameters in metafiles manually updated
● Runs a series of Rmd files and renders HTML output
Current version (April 2016)
● linked to data pipeline maintained by software engineering
● metafile generated dynamically
● Plotly survival curves
● Flatly bootstrap theme
● Plan to continue using R indefinitely
R as a long-term solution
Example: Rmarkdown QA report
Why this made sense:
● Mature product and team
● Quantitative science members remain embedded in team
● Strong support and collaboration with software engineering
● Requirements are dynamic -- continued need for rapid prototyping
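The move from manually updated metafiles to a dynamically generated one can be illustrated with a short sketch. The actual report is built with R Markdown. This Python version only mirrors the idea, and the cohort name, cutoff date, section titles, and function names are all hypothetical.

```python
import json
from datetime import date
from string import Template

# One template per report section; stands in for a series of Rmd files.
SECTION = Template("<h2>$title</h2><p>Cohort: $cohort, data cutoff: $cutoff</p>")


def build_metafile(cohort, cutoff):
    """Generate report parameters from pipeline inputs instead of hand-editing them."""
    return {
        "cohort": cohort,
        "cutoff": cutoff.isoformat(),
        "sections": ["Demographics", "Survival"],  # hypothetical section list
    }


def render_report(meta):
    """Render every section listed in the metafile into a single HTML page."""
    body = "".join(
        SECTION.substitute(title=title, cohort=meta["cohort"], cutoff=meta["cutoff"])
        for title in meta["sections"]
    )
    return f"<html><body><h1>QA report</h1>{body}</body></html>"


meta = build_metafile("example_cohort", date(2016, 4, 1))
# The JSON round-trip mimics writing the metafile to disk and reading it back.
html = render_report(json.loads(json.dumps(meta)))
print(html)
```

Because the parameters come from a function call instead of a hand-edited file, an upstream data pipeline can trigger the report without anyone touching the metafile.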
R and !R in parallel
Example: Some external collaborations
● Specific research questions
● 2 people code independently in Python/SQL and R
● Compare results
● Language sometimes incidental, more about 2 different perspectives
Why this made sense:
● High stakes or low error tolerance
● Complicated concepts
● Custom projects often involve novel problems
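The "compare results" step of independent double coding might look like the following sketch. It assumes each analyst exports summary statistics as key-value pairs. The metric names and numbers are made up for illustration, and `compare_results` is a hypothetical helper.

```python
import math


def compare_results(a, b, tol=1e-9):
    """Return the metrics on which two independently computed result sets disagree."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            # A metric computed by only one analyst is itself a discrepancy.
            diffs[key] = (a.get(key), b.get(key))
        elif not math.isclose(a[key], b[key], rel_tol=0, abs_tol=tol):
            diffs[key] = (a[key], b[key])
    return diffs


# Hypothetical outputs from the R analysis and the Python/SQL analysis.
r_results = {"n_patients": 1200, "median_followup_months": 14.5, "n_deaths": 310}
py_results = {"n_patients": 1200, "median_followup_months": 14.5, "n_deaths": 312}

diffs = compare_results(r_results, py_results)
print(diffs)
```

Any non-empty result is a prompt for the two analysts to reconcile their definitions, which is where the value of having two perspectives shows up.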
Thank you
● Melissa Curtis
● Josh Kraut
● Kathi Seidl-Rathkopf
● Cindy Revol
● Rachael Sorg
● Jay Rughani
● Paul You
● Aracelis Torres
● Alphan Kirayoglu
● Ben Birnbaum
● Ann Jaskiw
● James Gippetti
Join our Team!
Drop me a note at sgriffith@flatiron.com, @sgrifter,
or visit flatiron.com/careers
