Annotate Types in Large Codebase with Automated Refactoring
Jimmy Lai, Software Engineer at Carta
Feb. 9, 2022
Tech Stack
…
A Large Python Codebase
Python code
1.8 million lines
27,000 files
120,000 functions
~200 active developers
Lots of TypeError, AttributeError, ValueError
Type Annotation and Mypy
Mypy: Argument 1 to "add" has incompatible type "str"; expected "int"
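A minimal sketch of code that would produce the Mypy message above. The body of `add` is an assumption; the slide only shows the error text. Without the annotations, the same mistake would surface only at runtime, as a TypeError.

```python
def add(a: int, b: int) -> int:
    """Add two integers (illustrative body, not taken from the slide)."""
    return a + b


total = add(1, 2)  # accepted by mypy: both arguments are ints

# Uncommenting the call below makes mypy report, before the code ever runs:
#   Argument 1 to "add" has incompatible type "str"; expected "int"
# At runtime the same call raises TypeError instead.
# add("1", 2)
```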
Automated Refactoring
Automated code changes for fixing large-scale tech debt (Code Formatting, Type Annotation, Dead Code Cleanup)
Recommended tool: LibCST
A library for modifying Python code easily.
LibCST Features:
● Concrete Syntax Tree
● Transformer and Matcher API
● Metadata with static analysis
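A rough sketch of the transformer pattern, applied to dead code cleanup. To stay dependency-free it uses the standard-library `ast` module as a stand-in, not LibCST; the class name and sample source are hypothetical. With LibCST, the analogous class would subclass `libcst.CSTTransformer` and implement `leave_FunctionDef`.

```python
import ast


class RemoveUnreachable(ast.NodeTransformer):
    """Drop statements that follow a top-level return/raise in a function body."""

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        self.generic_visit(node)  # clean nested functions first
        for index, stmt in enumerate(node.body):
            if isinstance(stmt, (ast.Return, ast.Raise)):
                # Everything after this statement can never execute.
                node.body = node.body[: index + 1]
                break
        return node


# Hypothetical input with one unreachable statement.
SOURCE = '''
def price(quantity):
    # compute the total
    return quantity * 2
    print("never runs")
'''

tree = RemoveUnreachable().visit(ast.parse(SOURCE))
cleaned = ast.unparse(tree)  # requires Python 3.9+
print(cleaned)
```

Note that `ast.unparse` discards the comment and the original formatting. A concrete syntax tree keeps both, which is what makes it suitable for changes that land in reviewed pull requests.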
Code Review with Pull Requests
[Diagram: four pull requests]
Add missing types based on static analysis
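One simple case of static inference, sketched under assumptions: a function that never returns a value and never yields can be annotated `-> None`. The rule, the class name, and the sample source are illustrative and not taken from the talk; the standard-library `ast` module stands in for LibCST.

```python
import ast

_NESTED_SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda, ast.ClassDef)


def returns_value_or_yields(func: ast.AST) -> bool:
    """True if the function's own body returns a value or yields."""
    stack = list(func.body)
    while stack:
        node = stack.pop()
        if isinstance(node, _NESTED_SCOPES):
            continue  # returns inside nested scopes belong to those scopes
        if isinstance(node, ast.Return) and node.value is not None:
            return True
        if isinstance(node, (ast.Yield, ast.YieldFrom)):
            return True
        stack.extend(ast.iter_child_nodes(node))
    return False


class AddNoneReturn(ast.NodeTransformer):
    """Add `-> None` to unannotated functions that provably return nothing."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        if node.returns is None and not returns_value_or_yields(node):
            node.returns = ast.Constant(value=None)
        return node

    visit_AsyncFunctionDef = visit_FunctionDef


# Hypothetical input: only `log` qualifies for `-> None`.
SOURCE = '''
def log(message):
    print(message)

def double(x):
    return x * 2
'''

annotated = ast.unparse(AddNoneReturn().visit(ast.parse(SOURCE)))
print(annotated)
```

The rule is deliberately conservative: an explicit `return None` counts as returning a value, so such functions are left for a human or a stronger analysis.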
MonkeyType: add missing types based on runtime data
1. Collect types by running the Python program.
2. Aggregate the collected types and apply them to the code using LibCST.
Run test cases and apply types:
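The two steps can be illustrated without MonkeyType itself. The sketch below is not MonkeyType's mechanism or API: `record_types`, `OBSERVED`, and `suggest_signature` are hypothetical names, and a decorator stands in for real tracing.

```python
import functools
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Set, Tuple

# function name -> set of observed (argument type names, return type name)
OBSERVED: DefaultDict[str, Set[Tuple[Tuple[str, ...], str]]] = defaultdict(set)


def record_types(func: Callable[..., Any]) -> Callable[..., Any]:
    """Step 1 (hypothetical): record the types seen on each call.

    Keyword arguments are ignored to keep the sketch short.
    """

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        arg_types = tuple(type(arg).__name__ for arg in args)
        OBSERVED[func.__name__].add((arg_types, type(result).__name__))
        return result

    return wrapper


@record_types
def add(a, b):
    return a + b


# "Running the program": two calls stand in for a test suite.
add(1, 2)
add(3, 4)


def suggest_signature(name: str) -> str:
    """Step 2 (hypothetical): suggest a signature only if all calls agreed."""
    traces = OBSERVED[name]
    if len(traces) != 1:
        return f"{name}: ambiguous, needs review"
    arg_types, return_type = next(iter(traces))
    return f"{name}({', '.join(arg_types)}) -> {return_type}"


print(suggest_signature("add"))  # prints: add(int, int) -> int
```

Runtime data only reflects the calls that were actually exercised, so suggestions from a narrow test suite still deserve review before they are applied.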
Make it more fun!
Automated weekly updates and leaderboards!
Fully Typed Function Coverage
[Chart: fully typed function coverage, 2018 to 2021; annotation: automated refactoring]
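A sketch of how such a coverage number could be computed. The definition used here is an assumption, since the slide does not define the metric: a function counts as fully typed when every parameter except `self`/`cls` and the return value are annotated.

```python
import ast


def is_fully_typed(func: ast.AST) -> bool:
    """Assumed definition: all parameters (except self/cls) and the return are annotated."""
    args = func.args
    params = args.posonlyargs + args.args + args.kwonlyargs
    if args.vararg:
        params.append(args.vararg)
    if args.kwarg:
        params.append(args.kwarg)
    for param in params:
        if param.arg in ("self", "cls"):
            continue  # simplification: skipped by name, not by position
        if param.annotation is None:
            return False
    return func.returns is not None


def typed_function_coverage(source: str) -> float:
    """Fraction of functions in `source` that are fully typed."""
    functions = [
        node
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if not functions:
        return 1.0  # nothing left to annotate
    return sum(is_fully_typed(f) for f in functions) / len(functions)


# Hypothetical module: one typed function, one untyped.
SAMPLE = '''
def add(a: int, b: int) -> int:
    return a + b

def greet(name):
    return "hello " + name
'''

print(typed_function_coverage(SAMPLE))  # prints: 0.5
```

Tracking this number per team or per directory over time is one way to feed weekly updates and leaderboards.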
Production Type Error Improvement

Carta
We are hiring! https://tinyurl.com/carta-jobs
Carta Engineering Blog https://medium.com/building-carta
Contact: jimmy.lai@carta.com
