Speculative Automated Refactoring of Imperative
Deep Learning Programs to Graph Execution
Raffi Khatchadourian1,2, Tatiana Castro Vélez2, Mehdi Bagherzadeh3, Nan Jia2, Anita Raja1,2
1 City University of New York (CUNY) Hunter College, USA
2 City University of New York (CUNY) Graduate Center, USA
3 Oakland University, USA
International Conference on Automated Software Engineering
November 18, 2025, Seoul, South Korea
Deep Learning Systems & Run-time Performance
Machine Learning (ML) systems, including Deep Learning (DL) systems, are
pervasive.
As datasets grow, efficiency becomes essential to support
responsiveness [Zhou et al., 2020].
For efficiency, DL frameworks have traditionally embraced a deferred
execution style that supports graph-based deep neural network (DNN) computation.
Scalable, but development is . . .
Error-prone.
Cumbersome.
Produces programs that are difficult to debug.
Because graph computation executes statements in a non-imperative
order, traditional SE tools cannot help troubleshoot bugs [Arpteg
et al., 2018].
TensorFlow Deferred Execution-style Code
1 # Build a graph.
2 a = tf.constant(5.0)
3 b = tf.constant(6.0)
4 c = a * b
5
6 # Launch graph in a session.
7 sess = tf.Session()
8
9 # Evaluate the tensor `c`.
10 print(sess.run(c)) # prints 30.0
Lines 2–4 build a computation graph.
Line 4 does not execute until the Session is run on line 10.
No native support for common imperative program constructs, e.g.,
iteration.
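For example, iteration must be expressed with dedicated graph operations rather than a native Python loop. A minimal sketch, assuming the TensorFlow 1.x-style tf.compat.v1 API:

import tensorflow.compat.v1 as tf  # assumes the TF 1.x-style compat API
tf.disable_eager_execution()

# Summing 0..9 requires a graph op (tf.while_loop); a native Python `for`
# loop over tensor values is not supported in the deferred style.
i = tf.constant(0)
total = tf.constant(0.0)

def cond(i, total):
    return i < 10

def body(i, total):
    return i + 1, total + tf.cast(i, tf.float32)

_, total_final = tf.while_loop(cond, body, [i, total])

with tf.Session() as sess:
    print(sess.run(total_final))  # prints 45.0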
Imperative DL Programming, Eager Execution, &
Hybridization
Imperative DL frameworks (e.g., TensorFlow Eager, Keras, PyTorch)
encouraging eager execution are more natural, less error-prone, and
easier to debug.
Sacrifices run-time performance.
Thus, hybrid approaches (e.g., Hybridize, TorchScript, AutoGraph)
have surfaced that:
Execute imperative DL programs as static graphs at run-time.
Are integrated into mainstream DL frameworks (e.g.,
TensorFlow, MXNet, PyTorch).
Eager TensorFlow Imperative (OO) DL Model Code
1 class SequentialModel(tf.keras.Model):
2 def __init__(self, **kwargs):
3 super(SequentialModel, self).__init__(...)
4 self.flatten = layers.Flatten(input_shape=(28, 28))
5 num_layers = 100 # Add many small layers.
6 self.layers = [layers.Dense(64, activation = "relu") for n in range(num_layers)]
7 self.dropout = tf.keras.layers.Dropout(0.2)
8 self.dense_2 = tf.keras.layers.Dense(10)
9
10
11 def __call__(self, x):
12 x = self.flatten(x)
13 for layer in self.layers:
14 x = layer(x)
15 x = self.dropout(x)
16 x = self.dense_2(x)
17 return x
Hybridized TensorFlow Imperative (OO) DL Model Code
1 class SequentialModel(tf.keras.Model):
2 def __init__(self, **kwargs):
3 super(SequentialModel, self).__init__(...)
4 self.flatten = layers.Flatten(input_shape=(28, 28))
5 num_layers = 100 # Add many small layers.
6 self.layers = [layers.Dense(64, activation = "relu") for n in range(num_layers)]
7 self.dropout = tf.keras.layers.Dropout(0.2)
8 self.dense_2 = tf.keras.layers.Dense(10)
9
10 @tf.function(...) # Executes model as graph (optional args).
11 def __call__(self, x):
12 x = self.flatten(x)
13 for layer in self.layers:
14 x = layer(x)
15 x = self.dropout(x)
16 x = self.dense_2(x)
17 return x
On line 10, AutoGraph is used to potentially enhance performance.
Decorates the model’s call() method with @tf.function.
At run-time, call()’s execution will be “traced” (∼9.22× speedup).
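The kind of measurement behind such a speedup can be sketched with a small, self-contained snippet; this is illustrative only (names like forward are hypothetical, and the observed ratio depends on hardware, model size, and TensorFlow version):

import timeit
import tensorflow as tf

# Many small layers, analogous to the SequentialModel above.
dense_layers = [tf.keras.layers.Dense(64, activation="relu") for _ in range(100)]

def forward(x):
    for layer in dense_layers:
        x = layer(x)
    return x

forward_graph = tf.function(forward)  # hybridized variant of the same code

x = tf.random.uniform([20, 64])
forward_graph(x)  # first call traces the graph; excluded from the timing

eager_time = timeit.timeit(lambda: forward(x), number=100)
graph_time = timeit.timeit(lambda: forward_graph(x), number=100)
print("relative speedup:", eager_time / graph_time)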
Hybridization Drawbacks
Needs non-trivial, specialized metadata [Jeong et al., 2019].
Exhibits limitations and known issues with native program constructs.
Subtle considerations required to:
Specify (decorate) the functions to be migrated.
Make code amenable to safe, accurate, and efficient graph execution.
Avoid performance bottlenecks and semantically inequivalent
results [Cao et al., 2022, Castro Vélez et al., 2022].
Manual analysis and refactoring (semantics-preserving,
source-to-source transformation) for optimal results can be error-
and omission-prone [Dig et al., 2009].
Further complicated by:
Increasing Object-Orientation (OO) in DL model code (e.g., Keras).
Dynamically-typed languages (e.g., Python).
Imperative DL Code With Python Side-effects
1 @tf.function
2 def f(x):
3 print("Input: ", x)
4 f(1)
5 f(1)
6 f(2)
Output (expecting 1, 1, 2):
Input: 1
Input: 2
Side-effect-producing native Python statements (e.g., printing, list
appending, global variable mutation) are problematic for
tf.function-decorated functions (i.e., “tf.functions”).
Because they are traced, a function’s behavior is “etched” into its
corresponding graph.
Can have unexpected results, executing side-effects multiple times
or not at all.
Side-effects occur only when a tf.function is traced, typically on its first call.
Subsequent calls with the same argument signature execute the graph instead.
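A common manual remedy, sketched below, is to replace the native side-effect with a graph-aware TensorFlow op such as tf.print, which is recorded in the graph and therefore runs on every call (the refactorings presented later instead conservatively avoid hybridizing such code):

import tensorflow as tf

@tf.function
def f(x):
    tf.print("Input:", x)  # graph op: recorded in the graph, runs on every call
    return x

f(1)  # Input: 1
f(1)  # Input: 1  (graph reused, but the tf.print op still executes)
f(2)  # Input: 2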
Problem Insight
Although imperative DL code executes sequentially, hybridization
resembles parallelizing sequential code.
Example
To avoid unexpected behavior, hybrid functions, like concurrent programs,
should avoid side-effects.
Idea
Adopt concepts from automated refactorings that parallelize sequential
code, e.g., Streaming APIs [Khatchadourian et al., 2019].
Refactorings
Two new, fully-automated refactorings:
Convert Eager Function to Hybrid Transforms otherwise
eagerly-executed imperative (Python) DL code for
enhanced run-time performance.
Automatically specifies (decorates) whether and how
code could be reliably and efficiently executed as
graphs at run-time.
Avoids hybridizing code under certain conditions
(e.g., side-effecting code) to preserve semantics.
Optimize Hybrid Function Transforms code already running as
graphs for optimal run-time performance.
Possibly dehybridizes code when eager execution could
be faster (e.g., graph “retracing”; see the sketch below).
Issues refactoring “warnings” when hybrid code may
have unexpected results but refactoring is not
possible due to semantics preservation.
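To illustrate retracing, one scenario in which Optimize Hybrid Function may favor eager execution, the hedged sketch below passes Python scalars, which become part of the trace signature and force a new trace per distinct value:

import tensorflow as tf

@tf.function
def double(x):
    print("Tracing for", x)  # Python side-effect: runs only while tracing
    return x * 2

double(tf.constant(1))  # traces once for this tensor signature
double(tf.constant(2))  # same signature: graph reused, no "Tracing" output

double(1)  # Python scalars are baked into the trace signature: traces again
double(2)  # ...and again; repeated retracing can negate the graph speedup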
Refactoring Preconditions
Table: Convert Eager Function to Hybrid preconditions.

     exe   tens   lit*   se   rec   trans
P1   eag   T      F      F    F     hyb

* An option exists in our implementation to not consider Boolean literals.

Table: Optimize Hybrid Function preconditions.

     exe     tens   lit*   se   trans
P2   graph   F      N/A    F    eag
P3   graph   T      T      F    eag

* An option exists in our implementation to not consider Boolean literals.
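To make the side-effect (se) column concrete, the hedged sketch below (illustrative names, not taken from the evaluation corpus) contrasts a function the tool could hybridize under P1 with one it would leave eager:

import tensorflow as tf

# Candidate for Convert Eager Function to Hybrid: executes eagerly, has
# tensor parameters, has no Python side effects, and is not recursive.
def scale(x, w):
    return x * w

# Not a candidate (se = T): the append mutates Python state outside the
# graph and would only run while tracing if the function were hybridized.
history = []

def scale_and_log(x, w):
    history.append(float(tf.reduce_sum(x)))
    return x * w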
Approach Overview
Figure: High-level flowchart. Start → Input Source → (1) Precondition Check → (2) Transformation → Stop.
Figure: Precondition checking flowchart. (1) Identify Functions → Candidate Functions → (2) Extract Decorators → Decorators → (3) Infer Execution → (4) Infer Tensors → (5) Type Hints → (6) Speculative Analysis → (7) Dataflow Analysis → (8) Infer Literals → (9) Infer Side-effects → (10) Identify Recursion.
Novel whole-program static tensor analysis for imperative DL code
based on WALA Ariadne.
Extensible to other imperative DL frameworks (e.g., PyTorch).
Novel Python side-effect analysis based on WALA ModRef analysis.
Leverages complementary speculative analysis [Zhou et al., 2020]
using contextual DL keywords.
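As a hedged illustration of how steps (4) and (5) interact, a parameter type hint can make tensor-ness explicit where inference from call sites or DL keywords alone would be uncertain (illustrative code, not the tool’s API):

import tensorflow as tf

# Without the annotation, the analysis must infer from call sites or
# contextual DL keywords whether `x` is a tensor; the hint makes it explicit.
def normalize(x: tf.Tensor) -> tf.Tensor:
    return (x - tf.reduce_mean(x)) / (tf.math.reduce_std(x) + 1e-8)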
Implementation
Figure: Screenshot of the Hybridize Functions refactoring preview wizard.
Evaluation Summary
Analyzed 19 open-source Python imperative DL systems.
Varying size and domain.
Ranging from 0.12 to 36.72 KSLOC.
Refactored 42.56% of 766 functions despite conservatism.
Run-time Performance Evaluation Summary
Measured an average relative model training speedup of 2.16.
Memory consumption measurement pending.
Differences in model accuracy and loss before and after refactoring
were negligible.
Conclusion
Imperative DL code is easier to debug, write, and maintain.
Comes at the expense of (run-time) performance.
Hybridization bridges the gap between eager and graph execution.
Achieving optimal performance while preserving semantics is non-trivial.
Our Work
Refactoring approach for automatically converting imperative DL
code to graphs.
Novel tensor analysis for imperative DL.
Open-source tool that successfully refactors 42.56% of candidate
functions across 19 Python DL programs, resulting in an average
relative speedup of 2.16.
Come see our poster!
For Further Reading I
Abadi, Martín et al. (2016). “TensorFlow: A System for Large-Scale Machine Learning”. In: Symposium on Operating Systems
Design and Implementation.
Agrawal, Akshay et al. (2019). TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning. arXiv:
1903.01855 [cs.PL].
Apache (Apr. 8, 2021). Hybridize. Apache MXNet documentation. url:
https://mxnet.apache.org/versions/1.8.0/api/python/docs/tutorials/packages/gluon/blocks/hybridize.html (visited
on 04/08/2021).
Arpteg, A., B. Brinne, L. Crnkovic-Friis, and J. Bosch (2018). “Software Engineering Challenges of Deep Learning”. In: Euromicro
Conference on Software Engineering and Advanced Applications. IEEE, pp. 50–59. doi: 10.1109/SEAA.2018.00018.
Cao, Junming, Bihuan Chen, Chao Sun, Longjie Hu, Shuaihong Wu, and Xin Peng (2022). “Understanding Performance Problems
in Deep Learning Systems”. In: FSE. FSE ’22. ACM, pp. 357–369. doi: 10.1145/3540250.3549123.
Castro Vélez, Tatiana, Raffi Khatchadourian, Mehdi Bagherzadeh, and Anita Raja (May 2022). “Challenges in Migrating
Imperative Deep Learning Programs to Graph Execution: An Empirical Study”. In: MSR. MSR ’22. ACM/IEEE. ACM. doi:
10.1145/3524842.3528455.
Chen, Tianqi, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang
(2015). “MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems”. In: Workshop on
Machine Learning Systems at NIPS. arXiv: 1512.01274 [cs.DC].
Chollet, François (2020). Deep Learning with Python. 2nd ed. Manning.
Dig, Danny, John Marrero, and Michael D. Ernst (2009). “Refactoring sequential Java code for concurrency via concurrent
libraries”. In: ICSE, pp. 397–407. doi: 10.1109/ICSE.2009.5070539.
Dilhara, Malinda, Ameya Ketkar, Nikhith Sannidhi, and Danny Dig (2022). “Discovering Repetitive Code Changes in Python ML
Systems”. In: ICSE. ICSE ’22.
For Further Reading II
Dolby, Julian, Avraham Shinnar, Allison Allain, and Jenna Reinen (2018). “Ariadne: Analysis for Machine Learning Programs”. In:
MAPL. ACM SIGPLAN. ACM, pp. 1–10. doi: 10.1145/3211346.3211349.
Facebook Inc. (2019). PyTorch. TorchScript. en. url: https://pytorch.org/docs/stable/jit.html (visited on 02/19/2021).
Jeong, Eunji, Sungwoo Cho, Gyeong-In Yu, Joo Seong Jeong, Dong-Jin Shin, Taebum Kim, and Byung-Gon Chun (July 2019).
“Speculative Symbolic Graph Execution of Imperative Deep Learning Programs”. In: SIGOPS Oper. Syst. Rev. 53.1, pp. 26–33.
issn: 0163-5980. doi: 10.1145/3352020.3352025.
Khatchadourian, Raffi, Yiming Tang, Mehdi Bagherzadeh, and Syed Ahmed (2019). “Safe Automated Refactoring for Intelligent
Parallelization of Java 8 Streams”. In: ICSE. ICSE ’19. IEEE Press, pp. 619–630. doi: 10.1109/ICSE.2019.00072.
Kim, Miryung, Thomas Zimmermann, and Nachiappan Nagappan (Nov. 2012). “A Field Study of Refactoring Challenges and
Benefits”. In: FSE. ACM. doi: 10.1145/2393596.2393655.
Moldovan, Dan, James M. Decker, Fei Wang, Andrew A. Johnson, Brian K. Lee, Zachary Nado, D. Sculley, Tiark Rompf, and
Alexander B. Wiltschko (2019). AutoGraph: Imperative-style Coding with Graph-based Performance. arXiv: 1810.08061 [cs.PL].
Negara, Stas, Nicholas Chen, Mohsen Vakilian, Ralph E. Johnson, and Danny Dig (2013). “A Comparative Study of Manual and
Automated Refactorings”. In: ECOOP. Ed. by Giuseppe Castagna. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 552–576.
isbn: 978-3-642-39038-8.
OpenAI, Inc. (Aug. 18, 2023). ChatGPT. url: https://chat.openai.com (visited on 08/18/2023).
Paszke, Adam et al. (Dec. 3, 2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv: 1912.01703
[cs.LG].
WALA (Sept. 8, 2024). T.J. Watson Libraries for Analysis. original-date: 2012-04-05T18:57:03Z. url:
https://github.com/wala/WALA (visited on 09/10/2024).
For Further Reading III
Zhou, Weijie, Yue Zhao, Guoqiang Zhang, and Xipeng Shen (2020). “HARP: Holistic Analysis for Refactoring Python-Based
Analytics Programs”. In: ICSE. doi: 10.1145/3377811.3380434.
Appendix
Why Static Analysis?
Refactorings must operate on (at least some) static information.
Must eventually transform the source code.
May eventually integrate hybrid analyses to resolve difficult static
cases.
Why Automated Refactoring?
In general, such problems may also be handled by compilers or
runtimes; however, refactoring has several benefits:
Gives developers more control over where the optimizations take
place and makes graph execution explicit.
Can be issued multiple times, e.g., prior to major releases.
Unlike static checkers, refactorings transform source code, a task that can
otherwise be error-prone and involve subtle nuances.
Refactorings can act like recommendation systems, which is
important for analyzing and transforming programs written in
dynamic languages where static assumptions may be easily violated!
Refactoring Developer Adoption
Developers generally underuse automated refactorings [Kim et al.,
2012, Negara et al., 2013].
Data scientists and engineers may be more open to using automated
(refactoring) tools.
Our approach will be fully automated with minimal barrier to entry.
LLMs & Big Data Refactoring
LLMs [OpenAI, Inc., 2023] can also perform refactorings.
Other Big Data-driven refactorings [Dilhara et al., 2022] are exciting
and promising.
Obtaining a (correct) dataset large enough to automatically extract
the proposed refactorings is challenging as developers struggle with
(manually) migrating DL code to graph execution [Castro Vélez
et al., 2022].
LLM inference capabilities are currently limited.
LLMs have a token limitation.
Hybridization requires interprocedural analysis.
Notebook Support
We plan to investigate notebook support in the future.
We envision the approach to be used on (larger) DL systems,
consisting of multiple files.
