Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams
Raffi Khatchadourian¹, Yiming Tang¹, Mehdi Bagherzadeh², Syed Ahmed²
¹ City University of New York (CUNY) {Hunter College, Graduate Center}, USA
² Oakland University, USA
Introduction
The Java 8 Stream API sets forth a promising new
programming model that incorporates
functional-like, MapReduce-style features into a
mainstream programming language.
Problem
Developers must manually determine whether
running streams in parallel is efficient yet
interference-free.
Using streams correctly and efficiently requires
many subtle considerations that may not be
immediately evident.
Manual analysis and refactoring can be error-
and omission-prone.
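As an illustration of one such subtle consideration, the following minimal sketch contrasts a stream whose lambda mutates shared state with an equivalent collector-based reduction. The class and method names are hypothetical and are not taken from the tool; the sketch only relies on documented Stream API behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration; not part of the refactoring tool.
class SideEffectSketch {

    // The lambda has a side effect on a shared ArrayList, which is not
    // thread-safe. This is fine sequentially, but with parallel == true
    // elements may be lost, reordered, or an exception may be thrown.
    static List<Integer> squaresWithSideEffect(int n, boolean parallel) {
        List<Integer> out = new ArrayList<>();
        IntStream s = IntStream.range(0, n);
        if (parallel) {
            s = s.parallel();
        }
        s.forEach(i -> out.add(i * i));
        return out;
    }

    // No shared mutable state: the collector handles per-thread
    // accumulation, so the result is the same in either execution mode.
    static List<Integer> squaresWithCollector(int n, boolean parallel) {
        IntStream s = IntStream.range(0, n);
        if (parallel) {
            s = s.parallel();
        }
        return s.map(i -> i * i).boxed().collect(Collectors.toList());
    }
}
```

Only the second method is a safe candidate for parallelization; detecting the first pattern by hand across a large code base is the kind of analysis that is easy to get wrong.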
Automated Tool
Our Eclipse Plug-in, based on a novel ordering and
augmented typestate analysis, automatically
identifies and executes refactoring opportunities
where improvements may be made to Java 8
Stream code. The parallelization is “intelligent” as
it carefully considers each context and may result
in de-parallelization.
Refactoring Preconditions
Table: Convert Sequential Stream to Parallel
preconditions. exe is execution mode, seq is sequential, ord is ordering,
se is side effects, SIO is stateful intermediate operation, ROM is
reduction ordering matters.

     exe  ord    se  SIO  ROM  transformation
P1   seq  unord  F   N/A  N/A  Convert to parallel.
P2   seq  ord    F   F    N/A  Convert to parallel.
P3   seq  ord    F   T    F    Unorder and convert to parallel.
Table: Optimize Parallel Stream preconditions. exe is execution
mode, para is parallel, ord is ordering, SIO is stateful intermediate
operation, ROM is reduction ordering matters.

     exe   ord  SIO  ROM  transformation
P4   para  ord  T    F    Unorder.
P5   para  ord  T    T    Convert to sequential.
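The two precondition tables above can be read as a single decision function. The sketch below is one possible encoding, not the plug-in's implementation: the type and method names are hypothetical, the se column is assumed to mean "has side effects", and any combination not listed in the tables is treated as "no refactoring".

```java
import java.util.Optional;

// Hypothetical encoding of preconditions P1-P5; not the tool's actual code.
class PreconditionSketch {
    enum Exe { SEQ, PARA }
    enum Ord { ORDERED, UNORDERED }
    enum Transformation {
        CONVERT_TO_PARALLEL,
        UNORDER_AND_CONVERT_TO_PARALLEL,
        UNORDER,
        CONVERT_TO_SEQUENTIAL
    }

    // sideEffects: assumed meaning of the "se" column.
    // sio: the pipeline contains a stateful intermediate operation.
    // rom: the ordering of the reduction result matters.
    static Optional<Transformation> decide(Exe exe, Ord ord,
            boolean sideEffects, boolean sio, boolean rom) {
        if (exe == Exe.SEQ) {
            if (sideEffects) {
                return Optional.empty(); // P1-P3 all require se = F
            }
            if (ord == Ord.UNORDERED) {
                return Optional.of(Transformation.CONVERT_TO_PARALLEL); // P1
            }
            if (!sio) {
                return Optional.of(Transformation.CONVERT_TO_PARALLEL); // P2
            }
            if (!rom) {
                return Optional.of(
                    Transformation.UNORDER_AND_CONVERT_TO_PARALLEL); // P3
            }
            return Optional.empty();
        }
        // Parallel stream: only ordered pipelines with an SIO are listed.
        if (ord == Ord.ORDERED && sio) {
            return Optional.of(rom
                ? Transformation.CONVERT_TO_SEQUENTIAL // P5
                : Transformation.UNORDER);             // P4
        }
        return Optional.empty();
    }
}
```

For example, `decide(Exe.PARA, Ord.ORDERED, false, true, true)` yields `CONVERT_TO_SEQUENTIAL`, which is the de-parallelization case (P5).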
Contributions
We devise an automated refactoring approach that assists developers in writing optimal stream
code. The approach determines when it is safe and advantageous to convert streams to parallel
and optimize parallel streams. A case study is performed on the applicability of the approach.
Refactorings
1 Convert Sequential Stream to Parallel. Determines if it is possibly
advantageous and safe to convert a sequential stream to parallel.
2 Optimize Parallel Stream. Decides which transformations may improve the
performance of a parallel stream, including unordering and converting to sequential.
Code Snippet of Widget Collection Processing Using the Java 8 Stream API
Collection<Widget> unorderedWidgets =
    new HashSet<>();
List<Widget> sortedWidgets =
    unorderedWidgets
        .stream()
        .sorted(Comparator.comparing(
            Widget::getWeight))
        .collect(Collectors.toList());
Collection<Widget> orderedWidgets =
    new ArrayList<>();
Set<Double> distinctWeightSet =
    orderedWidgets
        .stream().parallel()
        .map(Widget::getWeight).distinct()
        .collect(Collectors.toCollection(
            TreeSet::new));
(a) Stream code snippet prior to refactoring.
Collection<Widget> unorderedWidgets =
    new HashSet<>();
List<Widget> sortedWidgets =
    unorderedWidgets
        .parallelStream() // was .stream()
        .sorted(Comparator.comparing(
            Widget::getWeight))
        .collect(Collectors.toList());
Collection<Widget> orderedWidgets =
    new ArrayList<>();
Set<Double> distinctWeightSet =
    orderedWidgets
        .stream().parallel()
        .map(Widget::getWeight).distinct()
        .collect(Collectors.toCollection(
            TreeSet::new));
(b) Improved stream client code via refactoring.
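The snippet above omits the Widget type and any surrounding class. A self-contained variant might look like the sketch below, which assumes a minimal hypothetical Widget with only a weight and wraps the two pipelines in helper methods so that the sequential and parallel forms of the sorted pipeline can be compared.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical, runnable variant of the poster's snippet.
class WidgetSketch {

    // Assumed minimal Widget; the poster does not show its definition.
    static final class Widget {
        private final double weight;

        Widget(double weight) {
            this.weight = weight;
        }

        double getWeight() {
            return weight;
        }
    }

    // Sorted pipeline in either execution mode; parallel == true
    // corresponds to the refactored form using parallelStream().
    static List<Double> sortedWeights(Collection<Widget> widgets,
            boolean parallel) {
        return (parallel ? widgets.parallelStream() : widgets.stream())
            .sorted(Comparator.comparing(Widget::getWeight))
            .map(Widget::getWeight)
            .collect(Collectors.toList());
    }

    // The second pipeline of the snippet, unchanged by the refactoring.
    static Set<Double> distinctWeights(Collection<Widget> widgets) {
        return widgets.stream().parallel()
            .map(Widget::getWeight).distinct()
            .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

Because sorted() imposes an encounter order and toList() respects it, both execution modes of sortedWeights return the same list, which is why converting this stream to parallel preserves behavior.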
Typestate Analysis
We use typestate analysis to determine stream attributes when a terminal operation is issued.
A typestate variant is being developed since operations like sorted() return (possibly) new
streams derived from the receiver with their attributes altered. Labeled transition systems
(LTSs) are used for execution mode and ordering.
Figure: LTS for execution mode.
Figure: LTS for ordering.
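Since the two figures are not reproduced here, the following sketch gives one plausible reading of the transition systems, based only on documented Stream API behavior: parallel() and sequential() set the execution mode, sorted() and unordered() set the ordering, and other intermediate operations are assumed to leave both attributes unchanged. The class name, the string-based operation labels, and the exact transition set are assumptions, not the tool's actual LTSs.

```java
import java.util.List;

// Hypothetical model of a stream's (execution mode, ordering) state.
class StreamStateSketch {
    enum Exe { SEQUENTIAL, PARALLEL }
    enum Ord { ORDERED, UNORDERED }

    final Exe exe;
    final Ord ord;

    StreamStateSketch(Exe exe, Ord ord) {
        this.exe = exe;
        this.ord = ord;
    }

    // One transition: returns the state of the (possibly new) stream
    // produced by applying the named intermediate operation.
    StreamStateSketch step(String op) {
        switch (op) {
            case "parallel":
                return new StreamStateSketch(Exe.PARALLEL, ord);
            case "sequential":
                return new StreamStateSketch(Exe.SEQUENTIAL, ord);
            case "sorted":
                return new StreamStateSketch(exe, Ord.ORDERED);
            case "unordered":
                return new StreamStateSketch(exe, Ord.UNORDERED);
            default:
                return this; // assumed: e.g. map, filter, distinct
        }
    }

    // State observed when the terminal operation is issued.
    static StreamStateSketch atTerminal(StreamStateSketch initial,
            List<String> ops) {
        StreamStateSketch state = initial;
        for (String op : ops) {
            state = state.step(op);
        }
        return state;
    }
}
```

Under this model, a stream from a HashSet starts as (SEQUENTIAL, UNORDERED) and becomes ORDERED after sorted(), while a stream from an ArrayList followed by parallel(), map(), and distinct() reaches its terminal operation as (PARALLEL, ORDERED).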
Experimental Results
Table: Experimental results.

subject     KLOC     eps   k   str   rft   P1   P2   P3   t (m)
htm.java     41.14    21   4    34    10    0   10    0    1.85
JacpFX       23.79   195   4     4     3    3    0    0    2.31
jdp*         19.96    25   4    28    15    1   13    1   31.88
jdk8-exp*     3.43   134   4    26     4    0    4    0    0.78
jetty       354.48   106   4    21     7    3    4    0   17.85
jOOQ        154.01    43   4     5     1    0    1    0   12.94
koral         7.13    51   3     6     6    0    6    0    1.06
monads        1.01    47   2     1     1    0    1    0    0.05
retroλ        5.14     1   4     8     6    3    3    0    0.66
streamql      4.01    92   2    22     2    0    2    0    0.72
threeten     27.53    36   2     2     2    0    2    0    0.51
Total       641.65   751   4   157    57   10   46    1   70.60

* jdp is java-design-patterns and jdk8-exp is jdk8-experiments.
Table: Refactoring failures.

failure                                   pc   cnt
F1. InconsistentPossibleExecutionModes           1
F2. NoStatefulIntermediateOperations      P5     1
F3. NonDeterminableReductionOrdering             5
F4. NoTerminalOperations                        13
F5. CurrentlyNotHandled                         16
F6. ReduceOrderingMatters                 P3    19
F7. HasSideEffects                        P1     4
                                          P2    41
Total                                          100
Table: Average run times of JMH benchmarks.

#    benchmark                      orig (s/op)       refact (s/op)     su
1    shouldRetrieveChildren          0.011 (0.001)     0.002 (0.000)    6.57
2    shouldConstructCar              0.011 (0.001)     0.001 (0.000)    8.22
3    addingShouldResultInFailure     0.014 (0.000)     0.004 (0.000)    3.78
4    deletionShouldBeSuccess         0.013 (0.000)     0.003 (0.000)    3.82
5    addingShouldResultInSuccess     0.027 (0.000)     0.005 (0.000)    5.08
6    deletionShouldBeFailure         0.014 (0.000)     0.004 (0.000)    3.90
7    specification.AppTest.test     12.666 (5.961)    12.258 (1.880)    1.03
8    CoffeeMakingTaskTest.testId     0.681 (0.065)     0.469 (0.009)    1.45
9    PotatoPeelingTaskTest.testId    0.676 (0.062)     0.465 (0.008)    1.45
10   SpatialPoolerLocalInhibition    1.580 (0.168)     1.396 (0.029)    1.13
11   TemporalMemory                  0.013 (0.001)     0.006 (0.000)    1.97
Conclusion
We present an automated refactoring approach that “intelligently”
optimizes Java 8 stream code. The tool was assessed on 11 Java projects
totaling ∼642 thousand lines of code. An average speedup of 3.49 on the
refactored code was observed as part of an experimental study. The tool
is publicly available at http://git.io/vpTLk.
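The reported average can be checked against the su column of the JMH benchmark table. The short sketch below (class name hypothetical) assumes the average is the arithmetic mean of the 11 per-benchmark speedups, which reproduces the stated figure.

```java
import java.util.Locale;
import java.util.stream.DoubleStream;

// Hypothetical check of the average speedup; values copied from the
// "su" column of the JMH benchmark table.
class SpeedupSketch {
    static final double[] SPEEDUPS = {
        6.57, 8.22, 3.78, 3.82, 5.08, 3.90, 1.03, 1.45, 1.45, 1.13, 1.97
    };

    // Arithmetic mean of the per-benchmark speedups.
    static double average() {
        return DoubleStream.of(SPEEDUPS).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // The speedups sum to 38.40; 38.40 / 11 ≈ 3.49.
        System.out.println(String.format(Locale.ROOT, "%.2f", average()));
    }
}
```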
Future Work
Handle more advanced ways of relating ASTs to SSA-based IR.
Incorporate additional reductions like those involving maps.
Applicability of the tool to other streaming APIs and languages.
Refactoring side-effect producing code.
Finding other kinds of bugs and misuses of Streaming APIs.
International Conference on Software Engineering, May 25–May 31, 2019, Montréal, Canada
