Optimizing Code using Parallel Genetic Algorithm
An overview
• Introduction
• Background
• Methodology
• Experimental results
• Conclusion
Compiler optimization
• Compiler optimization is the technique of minimizing or maximizing some feature of executable code, such as its speed or size, by tuning the output of a compiler.
• Modern compilers support many different optimization phases; each phase must analyze the code and produce semantically equivalent, performance-enhanced code.
• The vital parameters that define the performance enhancement are execution time and code size.
The phase ordering
• Compiler optimization phase ordering poses challenges not only to compiler developers but also to multithreaded programmers who aim to enhance the performance of multicore systems.
• Many compilers apply their numerous optimization techniques in a predetermined order.
• This fixed ordering does not always yield optimal code.
[Figure: the search space of candidate optimization phase sequences for a given piece of code]
Optimization flags
• The best optimization phase ordering varies with the application being compiled, the architecture of the machine on which it runs, and the compiler implementation.
• Many compilers allow optimization flags to be set by the user.
• Turning on optimization flags makes the compiler attempt to improve performance and/or code size at the expense of compilation time.
GNU compiler collection
• The GNU Compiler Collection (GCC) includes front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages.
• In order to control compilation-time and compiler memory usage, and
the trade-offs between speed and space for the resulting executable, GCC
provides a range of general optimization levels, numbered from 0–3, as
well as individual options for specific types of optimization.
[Figure: the GCC optimization levels -O1, -O2, and -O3]
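A minimal sketch of how these levels might be compared in practice, assuming a hypothetical benchmark source file bench.c and gcc available on the PATH (neither is named in the slides): the driver below compiles the benchmark at each level and reports the wall-clock run time of the resulting executable.

    /* Hedged sketch: time a hypothetical benchmark (bench.c) built at each GCC
       optimization level. File names and the benchmark itself are assumptions. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    static double wall_seconds(void) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void) {
        const char *levels[] = { "-O0", "-O1", "-O2", "-O3" };
        char cmd[256];
        for (int i = 0; i < 4; i++) {
            /* Compile the benchmark at the given level. */
            snprintf(cmd, sizeof cmd, "gcc %s -fopenmp -o bench bench.c", levels[i]);
            if (system(cmd) != 0) { fprintf(stderr, "compile failed\n"); return 1; }
            double t0 = wall_seconds();
            system("./bench");                       /* run the optimized executable */
            printf("%s: %.3f s\n", levels[i], wall_seconds() - t0);
        }
        return 0;
    }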
Optimization levels
• The impact of the different optimization levels on the input code is described below.
-O0 or no -O (default): no optimization is performed, which makes bugs easy to find and eliminate.
-O1 or -O: applies many simple optimizations that eliminate redundancy. Compile time stays low, and the executable code becomes smaller and faster.
-O2: O1 plus additional optimizations such as instruction scheduling. Only optimizations that do not require a speed-space tradeoff are used, so the executable should not increase in size; this gives maximum optimization without growing the executable. Compile time and memory usage increase, and -O2 is usually the best choice for deploying a program.
-O3: O2 plus more expensive optimizations such as function inlining and maximum loop optimization. The executable code becomes faster, but also bulkier.
Optimization level
[Figure or table: summary comparison of the GCC optimization levels]
The challenge
• Which optimization level should be chosen, for example for a sequential quick sort versus a parallel quick sort (see the sketch below)?
• The overhead of inter-process communication in the parallel version makes the choice non-trivial.
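To make the parallel case concrete, here is a hedged sketch of a parallel quick sort using OpenMP tasks (tasks are available from OpenMP 3.0, the version used in this work); the Lomuto partition and the 1000-element cutoff are illustrative assumptions, not details taken from the slides.

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Lomuto partition: returns the final index of the pivot. */
    static int partition(int *a, int lo, int hi) {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    /* Parallel quick sort: each recursive half becomes an OpenMP task.
       Below a cutoff size the recursion stays sequential to limit task overhead. */
    static void quick_sort(int *a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        #pragma omp task shared(a) if (p - lo > 1000)
        quick_sort(a, lo, p - 1);
        #pragma omp task shared(a) if (hi - p > 1000)
        quick_sort(a, p + 1, hi);
        #pragma omp taskwait
    }

    int main(void) {
        enum { N = 1000000 };
        int *a = malloc(N * sizeof *a);
        for (int i = 0; i < N; i++) a[i] = rand();
        #pragma omp parallel      /* team of threads */
        #pragma omp single        /* one thread starts the recursion; tasks fan out */
        quick_sort(a, 0, N - 1);
        printf("first=%d last=%d\n", a[0], a[N - 1]);
        free(a);
        return 0;
    }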
Genetic algorithm
Initial population → Selection → Intermediate population (mating pool) → Crossover & mutation → Replacement → Next population
PGA for Compiler Optimization
• The work in this research uses the GCC 4.8 compiler on Ubuntu 12.04 with the OpenMP 3.0 library.
The master-slave model
• In the master-slave model the master runs the evolutionary algorithm, controls the slaves, and distributes the work.
• The slaves take batches of individuals from the master, evaluate them, and send the computed fitness values back to the master (a minimal sketch follows below).
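A minimal sketch of this model with OpenMP threads, using a placeholder bit-count fitness purely to keep the example self-contained (the real evaluation would compile and time a benchmark): the master thread prepares the population and the thread team then evaluates the individuals in parallel.

    /* Hedged sketch of the master-slave pattern with OpenMP threads. The bit-count
       fitness is a stand-in; the population size (6) and thread count (8) follow
       the setup described later in the deck. */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define POP  6
    #define BITS 10

    int main(void) {
        int pop[POP][BITS];
        double fit[POP];

        #pragma omp parallel num_threads(8)
        {
            /* Master thread: generate the random initial population. */
            #pragma omp master
            for (int i = 0; i < POP; i++)
                for (int b = 0; b < BITS; b++)
                    pop[i][b] = rand() & 1;
            #pragma omp barrier                /* slaves wait for the batch */

            /* Evaluation is shared across the thread team. */
            #pragma omp for
            for (int i = 0; i < POP; i++) {
                int ones = 0;
                for (int b = 0; b < BITS; b++) ones += pop[i][b];
                fit[i] = (double)ones;         /* placeholder fitness */
            }
        }

        for (int i = 0; i < POP; i++)
            printf("individual %d: fitness %.0f\n", i, fit[i]);
        return 0;
    }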
Encoding
Example chromosome: 1101011101
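The slide does not spell out what each bit controls. A plausible reading, sketched below, is that each bit of the chromosome switches one GCC optimization flag on or off and the whole string is translated into a compiler command line; the particular ten flags listed are assumptions chosen only for illustration.

    /* Hedged sketch of the bit-string encoding: bit i enables or disables flag i.
       The flag list and file names are illustrative assumptions. */
    #include <stdio.h>
    #include <string.h>

    static const char *FLAGS[] = {
        "-funroll-loops", "-finline-functions", "-fomit-frame-pointer",
        "-ftree-vectorize", "-fpeel-loops", "-ftracer",
        "-fgcse-after-reload", "-fipa-cp-clone", "-fpredictive-commoning",
        "-funswitch-loops"
    };
    enum { NFLAGS = sizeof FLAGS / sizeof FLAGS[0] };

    /* Build a gcc command line from a chromosome such as "1101011101". */
    static void chromosome_to_cmd(const char *bits, char *cmd, size_t n) {
        snprintf(cmd, n, "gcc -o bench bench.c");
        for (int i = 0; i < NFLAGS && bits[i]; i++)
            if (bits[i] == '1') {
                strncat(cmd, " ", n - strlen(cmd) - 1);
                strncat(cmd, FLAGS[i], n - strlen(cmd) - 1);
            }
    }

    int main(void) {
        char cmd[512];
        chromosome_to_cmd("1101011101", cmd, sizeof cmd);
        puts(cmd);    /* gcc -o bench bench.c -funroll-loops -finline-functions ... */
        return 0;
    }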
Fitness function
• In the proposed system the PGA works with a population of six chromosomes on an eight-core machine, and the fitness function is computed on the master core.
Fitness = |exe_with_flag_i - exe_without_flag_i|,  i ∈ {1, 2, ..., 12}
[Figure: the master node generates the random population and evaluates all individuals; the slave nodes run the genetic algorithm; the cycle repeats for 200 generations]
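A hedged sketch of how this fitness could be computed on the master core. The slide leaves some details open (for example whether the comparison is per flag or per chromosome), so the sketch compares one chromosome's full flag set against an unoptimized build; the file names and gettimeofday-based timing are assumptions, and only the |with - without| formula itself comes from the slide.

    /* Hedged sketch of the fitness evaluation: compile the benchmark with and
       without a chromosome's flags, time both runs, and use the absolute
       difference in execution time as the fitness value. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    static double timed_run(const char *compile_cmd, const char *run_cmd) {
        struct timeval t0, t1;
        if (system(compile_cmd) != 0) return -1.0;   /* compilation failed */
        gettimeofday(&t0, NULL);
        system(run_cmd);
        gettimeofday(&t1, NULL);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    }

    double fitness(const char *flag_string) {
        char with_flags[512];
        snprintf(with_flags, sizeof with_flags,
                 "gcc -fopenmp %s -o bench_f bench.c", flag_string);
        double t_with    = timed_run(with_flags, "./bench_f");
        double t_without = timed_run("gcc -fopenmp -o bench_0 bench.c", "./bench_0");
        return fabs(t_with - t_without);   /* Fitness = |exe_with_flags - exe_without_flags| */
    }

    int main(void) {
        printf("fitness = %.3f\n", fitness("-funroll-loops -ftree-vectorize"));
        return 0;
    }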
Algorithm for Slave Nodes
Step 1: Receive all the chromosomes, together with their fitness values, from the master node.
Step 2: The slave cores apply the roulette wheel, stochastic universal sampling, and elitism selection methods respectively, in parallel (a roulette-wheel sketch follows below).
Step 3: Create the next generation by applying two-point crossover.
Step 4: Apply swap mutation (two-position interchange) to produce two new offspring chromosomes.
Step 5: Send both chromosomes back to the master node (the master collects the chromosomes from all slaves).
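As a concrete illustration of the selection in Step 2, the sketch below implements roulette-wheel selection, one of the three methods named; the six fitness values are made-up example data.

    /* Hedged sketch of roulette-wheel selection: each individual is picked with
       probability proportional to its fitness. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int roulette_wheel(const double *fitness, int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) total += fitness[i];
        double r = ((double)rand() / RAND_MAX) * total;   /* spin the wheel */
        double acc = 0.0;
        for (int i = 0; i < n; i++) {
            acc += fitness[i];
            if (r <= acc) return i;
        }
        return n - 1;   /* guard against floating-point rounding at the edge */
    }

    int main(void) {
        srand((unsigned)time(NULL));
        double fitness[6] = { 0.8, 0.2, 1.5, 0.4, 1.1, 0.6 };  /* six chromosomes */
        for (int k = 0; k < 4; k++)
            printf("selected individual %d\n", roulette_wheel(fitness, 6));
        return 0;
    }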
Selection
[Figure: the selection methods used - roulette wheel, stochastic universal sampling, and elitism]
Crossover and mutation
• Two-point crossover
• Swap mutation (both operators are sketched below)
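A hedged sketch of the two operators on bit-string chromosomes; the chromosome length, cut points, and swap positions below are illustrative assumptions.

    /* Hedged sketch: two-point crossover and swap mutation on bit strings. */
    #include <stdio.h>

    #define LEN 10

    /* Two-point crossover: exchange the segment between cut1 (inclusive) and cut2 (exclusive). */
    void two_point_crossover(int *a, int *b, int cut1, int cut2) {
        for (int i = cut1; i < cut2; i++) {
            int t = a[i]; a[i] = b[i]; b[i] = t;
        }
    }

    /* Swap mutation (two-position interchange): swap the genes at positions p and q. */
    void swap_mutation(int *c, int p, int q) {
        int t = c[p]; c[p] = c[q]; c[q] = t;
    }

    static void print_bits(const char *name, const int *c) {
        printf("%s: ", name);
        for (int i = 0; i < LEN; i++) printf("%d", c[i]);
        printf("\n");
    }

    int main(void) {
        int p1[LEN] = { 1,1,0,1,0,1,1,1,0,1 };   /* the example chromosome 1101011101 */
        int p2[LEN] = { 0,0,1,0,1,1,0,0,1,0 };
        two_point_crossover(p1, p2, 3, 7);       /* cut points chosen for illustration */
        swap_mutation(p1, 1, 8);
        print_bits("offspring 1", p1);
        print_bits("offspring 2", p2);
        return 0;
    }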
Benchmarks
• All the benchmark programs are parallelized using the OpenMP library to reap the benefits of the PGA.
Performance analysis
• As the figures show, the results after applying the PGA (WGAO) present a major improvement over random optimization (WRO) and over compiling the code without any optimization (WOO).
[Figures: performance results for each benchmark program, comparing WOO, WRO, and WGAO]
Conclusion
• In compiler optimization research, phase ordering is an important performance-enhancement problem.
• This study indicates that, with the PGA, the performance of the benchmark programs increases as the number of cores increases.
• The major concern in the experiment is the time the master core spends waiting to collect values from the slaves, which is primarily due to the synchronized communication between the master and slave cores.
• It should also be noted that, apart from the PRIMS benchmark on the 8-core system, all other benchmarks exhibit better average performance.
Thanks for
your attention.
References
[1] Satish Kumar T., Sakthivel S. and Sushil Kumar S., “Optimizing Code by Selecting Compiler Flags
using Parallel Genetic Algorithm on Multicore CPUs,” International Journal of Engineering and
Technology, Vol. 32, No. 5, 2014.
[2] Prathibha B., Sarojadevi H., Harsha P., “Compiler Optimization: A Genetic Algorithm Approach,”
International Journal of Computer Applications, Vol. 112, No. 10, 2015.

