Asymptotes and Algorithms

        By Gary Short
      Gibraltar Software



                            1
Agenda
•   Introduction
•   Performance, does it matter?
•   How do we measure performance?
•   Analysis of Insertion Sort
•   Simplifying things with asymptotic notation
•   Designing algorithms
•   Solving recurrences
•   Questions.

                                                  2
Introduction
•   Gary Short
•   Head of Gibraltar Labs
•   C# MVP
•   gary.short@gibraltarsoftware.com
•   @garyshort
•   http://www.facebook.com/theothergaryshort



                                                3
Performance – Does it Matter?
Performance is the most important thing in
  software engineering today...




                                             4
... Apart from everything else!




                                  5
So Why Bother About Performance?




                                   6
How do we Measure Performance?
• What do we care about?
  – Memory?
  – Bandwidth?
  – Computational time?




                             7
We Need a Model to Work With
• RAM Model
  – Arithmetic – add, subtract, etc
  – Data movement – load, copy, store
  – Control – branching, subroutine call, return
  – Data – Integers, floats
• Instructions are run in series
  – And take constant time
     • Not really, but shhh! – Ed.


                                                   8
Analysis of Insertion Sort
InsertionSort(A)
  for j = 2 to A.length
      key = A[j]
      i = j - 1
      while i > 0 and A[i] > key
             A[i+1] = A[i]
             i = i - 1
      A[i+1] = key
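
The pseudocode above uses 1-based indexing. A minimal runnable sketch in Python, assuming 0-based lists (the function name `insertion_sort` is chosen here for illustration), might look like this:

```python
def insertion_sort(a):
    """Sort the list a in place and return it (0-based port of the slide's pseudocode)."""
    for j in range(1, len(a)):          # slide: for j = 2 to A.length
        key = a[j]
        i = j - 1
        # Shift every element larger than key one slot to the right.
        while i >= 0 and a[i] > key:    # slide: i > 0, because the slide is 1-based
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key                  # drop key into the gap that opened up
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))
```

The only changes from the slide are the shifted loop bounds; the shifting logic is the same.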

                                    9
That Makes no Sense, Show me!




                                10
So What’s The Running Time?




                              11
Sum Running Time for Each Statement...

T(n) = c_1 n + c_2 (n-1) + c_3 (n-1) + c_4 \sum_{j=2}^{n} t_j
       + c_5 \sum_{j=2}^{n} (t_j - 1) + c_6 \sum_{j=2}^{n} (t_j - 1) + c_7 (n-1)




                                                  12
Best Case Running Time
If the input (A) is already sorted then...
A[i] <= key when i has its initial value of j-1, thus tj = 1 for every j.
And so...
T(n) = c1n+c2(n-1)+c3(n-1)+c4(n-1)+c7(n-1)
= (c1+c2+c3+c4+c7)n-(c2+c3+c4+c7)
Which can be expressed as an+b for constants a
   and b that depend on ci
So T(n) is a linear function of n
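
To see the linear behaviour rather than take it on trust, one option is to count how many times the while condition is evaluated, which is what tj measures. The sketch below is an instrumented copy of insertion sort; the helper name `count_while_tests` is hypothetical and the indexing is 0-based.

```python
def count_while_tests(a):
    """Insertion-sort a copy of a; return how often the while condition is evaluated."""
    a = list(a)
    tests = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while True:
            tests += 1                          # one evaluation of the loop condition
            if not (i >= 0 and a[i] > key):
                break
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return tests

# Already-sorted input: the condition fails immediately for every j,
# so the count is n - 1 and grows linearly with n.
for n in (10, 100, 1000):
    print(n, count_while_tests(range(n)))
```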

                                                   13
14
Side Note: No One Cares About Best Case




                                          15
Worst Case Scenario
If the input (A) is in reverse sort order then...
We have to compare each A[j] with each
   element in the sub array A[1..j-1].
And so...
T(n) = (c4/2 + c5/2 + c6/2)n^2 + (c1 + c2 + c3 + c4/2 - c5/2 - c6/2 + c7)n
   - (c2 + c3 + c4 + c7)
Which can be expressed as an^2 + bn + c
So T(n) is a quadratic function of n
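
For anyone who wants the missing step: in the worst case every element of A[1..j-1] is compared, so tj = j. A sketch of the working, using the standard arithmetic-series sums:

```latex
\sum_{j=2}^{n} t_j = \sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1
\qquad
\sum_{j=2}^{n} (t_j - 1) = \sum_{j=2}^{n} (j - 1) = \frac{n(n-1)}{2}

T(n) = c_1 n + c_2 (n-1) + c_3 (n-1)
     + c_4 \left( \frac{n(n+1)}{2} - 1 \right)
     + c_5 \, \frac{n(n-1)}{2}
     + c_6 \, \frac{n(n-1)}{2}
     + c_7 (n-1)
```

Collecting the n^2, n and constant terms gives the quadratic shown above.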
                                                    16
17
In Short...



In worst case insertion sort sucks! 




                                        18
Man That Was a Lot of Maths!




                               19
Simplifying Things With Asymptotic Notation

• Asymptotic notation characterises functions
  by their growth rates
• Functions with the same growth rates have
  the same Asymptotic notation




                                                20
How Does That Help Us?
Let’s say we have a function with running time
T(n) = 4n^2 - 2n + 2
If n = 500 then
4n^2 is 1000 times bigger than 2n
So...
We can ignore smaller order terms and
   coefficients
T(n) = 4n^2 - 2n + 2 can be written T(n) = O(n^2)
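
A quick numerical check of that claim: if the lower-order terms really stop mattering, T(n) divided by n^2 should settle towards the leading coefficient. A minimal sketch (the sample values of n are arbitrary):

```python
def T(n):
    """The running-time function from the slide."""
    return 4 * n**2 - 2 * n + 2

# At n = 500 the leading term is 1000 times the linear term.
print((4 * 500**2) // (2 * 500))

# As n grows, T(n) / n^2 approaches the constant 4.
for n in (10, 500, 10_000, 1_000_000):
    print(n, T(n) / n**2)
```

The ratio flattening out is the growth rate that the n^2 inside the O is describing; the constant it flattens out to is the coefficient the notation throws away.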

                                                 21
A Short Note on The Abuse of “=”
If T(n) = 4n^2 -2n +2
Then saying T(n) = O(n^2) is not strictly correct
Rather T(n) is in the set O(n^2) and the above
   should be read as T(n) is O(n^2) and not T(n)
   equals O(n^2)
But really, only Maths geeks care – Ed.
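
For the Maths geeks, the usual textbook definition of the set, and a check that T(n) really is a member (the constants c = 4 and n_0 = 1 are just one choice that works):

```latex
O(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ such that }
            0 \le f(n) \le c \, g(n) \text{ for all } n \ge n_0 \,\}

\text{For } n \ge 1:\quad -2n + 2 \le 0
\;\Rightarrow\; 0 \le 4n^2 - 2n + 2 \le 4n^2
\;\Rightarrow\; T(n) \in O(n^2) \text{ with } c = 4,\ n_0 = 1
```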



                                                    22
So Back to Insertion Sort
So now we can say of Insertion Sort that...
Best case it’s O(n)
And worst case it’s O(n^2)
And since we only care about worst case...
We say that Insertion Sort is O(n^2)
Which sucks! – Ed. 



                                              23
Designing Algorithms



  So can we do better?




                         24
Optimizing Algorithms is Child’s Play
• Sit at table
• Foreach item in itemsOnPlate
  – Eat item
• Wait(MealComplete)
• Foreach dish in dishesUsed
  – WashDish
  – DryDish
• Resume Play
                                         25
Child Will Optimize To…
•   Pause Game
•   Set Speed = MaxInt
•   Run to table
•   Take sliceBread(1)
•   Foreach item on Plate
    – Place item on bread
• Take sliceBread(2)
• Run Outside
• Resume Game
                                    26
Divide And Conquer
• Divide
  – Divide the problem into sub problems
• Conquer
  – Solve the sub problems recursively
• Combine
  – Add the solutions to the sub problems into the
    solution for the original problem.


                                                     27
Merge Sort
• Divide
  – Divide the n elements into two n/2 element arrays
• Conquer
  – Sort the two arrays recursively
• Combine
  – Merge the two sorted arrays to produce the
    answer.


                                                    28
Analysis of Merge Sort
MergeSort(A,p,r)
  if(p<r)
       q = floor((p+r)/2)
       MergeSort(A,p,q)
       MergeSort(A,q+1,r)
       Merge(A,p,q,r)
Initial call MergeSort(A,1,A.length)
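
The slide leaves Merge(A,p,q,r) undefined. The sketch below fills it in with one straightforward approach (copy the two halves, then merge them back), which is an assumption rather than the version used in the talk. Indexing is 0-based with inclusive bounds, so the initial call covers 0 to len(a) - 1.

```python
def merge(a, p, q, r):
    """Merge the sorted runs a[p..q] and a[q+1..r] (inclusive bounds) in place."""
    left, right = a[p:q + 1], a[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        # Take from left if right is used up, or if left's head is not larger.
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1

def merge_sort(a, p=0, r=None):
    """Sort a[p..r] in place and return a."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = (p + r) // 2            # divide: floor of the midpoint
        merge_sort(a, p, q)         # conquer the left half
        merge_sort(a, q + 1, r)     # conquer the right half
        merge(a, p, q, r)           # combine
    return a

print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))
```

Copying the halves costs extra memory proportional to the run being merged, but keeps the merge step easy to read.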

                                       29
Dancers, or it Never Happened!!




                                  30
So What’s The Running Time?
In the general case...
If the divide step yields ‘a’ sub problems
Each 1/b the size of the original
It takes T(n/b) time to solve one problem of n/b size
So it takes aT(n/b) to solve ‘a’ of them
Then, if it takes D(n) time to divide the problem
And C(n) time to combine the results
Then we get the recurrence...
T(n) = aT(n/b) + D(n) + C(n).

                                                        31
Apply That to Merge Sort...
• Divide
  – Computes the middle of the subarray, taking
    constant time, so D(n) = O(1)
• Conquer
  – Recursively solve two sub problems each of size
    n/2 contributing 2T(n/2) to the running time
• Combine
  – The Merge procedure takes O(n) time, so C(n) = O(n)
• Giving us a recurrence of T(n) = 2T(n/2) + O(n)
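
Before solving the recurrence formally, it can help to evaluate it. The sketch below makes three simplifying assumptions that are not in the slides: the O(n) term costs exactly n, T(1) = 0, and n is a power of two.

```python
import math

def T(n):
    """Evaluate T(n) = 2*T(n/2) + n with T(1) = 0, for n a power of two."""
    if n == 1:
        return 0
    return 2 * T(n // 2) + n

# Compare the recurrence with n * log2(n) for a few powers of two.
for k in (1, 4, 10, 20):
    n = 2 ** k
    print(n, T(n), n * math.log2(n))
```

Under those assumptions the two columns agree, which hints at the n log n answer that the Master Method gives.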

                                                      32
Solve The Recurrence Using The Master Method

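For a recurrence of the form
T(n) = aT(n/b) + f(n), with a >= 1 and b > 1
Then, for some constant k > 0:
If f(n) = O(n^{\log_b a - k}) then T(n) = \Theta(n^{\log_b a})
If f(n) = \Theta(n^{\log_b a}) then T(n) = \Theta(n^{\log_b a} \log n)
If f(n) = \Omega(n^{\log_b a + k}) and a f(n/b) \le c f(n) for some
   constant c < 1 and all large enough n, then T(n) = \Theta(f(n))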

                                                  33
What?!
• More simply, we are comparing f(n) with the
  function n^{\log_b a} and intuitively
  understanding that the bigger of the two
  determines the solution to the recurrence.
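
A worked comparison may make this concrete. The first line is the Merge Sort recurrence from the earlier slide; the other two are illustrative recurrences that are not from the talk, included to show what happens when one side wins.

```latex
% Merge Sort: a = 2, b = 2, f(n) = \Theta(n)
n^{\log_b a} = n^{\log_2 2} = n
\quad\Rightarrow\quad f(n) \text{ and } n^{\log_b a} \text{ tie, so } T(n) = \Theta(n \log n)

% Illustrative: T(n) = 8T(n/2) + n^2
n^{\log_2 8} = n^3 \text{ outgrows } n^2
\quad\Rightarrow\quad T(n) = \Theta(n^3)

% Illustrative: T(n) = 2T(n/2) + n^2
n^2 \text{ outgrows } n^{\log_2 2} = n,\quad 2\,(n/2)^2 = \tfrac{1}{2} n^2 \le c\,n^2 \text{ with } c = \tfrac{1}{2}
\quad\Rightarrow\quad T(n) = \Theta(n^2)
```

When the two sides tie, neither determines the answer alone and the solution picks up the extra log n factor.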




                                               34
And So...
• With Merge Sort we are in the second case of
  the Master Method thus...
• T(n) = O(n log n)
• Which is much better than the O(n^2) of
  Insertion Sort




                                                35
36
What We Learned
•   Performance is important
•   Therefore algorithmic optimization is too
•   We have a model to analyse against
•   And a notation
•   Divide and conquer
•   Master Method
•   Other resources.

                                                37
38
Questions?




             39

Algorithms - Rocksolid Tour 2013
