CS 332: Algorithms Introduction to heapsort
Review: The Master Theorem Given: a  divide and conquer  algorithm An algorithm that divides the problem of size  n  into  a  subproblems, each of size  n / b Let the cost of each stage (i.e., the work to divide the problem + combine solved subproblems) be described by the function  f (n) Then, the Master Theorem gives us a cookbook for the algorithm’s running time:
Review: The Master Theorem if  T(n) = aT(n/b) + f(n) then
Sorting Revisited So far we’ve talked about two algorithms to sort an array of numbers What is the advantage of merge sort? Answer: O(n lg n) worst-case running time What is the advantage of insertion sort? Answer: sorts in place Also: When array ā€œnearly sortedā€, runs fast in practice Next on the agenda:  Heapsort Combines advantages of both previous algorithms
A  heap  can be seen as a complete binary tree: What makes a binary tree complete?  Is the example above complete? Heaps 16 14 10 8 7 9 3 2 4 1
A  heap  can be seen as a complete binary tree: The book calls them ā€œnearly completeā€ binary trees; can think of unfilled slots as null pointers Heaps 16 14 10 8 7 9 3 2 4 1 1 1 1 1 1
Heaps In practice, heaps are usually implemented as arrays: 16 14 10 8 7 9 3 2 4 1 A = = 16 14 10 8 7 9 3 2 4 1
Heaps To represent a complete binary tree as an array:  The root node is A[1] Node  i  is A[ i ] The parent of node  i  is A[ i /2] (note: integer divide) The left child of node  i  is A[2 i ] The right child of node  i  is A[2 i  + 1] 16 14 10 8 7 9 3 2 4 1 A = = 16 14 10 8 7 9 3 2 4 1
Referencing Heap Elements So… Parent(i) { return   i/2  ; } Left(i) { return 2*i; } right(i) { return 2*i + 1; } An aside:  How would you implement this  most efficiently? Trick question, I was looking for ā€œ i << 1 ā€, etc.  But, any modern compiler is smart enough to do this for you (and it makes the code hard to follow)
The Heap Property Heaps also satisfy the  heap property : A[ Parent ( i )]    A[ i ] for all nodes  i  > 1 In other words, the value of a node is at most the value of its parent Where is the largest element in a heap stored?
Heap Height Definitions: The  height  of a node in the tree = the number of edges on the longest downward path to a leaf  The height of a tree = the height of its root What is the height of an n-element heap? Why? This is nice: basic heap operations take at most time proportional to the height of the heap
Heap Operations: Heapify() Heapify() : maintain the heap property Given: a node  i  in the heap with children  l  and  r Given: two subtrees rooted at  l  and  r , assumed to be heaps Problem: The subtree rooted at  i  may violate the heap property ( How? ) Action: let the value of the parent node ā€œfloat downā€ so subtree at  i  satisfies the heap property  What do you suppose will be the basic operation between i, l, and r?
Heap Operations: Heapify() Heapify(A, i) {  l = Left(i); r = Right(i); if (l <= heap_size(A) && A[l] > A[i])  largest = l; else largest = i; if (r <= heap_size(A) && A[r] > A[largest]) largest = r; if (largest != i)  Swap(A, i, largest); Heapify(A, largest); }
Heapify() Example 16 4 10 14 7 9 3 2 8 1 16 4 10 14 7 9 3 2 8 1 A =
Heapify() Example 16 4 10 14 7 9 3 2 8 1 16 10 14 7 9 3 2 8 1 A = 4
Heapify() Example 16 4 10 14 7 9 3 2 8 1 16 10 7 9 3 2 8 1 A = 4 14
Heapify() Example 16 14 10 4 7 9 3 2 8 1 16 14 10 4 7 9 3 2 8 1 A =
Heapify() Example 16 14 10 4 7 9 3 2 8 1 16 14 10 7 9 3 2 8 1 A = 4
Heapify() Example 16 14 10 4 7 9 3 2 8 1 16 14 10 7 9 3 2 1 A = 4 8
Heapify() Example 16 14 10 8 7 9 3 2 4 1 16 14 10 8 7 9 3 2 4 1 A =
Heapify() Example 16 14 10 8 7 9 3 2 4 1 16 14 10 8 7 9 3 2 1 A = 4
Heapify() Example 16 14 10 8 7 9 3 2 4 1 16 14 10 8 7 9 3 2 4 1 A =
Analyzing Heapify(): Informal Aside from the recursive call, what is the running time of  Heapify() ? How many times can  Heapify()  recursively call itself? What is the worst-case running time of  Heapify()  on a heap of size n?
Analyzing Heapify(): Formal Fixing up relationships between  i ,  l , and  r  takes   (1) time If the heap at i has n elements, how many elements can the subtrees at l or r have?  Draw it Answer: 2 n /3 (worst case: bottom row 1/2 full) So time taken by  Heapify()  is given by T ( n )   ļ‚£   T (2 n /3) +   (1)
Analyzing Heapify(): Formal So we have  T ( n )   ļ‚£   T (2 n /3) +   (1)  By case 2 of the Master Theorem, T ( n ) = O(lg  n ) Thus,  Heapify()  takes logarithmic time
Heap Operations: BuildHeap() We can build a heap in a bottom-up manner by running  Heapify()  on successive subarrays Fact: for array of length  n , all elements in range  A[  n/2   + 1 .. n] are heaps ( Why? ) So:  Walk backwards through the array from n/2 to 1, calling  Heapify()  on each node. Order of processing guarantees that the children of node  i  are heaps when  i  is processed
BuildHeap() // given an unsorted array A, make A a heap BuildHeap(A) { heap_size(A) = length(A); for (i =   length[A]/2    downto 1) Heapify(A, i); }
BuildHeap() Example Work through example A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7} 4 1 3 2 16 9 10 14 8 7
Analyzing BuildHeap() Each call to  Heapify()  takes O(lg  n ) time There are O( n ) such calls (specifically,   n/2  ) Thus the running time is O( n  lg  n ) Is this a correct asymptotic upper bound? Is this an asymptotically tight bound? A tighter bound is O (n )  How can this be?  Is there a flaw in the above reasoning?
Analyzing BuildHeap(): Tight To  Heapify()  a subtree takes O( h ) time where  h  is the height of the subtree h  = O(lg  m ), m = # nodes in subtree The height of most subtrees is small Fact: an  n -element heap has at most   n /2 h +1   nodes of height  h CLR 7.3 uses this fact to prove that  BuildHeap()  takes O( n ) time
Heapsort Given  BuildHeap() ,  an in-place sorting algorithm is easily constructed: Maximum element is at A[1] Discard by swapping with element at A[n] Decrement heap_size[A] A[n] now contains correct value Restore heap property at A[1] by calling  Heapify() Repeat, always swapping A[1] for A[heap_size(A)]
Heapsort Heapsort(A) { BuildHeap(A); for (i = length(A) downto 2) { Swap(A[1], A[i]); heap_size(A) -= 1; Heapify(A, 1); } }
Analyzing Heapsort The call to  BuildHeap()  takes O( n ) time  Each of the  n  - 1 calls to  Heapify()  takes O(lg  n ) time Thus the total time taken by  HeapSort()   = O( n ) + ( n  - 1) O(lg  n ) = O( n ) + O( n  lg  n ) = O( n  lg  n )
Priority Queues Heapsort is a nice algorithm, but in practice Quicksort (coming up) usually wins But the heap data structure is incredibly useful for implementing  priority queues A data structure for maintaining a set  S  of elements, each with an associated value or  key Supports the operations  Insert() ,  Maximum() , and  ExtractMax() What might a priority queue be useful for?
Priority Queue Operations Insert(S, x)  inserts the element x into set S Maximum(S)  returns the element of S with the maximum key ExtractMax(S)  removes and returns the element of S with the maximum key How could we implement these operations using a heap?

lecture 5

  • 1.
    CS 332: AlgorithmsIntroduction to heapsort
  • 2.
    Review: The MasterTheorem Given: a divide and conquer algorithm An algorithm that divides the problem of size n into a subproblems, each of size n / b Let the cost of each stage (i.e., the work to divide the problem + combine solved subproblems) be described by the function f (n) Then, the Master Theorem gives us a cookbook for the algorithm’s running time:
  • 3.
    Review: The MasterTheorem if T(n) = aT(n/b) + f(n) then
  • 4.
    Sorting Revisited Sofar we’ve talked about two algorithms to sort an array of numbers What is the advantage of merge sort? Answer: O(n lg n) worst-case running time What is the advantage of insertion sort? Answer: sorts in place Also: When array ā€œnearly sortedā€, runs fast in practice Next on the agenda: Heapsort Combines advantages of both previous algorithms
  • 5.
    A heap can be seen as a complete binary tree: What makes a binary tree complete? Is the example above complete? Heaps 16 14 10 8 7 9 3 2 4 1
  • 6.
    A heap can be seen as a complete binary tree: The book calls them ā€œnearly completeā€ binary trees; can think of unfilled slots as null pointers Heaps 16 14 10 8 7 9 3 2 4 1 1 1 1 1 1
  • 7.
    Heaps In practice,heaps are usually implemented as arrays: 16 14 10 8 7 9 3 2 4 1 A = = 16 14 10 8 7 9 3 2 4 1
  • 8.
    Heaps To representa complete binary tree as an array: The root node is A[1] Node i is A[ i ] The parent of node i is A[ i /2] (note: integer divide) The left child of node i is A[2 i ] The right child of node i is A[2 i + 1] 16 14 10 8 7 9 3 2 4 1 A = = 16 14 10 8 7 9 3 2 4 1
  • 9.
    Referencing Heap ElementsSo… Parent(i) { return  i/2  ; } Left(i) { return 2*i; } right(i) { return 2*i + 1; } An aside: How would you implement this most efficiently? Trick question, I was looking for ā€œ i << 1 ā€, etc. But, any modern compiler is smart enough to do this for you (and it makes the code hard to follow)
  • 10.
    The Heap PropertyHeaps also satisfy the heap property : A[ Parent ( i )]  A[ i ] for all nodes i > 1 In other words, the value of a node is at most the value of its parent Where is the largest element in a heap stored?
  • 11.
    Heap Height Definitions:The height of a node in the tree = the number of edges on the longest downward path to a leaf The height of a tree = the height of its root What is the height of an n-element heap? Why? This is nice: basic heap operations take at most time proportional to the height of the heap
  • 12.
    Heap Operations: Heapify()Heapify() : maintain the heap property Given: a node i in the heap with children l and r Given: two subtrees rooted at l and r , assumed to be heaps Problem: The subtree rooted at i may violate the heap property ( How? ) Action: let the value of the parent node ā€œfloat downā€ so subtree at i satisfies the heap property What do you suppose will be the basic operation between i, l, and r?
  • 13.
    Heap Operations: Heapify()Heapify(A, i) { l = Left(i); r = Right(i); if (l <= heap_size(A) && A[l] > A[i]) largest = l; else largest = i; if (r <= heap_size(A) && A[r] > A[largest]) largest = r; if (largest != i) Swap(A, i, largest); Heapify(A, largest); }
  • 14.
    Heapify() Example 164 10 14 7 9 3 2 8 1 16 4 10 14 7 9 3 2 8 1 A =
  • 15.
    Heapify() Example 164 10 14 7 9 3 2 8 1 16 10 14 7 9 3 2 8 1 A = 4
  • 16.
    Heapify() Example 164 10 14 7 9 3 2 8 1 16 10 7 9 3 2 8 1 A = 4 14
  • 17.
    Heapify() Example 1614 10 4 7 9 3 2 8 1 16 14 10 4 7 9 3 2 8 1 A =
  • 18.
    Heapify() Example 1614 10 4 7 9 3 2 8 1 16 14 10 7 9 3 2 8 1 A = 4
  • 19.
    Heapify() Example 1614 10 4 7 9 3 2 8 1 16 14 10 7 9 3 2 1 A = 4 8
  • 20.
    Heapify() Example 1614 10 8 7 9 3 2 4 1 16 14 10 8 7 9 3 2 4 1 A =
  • 21.
    Heapify() Example 1614 10 8 7 9 3 2 4 1 16 14 10 8 7 9 3 2 1 A = 4
  • 22.
    Heapify() Example 1614 10 8 7 9 3 2 4 1 16 14 10 8 7 9 3 2 4 1 A =
  • 23.
    Analyzing Heapify(): InformalAside from the recursive call, what is the running time of Heapify() ? How many times can Heapify() recursively call itself? What is the worst-case running time of Heapify() on a heap of size n?
  • 24.
    Analyzing Heapify(): FormalFixing up relationships between i , l , and r takes  (1) time If the heap at i has n elements, how many elements can the subtrees at l or r have? Draw it Answer: 2 n /3 (worst case: bottom row 1/2 full) So time taken by Heapify() is given by T ( n ) ļ‚£ T (2 n /3) +  (1)
  • 25.
    Analyzing Heapify(): FormalSo we have T ( n ) ļ‚£ T (2 n /3) +  (1) By case 2 of the Master Theorem, T ( n ) = O(lg n ) Thus, Heapify() takes logarithmic time
  • 26.
    Heap Operations: BuildHeap()We can build a heap in a bottom-up manner by running Heapify() on successive subarrays Fact: for array of length n , all elements in range A[  n/2  + 1 .. n] are heaps ( Why? ) So: Walk backwards through the array from n/2 to 1, calling Heapify() on each node. Order of processing guarantees that the children of node i are heaps when i is processed
  • 27.
    BuildHeap() // givenan unsorted array A, make A a heap BuildHeap(A) { heap_size(A) = length(A); for (i =  length[A]/2  downto 1) Heapify(A, i); }
  • 28.
    BuildHeap() Example Workthrough example A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7} 4 1 3 2 16 9 10 14 8 7
  • 29.
    Analyzing BuildHeap() Eachcall to Heapify() takes O(lg n ) time There are O( n ) such calls (specifically,  n/2  ) Thus the running time is O( n lg n ) Is this a correct asymptotic upper bound? Is this an asymptotically tight bound? A tighter bound is O (n ) How can this be? Is there a flaw in the above reasoning?
  • 30.
    Analyzing BuildHeap(): TightTo Heapify() a subtree takes O( h ) time where h is the height of the subtree h = O(lg m ), m = # nodes in subtree The height of most subtrees is small Fact: an n -element heap has at most  n /2 h +1  nodes of height h CLR 7.3 uses this fact to prove that BuildHeap() takes O( n ) time
  • 31.
    Heapsort Given BuildHeap() , an in-place sorting algorithm is easily constructed: Maximum element is at A[1] Discard by swapping with element at A[n] Decrement heap_size[A] A[n] now contains correct value Restore heap property at A[1] by calling Heapify() Repeat, always swapping A[1] for A[heap_size(A)]
  • 32.
    Heapsort Heapsort(A) {BuildHeap(A); for (i = length(A) downto 2) { Swap(A[1], A[i]); heap_size(A) -= 1; Heapify(A, 1); } }
  • 33.
    Analyzing Heapsort Thecall to BuildHeap() takes O( n ) time Each of the n - 1 calls to Heapify() takes O(lg n ) time Thus the total time taken by HeapSort() = O( n ) + ( n - 1) O(lg n ) = O( n ) + O( n lg n ) = O( n lg n )
  • 34.
    Priority Queues Heapsortis a nice algorithm, but in practice Quicksort (coming up) usually wins But the heap data structure is incredibly useful for implementing priority queues A data structure for maintaining a set S of elements, each with an associated value or key Supports the operations Insert() , Maximum() , and ExtractMax() What might a priority queue be useful for?
  • 35.
    Priority Queue OperationsInsert(S, x) inserts the element x into set S Maximum(S) returns the element of S with the maximum key ExtractMax(S) removes and returns the element of S with the maximum key How could we implement these operations using a heap?