Advanced Data Structures
Sartaj Sahni
Clip Art Sources
• www.barrysclipart.com
• www.livinggraphics.com
• www.rad.kumc.edu
What The Course Is About
• Study data structures for:
  - External sorting
  - Single- and double-ended priority queues
  - Dictionaries
  - Multidimensional search
  - Computational geometry
  - Image processing
  - Packet routing and classification
  - …
What The Course Is About
• Concerned with:
  - Worst-case complexity
  - Average complexity
  - Amortized complexity
Prerequisites
• Asymptotic complexity
  - Big Oh, Theta, and Omega notations
• Undergraduate data structures
  - Stacks and queues
  - Linked lists
  - Trees
  - Graphs
• C, C++, Java, or Python
Web Site
www.cise.ufl.edu/~sahni/cop5536
http://elearning.ufl.edu
• Handouts, syllabus, readings, assignments, past exams, past exam solutions, TAs, Internet lectures, PowerPoint presentations, etc.
Assignments, Tests, & Grades
• 25% for assignments
  - There will be two assignments.
• 25% for each test
  - There will be three tests.
Grades (Rough Cutoffs)
• A >= 85%
• A- >= 81%
• B+ >= 77%
• B >= 72%
• B- >= 67%
• C+ >= 63%
• C >= 60%
• C- >= 55%
Kinds Of Complexity
• Worst-case complexity.
• Average complexity.
• Amortized complexity.
Data Structure Z
• Operations
  - Initialize
  - Insert
  - Delete
• Examples
  - Linear List
  - Stack
  - Queue
  - …
Data Structure Z
• Suppose that the worst-case complexity is
  - Initialize O(1)
  - Insert O(s)
  - Delete O(s)
  where s is the size of Z.
• How much time does it take to perform a sequence of 1 initialize followed by n inserts and deletes?
• O(n^2)
Data Structure Z
• Suppose further that the average complexity is
  - Initialize O(1)
  - Insert O(log s)
  - Delete O(log s)
• How much time does it take to perform a sequence of 1 initialize followed by n inserts and deletes?
• O(n^2)
An Application P
• Initialize Z
• Solve P by performing many inserts and deletes
plus other tasks.
• Examples
  - Dijkstra’s single-source shortest paths
  - Minimum cost spanning trees
An Application P
• Total time to solve P using Z is
  time for inserts/deletes + time for other tasks
  = O(n^2) + time for other tasks
  where n is the number of inserts/deletes.
• At times a better bound can be obtained using amortized complexity.
Amortized Complexity
• The amortized complexity of a task is the
amount you charge the task.
• The conventional way to bound the cost of doing a task n times is to use one of the expressions
  - n*(worst-case cost of task)
  - Σ(worst-case cost of task i), summed over 1 <= i <= n
• The amortized complexity way to bound the cost of doing a task n times is to use one of the expressions
  - n*(amortized cost of task)
  - Σ(amortized cost of task i), summed over 1 <= i <= n
Amortized Complexity
• The amortized complexity of a task may bear no
direct relationship to the actual complexity of
the task. I.e., it may be <, =, or > actual task
complexity.
Amortized Complexity
• In worst-case complexity analysis, each task is charged an amount that is >= its cost. So,
  Σ(actual cost of task i) <= Σ(worst-case cost of task i)
• In amortized analysis, some tasks may be charged an amount that is < their cost. The amount charged must ensure:
  Σ(actual cost of task i) <= Σ(amortized cost of task i)
  where each sum is over 1 <= i <= n.
Potential Function P()
• P(i) = amortizedCost(i) – actualCost(i) + P(i – 1)
• Σ(P(i) – P(i–1)) = Σ(amortizedCost(i) – actualCost(i)), summed over 1 <= i <= n
• P(n) – P(0) = Σ(amortizedCost(i) – actualCost(i))
• P(n) – P(0) >= 0
• When P(0) = 0, P(i) is the amount by which the first i tasks/operations have been overcharged.
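The recurrence above can be traced on a small example. The following is a minimal sketch in Python; the function name potentials and the cost sequences are hypothetical, chosen only to illustrate the definition, and do not come from the course material.

```python
def potentials(actual_costs, amortized_costs, initial_potential=0):
    """Return [P(0), P(1), ..., P(n)] using
    P(i) = amortizedCost(i) - actualCost(i) + P(i - 1)."""
    values = [initial_potential]
    for actual, amortized in zip(actual_costs, amortized_costs):
        values.append(amortized - actual + values[-1])
    return values


# Hypothetical cost sequences (illustration only): four cheap tasks and
# one expensive task, each charged an amortized cost of 2.
actual = [1, 1, 1, 5, 1]
amortized = [2, 2, 2, 2, 2]

p = potentials(actual, amortized)
print(p)  # [0, 1, 2, 3, 0, 1]

# P(n) - P(0) equals the total amount charged minus the total actual cost.
print(p[-1] - p[0] == sum(amortized) - sum(actual))  # True
```

In this made-up sequence the fourth task is charged less than its actual cost, yet P(n) – P(0) = 1 >= 0, so the charges are still a valid bound on the total actual cost.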
Arithmetic Statements
• Rewrite an arithmetic statement as a
sequence of statements that do not use
parentheses.
• a = x+((a+b)*c+d)+y;
is equivalent to the sequence:
z1 = a+b;
z2 = z1*c+d;
a = x+z2+y;
Arithmetic Statements
• The rewriting is done using a stack and a
method processNextSymbol.
• create an empty stack;
for (int i = 1; i <= n; i++)
// n is number of symbols in statement
processNextSymbol();
a = x+((a+b)*c+d)+y;
Arithmetic Statements
• processNextSymbol extracts the next
symbol from the input statement.
• Symbols other than ) and ; are simply
pushed on to the stack.
a = x+((a+b)*c+d)+y;
Stack (bottom to top): a = x + ( ( a + b
Arithmetic Statements
• If the next symbol is ), symbols are
popped from the stack up to and
including the first (, an assignment
statement is generated, and the left
hand symbol is added to the stack.
a = x+((a+b)*c+d)+y;
Stack before the first ) is processed (bottom to top): a = x + ( ( a + b
Generated: z1 = a+b;
Arithmetic Statements
a = x+((a+b)*c+d)+y;
Stack before the second ) is processed (bottom to top): a = x + ( z1 * c + d
Generated: z1 = a+b; z2 = z1*c+d;
• If the next symbol is ), symbols are popped from the stack up to and including the first (, an assignment statement is generated, and the left hand symbol is added to the stack.
Arithmetic Statements
a = x+((a+b)*c+d)+y;
Stack after the second ) is processed and + and y are pushed (bottom to top): a = x + z2 + y
Generated so far: z1 = a+b; z2 = z1*c+d;
Arithmetic Statements
• If the next symbol is ;, symbols are popped from the stack until the stack becomes empty. The final assignment statement a = x+z2+y; is generated.
a = x+((a+b)*c+d)+y;
Stack before ; is processed (bottom to top): a = x + z2 + y
Generated: z1 = a+b; z2 = z1*c+d; a = x+z2+y;
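The rewriting steps walked through on the preceding slides can be sketched in Python as follows. This is an illustrative sketch, not the course's implementation: the function name rewrite is hypothetical, every symbol is assumed to be a single character, whitespace is ignored, the input is assumed to be well formed, and the generated names z1, z2, … are assumed not to clash with names already in the statement.

```python
def rewrite(statement):
    """Rewrite an arithmetic statement as parenthesis-free statements.

    Assumptions (not from the slides): one character per symbol,
    balanced parentheses, and a single '=' in the statement.
    """
    symbols = [ch for ch in statement if not ch.isspace()]
    stack = []
    output = []
    temp_count = 0
    for symbol in symbols:  # each iteration plays the role of processNextSymbol
        if symbol == ")":
            # Pop up to and including the first "(".
            popped = []
            while stack[-1] != "(":
                popped.append(stack.pop())
            stack.pop()  # discard the matching "("
            temp_count += 1
            temp = f"z{temp_count}"
            output.append(f"{temp} = {''.join(reversed(popped))};")
            stack.append(temp)  # the left hand symbol goes back on the stack
        elif symbol == ";":
            # Pop until the stack is empty and emit the final assignment.
            popped = []
            while stack:
                popped.append(stack.pop())
            left, right = "".join(reversed(popped)).split("=", 1)
            output.append(f"{left} = {right};")
        else:
            stack.append(symbol)
    return output


for line in rewrite("a = x+((a+b)*c+d)+y;"):
    print(line)
# z1 = a+b;
# z2 = z1*c+d;
# a = x+z2+y;
```

A Python list serves as the stack here, with append as push and pop as pop.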
Complexity Of processNextSymbol
• O(number of symbols that get popped from
stack)
• O(i), where i is for loop index.
a = x+((a+b)*c+d)+y;
Overall Complexity (Conventional Analysis)
• So, overall complexity is Σ O(i) = O(n^2), summed over 1 <= i <= n.
• Alternatively, O(n*n) = O(n^2).
• Although correct, a more careful analysis permits
us to conclude that the complexity is O(n).
create an empty stack;
for (int i = 1; i <= n; i++)
// n is number of symbols in statement
processNextSymbol();
Ways To Determine Amortized Complexity
• Aggregate method.
• Accounting method.
• Potential function method.
Aggregate Method
• Somehow obtain a good upper bound on the
actual cost of the n invocations of
processNextSymbol()
• Divide this bound by n to get the amortized
cost of one invocation of
processNextSymbol()
• Easy to see that Σ(actual cost) <= Σ(amortized cost)
Aggregate Method
• The actual cost of the n invocations of
processNextSymbol()
equals number of stack pop and push operations.
• The n invocations cause at most n symbols to be
pushed on to the stack.
• This count includes the symbols for new variables,
because each new variable is the result of a ) being
processed. Note that no )s get pushed on to the
stack.
Aggregate Method
• The actual cost of the n invocations of processNextSymbol() is at most 2n.
• So, using 2n/n = 2 as the amortized cost of processNextSymbol() is OK, because this cost results in Σ(actual cost) <= Σ(amortized cost).
• Since the amortized cost of processNextSymbol() is 2, the actual cost of all n invocations is at most 2n.
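The 2n bound can be checked empirically by counting stack operations. The sketch below is a simplified, hypothetical model in Python: it tracks only the pushes and pops that each invocation would perform, assumes single-character symbols and well-formed input, and pushes a placeholder "z" in place of each generated variable.

```python
def stack_operations_per_call(statement):
    """Return the number of pushes plus pops performed for each symbol."""
    symbols = [ch for ch in statement if not ch.isspace()]
    stack = []
    per_call = []
    for symbol in symbols:
        ops = 0
        if symbol == ")":
            while stack[-1] != "(":  # pop the operands and operators
                stack.pop()
                ops += 1
            stack.pop()              # pop the matching "("
            ops += 1
            stack.append("z")        # push the placeholder for the new variable
            ops += 1
        elif symbol == ";":
            while stack:             # empty the stack
                stack.pop()
                ops += 1
        else:
            stack.append(symbol)     # every other symbol is simply pushed
            ops += 1
        per_call.append(ops)
    return per_call


costs = stack_operations_per_call("a = x+((a+b)*c+d)+y;")
n = len(costs)
print(n, max(costs), sum(costs))  # 18 7 34
print(sum(costs) <= 2 * n)        # True
```

For this statement a single invocation costs as much as 7, yet the total is 34 = 2(n – 1), which is within the bound of 2n obtained by charging every invocation an amortized cost of 2.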
Aggregate Method
• The aggregate method isn’t very useful, because to
figure out the amortized cost we must first obtain a
good bound on the aggregate cost of a sequence of
invocations.
• Since our objective was to use amortized complexity
to get a better bound on the cost of a sequence of
invocations, if we can obtain this better bound
through other techniques, we can omit dividing the
bound by n to obtain the amortized cost.


Editor's Notes

  • #1 Acknowledge on-campus overflow students.
  • #3 Build upon data structures knowledge from an undergraduate data structures course and study more advanced data structures for a variety of applications.
  • #10 First we take a look at what we can/cannot do with worst case and average complexities.
  • #11 For any n, the worst-case time is the max of the times over all instances. Linear list and search tree worst case time is O(s), for example. Stack with resizing on full is O(s) for insert.
  • #12 For any s, the average insert time is the average over all possible inserts when the data structure size is s. Search tree average is O(log s). Note that all inserts in the sequence of size n may be worst-case inserts; hence O(n^2).
  • #15 Second expression is used when the bound on the cost of a task depends on the task index or on the nature of the task (insert, search, delete).
  • #17 In worst-case analysis, actual cost of task i <= worst-case cost of task i. So, sum of actuals <= sum of worst-case costs. In amortized analysis, we have to determine a cost (amortized cost) that ensures sum of actuals <= sum of amortized costs.
  • #18 P(i) is potential after i’th operation. P(0) is initial potential. This function keeps track of the accumulated difference between the amortized (i.e., charged) costs and actual costs.
  • #26 Strictly speaking, #pops+1 to account for the push that takes place except when next symbol is ;
  • #27 Note that if we do worst-case amount of work when i = 10 (say), we can’t do worst-case amount of work when i = 11 as now the stack has only 1 element on it!
  • #29 Because sum of the amortized costs equals the obtained good upper bound.
  • #30 Actually, there are n-1 pushes as there is no push when ; is processed. Only pushed items may be popped. So, there are n-1 pops.
  • #31 Note that the amortized cost of 2 is sometimes less than the actual cost, sometimes more, and sometimes equal. We could assign an amortized cost of 3 as well.