Numerical
Linear Algebra
EECS 442 – David Fouhey
Fall 2019, University of Michigan
http://web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/
Administrivia
• HW 1 out – due in two weeks
• Follow submission format (wrong format = 0)
• The homeworks are not fill-in-the-blank. This is
harder to do but mirrors life
• If it’s ambiguous: make a decision, document what
you think and why in your homework, and move on
• Highly encouraged to work together. See piazza
• Please check syllabus for what’s allowed. I
guarantee checking the syllabus thoroughly will
help boost your grade.
This Week – Math
Two goals for the next two classes:
• Math with computers ≠ Math
• Practical math you need to know but
may not have been taught
This Week – Goal
• Not a “Linear algebra in two lectures” – that’s
impossible.
• Some of this you should know!
• Aimed at reviving your knowledge and plugging
any gaps
• Aimed at giving you intuitions
Adding Numbers
• 1 + 1 = ?
• Suppose xᵢ is normally distributed with mean μ and standard deviation σ for i = 1, …, N
• How is the average, (1/N) Σᵢ xᵢ, distributed (qualitatively), in terms of variance?
• The Free Drinks in Vegas Theorem: the average has mean μ and standard deviation σ/√N.
Free Drinks in Vegas
Each game/variable has mean $0.10, std $2
100 games is uncertain and fun!
100K games is guaranteed profit: the 99.999999% lowest value is $0.064.
$0.01 for drinks, $0.054 for profits
Let’s Make It Big
• Suppose I average 50M normally distributed
numbers (mean: 31, standard deviation: 1)
• For instance: have predicted and actual depth
for 200 480x640 images and want to know the
average error (|predicted – actual|)
def average(xs):
    numerator = 0
    for x in xs:
        numerator += x
    return numerator / len(xs)
Let’s Make It Big
• What should happen qualitatively?
• Theory says that the average is distributed with mean 31 and standard deviation 1/√(5×10⁷) ≈ 0.00014
• What will happen?
• Reality: 17.47
Trying it Out
[Screenshot of the code running live: the computed average lands far from the theoretical prediction.]
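A minimal numpy sketch of what goes wrong (an assumption about the demo, not the in-class code): accumulate the 50M numbers sequentially in float32, then compare against a float64 accumulation.

import numpy as np

# 50M samples from N(mean=31, std=1), stored as float32.
xs = np.random.normal(31, 1, size=50_000_000).astype(np.float32)

# Sequential float32 accumulation (what the naive loop does): cumsum keeps a
# running float32 total, which eventually stops growing because total + x == total.
running = np.cumsum(xs, dtype=np.float32)
print("naive float32 average:", running[-1] / len(xs))     # typically far below 31

# Accumulating in float64 behaves as the theory predicts (~31 +/- 0.00014).
print("float64 average:      ", xs.mean(dtype=np.float64))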
What’s a Number?
Weights: 2⁷ 2⁶ 2⁵ 2⁴ 2³ 2² 2¹ 2⁰
Bits:      1  0  1  1  1  0  0  1
128 + 32 + 16 + 8 + 1 = 185
Adding Two Numbers
“Integers” on a computer are integers modulo 2ᵏ

Weights:      2⁸ | 2⁷ 2⁶ 2⁵ 2⁴ 2³ 2² 2¹ 2⁰
                 |  1  0  1  1  1  0  0  1   = 185
              +  |  0  1  1  0  1  0  0  1   = 105
Carry flag:   1  |
Result:          |  0  0  1  0  0  0  1  0   = 34
Some Gotchas
Why?
32 + (3 / 4) × 40 = 32
32 + (3 × 40) / 4 = 62

With integer division, 3 / 4 underflows to 0:
32 + 3 / 4 × 40 = 32 + 0 × 40 = 32 + 0 = 32   (Underflow)
32 + 3 × 40 / 4 = 32 + 120 / 4 = 32 + 30 = 62   (No Underflow)

Ok – you have to multiply before dividing
Some Gotchas
math:  32 + (9 × 40) / 10 = 32 + 36 = 68
uint8: 32 + 9 × 40 / 10 = 32 + 104 / 10 = 32 + 10 = 42   (Overflow)
Why 104? In uint8, 9 × 40 = 360 wraps to 360 % 256 = 104; it should be 360 / 10 = 36.
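Both gotchas are easy to reproduce; a small sketch using Python integer division and numpy's uint8 (the values match the slide's arithmetic):

import numpy as np

# Integer division "underflow": 3 // 4 is 0, so the order of operations matters.
print(32 + 3 // 4 * 40)     # 32
print(32 + 3 * 40 // 4)     # 62

# uint8 overflow: 9 * 40 = 360 wraps around modulo 256 to 104.
nine, forty, ten = np.uint8(9), np.uint8(40), np.uint8(10)
product = nine * forty            # 104 (numpy may also warn about the overflow)
print(product)
print(32 + product // ten)        # 42, not the mathematically correct 68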
What’s a Number?
Weights: 2⁷ 2⁶ 2⁵ 2⁴ 2³ 2² 2¹ 2⁰
Bits:      1  0  1  1  1  0  0  1  → 185
How can we do fractions? Put a binary point in the middle:
Weights: 2⁵ 2⁴ 2³ 2² 2¹ 2⁰ . 2⁻¹ 2⁻²
Bits:      1  0  1  1  0  1 .  0  1  → 45.25 = 45 + 0.25
Fixed-Point Arithmetic
Weights: 2⁵ 2⁴ 2³ 2² 2¹ 2⁰ . 2⁻¹ 2⁻²
Bits:      1  0  1  1  0  1 .  0  1  → 45.25
What's the largest number we can represent? 63.75 – Why?
How precisely can we measure at 63? 0.25
How precisely can we measure at 0? 0.25
Fine for many purposes but for science, seems silly
Floating Point
Bit layout (toy 8-bit float): Sign (S) | Exponent (E, 4 bits) | Fraction (F, 3 bits)
Example bits: 1 | 0111 | 001

value = (−1)ˢ × 2^(E − bias) × (1 + F/2³),   with bias = 7 for this toy format

S = 1 → (−1)¹ = −1;  E = 7 → 2^(7−7) = 2⁰ = 1;  F = 1 → 1 + 1/8 = 1.125;  value = −1.125

Bias allows the exponent to be negative. Note: fraction = significand = mantissa;
exponents of all ones or all zeros are special numbers
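A small sketch that decodes this toy 8-bit format (1 sign bit, 4 exponent bits, 3 fraction bits, bias 7); the helper name is made up for illustration and it ignores the special all-zeros/all-ones exponents:

def decode_toy_float(bits):
    """Decode an 8-bit string 'SEEEEFFF' with bias 7 (toy format from the slide)."""
    s = int(bits[0])
    e = int(bits[1:5], 2)
    f = int(bits[5:8], 2)
    return (-1) ** s * 2 ** (e - 7) * (1 + f / 8)

print(decode_toy_float("10111001"))   # -1.125, matching the worked example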
Floating Point
Sign = 1, Exponent = 0111 → 7 − 7 = 0 (after subtracting the bias), varying the Fraction:
000 (0/8): −2⁰ × 1.000 = −1
001 (1/8): −2⁰ × 1.125 = −1.125
010 (2/8): −2⁰ × 1.25 = −1.25
…
110 (6/8): −2⁰ × 1.75 = −1.75
111 (7/8): −2⁰ × 1.875 = −1.875
Floating Point
Sign = 1, Exponent = 1001 → 9 − 7 = 2 (after subtracting the bias), varying the Fraction:
000 (0/8): −2² × 1.000 = −4
001 (1/8): −2² × 1.125 = −4.5
010 (2/8): −2² × 1.25 = −5
…
110 (6/8): −2² × 1.75 = −7
111 (7/8): −2² × 1.875 = −7.5
Floating Point
Sign 1, Exponent 0111: Fraction 000 → −2⁰ × 1.000 = −1;  Fraction 001 → −2⁰ × 1.125 = −1.125   (gap: 0.125)
Sign 1, Exponent 1001: Fraction 000 → −2² × 1.000 = −4;  Fraction 001 → −2² × 1.125 = −4.5     (gap: 0.5)
Gap between numbers is relative, not absolute
Revisiting Adding Numbers
Sign | Exponent | Fraction
1 1001 000 → −2² × 1.000 = −4
1 0110 000 → −2⁻¹ × 1.000 = −0.5
Sum: 1 1001 001 → −2² × 1.125 = −4.5
Actual implementation is complex
Revisiting Adding Numbers
Sign | Exponent | Fraction
1 1001 000 → −2² × 1.000 = −4
1 0100 000 → −2⁻³ × 1.000 = −0.125
True sum: −2² × 1.03125 = −4.125, which needs more fraction bits than we have. Round to which?
1 1001 000 → −2² × 1.000 = −4   ?   1 1001 001 → −2² × 1.125 = −4.5
Revisiting Adding Numbers
Sign | Exponent | Fraction
1 1001 000 → −2² × 1.000 = −4
1 0100 000 → −2⁻³ × 1.000 = −0.125
True sum −2² × 1.03125 = −4.125 rounds back to 1 1001 000 → −2² × 1.000 = −4
For a and b, these can happen:  a + b = a   and   a + b − a ≠ b
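Both failures show up directly in numpy's float32; a minimal sketch (not tied to the toy 8-bit format above):

import numpy as np

a = np.float32(2.0**24)   # large enough that the spacing between floats exceeds 1
b = np.float32(1.0)

print(a + b == a)         # True: b is absorbed, a + b = a
print((a + b) - a == b)   # False: a + b - a is 0, not b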
Revisiting Adding Numbers
IEEE 754 Single Precision (single): 1 sign bit, 8 exponent bits (max ≈ 2¹²⁷ ≈ 10³⁸), 23 fraction bits (≈ 7 decimal digits)
IEEE 754 Double Precision (double): 1 sign bit, 11 exponent bits (max ≈ 2¹⁰²³ ≈ 10³⁰⁸), 52 fraction bits (≈ 15 decimal digits)
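numpy exposes these limits directly; a quick sketch checking the table's claims:

import numpy as np

for t in (np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, "max:", info.max, "decimal digits:", info.precision)
# float32: max ~3.4e38, ~6-7 decimal digits
# float64: max ~1.8e308, ~15 decimal digits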
Trying it Out
a + b = a → the numerator is stuck, but the denominator isn't. Roundoff error occurs.
Take-homes
• Computer numbers aren’t math numbers
• Overflow, accidental zeros, and roundoff error happen; basic equalities are almost certainly violated for some values
• Floating point defaults and numpy try to protect you.
• Generally safe to use a double and the built-in functions in numpy (not necessarily other libraries!)
• Spooky behavior = look for numerical issues
Vectors
x = [2,3] = 2 × [1,0] + 3 × [0,1] = 2e₁ + 3e₂
Can be an arbitrary # of dimensions (typically denoted Rⁿ)
Vectors
x = [2,3], as a column x = [2; 3], with x₁ = 2, x₂ = 3
Just an array!
Get in the habit of thinking of them as columns.
Scaling Vectors
x = [2,3],  2x = [4,6]
• Can scale a vector by a scalar
• Scalar = single number
• Dimensions changed independently
• Changes magnitude / length, does not change direction.
Adding Vectors
y = [3,1]
x+y = [5,4]
x = [2,3]
• Can add vectors
• Dimensions changed independently
• Order irrelevant
• Can change direction and magnitude
Scaling and Adding
y = [3,1]
2x+y = [7,7]
Can do both at the same
time
x = [2,3]
Measuring Length
y = [3,1]
x = [2,3]
Magnitude / length / (L2) norm of a vector:
‖x‖ = ‖x‖₂ = (Σᵢ₌₁ⁿ xᵢ²)^(1/2)
There are other norms; assume L2 unless told otherwise
‖x‖₂ = √13,  ‖y‖₂ = √10.  Why?
Normalizing a Vector
x = [2,3]
y = [3,1]
x′ = x / ‖x‖₂,  y′ = y / ‖y‖₂
Dividing by the norm gives something on the unit sphere (all vectors with length 1)
Dot Products
x·y = Σᵢ₌₁ⁿ xᵢyᵢ = xᵀy
x·y = cos(θ) ‖x‖ ‖y‖
What happens with normalized / unit vectors?
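A small numpy sketch of these identities, using the slide's x = [2,3] and y = [3,1]:

import numpy as np

x = np.array([2.0, 3.0])
y = np.array([3.0, 1.0])

dot = x @ y                                     # sum_i x_i * y_i = 9
cos_theta = dot / (np.linalg.norm(x) * np.linalg.norm(y))
print(dot, cos_theta)                           # 9.0 and the cosine of the angle

# For unit vectors the norms are 1, so the dot product IS cos(theta).
xu, yu = x / np.linalg.norm(x), y / np.linalg.norm(y)
print(xu @ yu, cos_theta)                       # identical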
Dot Products
x·y = Σᵢⁿ xᵢyᵢ,  x = [2,3],  e₁ = [1,0],  e₂ = [0,1]
What's x·e₁? x·e₂?  Ans: 2; 3
• Dot product is projection
• Amount of x that's also pointing in the direction of y
Dot Products
What’s ?
Ans:
x·y = Σᵢⁿ xᵢyᵢ,  x = [2,3]
Special Angles
[1, 0] · [0, 1] = 1·0 + 0·1 = 0
Perpendicular / orthogonal vectors have dot product 0 irrespective of their magnitude
Special Angles
[x₁, x₂] · [y₁, y₂] = x₁y₁ + x₂y₂ = 0
Perpendicular / orthogonal vectors have dot product 0 irrespective of their magnitude
Orthogonal Vectors
𝒙=[2,3]
• Geometrically,
what’s the set of
vectors that are
orthogonal to x?
• A line [3,-2]
Orthogonal Vectors
• What’s the set of vectors that are
orthogonal to x = [5,0,0]?
• A plane/2D space of vectors/any
vector
• What’s the set of vectors that are
orthogonal to x and y = [0,5,0]?
• A line/1D space of vectors/any
vector
• Ambiguity in sign and magnitude
𝒙
𝒙
𝒚
Cross Product
• Set has an ambiguity in sign and
magnitude
• Cross product is: (1) orthogonal to
x, y (2) has sign given by right
hand rule and (3) has magnitude
given by area of parallelogram of x
and y
• Important: if x and y are the same direction or either is 0, then x × y = 0.
• Only in 3D!
𝒙
𝒚
𝒙 × 𝒚
Image credit: Wikipedia.org
Operations You Should Know
• Scale (vector, scalar → vector)
• Add (vector, vector → vector)
• Magnitude (vector → scalar)
• Dot product (vector, vector → scalar)
• Dot products are projection / angles
• Cross product (vector, vector → vector)
• Vectors facing same direction have cross product 0
• You can never mix vectors of different sizes
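Each of these operations is a one-liner in numpy; a minimal sketch (made-up 3D vectors so the cross product is defined):

import numpy as np

x = np.array([2.0, 3.0, 0.0])
y = np.array([3.0, 1.0, 0.0])

print(2 * x)               # scale
print(x + y)               # add
print(np.linalg.norm(x))   # magnitude (L2 norm)
print(np.dot(x, y))        # dot product
print(np.cross(x, y))      # cross product (3D only): [0, 0, -7]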
Matrices
Horizontally concatenate n m-dim column vectors and you get an m×n matrix A (here 2×3):
A = [v₁, ⋯, vₙ] = [ v₁₁ v₂₁ v₃₁ ; v₁₂ v₂₂ v₃₂ ]
Notation:
a (scalar): lowercase, undecorated
a (vector): lowercase, bold or arrow
A (matrix): uppercase, bold
Matrices
Vertically concatenate m n-dim row vectors and you get an m×n matrix A (here 2×3):
A = [ u₁ᵀ ; ⋮ ; uₘᵀ ] = [ u₁₁ u₁₂ u₁₃ ; u₂₁ u₂₂ u₂₃ ]
Transpose: flip rows / columns: [a; b; c]ᵀ = [a b c],  i.e. (3×1)ᵀ = 1×3
Matrix-Vector Product
y(2×1) = A(2×3) x(3×1)
[y₁; y₂] = [v₁ v₂ v₃][x₁; x₂; x₃]
y = x₁v₁ + x₂v₂ + x₃v₃ : a linear combination of the columns of A
Matrix-Vector Product
y(2×1) = A(2×3) x(3×1)
[y₁; y₂] = [u₁ᵀ; u₂ᵀ] x
y₁ = u₁ᵀx,  y₂ = u₂ᵀx : dot products between the rows of A and x
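A quick numpy check that the column view and the row view agree (A and x are made-up values):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # 2x3
x = np.array([1.0, 0.0, 2.0])        # 3

print(A @ x)                                         # [7, 16]
print(x[0]*A[:, 0] + x[1]*A[:, 1] + x[2]*A[:, 2])    # column view: same
print(np.array([A[0, :] @ x, A[1, :] @ x]))          # row view: same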
Matrix Multiplication
AB = [ − a₁ᵀ − ; ⋮ ; − aₘᵀ − ] [ | ⋯ | ; b₁ ⋯ bₚ ; | ⋯ | ]
Generally: Aₘₙ and Bₙₚ yield product (AB)ₘₚ
Yes – in A, I’m referring to the rows, and in B, I’m referring to the columns
Matrix Multiplication
AB = [ − a₁ᵀ − ; ⋮ ; − aₘᵀ − ] [ | ⋯ | ; b₁ ⋯ bₚ ; | ⋯ | ] = [ a₁ᵀb₁ ⋯ a₁ᵀbₚ ; ⋮ ⋱ ⋮ ; aₘᵀb₁ ⋯ aₘᵀbₚ ]
(AB)ᵢⱼ = aᵢᵀbⱼ
Generally: Aₘₙ and Bₙₚ yield product (AB)ₘₚ
Matrix Multiplication
• Dimensions must match
• Dimensions must match
• Dimensions must match
• (Yes, it’s associative): ABx = (A)(Bx) = (AB)x
• (No it’s not commutative): ABx ≠ (BA)x ≠ (BxA)
Operations They Don’t Teach
You Probably Saw Matrix Addition:
[a b; c d] + [e f; g h] = [a+e b+f; c+g d+h]
What is this? FYI: e is a scalar
[a b; c d] + e = [a+e b+e; c+e d+e]
Broadcasting
[a b; c d] + e = [a b; c d] + [e e; e e] = [a b; c d] + 1₂ₓ₂ · e
If you want to be pedantic and proper, you expand e by multiplying a matrix of 1s (denoted 1)
Many smart matrix libraries do this automatically.
This is the source of many bugs.
Broadcasting Example
Given: an n×2 matrix P = [x₁ y₁; ⋮; xₙ yₙ] and a 2D column vector v = [a; b]
Want: the n×2 difference matrix D = [x₁−a y₁−b; ⋮; xₙ−a yₙ−b]
P − vᵀ = [x₁ y₁; ⋮; xₙ yₙ] − [a b; a b; ⋮]   (the repeated rows of vᵀ are assumed / broadcast)
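In numpy the broadcast happens automatically when the shapes line up; a sketch of the P − vᵀ example with made-up values:

import numpy as np

P = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0]])      # n x 2
v = np.array([[10.0],
              [20.0]])          # 2 x 1 column vector

D = P - v.T                     # v.T has shape (1, 2); it is broadcast to (n, 2)
print(D)                        # [[-10, -19], [-8, -17], [-6, -15]]

# Gotcha: P - v (shapes (3, 2) and (2, 1)) does not line up and raises a ValueError;
# silent shape surprises like this are the "source of many bugs" from the slide.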
Two Uses for Matrices
1. Storing things in a rectangular array (images,
maps)
• Typical operations: element-wise operations,
convolution (which we’ll cover next)
• Atypical operations: almost anything you learned in
a math linear algebra class
2. A linear operator that maps vectors to
another space (Ax)
• Typical/Atypical: reverse of above
Images as Matrices
Suppose someone hands you this matrix.
What’s wrong with it?
No
contrast!
Contrast – Gamma curve
Typical way to change the contrast is to apply a nonlinear correction: new = oldᵞ (a gamma curve)
The exponent γ controls how much contrast gets added
Contrast – Gamma curve
[Plot: pixel intensities before vs. after the correction, with the 10th, 50th, and 90th percentiles marked on both axes.]
Now the darkest regions (10th percentile) are much darker than the moderately dark regions (50th percentile).
Implementation
Python+Numpy (right way), here with expFactor = 4:

imNew = im**expFactor

Python+Numpy (slow way – why?):

imNew = np.zeros(im.shape)
for y in range(im.shape[0]):
    for x in range(im.shape[1]):
        imNew[y,x] = im[y,x]**expFactor
Numerical
Linear Algebra
EECS 442 – David Fouhey
Fall 2019, University of Michigan
http://web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/
Images as Matrices
Suppose someone hands you this matrix.
The contrast is wrong!
No
contrast!
Results
Phew! Much Better.
Implementation
Python+Numpy (right way), here with expFactor = 4:

imNew = im**expFactor

Python+Numpy (slow way – why?):

imNew = np.zeros(im.shape)
for y in range(im.shape[0]):
    for x in range(im.shape[1]):
        imNew[y,x] = im[y,x]**expFactor
Element-wise Operations
(A ⊙ B)ᵢⱼ = Aᵢⱼ · Bᵢⱼ   “Hadamard Product” / element-wise multiplication
(A / B)ᵢⱼ = Aᵢⱼ / Bᵢⱼ   Element-wise division
(Aᵖ)ᵢⱼ = (Aᵢⱼ)ᵖ   Element-wise power – beware notation
Sums Across Axes
Suppose we have an N×2 matrix A = [x₁ y₁; ⋮; xₙ yₙ]
Σ(A, 1) = [x₁+y₁; ⋮; xₙ+yₙ]   (N-D column vector)
Σ(A, 0) = [Σᵢ₌₁ⁿ xᵢ, Σᵢ₌₁ⁿ yᵢ]   (2-D row vector)
Note – libraries distinguish between an N-D column vector and an N×1 matrix.
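The Σ(A, axis) notation corresponds to np.sum with an axis argument; a minimal sketch:

import numpy as np

A = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])          # N x 2

print(np.sum(A, axis=1))             # [11, 22, 33] -> shape (3,): N-D "column"
print(np.sum(A, axis=0))             # [6, 60]      -> shape (2,): 2-D "row"

# keepdims=True keeps the summed axis, giving a true Nx1 matrix instead of a
# 1-D array -- the distinction the note above warns about.
print(np.sum(A, axis=1, keepdims=True).shape)   # (3, 1)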
Vectorizing Example
• Suppose I represent each image as a 128-
dimensional vector
• I want to compute all the pairwise distances
between {x1, …, xN} and {y1, …, yM} so I can
find, for every xi the nearest yj
• Identity: ‖x − y‖² = ‖x‖² + ‖y‖² − 2xᵀy
• Or: ‖x − y‖ = (‖x‖² + ‖y‖² − 2xᵀy)^(1/2)
Vectorizing Example
X = [ − x₁ᵀ − ; ⋮ ; − x_Nᵀ − ],  Y = [ − y₁ᵀ − ; ⋮ ; − y_Mᵀ − ]
Σ(X², 1) = [ ‖x₁‖² ; ⋮ ; ‖x_N‖² ] : compute an N×1 vector of norms (can also do M×1 for Y)
(XYᵀ)ᵢⱼ = xᵢᵀyⱼ : compute an N×M matrix of dot products
Vectorizing Example
D = (Σ(X², 1) + Σ(Y², 1)ᵀ − 2XYᵀ)^(1/2)
[ ‖x₁‖² ; ⋮ ; ‖x_N‖² ] + [ ‖y₁‖² ⋯ ‖y_M‖² ] broadcasts, so
(Σ(X², 1) + Σ(Y², 1)ᵀ)ᵢⱼ = ‖xᵢ‖² + ‖yⱼ‖², i.e. the N×M matrix
[ ‖x₁‖²+‖y₁‖² ⋯ ‖x₁‖²+‖y_M‖² ; ⋮ ⋱ ⋮ ; ‖x_N‖²+‖y₁‖² ⋯ ‖x_N‖²+‖y_M‖² ]   Why?
Vectorizing Example
Dᵢⱼ = (‖xᵢ‖² + ‖yⱼ‖² − 2xᵢᵀyⱼ)^(1/2),  i.e.  D = (Σ(X², 1) + Σ(Y², 1)ᵀ − 2XYᵀ)^(1/2)
Numpy code:
XNorm = np.sum(X**2,axis=1,keepdims=True)
YNorm = np.sum(Y**2,axis=1,keepdims=True)
D = (XNorm+YNorm.T-2*np.dot(X,Y.T))**0.5
*May have to make sure the quantity inside the square root is at least 0
(sometimes roundoff issues happen)
Does it Make a Difference?
Computing pairwise distances between 300 and
400 128-dimensional vectors
1. for x in X, for y in Y, using native python: 9s
2. for x in X, for y in Y, using numpy to compute
distance: 0.8s
3. vectorized: 0.0045s (~2000x faster than 1,
175x faster than 2)
Expressing things in primitives that are
optimized is usually faster
Linear Independence
A set of vectors is linearly independent if you can’t write one as a linear combination of the others.
Suppose: a = [0; 0; 2],  b = [0; 6; 0],  c = [5; 0; 0]
y = [0; −2; 1] = (1/2)a − (1/3)b,   x = [0; 0; 4] = 2a
• Is the set {a, b, c} linearly independent?
• Is the set {a, b, x} linearly independent?
• Max # of independent 3D vectors?
Span
Span: all linear
combinations of a
set of vectors
Span({ }) =
Span({[0,2]}) = ?
All vertical lines
through origin =
Is blue in {red}’s
span?
Span
Span: all linear
combinations of a
set of vectors
Span({ , }) = ?
Span
Span: all linear
combinations of a
set of vectors
Span({ , }) = ?
Matrix-Vector Product
Ax
Right-multiplying A by x
mixes columns of A
according to entries of x
• The output space of f(x) = Ax is constrained to
be the span of the columns of A.
• Can’t output things you can’t construct out of
your columns
An Intuition
[Diagram: knobs x₁, x₂, x₃ feed a machine computing Ax, whose outputs are y₁, y₂, y₃]
y = Ax = [ | | | ; c₁ c₂ cₙ ; | | | ] [x₁; x₂; x₃]
x – knobs on machine (e.g., fuel, brakes)
y – state of the world (e.g., where you are)
A – machine (e.g., your car)
Linear Independence
Suppose the columns of a 3×3 matrix A are not linearly independent (c₁, αc₁, c₂ for instance):
y = Ax = [ | | | ; c₁ αc₁ c₂ ; | | | ] [x₁; x₂; x₃]
y = x₁c₁ + αx₂c₁ + x₃c₂
y = (x₁ + αx₂)c₁ + x₃c₂
Linear Independence Intuition
y = (x₁ + αx₂)c₁ + x₃c₂
Knobs of x are redundant. Even if y has 3 outputs, you can only control it in two directions.
[Same knobs-and-machine diagram as before]
Linear Independence
Recall: Ax = (x₁ + αx₂)c₁ + x₃c₂
• Not all y have a corresponding x s.t. y = Ax
• Or, given a vector y, there’s not a unique vector x s.t. y = Ax
• Can write y an infinite number of ways by adding β to x₁ and subtracting β/α from x₂:
y = A[x₁ + β; x₂ − β/α; x₃] = (x₁ + β + αx₂ − α·(β/α))c₁ + x₃c₂ = (x₁ + αx₂)c₁ + x₃c₂
Linear Independence
Recall: Ax = (x₁ + αx₂)c₁ + x₃c₂
• An infinite number of non-zero vectors x can map to a zero vector y
• Called the right null-space of A.
y = A[β; −β/α; 0] = (β − α·(β/α))c₁ + 0·c₂ = 0
• What else can we cancel out?
Rank
• Rank of an n×n matrix A – the number of linearly independent columns (or rows) of A / the dimension of the span of the columns
• Matrices with full rank (n x n, rank n) behave
nicely: can be inverted, span the full output
space, are one-to-one.
• Matrices with full rank are machines where
every knob is useful and every output state can
be made by the machine
Inverses
• Given y = Ax, y is a linear combination of the columns of A, weighted by the entries of x. If A is full-rank, we should be able to invert this mapping.
• Given some y (output) and A, what x (inputs) produced it?
• x = A⁻¹y
• Note: if you don’t need to compute it, never ever compute it. Solving for x is much faster and more stable than obtaining A⁻¹.
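A sketch of the advice in the last bullet, on a random full-rank system (values are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))    # full rank with overwhelming probability
y = rng.standard_normal(100)

x_solve = np.linalg.solve(A, y)        # preferred: solves Ax = y directly
x_inv = np.linalg.inv(A) @ y           # works, but slower and less stable

print(np.allclose(x_solve, x_inv))     # True here, but solve is the habit to build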
Symmetric Matrices
• Symmetric: Aᵀ = A, or Aᵢⱼ = Aⱼᵢ
• Have lots of special properties
[ a₁₁ a₁₂ a₁₃ ; a₂₁ a₂₂ a₂₃ ; a₃₁ a₃₂ a₃₃ ]  with aᵢⱼ = aⱼᵢ
Any matrix of the form A = XᵀX is symmetric. Quick check:
Aᵀ = (XᵀX)ᵀ
Aᵀ = Xᵀ(Xᵀ)ᵀ
Aᵀ = XᵀX
Special Matrices – Rotations
R = [ r₁₁ r₁₂ r₁₃ ; r₂₁ r₂₂ r₂₃ ; r₃₁ r₃₂ r₃₃ ]
• Rotation matrices rotate vectors and do not change vector L2 norms (‖Rx‖₂ = ‖x‖₂)
• Every row/column is unit norm
• Every row is linearly independent
• Transpose is inverse
• Determinant is 1 (otherwise it’s also a coordinate flip/reflection); eigenvalues have magnitude 1
Eigensystems
• An eigenvector v and eigenvalue λ of a matrix A satisfy Av = λv (Av is v scaled by λ)
• Vectors and values are always paired, and typically you assume ‖v‖₂ = 1
• The biggest eigenvalue of A gives bounds on how much Ax stretches a vector x
• Hints of what people really mean:
• “Largest eigenvector” = eigenvector with the largest eigenvalue
• “Spectral” just means there are eigenvectors involved
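A small numpy check of Av = λv on a made-up symmetric matrix (np.linalg.eigh is the routine for symmetric/Hermitian input):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])             # symmetric

vals, vecs = np.linalg.eigh(A)         # eigenvalues ascending; columns are eigenvectors
v = vecs[:, -1]                        # eigenvector paired with the largest eigenvalue
lam = vals[-1]

print(np.allclose(A @ v, lam * v))     # True: A v = lambda v
print(np.linalg.norm(v))               # 1.0: returned eigenvectors are unit norm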
Suppose I have points in a grid
Now I apply f(x) = Ax to these points
Pointy-end: Ax . Non-Pointy-End: x
Red box – unit square, Blue box – after f(x) = Ax.
What are the yellow lines and why?
A = [1.1 0; 0 1.1]
A = [0.8 0; 0 1.25]
Now I apply f(x) = Ax to these points
Pointy-end: Ax . Non-Pointy-End: x
Red box – unit square, Blue box – after f(x) = Ax.
What are the yellow lines and why?
A = [0.8 0; 0 1.25]
Red box – unit square, Blue box – after f(x) = Ax.
Can we draw any yellow lines?
A = [cos(t) −sin(t); sin(t) cos(t)]
Eigenvectors of Symmetric
Matrices
• Always n mutually orthogonal eigenvectors with n (not necessarily distinct) eigenvalues
• For symmetric A, the eigenvector with the largest eigenvalue maximizes xᵀAx / xᵀx (the eigenvector with the smallest eigenvalue minimizes it)
• So for unit vectors x (where xᵀx = 1), that eigenvector maximizes xᵀAx
• A surprisingly large number of optimization problems rely on (max/min)imizing this
The Singular Value Decomposition
Can always write an m×n matrix A as: A = UΣ…
U: Rotation – the eigenvectors of AAᵀ
Σ: Scale – a diagonal matrix of σ₁, σ₂, σ₃ (square roots of the eigenvalues of AᵀA), padded with zeros
The Singular Value Decomposition
Can always write an m×n matrix A as: A = UΣVᵀ
U: Rotation – the eigenvectors of AAᵀ
Σ: Scale – square roots of the eigenvalues of AᵀA
Vᵀ: Rotation – the eigenvectors of AᵀA
Singular Value Decomposition
• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank /
number of linearly independent vectors
• “Closest” matrix to A with a lower rank
A = U Σ Vᵀ,  Σ = diag(σ₁, σ₂, σ₃) (padded with zeros for a non-square A)
Singular Value Decomposition
• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank /
number of linearly independent vectors
• “Closest” matrix to A with a lower rank
Â = U diag(σ₁, σ₂, 0) Vᵀ: zeroing the smallest singular value gives the closest lower-rank (here rank-2) matrix
Singular Value Decomposition
• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank /
number of linearly independent vectors
• “Closest” matrix to A with a lower rank
• Secretly behind many of the things you do with matrices
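A sketch of the "closest lower-rank matrix" point: zero out the smallest singular values and rebuild (random matrix, arbitrary sizes):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.sum(s > 1e-10))                 # number of non-zero singular values = rank

k = 2                                    # keep the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.matrix_rank(A_k))        # 2: the best rank-2 approximation of A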
Solving Least-Squares
Start with two points (xᵢ, yᵢ): (x₁, y₁), (x₂, y₂)
[y₁; y₂] = [x₁ 1; x₂ 1][m; b],  i.e.  y = Av
[y₁; y₂] = [mx₁ + b; mx₂ + b]
We know how to solve this – invert A and find v (i.e., the (m, b) that fits the points)
Solving Least-Squares
Start with two points (xᵢ, yᵢ): (x₁, y₁), (x₂, y₂)
[y₁; y₂] = [x₁ 1; x₂ 1][m; b],  i.e.  y = Av
‖y − Av‖² = ‖[y₁; y₂] − [mx₁ + b; mx₂ + b]‖² = (y₁ − (mx₁ + b))² + (y₂ − (mx₂ + b))²
The sum of squared differences between the actual value of y and what the model says y should be.
Solving Least-Squares
Suppose there are n > 2 points
[y₁; ⋮; y_N] = [x₁ 1; ⋮ ⋮; x_N 1][m; b],  i.e.  y = Av
Compute ‖y − Av‖² again:
‖y − Av‖² = Σᵢ₌₁ⁿ (yᵢ − (mxᵢ + b))²
Solving Least-Squares
Given y, A, and v with y = Av overdetermined (A tall / more equations than unknowns),
we want to minimize ‖y − Av‖², or find:
arg minᵥ ‖y − Av‖²   (the value of v that makes the expression smallest)
Solution satisfies (AᵀA)v = Aᵀy, or v = (AᵀA)⁻¹Aᵀy
(Don’t actually compute the inverse!)
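A sketch of fitting (m, b) to noisy made-up points without forming an inverse; np.linalg.lstsq does the minimization:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)   # roughly y = 2x + 1

A = np.stack([x, np.ones_like(x)], axis=1)          # rows are [x_i, 1]
v, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
print(v)                                            # approximately [2.0, 1.0] = (m, b)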
When is Least-Squares Possible?
Given y, A, and v. Want y = Av
[Square A] y = Av: want n outputs, have n knobs to fiddle with; every knob is useful if A is full rank.
[Tall A] y = Av: rows (outputs) > columns (knobs). Thus you can’t get the precise output you want (not enough knobs), so settle for the “closest” knob setting.
When is Least-Squares Possible?
Given y, A, and v. Want y = Av
[Square A] y = Av: want n outputs, have n knobs to fiddle with; every knob is useful if A is full rank.
[Wide A] y = Av: columns (knobs) > rows (outputs). Thus, any output can be expressed in infinite ways.
Homogeneous Least-Squares
Given a set of unit vectors (aka directions) x₁, …, xₙ, I want the vector v that is as orthogonal to all the xᵢ as possible (for some definition of orthogonal)
Stack the xᵢᵀ into A and compute Av:
Av = [ − x₁ᵀ − ; ⋮ ; − xₙᵀ − ] v = [ x₁ᵀv ; ⋮ ; xₙᵀv ]   (each entry is 0 if v is orthogonal to that xᵢ)
Compute ‖Av‖² = Σᵢⁿ (xᵢᵀv)² : a sum of how orthogonal v is to each x
Homogeneous Least-Squares
• A lot of times, given a matrix A, we want to find the v that minimizes ‖Av‖²
• I.e., want arg minᵥ ‖Av‖²
• What’s a trivial solution?
• Set v = 0 → Av = 0
• Exclude this by forcing v to have unit norm
Homogeneous Least-Squares
Let’s look at ‖Av‖₂²:
‖Av‖₂² = (Av)ᵀ(Av)   (rewrite as a dot product)
‖Av‖₂² = vᵀAᵀAv = vᵀ(AᵀA)v   (distribute the transpose)
We want the vector minimizing this quadratic form
Where have we seen this?
Homogeneous Least-Squares
Ubiquitous tool in vision:
arg min over ‖v‖₂ = 1 of ‖Av‖²
(1) “Smallest”* eigenvector of AᵀA
(2) “Smallest” right singular vector of A
*Note: AᵀA is positive semi-definite so it has all non-negative eigenvalues
For min → max, switch smallest → largest
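A sketch of recipe (2): take the right singular vector with the smallest singular value (random data stands in for the stacked directions):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))          # rows are the stacked x_i^T

U, s, Vt = np.linalg.svd(A)
v = Vt[-1]                                # right singular vector w/ smallest sigma

print(np.linalg.norm(v))                  # 1.0 (unit norm constraint satisfied)
print(np.linalg.norm(A @ v))              # as small as any unit vector can make it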
Derivatives
Remember derivatives?
Derivative: rate at which a function f(x) changes
at a point as well as the direction that increases
the function
Given quadratic function f(x) = (x − 2)² + 5
f′(x) is a function too, aka df/dx
Given quadratic function f(x) = (x − 2)² + 5
What’s special about x = 2?
f is minimized at x = 2, and f′(x) = 0 at x = 2
a = minimum of f → f′(a) = 0
The reverse is not true
Rates of change
Suppose I want to
increase f(x) by
changing x:
Blue area: move left
Red area: move right
Derivative tells you
direction of ascent
and rate
f(x) = (x − 2)² + 5
What Calculus Should I Know
• Really need intuition
• Need chain rule
• Rest you should look up / use a computer
algebra system / use a cookbook
• Partial derivatives (and that’s it from
multivariable calculus)
Partial Derivatives
• Pretend other variables are constant, take a
derivative. That’s it.
• Make our function a function of two variables
f(x) = (x − 2)² + 5
∂/∂x f(x) = 2(x − 2) · 1 = 2(x − 2)
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
∂/∂x f₂(x, y) = 2(x − 2)   (pretend (y + 1)² is constant → its derivative is 0)
Zooming Out
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
Dark = f(x,y) low
Bright = f(x,y) high
Taking a slice of
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
Slice of y=0 is the
function from before:
Taking a slice of
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
∂f₂/∂x is the rate of change & direction in the x dimension
Zooming Out
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
∂f₂/∂y = 2(y + 1) and is the rate of change & direction in the y dimension
Zooming Out
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
Gradient/Jacobian: making a vector of the partial derivatives, ∇f₂ = [∂f₂/∂x, ∂f₂/∂y], gives the rate and direction of change.
Arrows point OUT of the minimum / basin.
What Should I Know?
• Gradients are simply partial derivatives per-dimension: if x in f(x) has n dimensions, ∇f(x) has n dimensions
• Gradients point in the direction of ascent and tell the rate of ascent
• If a is a minimum of f → ∇f(a) = 0
• The reverse is not true, especially in high-dimensional spaces