Numerical
Linear Algebra
EECS 442 – David Fouhey
Fall 2019, University of Michigan
http://web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/
Administrivia
• HW 1 out – due in two weeks
• Follow submission format (wrong format = 0)
• The homeworks are not fill-in-the-blank. This is
harder to do but mirrors life
• If it’s ambiguous: make a decision, document what
you think and why in your homework, and move on
• Highly encouraged to work together. See piazza
• Please check syllabus for what’s allowed. I
guarantee checking the syllabus thoroughly will
help boost your grade.
This Week – Math
Two goals for the next two classes:
• Math with computers ≠ Math
• Practical math you need to know but
may not have been taught
This Week – Goal
• Not a “Linear algebra in two lectures” – that’s
impossible.
• Some of this you should know!
• Aimed at reviving your knowledge and plugging
any gaps
• Aimed at giving you intuitions
Adding Numbers
• 1 + 1 = ?
• Suppose xᵢ is normally distributed with mean μ and standard deviation σ for i = 1, …, N
• How is the average, (1/N) Σᵢ xᵢ, distributed (qualitatively), in terms of variance?
• The Free Drinks in Vegas Theorem: the average has mean μ and standard deviation σ/√N.
Free Drinks in Vegas
Each game/variable has mean $0.10, std $2
100 games is uncertain and fun!
100K games is guaranteed profit: the 99.999999% lowest value is $0.064.
$0.01 for drinks, $0.054 for profits
Let’s Make It Big
• Suppose I average 50M normally distributed
numbers (mean: 31, standard deviation: 1)
• For instance: have predicted and actual depth
for 200 480x640 images and want to know the
average error (|predicted – actual|)
def average(xs):
    numerator = 0
    for x in xs:
        numerator += x
    return numerator / len(xs)
Let’s Make It Big
• What should happen qualitatively?
• Theory says that the average is distributed with mean 31 and standard deviation 1/√(5×10⁷) ≈ 0.00014
• What will happen?
• Reality: 17.47
Trying it Out
[Screenshot of the code running live: the computed average lands far from the theoretical prediction.]
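A minimal numpy sketch of what goes wrong (an assumption about the demo, not the in-class code): accumulate the 50M numbers sequentially in float32, then compare against a float64 accumulation.

import numpy as np

# 50M samples from N(mean=31, std=1), stored as float32.
xs = np.random.normal(31, 1, size=50_000_000).astype(np.float32)

# Sequential float32 accumulation (what the naive loop does): cumsum keeps a
# running float32 total, which eventually stops growing because total + x == total.
running = np.cumsum(xs, dtype=np.float32)
print("naive float32 average:", running[-1] / len(xs))     # typically far below 31

# Accumulating in float64 behaves as the theory predicts (~31 +/- 0.00014).
print("float64 average:      ", xs.mean(dtype=np.float64))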
What’s a Number?
Weights: 2⁷ 2⁶ 2⁵ 2⁴ 2³ 2² 2¹ 2⁰
Bits:      1  0  1  1  1  0  0  1
128 + 32 + 16 + 8 + 1 = 185
Adding Two Numbers
“Integers” on a computer are integers modulo 2ᵏ

Weights:      2⁸ | 2⁷ 2⁶ 2⁵ 2⁴ 2³ 2² 2¹ 2⁰
                 |  1  0  1  1  1  0  0  1   = 185
              +  |  0  1  1  0  1  0  0  1   = 105
Carry flag:   1  |
Result:          |  0  0  1  0  0  0  1  0   = 34
Some Gotchas
Why?
32 + (3 / 4) × 40 = 32
32 + (3 × 40) / 4 = 62

With integer division, 3 / 4 underflows to 0:
32 + 3 / 4 × 40 = 32 + 0 × 40 = 32 + 0 = 32   (Underflow)
32 + 3 × 40 / 4 = 32 + 120 / 4 = 32 + 30 = 62   (No Underflow)

Ok – you have to multiply before dividing
Some Gotchas
math:  32 + (9 × 40) / 10 = 32 + 36 = 68
uint8: 32 + 9 × 40 / 10 = 32 + 104 / 10 = 32 + 10 = 42   (Overflow)
Why 104? In uint8, 9 × 40 = 360 wraps to 360 % 256 = 104; it should be 360 / 10 = 36.
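Both gotchas are easy to reproduce; a small sketch using Python integer division and numpy's uint8 (the values match the slide's arithmetic):

import numpy as np

# Integer division "underflow": 3 // 4 is 0, so the order of operations matters.
print(32 + 3 // 4 * 40)     # 32
print(32 + 3 * 40 // 4)     # 62

# uint8 overflow: 9 * 40 = 360 wraps around modulo 256 to 104.
nine, forty, ten = np.uint8(9), np.uint8(40), np.uint8(10)
product = nine * forty            # 104 (numpy may also warn about the overflow)
print(product)
print(32 + product // ten)        # 42, not the mathematically correct 68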
What’s a Number?
Weights: 2⁷ 2⁶ 2⁵ 2⁴ 2³ 2² 2¹ 2⁰
Bits:      1  0  1  1  1  0  0  1  → 185
How can we do fractions? Put a binary point in the middle:
Weights: 2⁵ 2⁴ 2³ 2² 2¹ 2⁰ . 2⁻¹ 2⁻²
Bits:      1  0  1  1  0  1 .  0  1  → 45.25 = 45 + 0.25
Fixed-Point Arithmetic
Weights: 2⁵ 2⁴ 2³ 2² 2¹ 2⁰ . 2⁻¹ 2⁻²
Bits:      1  0  1  1  0  1 .  0  1  → 45.25
What's the largest number we can represent? 63.75 – Why?
How precisely can we measure at 63? 0.25
How precisely can we measure at 0? 0.25
Fine for many purposes but for science, seems silly
Floating Point
Bit layout (toy 8-bit float): Sign (S) | Exponent (E, 4 bits) | Fraction (F, 3 bits)
Example bits: 1 | 0111 | 001

value = (−1)ˢ × 2^(E − bias) × (1 + F/2³),   with bias = 7 for this toy format

S = 1 → (−1)¹ = −1;  E = 7 → 2^(7−7) = 2⁰ = 1;  F = 1 → 1 + 1/8 = 1.125;  value = −1.125

Bias allows the exponent to be negative. Note: fraction = significand = mantissa;
exponents of all ones or all zeros are special numbers
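A small sketch that decodes this toy 8-bit format (1 sign bit, 4 exponent bits, 3 fraction bits, bias 7); the helper name is made up for illustration and it ignores the special all-zeros/all-ones exponents:

def decode_toy_float(bits):
    """Decode an 8-bit string 'SEEEEFFF' with bias 7 (toy format from the slide)."""
    s = int(bits[0])
    e = int(bits[1:5], 2)
    f = int(bits[5:8], 2)
    return (-1) ** s * 2 ** (e - 7) * (1 + f / 8)

print(decode_toy_float("10111001"))   # -1.125, matching the worked example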
Floating Point
Sign = 1, Exponent = 0111 → 7 − 7 = 0 (after subtracting the bias), varying the Fraction:
000 (0/8): −2⁰ × 1.000 = −1
001 (1/8): −2⁰ × 1.125 = −1.125
010 (2/8): −2⁰ × 1.25 = −1.25
…
110 (6/8): −2⁰ × 1.75 = −1.75
111 (7/8): −2⁰ × 1.875 = −1.875
Floating Point
Sign = 1, Exponent = 1001 → 9 − 7 = 2 (after subtracting the bias), varying the Fraction:
000 (0/8): −2² × 1.000 = −4
001 (1/8): −2² × 1.125 = −4.5
010 (2/8): −2² × 1.25 = −5
…
110 (6/8): −2² × 1.75 = −7
111 (7/8): −2² × 1.875 = −7.5
Floating Point
Sign 1, Exponent 0111: Fraction 000 → −2⁰ × 1.000 = −1;  Fraction 001 → −2⁰ × 1.125 = −1.125   (gap: 0.125)
Sign 1, Exponent 1001: Fraction 000 → −2² × 1.000 = −4;  Fraction 001 → −2² × 1.125 = −4.5     (gap: 0.5)
Gap between numbers is relative, not absolute
Revisiting Adding Numbers
Sign | Exponent | Fraction
1 1001 000 → −2² × 1.000 = −4
1 0110 000 → −2⁻¹ × 1.000 = −0.5
Sum: 1 1001 001 → −2² × 1.125 = −4.5
Actual implementation is complex
Revisiting Adding Numbers
Sign | Exponent | Fraction
1 1001 000 → −2² × 1.000 = −4
1 0100 000 → −2⁻³ × 1.000 = −0.125
True sum: −2² × 1.03125 = −4.125, which needs more fraction bits than we have. Round to which?
1 1001 000 → −2² × 1.000 = −4   ?   1 1001 001 → −2² × 1.125 = −4.5
Revisiting Adding Numbers
Sign | Exponent | Fraction
1 1001 000 → −2² × 1.000 = −4
1 0100 000 → −2⁻³ × 1.000 = −0.125
True sum −2² × 1.03125 = −4.125 rounds back to 1 1001 000 → −2² × 1.000 = −4
For a and b, these can happen:  a + b = a   and   a + b − a ≠ b
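Both failures show up directly in numpy's float32; a minimal sketch (not tied to the toy 8-bit format above):

import numpy as np

a = np.float32(2.0**24)   # large enough that the spacing between floats exceeds 1
b = np.float32(1.0)

print(a + b == a)         # True: b is absorbed, a + b = a
print((a + b) - a == b)   # False: a + b - a is 0, not b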
Revisiting Adding Numbers
IEEE 754 Single Precision (single): 1 sign bit, 8 exponent bits (max ≈ 2¹²⁷ ≈ 10³⁸), 23 fraction bits (≈ 7 decimal digits)
IEEE 754 Double Precision (double): 1 sign bit, 11 exponent bits (max ≈ 2¹⁰²³ ≈ 10³⁰⁸), 52 fraction bits (≈ 15 decimal digits)
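numpy exposes these limits directly; a quick sketch checking the table's claims:

import numpy as np

for t in (np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, "max:", info.max, "decimal digits:", info.precision)
# float32: max ~3.4e38, ~6-7 decimal digits
# float64: max ~1.8e308, ~15 decimal digits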
Trying it Out
a + b = a → the numerator is stuck, but the denominator isn't. Roundoff error occurs.
Take-homes
• Computer numbers aren’t math numbers
• Overflow, accidental zeros, and roundoff error happen; basic equalities are almost certainly violated for some values
• Floating point defaults and numpy try to protect you.
• Generally safe to use a double and the built-in functions in numpy (not necessarily other libraries!)
• Spooky behavior = look for numerical issues
Vectors
x = [2,3] = 2 × [1,0] + 3 × [0,1] = 2e₁ + 3e₂
Can be an arbitrary # of dimensions (typically denoted Rⁿ)
Vectors
x = [2,3], as a column x = [2; 3], with x₁ = 2, x₂ = 3
Just an array!
Get in the habit of thinking of them as columns.
Scaling Vectors
x = [2,3],  2x = [4,6]
• Can scale a vector by a scalar
• Scalar = single number
• Dimensions changed independently
• Changes magnitude / length, does not change direction.
Adding Vectors
y = [3,1]
x+y = [5,4]
x = [2,3]
• Can add vectors
• Dimensions changed independently
• Order irrelevant
• Can change direction and magnitude
Scaling and Adding
y = [3,1]
2x+y = [7,7]
Can do both at the same
time
x = [2,3]
Measuring Length
y = [3,1]
x = [2,3]
Magnitude / length / (L2) norm of a vector:
‖x‖ = ‖x‖₂ = (Σᵢ₌₁ⁿ xᵢ²)^(1/2)
There are other norms; assume L2 unless told otherwise
‖x‖₂ = √13,  ‖y‖₂ = √10.  Why?
Normalizing a Vector
x = [2,3]
y = [3,1]
x′ = x / ‖x‖₂,  y′ = y / ‖y‖₂
Dividing by the norm gives something on the unit sphere (all vectors with length 1)
Dot Products
x·y = Σᵢ₌₁ⁿ xᵢyᵢ = xᵀy
x·y = cos(θ) ‖x‖ ‖y‖
What happens with normalized / unit vectors?
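A small numpy sketch of these identities, using the slide's x = [2,3] and y = [3,1]:

import numpy as np

x = np.array([2.0, 3.0])
y = np.array([3.0, 1.0])

dot = x @ y                                     # sum_i x_i * y_i = 9
cos_theta = dot / (np.linalg.norm(x) * np.linalg.norm(y))
print(dot, cos_theta)                           # 9.0 and the cosine of the angle

# For unit vectors the norms are 1, so the dot product IS cos(theta).
xu, yu = x / np.linalg.norm(x), y / np.linalg.norm(y)
print(xu @ yu, cos_theta)                       # identical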
Dot Products
x·y = Σᵢⁿ xᵢyᵢ,  x = [2,3],  e₁ = [1,0],  e₂ = [0,1]
What's x·e₁? x·e₂?  Ans: 2; 3
• Dot product is projection
• Amount of x that's also pointing in the direction of y
Dot Products
What’s ?
Ans:
x·y = Σᵢⁿ xᵢyᵢ,  x = [2,3]
Special Angles
[1, 0] · [0, 1] = 1·0 + 0·1 = 0
Perpendicular / orthogonal vectors have dot product 0 irrespective of their magnitude
Special Angles
[x₁, x₂] · [y₁, y₂] = x₁y₁ + x₂y₂ = 0
Perpendicular / orthogonal vectors have dot product 0 irrespective of their magnitude
Orthogonal Vectors
𝒙=[2,3]
• Geometrically,
what’s the set of
vectors that are
orthogonal to x?
• A line [3,-2]
Orthogonal Vectors
• What’s the set of vectors that are
orthogonal to x = [5,0,0]?
• A plane/2D space of vectors/any
vector
• What’s the set of vectors that are
orthogonal to x and y = [0,5,0]?
• A line/1D space of vectors/any
vector
• Ambiguity in sign and magnitude
𝒙
𝒙
𝒚
Cross Product
• Set has an ambiguity in sign and
magnitude
• Cross product is: (1) orthogonal to
x, y (2) has sign given by right
hand rule and (3) has magnitude
given by area of parallelogram of x
and y
• Important: if x and y are the same direction or either is 0, then x × y = 0.
• Only in 3D!
𝒙
𝒚
𝒙 × 𝒚
Image credit: Wikipedia.org
Operations You Should Know
• Scale (vector, scalar → vector)
• Add (vector, vector → vector)
• Magnitude (vector → scalar)
• Dot product (vector, vector → scalar)
• Dot products are projection / angles
• Cross product (vector, vector → vector)
• Vectors facing same direction have cross product 0
• You can never mix vectors of different sizes
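Each of these operations is a one-liner in numpy; a minimal sketch (made-up 3D vectors so the cross product is defined):

import numpy as np

x = np.array([2.0, 3.0, 0.0])
y = np.array([3.0, 1.0, 0.0])

print(2 * x)               # scale
print(x + y)               # add
print(np.linalg.norm(x))   # magnitude (L2 norm)
print(np.dot(x, y))        # dot product
print(np.cross(x, y))      # cross product (3D only): [0, 0, -7]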
Matrices
Horizontally concatenate n m-dim column vectors and you get an m×n matrix A (here 2×3):
A = [v₁, ⋯, vₙ] = [ v₁₁ v₂₁ v₃₁ ; v₁₂ v₂₂ v₃₂ ]
Notation:
a (scalar): lowercase, undecorated
a (vector): lowercase, bold or arrow
A (matrix): uppercase, bold
Matrices
Vertically concatenate m n-dim row vectors and you get an m×n matrix A (here 2×3):
A = [ u₁ᵀ ; ⋮ ; uₘᵀ ] = [ u₁₁ u₁₂ u₁₃ ; u₂₁ u₂₂ u₂₃ ]
Transpose: flip rows / columns: [a; b; c]ᵀ = [a b c],  i.e. (3×1)ᵀ = 1×3
Matrix-Vector Product
y(2×1) = A(2×3) x(3×1)
[y₁; y₂] = [v₁ v₂ v₃][x₁; x₂; x₃]
y = x₁v₁ + x₂v₂ + x₃v₃ : a linear combination of the columns of A
Matrix-Vector Product
y(2×1) = A(2×3) x(3×1)
[y₁; y₂] = [u₁ᵀ; u₂ᵀ] x
y₁ = u₁ᵀx,  y₂ = u₂ᵀx : dot products between the rows of A and x
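A quick numpy check that the column view and the row view agree (A and x are made-up values):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # 2x3
x = np.array([1.0, 0.0, 2.0])        # 3

print(A @ x)                                         # [7, 16]
print(x[0]*A[:, 0] + x[1]*A[:, 1] + x[2]*A[:, 2])    # column view: same
print(np.array([A[0, :] @ x, A[1, :] @ x]))          # row view: same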
Matrix Multiplication
AB = [ − a₁ᵀ − ; ⋮ ; − aₘᵀ − ] [ | ⋯ | ; b₁ ⋯ bₚ ; | ⋯ | ]
Generally: Aₘₙ and Bₙₚ yield product (AB)ₘₚ
Yes – in A, I’m referring to the rows, and in B, I’m referring to the columns
Matrix Multiplication
AB = [ − a₁ᵀ − ; ⋮ ; − aₘᵀ − ] [ | ⋯ | ; b₁ ⋯ bₚ ; | ⋯ | ] = [ a₁ᵀb₁ ⋯ a₁ᵀbₚ ; ⋮ ⋱ ⋮ ; aₘᵀb₁ ⋯ aₘᵀbₚ ]
(AB)ᵢⱼ = aᵢᵀbⱼ
Generally: Aₘₙ and Bₙₚ yield product (AB)ₘₚ
Matrix Multiplication
• Dimensions must match
• Dimensions must match
• Dimensions must match
• (Yes, it’s associative): ABx = (A)(Bx) = (AB)x
• (No it’s not commutative): ABx ≠ (BA)x ≠ (BxA)
Operations They Don’t Teach
You Probably Saw Matrix Addition:
[a b; c d] + [e f; g h] = [a+e b+f; c+g d+h]
What is this? FYI: e is a scalar
[a b; c d] + e = [a+e b+e; c+e d+e]
Broadcasting
[a b; c d] + e = [a b; c d] + [e e; e e] = [a b; c d] + 1₂ₓ₂ · e
If you want to be pedantic and proper, you expand e by multiplying a matrix of 1s (denoted 1)
Many smart matrix libraries do this automatically.
This is the source of many bugs.
Broadcasting Example
Given: an n×2 matrix P = [x₁ y₁; ⋮; xₙ yₙ] and a 2D column vector v = [a; b]
Want: the n×2 difference matrix D = [x₁−a y₁−b; ⋮; xₙ−a yₙ−b]
P − vᵀ = [x₁ y₁; ⋮; xₙ yₙ] − [a b; a b; ⋮]   (the repeated rows of vᵀ are assumed / broadcast)
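In numpy the broadcast happens automatically when the shapes line up; a sketch of the P − vᵀ example with made-up values:

import numpy as np

P = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0]])      # n x 2
v = np.array([[10.0],
              [20.0]])          # 2 x 1 column vector

D = P - v.T                     # v.T has shape (1, 2); it is broadcast to (n, 2)
print(D)                        # [[-10, -19], [-8, -17], [-6, -15]]

# Gotcha: P - v (shapes (3, 2) and (2, 1)) does not line up and raises a ValueError;
# silent shape surprises like this are the "source of many bugs" from the slide.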
Two Uses for Matrices
1. Storing things in a rectangular array (images,
maps)
• Typical operations: element-wise operations,
convolution (which we’ll cover next)
• Atypical operations: almost anything you learned in
a math linear algebra class
2. A linear operator that maps vectors to
another space (Ax)
• Typical/Atypical: reverse of above
Images as Matrices
Suppose someone hands you this matrix.
What’s wrong with it?
No
contrast!
Contrast – Gamma curve
Typical way to change the contrast is to apply a nonlinear correction: new = oldᵞ (a gamma curve)
The exponent γ controls how much contrast gets added
Contrast – Gamma curve
[Plot: pixel intensities before vs. after the correction, with the 10th, 50th, and 90th percentiles marked on both axes.]
Now the darkest regions (10th percentile) are much darker than the moderately dark regions (50th percentile).
Implementation
Python+Numpy (right way), here with expFactor = 4:

imNew = im**expFactor

Python+Numpy (slow way – why?):

imNew = np.zeros(im.shape)
for y in range(im.shape[0]):
    for x in range(im.shape[1]):
        imNew[y,x] = im[y,x]**expFactor
Numerical
Linear Algebra
EECS 442 – David Fouhey
Fall 2019, University of Michigan
http://web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/
Images as Matrices
Suppose someone hands you this matrix.
The contrast is wrong!
No
contrast!
Results
Phew! Much Better.
Implementation
Python+Numpy (right way), here with expFactor = 4:

imNew = im**expFactor

Python+Numpy (slow way – why?):

imNew = np.zeros(im.shape)
for y in range(im.shape[0]):
    for x in range(im.shape[1]):
        imNew[y,x] = im[y,x]**expFactor
Element-wise Operations
(A ⊙ B)ᵢⱼ = Aᵢⱼ · Bᵢⱼ   “Hadamard Product” / element-wise multiplication
(A / B)ᵢⱼ = Aᵢⱼ / Bᵢⱼ   Element-wise division
(Aᵖ)ᵢⱼ = (Aᵢⱼ)ᵖ   Element-wise power – beware notation
Sums Across Axes
Suppose we have an N×2 matrix A = [x₁ y₁; ⋮; xₙ yₙ]
Σ(A, 1) = [x₁+y₁; ⋮; xₙ+yₙ]   (N-D column vector)
Σ(A, 0) = [Σᵢ₌₁ⁿ xᵢ, Σᵢ₌₁ⁿ yᵢ]   (2-D row vector)
Note – libraries distinguish between an N-D column vector and an N×1 matrix.
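The Σ(A, axis) notation corresponds to np.sum with an axis argument; a minimal sketch:

import numpy as np

A = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])          # N x 2

print(np.sum(A, axis=1))             # [11, 22, 33] -> shape (3,): N-D "column"
print(np.sum(A, axis=0))             # [6, 60]      -> shape (2,): 2-D "row"

# keepdims=True keeps the summed axis, giving a true Nx1 matrix instead of a
# 1-D array -- the distinction the note above warns about.
print(np.sum(A, axis=1, keepdims=True).shape)   # (3, 1)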
Vectorizing Example
• Suppose I represent each image as a 128-
dimensional vector
• I want to compute all the pairwise distances
between {x1, …, xN} and {y1, …, yM} so I can
find, for every xi the nearest yj
• Identity: ‖x − y‖² = ‖x‖² + ‖y‖² − 2xᵀy
• Or: ‖x − y‖ = (‖x‖² + ‖y‖² − 2xᵀy)^(1/2)
Vectorizing Example
X = [ − x₁ᵀ − ; ⋮ ; − x_Nᵀ − ],  Y = [ − y₁ᵀ − ; ⋮ ; − y_Mᵀ − ]
Σ(X², 1) = [ ‖x₁‖² ; ⋮ ; ‖x_N‖² ] : compute an N×1 vector of norms (can also do M×1 for Y)
(XYᵀ)ᵢⱼ = xᵢᵀyⱼ : compute an N×M matrix of dot products
Vectorizing Example
D = (Σ(X², 1) + Σ(Y², 1)ᵀ − 2XYᵀ)^(1/2)
[ ‖x₁‖² ; ⋮ ; ‖x_N‖² ] + [ ‖y₁‖² ⋯ ‖y_M‖² ] broadcasts, so
(Σ(X², 1) + Σ(Y², 1)ᵀ)ᵢⱼ = ‖xᵢ‖² + ‖yⱼ‖², i.e. the N×M matrix
[ ‖x₁‖²+‖y₁‖² ⋯ ‖x₁‖²+‖y_M‖² ; ⋮ ⋱ ⋮ ; ‖x_N‖²+‖y₁‖² ⋯ ‖x_N‖²+‖y_M‖² ]   Why?
Vectorizing Example
Dᵢⱼ = (‖xᵢ‖² + ‖yⱼ‖² − 2xᵢᵀyⱼ)^(1/2),  i.e.  D = (Σ(X², 1) + Σ(Y², 1)ᵀ − 2XYᵀ)^(1/2)
Numpy code:
XNorm = np.sum(X**2,axis=1,keepdims=True)
YNorm = np.sum(Y**2,axis=1,keepdims=True)
D = (XNorm+YNorm.T-2*np.dot(X,Y.T))**0.5
*May have to make sure the quantity inside the square root is at least 0
(sometimes roundoff issues happen)
Does it Make a Difference?
Computing pairwise distances between 300 and
400 128-dimensional vectors
1. for x in X, for y in Y, using native python: 9s
2. for x in X, for y in Y, using numpy to compute
distance: 0.8s
3. vectorized: 0.0045s (~2000x faster than 1,
175x faster than 2)
Expressing things in primitives that are
optimized is usually faster
Linear Independence
A set of vectors is linearly independent if you can’t write one as a linear combination of the others.
Suppose: a = [0; 0; 2],  b = [0; 6; 0],  c = [5; 0; 0]
y = [0; −2; 1] = (1/2)a − (1/3)b,   x = [0; 0; 4] = 2a
• Is the set {a, b, c} linearly independent?
• Is the set {a, b, x} linearly independent?
• Max # of independent 3D vectors?
Span
Span: all linear
combinations of a
set of vectors
Span({ }) =
Span({[0,2]}) = ?
All vertical lines
through origin =
Is blue in {red}’s
span?
Span
Span: all linear
combinations of a
set of vectors
Span({ , }) = ?
Span
Span: all linear
combinations of a
set of vectors
Span({ , }) = ?
Matrix-Vector Product
Ax
Right-multiplying A by x
mixes columns of A
according to entries of x
• The output space of f(x) = Ax is constrained to
be the span of the columns of A.
• Can’t output things you can’t construct out of
your columns
An Intuition
[Diagram: knobs x₁, x₂, x₃ feed a machine computing Ax, whose outputs are y₁, y₂, y₃]
y = Ax = [ | | | ; c₁ c₂ cₙ ; | | | ] [x₁; x₂; x₃]
x – knobs on machine (e.g., fuel, brakes)
y – state of the world (e.g., where you are)
A – machine (e.g., your car)
Linear Independence
Suppose the columns of a 3×3 matrix A are not linearly independent (c₁, αc₁, c₂ for instance):
y = Ax = [ | | | ; c₁ αc₁ c₂ ; | | | ] [x₁; x₂; x₃]
y = x₁c₁ + αx₂c₁ + x₃c₂
y = (x₁ + αx₂)c₁ + x₃c₂
Linear Independence Intuition
y = (x₁ + αx₂)c₁ + x₃c₂
Knobs of x are redundant. Even if y has 3 outputs, you can only control it in two directions.
[Same knobs-and-machine diagram as before]
Linear Independence
Recall: Ax = (x₁ + αx₂)c₁ + x₃c₂
• Not all y have a corresponding x s.t. y = Ax
• Or, given a vector y, there’s not a unique vector x s.t. y = Ax
• Can write y an infinite number of ways by adding β to x₁ and subtracting β/α from x₂:
y = A[x₁ + β; x₂ − β/α; x₃] = (x₁ + β + αx₂ − α·(β/α))c₁ + x₃c₂ = (x₁ + αx₂)c₁ + x₃c₂
Linear Independence
Recall: Ax = (x₁ + αx₂)c₁ + x₃c₂
• An infinite number of non-zero vectors x can map to a zero vector y
• Called the right null-space of A.
y = A[β; −β/α; 0] = (β − α·(β/α))c₁ + 0·c₂ = 0
• What else can we cancel out?
Rank
• Rank of an n×n matrix A – the number of linearly independent columns (or rows) of A / the dimension of the span of the columns
• Matrices with full rank (n x n, rank n) behave
nicely: can be inverted, span the full output
space, are one-to-one.
• Matrices with full rank are machines where
every knob is useful and every output state can
be made by the machine
Inverses
• Given y = Ax, y is a linear combination of the columns of A, weighted by the entries of x. If A is full-rank, we should be able to invert this mapping.
• Given some y (output) and A, what x (inputs) produced it?
• x = A⁻¹y
• Note: if you don’t need to compute it, never ever compute it. Solving for x is much faster and more stable than obtaining A⁻¹.
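A sketch of the advice in the last bullet, on a random full-rank system (values are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))    # full rank with overwhelming probability
y = rng.standard_normal(100)

x_solve = np.linalg.solve(A, y)        # preferred: solves Ax = y directly
x_inv = np.linalg.inv(A) @ y           # works, but slower and less stable

print(np.allclose(x_solve, x_inv))     # True here, but solve is the habit to build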
Symmetric Matrices
• Symmetric: Aᵀ = A, or Aᵢⱼ = Aⱼᵢ
• Have lots of special properties
[ a₁₁ a₁₂ a₁₃ ; a₂₁ a₂₂ a₂₃ ; a₃₁ a₃₂ a₃₃ ]  with aᵢⱼ = aⱼᵢ
Any matrix of the form A = XᵀX is symmetric. Quick check:
Aᵀ = (XᵀX)ᵀ
Aᵀ = Xᵀ(Xᵀ)ᵀ
Aᵀ = XᵀX
Special Matrices – Rotations
R = [ r₁₁ r₁₂ r₁₃ ; r₂₁ r₂₂ r₂₃ ; r₃₁ r₃₂ r₃₃ ]
• Rotation matrices rotate vectors and do not change vector L2 norms (‖Rx‖₂ = ‖x‖₂)
• Every row/column is unit norm
• Every row is linearly independent
• Transpose is inverse
• Determinant is 1 (otherwise it’s also a coordinate flip/reflection); eigenvalues have magnitude 1
Eigensystems
• An eigenvector v and eigenvalue λ of a matrix A satisfy Av = λv (Av is v scaled by λ)
• Vectors and values are always paired, and typically you assume ‖v‖₂ = 1
• The biggest eigenvalue of A gives bounds on how much Ax stretches a vector x
• Hints of what people really mean:
• “Largest eigenvector” = eigenvector with the largest eigenvalue
• “Spectral” just means there are eigenvectors involved
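A small numpy check of Av = λv on a made-up symmetric matrix (np.linalg.eigh is the routine for symmetric/Hermitian input):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])             # symmetric

vals, vecs = np.linalg.eigh(A)         # eigenvalues ascending; columns are eigenvectors
v = vecs[:, -1]                        # eigenvector paired with the largest eigenvalue
lam = vals[-1]

print(np.allclose(A @ v, lam * v))     # True: A v = lambda v
print(np.linalg.norm(v))               # 1.0: returned eigenvectors are unit norm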
Suppose I have points in a grid
Now I apply f(x) = Ax to these points
Pointy-end: Ax . Non-Pointy-End: x
Red box – unit square, Blue box – after f(x) = Ax.
What are the yellow lines and why?
A = [1.1 0; 0 1.1]
A = [0.8 0; 0 1.25]
Now I apply f(x) = Ax to these points
Pointy-end: Ax . Non-Pointy-End: x
Red box – unit square, Blue box – after f(x) = Ax.
What are the yellow lines and why?
A = [0.8 0; 0 1.25]
Red box – unit square, Blue box – after f(x) = Ax.
Can we draw any yellow lines?
A = [cos(t) −sin(t); sin(t) cos(t)]
Eigenvectors of Symmetric
Matrices
• Always n mutually orthogonal eigenvectors with n (not necessarily distinct) eigenvalues
• For symmetric A, the eigenvector with the largest eigenvalue maximizes xᵀAx / xᵀx (the eigenvector with the smallest eigenvalue minimizes it)
• So for unit vectors x (where xᵀx = 1), that eigenvector maximizes xᵀAx
• A surprisingly large number of optimization problems rely on (max/min)imizing this
The Singular Value Decomposition
Can always write an m×n matrix A as: A = UΣ…
U: Rotation – the eigenvectors of AAᵀ
Σ: Scale – a diagonal matrix of σ₁, σ₂, σ₃ (square roots of the eigenvalues of AᵀA), padded with zeros
The Singular Value Decomposition
Can always write an m×n matrix A as: A = UΣVᵀ
U: Rotation – the eigenvectors of AAᵀ
Σ: Scale – square roots of the eigenvalues of AᵀA
Vᵀ: Rotation – the eigenvectors of AᵀA
Singular Value Decomposition
• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank /
number of linearly independent vectors
• “Closest” matrix to A with a lower rank
A = U Σ Vᵀ,  Σ = diag(σ₁, σ₂, σ₃) (padded with zeros for a non-square A)
Singular Value Decomposition
• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank /
number of linearly independent vectors
• “Closest” matrix to A with a lower rank
Â = U diag(σ₁, σ₂, 0) Vᵀ: zeroing the smallest singular value gives the closest lower-rank (here rank-2) matrix
Singular Value Decomposition
• Every matrix is a rotation, scaling, and rotation
• Number of non-zero singular values = rank /
number of linearly independent vectors
• “Closest” matrix to A with a lower rank
• Secretly behind many of the things you do with matrices
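A sketch of the "closest lower-rank matrix" point: zero out the smallest singular values and rebuild (random matrix, arbitrary sizes):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.sum(s > 1e-10))                 # number of non-zero singular values = rank

k = 2                                    # keep the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.matrix_rank(A_k))        # 2: the best rank-2 approximation of A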
Solving Least-Squares
Start with two points (xᵢ, yᵢ): (x₁, y₁), (x₂, y₂)
[y₁; y₂] = [x₁ 1; x₂ 1][m; b],  i.e.  y = Av
[y₁; y₂] = [mx₁ + b; mx₂ + b]
We know how to solve this – invert A and find v (i.e., the (m, b) that fits the points)
Solving Least-Squares
Start with two points (xᵢ, yᵢ): (x₁, y₁), (x₂, y₂)
[y₁; y₂] = [x₁ 1; x₂ 1][m; b],  i.e.  y = Av
‖y − Av‖² = ‖[y₁; y₂] − [mx₁ + b; mx₂ + b]‖² = (y₁ − (mx₁ + b))² + (y₂ − (mx₂ + b))²
The sum of squared differences between the actual value of y and what the model says y should be.
Solving Least-Squares
Suppose there are n > 2 points
[y₁; ⋮; y_N] = [x₁ 1; ⋮ ⋮; x_N 1][m; b],  i.e.  y = Av
Compute ‖y − Av‖² again:
‖y − Av‖² = Σᵢ₌₁ⁿ (yᵢ − (mxᵢ + b))²
Solving Least-Squares
Given y, A, and v with y = Av overdetermined (A tall / more equations than unknowns),
we want to minimize ‖y − Av‖², or find:
arg minᵥ ‖y − Av‖²   (the value of v that makes the expression smallest)
Solution satisfies (AᵀA)v = Aᵀy, or v = (AᵀA)⁻¹Aᵀy
(Don’t actually compute the inverse!)
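A sketch of fitting (m, b) to noisy made-up points without forming an inverse; np.linalg.lstsq does the minimization:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)   # roughly y = 2x + 1

A = np.stack([x, np.ones_like(x)], axis=1)          # rows are [x_i, 1]
v, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
print(v)                                            # approximately [2.0, 1.0] = (m, b)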
When is Least-Squares Possible?
Given y, A, and v. Want y = Av
[Square A] y = Av: want n outputs, have n knobs to fiddle with; every knob is useful if A is full rank.
[Tall A] y = Av: rows (outputs) > columns (knobs). Thus you can’t get the precise output you want (not enough knobs), so settle for the “closest” knob setting.
When is Least-Squares Possible?
Given y, A, and v. Want y = Av
[Square A] y = Av: want n outputs, have n knobs to fiddle with; every knob is useful if A is full rank.
[Wide A] y = Av: columns (knobs) > rows (outputs). Thus, any output can be expressed in infinite ways.
Homogeneous Least-Squares
Given a set of unit vectors (aka directions) x₁, …, xₙ, I want the vector v that is as orthogonal to all the xᵢ as possible (for some definition of orthogonal)
Stack the xᵢᵀ into A and compute Av:
Av = [ − x₁ᵀ − ; ⋮ ; − xₙᵀ − ] v = [ x₁ᵀv ; ⋮ ; xₙᵀv ]   (each entry is 0 if v is orthogonal to that xᵢ)
Compute ‖Av‖² = Σᵢⁿ (xᵢᵀv)² : a sum of how orthogonal v is to each x
Homogeneous Least-Squares
• A lot of times, given a matrix A, we want to find the v that minimizes ‖Av‖²
• I.e., want arg minᵥ ‖Av‖²
• What’s a trivial solution?
• Set v = 0 → Av = 0
• Exclude this by forcing v to have unit norm
Homogeneous Least-Squares
Let’s look at ‖Av‖₂²:
‖Av‖₂² = (Av)ᵀ(Av)   (rewrite as a dot product)
‖Av‖₂² = vᵀAᵀAv = vᵀ(AᵀA)v   (distribute the transpose)
We want the vector minimizing this quadratic form
Where have we seen this?
Homogeneous Least-Squares
Ubiquitous tool in vision:
arg min over ‖v‖₂ = 1 of ‖Av‖²
(1) “Smallest”* eigenvector of AᵀA
(2) “Smallest” right singular vector of A
*Note: AᵀA is positive semi-definite so it has all non-negative eigenvalues
For min → max, switch smallest → largest
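A sketch of recipe (2): take the right singular vector with the smallest singular value (random data stands in for the stacked directions):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))          # rows are the stacked x_i^T

U, s, Vt = np.linalg.svd(A)
v = Vt[-1]                                # right singular vector w/ smallest sigma

print(np.linalg.norm(v))                  # 1.0 (unit norm constraint satisfied)
print(np.linalg.norm(A @ v))              # as small as any unit vector can make it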
Derivatives
Remember derivatives?
Derivative: rate at which a function f(x) changes
at a point as well as the direction that increases
the function
Given quadratic function f(x) = (x − 2)² + 5
f′(x) is a function too, aka df/dx
Given quadratic function f(x) = (x − 2)² + 5
What’s special about x = 2?
f is minimized at x = 2, and f′(x) = 0 at x = 2
a = minimum of f → f′(a) = 0
The reverse is not true
Rates of change
Suppose I want to
increase f(x) by
changing x:
Blue area: move left
Red area: move right
Derivative tells you
direction of ascent
and rate
f(x) = (x − 2)² + 5
What Calculus Should I Know
• Really need intuition
• Need chain rule
• Rest you should look up / use a computer
algebra system / use a cookbook
• Partial derivatives (and that’s it from
multivariable calculus)
Partial Derivatives
• Pretend other variables are constant, take a
derivative. That’s it.
• Make our function a function of two variables
f(x) = (x − 2)² + 5
∂/∂x f(x) = 2(x − 2) · 1 = 2(x − 2)
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
∂/∂x f₂(x, y) = 2(x − 2)   (pretend (y + 1)² is constant → its derivative is 0)
Zooming Out
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
Dark = f(x,y) low
Bright = f(x,y) high
Taking a slice of
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
Slice of y=0 is the
function from before:
Taking a slice of
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
∂f₂/∂x is the rate of change & direction in the x dimension
Zooming Out
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
∂f₂/∂y = 2(y + 1) and is the rate of change & direction in the y dimension
Zooming Out
f₂(x, y) = (x − 2)² + 5 + (y + 1)²
Gradient/Jacobian: making a vector of the partial derivatives, ∇f₂ = [∂f₂/∂x, ∂f₂/∂y], gives the rate and direction of change.
Arrows point OUT of the minimum / basin.
What Should I Know?
• Gradients are simply partial derivatives per-dimension: if x in f(x) has n dimensions, ∇f(x) has n dimensions
• Gradients point in the direction of ascent and tell the rate of ascent
• If a is a minimum of f → ∇f(a) = 0
• The reverse is not true, especially in high-dimensional spaces