Optimization Methods
in Engineering Design
Day-5
Course Materials
• Arora, Introduction to Optimum Design, 3e, Elsevier (https://www.researchgate.net/publication/273120102_Introduction_to_Optimum_design)
• Parkinson, Optimization Methods for Engineering Design, Brigham Young University (http://apmonitor.com/me575/index.php/Main/BookChapters)
• Iqbal, Fundamental Engineering Optimization Methods, BookBoon (https://bookboon.com/en/fundamental-engineering-optimization-methods-ebook)
Numerical Optimization
• Consider an unconstrained NLP problem: min𝒙 𝑓(𝒙)
• Use an iterative method to solve the problem: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒅𝑘, where 𝒅𝑘 is a search direction and 𝛼𝑘 is the step size, such that the function value decreases at each step, i.e., 𝑓(𝒙𝑘+1) < 𝑓(𝒙𝑘)
• We expect lim𝑘→∞ 𝒙𝑘 = 𝒙∗
• The general iterative method is a two-step process:
– Finding a suitable search direction 𝒅𝑘 along which the function value locally decreases and any constraints are obeyed.
– Performing a line search along 𝒅𝑘 to find 𝒙𝑘+1 such that 𝑓(𝒙𝑘+1) attains its minimum value.
The Iterative Method
• Iterative algorithm:
1. Initialize: choose 𝒙0
2. Check termination: 𝛻𝑓(𝒙𝑘) ≅ 𝟎
3. Find a suitable search direction 𝒅𝑘 that obeys the descent condition: 𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘 < 0
4. Search along 𝒅𝑘 to find where 𝑓(𝒙𝑘+1) attains its minimum value (the line search problem)
5. Return to step 2
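The five steps above can be sketched as a minimal, generic descent loop. The quadratic test function, the steepest-descent direction rule, and the simple step-halving line search below are illustrative choices, not part of the algorithm statement:

```python
import math

def descent(f, grad, x0, direction, tol=1e-6, max_iter=5000):
    """Generic iteration x_{k+1} = x_k + alpha_k * d_k (steps 1-5 above)."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:   # step 2: grad ~ 0
            break
        d = direction(x, g)                             # step 3: search direction
        alpha, fx = 1.0, f(x)                           # step 4: crude halving search
        while f([xi + alpha * di for xi, di in zip(x, d)]) >= fx and alpha > 1e-12:
            alpha *= 0.5
        x = [xi + alpha * di for xi, di in zip(x, d)]   # step 5: repeat
    return x

# Illustrative run: f(x) = 0.1*x1^2 + x2^2 with the steepest-descent rule
f = lambda x: 0.1 * x[0] ** 2 + x[1] ** 2
grad = lambda x: [0.2 * x[0], 2.0 * x[1]]
steepest = lambda x, g: [-gi for gi in g]   # satisfies the descent condition
xs = descent(f, grad, [5.0, 1.0], steepest)
```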
The Line Search Problem
• Assuming a suitable search direction 𝒅𝑘 has been determined, we seek a step length 𝛼𝑘 that minimizes 𝑓(𝒙𝑘+1).
• Assuming 𝒙𝑘 and 𝒅𝑘 are known, the projected function value along 𝒅𝑘 is expressed as: 𝑓(𝒙𝑘 + 𝛼𝑘𝒅𝑘) = 𝑓(𝒙𝑘 + 𝛼𝒅𝑘) = 𝑓(𝛼)
• The line search problem to choose 𝛼 to minimize 𝑓(𝒙𝑘+1) along 𝒅𝑘 is defined as: min𝛼 𝑓(𝛼) = 𝑓(𝒙𝑘 + 𝛼𝒅𝑘)
• Assuming that a solution exists, it is found by setting 𝑓′(𝛼) = 0.
Example: Quadratic Function
• Consider minimizing a quadratic function:
𝑓(𝒙) = ½𝒙𝑇𝑨𝒙 − 𝒃𝑇𝒙, 𝛻𝑓 = 𝑨𝒙 − 𝒃
• Given a descent direction 𝒅, the line search problem is defined as:
min𝛼 𝑓(𝛼) = ½(𝒙𝑘 + 𝛼𝒅)𝑇𝑨(𝒙𝑘 + 𝛼𝒅) − 𝒃𝑇(𝒙𝑘 + 𝛼𝒅)
• A solution is found by setting 𝑓′(𝛼) = 0, where
𝑓′(𝛼) = 𝒅𝑇𝑨(𝒙𝑘 + 𝛼𝒅) − 𝒅𝑇𝒃 = 0
𝛼 = −𝒅𝑇(𝑨𝒙𝑘 − 𝒃)/(𝒅𝑇𝑨𝒅) = −𝛻𝑓(𝒙𝑘)𝑇𝒅/(𝒅𝑇𝑨𝒅)
• Finally, 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝒅.
Computer Methods for Line Search Problem
• Interval reduction methods
– Golden search
– Fibonacci search
• Approximate search methods
– Armijo's rule
– Quadratic curve fitting
Interval Reduction Methods
• The interval reduction methods find the minimum of a unimodal
function in two steps:
– Bracketing the minimum to an interval
– Reducing the interval to desired accuracy
• The bracketing step aims to find a three-point pattern, such that for 𝑥1, 𝑥2, 𝑥3: 𝑓(𝑥1) ≥ 𝑓(𝑥2) < 𝑓(𝑥3).
Fibonacci’s Method
• The Fibonacci method uses Fibonacci numbers to achieve maximum interval reduction in a given number of steps.
• The Fibonacci number sequence is generated as: 𝐹0 = 𝐹1 = 1, 𝐹𝑖 = 𝐹𝑖−1 + 𝐹𝑖−2, 𝑖 ≥ 2.
• The properties of Fibonacci numbers include:
– They achieve the golden ratio 𝜏 = lim𝑛→∞ 𝐹𝑛−1/𝐹𝑛 = (√5 − 1)/2 ≅ 0.618034
– The number of interval reductions 𝑛 required to achieve a desired accuracy 𝜀 (where 1/𝐹𝑛 < 𝜀) is specified in advance.
– For given 𝐼1 and 𝑛, 𝐼2 = (𝐹𝑛−1/𝐹𝑛)𝐼1, 𝐼3 = 𝐼1 − 𝐼2, 𝐼4 = 𝐼2 − 𝐼3, etc.
The Golden Section Method
• The golden section method uses the golden ratio: 𝜏 = 0.618034.
• The golden section algorithm is given as:
1. Initialize: specify 𝑥1, 𝑥4 (𝐼1 = 𝑥4 − 𝑥1), 𝜀, 𝑛: 𝜏^𝑛 < 𝜀/𝐼1
2. Compute 𝑥2 = 𝜏𝑥1 + (1 − 𝜏)𝑥4; evaluate 𝑓2
3. For 𝑖 = 1, …, 𝑛 − 1: compute 𝑥3 = (1 − 𝜏)𝑥1 + 𝜏𝑥4 and evaluate 𝑓3; if 𝑓2 < 𝑓3, set 𝑥4 ← 𝑥1, 𝑥1 ← 𝑥3; else set 𝑥1 ← 𝑥2, 𝑥2 ← 𝑥3, 𝑓2 ← 𝑓3
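The golden section algorithm above can be sketched in code as follows; the test function and the number of reductions are illustrative choices. Note how the interval-reversal bookkeeping (𝑥4 ← 𝑥1, 𝑥1 ← 𝑥3) lets the surviving interior point keep playing the role of 𝑥2:

```python
import math

def golden_section(f, x1, x4, n):
    """Golden-section search on [x1, x4] with n interval reductions,
    using the interval-reversal bookkeeping of the algorithm above."""
    tau = (math.sqrt(5) - 1) / 2          # 0.618034
    x2 = tau * x1 + (1 - tau) * x4
    f2 = f(x2)
    for _ in range(n - 1):
        x3 = (1 - tau) * x1 + tau * x4
        f3 = f(x3)
        if f2 < f3:
            x4, x1 = x1, x3               # reverse the interval; x2, f2 survive
        else:
            x1, x2, f2 = x2, x3, f3
    return x2

# Illustrative: the line-search function used later in these slides;
# its exact minimizer is alpha ~ 0.3517
alpha = golden_section(lambda a: math.exp(-a) + a * a, 0.0, 1.0, 40)
```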
Approximate Search Methods
• Consider the line search problem: min𝛼 𝑓(𝛼) = 𝑓(𝒙𝑘 + 𝛼𝒅𝑘)
• Sufficient Descent Condition. The sufficient descent condition guards against 𝒅𝑘 becoming too close to orthogonal to 𝛻𝑓(𝒙𝑘). The condition is stated as:
𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘 < −𝑐‖𝛻𝑓(𝒙𝑘)‖², 𝑐 > 0
• Sufficient Decrease Condition. The sufficient decrease condition ensures a nontrivial reduction in the function value. The condition is stated as:
𝑓(𝒙𝑘 + 𝛼𝒅𝑘) − 𝑓(𝒙𝑘) ≤ 𝜇𝛼𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘, 0 < 𝜇 < 1
• Curvature Condition. The curvature condition guards against 𝛼 becoming too small. The condition is stated as:
𝛻𝑓(𝒙𝑘 + 𝛼𝒅𝑘)𝑇𝒅𝑘 ≥ 𝜂𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘, 0 < 𝜇 < 𝜂 < 1
Approximate Line Search
• Strong Wolfe Conditions. The strong Wolfe conditions commonly used in line search algorithms include:
1. The sufficient decrease condition (Armijo's rule): 𝑓(𝛼) ≤ 𝑓(0) + 𝜇𝛼𝑓′(0), 0 < 𝜇 < 1
2. The strong curvature condition: |𝑓′(𝛼)| ≤ 𝜂|𝑓′(0)|, 0 < 𝜇 ≤ 𝜂 < 1
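A minimal sketch of a backtracking search that enforces only the sufficient decrease (Armijo) part of the conditions above; the test function and the parameter choices are illustrative:

```python
import math

def backtrack(f, fprime, mu=0.2, alpha0=1.0, shrink=0.5):
    """Backtracking that enforces Armijo's sufficient-decrease rule:
    f(alpha) <= f(0) + mu * alpha * f'(0).  The curvature test is omitted
    in this sketch; mu, alpha0, shrink are illustrative choices."""
    f0, g0 = f(0.0), fprime(0.0)
    alpha = alpha0
    while f(alpha) > f0 + mu * alpha * g0:
        alpha *= shrink
    return alpha

# Function from the example a few slides ahead: f(alpha) = exp(-alpha) + alpha^2
f = lambda a: math.exp(-a) + a * a
fp = lambda a: 2 * a - math.exp(-a)
alpha = backtrack(f, fp)   # accepts alpha = 0.5, matching the [0, 0.5] bracket
```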
Approximate Line Search
• The approximate line search includes two steps:
– Bracketing the minimum
– Estimating the minimum
• Bracketing the Minimum. In the bracketing step we seek an interval [𝛼𝑙, 𝛼𝑢] such that 𝑓′(𝛼𝑙) < 0 and 𝑓′(𝛼𝑢) > 0.
– Since for any descent direction 𝑓′(0) < 0, 𝛼 = 0 serves as a lower bound on 𝛼. To find an upper bound, gradually increase 𝛼, e.g., 𝛼 = 1, 2, …
– Assume that for some 𝛼𝑖 > 0, we get 𝑓′(𝛼𝑖) < 0 and 𝑓′(𝛼𝑖+1) > 0; then 𝛼𝑖+1 serves as an upper bound.
Approximate Line Search
• Estimating the Minimum. Once the minimum has been bracketed in a small interval, a quadratic or cubic polynomial approximation is used to find the minimizer.
• If the polynomial minimizer 𝛼̂ satisfies the strong Wolfe conditions for the desired 𝜇 and 𝜂 values (say 𝜇 = 0.2, 𝜂 = 0.5), it is taken as the function minimizer.
• Otherwise, 𝛼̂ is used to replace 𝛼𝑙 or 𝛼𝑢, and the polynomial approximation step is repeated.
Quadratic Curve Fitting
• Assuming that the interval [𝛼𝑙, 𝛼𝑢] contains the minimum of a unimodal function 𝑓(𝛼), its quadratic approximation, given as 𝑞(𝛼) = 𝑎0 + 𝑎1𝛼 + 𝑎2𝛼², is obtained using three points {𝛼𝑙, 𝛼𝑚, 𝛼𝑢}, where the mid-point may be used for 𝛼𝑚.
The quadratic coefficients {𝑎0, 𝑎1, 𝑎2} are solved as:
𝑎2 = [(𝑓(𝛼𝑢) − 𝑓(𝛼𝑙))/(𝛼𝑢 − 𝛼𝑙) − (𝑓(𝛼𝑚) − 𝑓(𝛼𝑙))/(𝛼𝑚 − 𝛼𝑙)]/(𝛼𝑢 − 𝛼𝑚)
𝑎1 = (𝑓(𝛼𝑚) − 𝑓(𝛼𝑙))/(𝛼𝑚 − 𝛼𝑙) − 𝑎2(𝛼𝑙 + 𝛼𝑚)
𝑎0 = 𝑓(𝛼𝑙) − 𝑎1𝛼𝑙 − 𝑎2𝛼𝑙²
Then, the minimum is given as: 𝛼𝑚𝑖𝑛 = −𝑎1/(2𝑎2)
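The coefficient formulas above can be sketched directly in code; the test function and the three points are taken from the example on the next slide:

```python
import math

def quad_fit_min(f, al, am, au):
    """Minimizer of the quadratic through (al, am, au), per the
    coefficient formulas above."""
    a2 = ((f(au) - f(al)) / (au - al) - (f(am) - f(al)) / (am - al)) / (au - am)
    a1 = (f(am) - f(al)) / (am - al) - a2 * (al + am)
    return -a1 / (2 * a2)

# Example on the next slide: f(alpha) = exp(-alpha) + alpha^2 on [0, 0.5]
amin = quad_fit_min(lambda a: math.exp(-a) + a * a, 0.0, 0.25, 0.5)  # ~0.3531
```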
Example: Approximate Search
• Let 𝑓(𝛼) = 𝑒^(−𝛼) + 𝛼², 𝑓′(𝛼) = 2𝛼 − 𝑒^(−𝛼), 𝑓(0) = 1, 𝑓′(0) = −1. Let 𝜇 = 0.2, and try 𝛼 = 0.1, 0.2, … to bracket the minimum.
• From the sufficient decrease condition, the minimum is bracketed in the interval [0, 0.5].
• Using the quadratic approximation, the minimum is found as 𝛼 = 0.3531; the exact solution is 𝛼𝑚𝑖𝑛 = 0.3517.
• The MATLAB commands are:
Define the function:
f=@(x) x.*x+exp(-x);
mu=0.2; al=0:.1:1;
Example: Approximate Search
• Bracketing the minimum:
f1=feval(f,al)
  1.0000 0.9148 0.8587 0.8308 0.8303 0.8565 0.9088 0.9866 1.0893 1.2166 1.3679
>> f2=f(0)-mu*al
  1.0000 0.9800 0.9600 0.9400 0.9200 0.9000 0.8800 0.8600 0.8400 0.8200 0.8000
>> idx=find(f1<=f2)
• Quadratic approximation to find the minimum:
al=0; am=0.25; au=0.5;
a2 = ((f(au)-f(al))/(au-al)-(f(am)-f(al))/(am-al))/(au-am);
a1 = (f(am)-f(al))/(am-al)-a2*(al+am);
xmin = -a1/a2/2 % 0.3531
Computer Methods for Finding the Search Direction
• Gradient-based methods
– Steepest descent method
– Conjugate gradient method
– Quasi-Newton methods
• Hessian-based methods
– Newton's method
– Trust-region methods
Steepest Descent Method
• The steepest descent method determines the search direction as: 𝒅𝑘 = −𝛻𝑓(𝒙𝑘)
• The update rule is given as: 𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝑘𝛻𝑓(𝒙𝑘), where 𝛼𝑘 is determined by minimizing 𝑓(𝒙𝑘+1) along 𝒅𝑘
• Example: quadratic function
𝑓(𝒙) = ½𝒙𝑇𝑨𝒙 − 𝒃𝑇𝒙, 𝛻𝑓 = 𝑨𝒙 − 𝒃
Then, 𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝛻𝑓(𝒙𝑘); 𝛼 = 𝛻𝑓(𝒙𝑘)𝑇𝛻𝑓(𝒙𝑘)/(𝛻𝑓(𝒙𝑘)𝑇𝑨𝛻𝑓(𝒙𝑘))
Define 𝒓𝑘 = 𝒃 − 𝑨𝒙𝑘; then, 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒓𝑘; 𝛼𝑘 = 𝒓𝑘𝑇𝒓𝑘/(𝒓𝑘𝑇𝑨𝒓𝑘)
Steepest Descent Algorithm
• Initialize: choose 𝒙0
• For 𝑘 = 0, 1, 2, …
– Compute 𝛻𝑓(𝒙𝑘)
– Check convergence: if ‖𝛻𝑓(𝒙𝑘)‖ < 𝜖, stop.
– Set 𝒅𝑘 = −𝛻𝑓(𝒙𝑘)
– Line search problem: find min𝛼≥0 𝑓(𝒙𝑘 + 𝛼𝒅𝑘)
– Set 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝒅𝑘.
Example: Steepest Descent
• Consider min𝒙 𝑓(𝒙) = 0.1𝑥1² + 𝑥2²,
𝛻𝑓(𝒙) = [0.2𝑥1, 2𝑥2]𝑇, 𝛻²𝑓(𝒙) = diag(0.2, 2); let 𝒙0 = [5, 1]𝑇; then, 𝑓(𝒙0) = 3.5,
𝒅0 = −𝛻𝑓(𝒙0) = [−1, −2]𝑇, 𝛼 = 0.61
𝒙1 = [4.39, −0.22]𝑇, 𝑓(𝒙1) = 1.98
Continuing…
Example: Steepest Descent
• MATLAB code:
H=[.2 0;0 2];
f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H;
x=[5;1];
xall=x';
for i=1:10
d=-df(x);
a=d'*d/(d'*H*d);
x=x+a*d;
xall=[xall;x'];
end
plot(xall(:,1),xall(:,2)), grid
axis([-1 5 -1 5]), axis equal
Steepest Descent Method
• The steepest descent method becomes slow close to the optimum
• The method progresses in a zigzag fashion, since
(𝑑/𝑑𝛼)𝑓(𝒙𝑘 + 𝛼𝒅𝑘) = 𝛻𝑓(𝒙𝑘+1)𝑇𝒅𝑘 = −𝛻𝑓(𝒙𝑘+1)𝑇𝛻𝑓(𝒙𝑘) = 0
• The method has linear convergence with rate constant
𝐶 = (𝑓(𝒙𝑘+1) − 𝑓(𝒙∗))/(𝑓(𝒙𝑘) − 𝑓(𝒙∗)) ≤ [(cond(𝑨) − 1)/(cond(𝑨) + 1)]²
Preconditioning
• Preconditioning (scaling) can be used to reduce the condition number of the Hessian matrix and hence aid convergence
• Consider 𝑓(𝒙) = 0.1𝑥1² + 𝑥2² = 𝒙𝑇𝑨𝒙, where 𝑨 = diag(0.1, 1)
• Define a linear transformation 𝒙 = 𝑷𝒚, where 𝑷 = diag(√10, 1); then, 𝑓(𝒙) = 𝒚𝑇𝑷𝑇𝑨𝑷𝒚 = 𝒚𝑇𝒚
• Since cond(𝑰) = 1, the steepest descent method in the case of a quadratic function converges in a single iteration
Conjugate Gradient Method
• For any square matrix 𝑨, the set of 𝑨-conjugate vectors is defined by: 𝒅𝑖𝑇𝑨𝒅𝑗 = 0, 𝑖 ≠ 𝑗
• Let 𝒈𝑘 = 𝛻𝑓(𝒙𝑘) denote the gradient; then, starting from 𝒅0 = −𝒈0, a set of 𝑨-conjugate directions is generated as:
𝒅0 = −𝒈0; 𝒅𝑘+1 = −𝒈𝑘+1 + 𝛽𝑘𝒅𝑘, 𝑘 ≥ 0, where 𝛽𝑘 = 𝒈𝑘+1𝑇𝑨𝒅𝑘/(𝒅𝑘𝑇𝑨𝒅𝑘)
There are multiple ways to generate conjugate directions
• Using {𝒅0, 𝒅1, …, 𝒅𝑛−1} as search directions, a quadratic function is minimized in 𝑛 steps.
Conjugate Directions Method
• The parameter 𝛽𝑘 can be computed in different ways:
– By substituting 𝑨𝒅𝑘 = (1/𝛼𝑘)(𝒈𝑘+1 − 𝒈𝑘), we obtain:
𝛽𝑘 = 𝒈𝑘+1𝑇(𝒈𝑘+1 − 𝒈𝑘)/(𝒅𝑘𝑇(𝒈𝑘+1 − 𝒈𝑘)) (the Hestenes–Stiefel formula)
– In the case of exact line search, 𝒈𝑘+1𝑇𝒅𝑘 = 0; then
𝛽𝑘 = 𝒈𝑘+1𝑇(𝒈𝑘+1 − 𝒈𝑘)/(𝒈𝑘𝑇𝒈𝑘) (the Polak–Ribière formula)
– Also, for exact line search, 𝒈𝑘+1𝑇𝒈𝑘 = 𝛽𝑘−1(𝒈𝑘 + 𝛼𝑘𝑨𝒅𝑘)𝑇𝒅𝑘−1 = 0, resulting in
𝛽𝑘 = 𝒈𝑘+1𝑇𝒈𝑘+1/(𝒈𝑘𝑇𝒈𝑘) (the Fletcher–Reeves formula)
Other versions of 𝛽𝑘 have also been proposed.
Example: Conjugate Gradient Method
• Consider min𝒙 𝑓(𝒙) = 0.1𝑥1² + 𝑥2²,
𝛻𝑓(𝒙) = [0.2𝑥1, 2𝑥2]𝑇, 𝛻²𝑓(𝒙) = diag(0.2, 2); let 𝒙0 = [5, 1]𝑇; then 𝑓(𝒙0) = 3.5,
𝒅0 = −𝛻𝑓(𝒙0) = [−1, −2]𝑇, 𝛼 = 0.61
𝒙1 = [4.39, −0.22]𝑇, 𝑓(𝒙1) = 1.98
𝛽0 = 0.19
𝒅1 = [−0.535, 0.027]𝑇, 𝛼 = 8.2
𝒙2 = [0, 0]𝑇
Example: Conjugate Gradient Method
• MATLAB code
H=[.2 0;0 2];
f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H;
x=[5;1]; n=2;
xall=zeros(n+1,n); xall(1,:)=x';
d=-df(x); a=d'*d/(d'*H*d);
x=x+a*d; xall(2,:)=x';
for i=1:size(x,1)-1
b=df(x)'*H*d/(d'*H*d);
d=-df(x)+b*d;
r=-df(x);
a=r'*r/(d'*H*d);
x=x+a*d;
xall(i+2,:)=x';
end
plot(xall(:,1),xall(:,2)), grid
axis([-1 5 -1 5]), axis equal
Conjugate Gradient Algorithm
• Conjugate-Gradient Algorithm (Griva, Nash & Sofer, p454):
• Initialize: Choose 𝒙0 = 𝟎, 𝒓0 = 𝒃, 𝒅−1 = 𝟎, 𝛽0 = 0.
• For 𝑖 = 0, 1, …
– Check convergence: if ‖𝒓𝑖‖ < 𝜖, stop.
– If 𝑖 > 0, set 𝛽𝑖 = 𝒓𝑖𝑇𝒓𝑖/(𝒓𝑖−1𝑇𝒓𝑖−1)
– Set 𝒅𝑖 = 𝒓𝑖 + 𝛽𝑖𝒅𝑖−1; 𝛼𝑖 = 𝒓𝑖𝑇𝒓𝑖/(𝒅𝑖𝑇𝑨𝒅𝑖); 𝒙𝑖+1 = 𝒙𝑖 + 𝛼𝑖𝒅𝑖; 𝒓𝑖+1 = 𝒓𝑖 − 𝛼𝑖𝑨𝒅𝑖.
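The algorithm above can be sketched in pure Python for a small symmetric positive definite system; the 2×2 example matrix and right-hand side are illustrative choices:

```python
def conjugate_gradient(A, b, tol=1e-12):
    """Linear CG for A x = b (A symmetric positive definite), following
    the algorithm statement above; pure-Python lists for brevity."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    x = [0.0] * len(b)              # x0 = 0, hence r0 = b
    r = list(b)
    d = [0.0] * len(b)
    rr = dot(r, r)
    rr_prev = rr
    for i in range(len(b)):
        if rr < tol:                # convergence check on ||r||^2
            break
        beta = 0.0 if i == 0 else rr / rr_prev
        d = [ri + beta * di for ri, di in zip(r, d)]
        Ad = [dot(row, d) for row in A]
        alpha = rr / dot(d, Ad)     # alpha_i = r_i'r_i / d_i'A d_i
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_prev, rr = rr, dot(r, r)
    return x

# Example system: [[4,1],[1,3]] x = [1,2]; exact solution (1/11, 7/11)
x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

As the next slides note, for an n-dimensional quadratic the loop terminates in at most n passes.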
Conjugate Gradient Method
• Assume that an update that includes steps 𝛼𝑖 along 𝑛 conjugate vectors 𝒅𝑖 is assembled as: 𝒚 = Σ_{𝑖=1}^{𝑛} 𝛼𝑖𝒅𝑖.
• Then, for a quadratic function, the minimization problem is decomposed into a set of one-dimensional problems, i.e.,
min𝒚 𝑓(𝒚) ≡ Σ_{𝑖=1}^{𝑛} min𝛼𝑖 (½𝛼𝑖²𝒅𝑖𝑇𝑨𝒅𝑖 − 𝛼𝑖𝒃𝑇𝒅𝑖)
• By setting the derivative with respect to 𝛼𝑖 equal to zero, i.e., 𝛼𝑖𝒅𝑖𝑇𝑨𝒅𝑖 − 𝒃𝑇𝒅𝑖 = 0, we obtain: 𝛼𝑖 = 𝒃𝑇𝒅𝑖/(𝒅𝑖𝑇𝑨𝒅𝑖).
• This shows that the CG algorithm iteratively determines the conjugate directions 𝒅𝑖 and their coefficients 𝛼𝑖.
CG Rate of Convergence
• Conjugate gradient methods achieve superlinear convergence:
– In the case of quadratic functions, the minimum is reached exactly in 𝑛 iterations.
– For general nonlinear functions, convergence in 2𝑛 iterations is to be expected.
• Nonlinear CG methods typically have the lowest per-iteration computational cost of all gradient methods.
Newton’s Method
• Consider minimizing the second-order approximation of 𝑓(𝒙):
min𝒅 𝑓(𝒙𝑘 + 𝒅) = 𝑓(𝒙𝑘) + 𝛻𝑓(𝒙𝑘)𝑇𝒅 + ½𝒅𝑇𝑯𝑘𝒅
• Apply the FONC: 𝑯𝑘𝒅 + 𝒈𝑘 = 𝟎, where 𝒈𝑘 = 𝛻𝑓(𝒙𝑘)
Then, assuming that 𝑯𝑘 = 𝛻²𝑓(𝒙𝑘) stays positive definite, the Newton update rule is derived as: 𝒙𝑘+1 = 𝒙𝑘 − 𝑯𝑘⁻¹𝒈𝑘
• Note:
– The convergence of Newton's method depends on 𝑯𝑘 staying positive definite.
– A step size may be included in Newton's method, i.e., 𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝑘𝑯𝑘⁻¹𝒈𝑘
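A minimal sketch of one Newton step, using a closed-form 2×2 inverse; since the quadratic from the earlier examples has a constant Hessian, a single full step reaches the minimizer:

```python
def newton_step(H, g):
    """One Newton update d = -H^{-1} g for a 2x2 Hessian (closed-form
    inverse).  Sketch only; assumes H is positive definite."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [-( H[1][1] * g[0] - H[0][1] * g[1]) / det,
            -(-H[1][0] * g[0] + H[0][0] * g[1]) / det]

# For f(x) = 0.1 x1^2 + x2^2, H = diag(0.2, 2) is constant, so a single
# full Newton step from any point reaches the minimum x* = (0, 0).
x = [5.0, 1.0]
g = [0.2 * x[0], 2.0 * x[1]]
d = newton_step([[0.2, 0.0], [0.0, 2.0]], g)
x1 = [x[0] + d[0], x[1] + d[1]]
```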
Marquardt Modification to Newton’s Method
• To ensure the positive definite condition on 𝑯𝑘, Marquardt proposed the following modification to Newton's method:
(𝑯𝑘 + 𝜆𝑰)𝒅 = −𝒈𝑘
where 𝜆 is selected to ensure that the modified Hessian is positive definite.
• Since 𝑯𝑘 + 𝜆𝑰 is also symmetric, the resulting system of linear equations can be solved for 𝒅 as: 𝑳𝑫𝑳𝑇𝒅 = −𝛻𝑓(𝒙𝑘)
Newton’s Algorithm
Newton's Method (Griva, Nash & Sofer, p. 373):
1. Initialize: Choose 𝒙0, specify 𝜖
2. For 𝑘 = 0, 1, …
3. Check convergence: If ‖𝛻𝑓(𝒙𝑘)‖ < 𝜖, stop
4. Factorize the modified Hessian as 𝛻²𝑓(𝒙𝑘) + 𝑬 = 𝑳𝑫𝑳𝑇 and solve 𝑳𝑫𝑳𝑇𝒅 = −𝛻𝑓(𝒙𝑘) for 𝒅
5. Perform line search to determine 𝛼𝑘 and update the solution estimate as 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒅𝑘
Rate of Convergence
• Newton's method achieves a quadratic rate of convergence in the close neighborhood of the optimal point, and a superlinear rate otherwise.
• The main drawback of Newton's method is its computational cost: the Hessian matrix needs to be computed at every step, and a linear system of equations needs to be solved to obtain the update.
• Due to the high computational and storage costs, the classic Newton's method is rarely used in practice.
Quasi-Newton Methods
• The quasi-Newton methods derive from a generalization of the secant method, which approximates the second derivative as:
𝑓″(𝑥𝑘) ≅ (𝑓′(𝑥𝑘) − 𝑓′(𝑥𝑘−1))/(𝑥𝑘 − 𝑥𝑘−1)
• In the multi-dimensional case, the secant condition is generalized as: 𝑯𝑘(𝒙𝑘 − 𝒙𝑘−1) = 𝛻𝑓(𝒙𝑘) − 𝛻𝑓(𝒙𝑘−1)
• Define 𝑭𝑘 = 𝑯𝑘⁻¹; then, 𝒙𝑘 − 𝒙𝑘−1 = 𝑭𝑘(𝛻𝑓(𝒙𝑘) − 𝛻𝑓(𝒙𝑘−1))
• The quasi-Newton methods iteratively update 𝑯𝑘 or 𝑭𝑘 as:
– Direct update: 𝑯𝑘+1 = 𝑯𝑘 + ∆𝑯𝑘, 𝑯0 = 𝑰
– Inverse update: 𝑭𝑘+1 = 𝑭𝑘 + ∆𝑭𝑘, 𝑭 = 𝑯⁻¹, 𝑭0 = 𝑰
Quasi-Newton Methods
• Quasi-Newton update:
Let 𝒔𝑘 = 𝒙𝑘+1 − 𝒙𝑘, 𝒚𝑘 = 𝛻𝑓(𝒙𝑘+1) − 𝛻𝑓(𝒙𝑘); then,
– The DFP (Davidon–Fletcher–Powell) formula for the inverse Hessian update is given as:
𝑭𝑘+1 = 𝑭𝑘 − 𝑭𝑘𝒚𝑘(𝑭𝑘𝒚𝑘)𝑇/(𝒚𝑘𝑇𝑭𝑘𝒚𝑘) + 𝒔𝑘𝒔𝑘𝑇/(𝒚𝑘𝑇𝒔𝑘)
– The BFGS (Broyden–Fletcher–Goldfarb–Shanno) formula for the direct Hessian update is given as:
𝑯𝑘+1 = 𝑯𝑘 − 𝑯𝑘𝒔𝑘(𝑯𝑘𝒔𝑘)𝑇/(𝒔𝑘𝑇𝑯𝑘𝒔𝑘) + 𝒚𝑘𝒚𝑘𝑇/(𝒚𝑘𝑇𝒔𝑘)
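The BFGS direct update above can be sketched as follows; the quadratic test problem and the steepest-descent first step are illustrative choices. A property worth checking numerically is the secant condition 𝑯𝑘+1𝒔𝑘 = 𝒚𝑘, which the update satisfies by construction:

```python
def bfgs_update(H, s, y):
    """BFGS direct Hessian update (the formula above):
    H+ = H - (H s)(H s)'/(s'H s) + y y'/(y's)."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    Hs = [dot(row, s) for row in H]
    sHs, ys = dot(s, Hs), dot(y, s)
    n = len(s)
    return [[H[i][j] - Hs[i] * Hs[j] / sHs + y[i] * y[j] / ys
             for j in range(n)] for i in range(n)]

# One steepest-descent step on f = 2 x1^2 - x1 x2 + x2^2 (grad below),
# with the exact line-search step length for this quadratic
grad = lambda x: [4 * x[0] - x[1], -x[0] + 2 * x[1]]
x0 = [1.0, 1.0]
d = [-g for g in grad(x0)]                    # (-3, -1)
alpha = 5.0 / 16.0
x1 = [x0[i] + alpha * d[i] for i in range(2)]
s = [x1[i] - x0[i] for i in range(2)]
y = [grad(x1)[i] - grad(x0)[i] for i in range(2)]
H1 = bfgs_update([[1.0, 0.0], [0.0, 1.0]], s, y)   # satisfies H1 s = y
```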
Quasi-Newton Algorithm
The Quasi-Newton Algorithm (Griva, Nash & Sofer, p.415):
• Initialize: Choose 𝒙0, 𝑯0 (e.g., 𝑯0 = 𝑰); specify 𝜀
• For 𝑘 = 0, 1, …
– Check convergence: If ‖𝛻𝑓(𝒙𝑘)‖ < 𝜀, stop
– Solve 𝑯𝑘𝒅 = −𝛻𝑓(𝒙𝑘) for 𝒅𝑘 (alternatively, 𝒅𝑘 = −𝑭𝑘𝛻𝑓(𝒙𝑘))
– Solve min𝛼 𝑓(𝒙𝑘 + 𝛼𝒅𝑘) for 𝛼𝑘, and update the current estimate: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒅𝑘
– Compute 𝒔𝑘, 𝒚𝑘, and update 𝑯𝑘 (or 𝑭𝑘 as applicable)
Example: Quasi-Newton Method
• Consider the problem: min 𝑓(𝑥1, 𝑥2) = 2𝑥1² − 𝑥1𝑥2 + 𝑥2², where
𝑯 = [4 −1; −1 2], 𝛻𝑓 = 𝑯[𝑥1, 𝑥2]𝑇. Let 𝒙0 = [1, 1]𝑇, 𝑓0 = 2, 𝑯0 = 𝑰, 𝑭0 = 𝑰;
Choose 𝒅0 = −𝛻𝑓(𝒙0) = [−3, −1]𝑇;
then 𝑓(𝛼) = 2(1 − 3𝛼)² + (1 − 𝛼)² − (1 − 3𝛼)(1 − 𝛼),
Using 𝑓′(𝛼) = 0 → 𝛼 = 5/16 → 𝒙1 = [0.0625, 0.6875]𝑇, 𝑓1 = 0.4375;
then 𝒚0 = [−3.44, 0.313]𝑇, 𝑭1 = [1.193 0.065; 0.065 1.022], 𝑯1 = [0.381 −0.206; −0.206 0.9313],
and using either update formula, 𝒅1 = [0.4375, −1.313]𝑇; for the next step,
𝑓(𝛼) = 2.68𝛼² − 1.91𝛼 + 0.4375 → 𝛼 = 0.3572, 𝒙2 = [0.2188, 0.2188]𝑇.
Example: Quasi-Newton Method
• For a quadratic function, convergence is achieved in two iterations.
Trust-Region Methods
• The trust-region methods locally employ a quadratic approximation 𝑞𝑘(𝒙) to the nonlinear objective function.
• The approximation is valid in the neighborhood of 𝒙𝑘 defined by Ω𝑘 = {𝒙: ‖𝚪(𝒙 − 𝒙𝑘)‖ ≤ ∆𝑘}, where 𝚪 is a scaling parameter.
• The method aims to find an 𝒙𝑘+1 ∈ Ω𝑘 that satisfies the sufficient decrease condition in 𝑓(𝒙).
• The quality of the quadratic approximation is estimated by the reliability index: 𝛾𝑘 = (𝑓(𝒙𝑘) − 𝑓(𝒙𝑘+1))/(𝑞𝑘(𝒙𝑘) − 𝑞𝑘(𝒙𝑘+1)). If this ratio is close to unity, the trust region may be expanded in the next iteration.
Trust-Region Methods
• At each iteration 𝑘, the trust-region algorithm solves a constrained optimization sub-problem involving the quadratic approximation:
min𝒅 𝑞𝑘(𝒅) = 𝑓(𝒙𝑘) + 𝛻𝑓(𝒙𝑘)𝑇𝒅 + ½𝒅𝑇𝛻²𝑓(𝒙𝑘)𝒅
Subject to: ‖𝒅‖ ≤ ∆𝑘
Lagrangian function: ℒ(𝒅, 𝜆) = 𝑞𝑘(𝒅) + 𝜆(‖𝒅‖ − ∆𝑘)
FONC: (𝛻²𝑓(𝒙𝑘) + 𝜆𝑰)𝒅𝑘 = −𝛻𝑓(𝒙𝑘), 𝜆(‖𝒅𝑘‖ − ∆𝑘) = 0
• The resulting search direction is given as 𝒅𝑘 = 𝒅𝑘(𝜆).
– For large ∆𝑘 and a positive-definite 𝛻²𝑓(𝒙𝑘), the Lagrange multiplier 𝜆 → 0, and 𝒅𝑘(𝜆) reduces to the Newton direction.
– For ∆𝑘 → 0, 𝜆 → ∞, and 𝒅𝑘(𝜆) aligns with the steepest-descent direction.
Trust-Region Algorithm
• Trust-Region Algorithm (Griva, Nash & Sofer, p.392):
• Initialize: choose 𝒙0, ∆0; specify 𝜀, 0 < 𝜇 < 𝜂 < 1 (e.g., 𝜇 = ¼, 𝜂 = ¾)
• For 𝑘 = 0, 1, …
– Check convergence: If ‖𝛻𝑓(𝒙𝑘)‖ < 𝜀, stop
– Solve the subproblem: min𝒅 𝑞𝑘(𝒅) subject to ‖𝒅‖ ≤ ∆𝑘
– Compute 𝛾𝑘:
• if 𝛾𝑘 < 𝜇, set 𝒙𝑘+1 = 𝒙𝑘, ∆𝑘+1 = ½∆𝑘
• else if 𝛾𝑘 < 𝜂, set 𝒙𝑘+1 = 𝒙𝑘 + 𝒅𝑘, ∆𝑘+1 = ∆𝑘
• else set 𝒙𝑘+1 = 𝒙𝑘 + 𝒅𝑘, ∆𝑘+1 = 2∆𝑘
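A crude sketch of the trust-region loop for a quadratic objective. A clipped Newton step stands in for an exact subproblem solve (an assumption of this sketch, not the method the references prescribe); for a quadratic, the predicted and actual reductions coincide, so every step is accepted and the radius grows:

```python
def trust_region_quad(H, x0, delta0=0.1, eps=1e-8, max_iter=100):
    """Trust-region sketch for f(x) = 1/2 x'H x (so grad = H x), using a
    clipped Newton step and the mu = 1/4, eta = 3/4 radius rules above."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    mv = lambda M, v: [dot(row, v) for row in M]
    f = lambda x: 0.5 * dot(x, mv(H, x))
    norm = lambda v: dot(v, v) ** 0.5
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]       # 2x2 closed-form solve
    x, delta = list(x0), delta0
    for _ in range(max_iter):
        g = mv(H, x)
        if norm(g) < eps:
            break
        d = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,  # Newton direction
             -(-H[1][0] * g[0] + H[0][0] * g[1]) / det]
        if norm(d) > delta:
            d = [di * delta / norm(d) for di in d]      # clip to the region
        xt = [xi + di for xi, di in zip(x, d)]
        pred = -(dot(g, d) + 0.5 * dot(d, mv(H, d)))    # q(0) - q(d)
        gamma = (f(x) - f(xt)) / pred                   # reliability index
        if gamma < 0.25:
            delta *= 0.5                                # reject step, shrink
        elif gamma < 0.75:
            x = xt                                      # accept, keep radius
        else:
            x, delta = xt, 2 * delta                    # accept, expand
    return x

x = trust_region_quad([[0.2, 0.0], [0.0, 2.0]], [5.0, 1.0])
```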
Computer Methods for Constrained Problems
• Penalty and Barrier methods
• Augmented Lagrangian method (AL)
• Sequential linear programming (SLP)
• Sequential quadratic programming (SQP)
Penalty and Barrier Methods
• Consider the general optimization problem: min𝒙 𝑓(𝒙)
Subject to
ℎ𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑝;
𝑔𝑗(𝒙) ≤ 0, 𝑗 = 1, …, 𝑚;
𝑥𝑖𝐿 ≤ 𝑥𝑖 ≤ 𝑥𝑖𝑈, 𝑖 = 1, …, 𝑛.
• Define a composite function to be used for constraint compliance:
Φ(𝒙, 𝒓) = 𝑓(𝒙) + 𝑃(𝑔(𝒙), ℎ(𝒙), 𝒓)
where 𝑃 defines a loss function, and 𝒓 is a vector of weights (penalty parameters)
Penalty and Barrier Methods
• Penalty Function Method. A penalty function method employs a quadratic loss function and iterates through the infeasible region:
𝑃(𝑔(𝒙), ℎ(𝒙), 𝑟) = 𝑟[Σ𝑖 (𝑔𝑖⁺(𝒙))² + Σ𝑖 (ℎ𝑖(𝒙))²], 𝑔𝑖⁺(𝒙) = max(0, 𝑔𝑖(𝒙)), 𝑟 > 0
• Barrier Function Method. A barrier method employs a log barrier function and iterates through the feasible region:
𝑃(𝑔(𝒙), ℎ(𝒙), 𝑟) = −(1/𝑟) Σ𝑖 log(−𝑔𝑖(𝒙))
• For both penalty and barrier methods, as 𝑟 → ∞, 𝒙(𝑟) → 𝒙∗
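The penalty approach can be sketched on a one-variable toy problem (min 𝑥² subject to 𝑥 − 1 = 0, an illustrative choice); the inner minimization here is a simple golden-section search, also an illustrative stand-in:

```python
import math

def penalty_method(f, h, r0=1.0, growth=10.0, outer=8):
    """Quadratic-penalty sketch for one equality constraint h(x) = 0:
    minimize Phi(x, r) = f(x) + r*h(x)^2 for increasing r."""
    tau = (math.sqrt(5) - 1) / 2
    def argmin(phi, a=-10.0, b=10.0, iters=200):
        # simple golden-section inner minimizer (illustrative)
        for _ in range(iters):
            c, d = b - tau * (b - a), a + tau * (b - a)
            if phi(c) < phi(d):
                b = d
            else:
                a = c
        return (a + b) / 2
    r, x = r0, 0.0
    for _ in range(outer):
        x = argmin(lambda t: f(t) + r * h(t) ** 2)
        r *= growth
    return x

# Toy problem: min x^2 subject to x - 1 = 0; x(r) = r/(1+r) -> 1 as r grows
x = penalty_method(lambda t: t * t, lambda t: t - 1.0)
```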
The Augmented Lagrangian Method
• Consider an equality-constrained problem: min𝒙 𝑓(𝒙)
Subject to: ℎ𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑙
• Define the augmented Lagrangian (AL) as:
𝒫(𝒙, 𝒗, 𝑟) = 𝑓(𝒙) + Σ𝑗 [𝑣𝑗ℎ𝑗(𝒙) + ½𝑟ℎ𝑗²(𝒙)]
where the additional term defines an exterior penalty function with 𝑟 as the penalty parameter.
• For inequality-constrained problems, the AL may be defined as:
𝒫(𝒙, 𝒖, 𝑟) = 𝑓(𝒙) + Σ𝑖 { 𝑢𝑖𝑔𝑖(𝒙) + ½𝑟𝑔𝑖²(𝒙), if 𝑔𝑖 + 𝑢𝑖/𝑟 ≥ 0; −𝑢𝑖²/(2𝑟), if 𝑔𝑖 + 𝑢𝑖/𝑟 < 0 }
where a large 𝑟 makes the Hessian of the AL positive definite at 𝒙.
The Augmented Lagrangian Method
• The dual function for the AL is defined as:
𝜓(𝒗) = min𝒙 𝒫(𝒙, 𝒗, 𝑟) = 𝑓(𝒙) + Σ𝑗 [𝑣𝑗ℎ𝑗(𝒙) + ½𝑟(ℎ𝑗(𝒙))²]
• The resulting dual optimization problem is: max𝒗 𝜓(𝒗)
• The dual problem may be solved via Newton's method as:
𝒗𝑘+1 = 𝒗𝑘 − [𝑑²𝜓/𝑑𝑣𝑖𝑑𝑣𝑗]⁻¹𝒉, where 𝑑²𝜓/𝑑𝑣𝑖𝑑𝑣𝑗 = −𝛻ℎ𝑖𝑇(𝛻²𝒫)⁻¹𝛻ℎ𝑗
• For large 𝑟, the Newton update may be approximated as:
𝑣𝑗^(𝑘+1) = 𝑣𝑗^(𝑘) + 𝑟ℎ𝑗, 𝑗 = 1, …, 𝑙
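The multiplier update above can be sketched on a one-variable toy problem (min 𝑥² subject to 𝑥 − 1 = 0, an illustrative choice, with known solution 𝑥∗ = 1, 𝑣∗ = −2); the golden-section inner solve and all parameter values are illustrative:

```python
import math

def augmented_lagrangian(f, h, r=10.0, outer=20):
    """AL sketch for one equality constraint: minimize
    P(x, v, r) = f(x) + v h(x) + (r/2) h(x)^2 over x, then apply the
    multiplier update v <- v + r h(x) derived above."""
    tau = (math.sqrt(5) - 1) / 2
    def argmin(phi, a=-10.0, b=10.0, iters=200):
        # simple golden-section inner minimizer (illustrative)
        for _ in range(iters):
            c, d = b - tau * (b - a), a + tau * (b - a)
            if phi(c) < phi(d):
                b = d
            else:
                a = c
        return (a + b) / 2
    v, x = 0.0, 0.0
    for _ in range(outer):
        x = argmin(lambda t: f(t) + v * h(t) + 0.5 * r * h(t) ** 2)
        v += r * h(x)            # multiplier update v <- v + r h(x)
    return x, v

# Toy problem: min x^2 s.t. x - 1 = 0; solution x* = 1 with v* = -2
x, v = augmented_lagrangian(lambda t: t * t, lambda t: t - 1.0)
```

Unlike the plain penalty method, the multiplier update converges without driving 𝑟 to infinity.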
Example: Augmented Lagrangian
• Maximize the volume of a cylindrical tank subject to a surface area constraint:
max 𝑓(𝑑, 𝑙) = 𝜋𝑑²𝑙/4, subject to ℎ: 𝜋𝑑²/4 + 𝜋𝑑𝑙 − 𝐴0 = 0
• We can normalize the problem as:
min 𝑓(𝑑, 𝑙) = −𝑑²𝑙, subject to ℎ: 𝑑² + 4𝑑𝑙 − 1 = 0
• The solution to the primal problem is obtained as:
Lagrangian function: ℒ(𝑑, 𝑙, 𝜆) = −𝑑²𝑙 + 𝜆(𝑑² + 4𝑑𝑙 − 1)
FONC: 𝜆(𝑑 + 2𝑙) − 𝑑𝑙 = 0, 4𝜆𝑑 − 𝑑² = 0, 𝑑² + 4𝑑𝑙 − 1 = 0
Optimal solution: 𝑑∗ = 2𝑙∗ = 4𝜆∗ = 1/√3.
Example: Augmented Lagrangian
• Alternatively, define the augmented Lagrangian function as:
𝒫(𝑑, 𝑙, 𝜆, 𝑟) = −𝑑²𝑙 + 𝜆(𝑑² + 4𝑑𝑙 − 1) + ½𝑟(𝑑² + 4𝑑𝑙 − 1)²
• Define the dual function: 𝜓(𝜆) = min over (𝑑, 𝑙) of 𝒫(𝑑, 𝑙, 𝜆, 𝑟)
• Define the dual optimization problem: max𝜆 𝜓(𝜆)
• Solution to the dual problem: 𝜆∗ = 𝜆𝑚𝑎𝑥 = 0.144
• Solution for the design variables: 𝑑∗ = 2𝑙∗ = 0.577
Sequential Linear Programming
• Consider the general optimization problem: min𝒙 𝑓(𝒙)
Subject to
ℎ𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑝;
𝑔𝑗(𝒙) ≤ 0, 𝑗 = 1, …, 𝑚;
𝑥𝑖𝐿 ≤ 𝑥𝑖 ≤ 𝑥𝑖𝑈, 𝑖 = 1, …, 𝑛.
• Let 𝒙𝑘 denote the current estimate of the design variables, and let 𝒅 denote the change in variables; define the first-order expansion of the objective and constraint functions in the neighborhood of 𝒙𝑘:
𝑓(𝒙𝑘 + 𝒅) ≅ 𝑓(𝒙𝑘) + 𝛻𝑓(𝒙𝑘)𝑇𝒅
𝑔𝑖(𝒙𝑘 + 𝒅) ≅ 𝑔𝑖(𝒙𝑘) + 𝛻𝑔𝑖(𝒙𝑘)𝑇𝒅, 𝑖 = 1, …, 𝑚
ℎ𝑗(𝒙𝑘 + 𝒅) ≅ ℎ𝑗(𝒙𝑘) + 𝛻ℎ𝑗(𝒙𝑘)𝑇𝒅, 𝑗 = 1, …, 𝑙
Sequential Linear Programming
• Let 𝑓𝑘 = 𝑓(𝒙𝑘), 𝑔𝑖𝑘 = 𝑔𝑖(𝒙𝑘), ℎ𝑗𝑘 = ℎ𝑗(𝒙𝑘); 𝑏𝑖 = −𝑔𝑖𝑘, 𝑒𝑗 = −ℎ𝑗𝑘, 𝒄 = 𝛻𝑓(𝒙𝑘), 𝒂𝑖 = 𝛻𝑔𝑖(𝒙𝑘), 𝒏𝑗 = 𝛻ℎ𝑗(𝒙𝑘), 𝑨 = [𝒂1, 𝒂2, …, 𝒂𝑚], 𝑵 = [𝒏1, 𝒏2, …, 𝒏𝑙].
• Using the first-order expansion, define an LP subprogram for the current iteration of the NLP problem:
min𝒅 𝑓̄ = 𝒄𝑇𝒅
Subject to: 𝑨𝑇𝒅 ≤ 𝒃, 𝑵𝑇𝒅 = 𝒆
where 𝑓̄ represents the first-order change in the cost function, and the columns of the 𝑨 and 𝑵 matrices represent, respectively, the gradients of the inequality and equality constraints.
• The resulting LP problem can be solved via the Simplex method.
Sequential Linear Programming
• We may note that:
– Since both positive and negative changes to the design variables 𝒙𝑘 are allowed, the variables 𝑑𝑖 are unrestricted in sign
– The SLP method requires additional constraints of the form −∆𝑖𝑙𝑘 ≤ 𝑑𝑖𝑘 ≤ ∆𝑖𝑢𝑘 (termed move limits) to bind the LP solution. These limits represent the maximum allowable change in 𝑑𝑖 in the current iteration and are selected as a percentage of the current value.
– Move limits serve the dual purpose of binding the solution and obviating the need for line search.
– Overly restrictive move limits tend to make the SLP problem infeasible.
SLP Example
• Consider the convex NLP problem:
min 𝑓(𝑥1, 𝑥2) = 𝑥1² − 𝑥1𝑥2 + 𝑥2²
Subject to: 1 − 𝑥1² − 𝑥2² ≤ 0; −𝑥1 ≤ 0; −𝑥2 ≤ 0
The problem has a single minimum at 𝒙∗ = [1/√2, 1/√2]𝑇
• The objective and constraint gradients are:
𝛻𝑓𝑇 = [2𝑥1 − 𝑥2, 2𝑥2 − 𝑥1], 𝛻𝑔1𝑇 = [−2𝑥1, −2𝑥2], 𝛻𝑔2𝑇 = [−1, 0], 𝛻𝑔3𝑇 = [0, −1].
• Let 𝒙0 = [1, 1]𝑇; then 𝑓0 = 1, 𝒄𝑇 = [1, 1], 𝑏1 = 𝑏2 = 𝑏3 = 1;
𝒂1𝑇 = [−2, −2], 𝒂2𝑇 = [−1, 0], 𝒂3𝑇 = [0, −1]
SLP Example
• Define the LP subproblem at the current step as:
min 𝑓̄ = 𝑑1 + 𝑑2
Subject to:
[−2 −2; −1 0; 0 −1][𝑑1, 𝑑2]𝑇 ≤ [1, 1, 1]𝑇
• In the absence of move limits, the LP problem is unbounded; using 50% move limits, the SLP update is given as: 𝒅∗ = [−½, −½]𝑇, 𝒙1 = [½, ½]𝑇, with resulting constraint violation 𝑔𝑖 = (½, 0, 0); smaller move limits may be used to reduce the constraint violation.
Sequential Linear Programming
SLP Algorithm (Arora, p. 508):
• Initialize: choose 𝒙0, 𝜀1 > 0, 𝜀2 > 0.
• For 𝑘 = 0, 1, 2, …
– Choose move limits ∆𝑖𝑙𝑘, ∆𝑖𝑢𝑘 as some fraction of the current design 𝒙𝑘
– Compute 𝑓𝑘, 𝒄, 𝑔𝑖𝑘, ℎ𝑗𝑘, 𝑏𝑖, 𝑒𝑗
– Formulate and solve the LP subproblem for 𝒅𝑘
– If 𝑔𝑖 ≤ 𝜀1, 𝑖 = 1, …, 𝑚; |ℎ𝑗| ≤ 𝜀1, 𝑗 = 1, …, 𝑝; and ‖𝒅𝑘‖ ≤ 𝜀2, stop
– Substitute 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘, 𝑘 ← 𝑘 + 1.
Sequential Quadratic Programming
• Sequential quadratic programming (SQP) uses a quadratic approximation to the objective function at every step of the iteration.
• The SQP problem is defined as:
min𝒅 𝑓̄ = 𝒄𝑇𝒅 + ½𝒅𝑇𝒅
Subject to: 𝑨𝑇𝒅 ≤ 𝒃, 𝑵𝑇𝒅 = 𝒆
• SQP does not require move limits, alleviating the shortcomings of the SLP method.
• The SQP problem is convex; hence, it has a single global minimum.
• SQP can be solved via a Simplex-based linear complementarity problem (LCP) framework.
Sequential Quadratic Programming
• The Lagrangian function for the SQP problem is defined as:
ℒ(𝒅, 𝒖, 𝒗) = 𝒄𝑇𝒅 + ½𝒅𝑇𝒅 + 𝒖𝑇(𝑨𝑇𝒅 − 𝒃 + 𝒔) + 𝒗𝑇(𝑵𝑇𝒅 − 𝒆)
• Then the KKT conditions are:
Optimality: 𝛻ℒ = 𝒄 + 𝒅 + 𝑨𝒖 + 𝑵𝒗 = 𝟎,
Feasibility: 𝑨𝑇𝒅 + 𝒔 = 𝒃, 𝑵𝑇𝒅 = 𝒆,
Complementarity: 𝒖𝑇𝒔 = 0,
Non-negativity: 𝒖 ≥ 𝟎, 𝒔 ≥ 𝟎
Sequential Quadratic Programming
• Since 𝒗 is unrestricted in sign, let 𝒗 = 𝒚 − 𝒛, 𝒚 ≥ 𝟎, 𝒛 ≥ 𝟎; then the KKT conditions are compactly written as:
[𝑰 𝑨 𝟎 𝑵 −𝑵; 𝑨𝑇 𝟎 𝑰 𝟎 𝟎; 𝑵𝑇 𝟎 𝟎 𝟎 𝟎][𝒅; 𝒖; 𝒔; 𝒚; 𝒛] = [−𝒄; 𝒃; 𝒆], or 𝑷𝑿 = 𝑸
• The complementary slackness conditions, 𝒖𝑇𝒔 = 0, translate as: 𝑿𝑖𝑿𝑖+𝑚 = 0, 𝑖 = 𝑛 + 1, ⋯, 𝑛 + 𝑚.
• The resulting problem can be solved via the Simplex method using the LCP framework.
Descent Function Approach
• In SQP methods, the line search step is based on minimization of a descent function that penalizes constraint violations, i.e.,
Φ(𝒙) = 𝑓(𝒙) + 𝑅𝑉(𝒙)
where 𝑓(𝒙) is the cost function, 𝑉(𝒙) represents the current maximum constraint violation, and 𝑅 > 0 is a penalty parameter.
• The descent function value at the current iteration is computed as:
Φ𝑘 = 𝑓𝑘 + 𝑅𝑉𝑘, 𝑅 = max(𝑅𝑘, 𝑟𝑘), where 𝑟𝑘 = Σ_{𝑖=1}^{𝑚} 𝑢𝑖𝑘 + Σ_{𝑗=1}^{𝑝} |𝑣𝑗𝑘|
𝑉𝑘 = max{0; 𝑔𝑖, 𝑖 = 1, …, 𝑚; |ℎ𝑗|, 𝑗 = 1, …, 𝑝}
• The line search subproblem is defined as:
min𝛼 Φ(𝛼) = Φ(𝒙𝑘 + 𝛼𝒅𝑘)
SQP Algorithm
SQP Algorithm (Arora, p. 526):
• Initialize: choose 𝒙0, 𝑅0 = 1, 𝜀1 > 0, 𝜀2 > 0.
• For 𝑘 = 0, 1, 2, …
– Compute 𝑓𝑘, 𝑔𝑖𝑘, ℎ𝑗𝑘, 𝒄, 𝑏𝑖, 𝑒𝑗; compute 𝑉𝑘.
– Formulate and solve the QP subproblem to obtain 𝒅𝑘 and the Lagrange multipliers 𝒖𝑘 and 𝒗𝑘.
– If 𝑉𝑘 ≤ 𝜀1 and ‖𝒅𝑘‖ ≤ 𝜀2, stop.
– Compute 𝑅; formulate and solve the line search subproblem for 𝛼
– Set 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘, 𝑅𝑘+1 ← 𝑅, 𝑘 ← 𝑘 + 1
• The above algorithm is convergent, i.e., Φ(𝒙𝑘) ≤ Φ(𝒙0); 𝒙𝑘 converges to the KKT point 𝒙∗
SQP with Approximate Line Search
• The SQP algorithm can be used with an approximate line search as follows:
Let 𝑡𝑗, 𝑗 = 0, 1, … denote a trial step size,
𝒙𝑘+1,𝑗 denote the trial design point,
𝑓𝑘+1,𝑗 = 𝑓(𝒙𝑘+1,𝑗) denote the function value at the trial solution, and
Φ𝑘+1,𝑗 = 𝑓𝑘+1,𝑗 + 𝑅𝑉𝑘+1,𝑗 the penalty function value at the trial solution.
• The trial solution is required to satisfy the descent condition:
Φ𝑘+1,𝑗 + 𝑡𝑗𝛾‖𝒅𝑘‖² ≤ Φ𝑘, 0 < 𝛾 < 1
where a common choice is: 𝛾 = ½, 𝜇 = ½, 𝑡𝑗 = 𝜇^𝑗, 𝑗 = 0, 1, 2, ….
• The above descent condition ensures that the constraint violation decreases at each step of the method.
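The descent condition above can be sketched as a step-size selection routine; the problem data below match the worked example on the following slides, and the parameter values are the common choices just stated:

```python
def approx_line_search(f, V, x, d, Phi0, R=10.0, gamma=0.5, mu=0.5, jmax=20):
    """Trial steps t_j = mu^j, accepted when the descent condition
    Phi(x + t d) + t*gamma*||d||^2 <= Phi0 holds (sketch of the rule above)."""
    dd = sum(di * di for di in d)
    t = 1.0
    for _ in range(jmax):
        xt = [xi + t * di for xi, di in zip(x, d)]
        if f(xt) + R * V(xt) + t * gamma * dd <= Phi0:
            return t
        t *= mu
    return t

# Data matching the worked example on the next slides
f = lambda x: x[0] ** 2 - x[0] * x[1] + x[1] ** 2
V = lambda x: max(0.0, 1 - x[0] ** 2 - x[1] ** 2, -x[0], -x[1])
t = approx_line_search(f, V, [1.0, 1.0], [-1.0, -1.0], Phi0=1.0)  # -> 0.25
```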
SQP Example
• Consider the NLP problem: min 𝑓(𝑥1, 𝑥2) = 𝑥1² − 𝑥1𝑥2 + 𝑥2²
subject to 𝑔1: 1 − 𝑥1² − 𝑥2² ≤ 0, 𝑔2: −𝑥1 ≤ 0, 𝑔3: −𝑥2 ≤ 0
Then 𝛻𝑓𝑇 = [2𝑥1 − 𝑥2, 2𝑥2 − 𝑥1], 𝛻𝑔1𝑇 = [−2𝑥1, −2𝑥2], 𝛻𝑔2𝑇 = [−1, 0], 𝛻𝑔3𝑇 = [0, −1]. Let 𝒙0 = [1, 1]𝑇; then, 𝑓0 = 1, 𝒄 = [1, 1]𝑇, 𝑔1(1,1) = 𝑔2(1,1) = 𝑔3(1,1) = −1.
• Since all constraints are initially inactive, 𝑉0 = 0, and 𝒅 = −𝒄 = [−1, −1]𝑇; the line search problem is: min𝛼 Φ(𝛼) = (1 − 𝛼)²
• By setting Φ′(𝛼) = 0, we get the analytical solution 𝛼 = 1; thus 𝒙1 = [0, 0]𝑇, which results in a large constraint violation
SQP Example
• Alternatively, we may use the approximate line search as follows:
– Let 𝑅0 = 10, 𝛾 = 𝜇 = ½; let 𝑡0 = 1; then 𝒙1,0 = [0, 0]𝑇, 𝑓1,0 = 0, 𝑉1,0 = 1, Φ1,0 = 10; ‖𝒅0‖² = 2, and the descent condition Φ1,0 + ½‖𝒅0‖² ≤ Φ0 = 1 is not met at the trial point.
– Next, for 𝑡1 = ½, we get 𝒙1,1 = [½, ½]𝑇, 𝑓1,1 = ¼, 𝑉1,1 = ½, Φ1,1 = 5.25, and the descent condition fails again;
– Next, for 𝑡2 = ¼, we get 𝒙1,2 = [¾, ¾]𝑇, 𝑉1,2 = 0, 𝑓1,2 = Φ1,2 = 9/16, and the descent condition checks as: Φ1,2 + ⅛‖𝒅0‖² ≤ Φ0 = 1.
– Therefore, we set 𝛼 = 𝑡2 = ¼, 𝒙1 = 𝒙1,2 = [¾, ¾]𝑇 with no constraint violation.
The Active Set Strategy
• To reduce the computational cost of solving the QP subproblem, we may include only the active constraints in the problem.
• For 𝒙𝑘 ∈ Ω, the set of potentially active constraints is defined as:
ℐ𝑘 = {𝑖: 𝑔𝑖𝑘 > −𝜀, 𝑖 = 1, …, 𝑚} ∪ {𝑗: 𝑗 = 1, …, 𝑝} for some 𝜀.
• For 𝒙𝑘 ∉ Ω, let 𝑉𝑘 = max{0; 𝑔𝑖𝑘, 𝑖 = 1, …, 𝑚; |ℎ𝑗𝑘|, 𝑗 = 1, …, 𝑝}; then, the active constraint set is defined as:
ℐ𝑘 = {𝑖: 𝑔𝑖𝑘 > 𝑉𝑘 − 𝜀, 𝑖 = 1, …, 𝑚} ∪ {𝑗: |ℎ𝑗𝑘| > 𝑉𝑘 − 𝜀, 𝑗 = 1, …, 𝑝}
• The gradients of inactive constraints, i.e., those not in ℐ𝑘, do not need to be computed
SQP via Newton’s Method
• Consider the following equality-constrained problem:
min𝒙 𝑓(𝒙), subject to ℎ𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑙
• The Lagrangian function is given as: ℒ(𝒙, 𝒗) = 𝑓(𝒙) + 𝒗𝑇𝒉(𝒙)
• The KKT conditions are: 𝛻ℒ(𝒙, 𝒗) = 𝛻𝑓(𝒙) + 𝑵𝒗 = 𝟎, 𝒉(𝒙) = 𝟎, where 𝑵 = 𝛻𝒉(𝒙) is a Jacobian matrix whose 𝑖th column is 𝛻ℎ𝑖(𝒙)
• Using first-order Taylor series expansion (with shorthand notation):
𝛻ℒ𝑘+1 ≅ 𝛻ℒ𝑘 + 𝛻²ℒ𝑘Δ𝒙 + 𝑵Δ𝒗
𝒉𝑘+1 ≅ 𝒉𝑘 + 𝑵𝑇Δ𝒙
• By expanding Δ𝒗 = 𝒗𝑘+1 − 𝒗𝑘, 𝛻ℒ𝑘 = 𝛻𝑓𝑘 + 𝑵𝒗𝑘, and assuming 𝒗𝑘 ≅ 𝒗𝑘+1, we obtain:
[𝛻²ℒ𝑘 𝑵; 𝑵𝑇 𝟎][Δ𝒙𝑘; 𝒗𝑘+1] = −[𝛻𝑓𝑘; 𝒉𝑘]
which is similar to the N-R update, but uses the Hessian of the Lagrangian
SQP via Newton’s Method
• Alternately, we consider minimizing the quadratic approximation:
minΔ𝒙 ½Δ𝒙𝑇𝛻²ℒΔ𝒙 + 𝛻𝑓𝑇Δ𝒙
Subject to: ℎ𝑖(𝒙) + 𝒏𝑖𝑇Δ𝒙 = 0, 𝑖 = 1, …, 𝑙
• The KKT conditions are: 𝛻𝑓 + 𝛻²ℒΔ𝒙 + 𝑵𝒗 = 𝟎, 𝒉 + 𝑵𝑇Δ𝒙 = 𝟎
• Thus the QP subproblem can be solved via Newton's method!
[𝛻²ℒ𝑘 𝑵; 𝑵𝑇 𝟎][Δ𝒙𝑘; 𝒗𝑘+1] = −[𝛻𝑓𝑘; 𝒉𝑘]
• The Hessian of the Lagrangian can be updated via the BFGS method as:
𝑯𝑘+1 = 𝑯𝑘 + 𝑫𝑘 − 𝑬𝑘
where 𝑫𝑘 = 𝒚𝑘𝒚𝑘𝑇/(𝒚𝑘𝑇Δ𝒙𝑘), 𝑬𝑘 = 𝒄𝑘𝒄𝑘𝑇/(𝒄𝑘𝑇Δ𝒙𝑘), 𝒄𝑘 = 𝑯𝑘Δ𝒙𝑘, 𝒚𝑘 = 𝛻ℒ𝑘+1 − 𝛻ℒ𝑘
Example: SQP with Hessian Update
• Consider the NLP problem: min 𝑓(𝑥1, 𝑥2) = 𝑥1² − 𝑥1𝑥2 + 𝑥2²
subject to 𝑔1: 1 − 𝑥1² − 𝑥2² ≤ 0, 𝑔2: −𝑥1 ≤ 0, 𝑔3: −𝑥2 ≤ 0
Let 𝒙0 = [1, 1]𝑇; then, 𝑓0 = 1, 𝒄 = [1, 1]𝑇, 𝑔1(1,1) = 𝑔2(1,1) = 𝑔3(1,1) = −1; 𝛻𝑔1𝑇 = [−2, −2], 𝛻𝑔2𝑇 = [−1, 0], 𝛻𝑔3𝑇 = [0, −1].
• Using the approximate line search, 𝛼 = ¼, 𝒙1 = [¾, ¾]𝑇.
• For the Hessian update, we have:
𝑓1 = 0.5625, 𝑔1 = −0.125, 𝑔2 = 𝑔3 = −0.75; 𝒄1 = [0.75, 0.75]𝑇;
𝛻𝑔1𝑇 = [−3/2, −3/2], 𝛻𝑔2𝑇 = [−1, 0], 𝛻𝑔3𝑇 = [0, −1]; Δ𝒙0 = [−0.25, −0.25]𝑇;
then, 𝑫0 = 𝑬0 = ½[1 1; 1 1], so that 𝑯1 = 𝑯0
SQP with Hessian Update
• For the next step, the QP problem is defined as:
min 𝑓̄ = ¾(𝑑1 + 𝑑2) + ½(𝑑1² + 𝑑2²)
Subject to: −(3/2)(𝑑1 + 𝑑2) ≤ 0, −𝑑1 ≤ 0, −𝑑2 ≤ 0
• The application of the KKT conditions results in a linear system of equations, which is solved to obtain:
𝒙𝑇 = [𝑑1, 𝑑2, 𝑢1, 𝑢2, 𝑢3, 𝑠1, 𝑠2, 𝑠3] = [0.188, 0.188, 0, 0, 0, 0.125, 0.75, 0.75]
Modified SQP Algorithm
Modified SQP Algorithm (Arora, p. 558):
• Initialize: choose 𝒙0, 𝑅0 = 1, 𝑯0 = 𝑰; 𝜀1, 𝜀2 > 0.
• For 𝑘 = 0, 1, 2, …
– Compute 𝑓𝑘, 𝑔𝑖𝑘, ℎ𝑗𝑘, 𝒄, 𝑏𝑖, 𝑒𝑗, and 𝑉𝑘. If 𝑘 > 0, compute 𝑯𝑘
– Formulate and solve the modified QP subproblem for the search direction 𝒅𝑘 and the Lagrange multipliers 𝒖𝑘 and 𝒗𝑘.
– If 𝑉𝑘 ≤ 𝜀1 and ‖𝒅𝑘‖ ≤ 𝜀2, stop.
– Compute 𝑅; formulate and solve the line search subproblem for 𝛼
– Set 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘, 𝑅𝑘+1 ← 𝑅, 𝑘 ← 𝑘 + 1.
SQP Algorithm
%SQP subproblem via Hessian update
% input: xk (current design); Lk (Hessian of Lagrangian
estimate)
%initialize
n=size(xk,1);
if ~exist('Lk','var'), Lk=diag(xk+(~xk)); end
tol=1e-7;
%function and constraint values
fk=f(xk);
dfk=df(xk);
gk=g(xk);
dgk=dg(xk);
%N-R update
A=[Lk dgk; dgk' 0*dgk'*dgk];
b=[-dfk;-gk];
dx=A\b;
dxk=dx(1:n);
lam=dx(n+1:end);
SQP Algorithm
%inactive constraints
idx1=find(lam<0);
if ~isempty(idx1)
[dxk,lam]=inactive(lam,A,b,n);
end
%check termination
if abs(dxk)<tol, return, end
%adjust increment for constraint compliance
P=@(xk) f(xk)+lam'*abs(g(xk));
while P(xk+dxk)>P(xk),
dxk=dxk/2;
if abs(dxk)<tol, break, end
end
%Hessian update
dL=@(x) df(x)+dg(x)*lam;
Lk=update(Lk, xk, dxk, dL);
xk=xk+dxk;
disp([xk' f(xk) P(xk)])
SQP Algorithm
%function definitions
function [dxk,lam]=inactive(lam,A,b,n)
idx1=find(lam<0);
lam(idx1)=0;
idx2=find(lam);
v=[1:n,n+idx2];
A=A(v,v); b=b(v);
dx=A\b;
dxk=dx(1:n);
lam(idx2)=dx(n+1:end);
end
function Lk=update(Lk, xk, dxk, dL)
ga=dL(xk+dxk)-dL(xk);
Hx=Lk*dxk;
Dk=ga*ga'/(ga'*dxk);
Ek=Hx*Hx'/(Hx'*dxk);
Lk=Lk+Dk-Ek;
end
Generalized Reduced Gradient
• The GRG method finds the search direction by projecting the objective function gradient onto the constraint hyperplane.
• The GRG direction lies tangent to the constraint hyperplane, so that the iterative steps try to conform to the constraints.
• The constraints are effectively used to implicitly eliminate variables and reduce the problem dimensions.
Implicit Elimination
• Consider an equality-constrained problem in two variables:
Objective: min 𝑓(𝒙), 𝒙𝑇 = [𝑥1, 𝑥2]
Subject to: 𝑔(𝒙) = 0
• The variations in the objective and constraint functions are:
𝑑𝑓 = 𝛻𝑓𝑇𝑑𝒙 = (𝜕𝑓/𝜕𝑥1)𝑑𝑥1 + (𝜕𝑓/𝜕𝑥2)𝑑𝑥2
𝑑𝑔 = 𝛻𝑔𝑇𝑑𝒙 = (𝜕𝑔/𝜕𝑥1)𝑑𝑥1 + (𝜕𝑔/𝜕𝑥2)𝑑𝑥2 = 0
• Solve for 𝑑𝑥2 = −[(𝜕𝑔/𝜕𝑥1)/(𝜕𝑔/𝜕𝑥2)]𝑑𝑥1 and substitute in the objective function:
𝑑𝑓 = [𝜕𝑓/𝜕𝑥1 − (𝜕𝑓/𝜕𝑥2)(𝜕𝑔/𝜕𝑥1)/(𝜕𝑔/𝜕𝑥2)]𝑑𝑥1
• Then the reduced gradient of 𝑓 along 𝑥1 is given as:
𝛻𝑓𝑅 = 𝜕𝑓/𝜕𝑥1 − (𝜕𝑓/𝜕𝑥2)(𝜕𝑔/𝜕𝑥1)/(𝜕𝑔/𝜕𝑥2)
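The two-variable reduced gradient formula can be sketched directly; the numerical values below are illustrative (they match the equality-constrained example worked later in these notes, where grad f = (−1, 3) and grad g = (−2, −1) at x = (−1, 0)):

```python
def reduced_gradient_2var(dfdx1, dfdx2, dgdx1, dgdx2):
    """Reduced gradient of f along x1, with x2 implicitly eliminated
    through g(x) = 0 (the formula above)."""
    return dfdx1 - dfdx2 * dgdx1 / dgdx2

# Illustrative values: grad f = (-1, 3), grad g = (-2, -1)
gr = reduced_gradient_2var(-1.0, 3.0, -2.0, -1.0)   # -> -7.0
```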
Implicit Elimination
• Consider a problem in 𝑛 variables with 𝑚 equality constraints:
Objective: min 𝑓(𝒙), 𝒙𝑇 = [𝑥1, 𝑥2, …, 𝑥𝑛]
Subject to: 𝑔𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑚
• We define 𝑚 basic variables in terms of 𝑛 − 𝑚 nonbasic variables; let 𝒙𝑇 = [𝒚𝑇, 𝒛𝑇], where 𝒚 are basic and 𝒛 are nonbasic.
• The gradient vector is partitioned as: 𝛻𝑓𝑇 = [𝛻𝑓(𝒚)𝑇, 𝛻𝑓(𝒛)𝑇].
• The variations in the objective and constraint functions are:
𝑑𝑓 = 𝛻𝑓(𝒚)𝑇𝑑𝒚 + 𝛻𝑓(𝒛)𝑇𝑑𝒛
𝑑𝒈 = (𝜕𝝍/𝜕𝒚)𝑑𝒚 + (𝜕𝝍/𝜕𝒛)𝑑𝒛 = 𝟎
where the matrices of partial derivatives are defined as:
[𝜕𝝍/𝜕𝒚]𝑖𝑗 = 𝜕𝑔𝑖/𝜕𝑦𝑗; [𝜕𝝍/𝜕𝒛]𝑖𝑗 = 𝜕𝑔𝑖/𝜕𝑧𝑗
Generalized Reduced Gradient
• Since $\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}$ is a square $m \times m$ matrix, we may solve for $d\boldsymbol{y}$ as:
$d\boldsymbol{y} = -\left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{z}}\, d\boldsymbol{z}$, and substitute in $df$ to obtain:
$df = \nabla f(\boldsymbol{z})^T d\boldsymbol{z} - \nabla f(\boldsymbol{y})^T \left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{z}}\, d\boldsymbol{z}$
• Then the reduced gradient $\nabla f_R$ is defined as:
$\nabla f_R^T = \nabla f(\boldsymbol{z})^T - \nabla f(\boldsymbol{y})^T \left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{z}}$
• Next, we choose the negative of $\nabla f_R$ as the search direction and
perform a line search to determine the step size; then $\Delta\boldsymbol{z} = -\alpha \nabla f_R$,
$\Delta\boldsymbol{y} = -\left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{z}}\, \Delta\boldsymbol{z}$
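A sketch of this computation in Python, using the numbers from the slack-variable example later in these notes ($m = 1$ basic variable $y$, two nonbasic variables $\boldsymbol{z}$, so the $m \times m$ solve reduces to a scalar division):

```python
# data as in the inequality-constrained example: grad_f_y = 5.12,
# grad_f_z = (1, 0), dpsi/dy = [1], dpsi/dz = [1, 1]
dfy = 5.12
dfz = [1.0, 0.0]
dpsi_dy = 1.0
dpsi_dz = [1.0, 1.0]
# f_R^T = grad_f_z^T - grad_f_y^T (dpsi/dy)^{-1} (dpsi/dz)
w = dfy / dpsi_dy
dfR = [dfz[j] - w * dpsi_dz[j] for j in range(2)]
print(dfR)  # approximately [-4.12, -5.12]
```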
GRG Algorithm
• Initialize: choose $\boldsymbol{x}_0$; evaluate the objective function and constraints;
convert binding inequality constraints to equality constraints.
• Partition the variables into $m$ basic and $n-m$ nonbasic ones, e.g.,
choose the first $m$ variables, or the $m$ variables with highest values, as basic.
• Compute $\nabla f_R$ along the nonbasic variables. If $\nabla f_R = 0$, exit.
• Set $\Delta\boldsymbol{z} = -\nabla f_R / \|\nabla f_R\|$, $\Delta\boldsymbol{y} = -\left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{z}} \Delta\boldsymbol{z}$.
• Do a line search along $\Delta\boldsymbol{x}$ to obtain $\alpha$.
• Check feasibility at $\boldsymbol{x}_k + \alpha\Delta\boldsymbol{x}$. If necessary, use Newton–Raphson
iterations to adjust $\Delta\boldsymbol{y}$ as: $\Delta\boldsymbol{y}_{k+1} = \Delta\boldsymbol{y}_k - \left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \boldsymbol{g}_k$
• Update: $\boldsymbol{x}_{k+1} = \boldsymbol{x}_k + \alpha\Delta\boldsymbol{x}$
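The Newton–Raphson restoration step can be sketched in Python for a single binding constraint; here the circle constraint $g_1 = x_1^2 + x_2^2 - 9$ from the example in these notes is used with $x_1$ as the basic variable, and the starting guess is hypothetical:

```python
def restore(y, z, tol=1e-10):
    # adjust the basic variable y so that g(y, z) = y^2 + z^2 - 9 = 0,
    # holding the nonbasic variable z fixed: y <- y - (dg/dy)^{-1} g
    for _ in range(50):
        g = y*y + z*z - 9.0
        if abs(g) < tol:
            break
        y -= g / (2.0*y)
    return y

y = restore(2.0, -1.56)
print(y*y + (-1.56)**2 - 9.0)  # the adjusted point lies on the constraint
```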
Generalized Reduced Gradient
• Consider an equality-constrained problem:
Objective: $\min f(\boldsymbol{x}) = 3x_1 + 2x_2 + 2x_1^2 - x_1 x_2 + 1.5x_2^2$
Subject to: $g(\boldsymbol{x}) = x_1^2 - x_2 - 1 = 0$
• Let $\boldsymbol{x}_0 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$; then $f_0 = -1$, $\nabla f_0 = \begin{bmatrix} -1 \\ 3 \end{bmatrix}$, $g_0 = 0$, $\nabla g_0 = \begin{bmatrix} -2 \\ -1 \end{bmatrix}$.
• Let $y = x_2$ on the first iteration; then $\nabla f_R^T = -1 - 3\cdot\frac{-2}{-1} = -7$.
• Let $\Delta z = 1$; then $\Delta y = -\frac{-2}{-1}\cdot 1 = -2$. By doing a line search along
$\Delta\boldsymbol{x} = \begin{bmatrix} 0.333 \\ -0.667 \end{bmatrix}$, we obtain $\boldsymbol{x}_1 = \begin{bmatrix} -0.350 \\ -0.577 \end{bmatrix}$, $f_1 = -2.13$.
• The optimum is reached in three iterations: $\boldsymbol{x}^* = \begin{bmatrix} -0.634 \\ -0.598 \end{bmatrix}$,
$f(\boldsymbol{x}^*) = -2.137$.
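The reported optimum can be checked directly against the problem data:

```python
# objective and constraint of the equality-constrained example
f = lambda x1, x2: 3*x1 + 2*x2 + 2*x1**2 - x1*x2 + 1.5*x2**2
g = lambda x1, x2: x1**2 - x2 - 1

x1, x2 = -0.634, -0.598
print(round(f(x1, x2), 3), round(g(x1, x2), 5))  # f ≈ -2.137, g ≈ 0
```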
Generalized Reduced Gradient
• Consider an inequality-constrained problem:
Objective: $\min f(\boldsymbol{x}) = x_1^2 + x_2$
Subject to: $g_1(\boldsymbol{x}) = x_1^2 + x_2^2 - 9 \le 0$, $g_2(\boldsymbol{x}) = x_1 + x_2 - 1 \le 0$
• Add slack variables to the inequality constraints:
$g_1(\boldsymbol{x}) = x_1^2 + x_2^2 - 9 + s_1 = 0$, $g_2(\boldsymbol{x}) = x_1 + x_2 - 1 + s_2 = 0$
Then $\nabla f(\boldsymbol{x}) = \begin{bmatrix} 2x_1 \\ 1 \end{bmatrix}$; $\nabla g_1(\boldsymbol{x}) = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}$; $\nabla g_2(\boldsymbol{x}) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$
• Let $\boldsymbol{x}_0 = \begin{bmatrix} 2.56 \\ -1.56 \end{bmatrix}$; then $f_0 = 4.99$, $\nabla f_0 = \begin{bmatrix} 5.12 \\ 1 \end{bmatrix}$, $\boldsymbol{g}_0 = \begin{bmatrix} -0.013 \\ 0 \end{bmatrix}$.
• Since $g_2$ is binding, add $s_2$ to the variables: $\nabla f_0 = \begin{bmatrix} 5.12 \\ 1 \\ 0 \end{bmatrix}$, $\nabla g_2^0 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$
Generalized Reduced Gradient
• Let $y = x_1$, $\boldsymbol{z} = \begin{bmatrix} x_2 \\ s_2 \end{bmatrix}$; then $\nabla f(y) = 5.12$, $\nabla f(\boldsymbol{z}) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\nabla g_2(y) = 1$,
$\nabla g_2(\boldsymbol{z}) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$; therefore $\nabla f_R(\boldsymbol{z}) = \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix} 5.12 = \begin{bmatrix} -4.12 \\ -5.12 \end{bmatrix}$
• Let $\Delta\boldsymbol{z} = -\nabla f_R(\boldsymbol{z})$, $\Delta y = -[1\ \ 1]\Delta\boldsymbol{z} = -9.24$; then $\Delta\boldsymbol{x} = \begin{bmatrix} -9.24 \\ 4.12 \end{bmatrix}$ and
$\boldsymbol{s}_0 = \Delta\boldsymbol{x}/\|\Delta\boldsymbol{x}\|$. Suppose we limit the maximum step size to $\alpha \le 0.5$;
then $\boldsymbol{x}_1 = \boldsymbol{x}_0 + 0.5\,\boldsymbol{s}_0 = \begin{bmatrix} 2.103 \\ -1.356 \end{bmatrix}$ with $f(\boldsymbol{x}_1) = f_1 = 3.068$. There are
no constraint violations; hence the first iteration is complete.
• After seven iterations: $\boldsymbol{x}_7 = \begin{bmatrix} 0.003 \\ -3.0 \end{bmatrix}$ with $f_7 = -3.0$
• The optimum is at: $\boldsymbol{x}^* = \begin{bmatrix} 0.0 \\ -3.0 \end{bmatrix}$ with $f^* = -3.0$
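The first-iteration arithmetic above can be reproduced in a few lines of Python:

```python
import math

x0 = (2.56, -1.56)
dx = (-9.24, 4.12)                 # step (dy, dz_x2) from the reduced gradient
n = math.hypot(dx[0], dx[1])
s0 = (dx[0]/n, dx[1]/n)            # normalized search direction
x1 = (x0[0] + 0.5*s0[0], x0[1] + 0.5*s0[1])   # limited step, alpha = 0.5
f1 = x1[0]**2 + x1[1]
print(x1, f1)  # approximately (2.103, -1.356), f1 ≈ 3.068
```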
GRG for LP Problems
• Consider an LP problem: $\min f(\boldsymbol{x}) = \boldsymbol{c}^T\boldsymbol{x}$
Subject to: $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$, $\boldsymbol{x} \ge \boldsymbol{0}$
• Let $\boldsymbol{x}$ be partitioned into $m$ basic variables and $n-m$ nonbasic
variables: $\boldsymbol{x}^T = [\boldsymbol{y}^T, \boldsymbol{z}^T]$.
• The objective function is partitioned as: $f(\boldsymbol{x}) = \boldsymbol{c}_y^T\boldsymbol{y} + \boldsymbol{c}_z^T\boldsymbol{z}$
• The constraints are partitioned as: $\boldsymbol{B}\boldsymbol{y} + \boldsymbol{N}\boldsymbol{z} = \boldsymbol{b}$, $\boldsymbol{y} \ge \boldsymbol{0}$, $\boldsymbol{z} \ge \boldsymbol{0}$.
Then $\boldsymbol{y} = \boldsymbol{B}^{-1}\boldsymbol{b} - \boldsymbol{B}^{-1}\boldsymbol{N}\boldsymbol{z}$
• The objective function in terms of the independent variables is:
$f(\boldsymbol{z}) = \boldsymbol{c}_y^T\boldsymbol{B}^{-1}\boldsymbol{b} + (\boldsymbol{c}_z^T - \boldsymbol{c}_y^T\boldsymbol{B}^{-1}\boldsymbol{N})\boldsymbol{z}$
• The reduced costs for the nonbasic variables are given as:
$\boldsymbol{r}_c^T = \boldsymbol{c}_z^T - \boldsymbol{c}_y^T\boldsymbol{B}^{-1}\boldsymbol{N}$, or $\boldsymbol{r}_c^T = \boldsymbol{c}_z^T - \boldsymbol{\lambda}^T\boldsymbol{N}$
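A Python sketch of the reduced-cost computation for a small hypothetical LP, $\min -x_1 - 2x_2$ subject to $x_1 + x_2 + s_1 = 4$, $x_1 + 3x_2 + s_2 = 6$, $\boldsymbol{x} \ge \boldsymbol{0}$, with $(x_1, x_2)$ basic:

```python
B = [[1.0, 1.0], [1.0, 3.0]]     # basic columns (x1, x2)
N = [[1.0, 0.0], [0.0, 1.0]]     # nonbasic columns (s1, s2)
cy, cz = [-1.0, -2.0], [0.0, 0.0]

# invert the 2x2 basis matrix directly
det = B[0][0]*B[1][1] - B[0][1]*B[1][0]
Binv = [[ B[1][1]/det, -B[0][1]/det],
        [-B[1][0]/det,  B[0][0]/det]]
# lambda^T = c_y^T B^{-1}, then r_c^T = c_z^T - lambda^T N
lam = [cy[0]*Binv[0][j] + cy[1]*Binv[1][j] for j in range(2)]
rc = [cz[j] - (lam[0]*N[0][j] + lam[1]*N[1][j]) for j in range(2)]
print(rc)  # → [0.5, 0.5]
```

Both reduced costs are nonnegative, so the basis $(x_1, x_2)$ is already optimal for this LP.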
GRG for LP Problems
• Using tableau notation, the reduced costs are computed as:
$\begin{bmatrix} \boldsymbol{B} & \boldsymbol{N} & \boldsymbol{b} \\ \boldsymbol{c}_y^T & \boldsymbol{c}_z^T & 0 \end{bmatrix} \rightarrow \begin{bmatrix} \boldsymbol{I} & \boldsymbol{B}^{-1}\boldsymbol{N} & \boldsymbol{B}^{-1}\boldsymbol{b} \\ \boldsymbol{0} & \boldsymbol{r}_c^T & -\boldsymbol{c}_y^T\boldsymbol{B}^{-1}\boldsymbol{b} \end{bmatrix}$
• The objective function variation is given as:
$df = \nabla f_{\boldsymbol{y}}^T d\boldsymbol{y} + \nabla f_{\boldsymbol{z}}^T d\boldsymbol{z}$
• The reduced gradient along the constraint surface is given as:
$\nabla f_R^T = \nabla_{\boldsymbol{z}} f^T - \nabla_{\boldsymbol{y}} f^T \boldsymbol{B}^{-1}\boldsymbol{N} = \boldsymbol{r}_c^T$
GRG Algorithm for LP Problems
1. Choose the largest $m$ components of $\boldsymbol{x}$ as basic variables
2. Compute the reduced gradient $\nabla f_R^T = \boldsymbol{r}_c^T$
3. Let $\Delta z_i = \begin{cases} -r_i & \text{if } r_i \le 0 \\ -x_i r_i & \text{if } r_i > 0 \end{cases}$
4. If $\Delta\boldsymbol{z} = \boldsymbol{0}$, stop; otherwise set $\Delta\boldsymbol{y} = -\boldsymbol{B}^{-1}\boldsymbol{N}\Delta\boldsymbol{z}$
5. Compute the step size: let $\alpha_1 = \max\{\alpha : \boldsymbol{y} + \alpha\Delta\boldsymbol{y} \ge \boldsymbol{0},\ \boldsymbol{z} + \alpha\Delta\boldsymbol{z} \ge \boldsymbol{0}\}$,
$\alpha_2 = \arg\min_\alpha f(\boldsymbol{x} + \alpha\Delta\boldsymbol{x})$, $\alpha = \min\{\alpha_1, \alpha_2\}$
6. Update: $\boldsymbol{x}_{k+1} = \boldsymbol{x}_k + \alpha\Delta\boldsymbol{x}$
7. If $\alpha_2 \ge \alpha_1$, update $\boldsymbol{B}$, $\boldsymbol{N}$ (use pivoting)
8. Return to step 1
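The step-3 direction rule can be written directly; the reduced costs $r$ and nonbasic values $x$ below are hypothetical:

```python
def dz_rule(r, x):
    # move opposite the reduced cost; scale by x_i when r_i > 0 so that
    # a nonbasic variable already at its bound (x_i = 0) is not decreased
    return [-ri if ri <= 0 else -xi*ri for ri, xi in zip(r, x)]

print(dz_rule([0.5, -1.0, 2.0], [0.0, 3.0, 4.0]))  # → [-0.0, 1.0, -8.0]
```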
View publication stats
View publication stats

Optimum engineering design - Day 5. Clasical optimization methods

  • 1.
  • 2.
    Course Materials • Arora,Introduction to Optimum Design, 3e, Elsevier, (https://www.researchgate.net/publication/273120102_Introductio n_to_Optimum_design) • Parkinson, Optimization Methods for Engineering Design, Brigham Young University (http://apmonitor.com/me575/index.php/Main/BookChapters) • Iqbal, Fundamental Engineering Optimization Methods, BookBoon (https://bookboon.com/en/fundamental-engineering-optimization- methods-ebook)
  • 3.
    Numerical Optimization • Consideran unconstrained NP problem: min 𝒙 𝑓 𝒙 • Use an iterative method to solve the problem: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒅𝑘, where 𝒅𝑘 is a search direction and 𝛼𝑘 is the step size, such that the function value decreases at each step, i.e., 𝑓 𝒙𝑘+1 < 𝑓 𝒙𝑘 • We expect lim 𝑘→∞ 𝒙𝑘 = 𝒙∗ • The general iterative method is a two-step process: – Finding a suitable search direction 𝒅𝑘 along which the function value locally decreases and any constraints are obeyed. – Performing line search along 𝒅𝑘 to find 𝒙𝑘+1 such that 𝑓 𝒙𝑘+1 attains its minimum value.
  • 4.
    The Iterative Method •Iterative algorithm: 1. Initialize: chose 𝒙0 2. Check termination: 𝛻𝑓 𝒙𝑘 ≅ 0 3. Find a suitable search direction 𝒅𝑘 , that obeys the descent condition: 𝛻𝑓 𝒙𝑘 𝑇 𝒅𝑘 < 0 4. Search along 𝒅𝑘 to find where 𝑓 𝒙𝑘+1 attains minimum value (line search problem) 5. Return to step 2
  • 5.
    The Line SearchProblem • Assuming a suitable search direction 𝒅𝑘 has been determined, we seek to determine a step length 𝛼𝑘, that minimizes 𝑓 𝒙𝑘+1 . • Assuming 𝒙𝑘 and 𝒅𝑘 are known, the projected function value along 𝒅𝑘 is expressed as: 𝑓 𝒙𝑘 + 𝛼𝑘𝒅𝑘 = 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 = 𝑓(𝛼) • The line search problem to choose 𝛼 to minimize 𝑓 𝒙𝑘+1 along 𝒅𝑘 is defined as: min 𝛼 𝑓(𝛼) = 𝑓 𝒙𝑘 + α𝒅𝑘 • Assuming that a solution exists, it is found by setting 𝑓′ 𝛼 = 0.
  • 6.
    Example: Quadratic Function •Consider minimizing a quadratic function: 𝑓 𝒙 = 1 2 𝒙𝑇𝑨𝒙 − 𝒃𝑇𝒙, 𝛻𝑓 = 𝑨𝒙 − 𝒃 • Given a descent direction 𝒅, the line search problem is defined as: min 𝛼 𝑓(𝛼) = 𝒙𝑘 + 𝛼𝒅 𝑇 𝑨 𝒙𝑘 + 𝛼𝒅 − 𝒃𝑇 𝒙𝑘 + 𝛼𝒅 • A solution is found by setting 𝑓′ 𝛼 = 0, where 𝑓′ 𝛼 = 𝒅𝑇𝑨 𝒙𝑘 + 𝛼𝒅 − 𝒅𝑇𝒃 = 0 𝛼 = − 𝒅𝑇 𝑨𝒙𝑘 − 𝒃 𝒅𝑇𝑨𝒅 = − 𝛻𝑓(𝒙𝑘 )𝑇 𝒅 𝒅𝑇𝑨𝒅 • Finally, 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝒅.
  • 7.
    Computer Methods forLine Search Problem • Interval reduction methods – Golden search – Fibonacci search • Approximate search methods – Arjimo’s rule – Quadrature curve fitting
  • 8.
    Interval Reduction Methods •The interval reduction methods find the minimum of a unimodal function in two steps: – Bracketing the minimum to an interval – Reducing the interval to desired accuracy • The bracketing step aims to find a three-point pattern, such that for 𝑥1, 𝑥2, 𝑥3, 𝑓 𝑥1 ≥ 𝑓 𝑥2 < 𝑓 𝑥3 .
  • 9.
    Fibonacci’s Method • TheFibonacci’s method uses Fibonacci numbers to achieve maximum interval reduction in a given number of steps. • The Fibonacci number sequence is generated as: 𝐹0 = 𝐹1 = 1, 𝐹𝑖 = 𝐹𝑖−1 + 𝐹𝑖−2, 𝑖 ≥ 2. • The properties of Fibonacci numbers include: – They achieve the golden ratio 𝜏 = lim 𝑛→∞ 𝐹𝑛−1 𝐹𝑛 = 5−1 2 ≅ 0.618034 – The number of interval reductions 𝑛 required to achieve a desired accuracy 𝜀 (where 1/𝐹𝑛 < 𝜀) is specified in advance. – For given 𝐼1 and 𝑛, 𝐼2 = 𝐹𝑛−1 𝐹𝑛 𝐼1, 𝐼3 = 𝐼1 − 𝐼2, 𝐼4 = 𝐼2 − 𝐼3, etc.
  • 10.
    The Golden SectionMethod • The golden section method uses the golden ratio: 𝜏 = 0.618034. • The golden section algorithm is given as: 1. Initialize: specify 𝑥1, 𝑥4 𝐼1 = 𝑥4 − 𝑥1 , 𝜀, 𝑛: 𝜏𝑛 < 𝜀 𝐼1 2. Compute 𝑥2 = 𝜏𝑥1 + 1 − 𝜏 𝑥4, evaluate 𝑓2 3. For 𝑖 = 1, … , 𝑛 − 1 Compute 𝑥3 = 1 − 𝜏 𝑥1 + 𝜏𝑥4, evaluate 𝑓3; if 𝑓2 < 𝑓3, set 𝑥4 ← 𝑥1, 𝑥1 ← 𝑥3; else set 𝑥1 ← 𝑥2, 𝑥2 ← 𝑥3, 𝑓2 ← 𝑓3
  • 11.
    Approximate Search Methods •Consider the line search problem: min 𝛼 𝑓(𝛼) = 𝑓 𝒙𝑘 + α𝒅𝑘 • Sufficient Descent Condition. The sufficient descent condition guards against 𝒅𝑘 becoming too close to 𝛻𝑓 𝒙𝑘 . The condition is stated as: 𝛻𝑓 𝒙𝑘 𝑇 𝒅𝑘 < −𝑐 𝛻𝑓 𝒙𝑘 2 , 𝑐 > 0 • Sufficient Decrease Condition. The sufficient decrease condition ensures a nontrivial reduction in the function value. The condition is stated as: 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 − 𝑓 𝒙𝑘 ≤ 𝜇 𝛼 𝛻𝑓 𝒙𝑘 𝑇 𝒅𝑘, 0 < 𝜇 < 1 • Curvature Condition. The curvature condition guards against 𝛼 becoming too small. The condition is stated as: 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 𝑇 𝒅𝑘 ≥ 𝑓 𝒙𝑘 + 𝜂 𝛻𝑓 𝒙𝑘 𝑇 𝒅𝑘 , 0 < 𝜇 < 𝜂 < 1
  • 12.
    Approximate Line Search •Strong Wolfe Conditions. The strong Wolfe conditions commonly used by all line search algorithms include: 1. The sufficient decrease condition (Arjimo’s rule): 𝑓 𝛼 ≤ 𝑓 0 + 𝜇𝛼𝑓′ (0), 0 < 𝜇 < 1 2. Strong curvature condition: 𝑓′ 𝛼 ≤ 𝜂 𝑓′ 0 , 0 < 𝜇 ≤ 𝜂 < 1
  • 13.
    Approximate Line Search •The approximate line search includes two steps: – Bracketing the minimum – Estimating the minimum • Bracketing the Minimum. In the bracketing step we seek an interval 𝛼, 𝛼 such that 𝑓′ 𝛼 < 0 and 𝑓′ 𝛼 > 0. – Since for any descent direction, 𝑓′ 0 < 0, therefore, 𝛼 = 0 serves as a lower bound on 𝛼. To find an upper bound, gradually increase 𝛼, e.g., 𝛼 = 1,2, …, – Assume that for some 𝛼𝑖 > 0, we get 𝑓′ 𝛼𝑖 < 0 and 𝑓′ 𝛼𝑖+1 > 0; then, 𝛼𝑖 serves as an upper bound.
  • 14.
    Approximate Line Search •Estimating the Minimum. Once the minimum has been bracketed to a small interval, a quadratic or cubic polynomial approximation is used to find the minimizer. • If the polynomial minimizer 𝛼 satisfies strong Wolfe’s conditions for the desired 𝜇 and 𝜂 values (say 𝜇 = 0.2, 𝜂 = 0.5), it is taken as the function minimizer. • Otherwise, 𝛼 is used to replace one of the 𝛼 or 𝛼, and the polynomial approximation step repeated.
  • 15.
    Quadratic Curve Fitting •Assuming that the interval 𝛼𝑙, 𝛼𝑢 contains the minimum of a unimodal function, 𝑓 𝛼 , its quadratic approximation, given as: 𝑞 𝛼 = 𝑎0 + 𝑎1𝛼 + 𝑎2𝛼2 , is obtained using three points 𝛼𝑙, 𝛼𝑚, 𝛼𝑢 , where the mid-point may be used for 𝛼𝑚 The quadratic coefficients {𝑎0, 𝑎1, 𝑎2} are solved as: 𝑎2 = 1 𝛼𝑢−𝛼𝑚 𝑓 𝛼𝑢 −𝑓 𝛼𝑙 𝛼𝑢−𝛼𝑙 − 𝑓 𝛼𝑚 −𝑓 𝛼𝑙 𝛼𝑚−𝛼𝑙 𝑎1 = 1 𝛼𝑚−𝛼𝑙 𝑓 𝛼𝑚 − 𝑓 𝛼𝑙 − 𝑎2(𝛼𝑙 + 𝛼𝑚) 𝑎0 = 𝑓(𝛼𝑙) − 𝑎1𝛼𝑙 − 𝑎2𝛼𝑙 2 Then, the minimum is given as: 𝛼𝑚𝑖𝑛 = − 𝑎1 2𝑎2
  • 16.
    Example: Approximate Search •Let 𝑓 𝛼 = 𝑒−𝛼 + 𝛼2, 𝑓′ 𝛼 = 2𝛼 − 𝑒−𝛼, 𝑓 0 = 1, 𝑓′ 0 = −1. Let 𝜇 = 0.2, and try 𝛼 = 0.1, 0.2, …, to bracket the minimum. • From the sufficient decrease condition, the minimum is bracketed in the interval: [0, 0.5] • Using quadratic approximation, the minimum is found as: 𝑥∗ = 0.3531 The exact solution is given as: 𝛼𝑚𝑖𝑛 = 0.3517 • The Matlab commands are: Define the function: f=@(x) x.*x+exp(-x); mu=0.2; al=0:.1:1;
  • 17.
    Example: Approximate Search •Bracketing the minimum: f1=feval(f,al) 1.0000 0.9148 0.8587 0.8308 0.8303 0.8565 0.9088 0.9866 1.0893 1.2166 1.3679 >> f2=f(0)-mu*al 1.0000 0.9800 0.9600 0.9400 0.9200 0.9000 0.8800 0.8600 0.8400 0.8200 0.8000 >> idx=find(f1<=f2) • Quadratic approximation to find the minimum: al=0; am=0.25; au=0.5; a2 = ((f(au)-f(al))/(au-al)-(f(am)-f(al))/(am-al))/(au-am); a1 = (f(am)-f(al))/(am-al)-a2*(al+am); xmin = -a1/a2/2 % 0.3531
  • 18.
    Computer Methods forFinding the Search Direction • Gradient based methods – Steepest descent method – Conjugate gradient method – Quasi Newton methods • Hessian based methods – Newton’s method – Trust region methods
  • 19.
    Steepest Descent Method •The steepest descent method determines the search direction as: 𝒅𝑘 = −𝛻𝑓(𝒙𝑘), • The update rule is given as: 𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝑘 ∙ 𝛻𝑓(𝒙𝑘 ) where 𝛼𝑘 is determined by minimizing 𝑓(𝒙𝑘+1) along 𝒅𝑘 • Example: quadratic function 𝑓 𝒙 = 1 2 𝒙𝑇 𝑨𝒙 − 𝒃𝑇 𝒙, 𝛻𝑓 = 𝑨𝒙 − 𝒃 Then, 𝒙𝑘+1 = 𝒙𝑘 − 𝛼 ∙ 𝛻𝑓 𝒙𝑘 ; 𝛼 = 𝛻 𝑓 𝒙𝑘 𝑇 𝛻 𝑓 𝒙𝑘 𝛻 𝑓 𝒙𝑘 𝑇 𝐀𝛻 𝑓 𝒙𝑘 Define 𝒓𝑘 = 𝒃 − 𝑨𝒙𝑘 Then, 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒓𝑘; 𝛼𝑘 = 𝒓𝑘 𝑇 𝒓𝑘 𝒓𝑘 𝑇𝐴𝒓𝑘
  • 20.
    Steepest Descent Algorithm •Initialize: choose 𝒙0 • For 𝑘 = 0,1,2, … – Compute 𝛻𝑓(𝒙𝑘 ) – Check convergence: if 𝛻𝑓(𝒙𝑘 ) < 𝜖, stop. – Set 𝒅𝑘 = −𝛻𝑓(𝒙𝑘) – Line search problem: Find min 𝛼≥0 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 – Set 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝒅𝑘 .
  • 21.
    Example: Steepest Descent •Consider min 𝒙 𝑓 𝒙 = 0.1𝑥1 2 + 𝑥2 2 , 𝛻𝑓 𝒙 = 0.2𝑥1 2𝑥2 , 𝛻2𝑓 𝑥 = 0.1 0 0 1 ; let 𝒙0 = 5 1 , then, 𝑓 𝒙0 = 3.5, 𝑑1 = −𝛻𝑓 𝒙0 = −1 −2 , 𝛼 = 0.61 𝒙1 = 4.39 −0.22 , 𝑓 𝒙1 = 1.98 Continuing..
  • 22.
    Example: Steepest Descent •MATLAB code: H=[.2 0;0 2]; f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H; x=[5;1]; xall=x'; for i=1:10 d=-df(x); a=d'*d/(d'*H*d); x=x+a*d; xall=[xall;x']; end plot(xall(:,1),xall(:,2)), grid axis([-1 5 -1 5]), axis equal
  • 23.
    Steepest Descent Method •The steepest descent method becomes slow close to the optimum • The method progresses in a zigzag fashion, since 𝑑 𝑑𝛼 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 = 𝛻 𝑓 𝒙𝑘+1 𝑇 𝒅𝑘 = −𝛻 𝑓 𝒙𝑘+1 𝑇 𝛻 𝑓 𝒙𝑘 = 0 • The method has linear convergence with rate constant 𝐶 = 𝑓 𝒙𝑘+1 −𝑓 𝒙∗ 𝑓 𝒙𝑘 −𝑓 𝒙∗ ≤ 𝑐𝑜𝑛𝑑 𝑨 −1 𝑐𝑜𝑛𝑑 𝑨 +1 2
  • 24.
    Preconditioning • Preconditioning (scaling)can be used to reduce the condition number of the Hessian matrix and hence aid convergence • Consider 𝑓 𝒙 = 0.1𝑥1 2 + 𝑥2 2 = 𝒙𝑇 𝑨𝒙, where 𝑨 = 𝑑𝑖𝑎𝑔(0.1, 1) • Define a linear transformation: 𝒙 = 𝑷𝒚, where 𝑷 = 𝑑𝑖𝑎𝑔( 10, 1); then, 𝑓 𝒙 = 𝒚𝑇 𝑷𝑇 𝑨𝑷𝒚 = 𝒚𝑇 𝒚 • Since 𝑐𝑜𝑛𝑑 𝑰 = 1, the steepest descent method in the case of a quadratic function converges in a single iteration
  • 25.
    Conjugate Gradient Method •For any square matrix 𝑨, the set of 𝑨-conjugate vectors is defined by: 𝒅𝑖𝑇 𝑨𝒅𝑗 = 0, 𝑖 ≠ 𝑗 • Let 𝒈𝑘 = 𝛻 𝑓 𝒙𝑘 denote the gradient; then, starting from 𝒅0 = −𝒈0, a set of 𝑨-conjugate directions is generated as: 𝒅0 = −𝒈0; 𝒅𝑘+1 = −𝒈𝑘+1 + 𝛽𝑘𝒅𝑘 𝑘 ≥ 0, … where 𝛽𝑘 = 𝒈𝑘+1 𝑇 𝑨𝒅𝑘 𝒅𝑘𝑇 𝑨𝒅𝑘 There are multiple ways to generate conjugate directions • Using {𝒅0 , 𝒅2 , … , 𝒅𝑛−1 } as search directions, a quadratic function is minimized in 𝑛 steps.
  • 26.
    Conjugate Directions Method •The parameter 𝛽𝑘 can be computed in different ways: – By substituting 𝑨𝒅𝑘 = 1 𝛼𝑘 (𝒈𝑘+1 − 𝒈𝑘), we obtain: 𝛽𝑘 = 𝒈𝑘+1 𝑇 (𝒈𝑘+1−𝒈𝑘) 𝒅𝑘𝑇 (𝒈𝑘+1−𝒈𝑘) (the Hestenes-Stiefel formula) – In the case of exact line search, 𝑔𝑘+1 𝑇 𝒅𝑘 = 0; then 𝛽𝑘 = 𝒈𝑘+1 𝑇 (𝒈𝑘+1−𝒈𝑘) 𝒈𝑘 𝑇𝒈𝑘 (the Polak-Ribiere formula) – Also, for exact line search 𝒈𝑘+1 𝑇 𝒈𝑘 = 𝛽𝑘−1(𝒈𝑘 + 𝛼𝑘𝑨𝒅𝑘 )𝑇 𝒅𝑘−1 = 0, resulting in 𝛽𝑘 = 𝒈𝑘+1 𝑇 𝒈𝑘+1 𝒈𝑘 𝑇𝒈𝑘 (the Fletcher-Reeves formula) Other versions of 𝛽𝑘 have also been proposed.
  • 27.
    Example: Conjugate GradientMethod • Consider min 𝒙 𝑓 𝒙 = 0.1𝑥1 2 + 𝑥2 2 , 𝛻𝑓 𝒙 = 0.2𝑥1 2𝑥2 , 𝛻2𝑓 𝑥 = 0.1 0 0 1 ; let 𝒙0 = 5 1 , then 𝑓 𝒙0 = 3.5, 𝑑0 = − 𝛻𝑓 𝒙0 = −1 −2 , 𝛼 = 0.61 𝒙1 = 4.39 −0.22 , 𝑓 𝒙1 = 1.98 𝛽0 = 0.19 𝑑1 = −0.535 0.027 , 𝛼 = 8.2 𝒙1 = 0 0
  • 28.
    Example: Conjugate GradientMethod • MATLAB code H=[.2 0;0 2]; f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H; x=[5;1]; n=2; xall=zeros(n+1,n); xall(1,:)=x'; d=-df(x); a=d'*d/(d'*H*d); x=x+a*d; xall(2,:)=x'; for i=1:size(x,1)-1 b=df(x)'*H*d/(d'*H*d); d=-df(x)+b*d; r=-df(x); a=r'*r/(d'*H*d); x=x+a*d; xall(i+2,:)=x'; end plot(xall(:,1),xall(:,2)), grid axis([-1 5 -1 5]), axis equal
  • 29.
    Conjugate Gradient Algorithm •Conjugate-Gradient Algorithm (Griva, Nash & Sofer, p454): • Initialize: Choose 𝒙0 = 𝟎, 𝒓0 = 𝒃, 𝒅(−1) = 0, 𝛽0 = 0. • For 𝑖 = 0,1, … – Check convergence: if 𝒓𝑖 < 𝜖, stop. – If 𝑖 > 0, set 𝛽𝑖 = 𝒓𝑖 𝑇 𝒓𝑖 𝒓𝑖−1 𝑇 𝒓𝑖−1 – Set 𝒅𝑖 = 𝒓𝑖 + 𝛽𝑖𝒅𝑖−1 ; 𝛼𝑖 = 𝒓𝑖 𝑇 𝒓𝑖 𝒅𝑖𝑇 𝑨𝒅𝑖 ; 𝒙𝑖+1 = 𝒙𝑖 + 𝛼𝑖𝒅𝑖 ; 𝒓𝑖+1 = 𝒓𝑖 − 𝛼𝑖𝑨𝒅𝑖.
  • 30.
    Conjugate Gradient Method •Assume that an update that includes steps 𝛼𝑖 along 𝑛 conjugate vectors 𝒅𝑖 is assembled as: 𝑦 = 𝛼𝑖𝒅𝑖 𝑛 𝑖=1 . • Then, for a quadratic function, the minimization problem is decomposed into a set of one-dimensional problems, i.e., min 𝑦 𝑓(𝒚) ≡ min 𝛼𝑖 1 2 𝛼𝑖 2 𝒅𝑖𝑇 𝑨𝒅𝑖 − 𝛼𝑖𝒃𝑇 𝒅𝑖 𝑛 𝑖=1 • By setting the derivative with respect to 𝛼𝑖 equal to zero, i.e., 𝛼𝑖𝒅𝑖𝑇 𝑨𝒅𝑖 − 𝒃𝑇 𝒅𝑖 = 0, we obtain: 𝛼𝑖 = 𝒃𝑇𝒅𝑖 𝒅𝑖𝑇 𝑨𝒅𝑖 . • This shows that the CG algorithm iteratively determines the conjugate directions 𝒅𝑖 and their coefficients 𝛼𝑖.
  • 31.
    CG Rate ofConvergence • Conjugate gradient methods achieve superlinear convergence: – In the case of quadratic functions, the minimum is reached exactly in 𝑛 iterations. – For general nonlinear functions, convergence in 2𝑛 iterations is to be expected. • Nonlinear CG methods typically have the lowest per iteration computational costs of all gradient methods.
  • 32.
    Newton’s Method • Considerminimizing the second order approximation of 𝑓 𝒙 : min 𝒅 𝑓 𝒙𝑘 + Δ𝒙 = 𝑓 𝒙𝑘 + 𝛻𝑓 𝒙𝑘 𝑇Δ𝒙 + 1 2 Δ𝒙𝑇𝑯𝑘Δ𝒙 • Apply FONC: 𝑯𝑘𝒅 + 𝒈𝑘 = 𝟎, where 𝒈𝑘 = 𝛻𝑓 𝒙𝑘 Then, assuming that 𝑯𝑘 = 𝛻2 𝑓 𝒙𝑘 stays positive definite, the Newton’s update rule is derived as: 𝒙𝑘+1 = 𝒙𝑘 − 𝑯𝑘 −1 𝒈𝑘 • Note: – The convergence of the Newton’s method is dependent on 𝑯𝑘 staying positive definite. – A step size may be included in the Newton’s method, i.e., 𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝑘𝑯𝑘 −1 𝒈𝑘
  • 33.
    Marquardt Modification toNewton’s Method • To ensure the positive definite condition on 𝑯𝑘, Marquardt proposed the following modification to Newton’s method: 𝑯𝑘 + 𝜆𝑰 𝒅 = −𝒈𝑘 where 𝜆 is selected to ensure that the Hessian is positive definite. • Since 𝑯𝑘 + 𝜆𝑰 is also symmetric, the resulting system of linear equations can be solved for 𝒅 as: 𝑳𝑫𝑳𝑇 𝒅 = −𝛻𝑓 𝒙𝑘
  • 34.
    Newton’s Algorithm Newton’s Method(Griva, Nash, & Sofer, p. 373): 1. Initialize: Choose 𝒙0, specify 𝜖 2. For 𝑘 = 0,1, … 3. Check convergence: If 𝛻𝑓 𝒙𝑘 < 𝜖, stop 4. Factorize modified Hessian as 𝛻2 𝑓 𝒙𝑘 + 𝑬 = 𝑳𝑫𝑳𝑇 and solve 𝑳𝑫𝑳𝑇 𝒅 = −𝛻𝑓 𝒙𝑘 for 𝒅 5. Perform line search to determine 𝛼𝑘 and update the solution estimate as 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘 𝒅𝑘
  • 35.
    Rate of Convergence •Newton’s method achieves quadratic rate of convergence in the close neighborhood of the optimal point, and superlinear convergence otherwise. • The main drawback of the Newton’s method is its computational cost: the Hessian matrix needs to be computed at every step, and a linear system of equations needs to be solved to obtain the update. • Due to the high computational and storage costs, classic Newton’s method is rarely used in practice.
  • 36.
    Quasi Newton’s Methods •The quasi-Newton methods derive from a generalization of secant method, that approximates the second derivative as: 𝑓′′ (𝑥𝑘) ≅ 𝑓′ 𝑥𝑘 −𝑓′(𝑥𝑘−1) 𝑥𝑘−𝑥𝑘−1 • In the multi-dimensional case, the secant condition is generalized as: 𝑯𝑘 𝒙𝑘 − 𝒙𝑘−1 = 𝛻𝑓 𝒙𝑘 − 𝛻𝑓 𝒙𝑘−1 • Define 𝑭𝑘 = 𝑯𝑘 −1 , then 𝒙𝑘 − 𝒙𝑘−1 = 𝑭𝑘 𝛻𝑓 𝒙𝑘 − 𝛻𝑓 𝒙𝑘−1 • The quasi-Newton methods iteratively update 𝑯𝑘 or 𝑭𝑘 as: – Direct update: 𝑯𝑘+1 = 𝑯𝑘 + ∆𝑯𝑘, 𝑯0 = 𝑰 – Inverse update: 𝑭𝑘+1 = 𝑭𝑘 + ∆𝑭𝑘, 𝑭 = 𝑯−1 , 𝑭0 = 𝑰
  • 37.
    Quasi-Newton Methods • Quasi-Newtonupdate: Let 𝒔𝑘 = 𝒙𝑘+1 − 𝒙𝑘, 𝒚𝑘 = 𝛻𝑓 𝒙𝑘+1 − 𝛻𝑓 𝒙𝑘 ; then, – The DFP (Davison-Fletcher-Powell) formula for inverse Hessian update is given as: 𝑭𝑘+1 = 𝑭𝑘 − 𝑭𝑘𝒚𝑘 𝑭𝑘𝒚𝑘 𝑇 𝒚𝑘 𝑇𝑭𝑘𝒚𝑘 + 𝒔𝑘𝒔𝑘 𝑇 𝒚𝑘 𝑇𝒔𝑘 – The BGFS (Broyden, Fletcher, Goldfarb, Shanno) formula for direct Hessian update is given as: 𝑯𝑘+1 = 𝑯𝑘 − 𝑯𝑘𝒔𝑘 𝑯𝑘𝒔𝑘 𝑇 𝒔𝑘 𝑇𝑯𝑘𝒔𝑘 + 𝒚𝑘𝒚𝑘 𝑇 𝒚𝑘 𝑇𝒔𝑘
  • 38.
    Quasi-Newton Algorithm The Quasi-NewtonAlgorithm (Griva, Nash & Sofer, p.415): • Initialize: Choose 𝒙0, 𝑯0 (e.g., 𝑯0 = 𝑰), specify 𝜀 • For 𝑘 = 0,1, … – Check convergence: If 𝛻𝑓 𝒙𝑘 < 𝜀, stop – Solve 𝑯𝑘𝒅 = −𝛻𝑓 𝒙𝑘 for 𝒅𝑘 (alternatively, 𝒅 = −𝑭𝑘𝛻𝑓 𝒙𝑘 ) – Solve min 𝛼 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 for 𝛼𝑘, and update the current estimate: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘 𝒅𝑘 – Compute 𝒔𝑘, 𝒚𝑘, and update 𝑯𝑘 (or 𝑭𝑘 as applicable)
  • 39.
    Example: Quasi-Newton Method •Consider the problem: min 𝑥1,𝑥2 𝑓(𝑥1, 𝑥2) = 2𝑥1 2 − 𝑥1𝑥2 + 𝑥2 2 , where 𝑯 = 4 − 1 −1 2 , 𝛻𝑓 = 𝑯 𝑥1 𝑥2 . Let 𝒙0 = 1 1 , 𝑓0 = 4, 𝑯0 = 𝑰, 𝑭0 = 𝑰; Choose 𝒅0 = −𝛻𝑓 𝑥0 = −3 −1 ; then 𝑓 𝛼 = 2 1 − 3𝛼 2 + 1 − 𝛼 2 − (1 − 3𝛼)(1 − 𝛼), Using 𝑓′ 𝛼 = 0 → 𝛼 = 5 16 → 𝒙1 = 0.625 0.688 , 𝑓1 = 0.875; then 𝒚1 = −3.44 0.313 , 𝑭1 = 1.193 0.065 0.065 1.022 , 𝑯1 = 0.381 −0.206 −0.206 0.9313 , and using either update formula 𝒅1 = 0.4375 −1.313 ; for the next step, 𝑓 𝛼 = 5.36𝛼2 − 3.83𝛼 + 0.875 → 𝛼 = −0.3572, 𝒙2 = 0.2188 0.2188 .
  • 40.
    Example: Quasi-Newton Method •For quadratic function, convergence is achieved in two iterations.
  • 41.
    Trust-Region Methods • Thetrust-region methods locally employ a quadratic approximation 𝑞𝑘 𝒙𝑘 to the nonlinear objective function. • The approximation is valid in the neighborhood of 𝒙𝑘 defined by Ω𝑘 = 𝒙: 𝚪(𝒙 − 𝒙𝑘) ≤ ∆𝑘 , where 𝚪 is a scaling parameter. • The method aims to find a 𝒙𝑘+1 ∈ Ω𝑘, that satisfies the sufficient decrease condition in 𝑓(𝒙). • The quality of the quadratic approximation is estimated by the reliability index: 𝛾𝑘 = 𝑓(𝒙𝑘)−𝑓(𝒙𝑘+1) 𝑞𝑘 𝒙𝑘 −𝑞𝑘 𝒙𝑘+1 . If this ratio is close to unity, the trust region may be expanded in the next iteration.
  • 42.
    Trust-Region Methods • Ateach iteration 𝑘, trust-region algorithm solves a constrained optimization sub-problem involving quadratic approximation: min 𝒅 𝑞𝑘 𝒅 = 𝑓 𝒙𝑘 + 𝛻𝑓 𝒙𝑘 𝑇 𝒅 + 1 2 𝒅𝑇 𝛻2 𝑓 𝒙𝑘 𝒅 Subject to: 𝒅 ≤ ∆𝑘 Lagrangian function: ℒ 𝑥, 𝜆 = 𝑓 𝒙𝑘 + 𝜆 𝒅 − ∆𝑘 FONC: 𝛻2𝑓 𝒙𝑘 + 𝜆𝑰 𝒅𝑘 = −𝛻𝑓 𝒙𝑘 , 𝜆 𝒅 − ∆𝑘 = 0 • The resulting search direction 𝒅𝑘 is given as: 𝒅𝑘 = 𝒅𝑘(𝜆). – For large ∆𝑘 and a positive-definite 𝛻2𝑓 𝒙𝑘 , the Lagrange multiplier 𝜆 → 0, and 𝒅𝑘 (𝜆) reduces to the Newton’s direction. – For ∆𝑘→ 0, 𝜆 → ∞, and 𝒅𝑘 (𝜆) aligns with the steepest-descent direction.
  • 43.
    Trust-Region Algorithm • Trust-RegionAlgorithm (Griva, Nash & Sofer, p.392): • Initialize: choose 𝒙0, ∆0; specify 𝜀, 0 < 𝜇 < 𝜂 < 1 (e.g., 𝜇 = 1 4 ; 𝜂 = 3 4 ) • For 𝑘 = 0,1, … – Check convergence: If 𝛻𝑓 𝒙𝑘 < 𝜀, stop – Solve the subproblem: min 𝒅 𝑞𝑘 𝒅 subject to 𝒅 ≤ ∆𝑘 – Compute 𝛾𝑘, • if 𝛾𝑘 < 𝜇, set 𝒙𝑘+1 = 𝒙𝑘, ∆𝑘+1= 1 2 ∆𝑘 • else if 𝛾𝑘 < 𝜂, set 𝒙𝑘+1 = 𝒙𝑘 + 𝒅𝑘 , ∆𝑘+1= ∆𝑘 • else set 𝒙𝑘+1 = 𝒙𝑘 + 𝒅𝑘 , ∆𝑘+1= 2∆𝑘
  • 44.
    Computer Methods forConstrained Problems • Penalty and Barrier methods • Augmented Lagrangian method (AL) • Sequential linear programming (SLP) • Sequential quadratic programming (SQP)
  • 45.
    Penalty and BarrierMethods • Consider the general optimization problem: min 𝒙 𝑓 𝒙 Subject to ℎ𝑖 𝒙 = 0, 𝑖 = 1, … , 𝑝; 𝑔𝑗 𝒙 ≤ 0, 𝑗 = 𝑖, … , 𝑚; 𝑥𝑖𝐿 ≤ 𝑥𝑖 ≤ 𝑥𝑖𝑈, 𝑖 = 1, … , 𝑛. • Define a composite function to be used for constraint compliance: Φ 𝒙, 𝑟 = 𝑓 𝒙 + 𝑃 𝑔 𝒙 , ℎ 𝒙 , 𝒓 where 𝑃 defines a loss function, and 𝒓 is a vector of weights (penalty parameters)
  • 46.
    Penalty and BarrierMethods • Penalty Function Method. A penalty function method employs a quadratic loss function and iterates through the infeasible region 𝑃 𝑔 𝒙 , ℎ 𝒙 , 𝒓 = 𝑟 𝑔𝑖 + 𝒙 2 𝑖 + ℎ𝑖 𝒙 2 𝑖 𝑔𝑖 + 𝒙 = max 0, 𝑔𝑖 𝒙 , 𝑟 > 0 • Barrier Function Method. A barrier method employs a log barrier function and iterates through the feasible region 𝑃 𝑔 𝒙 , ℎ 𝒙 , 𝒓 = 1 𝑟 log −𝑔𝑖 𝑥 𝑖 • For both penalty and barrier methods, as 𝑟 → ∞, 𝒙(𝑟) → 𝒙∗
  • 47.
    The Augmented LagrangianMethod • Consider an equality-constrained problem: min 𝒙 𝑓 𝒙 Subject to: ℎ𝑖 𝒙 = 0, 𝑖 = 1, … , 𝑙 • Define the augmented Lagrangian (AL) as: 𝒫 𝒙, 𝒗, 𝑟 = 𝑓 𝒙 + 𝑣𝑗ℎ𝑗 𝒙 + 1 2 𝑟ℎ𝑗 2 𝒙 𝑗 where the additional term defines an exterior penalty function with 𝑟 as the penalty parameter. • For inequality constrained problems, the AL may be defined as: 𝒫 𝒙, 𝒖, 𝑟 = 𝑓 𝒙 + 𝑢𝑖𝑔𝑖 𝒙 + 1 2 𝑟𝑔𝑖 2 𝒙 , if 𝑔𝑗 + 𝑢𝑗 𝑟 ≥ 0 − 1 2𝑟 𝑢𝑖 2 , if 𝑔𝑗 + 𝑢𝑗 𝑟 < 0 𝑖 where a large 𝑟 makes the Hessian of AL positive definite at 𝒙.
  • 48.
    The Augmented LagrangianMethod • The dual function for the AL is defined as: 𝜓 𝒗 = min 𝒙 𝒫 𝒙, 𝒗, 𝑟 = 𝑓 𝒙 + 𝑣𝑗ℎ𝑗 𝒙 + 1 2 𝑟 ℎ𝑗 𝒙 2 𝑗 • The resulting dual optimization problem is: max 𝒗 𝜓 𝒗 • The dual problem may be solved via Newton’s method as: 𝒗𝑘+1 = 𝒗𝑘 − 𝑑2𝜓 𝑑𝑣𝑖𝑑𝑣𝑗 −1 𝒉 where 𝑑2𝜓 𝑑𝑣𝑖𝑑𝑣𝑗 = −𝛻ℎ𝑖 𝑇 𝛻2𝒫 −1𝛻ℎ𝑗 • For large 𝒓, the Newton’s update may be approximated as: 𝑣𝑗 𝑘+1 = 𝑣𝑗 𝑘 + 𝑟 𝑗ℎ𝑗, 𝑗 = 1, … , 𝑙
  • 49.
    Example: Augmented Lagrangian •Maximize the volume of a cylindrical tank subject to surface area constraint: max 𝑑,𝑙 𝑓 𝑑, 𝑙 = 𝜋𝑑2𝑙 4 , subject to ℎ: 𝜋𝑑2 4 + 𝜋𝑑𝑙 − 𝐴0 = 0 • We can normalize the problem as: min 𝑑,𝑙 𝑓 𝑑, 𝑙 = −𝑑2 𝑙, subject to ℎ: 𝑑2 + 4𝑑𝑙 − 1 = 0 • The solution to the primal problem is obtained as: Lagrangian function: ℒ 𝑑, 𝑙, 𝜆 = −𝑑2 𝑙 + 𝜆(𝑑2 + 4𝑑𝑙 − 1) FONC: 𝜆 𝑑 + 2𝑙 − 𝑑𝑙 = 0, 𝜆𝑑 𝑑 + 4 − 𝑑2 = 0, 𝑑2 + 4𝑑𝑙 − 1 = 0 Optimal solution: 𝑑∗ = 2𝑙∗ = 4𝜆∗ = 1 3 .
  • 50.
    Example: Augmented Lagrangian •Alternatively, define the Augmented Lagrangian function as: 𝒫 𝑑, 𝑙, 𝜆, 𝑟 = −𝑑2𝑙 + 𝜆 𝑑2 + 4𝑑𝑙 − 1 + 1 2 𝑟 𝑑2 + 4𝑑𝑙 − 1 2 • Define the dual function: 𝜓 𝜆 = min 𝑑,𝑙 𝒫 𝑑, 𝑙, 𝜆, 𝑟 • Define dual optimization problem: max 𝑑,𝑙 𝜓 𝜆 • Solution to the dual problem: 𝜆∗ = 𝜆𝑚𝑎𝑥 = 0.144 • Solution to the design variables: 𝑑∗ = 2𝑙∗ = 0.577
  • 51.
    Sequential Linear Programming •Consider the general optimization problem: min 𝒙 𝑓 𝒙 Subject to ℎ𝑖 𝒙 = 0, 𝑖 = 1, … , 𝑝; 𝑔𝑗 𝒙 ≤ 0, 𝑗 = 𝑖, … , 𝑚; 𝑥𝑖𝐿 ≤ 𝑥𝑖 ≤ 𝑥𝑖𝑈, 𝑖 = 1, … , 𝑛. • Let 𝒙𝑘 denote the current estimate of the design variables, and let 𝒅 denote the change in variables; define the first order expansion of the objective and constraint functions in the neighborhood of 𝒙𝑘 𝑓 𝒙𝑘 + 𝒅 = 𝑓 𝒙𝑘 + 𝛻𝑓 𝒙𝑘 𝑇 𝒅 𝑔𝑖 𝒙𝑘 + 𝒅 = 𝑔𝑖 𝒙𝑘 + 𝛻𝑔𝑖 𝒙𝑘 𝑇 𝒅, 𝑖 = 1, … , 𝑚 ℎ𝑗 𝒙𝑘 + 𝒅 = ℎ𝑗 𝒙𝑘 + 𝛻ℎ𝑗 𝒙𝑘 𝑇 𝒅, 𝑗 = 1, … , 𝑙
  • 52.
    Sequential Linear Programming •Let 𝑓𝑘 = 𝑓 𝒙𝑘 , 𝑔𝑖 𝑘 = 𝑔𝑖 𝒙𝑘 , ℎ𝑗 𝑘 = ℎ𝑗 𝒙𝑘 ; 𝑏𝑖 = −𝑔𝑖 𝑘 , 𝑒𝑗 = −ℎ𝑗 𝑘 , 𝒄 = 𝛻𝑓 𝒙𝑘 , 𝒂𝑖 = 𝛻𝑔𝑖 𝒙𝑘 , 𝒏𝑗 = 𝛻ℎ𝑗 𝒙𝑘 , 𝑨 = 𝒂1, 𝒂2, … , 𝒂𝑚 , 𝑵 = 𝒏1, 𝒏2, … , 𝒏𝑙 . • Using first order expansion, define an LP subprogram for the current iteration of the NLP problem: min 𝒅 𝑓 = 𝒄𝑇𝒅 Subject to: 𝑨𝑇 𝒅 ≤ 𝒃, 𝑵𝑇𝒅 = 𝒆 where 𝑓 represents first-order change in the cost function, and the columns of 𝑨 and 𝑵 matrices represent, respectively, the gradients of inequality and equality constraints. • The resulting LP problem can be solved via the Simplex method.
  • 53.
    Sequential Linear Programming •We may note that: – Since both positive and negative changes to design variables 𝒙𝑘 are allowed, the variables 𝑑𝑖 are unrestricted in sign – The SLP method requires additional constraints of the form: − ∆𝑖𝑙 𝑘 ≤ 𝑑𝑖 𝑘 ≤ ∆𝑖𝑢 𝑘 (termed move limits) to bind the LP solution. These limits represent maximum allowable change in 𝑑𝑖 in the current iteration and are selected as percentage of current value. – Move limits serve dual purpose of binding the solution and obviating the need for line search. – Overly restrictive move limits tend to make the SLP problem infeasible.
  • 54.
    SLP Example • Considerthe convex NLP problem: min 𝑥1,𝑥2 𝑓(𝑥1, 𝑥2) = 𝑥1 2 − 𝑥1𝑥2 + 𝑥2 2 Subject to: 1 − 𝑥1 2 − 𝑥2 2 ≤ 0; −𝑥1 ≤ 0, −𝑥2 ≤ 0 The problem has a single minimum at: 𝒙∗ = 1 2 , 1 2 • The objective and constraint gradients are: 𝛻𝑓𝑇 = 2𝑥1 − 𝑥2, 2𝑥2 − 𝑥1 , 𝛻𝑔1 𝑇 = −2𝑥1, −2𝑥2 , 𝛻𝑔2 𝑇 = −1,0 , 𝛻𝑔3 𝑇 = [0, −1]. • Let 𝒙0 = 1, 1 , then 𝑓0 = 1, 𝒄𝑇 = 1 1 , 𝑏1 = 𝑏2 = 𝑏3 = 1; 𝒂1 𝑇 = −2 − 2 , 𝒂2 𝑇 = −1 0 , 𝒂3 𝑇 = 0 − 1
  • 55.
    SLP Example • Definethe LP subproblem at the current step as: min 𝑑1,𝑑2 𝑓 𝑥1, 𝑥2 = 𝑑1 + 𝑑2 Subject to: −2 −2 −1 0 0 −1 𝑑1 𝑑2 ≤ 1 1 1 • In the absence of move limits, the LP problem is unbounded; using 50% move limits, the SLP update is given as: 𝒅∗ = − 1 2 , − 1 2 𝑇 , 𝒙1 = 1 2 , 1 2 𝑇 , with resulting constraint violation: 𝑔𝑖 = 1 2 , 0, 0 ; smaller move limits may be used to reduce the constraint violation.
  • 56.
    Sequential Linear Programming SLPAlgorithm (Arora, p. 508): • Initialize: choose 𝒙0, 𝜀1 > 0, 𝜀2 > 0. • For 𝑘 = 0,1,2, … – Choose move limits ∆𝑖𝑙 𝑘 , ∆𝑖𝑢 𝑘 as some fraction of current design 𝒙𝑘 – Compute 𝑓𝑘 , 𝒄, 𝑔𝑖 𝑘 , ℎ𝑗 𝑘 , 𝑏𝑖, 𝑒𝑗 – Formulate and solve the LP subproblem for 𝒅𝑘 – If 𝑔𝑖 ≤ 𝜀1; 𝑖 = 1, … , 𝑚; ℎ𝑗 ≤ 𝜀1; 𝑖 = 1, … , 𝑝; and 𝒅𝑘 ≤ 𝜀2, stop – Substitute 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘, 𝑘 ← 𝑘 + 1.
  • 57.
    Sequential Quadratic Programming •Sequential quadratic programming (SQP) uses a quadratic approximation to the objective function at every step of iteration. • The SQP problem is defined as: min 𝒅 𝑓 = 𝒄𝑇 𝒅 + 1 2 𝒅𝑇 𝒅 Subject to, 𝑨𝑇𝒅 ≤ 𝒃, 𝑵𝑇𝒅 = 𝒆 • SQP does not require move limits, alleviating the shortcomings of the SLP method. • The SQP problem is convex; hence, it has a single global minimum. • SQP can be solved via Simplex based linear complementarity problem (LCP) framework.
  • 58.
    Sequential Quadratic Programming •The Lagrangian function for the SQP problem is defined as: ℒ 𝒅, 𝒖, 𝒗 = 𝒄𝑇𝒅 + 1 2 𝒅𝑇𝒅 + 𝒖𝑇 𝑨𝑇𝒅 − 𝒃 + 𝒔 + 𝒗𝑇(𝑵𝑇𝒅 − 𝒆) • Then the KKT conditions are: Optimality: 𝛁ℒ = 𝒄 + 𝒅 + 𝑨𝒖 + 𝑵𝒗 = 𝟎, Feasibility: 𝑨𝑇 𝒅 + 𝒔 = 𝒃, 𝑵𝑇 𝒅 = 𝒆 , Complementarity: 𝒖𝑇 𝒔 = 𝟎, Non-negativity: 𝒖 ≥ 𝟎, 𝒔 ≥ 𝟎
  • 59.
    Sequential Quadratic Programming •Since 𝒗 is unrestricted in sign, let 𝒗 = 𝒚 − 𝒛, 𝒚 ≥ 𝟎, 𝒛 ≥ 𝟎, and the KKT conditions are compactly written as: 𝑰 𝑨 𝑨𝑇 𝟎 𝑵𝑇 𝟎 𝟎 𝑰 𝟎 𝑵 −𝑵 𝟎 𝟎 𝟎 𝟎 𝒅 𝒖 𝒔 𝒚 𝒛 = −𝒄 𝒃 𝒆 , or 𝑷𝑿 = 𝑸 • The complementary slackness conditions, 𝒖𝑇𝒔 = 𝟎, translate as: 𝑿𝑖𝑿𝑖+𝑚 = 0, 𝑖 = 𝑛 + 1, ⋯ , 𝑛 + 𝑚. • The resulting problem can be solved via Simplex method using LCP framework.
  • 60.
    Descent Function Approach •In SQP methods, the line search step is based on minimization of a descent function that penalizes constraint violations, i.e., Φ 𝒙 = 𝑓 𝒙 + 𝑅𝑉 𝒙 where 𝑓 𝒙 is the cost function, 𝑉 𝒙 represents current maximum constraint violation, and 𝑅 > 0 is a penalty parameter. • The descent function value at the current iteration is computed as: Φ𝑘 = 𝑓𝑘 + 𝑅𝑉𝑘, 𝑅 = max 𝑅𝑘, 𝑟𝑘 where 𝑟𝑘 = 𝑢𝑖 𝑘 𝑚 𝑖=1 + 𝑣𝑗 𝑘 𝑝 𝑗=1 𝑉𝑘 = max {0; 𝑔𝑖, 𝑖 = 1, . . . , 𝑚; ℎ𝑗 , 𝑗 = 1, … , 𝑝} • The line search subproblem is defined as: min 𝛼 Φ 𝛼 = Φ 𝒙𝑘 + 𝛼𝒅𝑘
  • 61.
    SQP Algorithm SQP Algorithm(Arora, p. 526): • Initialize: choose 𝒙0, 𝑅0 = 1, 𝜀1 > 0, 𝜀2 > 0. • For 𝑘 = 0,1,2, … – Compute 𝑓𝑘 , 𝑔𝑖 𝑘 , ℎ𝑗 𝑘 , 𝒄, 𝑏𝑖, 𝑒𝑗; compute 𝑉𝑘. – Formulate and solve the QP subproblem to obtain 𝒅𝑘 and the Lagrange multipliers 𝒖𝑘 and 𝒗𝑘 . – If 𝑉𝑘 ≤ 𝜀1 and 𝒅𝑘 ≤ 𝜀2, stop. – Compute 𝑅; formulate and solve line search subproblem for 𝛼 – Set 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘 , 𝑅𝑘+1 ← 𝑅, 𝑘 ← 𝑘 + 1 • The above algorithm is convergent, i.e., Φ 𝒙𝑘 ≤ Φ 𝒙0 ; 𝒙𝑘 converges to the KKT point 𝒙∗
  • 62.
    SQP with ApproximateLine Search • The SQP algorithm can use with approximate line search as follows: Let 𝑡𝑗, 𝑗 = 0,1, … denote a trial step size, 𝒙𝑘+1,𝑗 denote the trial design point, 𝑓𝑘+1,𝑗 = 𝑓( 𝒙𝑘+1,𝑗 ) denote the function value at the trial solution, and Φ𝑘+1,𝑗 = 𝑓𝑘+1,𝑗 + 𝑅𝑉𝑘+1,𝑗 is the penalty function at the trial solution. • The trial solution is required to satisfy the descent condition: Φ𝑘+1,𝑗 + 𝑡𝑗𝛾 𝒅𝑘 2 ≤ Φ𝑘,𝑗, 0 < 𝛾 < 1 where a common choice is: 𝛾 = 1 2 , 𝜇 = 1 2 , 𝑡𝑗 = 𝜇𝑗 , 𝑗 = 0,1,2, …. • The above descent condition ensures that the constraint violation decreases at each step of the method.
  • 63.
    SQP Example • Considerthe NLP problem: min 𝑥1,𝑥2 𝑓(𝑥1, 𝑥2) = 𝑥1 2 − 𝑥1𝑥2 + 𝑥2 2 subject to 𝑔1: 1 − 𝑥1 2 − 𝑥2 2 ≤ 0, 𝑔2: −𝑥1 ≤ 0, 𝑔3: −𝑥2 ≤ 0 Then 𝛻𝑓𝑇 = 2𝑥1 − 𝑥2, 2𝑥2 − 𝑥1 , 𝛻𝑔1 𝑇 = −2𝑥1, −2𝑥2 , 𝛻𝑔2 𝑇 = −1,0 , 𝛻𝑔3 𝑇 = [0, −1]. Let 𝑥0 = 1, 1 ; then, 𝑓0 = 1, 𝒄 = 1, 1 𝑇, 𝑔1 1,1 = 𝑔2 1,1 = 𝑔3 1,1 = −1. • Since all constraints are initially inactive, 𝑉0 = 0, and 𝒅 = −𝒄 = −1, −1 𝑇; the line search problem is: min 𝛼 Φ 𝛼 = 1 − 𝛼 2; • By setting Φ′ 𝛼 = 0, we get the analytical solution: 𝛼 = 1; thus 𝑥1 = 0, 0 , which results in a large constraint violation
  • 64.
    SQP Example • Alternatively,we may use approximate line search as follows: – Let 𝑅0 = 10, 𝛾 = 𝜇 = 1 2 ; let 𝑡0 = 1, then 𝒙1,0 = 0,0 , 𝑓1,0 = 0, 𝑉1,0 = 1, Φ1,0 = 10; 𝒅0 2 = 2, and the descent condition Φ1,0 + 1 2 𝒅0 2 ≤ Φ0 = 1 is not met at the trial point. – Next, for 𝑡1 = 1 2 , we get: 𝒙1,1 = 1 2 , 1 2 , 𝑓1,1 = 1 4 , V1,1 = 1 2 , Φ1,1 = 5 1 4 , and the descent condition fails again; – Next, for 𝑡2 = 1 4 , we get: 𝒙1,2 = 3 4 , 3 4 , V1,2 = 0, 𝑓1,2 = Φ1,2 = 9 16 , and the descent condition checks as: Φ1,2 + 1 8 𝒅0 2 ≤ Φ0 = 1. – Therefore, we set 𝛼 = 𝑡2 = 1 4 , 𝒙1 = 𝒙1,2 = 3 4 , 3 4 with no constraint violation.
  • 65.
    The Active SetStrategy • To reduce the computational cost of solving the QP subproblem, we may only include the active constraints in the problem. • For 𝒙𝑘 ∈ Ω, the set of potentially active constraints is defined as: ℐ𝑘 = 𝑖: 𝑔𝑖 𝑘 > −𝜀; 𝑖 = 1, … , 𝑚 ⋃ 𝑗: 𝑗 = 1, … , 𝑝 for some 𝜀. • For 𝒙𝑘 ∉ Ω, let 𝑉𝑘 = max {0; 𝑔𝑖 𝑘 , 𝑖 = 1, . . . , 𝑚; ℎ𝑗 𝑘 , 𝑗 = 1, … , 𝑝}; then, the active constraint set is defined as: ℐ𝑘 = 𝑖: 𝑔𝑖 𝑘 > 𝑉𝑘 − 𝜀; 𝑖 = 1, … , 𝑚 ⋃ 𝑗: ℎ𝑗 𝑘 > 𝑉𝑘 − 𝜀; 𝑗 = 1, … , 𝑝 • The gradients of inactive constraints, i.e., those not in ℐ𝑘, do not need to be computed
  • 66.
    SQP via Newton’sMethod • Consider the following equality constrained problem: min 𝒙 𝑓(𝒙), subject to ℎ𝑖 𝒙 = 0, 𝑖 = 1, … , 𝑙 • The Lagrangian function is given as: ℒ 𝒙, 𝒗 = 𝑓 𝒙 + 𝒗𝑇𝒉(𝒙) • The KKT conditions are: 𝛻ℒ 𝒙, 𝒗 = 𝛻𝑓 𝒙 + 𝑵𝒗 = 𝟎, 𝒉 𝒙 = 𝟎 where 𝑵 = 𝛁𝒉(𝒙) is a Jacobian matrix whose 𝑖th column is 𝛻ℎ𝑖 𝒙 • Using first order Taylor series expansion (with shorthand notation): 𝛻ℒ𝑘+1 = 𝛻ℒ𝑘 + 𝛻2ℒ𝑘Δ𝒙 + 𝑁Δ𝒗 𝒉𝑘+1 = 𝒉𝑘 + 𝑵𝑇Δ𝒙 • By expanding Δ𝒗 = 𝒗𝑘+1 − 𝒗𝑘 , 𝛻ℒ𝑘 = 𝛻𝑓𝑘 + 𝑵𝒗𝑘 , and assuming 𝒗𝑘 ≅ 𝒗𝑘+1 we obtain: 𝛻2 ℒ𝑘 𝑵 𝑵𝑇 𝟎 Δ𝒙𝑘 𝒗𝑘+1 = − 𝛻𝑓𝑘 𝒉𝑘 which is similar to N-R update, but uses Hessian of the Lagrangian
  • 67.
SQP via Newton's Method
• Alternatively, we may consider minimizing the quadratic approximation:
minΔ𝒙 (1/2)Δ𝒙ᵀ𝛻²ℒΔ𝒙 + 𝛻𝑓ᵀΔ𝒙
subject to: ℎ𝑖(𝒙) + 𝒏𝑖ᵀΔ𝒙 = 0, 𝑖 = 1, …, 𝑙
• The KKT conditions are: 𝛻𝑓 + 𝛻²ℒΔ𝒙 + 𝑵𝒗 = 𝟎, 𝒉 + 𝑵ᵀΔ𝒙 = 𝟎
• Thus the QP subproblem can be solved via Newton's method:
[𝛻²ℒ𝑘 𝑵; 𝑵ᵀ 𝟎] [Δ𝒙𝑘; 𝒗𝑘+1] = −[𝛻𝑓𝑘; 𝒉𝑘]
• The Hessian of the Lagrangian can be updated via the BFGS method as:
𝑯𝑘+1 = 𝑯𝑘 + 𝑫𝑘 − 𝑬𝑘
where 𝑫𝑘 = 𝒚𝑘𝒚𝑘ᵀ/𝒚𝑘ᵀΔ𝒙𝑘, 𝑬𝑘 = 𝒄𝑘𝒄𝑘ᵀ/𝒄𝑘ᵀΔ𝒙𝑘, 𝒄𝑘 = 𝑯𝑘Δ𝒙𝑘, 𝒚𝑘 = 𝛻ℒ𝑘+1 − 𝛻ℒ𝑘
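The BFGS update of the Lagrangian Hessian can be sketched as follows (illustrative; in practice the update is usually skipped when 𝒚ᵀΔ𝒙 ≤ 0 to preserve positive definiteness):

```python
import numpy as np

def bfgs_update(H, dx, y):
    """H_{k+1} = H_k + D_k - E_k, with D_k = y yᵀ/(yᵀΔx), E_k = c cᵀ/(cᵀΔx),
    c = H Δx. A sketch; assumes yᵀΔx > 0 and cᵀΔx > 0."""
    c = H @ dx
    D = np.outer(y, y) / (y @ dx)
    E = np.outer(c, c) / (c @ dx)
    return H + D - E

# The update enforces the secant condition H_{k+1} Δx = y:
H1 = bfgs_update(np.eye(2), np.array([1.0, 0.5]), np.array([2.0, 1.0]))
print(H1 @ np.array([1.0, 0.5]))   # [2. 1.]
```

By construction 𝑯𝑘+1Δ𝒙𝑘 = 𝒄𝑘 + 𝒚𝑘 − 𝒄𝑘 = 𝒚𝑘, i.e., the updated Hessian reproduces the observed gradient change along the step.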
Example: SQP with Hessian Update
• Consider the NLP problem:
min 𝑓(𝑥1, 𝑥2) = 𝑥1² − 𝑥1𝑥2 + 𝑥2²
subject to 𝑔1: 1 − 𝑥1² − 𝑥2² ≤ 0, 𝑔2: −𝑥1 ≤ 0, 𝑔3: −𝑥2 ≤ 0
• Let 𝒙0 = (1, 1); then 𝑓0 = 1, 𝒄 = (1, 1)ᵀ, 𝑔1(1,1) = 𝑔2(1,1) = 𝑔3(1,1) = −1; 𝛻𝑔1ᵀ = (−2, −2), 𝛻𝑔2ᵀ = (−1, 0), 𝛻𝑔3ᵀ = (0, −1).
• Using approximate line search, 𝛼 = 1/4, 𝒙1 = (3/4, 3/4).
• For the Hessian update, we have: 𝑓1 = 0.5625, 𝑔1 = −0.125, 𝑔2 = 𝑔3 = −0.75; 𝒄1 = (0.75, 0.75); 𝛻𝑔1ᵀ = (−3/2, −3/2), 𝛻𝑔2ᵀ = (−1, 0), 𝛻𝑔3ᵀ = (0, −1); Δ𝒙0 = (−0.25, −0.25); then
𝑫0 = (1/2)[1 1; 1 1] = 𝑬0, so that 𝑯1 = 𝑯0.
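The update step can be checked numerically (a sketch; it assumes the QP multipliers at 𝒙0 are zero, so that 𝛻ℒ = 𝛻𝑓):

```python
import numpy as np

# Numerical check of the Hessian update in the example, with
# ∇L = ∇f = (2*x1 - x2, 2*x2 - x1)
grad_f = lambda x: np.array([2*x[0] - x[1], 2*x[1] - x[0]])

x0, x1, H0 = np.array([1.0, 1.0]), np.array([0.75, 0.75]), np.eye(2)
dx = x1 - x0                        # Δx0 = (-0.25, -0.25)
y  = grad_f(x1) - grad_f(x0)        # y0 = ∇L1 - ∇L0 = (-0.25, -0.25)
c  = H0 @ dx                        # c0 = H0 Δx0
D  = np.outer(y, y) / (y @ dx)      # D0 = (1/2)[[1, 1], [1, 1]]
E  = np.outer(c, c) / (c @ dx)      # E0 = D0
print(np.allclose(H0 + D - E, H0))  # True: H1 = H0
```

Since 𝑯0 = 𝑰 and 𝒚0 = Δ𝒙0 here, the two correction terms coincide and the estimate is unchanged, as stated on the slide.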
SQP with Hessian Update
• For the next step, the QP problem is defined as:
min 𝑓 = (3/4)(𝑑1 + 𝑑2) + (1/2)(𝑑1² + 𝑑2²)
subject to: −(3/2)(𝑑1 + 𝑑2) ≤ 0, −𝑑1 ≤ 0, −𝑑2 ≤ 0
• The application of the KKT conditions results in a linear system of equations, which is solved to obtain:
𝒙ᵀ = (𝑑1, 𝑑2, 𝑢1, 𝑢2, 𝑢3, 𝑠1, 𝑠2, 𝑠3) = (0.188, 0.188, 0, 0, 0, 0.125, 0.75, 0.75)
Modified SQP Algorithm (Arora, p. 558)
• Initialize: choose 𝒙0, 𝑅0 = 1, 𝑯0 = 𝑰; 𝜀1, 𝜀2 > 0.
• For 𝑘 = 0, 1, 2, …
– Compute 𝑓𝑘, 𝑔𝑖(𝒙𝑘), ℎ𝑗(𝒙𝑘), 𝒄, 𝑏𝑖, 𝑒𝑗, and 𝑉𝑘. If 𝑘 > 0, compute 𝑯𝑘.
– Formulate and solve the modified QP subproblem for the search direction 𝒅𝑘 and the Lagrange multipliers 𝒖𝑘 and 𝒗𝑘.
– If 𝑉𝑘 ≤ 𝜀1 and ‖𝒅𝑘‖ ≤ 𝜀2, stop.
– Compute 𝑅; formulate and solve the line search subproblem for 𝛼.
– Set 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘, 𝑅𝑘+1 ← 𝑅, 𝑘 ← 𝑘 + 1.
SQP Algorithm
% SQP subproblem via Hessian update
% input: xk (current design); Lk (Hessian of Lagrangian estimate)
% initialize
n=size(xk,1);
if ~exist('Lk','var'), Lk=diag(xk+(~xk)); end  % default: diagonal estimate
tol=1e-7;
% function and constraint values
fk=f(xk); dfk=df(xk); gk=g(xk); dgk=dg(xk);
% N-R update: [Lk dgk; dgk' 0]*[dxk; lam] = -[dfk; gk]
A=[Lk dgk; dgk' 0*(dgk'*dgk)];
b=[-dfk; -gk];
dx=A\b; dxk=dx(1:n); lam=dx(n+1:end);
SQP Algorithm
% inactive constraints
idx1=find(lam<0);
if ~isempty(idx1)
  [dxk,lam]=inactive(lam,A,b,n);
end
% check termination
if abs(dxk)<tol, return, end
% adjust increment for constraint compliance
P=@(xk) f(xk)+lam'*abs(g(xk));
while P(xk+dxk)>P(xk)
  dxk=dxk/2;
  if abs(dxk)<tol, break, end
end
% Hessian update
dL=@(x) df(x)+dg(x)*lam;
Lk=update(Lk, xk, dxk, dL);
xk=xk+dxk;
disp([xk' f(xk) P(xk)])
SQP Algorithm
% function definitions
function [dxk,lam]=inactive(lam,A,b,n)
  % drop constraints with negative multipliers and re-solve
  idx1=find(lam<0); lam(idx1)=0;
  idx2=find(lam);
  v=[1:n, n+idx2'];
  A=A(v,v); b=b(v);
  dx=A\b; dxk=dx(1:n);
  lam(idx2)=dx(n+1:end);
end
function Lk=update(Lk, xk, dxk, dL)
  % BFGS update of the Lagrangian Hessian estimate
  ga=dL(xk+dxk)-dL(xk);
  Hx=Lk*dxk;
  Dk=ga*ga'/(ga'*dxk);
  Ek=Hx*Hx'/(Hx'*dxk);
  Lk=Lk+Dk-Ek;
end
Generalized Reduced Gradient
• The GRG method finds the search direction by projecting the objective function gradient onto the constraint hyperplane.
• The resulting direction is tangent to the constraint hyperplane, so that the iterative steps tend to conform to the constraints.
• The constraints are effectively used to implicitly eliminate variables and reduce the problem dimension.
Implicit Elimination
• Consider an equality constrained problem in two variables:
Objective: min 𝑓(𝒙), 𝒙ᵀ = (𝑥1, 𝑥2)
Subject to: 𝑔(𝒙) = 0
• The variations in the objective and constraint functions are:
𝑑𝑓 = 𝛻𝑓ᵀ𝑑𝒙 = (∂𝑓/∂𝑥1)𝑑𝑥1 + (∂𝑓/∂𝑥2)𝑑𝑥2
𝑑𝑔 = 𝛻𝑔ᵀ𝑑𝒙 = (∂𝑔/∂𝑥1)𝑑𝑥1 + (∂𝑔/∂𝑥2)𝑑𝑥2 = 0
• Solve 𝑑𝑔 = 0 for 𝑑𝑥2 = −[(∂𝑔/∂𝑥1)/(∂𝑔/∂𝑥2)]𝑑𝑥1 and substitute into the objective function:
𝑑𝑓 = [∂𝑓/∂𝑥1 − (∂𝑓/∂𝑥2)(∂𝑔/∂𝑥1)/(∂𝑔/∂𝑥2)]𝑑𝑥1
• Then the reduced gradient of 𝑓 along 𝑥1 is given as:
𝛻𝑓𝑅 = ∂𝑓/∂𝑥1 − (∂𝑓/∂𝑥2)(∂𝑔/∂𝑥1)/(∂𝑔/∂𝑥2)
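A minimal Python sketch of this two-variable reduced gradient (the helper and the circle example are illustrative assumptions, not from the slides):

```python
def reduced_gradient_2d(df, dg, x):
    """Reduced gradient of f along x1 for one equality constraint g(x) = 0.
    df, dg return the gradient tuples (∂/∂x1, ∂/∂x2) at x."""
    f1, f2 = df(x)
    g1, g2 = dg(x)
    return f1 - f2 * g1 / g2   # ∂f/∂x1 - (∂f/∂x2)(∂g/∂x1)/(∂g/∂x2)

# On the circle g = x1^2 + x2^2 - 2 = 0 with f = x1 + x2, the point (1, 1)
# is a constrained stationary point, so the reduced gradient vanishes there:
print(reduced_gradient_2d(lambda x: (1.0, 1.0),
                          lambda x: (2*x[0], 2*x[1]), (1.0, 1.0)))   # 0.0
```

A zero reduced gradient means no feasible first-order descent direction exists along the eliminated coordinate.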
Implicit Elimination
• Consider a problem in 𝑛 variables with 𝑚 equality constraints:
Objective: min 𝑓(𝒙), 𝒙ᵀ = (𝑥1, 𝑥2, …, 𝑥𝑛)
Subject to: 𝑔𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑚
• We define 𝑚 basic variables in terms of 𝑛 − 𝑚 nonbasic variables; let 𝒙ᵀ = (𝒚ᵀ, 𝒛ᵀ), where 𝒚 are basic and 𝒛 are nonbasic.
• The gradient vector is partitioned as: 𝛻𝑓ᵀ = (𝛻𝒚𝑓ᵀ, 𝛻𝒛𝑓ᵀ).
• The variations in the objective and constraint functions are:
𝑑𝑓 = 𝛻𝒚𝑓ᵀ𝑑𝒚 + 𝛻𝒛𝑓ᵀ𝑑𝒛
𝑑𝒈 = (∂𝝍/∂𝒚)𝑑𝒚 + (∂𝝍/∂𝒛)𝑑𝒛 = 𝟎
where the matrices of partial derivatives are defined as: [∂𝝍/∂𝒚]𝑖𝑗 = ∂𝑔𝑖/∂𝑦𝑗; [∂𝝍/∂𝒛]𝑖𝑗 = ∂𝑔𝑖/∂𝑧𝑗
Generalized Reduced Gradient
• Since ∂𝝍/∂𝒚 is a square 𝑚 × 𝑚 matrix, we may solve for 𝑑𝒚 as:
𝑑𝒚 = −(∂𝝍/∂𝒚)⁻¹(∂𝝍/∂𝒛)𝑑𝒛
and substitute into 𝑑𝑓 to obtain:
𝑑𝑓 = 𝛻𝒛𝑓ᵀ𝑑𝒛 − 𝛻𝒚𝑓ᵀ(∂𝝍/∂𝒚)⁻¹(∂𝝍/∂𝒛)𝑑𝒛
• Then the reduced gradient 𝛻𝑓𝑅 is defined as:
𝛻𝑓𝑅ᵀ = 𝛻𝒛𝑓ᵀ − 𝛻𝒚𝑓ᵀ(∂𝝍/∂𝒚)⁻¹(∂𝝍/∂𝒛)
• Next, we choose the negative of 𝛻𝑓𝑅 as the search direction and perform a line search to determine the step size; then
Δ𝒛 = −𝛼𝛻𝑓𝑅, Δ𝒚 = −(∂𝝍/∂𝒚)⁻¹(∂𝝍/∂𝒛)Δ𝒛
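A sketch of the reduced gradient formula in Python, using a linear solve in place of the explicit inverse (the test values come from the first iteration of the equality-constrained example a few slides below: 𝒚 = 𝑥2, 𝒛 = 𝑥1, ∂𝝍/∂𝒚 = −1, ∂𝝍/∂𝒛 = −2, 𝛻𝒚𝑓 = 3, 𝛻𝒛𝑓 = −1):

```python
import numpy as np

def reduced_gradient(df_y, df_z, dpsi_dy, dpsi_dz):
    """∇f_Rᵀ = ∇f_zᵀ - ∇f_yᵀ(∂ψ/∂y)⁻¹(∂ψ/∂z), computed via a linear solve
    rather than an explicit matrix inverse."""
    return df_z - dpsi_dz.T @ np.linalg.solve(dpsi_dy.T, df_y)

print(reduced_gradient(np.array([3.0]), np.array([-1.0]),
                       np.array([[-1.0]]), np.array([[-2.0]])))   # [-7.]
```

The result −7 matches the reduced gradient computed by hand in that example.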
GRG Algorithm
• Initialize: choose 𝒙0; evaluate the objective function and constraints; convert binding inequality constraints to equality constraints.
• Partition the variables into 𝑚 basic and 𝑛 − 𝑚 nonbasic ones, e.g., choose the first 𝑚 values, or the 𝑚 highest values, as basic variables.
• Compute 𝛻𝑓𝑅 along the nonbasic variables. If 𝛻𝑓𝑅 = 𝟎, exit.
• Set Δ𝒛 = −𝛻𝑓𝑅/‖𝛻𝑓𝑅‖, Δ𝒚 = −(∂𝝍/∂𝒚)⁻¹(∂𝝍/∂𝒛)Δ𝒛.
• Perform a line search along Δ𝒙 to obtain 𝛼.
• Check feasibility at 𝒙𝑘 + 𝛼Δ𝒙. If necessary, use Newton–Raphson iterations to adjust Δ𝒚 as:
Δ𝒚𝑘+1 = Δ𝒚𝑘 − (∂𝝍/∂𝒚)⁻¹𝒈𝑘
• Update: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼Δ𝒙
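The loop can be sketched in Python on the equality-constrained example of the next slide (a sketch with assumed simplifications: a fixed partition 𝒚 = 𝑥2, 𝒛 = 𝑥1, a fixed step 0.1 in place of a line search, and exact restoration 𝑥2 = 𝑥1² − 1 in place of Newton–Raphson iterations):

```python
import numpy as np

# min f = 3*x1 + 2*x2 + 2*x1^2 - x1*x2 + 1.5*x2^2, s.t. g = x1^2 - x2 - 1 = 0
f  = lambda x1, x2: 3*x1 + 2*x2 + 2*x1**2 - x1*x2 + 1.5*x2**2
df = lambda x1, x2: np.array([3 + 4*x1 - x2, 2 - x1 + 3*x2])  # (∂f/∂x1, ∂f/∂x2)

x1, x2 = -1.0, 0.0
for _ in range(200):
    dfz, dfy = df(x1, x2)                 # z = x1 (nonbasic), y = x2 (basic)
    gR = dfz - dfy * (2*x1) / (-1.0)      # reduced gradient; ∂ψ/∂y = -1, ∂ψ/∂z = 2*x1
    if abs(gR) < 1e-9:
        break
    x1 = x1 - 0.1 * gR                    # fixed small step along -∇f_R
    x2 = x1**2 - 1                        # restoration: solve g = 0 for the basic var
print(round(x1, 3), round(x2, 3), round(f(x1, x2), 3))   # -0.634 -0.598 -2.137
```

The iterates settle at 𝒙* ≈ (−0.634, −0.598) with 𝑓* ≈ −2.137, matching the optimum reported on the next slide.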
Generalized Reduced Gradient
• Consider an equality constrained problem:
Objective: min 𝑓(𝒙) = 3𝑥1 + 2𝑥2 + 2𝑥1² − 𝑥1𝑥2 + 1.5𝑥2²
Subject to: 𝑔(𝒙) = 𝑥1² − 𝑥2 − 1 = 0
• Let 𝒙0 = (−1, 0); then 𝑓0 = −1, 𝛻𝑓0 = (−1, 3), 𝑔0 = 0, 𝛻𝑔0 = (−2, −1).
• Let 𝒚 = 𝑥2 on the first iteration; then 𝛻𝑓𝑅 = −1 − 3·(−1)⁻¹(−2) = −7.
• Let Δ𝒛 = 1; then Δ𝒚 = −(−1)⁻¹(−2)(1) = −2. By doing a line search along Δ𝒙 = (0.333, −0.667) and restoring feasibility, we obtain 𝒙1 = (−0.650, −0.577), 𝑓1 = −2.13.
• The optimum is reached in three iterations: 𝒙* = (−0.634, −0.598), 𝑓(𝒙*) = −2.137.
Generalized Reduced Gradient
• Consider an inequality constrained problem:
Objective: min 𝑓(𝒙) = 𝑥1² + 𝑥2
Subject to: 𝑔1(𝒙) = 𝑥1² + 𝑥2² − 9 ≤ 0, 𝑔2(𝒙) = 𝑥1 + 𝑥2 − 1 ≤ 0
• Add slack variables to the inequality constraints:
𝑔1(𝒙) = 𝑥1² + 𝑥2² − 9 + 𝑠1 = 0, 𝑔2(𝒙) = 𝑥1 + 𝑥2 − 1 + 𝑠2 = 0
Then 𝛻𝑓 = (2𝑥1, 1); 𝛻𝑔1 = (2𝑥1, 2𝑥2); 𝛻𝑔2 = (1, 1)
• Let 𝒙0 = (2.56, −1.56); then 𝑓0 = 4.99, 𝛻𝑓0 = (5.12, 1), 𝒈0 = (−0.013, 0)
• Since 𝑔2 is binding, add 𝑠2 to the variables: 𝛻𝑓0 = (5.12, 1, 0), 𝛻𝑔2 = (1, 1, 1)
Generalized Reduced Gradient
• Let 𝑦 = 𝑥1, 𝒛 = (𝑥2, 𝑠2); then 𝛻𝑦𝑓 = 5.12, 𝛻𝒛𝑓 = (1, 0), ∂𝑔2/∂𝑦 = 1, ∂𝑔2/∂𝒛 = (1, 1); therefore
𝛻𝑓𝑅ᵀ = (1, 0) − 5.12·(1, 1) = (−4.12, −5.12)
• Let Δ𝒛 = −𝛻𝑓𝑅 and Δ𝑦 = −[1 1]Δ𝒛 = −9.24; then Δ𝒙 = (−9.24, 4.12) and 𝒔0 = Δ𝒙/‖Δ𝒙‖. Suppose we limit the maximum step size to 𝛼 ≤ 0.5; then 𝒙1 = 𝒙0 + 0.5𝒔0 = (2.103, −1.356) with 𝑓1 = 3.068. There are no constraint violations, hence the first iteration is complete.
• After seven iterations: 𝒙7 = (0.003, −3.0) with 𝑓7 = −3.0
• The optimum is at: 𝒙* = (0.0, −3.0) with 𝑓* = −3.0
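The first iteration above can be reproduced numerically (a sketch; the slack component of Δ𝒛 is dropped when forming the step in (𝑥1, 𝑥2)):

```python
import numpy as np

# First GRG iteration of the example: y = x1, z = (x2, s2), g2 binding
x0 = np.array([2.56, -1.56])
df_y, df_z = 5.12, np.array([1.0, 0.0])   # ∇f split over (y, z)
dg_y, dg_z = 1.0, np.array([1.0, 1.0])    # ∇g2 split over (y, z)
gR = df_z - dg_z * (df_y / dg_y)          # reduced gradient = (-4.12, -5.12)
dz = -gR
dy = -(dg_z @ dz) / dg_y                  # = -9.24
dx = np.array([dy, dz[0]])                # step in (x1, x2); slack term dropped
s0 = dx / np.linalg.norm(dx)              # normalized search direction
x1 = x0 + 0.5 * s0                        # step size capped at 0.5
f1 = x1[0]**2 + x1[1]
print(np.round(x1, 3), round(f1, 3))      # x1 ≈ (2.103, -1.356), f1 ≈ 3.068
```

The computed point and function value agree with the slide to three decimals.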
GRG for LP Problems
• Consider an LP problem: min 𝑓(𝒙) = 𝒄ᵀ𝒙
Subject to: 𝑨𝒙 = 𝒃, 𝒙 ≥ 𝟎
• Let 𝒙 be partitioned into 𝑚 basic and 𝑛 − 𝑚 nonbasic variables: 𝒙ᵀ = (𝒚ᵀ, 𝒛ᵀ).
• The objective function is partitioned as: 𝑓(𝒙) = 𝒄𝑦ᵀ𝒚 + 𝒄𝑧ᵀ𝒛
• The constraints are partitioned as: 𝑩𝒚 + 𝑵𝒛 = 𝒃, 𝒚 ≥ 𝟎, 𝒛 ≥ 𝟎; then 𝒚 = 𝑩⁻¹𝒃 − 𝑩⁻¹𝑵𝒛
• The objective function in terms of the independent variables is:
𝑓(𝒛) = 𝒄𝑦ᵀ𝑩⁻¹𝒃 + (𝒄𝑧ᵀ − 𝒄𝑦ᵀ𝑩⁻¹𝑵)𝒛
• The reduced costs for the nonbasic variables are given as:
𝒓𝑐ᵀ = 𝒄𝑧ᵀ − 𝒄𝑦ᵀ𝑩⁻¹𝑵, or 𝒓𝑐ᵀ = 𝒄𝑧ᵀ − 𝝀ᵀ𝑵 with 𝝀ᵀ = 𝒄𝑦ᵀ𝑩⁻¹
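The reduced-cost computation can be sketched on a small hypothetical LP (the data are assumptions; with a slack basis 𝑩 = 𝑰, 𝝀 = 𝟎 and the reduced costs equal the nonbasic objective coefficients):

```python
import numpy as np

# Hypothetical LP: min c^T x, A x = b, x >= 0, with slacks in the basis
c = np.array([-3.0, -2.0, 0.0, 0.0])
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
basic, nonbasic = [2, 3], [0, 1]
B, N = A[:, basic], A[:, nonbasic]
lam = np.linalg.solve(B.T, c[basic])   # λᵀ = c_yᵀ B⁻¹, via a transposed solve
rc = c[nonbasic] - N.T @ lam           # reduced costs r_cᵀ = c_zᵀ - λᵀ N
print(rc)                              # [-3. -2.]: both nonbasic variables improve f
```

Negative reduced costs identify the nonbasic variables whose increase decreases the objective, exactly as in the simplex method.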
GRG for LP Problems
• Using tableau notation, the reduced costs are computed as:
[𝑩 𝑵 𝒃; 𝒄𝑦ᵀ 𝒄𝑧ᵀ 0] → [𝑰 𝑩⁻¹𝑵 𝑩⁻¹𝒃; 𝟎ᵀ 𝒓𝑐ᵀ −𝒄𝑦ᵀ𝑩⁻¹𝒃]
• The objective function variation is given as: 𝑑𝑓 = 𝛻𝒚𝑓ᵀ𝑑𝒚 + 𝛻𝒛𝑓ᵀ𝑑𝒛
• The reduced gradient along the constraint surface is given as:
𝛻𝑓𝑅ᵀ = 𝛻𝒛𝑓ᵀ − 𝛻𝒚𝑓ᵀ𝑩⁻¹𝑵 = 𝒓𝑐ᵀ
GRG Algorithm for LP Problems
1. Choose the largest 𝑚 components of 𝒙 as basic variables.
2. Compute the reduced gradient: 𝛻𝑓𝑅ᵀ = 𝒓𝑐ᵀ.
3. Let Δ𝑧𝑖 = −𝑟𝑖 if 𝑟𝑖 ≤ 0, and Δ𝑧𝑖 = −𝑥𝑖𝑟𝑖 if 𝑟𝑖 > 0.
4. If Δ𝒛 = 𝟎, stop; otherwise set Δ𝒚 = −𝑩⁻¹𝑵Δ𝒛.
5. Compute the step size: let 𝛼1 = max{𝛼: 𝒚 + 𝛼Δ𝒚 ≥ 𝟎, 𝒛 + 𝛼Δ𝒛 ≥ 𝟎}, 𝛼2 = arg min𝛼 𝑓(𝒙 + 𝛼Δ𝒙), 𝛼 = min{𝛼1, 𝛼2}.
6. Update: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼Δ𝒙.
7. If 𝛼2 ≥ 𝛼1, update 𝑩, 𝑵 (use pivoting).
8. Return to step 1.
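Steps 3–4 can be sketched as follows (the reduced costs, nonbasic values, and basis data are hypothetical; note the sign convention Δ𝒚 = −𝑩⁻¹𝑵Δ𝒛):

```python
import numpy as np

# Step 3: direction in the nonbasic variables from reduced costs rc and values z
rc = np.array([-3.0, 2.0])
z  = np.array([0.0, 0.5])
dz = np.where(rc <= 0, -rc, -z * rc)   # Δz_i = -r_i if r_i <= 0, else -x_i r_i

# Step 4: corresponding change in the basic variables
B = np.array([[1.0, 0.0], [0.0, 2.0]])
N = np.array([[1.0, 1.0], [1.0, 0.0]])
dy = -np.linalg.solve(B, N @ dz)       # Δy = -B⁻¹NΔz
print(dz, dy)                          # Δz = (3, -1), Δy = (-2, -1.5)
```

Scaling Δ𝑧𝑖 by 𝑥𝑖 when 𝑟𝑖 > 0 drives nonbasic variables that are already at zero no further negative, keeping the step consistent with 𝒙 ≥ 𝟎.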