Optimization Methods
in Engineering Design
Day-5
Course Materials
• Arora, Introduction to Optimum Design, 3e, Elsevier (https://www.researchgate.net/publication/273120102_Introduction_to_Optimum_design)
• Parkinson, Optimization Methods for Engineering Design, Brigham Young University (http://apmonitor.com/me575/index.php/Main/BookChapters)
• Iqbal, Fundamental Engineering Optimization Methods, BookBoon (https://bookboon.com/en/fundamental-engineering-optimization-methods-ebook)
Numerical Optimization
• Consider an unconstrained NLP problem: min𝒙 𝑓(𝒙)
• Use an iterative method to solve the problem: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒅𝑘, where 𝒅𝑘 is a search direction and 𝛼𝑘 is the step size, such that the function value decreases at each step, i.e., 𝑓(𝒙𝑘+1) < 𝑓(𝒙𝑘)
• We expect lim𝑘→∞ 𝒙𝑘 = 𝒙∗
• The general iterative method is a two-step process:
– Finding a suitable search direction 𝒅𝑘 along which the function value locally decreases and any constraints are obeyed.
– Performing a line search along 𝒅𝑘 to find 𝒙𝑘+1 such that 𝑓(𝒙𝑘+1) attains its minimum value.
The Iterative Method
• Iterative algorithm:
1. Initialize: choose 𝒙0
2. Check termination: 𝛻𝑓(𝒙𝑘) ≅ 𝟎
3. Find a suitable search direction 𝒅𝑘 that obeys the descent condition: 𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘 < 0
4. Search along 𝒅𝑘 to find where 𝑓(𝒙𝑘+1) attains its minimum value (the line search problem)
5. Return to step 2
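The five steps above can be sketched as a minimal, generic descent loop. The quadratic test function, the steepest-descent direction rule, and the simple step-halving line search below are illustrative choices, not part of the algorithm statement:

```python
import math

def descent(f, grad, x0, direction, tol=1e-6, max_iter=5000):
    """Generic iteration x_{k+1} = x_k + alpha_k * d_k (steps 1-5 above)."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:   # step 2: grad ~ 0
            break
        d = direction(x, g)                             # step 3: search direction
        alpha, fx = 1.0, f(x)                           # step 4: crude halving search
        while f([xi + alpha * di for xi, di in zip(x, d)]) >= fx and alpha > 1e-12:
            alpha *= 0.5
        x = [xi + alpha * di for xi, di in zip(x, d)]   # step 5: repeat
    return x

# Illustrative run: f(x) = 0.1*x1^2 + x2^2 with the steepest-descent rule
f = lambda x: 0.1 * x[0] ** 2 + x[1] ** 2
grad = lambda x: [0.2 * x[0], 2.0 * x[1]]
steepest = lambda x, g: [-gi for gi in g]   # satisfies the descent condition
xs = descent(f, grad, [5.0, 1.0], steepest)
```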
The Line Search Problem
• Assuming a suitable search direction 𝒅𝑘 has been determined, we seek a step length 𝛼𝑘 that minimizes 𝑓(𝒙𝑘+1).
• Assuming 𝒙𝑘 and 𝒅𝑘 are known, the projected function value along 𝒅𝑘 is expressed as: 𝑓(𝒙𝑘 + 𝛼𝑘𝒅𝑘) = 𝑓(𝒙𝑘 + 𝛼𝒅𝑘) = 𝑓(𝛼)
• The line search problem to choose 𝛼 to minimize 𝑓(𝒙𝑘+1) along 𝒅𝑘 is defined as: min𝛼 𝑓(𝛼) = 𝑓(𝒙𝑘 + 𝛼𝒅𝑘)
• Assuming that a solution exists, it is found by setting 𝑓′(𝛼) = 0.
Example: Quadratic Function
• Consider minimizing a quadratic function:
𝑓(𝒙) = ½𝒙𝑇𝑨𝒙 − 𝒃𝑇𝒙, 𝛻𝑓 = 𝑨𝒙 − 𝒃
• Given a descent direction 𝒅, the line search problem is defined as:
min𝛼 𝑓(𝛼) = ½(𝒙𝑘 + 𝛼𝒅)𝑇𝑨(𝒙𝑘 + 𝛼𝒅) − 𝒃𝑇(𝒙𝑘 + 𝛼𝒅)
• A solution is found by setting 𝑓′(𝛼) = 0, where
𝑓′(𝛼) = 𝒅𝑇𝑨(𝒙𝑘 + 𝛼𝒅) − 𝒅𝑇𝒃 = 0
𝛼 = −𝒅𝑇(𝑨𝒙𝑘 − 𝒃)/(𝒅𝑇𝑨𝒅) = −𝛻𝑓(𝒙𝑘)𝑇𝒅/(𝒅𝑇𝑨𝒅)
• Finally, 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝒅.
Computer Methods for Line Search Problem
• Interval reduction methods
– Golden search
– Fibonacci search
• Approximate search methods
– Armijo's rule
– Quadratic curve fitting
Interval Reduction Methods
• The interval reduction methods find the minimum of a unimodal
function in two steps:
– Bracketing the minimum to an interval
– Reducing the interval to desired accuracy
• The bracketing step aims to find a three-point pattern, such that for 𝑥1, 𝑥2, 𝑥3: 𝑓(𝑥1) ≥ 𝑓(𝑥2) < 𝑓(𝑥3).
Fibonacci’s Method
• The Fibonacci method uses Fibonacci numbers to achieve maximum interval reduction in a given number of steps.
• The Fibonacci number sequence is generated as: 𝐹0 = 𝐹1 = 1, 𝐹𝑖 = 𝐹𝑖−1 + 𝐹𝑖−2, 𝑖 ≥ 2.
• The properties of Fibonacci numbers include:
– They achieve the golden ratio 𝜏 = lim𝑛→∞ 𝐹𝑛−1/𝐹𝑛 = (√5 − 1)/2 ≅ 0.618034
– The number of interval reductions 𝑛 required to achieve a desired accuracy 𝜀 (where 1/𝐹𝑛 < 𝜀) is specified in advance.
– For given 𝐼1 and 𝑛, 𝐼2 = (𝐹𝑛−1/𝐹𝑛)𝐼1, 𝐼3 = 𝐼1 − 𝐼2, 𝐼4 = 𝐼2 − 𝐼3, etc.
The Golden Section Method
• The golden section method uses the golden ratio: 𝜏 = 0.618034.
• The golden section algorithm is given as:
1. Initialize: specify 𝑥1, 𝑥4 (𝐼1 = 𝑥4 − 𝑥1), 𝜀, 𝑛: 𝜏^𝑛 < 𝜀/𝐼1
2. Compute 𝑥2 = 𝜏𝑥1 + (1 − 𝜏)𝑥4; evaluate 𝑓2
3. For 𝑖 = 1, …, 𝑛 − 1: compute 𝑥3 = (1 − 𝜏)𝑥1 + 𝜏𝑥4 and evaluate 𝑓3; if 𝑓2 < 𝑓3, set 𝑥4 ← 𝑥1, 𝑥1 ← 𝑥3; else set 𝑥1 ← 𝑥2, 𝑥2 ← 𝑥3, 𝑓2 ← 𝑓3
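The golden section algorithm above can be sketched in code as follows; the test function and the number of reductions are illustrative choices. Note how the interval-reversal bookkeeping (𝑥4 ← 𝑥1, 𝑥1 ← 𝑥3) lets the surviving interior point keep playing the role of 𝑥2:

```python
import math

def golden_section(f, x1, x4, n):
    """Golden-section search on [x1, x4] with n interval reductions,
    using the interval-reversal bookkeeping of the algorithm above."""
    tau = (math.sqrt(5) - 1) / 2          # 0.618034
    x2 = tau * x1 + (1 - tau) * x4
    f2 = f(x2)
    for _ in range(n - 1):
        x3 = (1 - tau) * x1 + tau * x4
        f3 = f(x3)
        if f2 < f3:
            x4, x1 = x1, x3               # reverse the interval; x2, f2 survive
        else:
            x1, x2, f2 = x2, x3, f3
    return x2

# Illustrative: the line-search function used later in these slides;
# its exact minimizer is alpha ~ 0.3517
alpha = golden_section(lambda a: math.exp(-a) + a * a, 0.0, 1.0, 40)
```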
Approximate Search Methods
• Consider the line search problem: min𝛼 𝑓(𝛼) = 𝑓(𝒙𝑘 + 𝛼𝒅𝑘)
• Sufficient Descent Condition. The sufficient descent condition guards against 𝒅𝑘 becoming too close to orthogonal to 𝛻𝑓(𝒙𝑘). The condition is stated as:
𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘 < −𝑐‖𝛻𝑓(𝒙𝑘)‖², 𝑐 > 0
• Sufficient Decrease Condition. The sufficient decrease condition ensures a nontrivial reduction in the function value. The condition is stated as:
𝑓(𝒙𝑘 + 𝛼𝒅𝑘) − 𝑓(𝒙𝑘) ≤ 𝜇𝛼𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘, 0 < 𝜇 < 1
• Curvature Condition. The curvature condition guards against 𝛼 becoming too small. The condition is stated as:
𝛻𝑓(𝒙𝑘 + 𝛼𝒅𝑘)𝑇𝒅𝑘 ≥ 𝜂𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘, 0 < 𝜇 < 𝜂 < 1
Approximate Line Search
• Strong Wolfe Conditions. The strong Wolfe conditions commonly used in line search algorithms include:
1. The sufficient decrease condition (Armijo's rule): 𝑓(𝛼) ≤ 𝑓(0) + 𝜇𝛼𝑓′(0), 0 < 𝜇 < 1
2. The strong curvature condition: |𝑓′(𝛼)| ≤ 𝜂|𝑓′(0)|, 0 < 𝜇 ≤ 𝜂 < 1
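A minimal sketch of a backtracking search that enforces only the sufficient decrease (Armijo) part of the conditions above; the test function and the parameter choices are illustrative:

```python
import math

def backtrack(f, fprime, mu=0.2, alpha0=1.0, shrink=0.5):
    """Backtracking that enforces Armijo's sufficient-decrease rule:
    f(alpha) <= f(0) + mu * alpha * f'(0).  The curvature test is omitted
    in this sketch; mu, alpha0, shrink are illustrative choices."""
    f0, g0 = f(0.0), fprime(0.0)
    alpha = alpha0
    while f(alpha) > f0 + mu * alpha * g0:
        alpha *= shrink
    return alpha

# Function from the example a few slides ahead: f(alpha) = exp(-alpha) + alpha^2
f = lambda a: math.exp(-a) + a * a
fp = lambda a: 2 * a - math.exp(-a)
alpha = backtrack(f, fp)   # accepts alpha = 0.5, matching the [0, 0.5] bracket
```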
Approximate Line Search
• The approximate line search includes two steps:
– Bracketing the minimum
– Estimating the minimum
• Bracketing the Minimum. In the bracketing step we seek an interval [𝛼𝑙, 𝛼𝑢] such that 𝑓′(𝛼𝑙) < 0 and 𝑓′(𝛼𝑢) > 0.
– Since for any descent direction 𝑓′(0) < 0, 𝛼 = 0 serves as a lower bound on 𝛼. To find an upper bound, gradually increase 𝛼, e.g., 𝛼 = 1, 2, …
– Assume that for some 𝛼𝑖 > 0, we get 𝑓′(𝛼𝑖) < 0 and 𝑓′(𝛼𝑖+1) > 0; then 𝛼𝑖+1 serves as an upper bound.
Approximate Line Search
• Estimating the Minimum. Once the minimum has been bracketed in a small interval, a quadratic or cubic polynomial approximation is used to find the minimizer.
• If the polynomial minimizer 𝛼̂ satisfies the strong Wolfe conditions for the desired 𝜇 and 𝜂 values (say 𝜇 = 0.2, 𝜂 = 0.5), it is taken as the function minimizer.
• Otherwise, 𝛼̂ is used to replace 𝛼𝑙 or 𝛼𝑢, and the polynomial approximation step is repeated.
Quadratic Curve Fitting
• Assuming that the interval [𝛼𝑙, 𝛼𝑢] contains the minimum of a unimodal function 𝑓(𝛼), its quadratic approximation, given as 𝑞(𝛼) = 𝑎0 + 𝑎1𝛼 + 𝑎2𝛼², is obtained using three points {𝛼𝑙, 𝛼𝑚, 𝛼𝑢}, where the mid-point may be used for 𝛼𝑚.
The quadratic coefficients {𝑎0, 𝑎1, 𝑎2} are solved as:
𝑎2 = [(𝑓(𝛼𝑢) − 𝑓(𝛼𝑙))/(𝛼𝑢 − 𝛼𝑙) − (𝑓(𝛼𝑚) − 𝑓(𝛼𝑙))/(𝛼𝑚 − 𝛼𝑙)]/(𝛼𝑢 − 𝛼𝑚)
𝑎1 = (𝑓(𝛼𝑚) − 𝑓(𝛼𝑙))/(𝛼𝑚 − 𝛼𝑙) − 𝑎2(𝛼𝑙 + 𝛼𝑚)
𝑎0 = 𝑓(𝛼𝑙) − 𝑎1𝛼𝑙 − 𝑎2𝛼𝑙²
Then, the minimum is given as: 𝛼𝑚𝑖𝑛 = −𝑎1/(2𝑎2)
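The coefficient formulas above can be sketched directly in code; the test function and the three points are taken from the example on the next slide:

```python
import math

def quad_fit_min(f, al, am, au):
    """Minimizer of the quadratic through (al, am, au), per the
    coefficient formulas above."""
    a2 = ((f(au) - f(al)) / (au - al) - (f(am) - f(al)) / (am - al)) / (au - am)
    a1 = (f(am) - f(al)) / (am - al) - a2 * (al + am)
    return -a1 / (2 * a2)

# Example on the next slide: f(alpha) = exp(-alpha) + alpha^2 on [0, 0.5]
amin = quad_fit_min(lambda a: math.exp(-a) + a * a, 0.0, 0.25, 0.5)  # ~0.3531
```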
Example: Approximate Search
• Let 𝑓(𝛼) = 𝑒^(−𝛼) + 𝛼², 𝑓′(𝛼) = 2𝛼 − 𝑒^(−𝛼), 𝑓(0) = 1, 𝑓′(0) = −1. Let 𝜇 = 0.2, and try 𝛼 = 0.1, 0.2, … to bracket the minimum.
• From the sufficient decrease condition, the minimum is bracketed in the interval [0, 0.5].
• Using the quadratic approximation, the minimum is found as 𝛼 = 0.3531; the exact solution is 𝛼𝑚𝑖𝑛 = 0.3517.
• The MATLAB commands are:
Define the function:
f=@(x) x.*x+exp(-x);
mu=0.2; al=0:.1:1;
Example: Approximate Search
• Bracketing the minimum:
f1=feval(f,al)
  1.0000 0.9148 0.8587 0.8308 0.8303 0.8565 0.9088 0.9866 1.0893 1.2166 1.3679
>> f2=f(0)-mu*al
  1.0000 0.9800 0.9600 0.9400 0.9200 0.9000 0.8800 0.8600 0.8400 0.8200 0.8000
>> idx=find(f1<=f2)
• Quadratic approximation to find the minimum:
al=0; am=0.25; au=0.5;
a2 = ((f(au)-f(al))/(au-al)-(f(am)-f(al))/(am-al))/(au-am);
a1 = (f(am)-f(al))/(am-al)-a2*(al+am);
xmin = -a1/a2/2 % 0.3531
Computer Methods for Finding the Search Direction
• Gradient-based methods
– Steepest descent method
– Conjugate gradient method
– Quasi-Newton methods
• Hessian-based methods
– Newton's method
– Trust-region methods
Steepest Descent Method
• The steepest descent method determines the search direction as: 𝒅𝑘 = −𝛻𝑓(𝒙𝑘)
• The update rule is given as: 𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝑘𝛻𝑓(𝒙𝑘), where 𝛼𝑘 is determined by minimizing 𝑓(𝒙𝑘+1) along 𝒅𝑘
• Example: quadratic function
𝑓(𝒙) = ½𝒙𝑇𝑨𝒙 − 𝒃𝑇𝒙, 𝛻𝑓 = 𝑨𝒙 − 𝒃
Then, 𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝛻𝑓(𝒙𝑘); 𝛼 = 𝛻𝑓(𝒙𝑘)𝑇𝛻𝑓(𝒙𝑘)/(𝛻𝑓(𝒙𝑘)𝑇𝑨𝛻𝑓(𝒙𝑘))
Define 𝒓𝑘 = 𝒃 − 𝑨𝒙𝑘; then, 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒓𝑘; 𝛼𝑘 = 𝒓𝑘𝑇𝒓𝑘/(𝒓𝑘𝑇𝑨𝒓𝑘)
Steepest Descent Algorithm
• Initialize: choose 𝒙0
• For 𝑘 = 0, 1, 2, …
– Compute 𝛻𝑓(𝒙𝑘)
– Check convergence: if ‖𝛻𝑓(𝒙𝑘)‖ < 𝜖, stop.
– Set 𝒅𝑘 = −𝛻𝑓(𝒙𝑘)
– Line search problem: find min𝛼≥0 𝑓(𝒙𝑘 + 𝛼𝒅𝑘)
– Set 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝒅𝑘.
Example: Steepest Descent
• Consider min𝒙 𝑓(𝒙) = 0.1𝑥1² + 𝑥2²,
𝛻𝑓(𝒙) = [0.2𝑥1, 2𝑥2]𝑇, 𝛻²𝑓(𝒙) = diag(0.2, 2); let 𝒙0 = [5, 1]𝑇; then, 𝑓(𝒙0) = 3.5,
𝒅0 = −𝛻𝑓(𝒙0) = [−1, −2]𝑇, 𝛼 = 0.61
𝒙1 = [4.39, −0.22]𝑇, 𝑓(𝒙1) = 1.98
Continuing…
Example: Steepest Descent
• MATLAB code:
H=[.2 0;0 2];
f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H;
x=[5;1];
xall=x';
for i=1:10
d=-df(x);
a=d'*d/(d'*H*d);
x=x+a*d;
xall=[xall;x'];
end
plot(xall(:,1),xall(:,2)), grid
axis([-1 5 -1 5]), axis equal
Steepest Descent Method
• The steepest descent method becomes slow close to the optimum
• The method progresses in a zigzag fashion, since
(𝑑/𝑑𝛼)𝑓(𝒙𝑘 + 𝛼𝒅𝑘) = 𝛻𝑓(𝒙𝑘+1)𝑇𝒅𝑘 = −𝛻𝑓(𝒙𝑘+1)𝑇𝛻𝑓(𝒙𝑘) = 0
• The method has linear convergence with rate constant
𝐶 = (𝑓(𝒙𝑘+1) − 𝑓(𝒙∗))/(𝑓(𝒙𝑘) − 𝑓(𝒙∗)) ≤ [(cond(𝑨) − 1)/(cond(𝑨) + 1)]²
Preconditioning
• Preconditioning (scaling) can be used to reduce the condition number of the Hessian matrix and hence aid convergence
• Consider 𝑓(𝒙) = 0.1𝑥1² + 𝑥2² = 𝒙𝑇𝑨𝒙, where 𝑨 = diag(0.1, 1)
• Define a linear transformation 𝒙 = 𝑷𝒚, where 𝑷 = diag(√10, 1); then, 𝑓(𝒙) = 𝒚𝑇𝑷𝑇𝑨𝑷𝒚 = 𝒚𝑇𝒚
• Since cond(𝑰) = 1, the steepest descent method in the case of a quadratic function converges in a single iteration
Conjugate Gradient Method
• For any square matrix 𝑨, the set of 𝑨-conjugate vectors is defined by: 𝒅𝑖𝑇𝑨𝒅𝑗 = 0, 𝑖 ≠ 𝑗
• Let 𝒈𝑘 = 𝛻𝑓(𝒙𝑘) denote the gradient; then, starting from 𝒅0 = −𝒈0, a set of 𝑨-conjugate directions is generated as:
𝒅0 = −𝒈0; 𝒅𝑘+1 = −𝒈𝑘+1 + 𝛽𝑘𝒅𝑘, 𝑘 ≥ 0, where 𝛽𝑘 = 𝒈𝑘+1𝑇𝑨𝒅𝑘/(𝒅𝑘𝑇𝑨𝒅𝑘)
There are multiple ways to generate conjugate directions
• Using {𝒅0, 𝒅1, …, 𝒅𝑛−1} as search directions, a quadratic function is minimized in 𝑛 steps.
Conjugate Directions Method
• The parameter 𝛽𝑘 can be computed in different ways:
– By substituting 𝑨𝒅𝑘 = (1/𝛼𝑘)(𝒈𝑘+1 − 𝒈𝑘), we obtain:
𝛽𝑘 = 𝒈𝑘+1𝑇(𝒈𝑘+1 − 𝒈𝑘)/(𝒅𝑘𝑇(𝒈𝑘+1 − 𝒈𝑘)) (the Hestenes–Stiefel formula)
– In the case of exact line search, 𝒈𝑘+1𝑇𝒅𝑘 = 0; then
𝛽𝑘 = 𝒈𝑘+1𝑇(𝒈𝑘+1 − 𝒈𝑘)/(𝒈𝑘𝑇𝒈𝑘) (the Polak–Ribière formula)
– Also, for exact line search, 𝒈𝑘+1𝑇𝒈𝑘 = 𝛽𝑘−1(𝒈𝑘 + 𝛼𝑘𝑨𝒅𝑘)𝑇𝒅𝑘−1 = 0, resulting in
𝛽𝑘 = 𝒈𝑘+1𝑇𝒈𝑘+1/(𝒈𝑘𝑇𝒈𝑘) (the Fletcher–Reeves formula)
Other versions of 𝛽𝑘 have also been proposed.
Example: Conjugate Gradient Method
• Consider min𝒙 𝑓(𝒙) = 0.1𝑥1² + 𝑥2²,
𝛻𝑓(𝒙) = [0.2𝑥1, 2𝑥2]𝑇, 𝛻²𝑓(𝒙) = diag(0.2, 2); let 𝒙0 = [5, 1]𝑇; then 𝑓(𝒙0) = 3.5,
𝒅0 = −𝛻𝑓(𝒙0) = [−1, −2]𝑇, 𝛼 = 0.61
𝒙1 = [4.39, −0.22]𝑇, 𝑓(𝒙1) = 1.98
𝛽0 = 0.19
𝒅1 = [−0.535, 0.027]𝑇, 𝛼 = 8.2
𝒙2 = [0, 0]𝑇
Example: Conjugate Gradient Method
• MATLAB code
H=[.2 0;0 2];
f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H;
x=[5;1]; n=2;
xall=zeros(n+1,n); xall(1,:)=x';
d=-df(x); a=d'*d/(d'*H*d);
x=x+a*d; xall(2,:)=x';
for i=1:size(x,1)-1
b=df(x)'*H*d/(d'*H*d);
d=-df(x)+b*d;
r=-df(x);
a=r'*r/(d'*H*d);
x=x+a*d;
xall(i+2,:)=x';
end
plot(xall(:,1),xall(:,2)), grid
axis([-1 5 -1 5]), axis equal
Conjugate Gradient Algorithm
• Conjugate-Gradient Algorithm (Griva, Nash & Sofer, p454):
• Initialize: Choose 𝒙0 = 𝟎, 𝒓0 = 𝒃, 𝒅−1 = 𝟎, 𝛽0 = 0.
• For 𝑖 = 0, 1, …
– Check convergence: if ‖𝒓𝑖‖ < 𝜖, stop.
– If 𝑖 > 0, set 𝛽𝑖 = 𝒓𝑖𝑇𝒓𝑖/(𝒓𝑖−1𝑇𝒓𝑖−1)
– Set 𝒅𝑖 = 𝒓𝑖 + 𝛽𝑖𝒅𝑖−1; 𝛼𝑖 = 𝒓𝑖𝑇𝒓𝑖/(𝒅𝑖𝑇𝑨𝒅𝑖); 𝒙𝑖+1 = 𝒙𝑖 + 𝛼𝑖𝒅𝑖; 𝒓𝑖+1 = 𝒓𝑖 − 𝛼𝑖𝑨𝒅𝑖.
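The algorithm above can be sketched in pure Python for a small symmetric positive definite system; the 2×2 example matrix and right-hand side are illustrative choices:

```python
def conjugate_gradient(A, b, tol=1e-12):
    """Linear CG for A x = b (A symmetric positive definite), following
    the algorithm statement above; pure-Python lists for brevity."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    x = [0.0] * len(b)              # x0 = 0, hence r0 = b
    r = list(b)
    d = [0.0] * len(b)
    rr = dot(r, r)
    rr_prev = rr
    for i in range(len(b)):
        if rr < tol:                # convergence check on ||r||^2
            break
        beta = 0.0 if i == 0 else rr / rr_prev
        d = [ri + beta * di for ri, di in zip(r, d)]
        Ad = [dot(row, d) for row in A]
        alpha = rr / dot(d, Ad)     # alpha_i = r_i'r_i / d_i'A d_i
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_prev, rr = rr, dot(r, r)
    return x

# Example system: [[4,1],[1,3]] x = [1,2]; exact solution (1/11, 7/11)
x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

As the next slides note, for an n-dimensional quadratic the loop terminates in at most n passes.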
Conjugate Gradient Method
• Assume that an update that includes steps 𝛼𝑖 along 𝑛 conjugate vectors 𝒅𝑖 is assembled as: 𝒚 = Σ_{𝑖=1}^{𝑛} 𝛼𝑖𝒅𝑖.
• Then, for a quadratic function, the minimization problem is decomposed into a set of one-dimensional problems, i.e.,
min𝒚 𝑓(𝒚) ≡ Σ_{𝑖=1}^{𝑛} min𝛼𝑖 (½𝛼𝑖²𝒅𝑖𝑇𝑨𝒅𝑖 − 𝛼𝑖𝒃𝑇𝒅𝑖)
• By setting the derivative with respect to 𝛼𝑖 equal to zero, i.e., 𝛼𝑖𝒅𝑖𝑇𝑨𝒅𝑖 − 𝒃𝑇𝒅𝑖 = 0, we obtain: 𝛼𝑖 = 𝒃𝑇𝒅𝑖/(𝒅𝑖𝑇𝑨𝒅𝑖).
• This shows that the CG algorithm iteratively determines the conjugate directions 𝒅𝑖 and their coefficients 𝛼𝑖.
CG Rate of Convergence
• Conjugate gradient methods achieve superlinear convergence:
– In the case of quadratic functions, the minimum is reached exactly in 𝑛 iterations.
– For general nonlinear functions, convergence in 2𝑛 iterations is to be expected.
• Nonlinear CG methods typically have the lowest per-iteration computational cost of all gradient methods.
Newton’s Method
• Consider minimizing the second-order approximation of 𝑓(𝒙):
min𝒅 𝑓(𝒙𝑘 + 𝒅) = 𝑓(𝒙𝑘) + 𝛻𝑓(𝒙𝑘)𝑇𝒅 + ½𝒅𝑇𝑯𝑘𝒅
• Apply the FONC: 𝑯𝑘𝒅 + 𝒈𝑘 = 𝟎, where 𝒈𝑘 = 𝛻𝑓(𝒙𝑘)
Then, assuming that 𝑯𝑘 = 𝛻²𝑓(𝒙𝑘) stays positive definite, the Newton update rule is derived as: 𝒙𝑘+1 = 𝒙𝑘 − 𝑯𝑘⁻¹𝒈𝑘
• Note:
– The convergence of Newton's method depends on 𝑯𝑘 staying positive definite.
– A step size may be included in Newton's method, i.e., 𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝑘𝑯𝑘⁻¹𝒈𝑘
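A minimal sketch of one Newton step, using a closed-form 2×2 inverse; since the quadratic from the earlier examples has a constant Hessian, a single full step reaches the minimizer:

```python
def newton_step(H, g):
    """One Newton update d = -H^{-1} g for a 2x2 Hessian (closed-form
    inverse).  Sketch only; assumes H is positive definite."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [-( H[1][1] * g[0] - H[0][1] * g[1]) / det,
            -(-H[1][0] * g[0] + H[0][0] * g[1]) / det]

# For f(x) = 0.1 x1^2 + x2^2, H = diag(0.2, 2) is constant, so a single
# full Newton step from any point reaches the minimum x* = (0, 0).
x = [5.0, 1.0]
g = [0.2 * x[0], 2.0 * x[1]]
d = newton_step([[0.2, 0.0], [0.0, 2.0]], g)
x1 = [x[0] + d[0], x[1] + d[1]]
```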
Marquardt Modification to Newton’s Method
• To ensure the positive definite condition on 𝑯𝑘, Marquardt proposed the following modification to Newton's method:
(𝑯𝑘 + 𝜆𝑰)𝒅 = −𝒈𝑘
where 𝜆 is selected to ensure that the modified Hessian is positive definite.
• Since 𝑯𝑘 + 𝜆𝑰 is also symmetric, the resulting system of linear equations can be solved for 𝒅 as: 𝑳𝑫𝑳𝑇𝒅 = −𝛻𝑓(𝒙𝑘)
Newton’s Algorithm
Newton's Method (Griva, Nash & Sofer, p. 373):
1. Initialize: Choose 𝒙0, specify 𝜖
2. For 𝑘 = 0, 1, …
3. Check convergence: If ‖𝛻𝑓(𝒙𝑘)‖ < 𝜖, stop
4. Factorize the modified Hessian as 𝛻²𝑓(𝒙𝑘) + 𝑬 = 𝑳𝑫𝑳𝑇 and solve 𝑳𝑫𝑳𝑇𝒅 = −𝛻𝑓(𝒙𝑘) for 𝒅
5. Perform line search to determine 𝛼𝑘 and update the solution estimate as 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒅𝑘
Rate of Convergence
• Newton's method achieves a quadratic rate of convergence in the close neighborhood of the optimal point, and a superlinear rate otherwise.
• The main drawback of Newton's method is its computational cost: the Hessian matrix needs to be computed at every step, and a linear system of equations needs to be solved to obtain the update.
• Due to the high computational and storage costs, the classic Newton's method is rarely used in practice.
Quasi-Newton Methods
• The quasi-Newton methods derive from a generalization of the secant method, which approximates the second derivative as:
𝑓″(𝑥𝑘) ≅ (𝑓′(𝑥𝑘) − 𝑓′(𝑥𝑘−1))/(𝑥𝑘 − 𝑥𝑘−1)
• In the multi-dimensional case, the secant condition is generalized as: 𝑯𝑘(𝒙𝑘 − 𝒙𝑘−1) = 𝛻𝑓(𝒙𝑘) − 𝛻𝑓(𝒙𝑘−1)
• Define 𝑭𝑘 = 𝑯𝑘⁻¹; then, 𝒙𝑘 − 𝒙𝑘−1 = 𝑭𝑘(𝛻𝑓(𝒙𝑘) − 𝛻𝑓(𝒙𝑘−1))
• The quasi-Newton methods iteratively update 𝑯𝑘 or 𝑭𝑘 as:
– Direct update: 𝑯𝑘+1 = 𝑯𝑘 + ∆𝑯𝑘, 𝑯0 = 𝑰
– Inverse update: 𝑭𝑘+1 = 𝑭𝑘 + ∆𝑭𝑘, 𝑭 = 𝑯⁻¹, 𝑭0 = 𝑰
Quasi-Newton Methods
• Quasi-Newton update:
Let 𝒔𝑘 = 𝒙𝑘+1 − 𝒙𝑘, 𝒚𝑘 = 𝛻𝑓(𝒙𝑘+1) − 𝛻𝑓(𝒙𝑘); then,
– The DFP (Davidon–Fletcher–Powell) formula for the inverse Hessian update is given as:
𝑭𝑘+1 = 𝑭𝑘 − 𝑭𝑘𝒚𝑘(𝑭𝑘𝒚𝑘)𝑇/(𝒚𝑘𝑇𝑭𝑘𝒚𝑘) + 𝒔𝑘𝒔𝑘𝑇/(𝒚𝑘𝑇𝒔𝑘)
– The BFGS (Broyden–Fletcher–Goldfarb–Shanno) formula for the direct Hessian update is given as:
𝑯𝑘+1 = 𝑯𝑘 − 𝑯𝑘𝒔𝑘(𝑯𝑘𝒔𝑘)𝑇/(𝒔𝑘𝑇𝑯𝑘𝒔𝑘) + 𝒚𝑘𝒚𝑘𝑇/(𝒚𝑘𝑇𝒔𝑘)
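The BFGS direct update above can be sketched as follows; the quadratic test problem and the steepest-descent first step are illustrative choices. A property worth checking numerically is the secant condition 𝑯𝑘+1𝒔𝑘 = 𝒚𝑘, which the update satisfies by construction:

```python
def bfgs_update(H, s, y):
    """BFGS direct Hessian update (the formula above):
    H+ = H - (H s)(H s)'/(s'H s) + y y'/(y's)."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    Hs = [dot(row, s) for row in H]
    sHs, ys = dot(s, Hs), dot(y, s)
    n = len(s)
    return [[H[i][j] - Hs[i] * Hs[j] / sHs + y[i] * y[j] / ys
             for j in range(n)] for i in range(n)]

# One steepest-descent step on f = 2 x1^2 - x1 x2 + x2^2 (grad below),
# with the exact line-search step length for this quadratic
grad = lambda x: [4 * x[0] - x[1], -x[0] + 2 * x[1]]
x0 = [1.0, 1.0]
d = [-g for g in grad(x0)]                    # (-3, -1)
alpha = 5.0 / 16.0
x1 = [x0[i] + alpha * d[i] for i in range(2)]
s = [x1[i] - x0[i] for i in range(2)]
y = [grad(x1)[i] - grad(x0)[i] for i in range(2)]
H1 = bfgs_update([[1.0, 0.0], [0.0, 1.0]], s, y)   # satisfies H1 s = y
```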
Quasi-Newton Algorithm
The Quasi-Newton Algorithm (Griva, Nash & Sofer, p.415):
• Initialize: Choose 𝒙0, 𝑯0 (e.g., 𝑯0 = 𝑰); specify 𝜀
• For 𝑘 = 0, 1, …
– Check convergence: If ‖𝛻𝑓(𝒙𝑘)‖ < 𝜀, stop
– Solve 𝑯𝑘𝒅 = −𝛻𝑓(𝒙𝑘) for 𝒅𝑘 (alternatively, 𝒅𝑘 = −𝑭𝑘𝛻𝑓(𝒙𝑘))
– Solve min𝛼 𝑓(𝒙𝑘 + 𝛼𝒅𝑘) for 𝛼𝑘, and update the current estimate: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒅𝑘
– Compute 𝒔𝑘, 𝒚𝑘, and update 𝑯𝑘 (or 𝑭𝑘 as applicable)
Example: Quasi-Newton Method
• Consider the problem: min 𝑓(𝑥1, 𝑥2) = 2𝑥1² − 𝑥1𝑥2 + 𝑥2², where
𝑯 = [4 −1; −1 2], 𝛻𝑓 = 𝑯[𝑥1, 𝑥2]𝑇. Let 𝒙0 = [1, 1]𝑇, 𝑓0 = 2, 𝑯0 = 𝑰, 𝑭0 = 𝑰;
Choose 𝒅0 = −𝛻𝑓(𝒙0) = [−3, −1]𝑇;
then 𝑓(𝛼) = 2(1 − 3𝛼)² + (1 − 𝛼)² − (1 − 3𝛼)(1 − 𝛼),
Using 𝑓′(𝛼) = 0 → 𝛼 = 5/16 → 𝒙1 = [0.0625, 0.6875]𝑇, 𝑓1 = 0.4375;
then 𝒚0 = [−3.44, 0.313]𝑇, 𝑭1 = [1.193 0.065; 0.065 1.022], 𝑯1 = [0.381 −0.206; −0.206 0.9313],
and using either update formula, 𝒅1 = [0.4375, −1.313]𝑇; for the next step,
𝑓(𝛼) = 2.68𝛼² − 1.91𝛼 + 0.4375 → 𝛼 = 0.3572, 𝒙2 = [0.2188, 0.2188]𝑇.
Example: Quasi-Newton Method
• For a quadratic function, convergence is achieved in two iterations.
Trust-Region Methods
• The trust-region methods locally employ a quadratic approximation 𝑞𝑘(𝒙) to the nonlinear objective function.
• The approximation is valid in the neighborhood of 𝒙𝑘 defined by Ω𝑘 = {𝒙: ‖𝚪(𝒙 − 𝒙𝑘)‖ ≤ ∆𝑘}, where 𝚪 is a scaling parameter.
• The method aims to find an 𝒙𝑘+1 ∈ Ω𝑘 that satisfies the sufficient decrease condition in 𝑓(𝒙).
• The quality of the quadratic approximation is estimated by the reliability index: 𝛾𝑘 = (𝑓(𝒙𝑘) − 𝑓(𝒙𝑘+1))/(𝑞𝑘(𝒙𝑘) − 𝑞𝑘(𝒙𝑘+1)). If this ratio is close to unity, the trust region may be expanded in the next iteration.
Trust-Region Methods
• At each iteration 𝑘, the trust-region algorithm solves a constrained optimization sub-problem involving the quadratic approximation:
min𝒅 𝑞𝑘(𝒅) = 𝑓(𝒙𝑘) + 𝛻𝑓(𝒙𝑘)𝑇𝒅 + ½𝒅𝑇𝛻²𝑓(𝒙𝑘)𝒅
Subject to: ‖𝒅‖ ≤ ∆𝑘
Lagrangian function: ℒ(𝒅, 𝜆) = 𝑞𝑘(𝒅) + 𝜆(‖𝒅‖ − ∆𝑘)
FONC: (𝛻²𝑓(𝒙𝑘) + 𝜆𝑰)𝒅𝑘 = −𝛻𝑓(𝒙𝑘), 𝜆(‖𝒅𝑘‖ − ∆𝑘) = 0
• The resulting search direction is given as 𝒅𝑘 = 𝒅𝑘(𝜆).
– For large ∆𝑘 and a positive-definite 𝛻²𝑓(𝒙𝑘), the Lagrange multiplier 𝜆 → 0, and 𝒅𝑘(𝜆) reduces to the Newton direction.
– For ∆𝑘 → 0, 𝜆 → ∞, and 𝒅𝑘(𝜆) aligns with the steepest-descent direction.
Trust-Region Algorithm
• Trust-Region Algorithm (Griva, Nash & Sofer, p.392):
• Initialize: choose 𝒙0, ∆0; specify 𝜀, 0 < 𝜇 < 𝜂 < 1 (e.g., 𝜇 = ¼, 𝜂 = ¾)
• For 𝑘 = 0, 1, …
– Check convergence: If ‖𝛻𝑓(𝒙𝑘)‖ < 𝜀, stop
– Solve the subproblem: min𝒅 𝑞𝑘(𝒅) subject to ‖𝒅‖ ≤ ∆𝑘
– Compute 𝛾𝑘:
• if 𝛾𝑘 < 𝜇, set 𝒙𝑘+1 = 𝒙𝑘, ∆𝑘+1 = ½∆𝑘
• else if 𝛾𝑘 < 𝜂, set 𝒙𝑘+1 = 𝒙𝑘 + 𝒅𝑘, ∆𝑘+1 = ∆𝑘
• else set 𝒙𝑘+1 = 𝒙𝑘 + 𝒅𝑘, ∆𝑘+1 = 2∆𝑘
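A crude sketch of the trust-region loop for a quadratic objective. A clipped Newton step stands in for an exact subproblem solve (an assumption of this sketch, not the method the references prescribe); for a quadratic, the predicted and actual reductions coincide, so every step is accepted and the radius grows:

```python
def trust_region_quad(H, x0, delta0=0.1, eps=1e-8, max_iter=100):
    """Trust-region sketch for f(x) = 1/2 x'H x (so grad = H x), using a
    clipped Newton step and the mu = 1/4, eta = 3/4 radius rules above."""
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    mv = lambda M, v: [dot(row, v) for row in M]
    f = lambda x: 0.5 * dot(x, mv(H, x))
    norm = lambda v: dot(v, v) ** 0.5
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]       # 2x2 closed-form solve
    x, delta = list(x0), delta0
    for _ in range(max_iter):
        g = mv(H, x)
        if norm(g) < eps:
            break
        d = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,  # Newton direction
             -(-H[1][0] * g[0] + H[0][0] * g[1]) / det]
        if norm(d) > delta:
            d = [di * delta / norm(d) for di in d]      # clip to the region
        xt = [xi + di for xi, di in zip(x, d)]
        pred = -(dot(g, d) + 0.5 * dot(d, mv(H, d)))    # q(0) - q(d)
        gamma = (f(x) - f(xt)) / pred                   # reliability index
        if gamma < 0.25:
            delta *= 0.5                                # reject step, shrink
        elif gamma < 0.75:
            x = xt                                      # accept, keep radius
        else:
            x, delta = xt, 2 * delta                    # accept, expand
    return x

x = trust_region_quad([[0.2, 0.0], [0.0, 2.0]], [5.0, 1.0])
```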
Computer Methods for Constrained Problems
• Penalty and Barrier methods
• Augmented Lagrangian method (AL)
• Sequential linear programming (SLP)
• Sequential quadratic programming (SQP)
Penalty and Barrier Methods
• Consider the general optimization problem: min𝒙 𝑓(𝒙)
Subject to
ℎ𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑝;
𝑔𝑗(𝒙) ≤ 0, 𝑗 = 1, …, 𝑚;
𝑥𝑖𝐿 ≤ 𝑥𝑖 ≤ 𝑥𝑖𝑈, 𝑖 = 1, …, 𝑛.
• Define a composite function to be used for constraint compliance:
Φ(𝒙, 𝒓) = 𝑓(𝒙) + 𝑃(𝑔(𝒙), ℎ(𝒙), 𝒓)
where 𝑃 defines a loss function, and 𝒓 is a vector of weights (penalty parameters)
Penalty and Barrier Methods
• Penalty Function Method. A penalty function method employs a quadratic loss function and iterates through the infeasible region:
𝑃(𝑔(𝒙), ℎ(𝒙), 𝑟) = 𝑟[Σ𝑖 (𝑔𝑖⁺(𝒙))² + Σ𝑖 (ℎ𝑖(𝒙))²], 𝑔𝑖⁺(𝒙) = max(0, 𝑔𝑖(𝒙)), 𝑟 > 0
• Barrier Function Method. A barrier method employs a log barrier function and iterates through the feasible region:
𝑃(𝑔(𝒙), ℎ(𝒙), 𝑟) = −(1/𝑟) Σ𝑖 log(−𝑔𝑖(𝒙))
• For both penalty and barrier methods, as 𝑟 → ∞, 𝒙(𝑟) → 𝒙∗
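The penalty approach can be sketched on a one-variable toy problem (min 𝑥² subject to 𝑥 − 1 = 0, an illustrative choice); the inner minimization here is a simple golden-section search, also an illustrative stand-in:

```python
import math

def penalty_method(f, h, r0=1.0, growth=10.0, outer=8):
    """Quadratic-penalty sketch for one equality constraint h(x) = 0:
    minimize Phi(x, r) = f(x) + r*h(x)^2 for increasing r."""
    tau = (math.sqrt(5) - 1) / 2
    def argmin(phi, a=-10.0, b=10.0, iters=200):
        # simple golden-section inner minimizer (illustrative)
        for _ in range(iters):
            c, d = b - tau * (b - a), a + tau * (b - a)
            if phi(c) < phi(d):
                b = d
            else:
                a = c
        return (a + b) / 2
    r, x = r0, 0.0
    for _ in range(outer):
        x = argmin(lambda t: f(t) + r * h(t) ** 2)
        r *= growth
    return x

# Toy problem: min x^2 subject to x - 1 = 0; x(r) = r/(1+r) -> 1 as r grows
x = penalty_method(lambda t: t * t, lambda t: t - 1.0)
```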
The Augmented Lagrangian Method
• Consider an equality-constrained problem: min𝒙 𝑓(𝒙)
Subject to: ℎ𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑙
• Define the augmented Lagrangian (AL) as:
𝒫(𝒙, 𝒗, 𝑟) = 𝑓(𝒙) + Σ𝑗 [𝑣𝑗ℎ𝑗(𝒙) + ½𝑟ℎ𝑗²(𝒙)]
where the additional term defines an exterior penalty function with 𝑟 as the penalty parameter.
• For inequality-constrained problems, the AL may be defined as:
𝒫(𝒙, 𝒖, 𝑟) = 𝑓(𝒙) + Σ𝑖 { 𝑢𝑖𝑔𝑖(𝒙) + ½𝑟𝑔𝑖²(𝒙), if 𝑔𝑖 + 𝑢𝑖/𝑟 ≥ 0; −𝑢𝑖²/(2𝑟), if 𝑔𝑖 + 𝑢𝑖/𝑟 < 0 }
where a large 𝑟 makes the Hessian of the AL positive definite at 𝒙.
The Augmented Lagrangian Method
• The dual function for the AL is defined as:
𝜓(𝒗) = min𝒙 𝒫(𝒙, 𝒗, 𝑟) = 𝑓(𝒙) + Σ𝑗 [𝑣𝑗ℎ𝑗(𝒙) + ½𝑟(ℎ𝑗(𝒙))²]
• The resulting dual optimization problem is: max𝒗 𝜓(𝒗)
• The dual problem may be solved via Newton's method as:
𝒗𝑘+1 = 𝒗𝑘 − [𝑑²𝜓/𝑑𝑣𝑖𝑑𝑣𝑗]⁻¹𝒉, where 𝑑²𝜓/𝑑𝑣𝑖𝑑𝑣𝑗 = −𝛻ℎ𝑖𝑇(𝛻²𝒫)⁻¹𝛻ℎ𝑗
• For large 𝑟, the Newton update may be approximated as:
𝑣𝑗^(𝑘+1) = 𝑣𝑗^(𝑘) + 𝑟ℎ𝑗, 𝑗 = 1, …, 𝑙
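The multiplier update above can be sketched on a one-variable toy problem (min 𝑥² subject to 𝑥 − 1 = 0, an illustrative choice, with known solution 𝑥∗ = 1, 𝑣∗ = −2); the golden-section inner solve and all parameter values are illustrative:

```python
import math

def augmented_lagrangian(f, h, r=10.0, outer=20):
    """AL sketch for one equality constraint: minimize
    P(x, v, r) = f(x) + v h(x) + (r/2) h(x)^2 over x, then apply the
    multiplier update v <- v + r h(x) derived above."""
    tau = (math.sqrt(5) - 1) / 2
    def argmin(phi, a=-10.0, b=10.0, iters=200):
        # simple golden-section inner minimizer (illustrative)
        for _ in range(iters):
            c, d = b - tau * (b - a), a + tau * (b - a)
            if phi(c) < phi(d):
                b = d
            else:
                a = c
        return (a + b) / 2
    v, x = 0.0, 0.0
    for _ in range(outer):
        x = argmin(lambda t: f(t) + v * h(t) + 0.5 * r * h(t) ** 2)
        v += r * h(x)            # multiplier update v <- v + r h(x)
    return x, v

# Toy problem: min x^2 s.t. x - 1 = 0; solution x* = 1 with v* = -2
x, v = augmented_lagrangian(lambda t: t * t, lambda t: t - 1.0)
```

Unlike the plain penalty method, the multiplier update converges without driving 𝑟 to infinity.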
Example: Augmented Lagrangian
• Maximize the volume of a cylindrical tank subject to a surface area constraint:
max 𝑓(𝑑, 𝑙) = 𝜋𝑑²𝑙/4, subject to ℎ: 𝜋𝑑²/4 + 𝜋𝑑𝑙 − 𝐴0 = 0
• We can normalize the problem as:
min 𝑓(𝑑, 𝑙) = −𝑑²𝑙, subject to ℎ: 𝑑² + 4𝑑𝑙 − 1 = 0
• The solution to the primal problem is obtained as:
Lagrangian function: ℒ(𝑑, 𝑙, 𝜆) = −𝑑²𝑙 + 𝜆(𝑑² + 4𝑑𝑙 − 1)
FONC: 𝜆(𝑑 + 2𝑙) − 𝑑𝑙 = 0, 4𝜆𝑑 − 𝑑² = 0, 𝑑² + 4𝑑𝑙 − 1 = 0
Optimal solution: 𝑑∗ = 2𝑙∗ = 4𝜆∗ = 1/√3.
Example: Augmented Lagrangian
• Alternatively, define the augmented Lagrangian function as:
𝒫(𝑑, 𝑙, 𝜆, 𝑟) = −𝑑²𝑙 + 𝜆(𝑑² + 4𝑑𝑙 − 1) + ½𝑟(𝑑² + 4𝑑𝑙 − 1)²
• Define the dual function: 𝜓(𝜆) = min over (𝑑, 𝑙) of 𝒫(𝑑, 𝑙, 𝜆, 𝑟)
• Define the dual optimization problem: max𝜆 𝜓(𝜆)
• Solution to the dual problem: 𝜆∗ = 𝜆𝑚𝑎𝑥 = 0.144
• Solution for the design variables: 𝑑∗ = 2𝑙∗ = 0.577
Sequential Linear Programming
• Consider the general optimization problem: min𝒙 𝑓(𝒙)
Subject to
ℎ𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑝;
𝑔𝑗(𝒙) ≤ 0, 𝑗 = 1, …, 𝑚;
𝑥𝑖𝐿 ≤ 𝑥𝑖 ≤ 𝑥𝑖𝑈, 𝑖 = 1, …, 𝑛.
• Let 𝒙𝑘 denote the current estimate of the design variables, and let 𝒅 denote the change in variables; define the first-order expansion of the objective and constraint functions in the neighborhood of 𝒙𝑘:
𝑓(𝒙𝑘 + 𝒅) ≅ 𝑓(𝒙𝑘) + 𝛻𝑓(𝒙𝑘)𝑇𝒅
𝑔𝑖(𝒙𝑘 + 𝒅) ≅ 𝑔𝑖(𝒙𝑘) + 𝛻𝑔𝑖(𝒙𝑘)𝑇𝒅, 𝑖 = 1, …, 𝑚
ℎ𝑗(𝒙𝑘 + 𝒅) ≅ ℎ𝑗(𝒙𝑘) + 𝛻ℎ𝑗(𝒙𝑘)𝑇𝒅, 𝑗 = 1, …, 𝑙
Sequential Linear Programming
• Let 𝑓𝑘 = 𝑓(𝒙𝑘), 𝑔𝑖𝑘 = 𝑔𝑖(𝒙𝑘), ℎ𝑗𝑘 = ℎ𝑗(𝒙𝑘); 𝑏𝑖 = −𝑔𝑖𝑘, 𝑒𝑗 = −ℎ𝑗𝑘, 𝒄 = 𝛻𝑓(𝒙𝑘), 𝒂𝑖 = 𝛻𝑔𝑖(𝒙𝑘), 𝒏𝑗 = 𝛻ℎ𝑗(𝒙𝑘), 𝑨 = [𝒂1, 𝒂2, …, 𝒂𝑚], 𝑵 = [𝒏1, 𝒏2, …, 𝒏𝑙].
• Using the first-order expansion, define an LP subprogram for the current iteration of the NLP problem:
min𝒅 𝑓̄ = 𝒄𝑇𝒅
Subject to: 𝑨𝑇𝒅 ≤ 𝒃, 𝑵𝑇𝒅 = 𝒆
where 𝑓̄ represents the first-order change in the cost function, and the columns of the 𝑨 and 𝑵 matrices represent, respectively, the gradients of the inequality and equality constraints.
• The resulting LP problem can be solved via the Simplex method.
Sequential Linear Programming
• We may note that:
– Since both positive and negative changes to the design variables 𝒙𝑘 are allowed, the variables 𝑑𝑖 are unrestricted in sign
– The SLP method requires additional constraints of the form −∆𝑖𝑙𝑘 ≤ 𝑑𝑖𝑘 ≤ ∆𝑖𝑢𝑘 (termed move limits) to bind the LP solution. These limits represent the maximum allowable change in 𝑑𝑖 in the current iteration and are selected as a percentage of the current value.
– Move limits serve the dual purpose of binding the solution and obviating the need for line search.
– Overly restrictive move limits tend to make the SLP problem infeasible.
SLP Example
• Consider the convex NLP problem:
min 𝑓(𝑥1, 𝑥2) = 𝑥1² − 𝑥1𝑥2 + 𝑥2²
Subject to: 1 − 𝑥1² − 𝑥2² ≤ 0; −𝑥1 ≤ 0; −𝑥2 ≤ 0
The problem has a single minimum at 𝒙∗ = [1/√2, 1/√2]𝑇
• The objective and constraint gradients are:
𝛻𝑓𝑇 = [2𝑥1 − 𝑥2, 2𝑥2 − 𝑥1], 𝛻𝑔1𝑇 = [−2𝑥1, −2𝑥2], 𝛻𝑔2𝑇 = [−1, 0], 𝛻𝑔3𝑇 = [0, −1].
• Let 𝒙0 = [1, 1]𝑇; then 𝑓0 = 1, 𝒄𝑇 = [1, 1], 𝑏1 = 𝑏2 = 𝑏3 = 1;
𝒂1𝑇 = [−2, −2], 𝒂2𝑇 = [−1, 0], 𝒂3𝑇 = [0, −1]
SLP Example
• Define the LP subproblem at the current step as:
min 𝑓̄ = 𝑑1 + 𝑑2
Subject to:
[−2 −2; −1 0; 0 −1][𝑑1, 𝑑2]𝑇 ≤ [1, 1, 1]𝑇
• In the absence of move limits, the LP problem is unbounded; using 50% move limits, the SLP update is given as: 𝒅∗ = [−½, −½]𝑇, 𝒙1 = [½, ½]𝑇, with resulting constraint violation 𝑔𝑖 = (½, 0, 0); smaller move limits may be used to reduce the constraint violation.
Sequential Linear Programming
SLP Algorithm (Arora, p. 508):
• Initialize: choose 𝒙0, 𝜀1 > 0, 𝜀2 > 0.
• For 𝑘 = 0, 1, 2, …
– Choose move limits ∆𝑖𝑙𝑘, ∆𝑖𝑢𝑘 as some fraction of the current design 𝒙𝑘
– Compute 𝑓𝑘, 𝒄, 𝑔𝑖𝑘, ℎ𝑗𝑘, 𝑏𝑖, 𝑒𝑗
– Formulate and solve the LP subproblem for 𝒅𝑘
– If 𝑔𝑖 ≤ 𝜀1, 𝑖 = 1, …, 𝑚; |ℎ𝑗| ≤ 𝜀1, 𝑗 = 1, …, 𝑝; and ‖𝒅𝑘‖ ≤ 𝜀2, stop
– Substitute 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘, 𝑘 ← 𝑘 + 1.
Sequential Quadratic Programming
• Sequential quadratic programming (SQP) uses a quadratic approximation to the objective function at every step of the iteration.
• The SQP problem is defined as:
min𝒅 𝑓̄ = 𝒄𝑇𝒅 + ½𝒅𝑇𝒅
Subject to: 𝑨𝑇𝒅 ≤ 𝒃, 𝑵𝑇𝒅 = 𝒆
• SQP does not require move limits, alleviating the shortcomings of the SLP method.
• The SQP problem is convex; hence, it has a single global minimum.
• SQP can be solved via a Simplex-based linear complementarity problem (LCP) framework.
Sequential Quadratic Programming
• The Lagrangian function for the SQP problem is defined as:
ℒ(𝒅, 𝒖, 𝒗) = 𝒄𝑇𝒅 + ½𝒅𝑇𝒅 + 𝒖𝑇(𝑨𝑇𝒅 − 𝒃 + 𝒔) + 𝒗𝑇(𝑵𝑇𝒅 − 𝒆)
• Then the KKT conditions are:
Optimality: 𝛻ℒ = 𝒄 + 𝒅 + 𝑨𝒖 + 𝑵𝒗 = 𝟎,
Feasibility: 𝑨𝑇𝒅 + 𝒔 = 𝒃, 𝑵𝑇𝒅 = 𝒆,
Complementarity: 𝒖𝑇𝒔 = 0,
Non-negativity: 𝒖 ≥ 𝟎, 𝒔 ≥ 𝟎
Sequential Quadratic Programming
• Since 𝒗 is unrestricted in sign, let 𝒗 = 𝒚 − 𝒛, 𝒚 ≥ 𝟎, 𝒛 ≥ 𝟎; then the KKT conditions are compactly written as:
[𝑰 𝑨 𝟎 𝑵 −𝑵; 𝑨𝑇 𝟎 𝑰 𝟎 𝟎; 𝑵𝑇 𝟎 𝟎 𝟎 𝟎][𝒅; 𝒖; 𝒔; 𝒚; 𝒛] = [−𝒄; 𝒃; 𝒆], or 𝑷𝑿 = 𝑸
• The complementary slackness conditions, 𝒖𝑇𝒔 = 0, translate as: 𝑿𝑖𝑿𝑖+𝑚 = 0, 𝑖 = 𝑛 + 1, ⋯, 𝑛 + 𝑚.
• The resulting problem can be solved via the Simplex method using the LCP framework.
Descent Function Approach
• In SQP methods, the line search step is based on minimization of a descent function that penalizes constraint violations, i.e.,
Φ(𝒙) = 𝑓(𝒙) + 𝑅𝑉(𝒙)
where 𝑓(𝒙) is the cost function, 𝑉(𝒙) represents the current maximum constraint violation, and 𝑅 > 0 is a penalty parameter.
• The descent function value at the current iteration is computed as:
Φ𝑘 = 𝑓𝑘 + 𝑅𝑉𝑘, 𝑅 = max(𝑅𝑘, 𝑟𝑘), where 𝑟𝑘 = Σ_{𝑖=1}^{𝑚} 𝑢𝑖𝑘 + Σ_{𝑗=1}^{𝑝} |𝑣𝑗𝑘|
𝑉𝑘 = max{0; 𝑔𝑖, 𝑖 = 1, …, 𝑚; |ℎ𝑗|, 𝑗 = 1, …, 𝑝}
• The line search subproblem is defined as:
min𝛼 Φ(𝛼) = Φ(𝒙𝑘 + 𝛼𝒅𝑘)
SQP Algorithm
SQP Algorithm (Arora, p. 526):
• Initialize: choose 𝒙0, 𝑅0 = 1, 𝜀1 > 0, 𝜀2 > 0.
• For 𝑘 = 0, 1, 2, …
– Compute 𝑓𝑘, 𝑔𝑖𝑘, ℎ𝑗𝑘, 𝒄, 𝑏𝑖, 𝑒𝑗; compute 𝑉𝑘.
– Formulate and solve the QP subproblem to obtain 𝒅𝑘 and the Lagrange multipliers 𝒖𝑘 and 𝒗𝑘.
– If 𝑉𝑘 ≤ 𝜀1 and ‖𝒅𝑘‖ ≤ 𝜀2, stop.
– Compute 𝑅; formulate and solve the line search subproblem for 𝛼
– Set 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘, 𝑅𝑘+1 ← 𝑅, 𝑘 ← 𝑘 + 1
• The above algorithm is convergent, i.e., Φ(𝒙𝑘) ≤ Φ(𝒙0); 𝒙𝑘 converges to the KKT point 𝒙∗
SQP with Approximate Line Search
• The SQP algorithm can be used with an approximate line search as follows:
Let 𝑡𝑗, 𝑗 = 0, 1, … denote a trial step size,
𝒙𝑘+1,𝑗 denote the trial design point,
𝑓𝑘+1,𝑗 = 𝑓(𝒙𝑘+1,𝑗) denote the function value at the trial solution, and
Φ𝑘+1,𝑗 = 𝑓𝑘+1,𝑗 + 𝑅𝑉𝑘+1,𝑗 the penalty function value at the trial solution.
• The trial solution is required to satisfy the descent condition:
Φ𝑘+1,𝑗 + 𝑡𝑗𝛾‖𝒅𝑘‖² ≤ Φ𝑘, 0 < 𝛾 < 1
where a common choice is: 𝛾 = ½, 𝜇 = ½, 𝑡𝑗 = 𝜇^𝑗, 𝑗 = 0, 1, 2, ….
• The above descent condition ensures that the constraint violation decreases at each step of the method.
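The descent condition above can be sketched as a step-size selection routine; the problem data below match the worked example on the following slides, and the parameter values are the common choices just stated:

```python
def approx_line_search(f, V, x, d, Phi0, R=10.0, gamma=0.5, mu=0.5, jmax=20):
    """Trial steps t_j = mu^j, accepted when the descent condition
    Phi(x + t d) + t*gamma*||d||^2 <= Phi0 holds (sketch of the rule above)."""
    dd = sum(di * di for di in d)
    t = 1.0
    for _ in range(jmax):
        xt = [xi + t * di for xi, di in zip(x, d)]
        if f(xt) + R * V(xt) + t * gamma * dd <= Phi0:
            return t
        t *= mu
    return t

# Data matching the worked example on the next slides
f = lambda x: x[0] ** 2 - x[0] * x[1] + x[1] ** 2
V = lambda x: max(0.0, 1 - x[0] ** 2 - x[1] ** 2, -x[0], -x[1])
t = approx_line_search(f, V, [1.0, 1.0], [-1.0, -1.0], Phi0=1.0)  # -> 0.25
```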
SQP Example
• Consider the NLP problem: min 𝑓(𝑥1, 𝑥2) = 𝑥1² − 𝑥1𝑥2 + 𝑥2²
subject to 𝑔1: 1 − 𝑥1² − 𝑥2² ≤ 0, 𝑔2: −𝑥1 ≤ 0, 𝑔3: −𝑥2 ≤ 0
Then 𝛻𝑓𝑇 = [2𝑥1 − 𝑥2, 2𝑥2 − 𝑥1], 𝛻𝑔1𝑇 = [−2𝑥1, −2𝑥2], 𝛻𝑔2𝑇 = [−1, 0], 𝛻𝑔3𝑇 = [0, −1]. Let 𝒙0 = [1, 1]𝑇; then, 𝑓0 = 1, 𝒄 = [1, 1]𝑇, 𝑔1(1,1) = 𝑔2(1,1) = 𝑔3(1,1) = −1.
• Since all constraints are initially inactive, 𝑉0 = 0, and 𝒅 = −𝒄 = [−1, −1]𝑇; the line search problem is: min𝛼 Φ(𝛼) = (1 − 𝛼)²
• By setting Φ′(𝛼) = 0, we get the analytical solution 𝛼 = 1; thus 𝒙1 = [0, 0]𝑇, which results in a large constraint violation
SQP Example
• Alternatively, we may use the approximate line search as follows:
– Let 𝑅0 = 10, 𝛾 = 𝜇 = ½; let 𝑡0 = 1; then 𝒙1,0 = [0, 0]𝑇, 𝑓1,0 = 0, 𝑉1,0 = 1, Φ1,0 = 10; ‖𝒅0‖² = 2, and the descent condition Φ1,0 + ½‖𝒅0‖² ≤ Φ0 = 1 is not met at the trial point.
– Next, for 𝑡1 = ½, we get 𝒙1,1 = [½, ½]𝑇, 𝑓1,1 = ¼, 𝑉1,1 = ½, Φ1,1 = 5.25, and the descent condition fails again;
– Next, for 𝑡2 = ¼, we get 𝒙1,2 = [¾, ¾]𝑇, 𝑉1,2 = 0, 𝑓1,2 = Φ1,2 = 9/16, and the descent condition checks as: Φ1,2 + ⅛‖𝒅0‖² ≤ Φ0 = 1.
– Therefore, we set 𝛼 = 𝑡2 = ¼, 𝒙1 = 𝒙1,2 = [¾, ¾]𝑇 with no constraint violation.
The Active Set Strategy
• To reduce the computational cost of solving the QP subproblem, we may include only the active constraints in the problem.
• For 𝒙𝑘 ∈ Ω, the set of potentially active constraints is defined as:
ℐ𝑘 = {𝑖: 𝑔𝑖𝑘 > −𝜀, 𝑖 = 1, …, 𝑚} ∪ {𝑗: 𝑗 = 1, …, 𝑝} for some 𝜀.
• For 𝒙𝑘 ∉ Ω, let 𝑉𝑘 = max{0; 𝑔𝑖𝑘, 𝑖 = 1, …, 𝑚; |ℎ𝑗𝑘|, 𝑗 = 1, …, 𝑝}; then, the active constraint set is defined as:
ℐ𝑘 = {𝑖: 𝑔𝑖𝑘 > 𝑉𝑘 − 𝜀, 𝑖 = 1, …, 𝑚} ∪ {𝑗: |ℎ𝑗𝑘| > 𝑉𝑘 − 𝜀, 𝑗 = 1, …, 𝑝}
• The gradients of inactive constraints, i.e., those not in ℐ𝑘, do not need to be computed
SQP via Newton’s Method
• Consider the following equality-constrained problem:
min𝒙 𝑓(𝒙), subject to ℎ𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑙
• The Lagrangian function is given as: ℒ(𝒙, 𝒗) = 𝑓(𝒙) + 𝒗𝑇𝒉(𝒙)
• The KKT conditions are: 𝛻ℒ(𝒙, 𝒗) = 𝛻𝑓(𝒙) + 𝑵𝒗 = 𝟎, 𝒉(𝒙) = 𝟎, where 𝑵 = 𝛻𝒉(𝒙) is a Jacobian matrix whose 𝑖th column is 𝛻ℎ𝑖(𝒙)
• Using first-order Taylor series expansion (with shorthand notation):
𝛻ℒ𝑘+1 ≅ 𝛻ℒ𝑘 + 𝛻²ℒ𝑘Δ𝒙 + 𝑵Δ𝒗
𝒉𝑘+1 ≅ 𝒉𝑘 + 𝑵𝑇Δ𝒙
• By expanding Δ𝒗 = 𝒗𝑘+1 − 𝒗𝑘, 𝛻ℒ𝑘 = 𝛻𝑓𝑘 + 𝑵𝒗𝑘, and assuming 𝒗𝑘 ≅ 𝒗𝑘+1, we obtain:
[𝛻²ℒ𝑘 𝑵; 𝑵𝑇 𝟎][Δ𝒙𝑘; 𝒗𝑘+1] = −[𝛻𝑓𝑘; 𝒉𝑘]
which is similar to the N-R update, but uses the Hessian of the Lagrangian
SQP via Newton’s Method
• Alternately, we consider minimizing the quadratic approximation:
minΔ𝒙 ½Δ𝒙𝑇𝛻²ℒΔ𝒙 + 𝛻𝑓𝑇Δ𝒙
Subject to: ℎ𝑖(𝒙) + 𝒏𝑖𝑇Δ𝒙 = 0, 𝑖 = 1, …, 𝑙
• The KKT conditions are: 𝛻𝑓 + 𝛻²ℒΔ𝒙 + 𝑵𝒗 = 𝟎, 𝒉 + 𝑵𝑇Δ𝒙 = 𝟎
• Thus the QP subproblem can be solved via Newton's method!
[𝛻²ℒ𝑘 𝑵; 𝑵𝑇 𝟎][Δ𝒙𝑘; 𝒗𝑘+1] = −[𝛻𝑓𝑘; 𝒉𝑘]
• The Hessian of the Lagrangian can be updated via the BFGS method as:
𝑯𝑘+1 = 𝑯𝑘 + 𝑫𝑘 − 𝑬𝑘
where 𝑫𝑘 = 𝒚𝑘𝒚𝑘𝑇/(𝒚𝑘𝑇Δ𝒙𝑘), 𝑬𝑘 = 𝒄𝑘𝒄𝑘𝑇/(𝒄𝑘𝑇Δ𝒙𝑘), 𝒄𝑘 = 𝑯𝑘Δ𝒙𝑘, 𝒚𝑘 = 𝛻ℒ𝑘+1 − 𝛻ℒ𝑘
Example: SQP with Hessian Update
• Consider the NLP problem: min 𝑓(𝑥1, 𝑥2) = 𝑥1² − 𝑥1𝑥2 + 𝑥2²
subject to 𝑔1: 1 − 𝑥1² − 𝑥2² ≤ 0, 𝑔2: −𝑥1 ≤ 0, 𝑔3: −𝑥2 ≤ 0
Let 𝒙0 = [1, 1]𝑇; then, 𝑓0 = 1, 𝒄 = [1, 1]𝑇, 𝑔1(1,1) = 𝑔2(1,1) = 𝑔3(1,1) = −1; 𝛻𝑔1𝑇 = [−2, −2], 𝛻𝑔2𝑇 = [−1, 0], 𝛻𝑔3𝑇 = [0, −1].
• Using the approximate line search, 𝛼 = ¼, 𝒙1 = [¾, ¾]𝑇.
• For the Hessian update, we have:
𝑓1 = 0.5625, 𝑔1 = −0.125, 𝑔2 = 𝑔3 = −0.75; 𝒄1 = [0.75, 0.75]𝑇;
𝛻𝑔1𝑇 = [−3/2, −3/2], 𝛻𝑔2𝑇 = [−1, 0], 𝛻𝑔3𝑇 = [0, −1]; Δ𝒙0 = [−0.25, −0.25]𝑇;
then, 𝑫0 = 𝑬0 = ½[1 1; 1 1], so that 𝑯1 = 𝑯0
SQP with Hessian Update
• For the next step, the QP problem is defined as:
min 𝑓̄ = ¾(𝑑1 + 𝑑2) + ½(𝑑1² + 𝑑2²)
Subject to: −(3/2)(𝑑1 + 𝑑2) ≤ 0, −𝑑1 ≤ 0, −𝑑2 ≤ 0
• The application of the KKT conditions results in a linear system of equations, which is solved to obtain:
𝒙𝑇 = [𝑑1, 𝑑2, 𝑢1, 𝑢2, 𝑢3, 𝑠1, 𝑠2, 𝑠3] = [0.188, 0.188, 0, 0, 0, 0.125, 0.75, 0.75]
Modified SQP Algorithm
Modified SQP Algorithm (Arora, p. 558):
• Initialize: choose 𝒙0, 𝑅0 = 1, 𝑯0 = 𝑰; 𝜀1, 𝜀2 > 0.
• For 𝑘 = 0, 1, 2, …
– Compute 𝑓𝑘, 𝑔𝑖𝑘, ℎ𝑗𝑘, 𝒄, 𝑏𝑖, 𝑒𝑗, and 𝑉𝑘. If 𝑘 > 0, compute 𝑯𝑘
– Formulate and solve the modified QP subproblem for the search direction 𝒅𝑘 and the Lagrange multipliers 𝒖𝑘 and 𝒗𝑘.
– If 𝑉𝑘 ≤ 𝜀1 and ‖𝒅𝑘‖ ≤ 𝜀2, stop.
– Compute 𝑅; formulate and solve the line search subproblem for 𝛼
– Set 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘, 𝑅𝑘+1 ← 𝑅, 𝑘 ← 𝑘 + 1.
SQP Algorithm
%SQP subproblem via Hessian update
% input: xk (current design); Lk (Hessian of Lagrangian
estimate)
%initialize
n=size(xk,1);
if ~exist('Lk','var'), Lk=diag(xk+(~xk)); end
tol=1e-7;
%function and constraint values
fk=f(xk);
dfk=df(xk);
gk=g(xk);
dgk=dg(xk);
%N-R update
A=[Lk dgk; dgk' 0*dgk'*dgk];
b=[-dfk;-gk];
dx=A\b;
dxk=dx(1:n);
lam=dx(n+1:end);
SQP Algorithm
%inactive constraints
idx1=find(lam<0);
if ~isempty(idx1)
[dxk,lam]=inactive(lam,A,b,n);
end
%check termination
if abs(dxk)<tol, return, end
%adjust increment for constraint compliance
P=@(xk) f(xk)+lam'*abs(g(xk));
while P(xk+dxk)>P(xk),
dxk=dxk/2;
if abs(dxk)<tol, break, end
end
%Hessian update
dL=@(x) df(x)+dg(x)*lam;
Lk=update(Lk, xk, dxk, dL);
xk=xk+dxk;
disp([xk' f(xk) P(xk)])
SQP Algorithm
%function definitions
function [dxk,lam]=inactive(lam,A,b,n)
idx1=find(lam<0);
lam(idx1)=0;
idx2=find(lam);
v=[1:n,n+idx2];
A=A(v,v); b=b(v);
dx=A\b;
dxk=dx(1:n);
lam(idx2)=dx(n+1:end);
end
function Lk=update(Lk, xk, dxk, dL)
ga=dL(xk+dxk)-dL(xk);
Hx=Lk*dxk;
Dk=ga*ga'/(ga'*dxk);
Ek=Hx*Hx'/(Hx'*dxk);
Lk=Lk+Dk-Ek;
end
Generalized Reduced Gradient
• The GRG method finds the search direction by projecting the objective function gradient onto the constraint hyperplane.
• The GRG direction lies tangent to the constraint hyperplane, so that the iterative steps try to conform to the constraints.
• The constraints are effectively used to implicitly eliminate variables and reduce the problem dimensions.
Implicit Elimination
• Consider an equality-constrained problem in two variables:
Objective: min 𝑓(𝒙), 𝒙𝑇 = [𝑥1, 𝑥2]
Subject to: 𝑔(𝒙) = 0
• The variations in the objective and constraint functions are:
𝑑𝑓 = 𝛻𝑓𝑇𝑑𝒙 = (𝜕𝑓/𝜕𝑥1)𝑑𝑥1 + (𝜕𝑓/𝜕𝑥2)𝑑𝑥2
𝑑𝑔 = 𝛻𝑔𝑇𝑑𝒙 = (𝜕𝑔/𝜕𝑥1)𝑑𝑥1 + (𝜕𝑔/𝜕𝑥2)𝑑𝑥2 = 0
• Solve for 𝑑𝑥2 = −[(𝜕𝑔/𝜕𝑥1)/(𝜕𝑔/𝜕𝑥2)]𝑑𝑥1 and substitute in the objective function:
𝑑𝑓 = [𝜕𝑓/𝜕𝑥1 − (𝜕𝑓/𝜕𝑥2)(𝜕𝑔/𝜕𝑥1)/(𝜕𝑔/𝜕𝑥2)]𝑑𝑥1
• Then the reduced gradient of 𝑓 along 𝑥1 is given as:
𝛻𝑓𝑅 = 𝜕𝑓/𝜕𝑥1 − (𝜕𝑓/𝜕𝑥2)(𝜕𝑔/𝜕𝑥1)/(𝜕𝑔/𝜕𝑥2)
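The two-variable reduced gradient formula can be sketched directly; the numerical values below are illustrative (they match the equality-constrained example worked later in these notes, where grad f = (−1, 3) and grad g = (−2, −1) at x = (−1, 0)):

```python
def reduced_gradient_2var(dfdx1, dfdx2, dgdx1, dgdx2):
    """Reduced gradient of f along x1, with x2 implicitly eliminated
    through g(x) = 0 (the formula above)."""
    return dfdx1 - dfdx2 * dgdx1 / dgdx2

# Illustrative values: grad f = (-1, 3), grad g = (-2, -1)
gr = reduced_gradient_2var(-1.0, 3.0, -2.0, -1.0)   # -> -7.0
```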
Implicit Elimination
• Consider a problem in 𝑛 variables with 𝑚 equality constraints:
Objective: min 𝑓(𝒙), 𝒙𝑇 = [𝑥1, 𝑥2, …, 𝑥𝑛]
Subject to: 𝑔𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑚
• We define 𝑚 basic variables in terms of 𝑛 − 𝑚 nonbasic variables; let 𝒙𝑇 = [𝒚𝑇, 𝒛𝑇], where 𝒚 are basic and 𝒛 are nonbasic.
• The gradient vector is partitioned as: 𝛻𝑓𝑇 = [𝛻𝑓(𝒚)𝑇, 𝛻𝑓(𝒛)𝑇].
• The variations in the objective and constraint functions are:
𝑑𝑓 = 𝛻𝑓(𝒚)𝑇𝑑𝒚 + 𝛻𝑓(𝒛)𝑇𝑑𝒛
𝑑𝒈 = (𝜕𝝍/𝜕𝒚)𝑑𝒚 + (𝜕𝝍/𝜕𝒛)𝑑𝒛 = 𝟎
where the matrices of partial derivatives are defined as:
[𝜕𝝍/𝜕𝒚]𝑖𝑗 = 𝜕𝑔𝑖/𝜕𝑦𝑗; [𝜕𝝍/𝜕𝒛]𝑖𝑗 = 𝜕𝑔𝑖/𝜕𝑧𝑗
Generalized Reduced Gradient
• Since $\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}$ is a square $m \times m$ matrix, we may solve for $d\boldsymbol{y}$ as:
$d\boldsymbol{y} = -\left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{z}}\, d\boldsymbol{z}$, and substitute in $df$ to obtain:
$df = \nabla f(\boldsymbol{z})^T d\boldsymbol{z} - \nabla f(\boldsymbol{y})^T \left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{z}}\, d\boldsymbol{z}$
• Then the reduced gradient $\nabla f_R$ is defined as:
$\nabla f_R^T = \nabla f(\boldsymbol{z})^T - \nabla f(\boldsymbol{y})^T \left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{z}}$
• Next, we choose the negative of $\nabla f_R$ as the search direction and
perform a line search to determine the step size; then $\Delta\boldsymbol{z} = -\alpha \nabla f_R$,
$\Delta\boldsymbol{y} = -\left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{z}}\, \Delta\boldsymbol{z}$
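A sketch of this computation in Python, using the numbers from the slack-variable example later in these notes ($m = 1$ basic variable $y$, two nonbasic variables $\boldsymbol{z}$, so the $m \times m$ solve reduces to a scalar division):

```python
# data as in the inequality-constrained example: grad_f_y = 5.12,
# grad_f_z = (1, 0), dpsi/dy = [1], dpsi/dz = [1, 1]
dfy = 5.12
dfz = [1.0, 0.0]
dpsi_dy = 1.0
dpsi_dz = [1.0, 1.0]
# f_R^T = grad_f_z^T - grad_f_y^T (dpsi/dy)^{-1} (dpsi/dz)
w = dfy / dpsi_dy
dfR = [dfz[j] - w * dpsi_dz[j] for j in range(2)]
print(dfR)  # approximately [-4.12, -5.12]
```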
GRG Algorithm
• Initialize: choose $\boldsymbol{x}_0$; evaluate the objective function and constraints;
convert binding inequality constraints to equality constraints.
• Partition the variables into $m$ basic and $n-m$ nonbasic ones, e.g.,
choose the first $m$ variables, or the $m$ variables with highest values, as basic.
• Compute $\nabla f_R$ along the nonbasic variables. If $\nabla f_R = 0$, exit.
• Set $\Delta\boldsymbol{z} = -\nabla f_R / \|\nabla f_R\|$, $\Delta\boldsymbol{y} = -\left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{z}} \Delta\boldsymbol{z}$.
• Do a line search along $\Delta\boldsymbol{x}$ to obtain $\alpha$.
• Check feasibility at $\boldsymbol{x}_k + \alpha\Delta\boldsymbol{x}$. If necessary, use Newton–Raphson
iterations to adjust $\Delta\boldsymbol{y}$ as: $\Delta\boldsymbol{y}_{k+1} = \Delta\boldsymbol{y}_k - \left(\frac{\partial \boldsymbol{\psi}}{\partial \boldsymbol{y}}\right)^{-1} \boldsymbol{g}_k$
• Update: $\boldsymbol{x}_{k+1} = \boldsymbol{x}_k + \alpha\Delta\boldsymbol{x}$
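The Newton–Raphson restoration step can be sketched in Python for a single binding constraint; here the circle constraint $g_1 = x_1^2 + x_2^2 - 9$ from the example in these notes is used with $x_1$ as the basic variable, and the starting guess is hypothetical:

```python
def restore(y, z, tol=1e-10):
    # adjust the basic variable y so that g(y, z) = y^2 + z^2 - 9 = 0,
    # holding the nonbasic variable z fixed: y <- y - (dg/dy)^{-1} g
    for _ in range(50):
        g = y*y + z*z - 9.0
        if abs(g) < tol:
            break
        y -= g / (2.0*y)
    return y

y = restore(2.0, -1.56)
print(y*y + (-1.56)**2 - 9.0)  # the adjusted point lies on the constraint
```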
Generalized Reduced Gradient
• Consider an equality-constrained problem:
Objective: $\min f(\boldsymbol{x}) = 3x_1 + 2x_2 + 2x_1^2 - x_1 x_2 + 1.5x_2^2$
Subject to: $g(\boldsymbol{x}) = x_1^2 - x_2 - 1 = 0$
• Let $\boldsymbol{x}_0 = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$; then $f_0 = -1$, $\nabla f_0 = \begin{bmatrix} -1 \\ 3 \end{bmatrix}$, $g_0 = 0$, $\nabla g_0 = \begin{bmatrix} -2 \\ -1 \end{bmatrix}$.
• Let $y = x_2$ on the first iteration; then $\nabla f_R^T = -1 - 3\cdot\frac{-2}{-1} = -7$.
• Let $\Delta z = 1$; then $\Delta y = -\frac{-2}{-1}\cdot 1 = -2$. By doing a line search along
$\Delta\boldsymbol{x} = \begin{bmatrix} 0.333 \\ -0.667 \end{bmatrix}$, we obtain $\boldsymbol{x}_1 = \begin{bmatrix} -0.350 \\ -0.577 \end{bmatrix}$, $f_1 = -2.13$.
• The optimum is reached in three iterations: $\boldsymbol{x}^* = \begin{bmatrix} -0.634 \\ -0.598 \end{bmatrix}$,
$f(\boldsymbol{x}^*) = -2.137$.
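The reported optimum can be checked directly against the problem data:

```python
# objective and constraint of the equality-constrained example
f = lambda x1, x2: 3*x1 + 2*x2 + 2*x1**2 - x1*x2 + 1.5*x2**2
g = lambda x1, x2: x1**2 - x2 - 1

x1, x2 = -0.634, -0.598
print(round(f(x1, x2), 3), round(g(x1, x2), 5))  # f ≈ -2.137, g ≈ 0
```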
Generalized Reduced Gradient
• Consider an inequality-constrained problem:
Objective: $\min f(\boldsymbol{x}) = x_1^2 + x_2$
Subject to: $g_1(\boldsymbol{x}) = x_1^2 + x_2^2 - 9 \le 0$, $g_2(\boldsymbol{x}) = x_1 + x_2 - 1 \le 0$
• Add slack variables to the inequality constraints:
$g_1(\boldsymbol{x}) = x_1^2 + x_2^2 - 9 + s_1 = 0$, $g_2(\boldsymbol{x}) = x_1 + x_2 - 1 + s_2 = 0$
Then $\nabla f(\boldsymbol{x}) = \begin{bmatrix} 2x_1 \\ 1 \end{bmatrix}$; $\nabla g_1(\boldsymbol{x}) = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}$; $\nabla g_2(\boldsymbol{x}) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$
• Let $\boldsymbol{x}_0 = \begin{bmatrix} 2.56 \\ -1.56 \end{bmatrix}$; then $f_0 = 4.99$, $\nabla f_0 = \begin{bmatrix} 5.12 \\ 1 \end{bmatrix}$, $\boldsymbol{g}_0 = \begin{bmatrix} -0.013 \\ 0 \end{bmatrix}$.
• Since $g_2$ is binding, add $s_2$ to the variables: $\nabla f_0 = \begin{bmatrix} 5.12 \\ 1 \\ 0 \end{bmatrix}$, $\nabla g_2^0 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$
Generalized Reduced Gradient
• Let $y = x_1$, $\boldsymbol{z} = \begin{bmatrix} x_2 \\ s_2 \end{bmatrix}$; then $\nabla f(y) = 5.12$, $\nabla f(\boldsymbol{z}) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\nabla g_2(y) = 1$,
$\nabla g_2(\boldsymbol{z}) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$; therefore $\nabla f_R(\boldsymbol{z}) = \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \end{bmatrix} 5.12 = \begin{bmatrix} -4.12 \\ -5.12 \end{bmatrix}$
• Let $\Delta\boldsymbol{z} = -\nabla f_R(\boldsymbol{z})$, $\Delta y = -[1\ \ 1]\Delta\boldsymbol{z} = -9.24$; then $\Delta\boldsymbol{x} = \begin{bmatrix} -9.24 \\ 4.12 \end{bmatrix}$ and
$\boldsymbol{s}_0 = \Delta\boldsymbol{x}/\|\Delta\boldsymbol{x}\|$. Suppose we limit the maximum step size to $\alpha \le 0.5$;
then $\boldsymbol{x}_1 = \boldsymbol{x}_0 + 0.5\,\boldsymbol{s}_0 = \begin{bmatrix} 2.103 \\ -1.356 \end{bmatrix}$ with $f(\boldsymbol{x}_1) = f_1 = 3.068$. There are
no constraint violations; hence the first iteration is complete.
• After seven iterations: $\boldsymbol{x}_7 = \begin{bmatrix} 0.003 \\ -3.0 \end{bmatrix}$ with $f_7 = -3.0$
• The optimum is at: $\boldsymbol{x}^* = \begin{bmatrix} 0.0 \\ -3.0 \end{bmatrix}$ with $f^* = -3.0$
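The first-iteration arithmetic above can be reproduced in a few lines of Python:

```python
import math

x0 = (2.56, -1.56)
dx = (-9.24, 4.12)                 # step (dy, dz_x2) from the reduced gradient
n = math.hypot(dx[0], dx[1])
s0 = (dx[0]/n, dx[1]/n)            # normalized search direction
x1 = (x0[0] + 0.5*s0[0], x0[1] + 0.5*s0[1])   # limited step, alpha = 0.5
f1 = x1[0]**2 + x1[1]
print(x1, f1)  # approximately (2.103, -1.356), f1 ≈ 3.068
```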
GRG for LP Problems
• Consider an LP problem: $\min f(\boldsymbol{x}) = \boldsymbol{c}^T\boldsymbol{x}$
Subject to: $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$, $\boldsymbol{x} \ge \boldsymbol{0}$
• Let $\boldsymbol{x}$ be partitioned into $m$ basic variables and $n-m$ nonbasic
variables: $\boldsymbol{x}^T = [\boldsymbol{y}^T, \boldsymbol{z}^T]$.
• The objective function is partitioned as: $f(\boldsymbol{x}) = \boldsymbol{c}_y^T\boldsymbol{y} + \boldsymbol{c}_z^T\boldsymbol{z}$
• The constraints are partitioned as: $\boldsymbol{B}\boldsymbol{y} + \boldsymbol{N}\boldsymbol{z} = \boldsymbol{b}$, $\boldsymbol{y} \ge \boldsymbol{0}$, $\boldsymbol{z} \ge \boldsymbol{0}$.
Then $\boldsymbol{y} = \boldsymbol{B}^{-1}\boldsymbol{b} - \boldsymbol{B}^{-1}\boldsymbol{N}\boldsymbol{z}$
• The objective function in terms of the independent variables is:
$f(\boldsymbol{z}) = \boldsymbol{c}_y^T\boldsymbol{B}^{-1}\boldsymbol{b} + (\boldsymbol{c}_z^T - \boldsymbol{c}_y^T\boldsymbol{B}^{-1}\boldsymbol{N})\boldsymbol{z}$
• The reduced costs for the nonbasic variables are given as:
$\boldsymbol{r}_c^T = \boldsymbol{c}_z^T - \boldsymbol{c}_y^T\boldsymbol{B}^{-1}\boldsymbol{N}$, or $\boldsymbol{r}_c^T = \boldsymbol{c}_z^T - \boldsymbol{\lambda}^T\boldsymbol{N}$
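A Python sketch of the reduced-cost computation for a small hypothetical LP, $\min -x_1 - 2x_2$ subject to $x_1 + x_2 + s_1 = 4$, $x_1 + 3x_2 + s_2 = 6$, $\boldsymbol{x} \ge \boldsymbol{0}$, with $(x_1, x_2)$ basic:

```python
B = [[1.0, 1.0], [1.0, 3.0]]     # basic columns (x1, x2)
N = [[1.0, 0.0], [0.0, 1.0]]     # nonbasic columns (s1, s2)
cy, cz = [-1.0, -2.0], [0.0, 0.0]

# invert the 2x2 basis matrix directly
det = B[0][0]*B[1][1] - B[0][1]*B[1][0]
Binv = [[ B[1][1]/det, -B[0][1]/det],
        [-B[1][0]/det,  B[0][0]/det]]
# lambda^T = c_y^T B^{-1}, then r_c^T = c_z^T - lambda^T N
lam = [cy[0]*Binv[0][j] + cy[1]*Binv[1][j] for j in range(2)]
rc = [cz[j] - (lam[0]*N[0][j] + lam[1]*N[1][j]) for j in range(2)]
print(rc)  # → [0.5, 0.5]
```

Both reduced costs are nonnegative, so the basis $(x_1, x_2)$ is already optimal for this LP.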
GRG for LP Problems
• Using tableau notation, the reduced costs are computed as:
$\begin{bmatrix} \boldsymbol{B} & \boldsymbol{N} & \boldsymbol{b} \\ \boldsymbol{c}_y^T & \boldsymbol{c}_z^T & 0 \end{bmatrix} \rightarrow \begin{bmatrix} \boldsymbol{I} & \boldsymbol{B}^{-1}\boldsymbol{N} & \boldsymbol{B}^{-1}\boldsymbol{b} \\ \boldsymbol{0} & \boldsymbol{r}_c^T & -\boldsymbol{c}_y^T\boldsymbol{B}^{-1}\boldsymbol{b} \end{bmatrix}$
• The objective function variation is given as:
$df = \nabla f_{\boldsymbol{y}}^T d\boldsymbol{y} + \nabla f_{\boldsymbol{z}}^T d\boldsymbol{z}$
• The reduced gradient along the constraint surface is given as:
$\nabla f_R^T = \nabla_{\boldsymbol{z}} f^T - \nabla_{\boldsymbol{y}} f^T \boldsymbol{B}^{-1}\boldsymbol{N} = \boldsymbol{r}_c^T$
GRG Algorithm for LP Problems
1. Choose the largest $m$ components of $\boldsymbol{x}$ as basic variables
2. Compute the reduced gradient $\nabla f_R^T = \boldsymbol{r}_c^T$
3. Let $\Delta z_i = \begin{cases} -r_i & \text{if } r_i \le 0 \\ -x_i r_i & \text{if } r_i > 0 \end{cases}$
4. If $\Delta\boldsymbol{z} = \boldsymbol{0}$, stop; otherwise set $\Delta\boldsymbol{y} = -\boldsymbol{B}^{-1}\boldsymbol{N}\Delta\boldsymbol{z}$
5. Compute the step size: let $\alpha_1 = \max\{\alpha : \boldsymbol{y} + \alpha\Delta\boldsymbol{y} \ge \boldsymbol{0},\ \boldsymbol{z} + \alpha\Delta\boldsymbol{z} \ge \boldsymbol{0}\}$,
$\alpha_2 = \arg\min_\alpha f(\boldsymbol{x} + \alpha\Delta\boldsymbol{x})$, $\alpha = \min\{\alpha_1, \alpha_2\}$
6. Update: $\boldsymbol{x}_{k+1} = \boldsymbol{x}_k + \alpha\Delta\boldsymbol{x}$
7. If $\alpha_2 \ge \alpha_1$, update $\boldsymbol{B}$, $\boldsymbol{N}$ (use pivoting)
8. Return to step 1
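The step-3 direction rule can be written directly; the reduced costs $r$ and nonbasic values $x$ below are hypothetical:

```python
def dz_rule(r, x):
    # move opposite the reduced cost; scale by x_i when r_i > 0 so that
    # a nonbasic variable already at its bound (x_i = 0) is not decreased
    return [-ri if ri <= 0 else -xi*ri for ri, xi in zip(r, x)]

print(dz_rule([0.5, -1.0, 2.0], [0.0, 3.0, 4.0]))  # → [-0.0, 1.0, -8.0]
```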
View publication stats
View publication stats

Optimum engineering design - Day 5. Clasical optimization methods

  • 1.
  • 2.
    Course Materials • Arora,Introduction to Optimum Design, 3e, Elsevier, (https://www.researchgate.net/publication/273120102_Introductio n_to_Optimum_design) • Parkinson, Optimization Methods for Engineering Design, Brigham Young University (http://apmonitor.com/me575/index.php/Main/BookChapters) • Iqbal, Fundamental Engineering Optimization Methods, BookBoon (https://bookboon.com/en/fundamental-engineering-optimization- methods-ebook)
  • 3.
    Numerical Optimization • Consideran unconstrained NP problem: min 𝒙 𝑓 𝒙 • Use an iterative method to solve the problem: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒅𝑘, where 𝒅𝑘 is a search direction and 𝛼𝑘 is the step size, such that the function value decreases at each step, i.e., 𝑓 𝒙𝑘+1 < 𝑓 𝒙𝑘 • We expect lim 𝑘→∞ 𝒙𝑘 = 𝒙∗ • The general iterative method is a two-step process: – Finding a suitable search direction 𝒅𝑘 along which the function value locally decreases and any constraints are obeyed. – Performing line search along 𝒅𝑘 to find 𝒙𝑘+1 such that 𝑓 𝒙𝑘+1 attains its minimum value.
  • 4.
    The Iterative Method •Iterative algorithm: 1. Initialize: chose 𝒙0 2. Check termination: 𝛻𝑓 𝒙𝑘 ≅ 0 3. Find a suitable search direction 𝒅𝑘 , that obeys the descent condition: 𝛻𝑓 𝒙𝑘 𝑇 𝒅𝑘 < 0 4. Search along 𝒅𝑘 to find where 𝑓 𝒙𝑘+1 attains minimum value (line search problem) 5. Return to step 2
  • 5.
    The Line SearchProblem • Assuming a suitable search direction 𝒅𝑘 has been determined, we seek to determine a step length 𝛼𝑘, that minimizes 𝑓 𝒙𝑘+1 . • Assuming 𝒙𝑘 and 𝒅𝑘 are known, the projected function value along 𝒅𝑘 is expressed as: 𝑓 𝒙𝑘 + 𝛼𝑘𝒅𝑘 = 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 = 𝑓(𝛼) • The line search problem to choose 𝛼 to minimize 𝑓 𝒙𝑘+1 along 𝒅𝑘 is defined as: min 𝛼 𝑓(𝛼) = 𝑓 𝒙𝑘 + α𝒅𝑘 • Assuming that a solution exists, it is found by setting 𝑓′ 𝛼 = 0.
  • 6.
    Example: Quadratic Function •Consider minimizing a quadratic function: 𝑓 𝒙 = 1 2 𝒙𝑇𝑨𝒙 − 𝒃𝑇𝒙, 𝛻𝑓 = 𝑨𝒙 − 𝒃 • Given a descent direction 𝒅, the line search problem is defined as: min 𝛼 𝑓(𝛼) = 𝒙𝑘 + 𝛼𝒅 𝑇 𝑨 𝒙𝑘 + 𝛼𝒅 − 𝒃𝑇 𝒙𝑘 + 𝛼𝒅 • A solution is found by setting 𝑓′ 𝛼 = 0, where 𝑓′ 𝛼 = 𝒅𝑇𝑨 𝒙𝑘 + 𝛼𝒅 − 𝒅𝑇𝒃 = 0 𝛼 = − 𝒅𝑇 𝑨𝒙𝑘 − 𝒃 𝒅𝑇𝑨𝒅 = − 𝛻𝑓(𝒙𝑘 )𝑇 𝒅 𝒅𝑇𝑨𝒅 • Finally, 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝒅.
  • 7.
    Computer Methods forLine Search Problem • Interval reduction methods – Golden search – Fibonacci search • Approximate search methods – Arjimo’s rule – Quadrature curve fitting
  • 8.
    Interval Reduction Methods •The interval reduction methods find the minimum of a unimodal function in two steps: – Bracketing the minimum to an interval – Reducing the interval to desired accuracy • The bracketing step aims to find a three-point pattern, such that for 𝑥1, 𝑥2, 𝑥3, 𝑓 𝑥1 ≥ 𝑓 𝑥2 < 𝑓 𝑥3 .
  • 9.
    Fibonacci’s Method • TheFibonacci’s method uses Fibonacci numbers to achieve maximum interval reduction in a given number of steps. • The Fibonacci number sequence is generated as: 𝐹0 = 𝐹1 = 1, 𝐹𝑖 = 𝐹𝑖−1 + 𝐹𝑖−2, 𝑖 ≥ 2. • The properties of Fibonacci numbers include: – They achieve the golden ratio 𝜏 = lim 𝑛→∞ 𝐹𝑛−1 𝐹𝑛 = 5−1 2 ≅ 0.618034 – The number of interval reductions 𝑛 required to achieve a desired accuracy 𝜀 (where 1/𝐹𝑛 < 𝜀) is specified in advance. – For given 𝐼1 and 𝑛, 𝐼2 = 𝐹𝑛−1 𝐹𝑛 𝐼1, 𝐼3 = 𝐼1 − 𝐼2, 𝐼4 = 𝐼2 − 𝐼3, etc.
  • 10.
    The Golden SectionMethod • The golden section method uses the golden ratio: 𝜏 = 0.618034. • The golden section algorithm is given as: 1. Initialize: specify 𝑥1, 𝑥4 𝐼1 = 𝑥4 − 𝑥1 , 𝜀, 𝑛: 𝜏𝑛 < 𝜀 𝐼1 2. Compute 𝑥2 = 𝜏𝑥1 + 1 − 𝜏 𝑥4, evaluate 𝑓2 3. For 𝑖 = 1, … , 𝑛 − 1 Compute 𝑥3 = 1 − 𝜏 𝑥1 + 𝜏𝑥4, evaluate 𝑓3; if 𝑓2 < 𝑓3, set 𝑥4 ← 𝑥1, 𝑥1 ← 𝑥3; else set 𝑥1 ← 𝑥2, 𝑥2 ← 𝑥3, 𝑓2 ← 𝑓3
  • 11.
    Approximate Search Methods •Consider the line search problem: min 𝛼 𝑓(𝛼) = 𝑓 𝒙𝑘 + α𝒅𝑘 • Sufficient Descent Condition. The sufficient descent condition guards against 𝒅𝑘 becoming too close to 𝛻𝑓 𝒙𝑘 . The condition is stated as: 𝛻𝑓 𝒙𝑘 𝑇 𝒅𝑘 < −𝑐 𝛻𝑓 𝒙𝑘 2 , 𝑐 > 0 • Sufficient Decrease Condition. The sufficient decrease condition ensures a nontrivial reduction in the function value. The condition is stated as: 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 − 𝑓 𝒙𝑘 ≤ 𝜇 𝛼 𝛻𝑓 𝒙𝑘 𝑇 𝒅𝑘, 0 < 𝜇 < 1 • Curvature Condition. The curvature condition guards against 𝛼 becoming too small. The condition is stated as: 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 𝑇 𝒅𝑘 ≥ 𝑓 𝒙𝑘 + 𝜂 𝛻𝑓 𝒙𝑘 𝑇 𝒅𝑘 , 0 < 𝜇 < 𝜂 < 1
  • 12.
    Approximate Line Search •Strong Wolfe Conditions. The strong Wolfe conditions commonly used by all line search algorithms include: 1. The sufficient decrease condition (Arjimo’s rule): 𝑓 𝛼 ≤ 𝑓 0 + 𝜇𝛼𝑓′ (0), 0 < 𝜇 < 1 2. Strong curvature condition: 𝑓′ 𝛼 ≤ 𝜂 𝑓′ 0 , 0 < 𝜇 ≤ 𝜂 < 1
  • 13.
    Approximate Line Search •The approximate line search includes two steps: – Bracketing the minimum – Estimating the minimum • Bracketing the Minimum. In the bracketing step we seek an interval 𝛼, 𝛼 such that 𝑓′ 𝛼 < 0 and 𝑓′ 𝛼 > 0. – Since for any descent direction, 𝑓′ 0 < 0, therefore, 𝛼 = 0 serves as a lower bound on 𝛼. To find an upper bound, gradually increase 𝛼, e.g., 𝛼 = 1,2, …, – Assume that for some 𝛼𝑖 > 0, we get 𝑓′ 𝛼𝑖 < 0 and 𝑓′ 𝛼𝑖+1 > 0; then, 𝛼𝑖 serves as an upper bound.
  • 14.
    Approximate Line Search •Estimating the Minimum. Once the minimum has been bracketed to a small interval, a quadratic or cubic polynomial approximation is used to find the minimizer. • If the polynomial minimizer 𝛼 satisfies strong Wolfe’s conditions for the desired 𝜇 and 𝜂 values (say 𝜇 = 0.2, 𝜂 = 0.5), it is taken as the function minimizer. • Otherwise, 𝛼 is used to replace one of the 𝛼 or 𝛼, and the polynomial approximation step repeated.
  • 15.
    Quadratic Curve Fitting •Assuming that the interval 𝛼𝑙, 𝛼𝑢 contains the minimum of a unimodal function, 𝑓 𝛼 , its quadratic approximation, given as: 𝑞 𝛼 = 𝑎0 + 𝑎1𝛼 + 𝑎2𝛼2 , is obtained using three points 𝛼𝑙, 𝛼𝑚, 𝛼𝑢 , where the mid-point may be used for 𝛼𝑚 The quadratic coefficients {𝑎0, 𝑎1, 𝑎2} are solved as: 𝑎2 = 1 𝛼𝑢−𝛼𝑚 𝑓 𝛼𝑢 −𝑓 𝛼𝑙 𝛼𝑢−𝛼𝑙 − 𝑓 𝛼𝑚 −𝑓 𝛼𝑙 𝛼𝑚−𝛼𝑙 𝑎1 = 1 𝛼𝑚−𝛼𝑙 𝑓 𝛼𝑚 − 𝑓 𝛼𝑙 − 𝑎2(𝛼𝑙 + 𝛼𝑚) 𝑎0 = 𝑓(𝛼𝑙) − 𝑎1𝛼𝑙 − 𝑎2𝛼𝑙 2 Then, the minimum is given as: 𝛼𝑚𝑖𝑛 = − 𝑎1 2𝑎2
  • 16.
    Example: Approximate Search •Let 𝑓 𝛼 = 𝑒−𝛼 + 𝛼2, 𝑓′ 𝛼 = 2𝛼 − 𝑒−𝛼, 𝑓 0 = 1, 𝑓′ 0 = −1. Let 𝜇 = 0.2, and try 𝛼 = 0.1, 0.2, …, to bracket the minimum. • From the sufficient decrease condition, the minimum is bracketed in the interval: [0, 0.5] • Using quadratic approximation, the minimum is found as: 𝑥∗ = 0.3531 The exact solution is given as: 𝛼𝑚𝑖𝑛 = 0.3517 • The Matlab commands are: Define the function: f=@(x) x.*x+exp(-x); mu=0.2; al=0:.1:1;
  • 17.
    Example: Approximate Search •Bracketing the minimum: f1=feval(f,al) 1.0000 0.9148 0.8587 0.8308 0.8303 0.8565 0.9088 0.9866 1.0893 1.2166 1.3679 >> f2=f(0)-mu*al 1.0000 0.9800 0.9600 0.9400 0.9200 0.9000 0.8800 0.8600 0.8400 0.8200 0.8000 >> idx=find(f1<=f2) • Quadratic approximation to find the minimum: al=0; am=0.25; au=0.5; a2 = ((f(au)-f(al))/(au-al)-(f(am)-f(al))/(am-al))/(au-am); a1 = (f(am)-f(al))/(am-al)-a2*(al+am); xmin = -a1/a2/2 % 0.3531
  • 18.
    Computer Methods forFinding the Search Direction • Gradient based methods – Steepest descent method – Conjugate gradient method – Quasi Newton methods • Hessian based methods – Newton’s method – Trust region methods
  • 19.
    Steepest Descent Method •The steepest descent method determines the search direction as: 𝒅𝑘 = −𝛻𝑓(𝒙𝑘), • The update rule is given as: 𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝑘 ∙ 𝛻𝑓(𝒙𝑘 ) where 𝛼𝑘 is determined by minimizing 𝑓(𝒙𝑘+1) along 𝒅𝑘 • Example: quadratic function 𝑓 𝒙 = 1 2 𝒙𝑇 𝑨𝒙 − 𝒃𝑇 𝒙, 𝛻𝑓 = 𝑨𝒙 − 𝒃 Then, 𝒙𝑘+1 = 𝒙𝑘 − 𝛼 ∙ 𝛻𝑓 𝒙𝑘 ; 𝛼 = 𝛻 𝑓 𝒙𝑘 𝑇 𝛻 𝑓 𝒙𝑘 𝛻 𝑓 𝒙𝑘 𝑇 𝐀𝛻 𝑓 𝒙𝑘 Define 𝒓𝑘 = 𝒃 − 𝑨𝒙𝑘 Then, 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒓𝑘; 𝛼𝑘 = 𝒓𝑘 𝑇 𝒓𝑘 𝒓𝑘 𝑇𝐴𝒓𝑘
  • 20.
    Steepest Descent Algorithm •Initialize: choose 𝒙0 • For 𝑘 = 0,1,2, … – Compute 𝛻𝑓(𝒙𝑘 ) – Check convergence: if 𝛻𝑓(𝒙𝑘 ) < 𝜖, stop. – Set 𝒅𝑘 = −𝛻𝑓(𝒙𝑘) – Line search problem: Find min 𝛼≥0 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 – Set 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝒅𝑘 .
  • 21.
    Example: Steepest Descent •Consider min 𝒙 𝑓 𝒙 = 0.1𝑥1 2 + 𝑥2 2 , 𝛻𝑓 𝒙 = 0.2𝑥1 2𝑥2 , 𝛻2𝑓 𝑥 = 0.1 0 0 1 ; let 𝒙0 = 5 1 , then, 𝑓 𝒙0 = 3.5, 𝑑1 = −𝛻𝑓 𝒙0 = −1 −2 , 𝛼 = 0.61 𝒙1 = 4.39 −0.22 , 𝑓 𝒙1 = 1.98 Continuing..
  • 22.
    Example: Steepest Descent •MATLAB code: H=[.2 0;0 2]; f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H; x=[5;1]; xall=x'; for i=1:10 d=-df(x); a=d'*d/(d'*H*d); x=x+a*d; xall=[xall;x']; end plot(xall(:,1),xall(:,2)), grid axis([-1 5 -1 5]), axis equal
  • 23.
    Steepest Descent Method •The steepest descent method becomes slow close to the optimum • The method progresses in a zigzag fashion, since 𝑑 𝑑𝛼 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 = 𝛻 𝑓 𝒙𝑘+1 𝑇 𝒅𝑘 = −𝛻 𝑓 𝒙𝑘+1 𝑇 𝛻 𝑓 𝒙𝑘 = 0 • The method has linear convergence with rate constant 𝐶 = 𝑓 𝒙𝑘+1 −𝑓 𝒙∗ 𝑓 𝒙𝑘 −𝑓 𝒙∗ ≤ 𝑐𝑜𝑛𝑑 𝑨 −1 𝑐𝑜𝑛𝑑 𝑨 +1 2
  • 24.
    Preconditioning • Preconditioning (scaling)can be used to reduce the condition number of the Hessian matrix and hence aid convergence • Consider 𝑓 𝒙 = 0.1𝑥1 2 + 𝑥2 2 = 𝒙𝑇 𝑨𝒙, where 𝑨 = 𝑑𝑖𝑎𝑔(0.1, 1) • Define a linear transformation: 𝒙 = 𝑷𝒚, where 𝑷 = 𝑑𝑖𝑎𝑔( 10, 1); then, 𝑓 𝒙 = 𝒚𝑇 𝑷𝑇 𝑨𝑷𝒚 = 𝒚𝑇 𝒚 • Since 𝑐𝑜𝑛𝑑 𝑰 = 1, the steepest descent method in the case of a quadratic function converges in a single iteration
  • 25.
    Conjugate Gradient Method •For any square matrix 𝑨, the set of 𝑨-conjugate vectors is defined by: 𝒅𝑖𝑇 𝑨𝒅𝑗 = 0, 𝑖 ≠ 𝑗 • Let 𝒈𝑘 = 𝛻 𝑓 𝒙𝑘 denote the gradient; then, starting from 𝒅0 = −𝒈0, a set of 𝑨-conjugate directions is generated as: 𝒅0 = −𝒈0; 𝒅𝑘+1 = −𝒈𝑘+1 + 𝛽𝑘𝒅𝑘 𝑘 ≥ 0, … where 𝛽𝑘 = 𝒈𝑘+1 𝑇 𝑨𝒅𝑘 𝒅𝑘𝑇 𝑨𝒅𝑘 There are multiple ways to generate conjugate directions • Using {𝒅0 , 𝒅2 , … , 𝒅𝑛−1 } as search directions, a quadratic function is minimized in 𝑛 steps.
  • 26.
    Conjugate Directions Method •The parameter 𝛽𝑘 can be computed in different ways: – By substituting 𝑨𝒅𝑘 = 1 𝛼𝑘 (𝒈𝑘+1 − 𝒈𝑘), we obtain: 𝛽𝑘 = 𝒈𝑘+1 𝑇 (𝒈𝑘+1−𝒈𝑘) 𝒅𝑘𝑇 (𝒈𝑘+1−𝒈𝑘) (the Hestenes-Stiefel formula) – In the case of exact line search, 𝑔𝑘+1 𝑇 𝒅𝑘 = 0; then 𝛽𝑘 = 𝒈𝑘+1 𝑇 (𝒈𝑘+1−𝒈𝑘) 𝒈𝑘 𝑇𝒈𝑘 (the Polak-Ribiere formula) – Also, for exact line search 𝒈𝑘+1 𝑇 𝒈𝑘 = 𝛽𝑘−1(𝒈𝑘 + 𝛼𝑘𝑨𝒅𝑘 )𝑇 𝒅𝑘−1 = 0, resulting in 𝛽𝑘 = 𝒈𝑘+1 𝑇 𝒈𝑘+1 𝒈𝑘 𝑇𝒈𝑘 (the Fletcher-Reeves formula) Other versions of 𝛽𝑘 have also been proposed.
  • 27.
    Example: Conjugate GradientMethod • Consider min 𝒙 𝑓 𝒙 = 0.1𝑥1 2 + 𝑥2 2 , 𝛻𝑓 𝒙 = 0.2𝑥1 2𝑥2 , 𝛻2𝑓 𝑥 = 0.1 0 0 1 ; let 𝒙0 = 5 1 , then 𝑓 𝒙0 = 3.5, 𝑑0 = − 𝛻𝑓 𝒙0 = −1 −2 , 𝛼 = 0.61 𝒙1 = 4.39 −0.22 , 𝑓 𝒙1 = 1.98 𝛽0 = 0.19 𝑑1 = −0.535 0.027 , 𝛼 = 8.2 𝒙1 = 0 0
  • 28.
    Example: Conjugate GradientMethod • MATLAB code H=[.2 0;0 2]; f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H; x=[5;1]; n=2; xall=zeros(n+1,n); xall(1,:)=x'; d=-df(x); a=d'*d/(d'*H*d); x=x+a*d; xall(2,:)=x'; for i=1:size(x,1)-1 b=df(x)'*H*d/(d'*H*d); d=-df(x)+b*d; r=-df(x); a=r'*r/(d'*H*d); x=x+a*d; xall(i+2,:)=x'; end plot(xall(:,1),xall(:,2)), grid axis([-1 5 -1 5]), axis equal
  • 29.
    Conjugate Gradient Algorithm •Conjugate-Gradient Algorithm (Griva, Nash & Sofer, p454): • Initialize: Choose 𝒙0 = 𝟎, 𝒓0 = 𝒃, 𝒅(−1) = 0, 𝛽0 = 0. • For 𝑖 = 0,1, … – Check convergence: if 𝒓𝑖 < 𝜖, stop. – If 𝑖 > 0, set 𝛽𝑖 = 𝒓𝑖 𝑇 𝒓𝑖 𝒓𝑖−1 𝑇 𝒓𝑖−1 – Set 𝒅𝑖 = 𝒓𝑖 + 𝛽𝑖𝒅𝑖−1 ; 𝛼𝑖 = 𝒓𝑖 𝑇 𝒓𝑖 𝒅𝑖𝑇 𝑨𝒅𝑖 ; 𝒙𝑖+1 = 𝒙𝑖 + 𝛼𝑖𝒅𝑖 ; 𝒓𝑖+1 = 𝒓𝑖 − 𝛼𝑖𝑨𝒅𝑖.
  • 30.
    Conjugate Gradient Method •Assume that an update that includes steps 𝛼𝑖 along 𝑛 conjugate vectors 𝒅𝑖 is assembled as: 𝑦 = 𝛼𝑖𝒅𝑖 𝑛 𝑖=1 . • Then, for a quadratic function, the minimization problem is decomposed into a set of one-dimensional problems, i.e., min 𝑦 𝑓(𝒚) ≡ min 𝛼𝑖 1 2 𝛼𝑖 2 𝒅𝑖𝑇 𝑨𝒅𝑖 − 𝛼𝑖𝒃𝑇 𝒅𝑖 𝑛 𝑖=1 • By setting the derivative with respect to 𝛼𝑖 equal to zero, i.e., 𝛼𝑖𝒅𝑖𝑇 𝑨𝒅𝑖 − 𝒃𝑇 𝒅𝑖 = 0, we obtain: 𝛼𝑖 = 𝒃𝑇𝒅𝑖 𝒅𝑖𝑇 𝑨𝒅𝑖 . • This shows that the CG algorithm iteratively determines the conjugate directions 𝒅𝑖 and their coefficients 𝛼𝑖.
  • 31.
    CG Rate ofConvergence • Conjugate gradient methods achieve superlinear convergence: – In the case of quadratic functions, the minimum is reached exactly in 𝑛 iterations. – For general nonlinear functions, convergence in 2𝑛 iterations is to be expected. • Nonlinear CG methods typically have the lowest per iteration computational costs of all gradient methods.
  • 32.
    Newton’s Method • Considerminimizing the second order approximation of 𝑓 𝒙 : min 𝒅 𝑓 𝒙𝑘 + Δ𝒙 = 𝑓 𝒙𝑘 + 𝛻𝑓 𝒙𝑘 𝑇Δ𝒙 + 1 2 Δ𝒙𝑇𝑯𝑘Δ𝒙 • Apply FONC: 𝑯𝑘𝒅 + 𝒈𝑘 = 𝟎, where 𝒈𝑘 = 𝛻𝑓 𝒙𝑘 Then, assuming that 𝑯𝑘 = 𝛻2 𝑓 𝒙𝑘 stays positive definite, the Newton’s update rule is derived as: 𝒙𝑘+1 = 𝒙𝑘 − 𝑯𝑘 −1 𝒈𝑘 • Note: – The convergence of the Newton’s method is dependent on 𝑯𝑘 staying positive definite. – A step size may be included in the Newton’s method, i.e., 𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝑘𝑯𝑘 −1 𝒈𝑘
  • 33.
    Marquardt Modification toNewton’s Method • To ensure the positive definite condition on 𝑯𝑘, Marquardt proposed the following modification to Newton’s method: 𝑯𝑘 + 𝜆𝑰 𝒅 = −𝒈𝑘 where 𝜆 is selected to ensure that the Hessian is positive definite. • Since 𝑯𝑘 + 𝜆𝑰 is also symmetric, the resulting system of linear equations can be solved for 𝒅 as: 𝑳𝑫𝑳𝑇 𝒅 = −𝛻𝑓 𝒙𝑘
  • 34.
    Newton’s Algorithm Newton’s Method(Griva, Nash, & Sofer, p. 373): 1. Initialize: Choose 𝒙0, specify 𝜖 2. For 𝑘 = 0,1, … 3. Check convergence: If 𝛻𝑓 𝒙𝑘 < 𝜖, stop 4. Factorize modified Hessian as 𝛻2 𝑓 𝒙𝑘 + 𝑬 = 𝑳𝑫𝑳𝑇 and solve 𝑳𝑫𝑳𝑇 𝒅 = −𝛻𝑓 𝒙𝑘 for 𝒅 5. Perform line search to determine 𝛼𝑘 and update the solution estimate as 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘 𝒅𝑘
  • 35.
    Rate of Convergence •Newton’s method achieves quadratic rate of convergence in the close neighborhood of the optimal point, and superlinear convergence otherwise. • The main drawback of the Newton’s method is its computational cost: the Hessian matrix needs to be computed at every step, and a linear system of equations needs to be solved to obtain the update. • Due to the high computational and storage costs, classic Newton’s method is rarely used in practice.
  • 36.
    Quasi Newton’s Methods •The quasi-Newton methods derive from a generalization of secant method, that approximates the second derivative as: 𝑓′′ (𝑥𝑘) ≅ 𝑓′ 𝑥𝑘 −𝑓′(𝑥𝑘−1) 𝑥𝑘−𝑥𝑘−1 • In the multi-dimensional case, the secant condition is generalized as: 𝑯𝑘 𝒙𝑘 − 𝒙𝑘−1 = 𝛻𝑓 𝒙𝑘 − 𝛻𝑓 𝒙𝑘−1 • Define 𝑭𝑘 = 𝑯𝑘 −1 , then 𝒙𝑘 − 𝒙𝑘−1 = 𝑭𝑘 𝛻𝑓 𝒙𝑘 − 𝛻𝑓 𝒙𝑘−1 • The quasi-Newton methods iteratively update 𝑯𝑘 or 𝑭𝑘 as: – Direct update: 𝑯𝑘+1 = 𝑯𝑘 + ∆𝑯𝑘, 𝑯0 = 𝑰 – Inverse update: 𝑭𝑘+1 = 𝑭𝑘 + ∆𝑭𝑘, 𝑭 = 𝑯−1 , 𝑭0 = 𝑰
  • 37.
    Quasi-Newton Methods • Quasi-Newtonupdate: Let 𝒔𝑘 = 𝒙𝑘+1 − 𝒙𝑘, 𝒚𝑘 = 𝛻𝑓 𝒙𝑘+1 − 𝛻𝑓 𝒙𝑘 ; then, – The DFP (Davison-Fletcher-Powell) formula for inverse Hessian update is given as: 𝑭𝑘+1 = 𝑭𝑘 − 𝑭𝑘𝒚𝑘 𝑭𝑘𝒚𝑘 𝑇 𝒚𝑘 𝑇𝑭𝑘𝒚𝑘 + 𝒔𝑘𝒔𝑘 𝑇 𝒚𝑘 𝑇𝒔𝑘 – The BGFS (Broyden, Fletcher, Goldfarb, Shanno) formula for direct Hessian update is given as: 𝑯𝑘+1 = 𝑯𝑘 − 𝑯𝑘𝒔𝑘 𝑯𝑘𝒔𝑘 𝑇 𝒔𝑘 𝑇𝑯𝑘𝒔𝑘 + 𝒚𝑘𝒚𝑘 𝑇 𝒚𝑘 𝑇𝒔𝑘
  • 38.
    Quasi-Newton Algorithm The Quasi-NewtonAlgorithm (Griva, Nash & Sofer, p.415): • Initialize: Choose 𝒙0, 𝑯0 (e.g., 𝑯0 = 𝑰), specify 𝜀 • For 𝑘 = 0,1, … – Check convergence: If 𝛻𝑓 𝒙𝑘 < 𝜀, stop – Solve 𝑯𝑘𝒅 = −𝛻𝑓 𝒙𝑘 for 𝒅𝑘 (alternatively, 𝒅 = −𝑭𝑘𝛻𝑓 𝒙𝑘 ) – Solve min 𝛼 𝑓 𝒙𝑘 + 𝛼𝒅𝑘 for 𝛼𝑘, and update the current estimate: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘 𝒅𝑘 – Compute 𝒔𝑘, 𝒚𝑘, and update 𝑯𝑘 (or 𝑭𝑘 as applicable)
  • 39.
    Example: Quasi-Newton Method •Consider the problem: min 𝑥1,𝑥2 𝑓(𝑥1, 𝑥2) = 2𝑥1 2 − 𝑥1𝑥2 + 𝑥2 2 , where 𝑯 = 4 − 1 −1 2 , 𝛻𝑓 = 𝑯 𝑥1 𝑥2 . Let 𝒙0 = 1 1 , 𝑓0 = 4, 𝑯0 = 𝑰, 𝑭0 = 𝑰; Choose 𝒅0 = −𝛻𝑓 𝑥0 = −3 −1 ; then 𝑓 𝛼 = 2 1 − 3𝛼 2 + 1 − 𝛼 2 − (1 − 3𝛼)(1 − 𝛼), Using 𝑓′ 𝛼 = 0 → 𝛼 = 5 16 → 𝒙1 = 0.625 0.688 , 𝑓1 = 0.875; then 𝒚1 = −3.44 0.313 , 𝑭1 = 1.193 0.065 0.065 1.022 , 𝑯1 = 0.381 −0.206 −0.206 0.9313 , and using either update formula 𝒅1 = 0.4375 −1.313 ; for the next step, 𝑓 𝛼 = 5.36𝛼2 − 3.83𝛼 + 0.875 → 𝛼 = −0.3572, 𝒙2 = 0.2188 0.2188 .
  • 40.
    Example: Quasi-Newton Method •For quadratic function, convergence is achieved in two iterations.
  • 41.
    Trust-Region Methods • Thetrust-region methods locally employ a quadratic approximation 𝑞𝑘 𝒙𝑘 to the nonlinear objective function. • The approximation is valid in the neighborhood of 𝒙𝑘 defined by Ω𝑘 = 𝒙: 𝚪(𝒙 − 𝒙𝑘) ≤ ∆𝑘 , where 𝚪 is a scaling parameter. • The method aims to find a 𝒙𝑘+1 ∈ Ω𝑘, that satisfies the sufficient decrease condition in 𝑓(𝒙). • The quality of the quadratic approximation is estimated by the reliability index: 𝛾𝑘 = 𝑓(𝒙𝑘)−𝑓(𝒙𝑘+1) 𝑞𝑘 𝒙𝑘 −𝑞𝑘 𝒙𝑘+1 . If this ratio is close to unity, the trust region may be expanded in the next iteration.
  • 42.
    Trust-Region Methods • Ateach iteration 𝑘, trust-region algorithm solves a constrained optimization sub-problem involving quadratic approximation: min 𝒅 𝑞𝑘 𝒅 = 𝑓 𝒙𝑘 + 𝛻𝑓 𝒙𝑘 𝑇 𝒅 + 1 2 𝒅𝑇 𝛻2 𝑓 𝒙𝑘 𝒅 Subject to: 𝒅 ≤ ∆𝑘 Lagrangian function: ℒ 𝑥, 𝜆 = 𝑓 𝒙𝑘 + 𝜆 𝒅 − ∆𝑘 FONC: 𝛻2𝑓 𝒙𝑘 + 𝜆𝑰 𝒅𝑘 = −𝛻𝑓 𝒙𝑘 , 𝜆 𝒅 − ∆𝑘 = 0 • The resulting search direction 𝒅𝑘 is given as: 𝒅𝑘 = 𝒅𝑘(𝜆). – For large ∆𝑘 and a positive-definite 𝛻2𝑓 𝒙𝑘 , the Lagrange multiplier 𝜆 → 0, and 𝒅𝑘 (𝜆) reduces to the Newton’s direction. – For ∆𝑘→ 0, 𝜆 → ∞, and 𝒅𝑘 (𝜆) aligns with the steepest-descent direction.
  • 43.
    Trust-Region Algorithm • Trust-RegionAlgorithm (Griva, Nash & Sofer, p.392): • Initialize: choose 𝒙0, ∆0; specify 𝜀, 0 < 𝜇 < 𝜂 < 1 (e.g., 𝜇 = 1 4 ; 𝜂 = 3 4 ) • For 𝑘 = 0,1, … – Check convergence: If 𝛻𝑓 𝒙𝑘 < 𝜀, stop – Solve the subproblem: min 𝒅 𝑞𝑘 𝒅 subject to 𝒅 ≤ ∆𝑘 – Compute 𝛾𝑘, • if 𝛾𝑘 < 𝜇, set 𝒙𝑘+1 = 𝒙𝑘, ∆𝑘+1= 1 2 ∆𝑘 • else if 𝛾𝑘 < 𝜂, set 𝒙𝑘+1 = 𝒙𝑘 + 𝒅𝑘 , ∆𝑘+1= ∆𝑘 • else set 𝒙𝑘+1 = 𝒙𝑘 + 𝒅𝑘 , ∆𝑘+1= 2∆𝑘
  • 44.
    Computer Methods forConstrained Problems • Penalty and Barrier methods • Augmented Lagrangian method (AL) • Sequential linear programming (SLP) • Sequential quadratic programming (SQP)
  • 45.
    Penalty and BarrierMethods • Consider the general optimization problem: min 𝒙 𝑓 𝒙 Subject to ℎ𝑖 𝒙 = 0, 𝑖 = 1, … , 𝑝; 𝑔𝑗 𝒙 ≤ 0, 𝑗 = 𝑖, … , 𝑚; 𝑥𝑖𝐿 ≤ 𝑥𝑖 ≤ 𝑥𝑖𝑈, 𝑖 = 1, … , 𝑛. • Define a composite function to be used for constraint compliance: Φ 𝒙, 𝑟 = 𝑓 𝒙 + 𝑃 𝑔 𝒙 , ℎ 𝒙 , 𝒓 where 𝑃 defines a loss function, and 𝒓 is a vector of weights (penalty parameters)
  • 46.
    Penalty and BarrierMethods • Penalty Function Method. A penalty function method employs a quadratic loss function and iterates through the infeasible region 𝑃 𝑔 𝒙 , ℎ 𝒙 , 𝒓 = 𝑟 𝑔𝑖 + 𝒙 2 𝑖 + ℎ𝑖 𝒙 2 𝑖 𝑔𝑖 + 𝒙 = max 0, 𝑔𝑖 𝒙 , 𝑟 > 0 • Barrier Function Method. A barrier method employs a log barrier function and iterates through the feasible region 𝑃 𝑔 𝒙 , ℎ 𝒙 , 𝒓 = 1 𝑟 log −𝑔𝑖 𝑥 𝑖 • For both penalty and barrier methods, as 𝑟 → ∞, 𝒙(𝑟) → 𝒙∗
  • 47.
    The Augmented LagrangianMethod • Consider an equality-constrained problem: min 𝒙 𝑓 𝒙 Subject to: ℎ𝑖 𝒙 = 0, 𝑖 = 1, … , 𝑙 • Define the augmented Lagrangian (AL) as: 𝒫 𝒙, 𝒗, 𝑟 = 𝑓 𝒙 + 𝑣𝑗ℎ𝑗 𝒙 + 1 2 𝑟ℎ𝑗 2 𝒙 𝑗 where the additional term defines an exterior penalty function with 𝑟 as the penalty parameter. • For inequality constrained problems, the AL may be defined as: 𝒫 𝒙, 𝒖, 𝑟 = 𝑓 𝒙 + 𝑢𝑖𝑔𝑖 𝒙 + 1 2 𝑟𝑔𝑖 2 𝒙 , if 𝑔𝑗 + 𝑢𝑗 𝑟 ≥ 0 − 1 2𝑟 𝑢𝑖 2 , if 𝑔𝑗 + 𝑢𝑗 𝑟 < 0 𝑖 where a large 𝑟 makes the Hessian of AL positive definite at 𝒙.
  • 48.
    The Augmented LagrangianMethod • The dual function for the AL is defined as: 𝜓 𝒗 = min 𝒙 𝒫 𝒙, 𝒗, 𝑟 = 𝑓 𝒙 + 𝑣𝑗ℎ𝑗 𝒙 + 1 2 𝑟 ℎ𝑗 𝒙 2 𝑗 • The resulting dual optimization problem is: max 𝒗 𝜓 𝒗 • The dual problem may be solved via Newton’s method as: 𝒗𝑘+1 = 𝒗𝑘 − 𝑑2𝜓 𝑑𝑣𝑖𝑑𝑣𝑗 −1 𝒉 where 𝑑2𝜓 𝑑𝑣𝑖𝑑𝑣𝑗 = −𝛻ℎ𝑖 𝑇 𝛻2𝒫 −1𝛻ℎ𝑗 • For large 𝒓, the Newton’s update may be approximated as: 𝑣𝑗 𝑘+1 = 𝑣𝑗 𝑘 + 𝑟 𝑗ℎ𝑗, 𝑗 = 1, … , 𝑙
  • 49.
    Example: Augmented Lagrangian •Maximize the volume of a cylindrical tank subject to surface area constraint: max 𝑑,𝑙 𝑓 𝑑, 𝑙 = 𝜋𝑑2𝑙 4 , subject to ℎ: 𝜋𝑑2 4 + 𝜋𝑑𝑙 − 𝐴0 = 0 • We can normalize the problem as: min 𝑑,𝑙 𝑓 𝑑, 𝑙 = −𝑑2 𝑙, subject to ℎ: 𝑑2 + 4𝑑𝑙 − 1 = 0 • The solution to the primal problem is obtained as: Lagrangian function: ℒ 𝑑, 𝑙, 𝜆 = −𝑑2 𝑙 + 𝜆(𝑑2 + 4𝑑𝑙 − 1) FONC: 𝜆 𝑑 + 2𝑙 − 𝑑𝑙 = 0, 𝜆𝑑 𝑑 + 4 − 𝑑2 = 0, 𝑑2 + 4𝑑𝑙 − 1 = 0 Optimal solution: 𝑑∗ = 2𝑙∗ = 4𝜆∗ = 1 3 .
  • 50.
    Example: Augmented Lagrangian •Alternatively, define the Augmented Lagrangian function as: 𝒫 𝑑, 𝑙, 𝜆, 𝑟 = −𝑑2𝑙 + 𝜆 𝑑2 + 4𝑑𝑙 − 1 + 1 2 𝑟 𝑑2 + 4𝑑𝑙 − 1 2 • Define the dual function: 𝜓 𝜆 = min 𝑑,𝑙 𝒫 𝑑, 𝑙, 𝜆, 𝑟 • Define dual optimization problem: max 𝑑,𝑙 𝜓 𝜆 • Solution to the dual problem: 𝜆∗ = 𝜆𝑚𝑎𝑥 = 0.144 • Solution to the design variables: 𝑑∗ = 2𝑙∗ = 0.577
  • 51.
    Sequential Linear Programming •Consider the general optimization problem: min 𝒙 𝑓 𝒙 Subject to ℎ𝑖 𝒙 = 0, 𝑖 = 1, … , 𝑝; 𝑔𝑗 𝒙 ≤ 0, 𝑗 = 𝑖, … , 𝑚; 𝑥𝑖𝐿 ≤ 𝑥𝑖 ≤ 𝑥𝑖𝑈, 𝑖 = 1, … , 𝑛. • Let 𝒙𝑘 denote the current estimate of the design variables, and let 𝒅 denote the change in variables; define the first order expansion of the objective and constraint functions in the neighborhood of 𝒙𝑘 𝑓 𝒙𝑘 + 𝒅 = 𝑓 𝒙𝑘 + 𝛻𝑓 𝒙𝑘 𝑇 𝒅 𝑔𝑖 𝒙𝑘 + 𝒅 = 𝑔𝑖 𝒙𝑘 + 𝛻𝑔𝑖 𝒙𝑘 𝑇 𝒅, 𝑖 = 1, … , 𝑚 ℎ𝑗 𝒙𝑘 + 𝒅 = ℎ𝑗 𝒙𝑘 + 𝛻ℎ𝑗 𝒙𝑘 𝑇 𝒅, 𝑗 = 1, … , 𝑙
  • 52.
    Sequential Linear Programming •Let 𝑓𝑘 = 𝑓 𝒙𝑘 , 𝑔𝑖 𝑘 = 𝑔𝑖 𝒙𝑘 , ℎ𝑗 𝑘 = ℎ𝑗 𝒙𝑘 ; 𝑏𝑖 = −𝑔𝑖 𝑘 , 𝑒𝑗 = −ℎ𝑗 𝑘 , 𝒄 = 𝛻𝑓 𝒙𝑘 , 𝒂𝑖 = 𝛻𝑔𝑖 𝒙𝑘 , 𝒏𝑗 = 𝛻ℎ𝑗 𝒙𝑘 , 𝑨 = 𝒂1, 𝒂2, … , 𝒂𝑚 , 𝑵 = 𝒏1, 𝒏2, … , 𝒏𝑙 . • Using first order expansion, define an LP subprogram for the current iteration of the NLP problem: min 𝒅 𝑓 = 𝒄𝑇𝒅 Subject to: 𝑨𝑇 𝒅 ≤ 𝒃, 𝑵𝑇𝒅 = 𝒆 where 𝑓 represents first-order change in the cost function, and the columns of 𝑨 and 𝑵 matrices represent, respectively, the gradients of inequality and equality constraints. • The resulting LP problem can be solved via the Simplex method.
  • 53.
    Sequential Linear Programming •We may note that: – Since both positive and negative changes to design variables 𝒙𝑘 are allowed, the variables 𝑑𝑖 are unrestricted in sign – The SLP method requires additional constraints of the form: − ∆𝑖𝑙 𝑘 ≤ 𝑑𝑖 𝑘 ≤ ∆𝑖𝑢 𝑘 (termed move limits) to bind the LP solution. These limits represent maximum allowable change in 𝑑𝑖 in the current iteration and are selected as percentage of current value. – Move limits serve dual purpose of binding the solution and obviating the need for line search. – Overly restrictive move limits tend to make the SLP problem infeasible.
  • 54.
    SLP Example • Considerthe convex NLP problem: min 𝑥1,𝑥2 𝑓(𝑥1, 𝑥2) = 𝑥1 2 − 𝑥1𝑥2 + 𝑥2 2 Subject to: 1 − 𝑥1 2 − 𝑥2 2 ≤ 0; −𝑥1 ≤ 0, −𝑥2 ≤ 0 The problem has a single minimum at: 𝒙∗ = 1 2 , 1 2 • The objective and constraint gradients are: 𝛻𝑓𝑇 = 2𝑥1 − 𝑥2, 2𝑥2 − 𝑥1 , 𝛻𝑔1 𝑇 = −2𝑥1, −2𝑥2 , 𝛻𝑔2 𝑇 = −1,0 , 𝛻𝑔3 𝑇 = [0, −1]. • Let 𝒙0 = 1, 1 , then 𝑓0 = 1, 𝒄𝑇 = 1 1 , 𝑏1 = 𝑏2 = 𝑏3 = 1; 𝒂1 𝑇 = −2 − 2 , 𝒂2 𝑇 = −1 0 , 𝒂3 𝑇 = 0 − 1
  • 55.
    SLP Example • Definethe LP subproblem at the current step as: min 𝑑1,𝑑2 𝑓 𝑥1, 𝑥2 = 𝑑1 + 𝑑2 Subject to: −2 −2 −1 0 0 −1 𝑑1 𝑑2 ≤ 1 1 1 • In the absence of move limits, the LP problem is unbounded; using 50% move limits, the SLP update is given as: 𝒅∗ = − 1 2 , − 1 2 𝑇 , 𝒙1 = 1 2 , 1 2 𝑇 , with resulting constraint violation: 𝑔𝑖 = 1 2 , 0, 0 ; smaller move limits may be used to reduce the constraint violation.
  • 56.
    Sequential Linear Programming SLPAlgorithm (Arora, p. 508): • Initialize: choose 𝒙0, 𝜀1 > 0, 𝜀2 > 0. • For 𝑘 = 0,1,2, … – Choose move limits ∆𝑖𝑙 𝑘 , ∆𝑖𝑢 𝑘 as some fraction of current design 𝒙𝑘 – Compute 𝑓𝑘 , 𝒄, 𝑔𝑖 𝑘 , ℎ𝑗 𝑘 , 𝑏𝑖, 𝑒𝑗 – Formulate and solve the LP subproblem for 𝒅𝑘 – If 𝑔𝑖 ≤ 𝜀1; 𝑖 = 1, … , 𝑚; ℎ𝑗 ≤ 𝜀1; 𝑖 = 1, … , 𝑝; and 𝒅𝑘 ≤ 𝜀2, stop – Substitute 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘, 𝑘 ← 𝑘 + 1.
  • 57.
    Sequential Quadratic Programming •Sequential quadratic programming (SQP) uses a quadratic approximation to the objective function at every step of iteration. • The SQP problem is defined as: min 𝒅 𝑓 = 𝒄𝑇 𝒅 + 1 2 𝒅𝑇 𝒅 Subject to, 𝑨𝑇𝒅 ≤ 𝒃, 𝑵𝑇𝒅 = 𝒆 • SQP does not require move limits, alleviating the shortcomings of the SLP method. • The SQP problem is convex; hence, it has a single global minimum. • SQP can be solved via Simplex based linear complementarity problem (LCP) framework.
  • 58.
    Sequential Quadratic Programming •The Lagrangian function for the SQP problem is defined as: ℒ 𝒅, 𝒖, 𝒗 = 𝒄𝑇𝒅 + 1 2 𝒅𝑇𝒅 + 𝒖𝑇 𝑨𝑇𝒅 − 𝒃 + 𝒔 + 𝒗𝑇(𝑵𝑇𝒅 − 𝒆) • Then the KKT conditions are: Optimality: 𝛁ℒ = 𝒄 + 𝒅 + 𝑨𝒖 + 𝑵𝒗 = 𝟎, Feasibility: 𝑨𝑇 𝒅 + 𝒔 = 𝒃, 𝑵𝑇 𝒅 = 𝒆 , Complementarity: 𝒖𝑇 𝒔 = 𝟎, Non-negativity: 𝒖 ≥ 𝟎, 𝒔 ≥ 𝟎
  • 59.
    Sequential Quadratic Programming •Since 𝒗 is unrestricted in sign, let 𝒗 = 𝒚 − 𝒛, 𝒚 ≥ 𝟎, 𝒛 ≥ 𝟎, and the KKT conditions are compactly written as: 𝑰 𝑨 𝑨𝑇 𝟎 𝑵𝑇 𝟎 𝟎 𝑰 𝟎 𝑵 −𝑵 𝟎 𝟎 𝟎 𝟎 𝒅 𝒖 𝒔 𝒚 𝒛 = −𝒄 𝒃 𝒆 , or 𝑷𝑿 = 𝑸 • The complementary slackness conditions, 𝒖𝑇𝒔 = 𝟎, translate as: 𝑿𝑖𝑿𝑖+𝑚 = 0, 𝑖 = 𝑛 + 1, ⋯ , 𝑛 + 𝑚. • The resulting problem can be solved via Simplex method using LCP framework.
  • 60.
    Descent Function Approach •In SQP methods, the line search step is based on minimization of a descent function that penalizes constraint violations, i.e., Φ 𝒙 = 𝑓 𝒙 + 𝑅𝑉 𝒙 where 𝑓 𝒙 is the cost function, 𝑉 𝒙 represents current maximum constraint violation, and 𝑅 > 0 is a penalty parameter. • The descent function value at the current iteration is computed as: Φ𝑘 = 𝑓𝑘 + 𝑅𝑉𝑘, 𝑅 = max 𝑅𝑘, 𝑟𝑘 where 𝑟𝑘 = 𝑢𝑖 𝑘 𝑚 𝑖=1 + 𝑣𝑗 𝑘 𝑝 𝑗=1 𝑉𝑘 = max {0; 𝑔𝑖, 𝑖 = 1, . . . , 𝑚; ℎ𝑗 , 𝑗 = 1, … , 𝑝} • The line search subproblem is defined as: min 𝛼 Φ 𝛼 = Φ 𝒙𝑘 + 𝛼𝒅𝑘
  • 61.
    SQP Algorithm SQP Algorithm(Arora, p. 526): • Initialize: choose 𝒙0, 𝑅0 = 1, 𝜀1 > 0, 𝜀2 > 0. • For 𝑘 = 0,1,2, … – Compute 𝑓𝑘 , 𝑔𝑖 𝑘 , ℎ𝑗 𝑘 , 𝒄, 𝑏𝑖, 𝑒𝑗; compute 𝑉𝑘. – Formulate and solve the QP subproblem to obtain 𝒅𝑘 and the Lagrange multipliers 𝒖𝑘 and 𝒗𝑘 . – If 𝑉𝑘 ≤ 𝜀1 and 𝒅𝑘 ≤ 𝜀2, stop. – Compute 𝑅; formulate and solve line search subproblem for 𝛼 – Set 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘 , 𝑅𝑘+1 ← 𝑅, 𝑘 ← 𝑘 + 1 • The above algorithm is convergent, i.e., Φ 𝒙𝑘 ≤ Φ 𝒙0 ; 𝒙𝑘 converges to the KKT point 𝒙∗
  • 62.
    SQP with ApproximateLine Search • The SQP algorithm can use with approximate line search as follows: Let 𝑡𝑗, 𝑗 = 0,1, … denote a trial step size, 𝒙𝑘+1,𝑗 denote the trial design point, 𝑓𝑘+1,𝑗 = 𝑓( 𝒙𝑘+1,𝑗 ) denote the function value at the trial solution, and Φ𝑘+1,𝑗 = 𝑓𝑘+1,𝑗 + 𝑅𝑉𝑘+1,𝑗 is the penalty function at the trial solution. • The trial solution is required to satisfy the descent condition: Φ𝑘+1,𝑗 + 𝑡𝑗𝛾 𝒅𝑘 2 ≤ Φ𝑘,𝑗, 0 < 𝛾 < 1 where a common choice is: 𝛾 = 1 2 , 𝜇 = 1 2 , 𝑡𝑗 = 𝜇𝑗 , 𝑗 = 0,1,2, …. • The above descent condition ensures that the constraint violation decreases at each step of the method.
  • 63.
    SQP Example • Considerthe NLP problem: min 𝑥1,𝑥2 𝑓(𝑥1, 𝑥2) = 𝑥1 2 − 𝑥1𝑥2 + 𝑥2 2 subject to 𝑔1: 1 − 𝑥1 2 − 𝑥2 2 ≤ 0, 𝑔2: −𝑥1 ≤ 0, 𝑔3: −𝑥2 ≤ 0 Then 𝛻𝑓𝑇 = 2𝑥1 − 𝑥2, 2𝑥2 − 𝑥1 , 𝛻𝑔1 𝑇 = −2𝑥1, −2𝑥2 , 𝛻𝑔2 𝑇 = −1,0 , 𝛻𝑔3 𝑇 = [0, −1]. Let 𝑥0 = 1, 1 ; then, 𝑓0 = 1, 𝒄 = 1, 1 𝑇, 𝑔1 1,1 = 𝑔2 1,1 = 𝑔3 1,1 = −1. • Since all constraints are initially inactive, 𝑉0 = 0, and 𝒅 = −𝒄 = −1, −1 𝑇; the line search problem is: min 𝛼 Φ 𝛼 = 1 − 𝛼 2; • By setting Φ′ 𝛼 = 0, we get the analytical solution: 𝛼 = 1; thus 𝑥1 = 0, 0 , which results in a large constraint violation
  • 64.
    SQP Example • Alternatively,we may use approximate line search as follows: – Let 𝑅0 = 10, 𝛾 = 𝜇 = 1 2 ; let 𝑡0 = 1, then 𝒙1,0 = 0,0 , 𝑓1,0 = 0, 𝑉1,0 = 1, Φ1,0 = 10; 𝒅0 2 = 2, and the descent condition Φ1,0 + 1 2 𝒅0 2 ≤ Φ0 = 1 is not met at the trial point. – Next, for 𝑡1 = 1 2 , we get: 𝒙1,1 = 1 2 , 1 2 , 𝑓1,1 = 1 4 , V1,1 = 1 2 , Φ1,1 = 5 1 4 , and the descent condition fails again; – Next, for 𝑡2 = 1 4 , we get: 𝒙1,2 = 3 4 , 3 4 , V1,2 = 0, 𝑓1,2 = Φ1,2 = 9 16 , and the descent condition checks as: Φ1,2 + 1 8 𝒅0 2 ≤ Φ0 = 1. – Therefore, we set 𝛼 = 𝑡2 = 1 4 , 𝒙1 = 𝒙1,2 = 3 4 , 3 4 with no constraint violation.
  • 65.
    The Active SetStrategy • To reduce the computational cost of solving the QP subproblem, we may only include the active constraints in the problem. • For 𝒙𝑘 ∈ Ω, the set of potentially active constraints is defined as: ℐ𝑘 = 𝑖: 𝑔𝑖 𝑘 > −𝜀; 𝑖 = 1, … , 𝑚 ⋃ 𝑗: 𝑗 = 1, … , 𝑝 for some 𝜀. • For 𝒙𝑘 ∉ Ω, let 𝑉𝑘 = max {0; 𝑔𝑖 𝑘 , 𝑖 = 1, . . . , 𝑚; ℎ𝑗 𝑘 , 𝑗 = 1, … , 𝑝}; then, the active constraint set is defined as: ℐ𝑘 = 𝑖: 𝑔𝑖 𝑘 > 𝑉𝑘 − 𝜀; 𝑖 = 1, … , 𝑚 ⋃ 𝑗: ℎ𝑗 𝑘 > 𝑉𝑘 − 𝜀; 𝑗 = 1, … , 𝑝 • The gradients of inactive constraints, i.e., those not in ℐ𝑘, do not need to be computed
  • 66.
    SQP via Newton’sMethod • Consider the following equality constrained problem: min 𝒙 𝑓(𝒙), subject to ℎ𝑖 𝒙 = 0, 𝑖 = 1, … , 𝑙 • The Lagrangian function is given as: ℒ 𝒙, 𝒗 = 𝑓 𝒙 + 𝒗𝑇𝒉(𝒙) • The KKT conditions are: 𝛻ℒ 𝒙, 𝒗 = 𝛻𝑓 𝒙 + 𝑵𝒗 = 𝟎, 𝒉 𝒙 = 𝟎 where 𝑵 = 𝛁𝒉(𝒙) is a Jacobian matrix whose 𝑖th column is 𝛻ℎ𝑖 𝒙 • Using first order Taylor series expansion (with shorthand notation): 𝛻ℒ𝑘+1 = 𝛻ℒ𝑘 + 𝛻2ℒ𝑘Δ𝒙 + 𝑁Δ𝒗 𝒉𝑘+1 = 𝒉𝑘 + 𝑵𝑇Δ𝒙 • By expanding Δ𝒗 = 𝒗𝑘+1 − 𝒗𝑘 , 𝛻ℒ𝑘 = 𝛻𝑓𝑘 + 𝑵𝒗𝑘 , and assuming 𝒗𝑘 ≅ 𝒗𝑘+1 we obtain: 𝛻2 ℒ𝑘 𝑵 𝑵𝑇 𝟎 Δ𝒙𝑘 𝒗𝑘+1 = − 𝛻𝑓𝑘 𝒉𝑘 which is similar to N-R update, but uses Hessian of the Lagrangian
  • 67.
SQP via Newton's Method
• Alternatively, we may consider minimizing the quadratic approximation:
minΔ𝒙 (1/2)Δ𝒙ᵀ𝛻²ℒΔ𝒙 + 𝛻𝑓ᵀΔ𝒙
subject to: ℎ𝑖(𝒙) + 𝒏𝑖ᵀΔ𝒙 = 0, 𝑖 = 1, …, 𝑙
• The KKT conditions are: 𝛻𝑓 + 𝛻²ℒΔ𝒙 + 𝑵𝒗 = 𝟎, 𝒉 + 𝑵ᵀΔ𝒙 = 𝟎
• Thus the QP subproblem can be solved via Newton's method:
[𝛻²ℒ𝑘 𝑵; 𝑵ᵀ 𝟎] [Δ𝒙𝑘; 𝒗𝑘+1] = −[𝛻𝑓𝑘; 𝒉𝑘]
• The Hessian of the Lagrangian can be updated via the BFGS method as:
𝑯𝑘+1 = 𝑯𝑘 + 𝑫𝑘 − 𝑬𝑘
where 𝑫𝑘 = 𝒚𝑘𝒚𝑘ᵀ/𝒚𝑘ᵀΔ𝒙𝑘, 𝑬𝑘 = 𝒄𝑘𝒄𝑘ᵀ/𝒄𝑘ᵀΔ𝒙𝑘, 𝒄𝑘 = 𝑯𝑘Δ𝒙𝑘, 𝒚𝑘 = 𝛻ℒ𝑘+1 − 𝛻ℒ𝑘
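The BFGS update of the Lagrangian Hessian can be sketched as follows (illustrative; in practice the update is usually skipped when 𝒚ᵀΔ𝒙 ≤ 0 to preserve positive definiteness):

```python
import numpy as np

def bfgs_update(H, dx, y):
    """H_{k+1} = H_k + D_k - E_k, with D_k = y yᵀ/(yᵀΔx), E_k = c cᵀ/(cᵀΔx),
    c = H Δx. A sketch; assumes yᵀΔx > 0 and cᵀΔx > 0."""
    c = H @ dx
    D = np.outer(y, y) / (y @ dx)
    E = np.outer(c, c) / (c @ dx)
    return H + D - E

# The update enforces the secant condition H_{k+1} Δx = y:
H1 = bfgs_update(np.eye(2), np.array([1.0, 0.5]), np.array([2.0, 1.0]))
print(H1 @ np.array([1.0, 0.5]))   # [2. 1.]
```

By construction 𝑯𝑘+1Δ𝒙𝑘 = 𝒄𝑘 + 𝒚𝑘 − 𝒄𝑘 = 𝒚𝑘, i.e., the updated Hessian reproduces the observed gradient change along the step.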
Example: SQP with Hessian Update
• Consider the NLP problem:
min 𝑓(𝑥1, 𝑥2) = 𝑥1² − 𝑥1𝑥2 + 𝑥2²
subject to 𝑔1: 1 − 𝑥1² − 𝑥2² ≤ 0, 𝑔2: −𝑥1 ≤ 0, 𝑔3: −𝑥2 ≤ 0
• Let 𝒙0 = (1, 1); then 𝑓0 = 1, 𝒄 = (1, 1)ᵀ, 𝑔1(1,1) = 𝑔2(1,1) = 𝑔3(1,1) = −1; 𝛻𝑔1ᵀ = (−2, −2), 𝛻𝑔2ᵀ = (−1, 0), 𝛻𝑔3ᵀ = (0, −1).
• Using approximate line search, 𝛼 = 1/4, 𝒙1 = (3/4, 3/4).
• For the Hessian update, we have: 𝑓1 = 0.5625, 𝑔1 = −0.125, 𝑔2 = 𝑔3 = −0.75; 𝒄1 = (0.75, 0.75); 𝛻𝑔1ᵀ = (−3/2, −3/2), 𝛻𝑔2ᵀ = (−1, 0), 𝛻𝑔3ᵀ = (0, −1); Δ𝒙0 = (−0.25, −0.25); then
𝑫0 = (1/2)[1 1; 1 1] = 𝑬0, so that 𝑯1 = 𝑯0.
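The update step can be checked numerically (a sketch; it assumes the QP multipliers at 𝒙0 are zero, so that 𝛻ℒ = 𝛻𝑓):

```python
import numpy as np

# Numerical check of the Hessian update in the example, with
# ∇L = ∇f = (2*x1 - x2, 2*x2 - x1)
grad_f = lambda x: np.array([2*x[0] - x[1], 2*x[1] - x[0]])

x0, x1, H0 = np.array([1.0, 1.0]), np.array([0.75, 0.75]), np.eye(2)
dx = x1 - x0                        # Δx0 = (-0.25, -0.25)
y  = grad_f(x1) - grad_f(x0)        # y0 = ∇L1 - ∇L0 = (-0.25, -0.25)
c  = H0 @ dx                        # c0 = H0 Δx0
D  = np.outer(y, y) / (y @ dx)      # D0 = (1/2)[[1, 1], [1, 1]]
E  = np.outer(c, c) / (c @ dx)      # E0 = D0
print(np.allclose(H0 + D - E, H0))  # True: H1 = H0
```

Since 𝑯0 = 𝑰 and 𝒚0 = Δ𝒙0 here, the two correction terms coincide and the estimate is unchanged, as stated on the slide.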
SQP with Hessian Update
• For the next step, the QP problem is defined as:
min 𝑓 = (3/4)(𝑑1 + 𝑑2) + (1/2)(𝑑1² + 𝑑2²)
subject to: −(3/2)(𝑑1 + 𝑑2) ≤ 0, −𝑑1 ≤ 0, −𝑑2 ≤ 0
• The application of the KKT conditions results in a linear system of equations, which is solved to obtain:
𝒙ᵀ = (𝑑1, 𝑑2, 𝑢1, 𝑢2, 𝑢3, 𝑠1, 𝑠2, 𝑠3) = (0.188, 0.188, 0, 0, 0, 0.125, 0.75, 0.75)
Modified SQP Algorithm (Arora, p. 558)
• Initialize: choose 𝒙0, 𝑅0 = 1, 𝑯0 = 𝑰; 𝜀1, 𝜀2 > 0.
• For 𝑘 = 0, 1, 2, …
– Compute 𝑓𝑘, 𝑔𝑖(𝒙𝑘), ℎ𝑗(𝒙𝑘), 𝒄, 𝑏𝑖, 𝑒𝑗, and 𝑉𝑘. If 𝑘 > 0, compute 𝑯𝑘.
– Formulate and solve the modified QP subproblem for the search direction 𝒅𝑘 and the Lagrange multipliers 𝒖𝑘 and 𝒗𝑘.
– If 𝑉𝑘 ≤ 𝜀1 and ‖𝒅𝑘‖ ≤ 𝜀2, stop.
– Compute 𝑅; formulate and solve the line search subproblem for 𝛼.
– Set 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘, 𝑅𝑘+1 ← 𝑅, 𝑘 ← 𝑘 + 1.
SQP Algorithm
% SQP subproblem via Hessian update
% input: xk (current design); Lk (Hessian of Lagrangian estimate)
% initialize
n=size(xk,1);
if ~exist('Lk','var'), Lk=diag(xk+(~xk)); end  % default: diagonal estimate
tol=1e-7;
% function and constraint values
fk=f(xk); dfk=df(xk); gk=g(xk); dgk=dg(xk);
% N-R update: [Lk dgk; dgk' 0]*[dxk; lam] = -[dfk; gk]
A=[Lk dgk; dgk' 0*(dgk'*dgk)];
b=[-dfk; -gk];
dx=A\b; dxk=dx(1:n); lam=dx(n+1:end);
SQP Algorithm
% inactive constraints
idx1=find(lam<0);
if ~isempty(idx1)
  [dxk,lam]=inactive(lam,A,b,n);
end
% check termination
if abs(dxk)<tol, return, end
% adjust increment for constraint compliance
P=@(xk) f(xk)+lam'*abs(g(xk));
while P(xk+dxk)>P(xk)
  dxk=dxk/2;
  if abs(dxk)<tol, break, end
end
% Hessian update
dL=@(x) df(x)+dg(x)*lam;
Lk=update(Lk, xk, dxk, dL);
xk=xk+dxk;
disp([xk' f(xk) P(xk)])
SQP Algorithm
% function definitions
function [dxk,lam]=inactive(lam,A,b,n)
  % drop constraints with negative multipliers and re-solve
  idx1=find(lam<0); lam(idx1)=0;
  idx2=find(lam);
  v=[1:n, n+idx2'];
  A=A(v,v); b=b(v);
  dx=A\b; dxk=dx(1:n);
  lam(idx2)=dx(n+1:end);
end
function Lk=update(Lk, xk, dxk, dL)
  % BFGS update of the Lagrangian Hessian estimate
  ga=dL(xk+dxk)-dL(xk);
  Hx=Lk*dxk;
  Dk=ga*ga'/(ga'*dxk);
  Ek=Hx*Hx'/(Hx'*dxk);
  Lk=Lk+Dk-Ek;
end
Generalized Reduced Gradient
• The GRG method finds the search direction by projecting the objective function gradient onto the constraint hyperplane.
• The resulting direction is tangent to the constraint hyperplane, so that the iterative steps tend to conform to the constraints.
• The constraints are effectively used to implicitly eliminate variables and reduce the problem dimension.
Implicit Elimination
• Consider an equality constrained problem in two variables:
Objective: min 𝑓(𝒙), 𝒙ᵀ = (𝑥1, 𝑥2)
Subject to: 𝑔(𝒙) = 0
• The variations in the objective and constraint functions are:
𝑑𝑓 = 𝛻𝑓ᵀ𝑑𝒙 = (∂𝑓/∂𝑥1)𝑑𝑥1 + (∂𝑓/∂𝑥2)𝑑𝑥2
𝑑𝑔 = 𝛻𝑔ᵀ𝑑𝒙 = (∂𝑔/∂𝑥1)𝑑𝑥1 + (∂𝑔/∂𝑥2)𝑑𝑥2 = 0
• Solve 𝑑𝑔 = 0 for 𝑑𝑥2 = −[(∂𝑔/∂𝑥1)/(∂𝑔/∂𝑥2)]𝑑𝑥1 and substitute into the objective function:
𝑑𝑓 = [∂𝑓/∂𝑥1 − (∂𝑓/∂𝑥2)(∂𝑔/∂𝑥1)/(∂𝑔/∂𝑥2)]𝑑𝑥1
• Then the reduced gradient of 𝑓 along 𝑥1 is given as:
𝛻𝑓𝑅 = ∂𝑓/∂𝑥1 − (∂𝑓/∂𝑥2)(∂𝑔/∂𝑥1)/(∂𝑔/∂𝑥2)
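A minimal Python sketch of this two-variable reduced gradient (the helper and the circle example are illustrative assumptions, not from the slides):

```python
def reduced_gradient_2d(df, dg, x):
    """Reduced gradient of f along x1 for one equality constraint g(x) = 0.
    df, dg return the gradient tuples (∂/∂x1, ∂/∂x2) at x."""
    f1, f2 = df(x)
    g1, g2 = dg(x)
    return f1 - f2 * g1 / g2   # ∂f/∂x1 - (∂f/∂x2)(∂g/∂x1)/(∂g/∂x2)

# On the circle g = x1^2 + x2^2 - 2 = 0 with f = x1 + x2, the point (1, 1)
# is a constrained stationary point, so the reduced gradient vanishes there:
print(reduced_gradient_2d(lambda x: (1.0, 1.0),
                          lambda x: (2*x[0], 2*x[1]), (1.0, 1.0)))   # 0.0
```

A zero reduced gradient means no feasible first-order descent direction exists along the eliminated coordinate.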
Implicit Elimination
• Consider a problem in 𝑛 variables with 𝑚 equality constraints:
Objective: min 𝑓(𝒙), 𝒙ᵀ = (𝑥1, 𝑥2, …, 𝑥𝑛)
Subject to: 𝑔𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑚
• We define 𝑚 basic variables in terms of 𝑛 − 𝑚 nonbasic variables; let 𝒙ᵀ = (𝒚ᵀ, 𝒛ᵀ), where 𝒚 are basic and 𝒛 are nonbasic.
• The gradient vector is partitioned as: 𝛻𝑓ᵀ = (𝛻𝒚𝑓ᵀ, 𝛻𝒛𝑓ᵀ).
• The variations in the objective and constraint functions are:
𝑑𝑓 = 𝛻𝒚𝑓ᵀ𝑑𝒚 + 𝛻𝒛𝑓ᵀ𝑑𝒛
𝑑𝒈 = (∂𝝍/∂𝒚)𝑑𝒚 + (∂𝝍/∂𝒛)𝑑𝒛 = 𝟎
where the matrices of partial derivatives are defined as: [∂𝝍/∂𝒚]𝑖𝑗 = ∂𝑔𝑖/∂𝑦𝑗; [∂𝝍/∂𝒛]𝑖𝑗 = ∂𝑔𝑖/∂𝑧𝑗
Generalized Reduced Gradient
• Since ∂𝝍/∂𝒚 is a square 𝑚 × 𝑚 matrix, we may solve for 𝑑𝒚 as:
𝑑𝒚 = −(∂𝝍/∂𝒚)⁻¹(∂𝝍/∂𝒛)𝑑𝒛
and substitute into 𝑑𝑓 to obtain:
𝑑𝑓 = 𝛻𝒛𝑓ᵀ𝑑𝒛 − 𝛻𝒚𝑓ᵀ(∂𝝍/∂𝒚)⁻¹(∂𝝍/∂𝒛)𝑑𝒛
• Then the reduced gradient 𝛻𝑓𝑅 is defined as:
𝛻𝑓𝑅ᵀ = 𝛻𝒛𝑓ᵀ − 𝛻𝒚𝑓ᵀ(∂𝝍/∂𝒚)⁻¹(∂𝝍/∂𝒛)
• Next, we choose the negative of 𝛻𝑓𝑅 as the search direction and perform a line search to determine the step size; then
Δ𝒛 = −𝛼𝛻𝑓𝑅, Δ𝒚 = −(∂𝝍/∂𝒚)⁻¹(∂𝝍/∂𝒛)Δ𝒛
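A sketch of the reduced gradient formula in Python, using a linear solve in place of the explicit inverse (the test values come from the first iteration of the equality-constrained example a few slides below: 𝒚 = 𝑥2, 𝒛 = 𝑥1, ∂𝝍/∂𝒚 = −1, ∂𝝍/∂𝒛 = −2, 𝛻𝒚𝑓 = 3, 𝛻𝒛𝑓 = −1):

```python
import numpy as np

def reduced_gradient(df_y, df_z, dpsi_dy, dpsi_dz):
    """∇f_Rᵀ = ∇f_zᵀ - ∇f_yᵀ(∂ψ/∂y)⁻¹(∂ψ/∂z), computed via a linear solve
    rather than an explicit matrix inverse."""
    return df_z - dpsi_dz.T @ np.linalg.solve(dpsi_dy.T, df_y)

print(reduced_gradient(np.array([3.0]), np.array([-1.0]),
                       np.array([[-1.0]]), np.array([[-2.0]])))   # [-7.]
```

The result −7 matches the reduced gradient computed by hand in that example.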
GRG Algorithm
• Initialize: choose 𝒙0; evaluate the objective function and constraints; convert binding inequality constraints to equality constraints.
• Partition the variables into 𝑚 basic and 𝑛 − 𝑚 nonbasic ones, e.g., choose the first 𝑚 values, or the 𝑚 highest values, as basic variables.
• Compute 𝛻𝑓𝑅 along the nonbasic variables. If 𝛻𝑓𝑅 = 𝟎, exit.
• Set Δ𝒛 = −𝛻𝑓𝑅/‖𝛻𝑓𝑅‖, Δ𝒚 = −(∂𝝍/∂𝒚)⁻¹(∂𝝍/∂𝒛)Δ𝒛.
• Perform a line search along Δ𝒙 to obtain 𝛼.
• Check feasibility at 𝒙𝑘 + 𝛼Δ𝒙. If necessary, use Newton–Raphson iterations to adjust Δ𝒚 as:
Δ𝒚𝑘+1 = Δ𝒚𝑘 − (∂𝝍/∂𝒚)⁻¹𝒈𝑘
• Update: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼Δ𝒙
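The loop can be sketched in Python on the equality-constrained example of the next slide (a sketch with assumed simplifications: a fixed partition 𝒚 = 𝑥2, 𝒛 = 𝑥1, a fixed step 0.1 in place of a line search, and exact restoration 𝑥2 = 𝑥1² − 1 in place of Newton–Raphson iterations):

```python
import numpy as np

# min f = 3*x1 + 2*x2 + 2*x1^2 - x1*x2 + 1.5*x2^2, s.t. g = x1^2 - x2 - 1 = 0
f  = lambda x1, x2: 3*x1 + 2*x2 + 2*x1**2 - x1*x2 + 1.5*x2**2
df = lambda x1, x2: np.array([3 + 4*x1 - x2, 2 - x1 + 3*x2])  # (∂f/∂x1, ∂f/∂x2)

x1, x2 = -1.0, 0.0
for _ in range(200):
    dfz, dfy = df(x1, x2)                 # z = x1 (nonbasic), y = x2 (basic)
    gR = dfz - dfy * (2*x1) / (-1.0)      # reduced gradient; ∂ψ/∂y = -1, ∂ψ/∂z = 2*x1
    if abs(gR) < 1e-9:
        break
    x1 = x1 - 0.1 * gR                    # fixed small step along -∇f_R
    x2 = x1**2 - 1                        # restoration: solve g = 0 for the basic var
print(round(x1, 3), round(x2, 3), round(f(x1, x2), 3))   # -0.634 -0.598 -2.137
```

The iterates settle at 𝒙* ≈ (−0.634, −0.598) with 𝑓* ≈ −2.137, matching the optimum reported on the next slide.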
Generalized Reduced Gradient
• Consider an equality constrained problem:
Objective: min 𝑓(𝒙) = 3𝑥1 + 2𝑥2 + 2𝑥1² − 𝑥1𝑥2 + 1.5𝑥2²
Subject to: 𝑔(𝒙) = 𝑥1² − 𝑥2 − 1 = 0
• Let 𝒙0 = (−1, 0); then 𝑓0 = −1, 𝛻𝑓0 = (−1, 3), 𝑔0 = 0, 𝛻𝑔0 = (−2, −1).
• Let 𝒚 = 𝑥2 on the first iteration; then 𝛻𝑓𝑅 = −1 − 3·(−1)⁻¹(−2) = −7.
• Let Δ𝒛 = 1; then Δ𝒚 = −(−1)⁻¹(−2)(1) = −2. By doing a line search along Δ𝒙 = (0.333, −0.667) and restoring feasibility, we obtain 𝒙1 = (−0.650, −0.577), 𝑓1 = −2.13.
• The optimum is reached in three iterations: 𝒙* = (−0.634, −0.598), 𝑓(𝒙*) = −2.137.
Generalized Reduced Gradient
• Consider an inequality constrained problem:
Objective: min 𝑓(𝒙) = 𝑥1² + 𝑥2
Subject to: 𝑔1(𝒙) = 𝑥1² + 𝑥2² − 9 ≤ 0, 𝑔2(𝒙) = 𝑥1 + 𝑥2 − 1 ≤ 0
• Add slack variables to the inequality constraints:
𝑔1(𝒙) = 𝑥1² + 𝑥2² − 9 + 𝑠1 = 0, 𝑔2(𝒙) = 𝑥1 + 𝑥2 − 1 + 𝑠2 = 0
Then 𝛻𝑓 = (2𝑥1, 1); 𝛻𝑔1 = (2𝑥1, 2𝑥2); 𝛻𝑔2 = (1, 1)
• Let 𝒙0 = (2.56, −1.56); then 𝑓0 = 4.99, 𝛻𝑓0 = (5.12, 1), 𝒈0 = (−0.013, 0)
• Since 𝑔2 is binding, add 𝑠2 to the variables: 𝛻𝑓0 = (5.12, 1, 0), 𝛻𝑔2 = (1, 1, 1)
Generalized Reduced Gradient
• Let 𝑦 = 𝑥1, 𝒛 = (𝑥2, 𝑠2); then 𝛻𝑦𝑓 = 5.12, 𝛻𝒛𝑓 = (1, 0), ∂𝑔2/∂𝑦 = 1, ∂𝑔2/∂𝒛 = (1, 1); therefore
𝛻𝑓𝑅ᵀ = (1, 0) − 5.12·(1, 1) = (−4.12, −5.12)
• Let Δ𝒛 = −𝛻𝑓𝑅 and Δ𝑦 = −[1 1]Δ𝒛 = −9.24; then Δ𝒙 = (−9.24, 4.12) and 𝒔0 = Δ𝒙/‖Δ𝒙‖. Suppose we limit the maximum step size to 𝛼 ≤ 0.5; then 𝒙1 = 𝒙0 + 0.5𝒔0 = (2.103, −1.356) with 𝑓1 = 3.068. There are no constraint violations, hence the first iteration is complete.
• After seven iterations: 𝒙7 = (0.003, −3.0) with 𝑓7 = −3.0
• The optimum is at: 𝒙* = (0.0, −3.0) with 𝑓* = −3.0
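The first iteration above can be reproduced numerically (a sketch; the slack component of Δ𝒛 is dropped when forming the step in (𝑥1, 𝑥2)):

```python
import numpy as np

# First GRG iteration of the example: y = x1, z = (x2, s2), g2 binding
x0 = np.array([2.56, -1.56])
df_y, df_z = 5.12, np.array([1.0, 0.0])   # ∇f split over (y, z)
dg_y, dg_z = 1.0, np.array([1.0, 1.0])    # ∇g2 split over (y, z)
gR = df_z - dg_z * (df_y / dg_y)          # reduced gradient = (-4.12, -5.12)
dz = -gR
dy = -(dg_z @ dz) / dg_y                  # = -9.24
dx = np.array([dy, dz[0]])                # step in (x1, x2); slack term dropped
s0 = dx / np.linalg.norm(dx)              # normalized search direction
x1 = x0 + 0.5 * s0                        # step size capped at 0.5
f1 = x1[0]**2 + x1[1]
print(np.round(x1, 3), round(f1, 3))      # x1 ≈ (2.103, -1.356), f1 ≈ 3.068
```

The computed point and function value agree with the slide to three decimals.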
GRG for LP Problems
• Consider an LP problem: min 𝑓(𝒙) = 𝒄ᵀ𝒙
Subject to: 𝑨𝒙 = 𝒃, 𝒙 ≥ 𝟎
• Let 𝒙 be partitioned into 𝑚 basic and 𝑛 − 𝑚 nonbasic variables: 𝒙ᵀ = (𝒚ᵀ, 𝒛ᵀ).
• The objective function is partitioned as: 𝑓(𝒙) = 𝒄𝑦ᵀ𝒚 + 𝒄𝑧ᵀ𝒛
• The constraints are partitioned as: 𝑩𝒚 + 𝑵𝒛 = 𝒃, 𝒚 ≥ 𝟎, 𝒛 ≥ 𝟎; then 𝒚 = 𝑩⁻¹𝒃 − 𝑩⁻¹𝑵𝒛
• The objective function in terms of the independent variables is:
𝑓(𝒛) = 𝒄𝑦ᵀ𝑩⁻¹𝒃 + (𝒄𝑧ᵀ − 𝒄𝑦ᵀ𝑩⁻¹𝑵)𝒛
• The reduced costs for the nonbasic variables are given as:
𝒓𝑐ᵀ = 𝒄𝑧ᵀ − 𝒄𝑦ᵀ𝑩⁻¹𝑵, or 𝒓𝑐ᵀ = 𝒄𝑧ᵀ − 𝝀ᵀ𝑵 with 𝝀ᵀ = 𝒄𝑦ᵀ𝑩⁻¹
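The reduced-cost computation can be sketched on a small hypothetical LP (the data are assumptions; with a slack basis 𝑩 = 𝑰, 𝝀 = 𝟎 and the reduced costs equal the nonbasic objective coefficients):

```python
import numpy as np

# Hypothetical LP: min c^T x, A x = b, x >= 0, with slacks in the basis
c = np.array([-3.0, -2.0, 0.0, 0.0])
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
basic, nonbasic = [2, 3], [0, 1]
B, N = A[:, basic], A[:, nonbasic]
lam = np.linalg.solve(B.T, c[basic])   # λᵀ = c_yᵀ B⁻¹, via a transposed solve
rc = c[nonbasic] - N.T @ lam           # reduced costs r_cᵀ = c_zᵀ - λᵀ N
print(rc)                              # [-3. -2.]: both nonbasic variables improve f
```

Negative reduced costs identify the nonbasic variables whose increase decreases the objective, exactly as in the simplex method.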
GRG for LP Problems
• Using tableau notation, the reduced costs are computed as:
[𝑩 𝑵 𝒃; 𝒄𝑦ᵀ 𝒄𝑧ᵀ 0] → [𝑰 𝑩⁻¹𝑵 𝑩⁻¹𝒃; 𝟎ᵀ 𝒓𝑐ᵀ −𝒄𝑦ᵀ𝑩⁻¹𝒃]
• The objective function variation is given as: 𝑑𝑓 = 𝛻𝒚𝑓ᵀ𝑑𝒚 + 𝛻𝒛𝑓ᵀ𝑑𝒛
• The reduced gradient along the constraint surface is given as:
𝛻𝑓𝑅ᵀ = 𝛻𝒛𝑓ᵀ − 𝛻𝒚𝑓ᵀ𝑩⁻¹𝑵 = 𝒓𝑐ᵀ
GRG Algorithm for LP Problems
1. Choose the largest 𝑚 components of 𝒙 as basic variables.
2. Compute the reduced gradient: 𝛻𝑓𝑅ᵀ = 𝒓𝑐ᵀ.
3. Let Δ𝑧𝑖 = −𝑟𝑖 if 𝑟𝑖 ≤ 0, and Δ𝑧𝑖 = −𝑥𝑖𝑟𝑖 if 𝑟𝑖 > 0.
4. If Δ𝒛 = 𝟎, stop; otherwise set Δ𝒚 = −𝑩⁻¹𝑵Δ𝒛.
5. Compute the step size: let 𝛼1 = max{𝛼: 𝒚 + 𝛼Δ𝒚 ≥ 𝟎, 𝒛 + 𝛼Δ𝒛 ≥ 𝟎}, 𝛼2 = arg min𝛼 𝑓(𝒙 + 𝛼Δ𝒙), 𝛼 = min{𝛼1, 𝛼2}.
6. Update: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼Δ𝒙.
7. If 𝛼2 ≥ 𝛼1, update 𝑩, 𝑵 (use pivoting).
8. Return to step 1.
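Steps 3–4 can be sketched as follows (the reduced costs, nonbasic values, and basis data are hypothetical; note the sign convention Δ𝒚 = −𝑩⁻¹𝑵Δ𝒛):

```python
import numpy as np

# Step 3: direction in the nonbasic variables from reduced costs rc and values z
rc = np.array([-3.0, 2.0])
z  = np.array([0.0, 0.5])
dz = np.where(rc <= 0, -rc, -z * rc)   # Δz_i = -r_i if r_i <= 0, else -x_i r_i

# Step 4: corresponding change in the basic variables
B = np.array([[1.0, 0.0], [0.0, 2.0]])
N = np.array([[1.0, 1.0], [1.0, 0.0]])
dy = -np.linalg.solve(B, N @ dz)       # Δy = -B⁻¹NΔz
print(dz, dy)                          # Δz = (3, -1), Δy = (-2, -1.5)
```

Scaling Δ𝑧𝑖 by 𝑥𝑖 when 𝑟𝑖 > 0 drives nonbasic variables that are already at zero no further negative, keeping the step consistent with 𝒙 ≥ 𝟎.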