最优化方法 - 最优性条件
最优化方法 - 最优性条件
约束极值问题的最优性条件
约束极值问题
\[\begin{array}{cl} \min & f(x), \; x \in \mathbb{R}^n \\ \text{s.t.} & g_i(x) \ge 0, i = 1, \cdots, m, \\ & h_i(x) = 0, j = 1, \cdots, l. \end{array}\]
其中\(g_i(x) \ge 0\)称为不等式约束,\(h_j(x) = 0\)称为等式约束。
集合\(S = \{x \mid g_i(x) \ge 0, i = 1, \cdots, m; \; h_j(x) = 0, j = 1, \cdots, l\}\)称为可行集或可行域。
可行方向与下降方向
下降方向
若\(\exists \delta \gt 0\), \(\forall \lambda \in (0, \delta)\), \(f(\overline{x} + \lambda d) \lt f(\overline{x})\),则称\(d\)为\(f(x)\)在\(\overline{x}\)处的下降方向。
若\(f(x)\)可微,\(\nabla f(\overline{x})d \lt 0\),则\(d\)为\(f(x)\)在\(\overline{x}\)处的下降方向。
可行方向
设\(S \subset \mathbb{R}^n\),\(\overline{x} \in \text{cl} \, S\),\(d \neq 0\),若\(\exists \delta \gt 0\), \(\forall \lambda \in (0, \delta)\), \(\overline{x} + \lambda d \in S\),则称\(d\)为\(S\)在\(\overline{x}\)处的可行方向。
\(D = \{d \mid d \neq 0, x \in \text{cl} S, \exists \delta \gt 0, \forall \lambda \in (0, \delta), \overline{x} + \lambda d \in S \}\)称为在\(\overline{x}\)处的可行方向锥。
不等式约束问题的一阶最优性条件
\[\begin{array}{cl} \min & f(x), \; x \in \mathbb{R}^n \\ \text{s.t.} & g_i(x) \ge 0, i = 1, \cdots, m. \end{array}\]
Fritz John条件
设\(\overline{x} \in S\), \(I = \{ i \mid g_i(\overline{x}) = 0 \}\), \(f\), \(g_i (i \in I)\)在\(\overline{x}\)处可微,\(g_i (i \in I)\)在\(\overline{x}\)处连续。
若\(\overline{x}\)是局部最优解,则存在不全为\(0\)的非负数\(w_0, w_i (i \in I)\),使得
\[w_0 \nabla f(\overline{x}) - \sum _{i \in I} w_i \nabla g_i(\overline{x}) = 0\]
\(\overline{x}\)称为Fritz John点。
Karush-Kuhn-Tucker(KKT)条件
设\(\overline{x} \in S\), \(I = \{ i \mid g_i(\overline{x}) = 0 \}\), \(f\), \(g_i (i \in I)\)在\(\overline{x}\)处可微,\(g_i (i \in I)\)在\(\overline{x}\)处连续,\(\{ \nabla g_i(\overline{x}) \mid i \in I \}\)线性无关。
若\(\overline{x}\)是局部最优解,则存在非负数\(w_i (i \in I)\),使得
\[\nabla f(\overline{x}) - \sum _{i \in I} w_i \nabla g_i(\overline{x}) = 0\]
\(\overline{x}\)称为KKT点。
若\(g_i (i \notin I)\)在\(\overline{x}\)可微,则KKT条件等价
\[\begin{cases} \nabla f(\overline{x}) - \sum _{i \in I} w_i \nabla g_i(\overline{x}) = 0, \\ w_i g_i(\overline{x}) = 0, i = 1, \cdots, m, \\ w_i \ge 0, i = 1, \cdots, m. \end{cases}\]
\(w_i g_i(\overline{x}) = 0\)称为互补松弛条件。
一般约束问题的一阶最优性条件
\[\begin{array}{cl} \min & f(x), \; x \in \mathbb{R}^n \\ \text{s.t.} & g_i(x) \ge 0, i = 1, \cdots, m, \\ & h_i(x) = 0, j = 1, \cdots, l. \end{array}\]
Fritz John条件
设\(\overline{x} \in S\), \(I = \{ i \mid g_i(\overline{x}) = 0 \}\), \(f\), \(g_i (i \in I)\)在\(\overline{x}\)处可微,\(g_i (i \in I)\)在\(\overline{x}\)处连续,\(h_j (j = 1, \cdots, l)\)在点\(\overline{x}\)连续可微。
若\(\overline{x}\)是局部最优解,则存在不全为\(0\)的非负数\(w_0, w_i (i \in I)\)和\(v_j (j = 1, \cdots, l)\),使得
\[w_0 \nabla f(\overline{x}) - \sum _{i \in I} w_i \nabla g_i(\overline{x}) - \sum _{j = 1}^l v_j \nabla h_j(\overline{x}) = 0\]
\(\overline{x}\)称为Fritz John点。
Karush-Kuhn-Tucker(KKT)条件
设\(\overline{x} \in S\), \(I = \{ i \mid g_i(\overline{x}) = 0 \}\), \(f\), \(g_i (i \in I)\)在\(\overline{x}\)处可微,\(g_i (i \in I)\)在\(\overline{x}\)处连续,\(h_j (j = 1, \cdots, l)\)在点\(\overline{x}\)连续可微,\(\{ \nabla g_i(\overline{x}), \nabla h_j(\overline{x}) \mid i \in I, j = 1, \cdots, l \}\)线性无关。
若\(\overline{x}\)是局部最优解,则存在非负数\(w_i (i \in I)\)和\(v_j (j = 1, \cdots, l)\),使得
\[\nabla f(\overline{x}) - \sum _{i \in I} w_i \nabla g_i(\overline{x}) - \sum _{j = 1}^l v_j \nabla h_j(\overline{x}) = 0\]
\(\overline{x}\)称为KKT点。
若\(g_i (i \notin I)\)在\(\overline{x}\)可微,则KKT条件等价
\[\begin{cases} \nabla f(\overline{x}) - \sum _{i \in I} w_i \nabla g_i(\overline{x}) - \sum _{j = 1}^l v_j \nabla h_j(\overline{x}) = 0, \\ w_i g_i(\overline{x}) = 0, i = 1, \cdots, m, \\ w_i \ge 0, i = 1, \cdots, m. \end{cases}\]
\(w_i g_i(\overline{x}) = 0\)称为互补松弛条件。Lagrange函数
\[L(x, w, v) = f(x) - \sum_{i = 1}^m w_i g_i(x) - \sum_{j = 1}^l v_j h_j(x)\]
若\(\overline{x}\)是局部最优解,则存在Lagrange乘子\(\overline{w} \ge 0\)和\(\overline{v}\),使得
\[\nabla_x L(\overline{x}, \overline{w}, \overline{v}) = 0\]
一般情形的一阶必要条件(KKT必要条件)可表示为
\[\begin{cases} \nabla_x L(x, w, v) = 0 \\ w_i g_i (x) = 0, & i = 1, 2, \cdots, m \\ g_i (x) \ge 0, & i = 1, 2, \cdots, m \\ h_j (x) = 0, & j = 1, 2, \cdots, l \\ w_i \ge 0, & i = 1, 2, \cdots, m \\ \end{cases}\]
凸规划的充分条件
\(f\)是凸函数,\(g_i (i = 1, \cdots, m)\)是凹函数,\(h_j (j = 1, \cdots, l)\)是线性函数,可行域为\(S\),\(\overline{x} \in S\), \(I = \{ i \mid g_i(\overline{x}) = 0 \}\),且在\(\overline{x}\)处KKT必要条件成立,则\(\overline{x}\)是全局最优解。
二阶条件
切锥
设非空集合\(S \in \mathbb{R}^n\),点\(\overline{x} \in \text{cl} S\),
\[T = \{ d \mid \exists x^{(k)} \in S, x^{(k)} \to x 及 \lambda _k \gt 0, d = \lim _{k \to \infty} (x^{(k)} - \overline{x}) \}\]
\(T\)称为集合\(S\)在点\(\overline{x}\)的切锥(tangent cone)或序列化可行方向锥(sequential feasible cone)。
设确定集合\(S\)的所有约束函数在\(x \in S\)处连续可微,则\(D(x, S) \subseteq SFD(x, S)\),其中\(D(x, S)\)为\(x\)点的可行方向锥,\(SFD(x, S)\)为\(x\)点的切锥。
定义\(\overline{S} = \left\{ x \mid \begin{array}{cc} x \in \mathbb{R}^n & \\ g_i(x) = 0, & i \in I, \overline{w_i} \gt 0 \\ g_i(x) \ge 0, & i \in I, \overline{w_i} = 0 \\ h_i(x) = 0, & j = 1, \cdots, l \end{array} \right\}\)
设\(\overline{S}\)在点\(\overline{x}\)的切锥为\(T\)。
定义\(\overline{G} = \left\{ d \mid \begin{array}{cc} d \in \mathbb{R}^n & \\ \nabla g_i(\overline{x})d = 0, & i \in I, \overline{w_i} \gt 0 \\ \nabla g_i(\overline{x})d \ge 0, & i \in I, \overline{w_i} = 0 \\ \nabla h_i(\overline{x})d = 0, & j = 1, \cdots, l \end{array} \right\}\)
则\(\overline{G} \supseteq \overline{T}\)。二阶必要条件
设\(\overline{x}\)是局部最优解,\(f\), \(g_i (i = 1, \cdots, m)\)和\(h_j (j = 1, \cdots, l)\)二次连续可微,并存在满足一般情形的一阶必要条件的乘子\(\overline{w} = (\overline{w_1}, \cdots, \overline{w_m})\)和\(v = (\overline{v_1}, \cdots, \overline{v_l})\)。
设点\(\overline{x}\)的约束规格\(\overline{G} = \overline{T}\)成立,则\(\forall d \in \overline{G}\),有
\[d^T \nabla_x^2 L(\overline{x}, \overline{w}, \overline{v}) d \ge 0\]
即\(L\)在点\(\overline{x}\)关于\(x\)的Hesse矩阵\(\nabla_x^2 L(\overline{x}, \overline{w}, \overline{v}) = \nabla ^2 f(\overline{x}) - \sum_{i = 1}^m \overline{w_i} \nabla ^2 g_i(\overline{x}) - \sum_{j = 1}^l \overline{v_j} \nabla ^2 h_j(\overline{x})\)是在\(\overline{G}\)上半正定的。
二阶充分条件
设\(f\), \(g_i (i = 1, \cdots, m)\)和\(h_j (j = 1, \cdots, l)\)二次连续可微,\(\overline{x}\)为可行点,存在满足一般情形的一阶必要条件的乘子\(\overline{w} = (\overline{w_1}, \cdots, \overline{w_m})\)和\(v = (\overline{v_1}, \cdots, \overline{v_l})\),且\(\forall d \in \overline{G}\),有
\[d^T \nabla_x^2 L(\overline{x}, \overline{w}, \overline{v}) d \gt 0\]
则\(x\)是严格局部最优解。
考虑下列非线性规划问题(可行域如图中的弧\(ABCD\)):
\[\begin{array}{cl} \min & x_1, \\ \text{s.t.} & 3(x_1 - 3)^2 + x_2 \ge 0, \\ & (x_1 - 3)^2 + x_2^2 - 10 = 0. \end{array}\]
判断下列各点是否为局部最优解:
\[ x^{(1)} = \begin{bmatrix} 2 \\ -3 \end{bmatrix}, x^{(2)} = \begin{bmatrix} 4 \\ -3 \end{bmatrix}, x^{(3)} = \begin{bmatrix} 3 + \sqrt{10} \\ 0 \end{bmatrix}, x^{(4)} = \begin{bmatrix} 3 - \sqrt{10} \\ 0 \end{bmatrix}. \]
解:目标函数\(f(x) = x_1\)及约束函数\(g(x) = 3(x_1 - 3)^2 + x_2\), \(h(x) = (x_1 - 3)^2 + x_2^2 - 10\)的梯度分别为
\[ \nabla f(x) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \nabla g(x) = \begin{bmatrix} 6(x_1 - 3) \\ 1 \end{bmatrix}, \nabla h(x) = \begin{bmatrix} 2(x_1 - 3) \\ 2x_2 \end{bmatrix}. \]
Lagrange函数是\(L(x, w, v) = x_1 - w[3(x_1 - 3)^2 + x_2] - v[(x_1 - 3)^2 + x_2^2 - 10]\)
\[ \nabla_x L = \begin{bmatrix} 1 - 6w(x_1 - 3) - 2v(x_1 - 3) \\ -w - 2vx_2 \end{bmatrix}, \nabla_x^2 L = \begin{bmatrix} -6w - 2v & 0 \\ 0 & -2v \end{bmatrix}. \]
- \(x^{(1)}\)是可行点,两个约束均为起作用约束。
\[ \nabla f(x^{(1)}) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \nabla g(x^{(1)}) = \begin{bmatrix} -6 \\ 1 \end{bmatrix}, \nabla h(x^{(1)}) = \begin{bmatrix} -2 \\ -6 \end{bmatrix}. \]
KKT条件为
\[\begin{cases}
1 + 6w + 2v = 0 \\
-w + 6v = 0 \\
w \ge 0
\end{cases}\]
方程组无解,故\(x^{(1)}\)不是KKT点,不是局部最优解。
- \(x^{(2)}\)是可行点,两个约束均为起作用约束。
\[ \nabla f(x^{(1)}) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \nabla g(x^{(1)}) = \begin{bmatrix} 6 \\ 1 \end{bmatrix}, \nabla h(x^{(1)}) = \begin{bmatrix} 2 \\ -6 \end{bmatrix}. \]
KKT条件为
\[\begin{cases} 1 - 6w - 2v = 0 \\ -w + 6v = 0 \\ w \ge 0 \end{cases}\]
解得\(w = \frac{3}{19}\), \(v = \frac{1}{38}\),\(x^{(2)}\)是KKT点,问题在\(x^{(2)}\)满足一阶必要条件。
在此点Lagrange函数的Hesse矩阵
\[\nabla_x^2 L(x^{(2)}, w, v) = \begin{bmatrix} -1 & 0 \\ 0 & -\frac{1}{19} \end{bmatrix}\]
求集合\(\overline{G}\)中的元素,由于\(w \gt 0\),
解方程组\(\begin{cases} \nabla g(x^{(2)})^T d = 0 \\ \nabla h(x^{(2)})^T d = 0 \end{cases}\),其中\(d = (d_1, d_2)^T\),即
\(\begin{cases} 6d_1 + d_2 = 0 \\ 2d_1 - 6d_2 = 0 \end{cases}\),解得\(d = (0, 0)^T\)。
方向集\(G = \{ d \mid d \neq 0, \nabla g(x^{(2)})^T d = 0, \nabla h(x^{(2)})^T d = 0 \} = \emptyset\),
故\(x^{(2)}\)是局部最优解。
P239