Convex Optimization 2015.10.21
\(int(K)\): interior of \(K\)
- Definition: Cone \(K\) is proper if it is closed, convex, pointed (contains no line), solid (nonempty interior)
If \(K\) is proper, then \(K^* = \{ \lambda : \lambda ^T x \ge 0 \; \forall x \in K \}\) is also proper.
Fact 1: \(x \succeq _K y \iff \lambda ^T x \ge \lambda ^T y \; \forall \lambda \; s.t. \; \lambda \succeq _{K^*} 0\)
Proof: \(x \succeq _K y \iff x - y \in K \overset{\text{because } K^{**} = K}{\iff} \lambda ^T (x - y) \ge 0 \; \forall \lambda \in K^*\)
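As a quick sanity check, here is a NumPy sketch of Fact 1 assuming the concrete cone \(K = \mathbb{R}^n_+\) (the nonnegative orthant, which is proper and self-dual, so \(K^* = K\)). Random sampling only illustrates the fact; it is not a proof.

```python
import numpy as np

# Sanity check of Fact 1 for K = R^n_+, proper and self-dual (K* = K):
# x >=_K y should hold iff lambda^T x >= lambda^T y for every lambda in K*.
rng = np.random.default_rng(0)
n = 5
x = rng.random(n)
y = x - 0.1 * rng.random(n)        # componentwise y <= x, so x - y is in K
assert np.all(x - y >= 0)          # i.e. x >=_K y

# Sample many lambda in K* = R^n_+ and verify lambda^T (x - y) >= 0.
lambdas = rng.random((1000, n))    # nonnegative entries, hence in K*
assert np.all(lambdas @ (x - y) >= 0)
print("lambda^T x >= lambda^T y held for all sampled lambda in K*.")
```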
Lemma: Let \(K\) be proper, then \(x \in int(K)\) if and only if \(\lambda ^T x \gt 0 \; \forall \lambda \in K^* \backslash \{ 0 \}\)
Proof: If \(x \in int(K)\) then, since \(x \in K\), certainly \(\lambda ^T x \ge 0\) for all \(\lambda \in K^* \backslash \{ 0 \}\).
If \(\lambda \in K^* \backslash \{ 0 \}\) and \(\lambda ^T x = 0\), then there exists \(u \in \mathbb{R}^n\) such that \(\lambda ^T u \lt 0\), and \(x + tu \in K\) for \(t \gt 0\) sufficiently small (since \(x \in int(K)\)); but then \(\lambda ^T (x + tu) = t \lambda ^T u \lt 0\), contradicting \(\lambda \in K^*\).
Next, assume that \(\lambda ^T x \gt 0\) for all \(\lambda \in K^* \backslash \{ 0 \}\).
Then \(x \in K^{**} = K\), so \(x \in K\).
If \(x \notin int(K)\) then there exists a sequence \((x_i)_{i = 1}^{\infty}\) with \(x_i \notin K\) for all \(i\) such that \(\underset{i \to \infty}{\lim} x_i = x\).
Since \(x_i \notin K = K^{**}\), \(\exists \lambda _i \in K^* \; s.t. \; \lambda _i^T x_i \lt 0\) (for all \(i\)).
WLOG the \(\lambda _i\) are unit vectors.
Then \(\{ \lambda _i : i \in \mathbb{N} \}\) lives inside a compact set (the unit ball of \(\mathbb{R}^n\)).
So we can assume WLOG that \(\lambda _1, \lambda _2, \cdots\) is convergent: \(\underset{i \to \infty}{\lim} \lambda _i = \lambda\).
Then \(\lambda \in K^*\) because \(K^*\) is closed, and \(\underset{i \to \infty}{\lim} (\lambda ^T x - \lambda _i^T x_i) = \underset{i \to \infty}{\lim} (\lambda ^T x - \lambda _i^T x) + \underset{i \to \infty}{\lim} (\lambda _i^T x - \lambda _i^T x_i) = 0 + 0 = 0\).
So \(\lambda ^T x = \underset{i \to \infty}{\lim} \lambda _i^T x_i \le 0\); but \(\lambda\) is a unit vector, hence \(\lambda \in K^* \backslash \{ 0 \}\), and the assumption gives \(\lambda ^T x \gt 0\), a contradiction!
QED.
Fact 2: \(x \succ _K y \iff \lambda ^T x \gt \lambda ^T y \; \forall \lambda \in K^* \backslash \{ 0 \}\)
Proof: \(x \succ _K y \iff x - y \in int(K) \underset{\text{by lemma}}{\iff} \lambda ^T (x - y) \gt 0 \; \forall \lambda \in K^* \backslash \{ 0 \}\)
Convex Functions:
A function \(f : \mathbb{R}^n \to \mathbb{R}\) is convex if and only if \(dom \, f\) is convex and \(f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)\) for all \(x, y \in dom \, f\) and all \(0 \lt \theta \lt 1\).
\(f : \mathbb{R}^n \to \mathbb{R}\) is convex if and only if \(epi(f) = \{(x, t) : x \in dom \, f, t \ge f(x)\} \subseteq \mathbb{R}^{n + 1}\) is convex.
Example: \(f(x) = \max(x_1, \cdots, x_n)\) is convex.
Proof: \(\max(\theta x + (1 - \theta) y) = \underset{i}{\max} [\theta x_i + (1 - \theta) y_i] \le \theta \underset{i}{\max} x_i + (1 - \theta) \underset{j}{\max} y_j = \theta \max (x) + (1 - \theta) \max (y)\)
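A minimal random spot-check of this inequality in NumPy (it illustrates the proof above; sampling is of course not itself a proof):

```python
import numpy as np

# Random spot-check of the convexity inequality for f(x) = max(x_1, ..., x_n).
rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    theta = rng.random()
    lhs = np.max(theta * x + (1 - theta) * y)
    rhs = theta * np.max(x) + (1 - theta) * np.max(y)
    assert lhs <= rhs + 1e-12      # small tolerance for floating point
print("f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y) on all samples.")
```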
Observation: \(f\) is convex iff the restriction of \(f\) to every line is convex.
Namely, \(f\) is convex if and only if the function \(g(t) = f(x + tu)\) from \(\mathbb{R}\) to \(\mathbb{R}\) is convex for every \(x \in dom \, f\) and \(u \in \mathbb{R}^n\).
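As a quick illustration (using the quadratic from Example 2 below, with \(P \in S^n\)): restricting \(f(x) = x^T P x\) to a line gives \(g(t) = f(x + tu) = x^T P x + 2t \, u^T P x + t^2 \, u^T P u\), a univariate quadratic. It is convex for every choice of \(x\) and \(u\) exactly when \(u^T P u \ge 0\) for all \(u\), i.e. when \(P \succeq 0\).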
- Derivatives: A function \(f : \mathbb{R}^n \to \mathbb{R}^m\) is differentiable at a point \(x \in int(dom \, f)\) if and only if there exists a matrix \("Df(x)" \in \mathbb{R}^{m \times n}\) such that \(f(x + z) = f(x) + Df(x)z + err(z)\) where \(\underset{z \to 0}{\lim} \frac{err(z)}{\lVert z \rVert _2} = 0\)
(Namely, \(\underset{z \to 0 }{\lim} \frac{f(x + z) - f(x) - Df(x)z}{\lVert z \rVert _2} = 0\))
Proposition: The derivative, if it exists, is unique.
Proof: Assume that \(A, B \in \mathbb{R}^{m \times n}\) both satisfy the conditions of the derivative and that \(A \neq B\).
Since \(A \neq B\) there exists some vector \(u \in \mathbb{R}^n \; s.t. \; Au \neq Bu\); WLOG \(u\) is a unit vector.
Subtracting the two expansions of \(f(x + tu)\) (note \(\lVert tu \rVert _2 = t\) for \(t \gt 0\)):
\(0 = \underset{t \to 0^+}{\lim} \frac{(f(x) + tAu + err_A(tu)) - (f(x) + tBu + err_B(tu))}{\lVert tu \rVert _2} = (A - B)u + \underset{t \to 0^+}{\lim} \frac{err_A(tu) - err_B(tu)}{\lVert tu \rVert _2} = (A - B)u \neq 0\), a contradiction.
- Example: Let \(f(x) = q^T x + c\) for \(q \in \mathbb{R}^n, c \in \mathbb{R}\)
Then \(Df(x) = q^T\) is the derivative, since \(f(x + z) - f(x) = q^T z\) exactly, i.e. \(err(z) = 0\).
- Example 2: Let \(f(x) = x^T P x\) for \(P \in S^n\)
Have \(f(x + z) - f(x) = (x^T + z^T) P(x + z) - x^T P x = z^T P x + x^T P z + z^T P z = 2 x^T P z + z^T P z\) (using \(P = P^T\), so \(z^T P x = x^T P z\))
so we will have \(Df(x) = 2x^T P\) if we can show \(\underset{z \to 0}{\lim} \frac{|z^T P z|}{\lVert z \rVert _2} = 0\):
\(|z^T P z| \le \lVert z \rVert _2 \lVert Pz \rVert _2 \le \lVert P \rVert _2 \lVert z \rVert _2^2\) by Cauchy–Schwarz, so \(\frac{|z^T P z|}{\lVert z \rVert _2} \le \lVert P \rVert _2 \lVert z \rVert _2 \to 0\).
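A quick finite-difference sketch of this claim (NumPy, random symmetric \(P\)); the ratio \(|err(z)| / \lVert z \rVert _2\) should shrink as \(z \to 0\):

```python
import numpy as np

# Finite-difference check that Df(x) = 2 x^T P for f(x) = x^T P x with P in S^n.
rng = np.random.default_rng(2)
n = 5
A = rng.normal(size=(n, n))
P = (A + A.T) / 2                  # symmetrize so P is in S^n
x = rng.normal(size=n)
f = lambda v: v @ P @ v

for eps in [1e-2, 1e-4, 1e-6]:
    z = eps * rng.normal(size=n)
    # err(z) = f(x+z) - f(x) - Df(x) z should be o(||z||_2)
    err = f(x + z) - f(x) - 2 * x @ P @ z
    print(eps, abs(err) / np.linalg.norm(z))   # ratio shrinks with ||z||
```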
- Example: Let \(f : S_{++}^n \to \mathbb{R}\) be defined by \(f(X) = \log (\det X)\), and let \(Z \in S^n\) be a small perturbation.
\[\begin{align*} f(X + Z) - f(X) = & \log \det(X + Z) - \log \det X \\ =& \log \det \left( X^{1/2} (I + X^{-1/2} Z X^{-1/2}) X^{1/2} \right) - \log \det X \\ =& \log \det X + \log \det (I + X^{-1/2} Z X^{-1/2}) - \log \det X \\ & \text{(write $X^{-1/2} Z X^{-1/2} = Q \, \text{diag}(\lambda_1, \cdots , \lambda _n) \, Q^T$ with $Q$ orthogonal, $QQ^T = I$)} \\ =& \log \prod_{i = 1}^n (1 + \lambda _i) \\ =& \sum_{i = 1}^n \log (1 + \lambda _i) \\ =& \sum_{i = 1}^n \left( \lambda _i + o(\lambda _i) \right) \\ =& tr(X^{-1} Z) + o(\lVert Z \rVert) \\ =& \left \langle X^{-1}, Z \right \rangle + o(\lVert Z \rVert) \\ \end{align*}\]
(Here \(\sum_i \lambda _i = tr(X^{-1/2} Z X^{-1/2}) = tr(X^{-1} Z)\) by cyclicity of the trace, so \(Df(X)Z = \left \langle X^{-1}, Z \right \rangle\).)
\(tr(B^T A) = \left \langle B, A \right \rangle\)
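A numerical sketch of the same expansion (NumPy; \(X\) is made positive definite by construction), checking that the first-order term \(tr(X^{-1} Z)\) matches \(f(X + Z) - f(X)\) up to second order:

```python
import numpy as np

# Check f(X + Z) - f(X) ~ <X^{-1}, Z> = tr(X^{-1} Z) for f(X) = log det X.
rng = np.random.default_rng(3)
n = 4
B = rng.normal(size=(n, n))
X = B @ B.T + n * np.eye(n)            # X in S^n_++ (positive definite)
C = rng.normal(size=(n, n))
Z0 = (C + C.T) / 2                     # perturbation direction in S^n

for eps in [1e-2, 1e-4, 1e-6]:
    Z = eps * Z0
    lhs = np.linalg.slogdet(X + Z)[1] - np.linalg.slogdet(X)[1]
    rhs = np.trace(np.linalg.solve(X, Z))   # tr(X^{-1} Z), no explicit inverse
    print(eps, abs(lhs - rhs))              # difference is O(eps^2)
```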
Second derivative (for functions with range \(\mathbb{R}\), i.e. \(m = 1\))
\(f : \mathbb{R}^n \to \mathbb{R}, Df(x) = \nabla f(x)^T, D(Df(x)^T) = D(\nabla f(x)) \in \mathbb{R}^{n \times n}\)
\(\nabla f(x) = [\frac{\partial f}{\partial x_1}, \cdots , \frac{\partial f}{\partial x_n}]^T\)
\(D(\nabla f(x)) = \begin{bmatrix} \frac{\partial ^2 f}{\partial x_1^2} & \cdots & \frac{\partial ^2 f}{\partial x_1 \partial x_n} \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix} = \nabla ^2 f(x)\),
the Hessian of \(f\).
\(f = \begin{bmatrix} f_1 \\ \vdots \\ f_m \end{bmatrix}\)
\(Df = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}\)
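Tying this notation to Example 2: a small NumPy sketch that rebuilds the Hessian \(\nabla ^2 f(x) = 2P\) of \(f(x) = x^T P x\) as the Jacobian \(D(\nabla f(x))\) of the gradient map:

```python
import numpy as np

# For f(x) = x^T P x: grad f(x) = 2 P x, and D(grad f)(x) = 2 P is the Hessian.
# Recover the Hessian numerically as the Jacobian of the gradient map.
rng = np.random.default_rng(4)
n = 4
A = rng.normal(size=(n, n))
P = (A + A.T) / 2
x = rng.normal(size=n)

grad = lambda v: 2 * P @ v
eps = 1e-6
# Numerical Jacobian: the j-th column is d(grad f)/d(x_j).
H = np.column_stack([(grad(x + eps * e) - grad(x)) / eps for e in np.eye(n)])
print(np.max(np.abs(H - 2 * P)))   # ~ 0 (grad is linear, so exact up to rounding)
```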
First-order condition for convexity
- Fact: Assume \(f : \mathbb{R}^n \to \mathbb{R}\) is differentiable (This means \(dom \, f\) is open.)
Then \(f\) is convex iff \(dom \, f\) is convex and \(f(y) \ge f(x) + Df(x)(y - x)\) for all \(x, y \in dom \, f\)
- Proof: \(\impliedby\) Let \(x, y \in dom \, f\), \(0 \lt \theta \lt 1\), \(z = \theta x + (1 - \theta) y\)
We have, applying the assumed inequality \(f(y) \ge f(x) + Df(x) (y - x)\) at the point \(z\):
\(f(x) \ge f(z) + Df(z)(x - z)\), \(f(y) \ge f(z) + Df(z)(y - z)\)
so \(\theta f(x) + (1 - \theta)f(y) \ge f(z) + \theta Df(z) x + (1 - \theta) Df(z) y - Df(z) z = f(z)\) (since \(\theta x + (1 - \theta) y = z\)) \(\implies \text{convex}\)
\(\implies\) Fix \(x, y \in dom \, f\) with \(x \neq y\), and let \(g(t) = f(x + t(y - x)) = f(ty + (1 - t)x)\)
Then \(g(t) \le tf(y) + (1 - t)f(x)\) by convexity (for \(t \in [0, 1]\))
so \(\frac{f(x + t(y - x)) - (1 - t)f(x)}{t} \le f(y) \implies \frac{f(x + t(y - x)) - f(x)}{t} + f(x) \le f(y)\)
Substituting \(f(x + t(y - x)) - f(x) = Df(x)(t(y - x)) + err_x(t(y - x))\) and letting \(t \to 0^+\):
\(\lim _{t \to 0^+} \frac{Df(x)(t(y - x)) + err_x(t(y - x))}{t} + f(x) \le f(y)\)
\(Df(x)(y - x) + \lim_ {t \to 0^+} \frac{err_x(t(y - x))}{t} + f(x) \le f(y)\)
The remaining limit is \(0\) since \(err_x(z) / \lVert z \rVert _2 \to 0\), so
\(f(y) \ge Df(x)(y - x) + f(x)\)
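A random spot-check of the first-order condition, as a sketch, using the convex log-sum-exp function (its convexity is standard but not proved in these notes):

```python
import numpy as np

# Spot-check the first-order condition f(y) >= f(x) + Df(x)(y - x)
# for the convex function f(x) = log(sum_i exp(x_i)).
rng = np.random.default_rng(5)

def f(v):
    return np.log(np.sum(np.exp(v)))

def grad(v):                        # Df(v)^T = softmax(v)
    e = np.exp(v)
    return e / e.sum()

for _ in range(1000):
    x, y = rng.normal(size=6), rng.normal(size=6)
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-12
print("f(y) >= f(x) + Df(x)(y - x) held on all samples.")
```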