Convex Optimization 2015.10.28
Proposition: Let \(f : \mathbb{R} \to \mathbb{R}\) with \(dom \, f\) convex and \(f\) twice differentiable.
Then \(f\) is convex if \(f''(x) \ge 0\) for all \(x \in dom \, f\).
Proof: Let \(z, x \in dom \, f\), then
\[\begin{align*} f(z) &= f(x) + \int _x^z f'(t)\,dt \\ &= f(x) + \int _x^z \Big( f'(x) + \int _x^t f''(s)\,ds \Big)dt \\ &= f(x) + f'(x)(z - x) + \int _x^z \int _x^t f''(s)\,ds\,dt \\ &\ge f(x) + f'(x)(z - x) & \text{(two cases: } z \ge x \text{ and } z < x \text{; either way the double integral is } \ge 0\text{)} \end{align*}\]
QED, by the "first-order condition" for convexity.
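As a quick numerical sanity check (not part of the proof), one can verify the first-order inequality \(f(z) \ge f(x) + f'(x)(z - x)\) on sampled points for a function with \(f'' \ge 0\); here \(f(x) = e^x\) is an arbitrary choice:

```python
import math

# f(x) = e^x has f''(x) = e^x >= 0, so the first-order inequality
#   f(z) >= f(x) + f'(x) * (z - x)
# should hold for every pair (x, z).
f = math.exp       # f(x) = e^x
fprime = math.exp  # f'(x) = e^x

points = [-2.0, -0.5, 0.0, 1.0, 3.0]
for x in points:
    for z in points:
        # small slack for floating-point round-off
        assert f(z) >= f(x) + fprime(x) * (z - x) - 1e-12
```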
Chain Rule: Let \(f : \mathbb{R}^n \to \mathbb{R}^m\) be differentiable at \(x \in dom \, f\),
let \(g : \mathbb{R}^m \to \mathbb{R}^k\) be differentiable at \(f(x) \in dom \, g\),
then if \(h : \mathbb{R}^n \to \mathbb{R}^k\) is defined by \(h(y) = g(f(y)) \; \forall y \in \mathbb{R}^n\), \(h\) is differentiable at \(x\) and \(Dh(x) = Dg(f(x)) \cdot Df(x)\)
(\(Df(x)\) is an \(m \times n\) matrix, \(Dg(f(x))\) is a \(k \times m\) matrix)
can be written as \(h = g \circ f, D(g \circ f) = (Dg \circ f) \cdot Df\)
Example: Let \(f : \mathbb{R}^m \to \mathbb{R}\), \(A \in \mathbb{R}^{m \times n}\), \(b \in \mathbb{R}^m\), \(l(x) = Ax + b\).
\(D(f \circ l)(x) = [(Df \circ l) \cdot Dl](x) = Df(Ax + b) \cdot A = \nabla f(Ax + b)^T A\)
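This identity \(\nabla (f \circ l)(x) = A^T \nabla f(Ax + b)\) can be spot-checked against finite differences; the particular \(f(y) = \lVert y \rVert_2^2\) below (with \(\nabla f(y) = 2y\)) is an arbitrary choice for illustration:

```python
import numpy as np

# Check grad(f o l)(x) = A^T grad f(Ax + b) for f(y) = ||y||^2.
rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

f = lambda y: float(y @ y)   # f : R^m -> R
grad_f = lambda y: 2 * y     # grad f(y) = 2y

# Central-difference gradient of h = f o l at x.
h = lambda v: f(A @ v + b)
eps = 1e-6
fd = np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])

assert np.allclose(fd, A.T @ grad_f(A @ x + b), atol=1e-5)
```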
Example: Let \(f : \mathbb{R}^n \to \mathbb{R}\), \(g : \mathbb{R} \to \mathbb{R}\), then
\(D(g \circ f)(x) = Dg(f(x))Df(x) = g'(f(x)) \cdot \nabla f(x)^T\)
Example: Let \(f : \mathbb{R}^n \to \mathbb{R}\), \(g : \mathbb{R} \to \mathbb{R}\) be defined by \(g(t) = f(x + tu)\) for some vectors \(x, u\).
To compute \(g'(t)\), let \(h(t) = x + tu\), so \(h : \mathbb{R} \to \mathbb{R}^n\) and \(g = f \circ h\).
So \(g'(t) = ((Df \circ h) \cdot Dh)(t) = \nabla f(h(t))^T \cdot Dh(t) = \nabla f(x + tu)^T \cdot u = u^T \nabla f(x + tu)\).
To compute \(g''(t)\),
\(g''(t) = (D[(u^T \nabla f) \circ h])(t) = ([D(u^T \nabla f) \circ h] \cdot Dh)(t) = ((u^T \nabla ^2 f) \circ h)(t) \cdot u = u^T \nabla ^2 f(x + tu) \cdot u\), using \(D(u^T \nabla f) = u^T D(\nabla f) = u^T \nabla ^2 f\) and \(Dh(t) = u\).
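These formulas for \(g'\) and \(g''\) can be verified numerically; here \(f(y) = \frac{1}{2} y^T P y\) with symmetric \(P\) is chosen arbitrarily, so \(\nabla f(y) = Py\) and \(\nabla^2 f = P\):

```python
import numpy as np

# Check g'(t) = u^T grad f(x + tu) and g''(t) = u^T Hess f(x + tu) u
# against finite differences, for the quadratic f(y) = 0.5 y^T P y.
rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
P = M + M.T                       # symmetric, so grad f(y) = P y
x, u = rng.standard_normal(n), rng.standard_normal(n)

f = lambda y: 0.5 * float(y @ P @ y)
g = lambda t: f(x + t * u)

t, eps = 0.7, 1e-5
g1_fd = (g(t + eps) - g(t - eps)) / (2 * eps)          # ~ g'(t)
g2_fd = (g(t + eps) - 2 * g(t) + g(t - eps)) / eps**2  # ~ g''(t)

assert np.isclose(g1_fd, u @ (P @ (x + t * u)), atol=1e-6)
assert np.isclose(g2_fd, u @ P @ u, atol=1e-4)
```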
Corollary: Let \(f : \mathbb{R}^n \to \mathbb{R}\) be twice differentiable, \(dom \, f\) convex.
Then \(f\) is convex if \(\nabla ^2 f \succeq 0\).
Example: "log-sum-exp" \(f(x) = \log(e^{x_1} + \cdots + e^{x_n})\), \(f : \mathbb{R}^n \to \mathbb{R}\), \(dom \, f = \mathbb{R}^n\)
\(\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{bmatrix} = \frac{1}{e^{x_1} + \cdots + e^{x_n}} \cdot \begin{bmatrix} e^{x_1} \\ \vdots \\ e^{x_n} \end{bmatrix}\)
\(\frac{\partial}{\partial x_i} \frac{1}{e^{x_1} + \cdots + e^{x_n}} = -\left( \frac{1}{e^{x_1} + \cdots + e^{x_n}} \right)^2 \cdot e^{x_i}\)
\((\nabla ^2 f)_{ij, \, i \neq j} = \frac{\partial}{\partial x_i} \frac{e^{x_j}}{e^{x_1} + \cdots + e^{x_n}} = -\left( \frac{1}{e^{x_1} + \cdots + e^{x_n}} \right)^2 \cdot e^{x_i} e^{x_j}\)
\((\nabla ^2 f)_{ii} = \frac{\partial}{\partial x_i} \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}} = -\left( \frac{1}{e^{x_1} + \cdots + e^{x_n}} \right)^2 \cdot (e^{x_i})^2 + \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}}\)
Put \(z_i = e^{x_i}\) and let \(\mathbf{1} = (1, \ldots, 1)^T\); then \(e^{x_1} + \cdots + e^{x_n} = \mathbf{1}^T z\) and
\(\nabla ^2 f = -\left(\frac{1}{\mathbf{1}^T z} \right)^2 z z^T + \frac{1}{\mathbf{1}^T z} \cdot \text{diag}(z) = \frac{1}{\mathbf{1}^T z} \left( \text{diag}(z) - \frac{1}{\mathbf{1}^T z} z z^T \right)\)
To show \(\nabla ^2 f \succeq 0\) it suffices that for all \(x\),
\[\begin{align*} x^T (\mathbf{1}^T z \cdot \text{diag}(z) - z z^T)x \ge 0 \iff & \mathbf{1}^T z \cdot \sum_{i = 1}^n x_i^2 z_i - (z^T x)^2 \ge 0 \\ \iff & (z^T x)^2 \le \mathbf{1}^T z \cdot \sum_{i = 1}^n x_i^2 z_i = \lVert (\sqrt{z_1}, \ldots, \sqrt{z_n}) \rVert _2^2 \cdot \lVert (x_1 \sqrt{z_1}, \ldots, x_n \sqrt{z_n}) \rVert _2^2, \\ \end{align*}\]
which is the Cauchy–Schwarz inequality applied to the vectors \((\sqrt{z_i})_i\) and \((x_i \sqrt{z_i})_i\), since \(z^T x = \sum_i \sqrt{z_i} \cdot x_i \sqrt{z_i}\).
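A numerical sanity check of this conclusion, computing the log-sum-exp Hessian at random points and confirming its smallest eigenvalue is nonnegative up to round-off:

```python
import numpy as np

# Hess f(x) = (1/(1^T z)) * (diag(z) - z z^T / (1^T z)),  z = exp(x).
rng = np.random.default_rng(2)
for _ in range(100):
    x = rng.standard_normal(5)
    z = np.exp(x)
    s = z.sum()
    H = (np.diag(z) - np.outer(z, z) / s) / s
    # PSD up to floating-point noise
    assert np.linalg.eigvalsh(H).min() >= -1e-12
```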
Exercise: Prove that \(f(x, y) = y^2 / x\) is convex, \(dom \, f = \mathbb{R}_{++} \times \mathbb{R}\)
\(\nabla f = \begin{bmatrix} -\frac{y^2}{x^2} \\ \frac{2y}{x} \end{bmatrix}, \quad \nabla ^2 f = \begin{bmatrix} \frac{2y^2}{x^3} & -\frac{2y}{x^2} \\ -\frac{2y}{x^2} & \frac{2}{x} \end{bmatrix} = \frac{1}{x^3} \begin{bmatrix} 2y^2 & -2xy \\ -2xy & 2x^2 \end{bmatrix} = \frac{2}{x^3} \begin{bmatrix} y \\ -x \end{bmatrix} \begin{bmatrix} y & -x \end{bmatrix} \succeq 0\) since \(x > 0\), so \(f\) is convex by the corollary.
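As a spot check, Jensen's inequality \(f(\theta a + (1-\theta) b) \le \theta f(a) + (1-\theta) f(b)\) can be tested for \(f(x, y) = y^2/x\) on random points of the domain:

```python
import numpy as np

# f(x, y) = y^2 / x on dom f = R_{++} x R; p = (x, y).
f = lambda p: p[1] ** 2 / p[0]

rng = np.random.default_rng(3)
for _ in range(1000):
    a = np.array([rng.uniform(0.1, 5.0), rng.uniform(-5.0, 5.0)])
    b = np.array([rng.uniform(0.1, 5.0), rng.uniform(-5.0, 5.0)])
    th = rng.uniform()
    # convexity inequality, with slack for round-off
    assert f(th * a + (1 - th) * b) <= th * f(a) + (1 - th) * f(b) + 1e-9
```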
Proposition: Let \(f : \mathbb{R}^n \to \mathbb{R}\) be twice differentiable at \(x\). Then
\(f(x + z) = f(x) + \nabla f(x)^T z + \frac{1}{2} z^T \nabla ^2 f(x) z + err_x(z)\)
where \(\lim_{z \to 0} \frac{\lVert err_x(z) \rVert _2}{\lVert z \rVert _2^2} = 0\).
Equivalent to: \(\forall \varepsilon \gt 0, \, \exists r \gt 0, \, s.t. \, (\lVert z \rVert _2 \le r \implies \lVert err_x(z) \rVert _2 \le \varepsilon \cdot \lVert z \rVert _2^2)\)
Proof: Let \(\varepsilon \gt 0\). Since \(\nabla f\) is differentiable at \(x\),
\(\nabla f(x + z) = \nabla f(x) + \nabla ^2 f(x) z + err_x(z)\)
(here \(err_x\) denotes the vector-valued first-order error of \(\nabla f\), not the scalar error in the statement), and \(\exists r \gt 0\) s.t.
\(\lVert err_x(z) \rVert _2 \le \varepsilon \cdot \lVert z \rVert _2\) for all \(z\) s.t. \(\lVert z \rVert _2 \le r\), i.e.
\(\lVert \nabla f(x + z) - \nabla f(x) - \nabla ^2 f(x) z \rVert _2 \le \varepsilon \lVert z \rVert _2\) for all \(z\) s.t. \(\lVert z \rVert _2 \le r\).
Let \(z\) s.t. \(\lVert z \rVert _2 \le r\) and let \(u = z / \lVert z \rVert _2\), \(g(t) = f(x + tu), t \in \mathbb{R}\).
Then \(g'(t) = u^T \nabla f(x + tu)\)
So, for \(0 \le t \le r\), \[\begin{align*} f(x + tu) &= f(x) + \int _0^t g'(s)\,ds \\ &= f(x) + \int _0^t u^T \nabla f(x + su)\,ds \\ &= f(x) + u^T \int _0^t \big(\nabla f(x) + \nabla ^2 f(x)su + err_x(su)\big)\,ds \\ &= f(x) + u^T \nabla f(x) (t - 0) + u^T \nabla ^2 f(x) u \int _0^t s \,ds + u^T \int _0^t err_x(su)\,ds \\ &\le f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla ^2 f(x) (tu) + \lVert u \rVert _2 \int _0^t \lVert err_x(su) \rVert _2 \,ds \\ &\le f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla ^2 f(x) (tu) + \lVert u \rVert _2 \int _0^t \varepsilon \lVert su \rVert _2 \,ds \\ &\le f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla ^2 f(x) (tu) + \lVert u \rVert _2^2 \, \varepsilon \int _0^t t \,ds \\ &= f(x) + \nabla f(x)^T tu + \frac{1}{2} (tu)^T \nabla ^2 f(x) (tu) + \varepsilon t^2 \\ &= f(x) + \nabla f(x)^T z + \frac{1}{2} z^T \nabla ^2 f(x) z + \varepsilon \lVert z \rVert _2^2, \\ \end{align*}\]
where the first inequality uses \(u^T v \le \lVert u \rVert _2 \lVert v \rVert _2\) and \(\lVert \int _0^t err_x(su)\,ds \rVert _2 \le \int _0^t \lVert err_x(su) \rVert _2 \,ds\), the third uses \(s \le t\), and the last two lines use \(\lVert u \rVert _2 = 1\) and \(t = \lVert z \rVert _2\), so \(tu = z\). Hence \(err_x(z) \le \varepsilon \lVert z \rVert _2^2\); the matching lower bound \(err_x(z) \ge -\varepsilon \lVert z \rVert _2^2\) follows symmetrically from \(u^T v \ge -\lVert u \rVert _2 \lVert v \rVert _2\), so \(\lvert err_x(z) \rvert \le \varepsilon \lVert z \rVert _2^2\). QED
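The remainder behavior can be illustrated numerically: for the log-sum-exp function above, the ratio \(\lvert err_x(z) \rvert / \lVert z \rVert _2^2\) should shrink as \(z \to 0\) along a fixed direction:

```python
import numpy as np

# f = log-sum-exp, with the gradient and Hessian derived earlier.
def lse(x):
    return np.log(np.exp(x).sum())

def grad_lse(x):
    z = np.exp(x)
    return z / z.sum()

def hess_lse(x):
    z = np.exp(x)
    s = z.sum()
    return (np.diag(z) - np.outer(z, z) / s) / s

rng = np.random.default_rng(4)
x = rng.standard_normal(4)
d = rng.standard_normal(4)

# err_x(z) = f(x+z) - f(x) - grad^T z - 0.5 z^T Hess z; track |err| / ||z||^2.
ratios = []
for t in [1e-1, 1e-2, 1e-3]:
    z = t * d
    err = lse(x + z) - lse(x) - grad_lse(x) @ z - 0.5 * z @ hess_lse(x) @ z
    ratios.append(abs(err) / (z @ z))

assert ratios[-1] < ratios[0]  # ratio shrinks as z -> 0
```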