Matrix Calculus
Matrix Derivatives
1. Scalar-by-scalar
- input : $x \in \mathbb{R}$
- output : $f(x) \in \mathbb{R}$
The derivative is also a scalar.
\[\dfrac{df}{dx} \in \mathbb{R}\]
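As a quick sanity check, `jax.grad` can compute this derivative. A minimal sketch, where JAX and the cubic $f$ are my own illustrative choices, not from the original post:

```python
import jax

# f : R -> R, an arbitrary example function (not from the post)
def f(x):
    return x**3 + 2.0 * x

# jax.grad of a scalar-to-scalar f is again scalar-to-scalar
df = jax.grad(f)
print(df(2.0))   # 3*2**2 + 2 = 14.0
```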
2. Scalar-by-vector
- input : $x \in \mathbb{R}^m$
- output : $f(x) \in \mathbb{R}$
Gradient
Also a vector, of the same size as the input $x$.
\[\nabla f = \dfrac{\partial{f}}{\partial{x}} = \left( \dfrac{\partial{f}}{\partial{x_1}}, \cdots, \dfrac{\partial{f}}{\partial{x_m}} \right) \in \mathbb{R}^m\]
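A minimal sketch of this with `jax.grad`; the quadratic $f$ and the input values are arbitrary examples of mine:

```python
import jax
import jax.numpy as jnp

# f : R^m -> R, an arbitrary example function (not from the post)
def f(x):
    return jnp.sum(x**2) + x[0] * x[1]

x = jnp.array([1.0, 2.0, 3.0])

# jax.grad of a scalar-valued f returns a vector of the same shape as x
g = jax.grad(f)(x)
print(g)        # [2*1+2, 2*2+1, 2*3] -> [4. 5. 6.]
print(g.shape)  # (3,), same as x.shape
```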
Hessian
A matrix of size $m \times m$, where $m$ is the dimension of $x$.
\[\mathbf{H} = \nabla\nabla f = \left[ \dfrac{\partial^2{f}}{\partial{x_i} \partial{x_j}} \right] \in \mathbb{R}^{m \times m}, \quad i, j \in \{1, \cdots, m\}\]
- useful for deciding whether an optimization problem has a global optimum: if $\mathbf{H}$ is positive semidefinite everywhere, $f$ is convex and any local minimum is global (see the sketch after this list)
- symmetric whenever the second partial derivatives are continuous (Schwarz's theorem)
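A minimal sketch with `jax.hessian`, reusing the quadratic $f$ from the gradient example above (again my own example, not the post's); it checks symmetry and uses the eigenvalues to confirm positive definiteness:

```python
import jax
import jax.numpy as jnp

# Same arbitrary f : R^m -> R as in the gradient sketch
def f(x):
    return jnp.sum(x**2) + x[0] * x[1]

x = jnp.array([1.0, 2.0, 3.0])

# jax.hessian returns the m x m matrix of second partial derivatives
H = jax.hessian(f)(x)
print(H)
# [[2. 1. 0.]
#  [1. 2. 0.]
#  [0. 0. 2.]]

# Symmetric, since f has continuous second partials
print(jnp.allclose(H, H.T))       # True

# All eigenvalues positive, so H is positive definite everywhere
# (H is constant here) and f is strictly convex: its stationary
# point is the global minimum
print(jnp.linalg.eigh(H)[0])      # [1. 2. 3.]
```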
3. Vector-by-vector
- input : $x \in \mathbb{R}^m$
- output : $y(x) = Ax \in \mathbb{R}^n, \; A \in \mathbb{R}^{n \times m}$
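Jacobian
The derivative is now an $n \times m$ matrix collecting all first-order partials.
\[\mathbf{J} = \dfrac{\partial{y}}{\partial{x}} = \left[ \dfrac{\partial{y_i}}{\partial{x_j}} \right] \in \mathbb{R}^{n \times m}, \quad i \in \{1, \cdots, n\}, \; j \in \{1, \cdots, m\}\]
For the linear map $y(x) = Ax$ above, the Jacobian is simply $A$.

A minimal sketch verifying this with `jax.jacfwd`; the particular $A$ and $x$ below are arbitrary choices of mine:

```python
import jax
import jax.numpy as jnp

# A : R^{n x m} with n = 2, m = 3; an arbitrary example matrix
A = jnp.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])

def y(x):
    return A @ x   # y : R^3 -> R^2

x = jnp.array([1.0, 1.0, 1.0])

# The Jacobian of the linear map x -> Ax is A itself
J = jax.jacfwd(y)(x)
print(J.shape)             # (2, 3), i.e. n x m
print(jnp.allclose(J, A))  # True
```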