Matrix Calculus
Matrix Derivatives

1. Scalar-by-scalar

The derivative of a scalar with respect to a scalar is itself a scalar.

\[\dfrac{df}{dx} \in \mathbb{R}\]
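
As a quick numerical check of the scalar case, the derivative can be approximated with a central difference. This is only a minimal sketch: the helper `derivative`, the step size `h`, and the test function `cube` are illustrative assumptions, not part of the original notes.

```python
def derivative(f, x, h=1e-5):
    """Central-difference estimate of df/dx at a scalar point x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def cube(x):
    """Example function f(x) = x^3, whose derivative is 3x^2."""
    return x ** 3


# The result is a single number, matching df/dx in R.
approx = derivative(cube, 2.0)  # analytic value is 3 * 2**2 = 12
```

The estimate is a plain float, which mirrors the statement that a scalar-by-scalar derivative is a scalar.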

2. Scalar-by-vector

Gradient

The gradient is a vector of the same size as the input $x$.

\[\nabla f = \dfrac{\partial{f}}{\partial{x}} = \left( \dfrac{\partial{f}}{\partial{x_1}}, \cdots, \dfrac{\partial{f}}{\partial{x_m}}\right) \in \mathbb{R}^m\]
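
The gradient can be sketched numerically by perturbing one coordinate at a time. The helper `gradient` and the example function `f` below are hypothetical names chosen for illustration, and finite differences are assumed here only as a simple way to check the shapes.

```python
def gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of a scalar function f at x.

    x is a list of m floats; the result is a list of m partial derivatives.
    """
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h   # perturb only coordinate i
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    """Example scalar function f(x) = x_1^2 + 3 x_1 x_2."""
    return x[0] ** 2 + 3 * x[0] * x[1]


# Analytic gradient is (2 x_1 + 3 x_2, 3 x_1), which is (8, 3) at x = (1, 2).
g = gradient(f, [1.0, 2.0])
```

The returned list has one entry per input coordinate, so its length equals $m$.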

Hessian

The Hessian is a matrix of size $m \times m$, where $m$ is the dimension of $x$.

\[\mathbf{H} = \nabla\nabla f = \left[ \dfrac{\partial^2{f}}{\partial{x_i} \partial{x_j}} \right] \in \mathbb{R}^{m \times m}, \quad i, j \in \{1, \cdots, m\}\]
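
To make the $m \times m$ shape concrete, here is a rough sketch that estimates each second partial derivative with a four-point central difference. The function name `hessian`, the step size, and the example function `f` are assumptions made for this illustration.

```python
def hessian(f, x, h=1e-4):
    """Central-difference estimate of the Hessian of a scalar function f at x.

    Entry [i][j] approximates the second partial derivative of f with
    respect to x_i and x_j.
    """
    m = len(x)

    def shifted(i, di, j, dj):
        # Evaluate f after moving coordinate i by di*h and coordinate j by dj*h.
        point = list(x)
        point[i] += di * h
        point[j] += dj * h
        return f(point)

    H = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            H[i][j] = (
                shifted(i, 1, j, 1)
                - shifted(i, 1, j, -1)
                - shifted(i, -1, j, 1)
                + shifted(i, -1, j, -1)
            ) / (4 * h * h)
    return H


def f(x):
    """Example scalar function f(x) = x_1^2 x_2 + x_2^3."""
    return x[0] ** 2 * x[1] + x[1] ** 3


# Analytic Hessian is [[2 x_2, 2 x_1], [2 x_1, 6 x_2]], i.e. [[4, 2], [2, 12]] at (1, 2).
H = hessian(f, [1.0, 2.0])
```

For a twice continuously differentiable $f$ the mixed partials agree, so the estimated matrix should be symmetric up to rounding error.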

3. Vector-by-vector

Gradient

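For $y \in \mathbb{R}^n$ and $x \in \mathbb{R}^m$, the derivative is an $n \times m$ matrix (commonly called the Jacobian) whose row $j$, column $i$ entry is $\partial y_j / \partial x_i$. In the linear case $y = Ax$ with $A \in \mathbb{R}^{n \times m}$, it is $A$ itself.

\[\nabla y = \left[ \dfrac{\partial{y_j}}{\partial{x_i}}\right] \in \mathbb{R}^{n \times m}, \quad y = Ax \;\Rightarrow\; \nabla y = A\]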
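
The identity $\nabla y = A$ can be checked numerically under the assumption that $y$ is the linear map $y = Ax$. The sketch below uses hypothetical helpers `jacobian` and `linear_map` and an arbitrary $2 \times 3$ matrix `A`; none of these come from the original notes.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference estimate of the n x m matrix of partials dy_j/dx_i.

    f maps a list of m floats to a list of n floats.
    """
    n = len(f(x))
    m = len(x)
    J = [[0.0] * m for _ in range(n)]
    for i in range(m):
        forward = list(x)
        backward = list(x)
        forward[i] += h   # perturb only input coordinate i
        backward[i] -= h
        y_plus = f(forward)
        y_minus = f(backward)
        for j in range(n):
            J[j][i] = (y_plus[j] - y_minus[j]) / (2 * h)
    return J


# Arbitrary example matrix with n = 2 rows and m = 3 columns.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]


def linear_map(x):
    """Compute y = A x for the matrix A defined above."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# For a linear map the estimate should match A at any input point.
J = jacobian(linear_map, [0.5, -1.0, 2.0])
```

Rows of `J` correspond to outputs $y_j$ and columns to inputs $x_i$, which gives the $n \times m$ layout used in the formula.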
