
Log-Sum-Exp Trick

The Log-Sum-Exp trick is a really cool computational trick for preventing overflow or underflow when computing the log of a sum of exponentials (exactly as its name suggests!). I got to know about it while coding up mixture density networks, which required me to calculate the log of a sum of Gaussian densities for the log-likelihood.

Suppose you would like to calculate a quantity that looks like the following,

\begin{equation*} \mathcal{J} = \log \sum_{i=1}^N \exp(x_i) \end{equation*}

If \(x_i\) happens to be a very negative number, the exponential underflows towards zero; if it is a large positive number, the exponential blows up. When calculating with limited precision, for example GPU calculations in float32, this quickly leads to underflow or overflow.
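
To see the failure concretely, here's a small sketch (NumPy is my choice here for illustration; the same thing happens in any float32 arithmetic):

```python
import numpy as np

# Naive log(sum(exp(x))) in float32: large inputs overflow to inf,
# very negative inputs underflow to 0 and give log(0) = -inf.
x_large = np.array([1000.0, 1001.0, 1002.0], dtype=np.float32)
x_small = np.array([-1000.0, -1001.0, -1002.0], dtype=np.float32)

print(np.log(np.sum(np.exp(x_large))))  # inf
print(np.log(np.sum(np.exp(x_small))))  # -inf
```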

Thankfully, we can re-write \(\mathcal{J}\) as

\begin{equation*} \mathcal{J} = a + \log \sum_{i=1}^N \exp(x_i-a), \end{equation*}

where \(a = \max \left(x_i;~i=1\text{ ... }N \right)\). Whether \(x_i\) is a very negative or a large positive number, subtracting \(a\) brings the exponential into more manageable territory for computation, since every \(x_i - a \leq 0\).
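
Here's a minimal NumPy sketch of the rewritten formula (the function name logsumexp here is just for illustration; a battle-tested version lives in scipy.special.logsumexp):

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log(sum(exp(x))) via the max-subtraction trick."""
    a = np.max(x)                            # a = max_i x_i
    return a + np.log(np.sum(np.exp(x - a)))

x_large = np.array([1000.0, 1001.0, 1002.0], dtype=np.float32)
print(logsumexp(x_large))  # ~1002.41, instead of inf from the naive formula
```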

As a bonus, \(\mathcal{J}\) itself is a smooth approximation to the max of the \(x_i\), since it sits within the tight bounds

\begin{equation*} a \leq \mathcal{J} \leq a+\log(N). \end{equation*}
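
These bounds follow from a one-line observation: the sum is at least its largest term and at most \(N\) copies of it,

\begin{equation*} \exp(a) \leq \sum_{i=1}^N \exp(x_i) \leq N \exp(a), \end{equation*}

and taking logs gives the inequality above.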

See Wikipedia for more details.

Incidentally, TensorFlow also provides an implementation of the Log-Sum-Exp trick for convenience. See here.
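
For reference, a rough usage sketch (I believe the function in question is tf.math.reduce_logsumexp):

```python
import tensorflow as tf

# tf.math.reduce_logsumexp computes log(sum(exp(x))) with the same
# max-subtraction trick applied internally, so large inputs don't overflow.
x = tf.constant([1000.0, 1001.0, 1002.0])
print(tf.math.reduce_logsumexp(x))  # ~1002.41 instead of inf
```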

Here's a YouTube video on the Log-Sum-Exp trick (not by me) if reading's not your thing...