Log-Sum-Exp Trick
The Log-Sum-Exp trick is a really cool computational trick to prevent overflow or underflow when computing the log of a sum of exponentials (exactly as its name suggests!). I got to know about it while trying to code up mixture density networks, which required me to calculate the log of a sum of Gaussian densities for the log-likelihood.
Suppose you would like to calculate a quantity that looks like the following,

\[\mathcal{J} = \log \sum_{i=1}^{N} e^{x_i}.\]
If \(x_i\) happens to be a very negative or a very large positive number, then exponentiating it produces something vanishingly small or astronomically large. When calculating with limited precision, for example GPU calculations in float32, this problem is exacerbated and can lead to underflow or overflow.
Thankfully, we can re-write \(\mathcal{J}\) as

\[\mathcal{J} = a + \log \sum_{i=1}^{N} e^{x_i - a},\]

where \(a = \max \left(x_i;~i=1\text{ ... }N \right)\). If \(x_i\) is a very negative or a very large positive number, then \(x_i - a\) brings the exponential into more manageable territory for computation.
The above re-arrangement also shows that \(\mathcal{J}\) is a smooth approximation of the max of the \(x_i\), since it lies within the tight bounds

\[a \le \mathcal{J} \le a + \log N.\]
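To make this concrete, here's a minimal NumPy sketch (the function names are my own) comparing the naive computation with the shifted-by-the-max version:

```python
import numpy as np

def naive_logsumexp(x):
    # Direct translation of J = log(sum(exp(x_i))); overflows or underflows easily.
    return np.log(np.sum(np.exp(x)))

def stable_logsumexp(x):
    # Shift by the maximum before exponentiating, then add it back outside the log.
    a = np.max(x)
    return a + np.log(np.sum(np.exp(x - a)))

x = np.array([1000.0, 1001.0, 1002.0], dtype=np.float32)
print(naive_logsumexp(x))   # inf -- exp(1000) overflows float32 (NumPy will warn)
print(stable_logsumexp(x))  # ~1002.41, close to max(x) as the bounds above suggest
```

Shifting by the max guarantees the largest exponent is exactly zero, so at least one term in the sum equals 1 and the log never sees 0 or infinity.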
See the Wikipedia page for more details.
Incidentally, TensorFlow also provides an implementation of the Log-Sum-Exp trick to make things convenient for TensorFlow users. See here.
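If you're already in TensorFlow land, usage looks roughly like this (a quick sketch assuming TensorFlow 2.x and tf.math.reduce_logsumexp; check the docs for the full signature):

```python
import tensorflow as tf

x = tf.constant([1000.0, 1001.0, 1002.0])
# Computes log(sum(exp(x))), applying the same max-shifting trick internally.
print(tf.math.reduce_logsumexp(x))           # ~1002.41 instead of inf
print(tf.math.reduce_logsumexp(x, axis=-1))  # same result, reducing over the last axis
```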
Here's a YouTube video on the Log-Sum-Exp trick (not by me) if reading's not your thing...