
# Note 17 of Deep Learning: Monte Carlo Methods

Las Vegas algorithms and Monte Carlo algorithms are two rough categories of randomized algorithms. Las Vegas algorithms always return a precisely correct answer (or report failure), while consuming a random amount of resources; Monte Carlo algorithms return approximate answers with a random amount of error, which can be reduced by expending more resources.

## Monte Carlo Sampling

When a sum or integral cannot be computed exactly (e.g. it has an exponential number of terms), we can approximate it using *Monte Carlo sampling*, which views the sum or integral as an expectation under some distribution and approximates the *expectation by a corresponding average*.

Let

$$s = \sum_x p(x) f(x) = E_p[f(\mathbf{x})]$$

(or, for an integral, $s = \int p(x) f(x)\,dx = E_p[f(\mathbf{x})]$) be the quantity to estimate, viewed as an expectation of $f(\mathbf{x})$ under the distribution *p*. Then *s* can be approximated by drawing *n* i.i.d. samples $x^{(1)},\ldots,x^{(n)}$ from *p* and computing the empirical average

$$\hat{s}_n = \frac{1}{n}\sum_{i=1}^{n} f(x^{(i)}).$$

We also have

$$E[\hat{s}_n] = \frac{1}{n}\sum_{i=1}^{n} E[f(x^{(i)})] = s, \qquad Var[\hat{s}_n] = \frac{1}{n^2}\sum_{i=1}^{n} Var[f(x^{(i)})] = \frac{Var[f(x)]}{n},$$

so $\hat{s}_n$ is an unbiased estimator whose variance shrinks as *n* grows, and the empirical variance of the values $f(x^{(i)})$ gives us a way of estimating $Var[\hat{s}_n]$.
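As a quick illustration (a minimal sketch, not from the original note), the following Python snippet uses a made-up choice of $p$ (the standard normal) and $f(x) = x^2$, so the true value is $s = E_p[x^2] = 1$, and estimates both $\hat{s}_n$ and $Var[\hat{s}_n]$ from the samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: p is the standard normal and f(x) = x^2,
# so the true value is s = E_p[x^2] = 1.
def f(x):
    return x ** 2

n = 100_000
x = rng.standard_normal(n)        # x^(1), ..., x^(n) ~ p
fx = f(x)

s_hat = fx.mean()                 # empirical average \hat{s}_n
var_hat = fx.var(ddof=1) / n      # estimate of Var[\hat{s}_n] = Var[f(x)] / n
std_err = np.sqrt(var_hat)

print(f"estimate = {s_hat:.4f} +/- {std_err:.4f} (true value 1.0)")
```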

When $\mathbf{x}$ cannot be sampled from *p* tractably, an alternative is to use *importance sampling*, or, more generally, to form a sequence of estimators that converge towards the distribution of interest, which is the approach of **Markov chain Monte Carlo (MCMC)** methods.

## Importance Sampling

An important step is deciding which part of the integrand should play the role of the probability $p(x)$ and which part should play the role of the quantity $f(x)$ whose expectation (under $p$) is to be estimated. However, this decomposition is not unique, since it can always be rewritten as

$$p(x)\,f(x) = q(x)\,\frac{p(x)\,f(x)}{q(x)},$$

where we now sample from $q$ and average $\frac{pf}{q}$.
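As a minimal sketch of this identity (the target, proposal, and $f$ below are assumptions for illustration, not from the note), the snippet below estimates the same $s = E_p[x^2] = 1$ as before, but draws samples from a wider proposal $q$ and reweights each term by $\frac{p(x)}{q(x)}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def f(x):
    return x ** 2

n = 100_000
# Target p: standard normal. Proposal q: normal with standard deviation 2.
x = rng.normal(loc=0.0, scale=2.0, size=n)                     # x^(i) ~ q
weights = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)    # p(x) / q(x)

s_hat = np.mean(weights * f(x))    # average of p(x) f(x) / q(x) under q
print(f"importance-sampling estimate = {s_hat:.4f} (true value 1.0)")
```

The point here is only that the reweighted average still estimates $s$; whether the variance goes up or down depends on how well $q$ is chosen.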

In many cases, the problem to be solved specifies a given $p$ and $f$, but this default decomposition may not be the *optimal* choice in terms of the number of samples required to reach a given level of accuracy. The optimal sampling distribution $q^*$ can be derived easily; sampling from this $q^*$ is called **optimal importance sampling**.
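A brief sketch of that derivation (standard material, stated here for completeness rather than taken from the note): writing the importance-sampling estimator and its variance as

$$\hat{s}_q = \frac{1}{n}\sum_{i=1}^{n}\frac{p(x^{(i)})\,f(x^{(i)})}{q(x^{(i)})}, \quad x^{(i)} \sim q, \qquad Var[\hat{s}_q] = \frac{1}{n}\,Var_q\!\left[\frac{p(\mathbf{x})\,f(\mathbf{x})}{q(\mathbf{x})}\right],$$

the variance is minimized by taking $q$ proportional to $p\,|f|$:

$$q^*(x) = \frac{p(x)\,\lvert f(x)\rvert}{Z},$$

where $Z$ is the normalization constant that makes $q^*$ sum or integrate to 1. When $f(x)$ does not change sign, $\frac{p(x) f(x)}{q^*(x)}$ is constant, so the optimal estimator has zero variance and a single sample suffices.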