Simple methods for sampling
We often know a distribution analytically but can't sample from it directly. Being able to draw samples lets us study many of its properties numerically. Most commonly, we are interested in its integrals or moments, which may be hard to obtain analytically. This kind of problem shows up all the time in Bayesian inference, where we might derive the posterior distribution for our favorite parameters only to find that it has an unpleasant analytical form.
MCMC provides a general (but costly) solution. For simpler, especially lower-dimensional distributions, there are less complex methods we can use. They are worth having at your disposal when MCMC is overkill, and they can be quick solutions to simple sampling problems.
Samples are also useful for running simulations and making predictions.
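To make the point about integrals and moments concrete, here is a minimal sketch (not part of the worked examples below): once we have samples from a distribution, expectations and probabilities become simple averages over them. The samples here come straight from scipy's standard normal, but the same averages work for samples produced by any of the methods in this post.

import numpy as np
from scipy.stats import norm

samples = norm(0, 1).rvs(10_000)  # stand-in for samples produced by any of the methods below
print(np.mean(samples**2))        # Monte Carlo estimate of E[X^2] = 1
print(np.mean(samples > 1))       # Monte Carlo estimate of P(X > 1) ≈ 0.159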
As a running example, take the unnormalized density $f(x) = x(1-x)^3$ on $[0, 1]$ (the kernel of a Beta(2, 4) distribution). The first method we'll use to sample from it is inverse transform sampling: https://en.wikipedia.org/wiki/Inverse_transform_sampling
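The idea is a standard one-liner: if $F$ is the CDF of the target distribution and $U \sim \mathrm{Uniform}(0, 1)$, then $X = F^{-1}(U)$ has CDF $F$, since

$$P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x).$$

So if we can compute the CDF and invert it, even numerically, uniform draws become draws from the target. First, the normalizing constant of the example density, computed symbolically with sympy: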
from sympy import symbols, integrate, latex
x, a, b = symbols('x a b')
f = (x**(a-1))*((1-x)**(b-1))  # unnormalized Beta(a, b) density
# meijerg=True helps sympy do the integral with symbolic a and b;
# it isn't needed if x is the only symbol
F = integrate(f, (x, 0, 1), meijerg=True)
const = F.evalf(subs={a: 2, b: 4})  # normalizing constant for our example, a=2 and b=4
print(latex(F))
print(const)
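As a check on the symbolic result (hedging on the exact form sympy prints), this integral is the Beta function:

$$\int_0^1 x^{a-1}(1-x)^{b-1}\,dx = B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)},$$

so for $a = 2$, $b = 4$ the constant is $B(2, 4) = \frac{1!\,3!}{5!} = \frac{1}{20} = 0.05$, and the normalized density is $20\,x(1-x)^3$. The code below builds the CDF numerically from a spline fit to the density and then inverts it.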
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from scipy.interpolate import InterpolatedUnivariateSpline
from scipy.stats import uniform
# https://stackoverflow.com/questions/47275957/invert-interpolation-to-give-the-variable-associated-with-a-desired-interpolatio

# unnormalized target density
def f(x):
    return x * (1 - x)**3

# evaluate the density on a grid and fit a spline through the points
n_points = 1000
grid_points = np.linspace(0, 1, n_points)
density = f(grid_points)
approx_f = InterpolatedUnivariateSpline(grid_points, density)
plt.scatter(grid_points, density)
plt.plot(grid_points, approx_f(grid_points), color='orange')
plt.show()

# approximate CDF: integrate the spline and normalize so that F(1) = 1
approx_cdf = np.vectorize(lambda x: approx_f.integral(0, x) / approx_f.integral(0, 1))
plt.plot(grid_points, approx_cdf(grid_points))
plt.show()

# invert the CDF by swapping the roles of x and F(x) in the interpolation
inv_approx_cdf = InterpolatedUnivariateSpline(approx_cdf(grid_points), grid_points)
plt.plot(grid_points, inv_approx_cdf(grid_points))
plt.show()

# inverse transform sampling: push uniform draws through the inverse CDF
x_uniform = uniform(0, 1).rvs(1000)
samples = inv_approx_cdf(x_uniform)
sns.histplot(samples, stat='density', kde=True)  # sns.distplot is deprecated
plt.plot(grid_points, density / approx_f.integral(0, 1))  # normalized true density
plt.show()
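As a quick tie-back to the motivation at the top (a sketch reusing the objects defined in the block above): the moments of these samples should match the analytic moments of the Beta(2, 4) distribution, whose mean is $2/(2+4) = 1/3$ and whose variance is $\frac{2 \cdot 4}{(2+4)^2 (2+4+1)} \approx 0.032$.

print(samples.mean())  # should be close to 1/3 ≈ 0.333
print(samples.var())   # should be close to 8/252 ≈ 0.032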
import numpy as np
from scipy.stats import norm
from matplotlib import pyplot as plt
import seaborn as sns

# grid sampling: evaluate the (possibly unnormalized) density on a grid,
# normalize, and treat the result as a discrete distribution
x_plot = np.linspace(-10, 10, 5000)
# pretend we only know the log-density up to an additive constant (+500),
# as is typical for unnormalized posteriors
log_p = norm(0, 1).logpdf(x_plot) + 500
# normalize in log space with log-sum-exp to avoid overflow/underflow
p_sample = np.exp(log_p - np.logaddexp.reduce(log_p))
sns.histplot(np.random.choice(x_plot, p=p_sample, size=10000), stat='density')  # sns.distplot is deprecated
plt.plot(x_plot, norm(0, 1).pdf(x_plot))  # true density for comparison
plt.show()
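Grid sampling also works in a few dimensions, though the number of grid points grows exponentially with dimension, which is what limits it to low-dimensional problems. A minimal 2D sketch (the correlated Gaussian target here is just an illustration, not from the original):

import numpy as np
from scipy.stats import multivariate_normal

xs = np.linspace(-4, 4, 200)
ys = np.linspace(-4, 4, 200)
xx, yy = np.meshgrid(xs, ys)
points = np.column_stack([xx.ravel(), yy.ravel()])  # 40,000 grid points
# unnormalized log-density on the grid, normalized with log-sum-exp as before
log_p = multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]]).logpdf(points)
p = np.exp(log_p - np.logaddexp.reduce(log_p))
# sample grid indices, then look up the corresponding points
idx = np.random.choice(len(points), p=p, size=5000)
samples_2d = points[idx]
print(np.cov(samples_2d.T))  # should be close to [[1, 0.8], [0.8, 1]]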
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal
from numdifftools import Hessian

def approx_dist(lnprob, v_init=None, v_max=None, hessian=None):
    """
    Construct a Laplace approximation to the distribution with the given log-density.

    Arguments:
        lnprob: The log-density to approximate. Should take a vector and return a real number.
        v_init: The initial value at which to start the search for the mode of lnprob.
            Required if v_max is not given.
        v_max: The mode of lnprob. If it is not provided, it will be found numerically.
        hessian: A function which computes the Hessian of lnprob. If it is not given,
            the Hessian will be approximated numerically.

    Returns:
        The approximate distribution, as a scipy.stats.multivariate_normal object.
    """
    neg_lnprob = lambda v: -lnprob(v)
    if v_max is None and v_init is not None:
        # find the mode by minimizing the negative log-density
        result = minimize(neg_lnprob, v_init)
        x_max = result.x
    elif v_max is not None:
        x_max = v_max
    else:
        raise ValueError('You must provide either an initial value at which to start the search '
                         'for the mode (v_init) or the value of the mode (v_max)')
    if hessian is None:
        hess_calc = Hessian(lnprob)
    else:
        hess_calc = hessian
    # covariance of the approximation is the negative inverse Hessian at the mode
    h = hess_calc(x_max)
    dist = multivariate_normal(x_max, -np.linalg.inv(h))
    return dist
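The covariance is the negative inverse Hessian because the Laplace approximation is just a second-order Taylor expansion of the log-density around its mode $\hat{x}$ (where the gradient vanishes):

$$\log p(x) \approx \log p(\hat{x}) + \tfrac{1}{2}(x - \hat{x})^\top H\,(x - \hat{x}), \qquad H = \nabla^2 \log p(x)\,\big|_{x = \hat{x}},$$

which is, up to a constant, the log-density of a Gaussian with mean $\hat{x}$ and covariance $-H^{-1}$. A quick usage sketch on the running example $f(x) \propto x(1-x)^3$ (not in the original; by hand, the mode is $x = 1/4$ and the second derivative of the log-density there is $-64/3$, so the result should be roughly $\mathcal{N}(0.25,\ 3/64)$):

import numpy as np

def lnprob(v):
    # log of the unnormalized running-example density x * (1 - x)**3, valid on (0, 1)
    return np.log(v[0]) + 3 * np.log(1 - v[0])

# the mode is known in closed form here, so pass it via v_max;
# passing v_init instead would find the mode numerically with scipy's minimize
laplace = approx_dist(lnprob, v_max=np.array([0.25]))
print(laplace.mean)                  # 0.25
print(laplace.cov)                   # should be close to 3/64 ≈ 0.047
laplace_samples = laplace.rvs(1000)  # approximate draws from the target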
For background on why a normal approximation around the mode is reasonable (the asymptotic normality of the MLE and of posteriors), see:
http://gregorygundersen.com/blog/2019/11/28/asymptotic-normality-mle/
https://stephens999.github.io/fiveMinuteStats/asymptotic_normality_mle.html
https://ocw.mit.edu/courses/mathematics/18-443-statistics-for-applications-fall-2006/lecture-notes/lecture3.pdf
Two other general-purpose methods worth knowing about are MCMC and rejection sampling.
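MCMC deserves its own write-up, but rejection sampling fits in a few lines. A minimal sketch on the running example (not in the original; it uses a Uniform(0, 1) proposal and the bound $M \approx 2.11$, the maximum of the normalized density $20\,x(1-x)^3$, attained at $x = 1/4$):

import numpy as np

def density(x):
    return 20 * x * (1 - x)**3  # normalized running-example density

M = 2.11  # upper bound on the density over [0, 1]
proposals = np.random.uniform(0, 1, size=10_000)
u = np.random.uniform(0, 1, size=10_000)
accepted = proposals[u < density(proposals) / M]
print(len(accepted) / 10_000)  # acceptance rate should be close to 1/M ≈ 0.47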
| Method | Dimensionality | Summary |
|---|---|---|
| Inverse transform sampling | 1D only | Invert the CDF and sample from the uniform distribution on [0, 1] |
| Grid sampling | Low | Compute a subset of PDF values, and treat the distribution as discrete |
| Laplace approximation | Any, as long as it’s unimodal | Find the maximum, and fit a normal approximation around it |