A step-by-step guide to propagating error using the Delta Method
The whole point of statistical analysis is to quantify uncertainty. The results of even a very complex analysis can often be summed up by a statement like:
Under certain assumptions which seem plausible, the data indicate that a reasonable estimate for the parameter $\theta$ is $\hat{\theta}$, with a standard error of $\hat{SE}(\hat{\theta})$.
We have various methods of extracting an estimate plus a standard error from the data: we know the standard error of the mean, we can use OLS, etc. We can then propagate error using error propagation rules (link). But what if your case is not covered by those rules? We can imagine some examples: the log of an estimated mean, a power function of an estimate, a ratio of two means, or revenue computed from several estimated inputs.
These are hard because we know the SE of the inputs, but not of the outputs. If your transformation has a nice form (and a first derivative), the delta method lets you propagate the error from the quantities you know to the ones you don't.
Each of the scenarios outlined has a set of statistics with known standard errors (we'll call them $\theta$) as well as a set of statistics with unknown standard errors (we'll call them $f(\theta)$), where $f$ is the function mapping one set to the other. For example, if $\hat{\mu}$ is an estimated mean and we want the standard error of $\log(\hat{\mu})$, then $f = \log$.
Let’s start with the univariate case. If $f(\theta)$ is our output, then
$\hat{SE}(f(\hat{\theta})) \approx \mid f^\prime (\hat{\theta}) \mid \times \hat{SE}(\hat{\theta}) $
And so an approximate asymptotic confidence interval for $f(\hat{\theta})$ is
$f(\hat{\theta}) \pm z_{\alpha/2} \times \mid f^\prime (\hat{\theta}) \mid \times \hat{SE}(\hat{\theta})$
Under this approximation, note that the delta method tells us how much the function "inflates" the SE of $\hat{\theta}$: the inflation factor is $\mid f^\prime (\hat{\theta}) \mid$.
Keep in mind that this is an asymptotic result. You can compare with the bootstrap to verify.
Reference: All of Statistics, Section 5.5, Theorem 5.13.
See the appendix for worked examples that we “already know”
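To make the univariate formula concrete, here is a minimal sketch in Python, assuming $f = \log$ and some illustrative exponential data (the data-generating choices and variable names are assumptions, not part of the method):

```python
import numpy as np

# Minimal sketch of the univariate delta method, with f = log.
# The data here are illustrative assumptions.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)

theta_hat = x.mean()                          # estimate of theta
se_theta = x.std(ddof=1) / np.sqrt(len(x))    # SE of the sample mean

f = np.log
f_prime = lambda t: 1.0 / t                   # f'(theta) for f = log

# Delta method: SE(f(theta_hat)) ~= |f'(theta_hat)| * SE(theta_hat)
se_f = abs(f_prime(theta_hat)) * se_theta

# Approximate asymptotic 95% confidence interval for f(theta)
z = 1.96
ci = (f(theta_hat) - z * se_f, f(theta_hat) + z * se_f)
print(se_f, ci)
```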
Example: a power function. Suppose the output is $y = \alpha \theta^\beta$ and you estimate that the input is $\hat{\theta}$; how does the error propagate to the output?
$\frac{d}{dx} \alpha x^\beta = \beta \alpha x^{\beta - 1}$
So the $SE$ is
$\hat{SE}(y) \approx \mid \beta \alpha \hat{\theta}^{\beta - 1} \mid \times \hat{SE}(\hat{\theta})$
The general recipe:
1. Derive $f^\prime(\theta)$
2. Calculate the SE
3. Compare with the bootstrap to verify (see the sketch below)
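Here is that recipe applied to the power-function example, with a bootstrap check at the end; $\alpha$, $\beta$, and the gamma-distributed data are illustrative assumptions:

```python
import numpy as np

# Sketch of the power-function example, y = alpha * theta^beta,
# followed by a bootstrap comparison. All inputs are illustrative.
rng = np.random.default_rng(1)
x = rng.gamma(shape=4.0, scale=1.5, size=2000)
alpha, beta = 2.0, 0.7

theta_hat = x.mean()
se_theta = x.std(ddof=1) / np.sqrt(len(x))

# Steps 1-2: f'(theta) = beta * alpha * theta^(beta - 1), then the SE
se_delta = abs(beta * alpha * theta_hat ** (beta - 1)) * se_theta

# Step 3: bootstrap the same statistic to check the approximation
boot = np.array([
    alpha * rng.choice(x, size=len(x), replace=True).mean() ** beta
    for _ in range(2000)
])
se_boot = boot.std(ddof=1)
print(se_delta, se_boot)  # the two estimates should be close
```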
Now the multivariate case. Note that in this section $f: \mathbb{R}^k \rightarrow \mathbb{R}$.
$\hat{SE}[f(\hat{\theta})] \approx \sqrt{\nabla f (\hat{\theta})^T Cov(\hat{\theta}) \nabla f (\hat{\theta})}$
Dimensions: $(1 \times 1) = (1 \times k)(k \times k)(k \times 1)$
Recall that $\nabla f (\hat{\theta})$ is the $k \times 1$ column vector of partial derivatives.
When the elements of $\hat{\theta}$ are uncorrelated, then $Cov(\hat{\theta}) = diag(\sigma_1^2, \ldots, \sigma_k^2)$, i.e. just a diagonal matrix with the variances on the main diagonal,
and
$\hat{SE}[f(\hat{\theta})] = \sqrt{\sum_i \left( \frac{\partial f}{\partial \theta_i}(\hat{\theta}) \right)^2 \times \sigma_i^2}$ [Uncorrelated case]
Uncorrelated example: revenue from multiple inputs added together.
Challenge: correlated RVs also work, but you need to know (or estimate) the correlations; see the sketch below.
Another example: a sum of isoelastic functions, $f(\theta) = \sum_i \alpha_i \theta_i^{\beta_i}$.
Open question: what normality assumptions does this require?
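Here is a sketch of the multivariate formula in Python. It uses a ratio of two correlated means as the scalar $f$ and a finite-difference gradient; the data, the choice of $f$, and the gradient helper are all illustrative assumptions:

```python
import numpy as np

# Sketch of the multivariate delta method for a scalar f, here the
# ratio of two (correlated) means. All inputs are illustrative.
rng = np.random.default_rng(2)
n = 1000
X = rng.multivariate_normal(mean=[5.0, 2.0],
                            cov=[[1.0, 0.3], [0.3, 0.5]], size=n)

theta_hat = X.mean(axis=0)                 # vector of sample means
cov_theta = np.cov(X, rowvar=False) / n    # Cov(theta_hat) = Sigma / n

def f(t):
    return t[0] / t[1]                     # scalar function of the means

def grad(func, t, eps=1e-6):
    # Central-difference approximation of the k x 1 gradient
    g = np.zeros_like(t)
    for i in range(len(t)):
        step = np.zeros_like(t)
        step[i] = eps
        g[i] = (func(t + step) - func(t - step)) / (2 * eps)
    return g

g = grad(f, theta_hat)
se = np.sqrt(g @ cov_theta @ g)            # sqrt(grad^T Cov grad)
print(f(theta_hat), se)
```

Because the full covariance matrix enters the quadratic form, the correlated case costs nothing extra once you can estimate $Cov(\hat{\theta})$.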
The delta method solves nonlinear combinations of variance by linearizing them with the Taylor series:
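$f(\hat{\theta}) \approx f(\theta) + f^\prime (\theta) (\hat{\theta} - \theta)$
Taking the variance of both sides (both $f(\theta)$ and $f^\prime(\theta)$ are constants, and adding a constant doesn't change variance) gives
$Var[f(\hat{\theta})] \approx f^\prime (\theta)^2 \times Var(\hat{\theta})$
which is the square of the SE formula above.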
Alternative ways to get the same standard error: the jackknife/bootstrap; transforming the data and running OLS; inverting the Fisher information/Hessian.
The delta method relies on more assumptions but is much faster than the bootstrap. And a closed form is easiest to work with, for both speed and intuition.
Worked example (difference of two independent means): $\hat{\Delta} = \hat{\mu}_2 - \hat{\mu}_1$
$\hat{SE}(\hat{\Delta}) = \sqrt{1^2 \times \hat{SE}(\hat{\mu}_2)^2 + (-1)^2 \times \hat{SE}(\hat{\mu}_1)^2} = \sqrt{\hat{SE}(\hat{\mu}_2)^2 + \hat{SE}(\hat{\mu}_1)^2}$
Worked example (log of a mean): $\hat{l} = \log(\hat{\mu})$; $\frac{d \hat{l}}{d \hat{\mu}} = \frac{1}{\hat{\mu}}$, so $\hat{SE}(\hat{l}) \approx \hat{SE}(\hat{\mu}) / \mid \hat{\mu} \mid$
Worked example (adding a constant): $\hat{\delta} = \alpha + \hat{\mu}$
$\hat{SE}(\hat{\delta}) = \hat{SE}(\hat{\mu})$
Worked example (scaling by a constant): $\hat{\delta} = \alpha \times \hat{\mu}$
$\hat{SE}(\hat{\delta}) = \mid \alpha \mid \times \hat{SE}(\hat{\mu})$
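A quick numerical check of these identities against the bootstrap (the normal data, $\alpha$, and the resample count are illustrative assumptions):

```python
import numpy as np

# Check the worked examples against the bootstrap. Inputs are illustrative.
rng = np.random.default_rng(3)
x = rng.normal(loc=10.0, scale=3.0, size=500)
alpha = 2.5

mu_hat = x.mean()
se_mu = x.std(ddof=1) / np.sqrt(len(x))

boot_mu = np.array([rng.choice(x, size=len(x), replace=True).mean()
                    for _ in range(2000)])

# Shift: SE(alpha + mu_hat) = SE(mu_hat)
print(se_mu, (alpha + boot_mu).std(ddof=1))
# Scale: SE(alpha * mu_hat) = |alpha| * SE(mu_hat)
print(abs(alpha) * se_mu, (alpha * boot_mu).std(ddof=1))
# Log:  SE(log(mu_hat)) ~= SE(mu_hat) / |mu_hat|
print(se_mu / abs(mu_hat), np.log(boot_mu).std(ddof=1))
```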
By the multivariate CLT, the vector of sample means is approximately $MVNormal(\mu, Cov)$, where
$\mu = (\mathbb{E}[X_1], \ldots, \mathbb{E}[X_k])$
$Cov = \frac{1}{n} \Sigma$, where $\Sigma$ is the covariance matrix of $X_1, \ldots, X_k$
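A short sketch of estimating these pieces from an $n \times k$ data matrix (the data here are an illustrative assumption):

```python
import numpy as np

# Estimate the sampling distribution of the mean vector from data.
# X is an (n, k) matrix; each row is one observation of (X_1, ..., X_k).
rng = np.random.default_rng(4)
n, k = 500, 3
X = rng.normal(size=(n, k))

mu_hat = X.mean(axis=0)                # estimate of (E[X_1], ..., E[X_k])
Sigma_hat = np.cov(X, rowvar=False)    # estimate of Sigma
cov_mu_hat = Sigma_hat / n             # Cov(mu_hat) = Sigma / n
print(mu_hat, cov_mu_hat)
```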
The problem: I need the standard error of something and it's not a "standard" statistic, so I need the variance of the sampling distribution of that statistic.
Plan
✅ Understand the theory - what is the relevant theorem? where does it originate?
From Wikipedia: https://en.wikipedia.org/wiki/Variance#Arbitrary_functions
If $X$ is a random variable, then:
$Var[f(X)] \approx (f^\prime (\mathbb{E}[X]))^2 Var[X]$
(use this to derive a confidence interval and a hypothesis test by applying it to the sampling distribution)
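To spell out the parenthetical: applying the formula to the sampling distribution of $\hat{\theta}$ gives a Wald-style test of $H_0: f(\theta) = c$,
$z = \frac{f(\hat{\theta}) - c}{\mid f^\prime (\hat{\theta}) \mid \times \hat{SE}(\hat{\theta})}$
compared against a standard normal; the confidence interval above is just the inversion of this test.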
multivariate version - https://en.wikipedia.org/wiki/Delta_method#Multivariate_delta_method
Double-check, but this should work when $X$ is multivariate and $f$ collapses it to a scalar, like a ratio.
$Var[f(X)] \approx \nabla f (\mathbb{E}[X])^T Cov(X) \nabla f (\mathbb{E}[X])$
$(1 \times k) (k \times k) (k \times 1)$
Since it's a fairly elementary application of the Taylor series, it was known sometime in the early 20th century.