When sample sizes are really big (millions of units), it's useful to analyze A/B tests from summary statistics instead of the raw data. At that size, even $O(n)$ operations start to hurt. Luckily, the mean, variance, and count of each group are all you need.
Example: a marketing campaign on a Japanese cat island.
The sample is large, and the cat island has no wifi, so all you have are the summaries of the A/B test results from North Cat Island and South Cat Island.
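If you also control the pipeline that produces the summaries, the reduction can happen once, upstream, in a single cheap pass. A minimal sketch, assuming the raw data sits in a pandas DataFrame with hypothetical columns region, group, and outcome:

import numpy as np
import pandas as pd

# Hypothetical raw data: one row per unit.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'region': rng.choice(['north', 'south'], size=1000),
    'group': rng.choice(['control', 'treated'], size=1000),
    'outcome': rng.normal(1.0, 1.0, size=1000),
})

# One pass over the raw data; everything downstream works on this tiny table.
# Note: pandas' var uses ddof=1 (Bessel-corrected), while np.var defaults to ddof=0.
summary = (
    df.groupby(['region', 'group'])['outcome']
      .agg(mean='mean', var='var', n='count')
      .reset_index()
)
print(summary)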
There are two useful things to know about performing an A/B test from summary statistics: how to get the effect (either the difference or the lift) together with its standard error, and how to combine means and variances from separate groups.
the difference
$\Delta = m_T - m_C$
$SE(\Delta) = \sqrt{SE_T^2 + SE_C^2} = \sqrt{\frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C}}$
where $SE_T$ and $SE_C$ are the standard errors of the group means.
def treatment_effect_with_se(m_t, v_t, n_t, m_c, v_c, n_c):
    # absolute treatment effect: difference of the group means
    effect = m_t - m_c
    # standard error of each group mean
    se_t = np.sqrt(v_t) / np.sqrt(n_t)
    se_c = np.sqrt(v_c) / np.sqrt(n_c)
    # independent groups: standard errors add in quadrature
    se_effect = np.sqrt(se_t**2 + se_c**2)
    return effect, se_effect
the lift
$\delta = \frac{m_T}{m_C} - 1$
$SE(\delta) = \sqrt{\frac{m_T^2}{m_C^2}\left(\frac{SE_T^2}{m_T^2} + \frac{SE_C^2}{m_C^2}\right)}$
(the delta-method standard error of the ratio $m_T / m_C$; subtracting 1 doesn't change it)
def lift_with_se(m_t, v_t, n_t, m_c, v_c, n_c):
    # relative lift: how much bigger the treated mean is than the control mean
    lift = m_t / m_c - 1
    # standard error of each group mean
    se_t = np.sqrt(v_t) / np.sqrt(n_t)
    se_c = np.sqrt(v_c) / np.sqrt(n_c)
    # delta-method standard error of the ratio m_t / m_c
    lift_se = np.sqrt((m_t**2 / m_c**2) * ((se_t**2 / m_t**2) + (se_c**2 / m_c**2)))
    return lift, lift_se
north_te, north_te_se = treatment_effect_with_se(north_treated_mean,
north_treated_var,
north_treated_n,
north_control_mean,
north_control_var,
north_control_n)
print('North treatment effect was: ', north_te, ' +- ', 1.96*north_te_se)
combining means and combining variances
$m_{combined} = \frac{n_1 m_1 + n_2 m_2}{n_1 + n_2}$
$\sigma_{combined}^2 = \frac{n_1 \sigma_1^2 + n_2 \sigma_2^2}{n_1 + n_2} + \frac{n_1 (m_1 - m_{combined})^2 + n_2 (m_2 - m_{combined})^2}{n_1 + n_2}$
The second term accounts for the spread of the group means around the combined mean; dropping it is only safe when $m_1 \approx m_2$.
def combine(m1, v1, n1, m2, v2, n2):
    # pooled sample size
    n_new = n1 + n2
    # weighted average of the group means
    m_new = (m1*n1 + m2*n2) / n_new
    # pooled (population, ddof=0) variance: within-group variance plus
    # the spread of the group means around the combined mean
    var_new = (v1*n1 + v2*n2) / n_new \
        + (n1*(m1 - m_new)**2 + n2*(m2 - m_new)**2) / n_new
    return m_new, var_new, n_new
pooled_treated_mean, pooled_treated_var, pooled_treated_n \
= combine(north_treated_mean, north_treated_var, north_treated_n,
south_treated_mean, south_treated_var, south_treated_n)
pooled_control_mean, pooled_control_var, pooled_control_n \
= combine(north_control_mean, north_control_var, north_control_n,
south_control_mean, south_control_var, south_control_n)
pooled_lift, pooled_lift_se = lift_with_se(pooled_treated_mean,
pooled_treated_var,
pooled_treated_n,
pooled_control_mean,
pooled_control_var,
pooled_control_n)
print('Pooled lift was: ', pooled_lift, ' +- ', 1.96*pooled_lift_se)
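The same two numbers give a significance test too, if you want one. A minimal sketch, assuming scipy is available, for a two-sided z-test of zero lift using the pooled numbers above:

from scipy import stats

# z-statistic and two-sided p-value for H0: lift = 0
z = pooled_lift / pooled_lift_se
p_value = 2 * stats.norm.sf(abs(z))
print('z = ', z, ', p = ', p_value)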
Note on Bessel's correction when combining variances (the np.var used in the setup code below is the population variance, ddof=0): https://math.stackexchange.com/questions/2971315/how-do-i-combine-standard-deviations-of-two-groups
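If the summaries you receive were computed with Bessel's correction (ddof=1, e.g. pandas' default var), a minimal sketch with hypothetical helper names for converting before and after combining:

def sample_var_to_population_var(v_sample, n):
    # undo Bessel's correction: multiply by (n - 1) / n
    return v_sample * (n - 1) / n

def population_var_to_sample_var(v_pop, n):
    # reapply Bessel's correction: multiply by n / (n - 1)
    return v_pop * n / (n - 1)

With millions of units the correction is negligible anyway.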
Extra note, from a more Bayesian / algebraic angle (Gaussian summary statistics form a monoid under this kind of combination): https://izbicki.me/blog/gausian-distributions-are-monoids.html
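One practical consequence, shown as a quick check rather than a proof: combine is associative, so summaries can be merged pairwise in any order. The (mean, variance, count) triples below are made up just for the check.

import numpy as np

# three hypothetical group summaries: (mean, variance, count)
a, b, c = (1.0, 2.0, 100), (1.5, 1.0, 200), (0.5, 3.0, 50)

left = combine(*combine(*a, *b), *c)
right = combine(*a, *combine(*b, *c))
assert np.allclose(left, right)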
# Simulated raw data for the cat-island example; run this before the analysis code above.
import numpy as np

n = 10000000       # units per cell
v = 1              # std dev of the underlying normal (abs is taken below to keep outcomes positive)
north_base = 1.1   # baseline outcome on North Cat Island
south_base = 1     # baseline outcome on South Cat Island
north_lift = 1.2   # true multiplier on the treated mean in the north
south_lift = 1.25  # true multiplier on the treated mean in the south

north_treated = np.abs(np.random.normal(north_base*north_lift, v, size=n))
north_control = np.abs(np.random.normal(north_base, v, size=n))
south_treated = np.abs(np.random.normal(south_base*south_lift, v, size=n))
south_control = np.abs(np.random.normal(south_base, v, size=n))

# Reduce each cell to the three summaries used throughout: mean, variance, count.
north_treated_mean, north_treated_var, north_treated_n \
    = np.mean(north_treated), np.var(north_treated), len(north_treated)
north_control_mean, north_control_var, north_control_n \
    = np.mean(north_control), np.var(north_control), len(north_control)
south_treated_mean, south_treated_var, south_treated_n \
    = np.mean(south_treated), np.var(south_treated), len(south_treated)
south_control_mean, south_control_var, south_control_n \
    = np.mean(south_control), np.var(south_control), len(south_control)
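As a sanity check that the summary-only path matches the raw data (assuming the setup above and the functions defined earlier have all been run):

# Pool the raw arrays directly and recompute the same summaries.
raw_treated = np.concatenate([north_treated, south_treated])
raw_control = np.concatenate([north_control, south_control])

raw_lift, raw_lift_se = lift_with_se(np.mean(raw_treated), np.var(raw_treated), len(raw_treated),
                                     np.mean(raw_control), np.var(raw_control), len(raw_control))

# Should agree with the summary-based result up to floating point error.
print('Raw-data lift was:      ', raw_lift, ' +- ', 1.96*raw_lift_se)
print('Summary-based lift was: ', pooled_lift, ' +- ', 1.96*pooled_lift_se)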