We use predictive models as advisors, relying on their output to make better decisions. A reasonable question, then, is “is my model accurate enough to be useful?” Cross validation, that beloved Swiss Army knife of model validation, is already part of the process for most ML practitioners. Anyone doing their due diligence when t... 24 Sep 2023 - 8 minute read
Most useful forecasts include a range of likely outcomes
It’s generally good to try to guess what the future will look like, so we can plan accordingly. How much will our new inventory cost? How many users will show up tomorrow? How much raw material will I need to buy? Our first instinct is usually to look at historical averages; we kno... 28 Apr 2023 - 9 minute read
We’re familiar with A/B tests that tell us how our metric (usually an average of some kind) changed due to the treatment. But if we want better-than-average insight into the treatment effect, we should look beyond the mean. This post demonstrates why and how we might look at the way the quantiles of the distribution changed as a result ... 11 Jun 2022 - 14 minute read
My job seems to involve just enough calculus that I can’t afford to forget it, but little enough that I always feel rusty when I need to do it. In those cases, I’m thankful to be able to check my work and make it reproducible with SymPy, a symbolic mathematics library in Python. Here are two examples of recent places I’ve used SymPy to do calcul... 04 Aug 2021 - 7 minute read
Plenty of problems confronted by practicing data scientists have a time series component. Luckily, building time series models for forecasting and description is easy in statsmodels. We’ll walk through a forecasting problem in Python using an autoregressive model with covariates (AR-X).
Time series data is everywhere
For practicing data ... 21 Jul 2021 - 10 minute read