Casual Inference Data analysis and other apocrypha

Don't stop as soon as you hit stat sig! How to safely stop an experiment early with alpha spending in Python

Where it begins: The (understandable) urge to stop early

Let me tell you a story - perhaps a familiar one. Product Manager: Hey {data_analyst}, I looked at your dashboard! We only kicked off {AB_test_name} a few days ago, but the results look amazing! It looks like the result is already statistically significant, even though we were going t... Read more

Building your own sklearn transformer is easy and very useful

Scikit-learn pipelines let you snap together transformations like Legos to make a Machine Learning model. The transformers included in the box with Sklearn are handy for anyone doing ML in Python, and practicing data scientists use them all the time. Even better, it’s very easy to build your own transformer, and doing so unlocks a zillion opport... Read more
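As a rough illustration of how little code a home-built transformer needs, here is a minimal sketch. It is not necessarily the approach the full post takes: the `LogTransformer` class, its `offset` parameter, and the toy data are all hypothetical, made up for this example. The only real requirement is a `fit` method that returns `self` and a `transform` method; inheriting from `BaseEstimator` and `TransformerMixin` supplies `get_params`, `set_params`, and `fit_transform`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline


class LogTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical example transformer: applies log(x + offset) to every column."""

    def __init__(self, offset=1.0):
        # Store constructor arguments unchanged so get_params/set_params work
        self.offset = offset

    def fit(self, X, y=None):
        # Nothing is learned from the data; fit only needs to return self
        return self

    def transform(self, X):
        return np.log(np.asarray(X, dtype=float) + self.offset)


# Toy data (an assumption for illustration): y is exactly linear in log(x + 1)
X = np.array([[0.0], [1.0], [3.0], [7.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])

# The custom transformer snaps into a pipeline like any built-in one
pipeline = make_pipeline(LogTransformer(offset=1.0), LinearRegression())
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Because the transformer follows the usual estimator conventions, it can also be cloned and tuned (for example, searching over `offset`) alongside the rest of the pipeline.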

Elasticity and log-log models for practicing data scientists

Models of elasticity and log-log relationships seem to show up over and over in my work. Since I have only a fuzzy, gin-soaked memory of Econ 101, I always have to remind myself of the properties of these models. The commonly used $y = \alpha x^\beta$ version of this model ends up being pretty easy to interpret, and has wide applicability acro... Read more
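One reason the $y = \alpha x^\beta$ form is easy to work with: taking logs of both sides gives $\log y = \log \alpha + \beta \log x$, a straight line whose slope $\beta$ is the elasticity. The sketch below shows that idea on simulated data. The "true" parameter values, the noise level, and the use of `np.polyfit` are all assumptions made for illustration, not taken from the full post.

```python
import numpy as np

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
alpha_true, beta_true = 2.0, -1.5
x = rng.uniform(1.0, 10.0, size=500)
noise = rng.normal(0.0, 0.05, size=500)
y = alpha_true * x ** beta_true * np.exp(noise)  # multiplicative noise

# Fit a straight line in log-log space: log(y) = log(alpha) + beta * log(x)
# polyfit returns coefficients from the highest degree down: slope, then intercept
beta_hat, log_alpha_hat = np.polyfit(np.log(x), np.log(y), deg=1)
alpha_hat = np.exp(log_alpha_hat)

print(f"estimated elasticity (beta): {beta_hat:.3f}")
print(f"estimated alpha: {alpha_hat:.3f}")
```

With these settings, a 1% increase in `x` corresponds to roughly a `beta_hat`% change in `y`, which is the usual elasticity reading of the slope.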

Is my regression model good enough to make decisions? Evaluating actual vs predicted plots and relative error of regression models

We use predictive models as our advisors, helping us make better decisions using their output. A reasonable question, then, is “is my model accurate enough to be useful”? An already-present part of the process for most ML practitioners is Cross Validation, that beloved Swiss Army Knife of model validation. Anyone doing their due diligence when t... Read more

Flexible prediction intervals: Quantile Regression in Python

Most useful forecasts include a range of likely outcomes

It’s generally good to try and guess what the future will look like, so we can plan accordingly. How much will our new inventory cost? How many users will show up tomorrow? How much raw material will I need to buy? The first instinct we have is usually to look at historical averages; we kno... Read more