Casual Inference Data analysis and other apocrypha

Would collecting more data improve my model's predictions? The learning curve and the value of incremental samples

Since we usually need to pay for data (either with money to buy it or effort to collect it), it’s worth knowing the value of getting more data points to fit your predictive model. We’ll explore the learning curve, a model-agnostic way of understanding how performance changes as we add more data points to our sample. Analysis of the learning curv... Read more

Understanding the difference between prediction and confidence intervals for linear models in Python

The difference between prediction and confidence intervals is often confusing to newcomers, as the distinction between them is often described in statistics jargon that’s hard to follow intuitively. This is unfortunate, because they are useful concepts, and worth exploring for practitioners, even those who don’t much care for statistics jargon. ... Read more

When do we log transform the response variable? Model assumptions, multiplicative combinations and log-linear models

Sometimes, analysts will perform a log transformation of the outcome variable to “make the residuals look normal”. In some cases this is just papering over other issues, but sometimes this kind of transformation genuinely improves the inference or produces a better fitting model. In what cases does this happen? Why does the log transformation wo... Read more

The False-Discovery Rate: An alternative to the FWER for multiple comparisons

We’ve previously explored one common method for dealing with testing multiple simultaneous hypotheses, methods that control the Family-wise error rate. However, we realized that the FWER can be quite conservative. The False-Discovery rate is a powerful alternative to the FWER, which is often used in cases where hundreds or thousands of simultane... Read more

Getting started with multiple hypotheses - Methods that control the FWER and their costs

Multiple hypothesis testing shows up frequently in practice, but needs to be handled thoughtfully. Common methods like the Bonferroni correction are useful quick fixes, but they have important drawbacks that users should understand. *What do methods like Bonferroni and Bonferroni-Holm do, and why do they work? What alternatives are there? Banne... Read more