Casual Inference Data analysis and other apocrypha

How not to run out of gin: Robust decision-making under uncertainty using simulations

Motivating example: spaghetti plots

How would you plan for a complex situation like an unfolding hurricane? It’s hard because there are many unknowns. We’re not just worried about one or two facts (where will it make landfall?); we’re worried about many facts and how they evolve over time: trajectories of facts.

Spaghetti plots visualize many possible trajectories at once, which helps us make decisions.

What decision (or decision rule) will be the most effective, given what we know?

In an ideal world, we’d know the exact trajectory under each decision rule. Then we could use these counterfactuals to compute a score for each choice.
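If we really did know the counterfactual trajectory under each decision rule, choosing would just be a lookup: score each trajectory and take the best one. Below is a minimal sketch of that idealized case. The rule names, the trajectories (daily losses), and the scoring function are hypothetical placeholders.

```python
def best_rule(known_trajectories, score_fxn):
    """Pick the decision rule whose (known) trajectory scores highest.

    known_trajectories maps a rule name to the trajectory we would observe
    if we followed that rule. This only works if we are clairvoyant.
    """
    scores = {rule: score_fxn(traj) for rule, traj in known_trajectories.items()}
    return max(scores, key=scores.get), scores


# Hypothetical example: each trajectory is a list of daily losses, and the
# score is the negative total loss (higher is better).
trajectories = {
    "evacuate_early": [5, 5, 0, 0],
    "wait_and_see": [0, 0, 20, 10],
}
rule, scores = best_rule(trajectories, lambda traj: -sum(traj))
print(rule, scores)  # evacuate_early {'evacuate_early': -10, 'wait_and_see': -30}
```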

But we’re not clairvoyant; we only know (or can estimate with ML, or can guess) the probabilities that describe what will happen at each step.
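Because we only know step-by-step probabilities, we can sample many trajectories from them and look at the whole collection at once. That collection is what a spaghetti plot draws. Below is a minimal sketch. It assumes, purely for illustration, that each step moves a one-dimensional position by a normally distributed amount; the drift, noise, and step count are made-up parameters.

```python
import random


def sample_trajectories(n_paths, n_steps, start=0.0, drift=1.0, noise=0.5, seed=0):
    """Sample n_paths trajectories from a step-by-step probability model.

    Each step adds drift plus Gaussian noise to the previous position.
    The collection of paths is what a spaghetti plot visualizes.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        position = start
        path = [position]
        for _ in range(n_steps):
            position += drift + rng.gauss(0.0, noise)
            path.append(position)
        paths.append(path)
    return paths


paths = sample_trajectories(n_paths=100, n_steps=10)
final_positions = sorted(path[-1] for path in paths)
# The spread of final positions shows how per-step uncertainty compounds.
print(final_positions[0], final_positions[-1])
```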

Simulation structure and Python code

A decision-observation sequence is an ordered list of decision steps (which collect user input) and observation steps (which update variable values based on model assumptions), plus an END step, which computes the score given the final state and decisions.

Observe nodes: their probabilities can come from forecasts, predictions, or confidence intervals from causal inference or experiments.
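One simple way to turn a confidence interval into an observe node is to treat it as a normal distribution and back out the standard deviation from the interval’s width. This is a simplifying assumption: it presumes the interval is symmetric and approximately normal. The function name and the example interval are hypothetical.

```python
import random
from statistics import NormalDist


def observe_from_interval(lower, upper, level=0.95):
    """Build a sampler from a symmetric confidence interval.

    Assumes the interval is mean +/- z * sd for a normal distribution,
    so sd = (upper - lower) / (2 * z).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    mean = (lower + upper) / 2
    sd = (upper - lower) / (2 * z)
    return lambda rng: rng.gauss(mean, sd)


# Hypothetical: an experiment estimated a lift between 2% and 6% (95% CI)
sample_lift = observe_from_interval(0.02, 0.06)
rng = random.Random(42)
draws = [sample_lift(rng) for _ in range(10_000)]
```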

Each step is a function that maps (state, decisions) -> (state, decisions). Observe nodes only change the state; decision nodes only change the decisions.
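To make that contract concrete, here is a sketch of one observe step and one decision step, each a function from (state, decisions) to (state, decisions). The field names (`forecast_rain_prob`, `rain`, `take_umbrella`) and the 50% threshold are hypothetical, loosely borrowed from the umbrella example. Each step returns new dicts rather than mutating its inputs.

```python
import random

rng = random.Random(0)


def observe_rain(state, decisions):
    """Observe step: sample whether it rains; only the state changes."""
    new_state = dict(state, rain=rng.random() < state["forecast_rain_prob"])
    return new_state, decisions


def decide_umbrella(state, decisions):
    """Decision step: a simple rule based on the forecast; only decisions change."""
    new_decisions = dict(decisions, take_umbrella=state["forecast_rain_prob"] >= 0.5)
    return state, new_decisions


state, decisions = {"forecast_rain_prob": 0.7}, {}
state, decisions = decide_umbrella(state, decisions)  # decide before we know
state, decisions = observe_rain(state, decisions)     # then the world happens
print(decisions)  # {'take_umbrella': True}
```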

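def simulate(starting_state, starting_decisions, steps, score_fxn):
    """
    Run one pass through a decision-observation sequence.

    steps is an ordered list of functions, each mapping
    (state, decisions) -> (state, decisions). Observe steps update the
    state; decision steps update the decisions. score_fxn plays the role
    of END: it scores the final state and decisions.
    """
    state = starting_state
    decisions = starting_decisions
    for current_step in steps:
        # Each step returns the updated (state, decisions) pair
        state, decisions = current_step(state, decisions)
    score = score_fxn(state, decisions)
    return score, state, decisions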
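Because observe steps are random, a single call to `simulate` is only one draw. Running it many times gives a distribution of scores instead of a single number. Below is a self-contained sketch, so it re-declares `simulate`, using an umbrella setup. The rain probability, the fixed "take it or don't" rule per batch of runs, and the score values are illustrative assumptions, and `run_many` is a hypothetical helper.

```python
import random
from collections import Counter


def simulate(starting_state, starting_decisions, steps, score_fxn):
    """Run the steps in order, then score the final state and decisions."""
    state, decisions = starting_state, starting_decisions
    for current_step in steps:
        state, decisions = current_step(state, decisions)
    return score_fxn(state, decisions), state, decisions


def run_many(n_runs, take_umbrella, rain_prob=0.3, seed=0):
    rng = random.Random(seed)

    def decide(state, decisions):
        return state, dict(decisions, take_umbrella=take_umbrella)

    def observe(state, decisions):
        return dict(state, rain=rng.random() < rain_prob), decisions

    def score(state, decisions):
        # Hypothetical scores: 0 = status quo, -1 = mild inconvenience,
        # -10 = caught in a downpour.
        if state["rain"]:
            return 0 if decisions["take_umbrella"] else -10
        return -1 if decisions["take_umbrella"] else 0

    return [simulate({}, {}, [decide, observe], score)[0] for _ in range(n_runs)]


scores = run_many(10_000, take_umbrella=False)
# Counts of each score; with rain_prob=0.3, about 30% of runs score -10.
print(Counter(scores))
```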

A simple example: umbrella

The gin example

Simulation retrospective: did it match reality?

Compare sampled states to actual outcomes
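One simple retrospective check is to see where each actual outcome fell within the simulated distribution for its period, and how often it landed inside a central interval. If the simulation is well calibrated, actual values should fall inside the 90% interval roughly 90% of the time. Below is a sketch, assuming we kept the simulated samples for each period. The function names are hypothetical, and the integers 1 to 100 stand in for real simulation output.

```python
def percentile_rank(samples, actual):
    """Fraction of simulated samples at or below the actual value."""
    return sum(s <= actual for s in samples) / len(samples)


def interval_coverage(sampled_by_period, actual_by_period, level=0.9):
    """Share of periods where the actual value fell inside the central interval."""
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    hits = 0
    for samples, actual in zip(sampled_by_period, actual_by_period):
        rank = percentile_rank(samples, actual)
        hits += lo_q <= rank <= hi_q
    return hits / len(actual_by_period)


samples = list(range(1, 101))  # stand-in for 100 simulated values from one period
print(percentile_rank(samples, 50))  # 0.5
print(interval_coverage([samples] * 3, [50, 99, 2]))  # only 50 is inside
```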

Appendix: what if we wanted to add branching?

Move from an ordered list to a directed graph

Add a BRANCH step type, which changes which node runs next
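Here is a sketch of what branching could look like. Steps become named nodes in a dictionary, and each node returns the updated state and decisions plus the name of the next node (`None` plays the role of END). The node names and the weather logic are hypothetical. The loop also caps the number of steps, so a cycle in the graph can’t run forever.

```python
import random


def simulate_graph(nodes, start, state, decisions, score_fxn, max_steps=100):
    """Walk a directed graph of steps until a node returns next=None."""
    current = start
    for _ in range(max_steps):
        state, decisions, current = nodes[current](state, decisions)
        if current is None:
            return score_fxn(state, decisions), state, decisions
    raise RuntimeError("max_steps exceeded; does the graph have a cycle?")


rng = random.Random(1)


def check_sky(state, decisions):
    # Observe-and-BRANCH node: the next node depends on what we observe
    state = dict(state, cloudy=rng.random() < 0.5)
    return state, decisions, "grab_umbrella" if state["cloudy"] else "leave"


def grab_umbrella(state, decisions):
    return state, dict(decisions, take_umbrella=True), "leave"


def leave(state, decisions):
    take = decisions.get("take_umbrella", False)
    return state, dict(decisions, take_umbrella=take), None


nodes = {"check_sky": check_sky, "grab_umbrella": grab_umbrella, "leave": leave}
# Score 1 if we carried an umbrella exactly when it was cloudy
score, state, decisions = simulate_graph(
    nodes, "check_sky", {}, {}, lambda s, d: int(d["take_umbrella"] == s["cloudy"])
)
```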

Original draft ——————————————————————————-

The reason we look at data is so we can make better decisions. However, even after consulting the data, there is usually still some uncertainty about what will happen. We want to make choices that will be effective not just in the average situation, but across the whole range of plausible outcomes. In this post, we’ll cover some basic Decision Theory and use it to show how simulations can help us plan for situations where uncertainty is present.

Note from the Editors: The Editorial team here at Casual Inference: Data Analysis and Other Apocrypha™ remind you to be responsible when drinking and calculating expected values. Alcohol may impair your ability to operate a Monte Carlo Simulation or pronounce the word “heteroskedasticity”. The Editorial Board does not endorse any of the views, recipes, or probability calculations present in this article, which are solely those of the author.

Start by describing the situation

Most of the time when we make a decision, the outcome of our decision isn’t guaranteed. Life is complicated, and our knowledge of what will happen is imperfect. I deal with a prototypical example of this every time I step out to catch the train to work: should I bring an umbrella with me? It’s annoying to carry one if it isn’t necessary, but it’s much more annoying to get caught in the rain as I scramble off the J train to try to make my first meeting of the morning (I really need to start leaving earlier). What should I do? An exhaustive analysis proves frustrating: if I bring the umbrella, I might not need it, but if I don’t bring it, I might get caught in the rain. So are both choices bad ones?

It seems unlikely that both choices are equally bad. After all, on most days I make it past the umbrella rack without having a nervous breakdown, so there must be more information for us to use. There are two important things to note about this choice:

Let’s try to organize the situation so we can analyze the outcomes of the choices we have. We’re going to use the framework laid out in Leonard J. Savage’s The Foundations of Statistics. In this framework, we model the decision as:

I’m going to replace the unwieldy “state of the world” with “scenario” in this writeup, since I’ve found that easier to get across to people. Plus, then you can tell your MBA-wielding VP that you did a scenario analysis, which I think you’ll agree sounds very business-y.

It’s often handy to write out the “consequences” part of the model as a scenario $\times$ action matrix:

| Scenario | Take umbrella | Don’t take |
|---|---|---|
| No rain | Mild inconvenience 🫤 | Status quo |
| Rain | Status quo | Caught in downpour 😢 |

Eagle-eyed readers will recognize this as looking a lot like a confusion matrix, which we’ve applied to optimal decision-making before. In that case, we realized that if we know the probability distribution over the rows of the matrix, we can compute the expected value of each action. If we assign each consequence a score to create a score matrix, then we can just calculate

\[\mathbb{E}[Score | Action] = \sum_i Score[Scenario \ i, Action] \times \mathbb{P}(Scenario \ i)\]

And pick the action with the larger expected value. This is already pretty useful! However, sometimes we want more visibility into what kind of outcome we get. Perhaps the outcome space is difficult to summarize with a single number, for example.
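The expected-value calculation above takes only a few lines of code once the score matrix is written down. The numerical scores and the 30% chance of rain below are made-up illustrations; swapping in your own values is the whole point.

```python
# Hypothetical scores for each (scenario, action) cell of the matrix
score = {
    ("no rain", "take umbrella"): -1,  # mild inconvenience
    ("no rain", "don't take"): 0,      # status quo
    ("rain", "take umbrella"): 0,      # status quo
    ("rain", "don't take"): -10,       # caught in downpour
}
p_scenario = {"no rain": 0.7, "rain": 0.3}
actions = ["take umbrella", "don't take"]

# E[Score | Action] = sum_i Score[scenario i, action] * P(scenario i)
expected = {
    a: sum(score[(s, a)] * p for s, p in p_scenario.items()) for a in actions
}
best = max(expected, key=expected.get)
print(expected, best)
```

With these particular numbers, taking the umbrella has expected score -0.7 versus -3.0 for leaving it, so the umbrella wins. A smaller rain probability or a milder downpour penalty could flip that.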

A real life analysis: How much raw material do we need to satisfy demand?

Most problems are more complex than this, and it is often easier to simulate them than to solve them analytically. Really, then, this is a Monte Carlo method.

Some realistic aspects: estimating the inputs required to meet demand; lots of uncertainty; a complicated, multi-dimensional process.
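Here is a sketch of why simulation helps in a problem like this. Suppose total raw material is the sum, over products, of uncertain demand times uncertain material per unit. The distribution of that sum has no convenient closed form, but sampling from it is easy. The product list, the normal distributions, and every parameter below are hypothetical.

```python
import random

# Hypothetical products: (mean demand, demand sd, mean kg per unit, kg-per-unit sd)
PRODUCTS = {
    "widget": (200, 40, 1.5, 0.1),
    "gadget": (120, 30, 2.0, 0.3),
}


def sample_requirement(rng):
    """One Monte Carlo draw of total raw material needed (kg)."""
    total = 0.0
    for mean_d, sd_d, mean_kg, sd_kg in PRODUCTS.values():
        demand = max(0.0, rng.gauss(mean_d, sd_d))  # demand can't be negative
        kg_per_unit = max(0.0, rng.gauss(mean_kg, sd_kg))
        total += demand * kg_per_unit
    return total


rng = random.Random(0)
draws = sorted(sample_requirement(rng) for _ in range(20_000))
median = draws[len(draws) // 2]
p95 = draws[int(0.95 * len(draws))]
print(f"median ~ {median:.0f} kg, 95th percentile ~ {p95:.0f} kg")
```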

Defining the SAC

Running the simulations

Picking one that doesn’t fail

Estimating the level of waste for our chosen decision
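Picking a decision that doesn’t fail and estimating its waste can be framed as a sweep over candidate order quantities. For each candidate, we count how often the simulated requirement exceeds it (a shortfall) and how much is left over on average. The 5% failure tolerance, the candidate grid, and the normal stand-in for the simulated requirements are assumptions for illustration; the helper names are hypothetical.

```python
import random


def evaluate_order(order_kg, requirement_draws):
    """Shortfall rate and mean leftover (waste) for one order quantity."""
    shortfalls = sum(req > order_kg for req in requirement_draws)
    waste = sum(max(0.0, order_kg - req) for req in requirement_draws)
    n = len(requirement_draws)
    return shortfalls / n, waste / n


def choose_order(candidates, requirement_draws, max_failure_rate=0.05):
    """Smallest candidate whose simulated shortfall rate is acceptable."""
    for order_kg in sorted(candidates):
        fail_rate, mean_waste = evaluate_order(order_kg, requirement_draws)
        if fail_rate <= max_failure_rate:
            return order_kg, fail_rate, mean_waste
    return None  # no candidate is safe enough; widen the search


rng = random.Random(0)
# Stand-in for simulated requirements (kg); in practice, use your own model.
draws = [rng.gauss(540, 95) for _ in range(20_000)]
print(choose_order(range(500, 901, 25), draws))
```

Choosing the smallest safe quantity keeps expected waste as low as possible while respecting the failure tolerance. Tightening the tolerance buys safety at the cost of more leftover material.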

A practical checklist for building decision simulations

Where do the scenario parameters come from?