how would you plan for a complex situation like an unfolding hurricane? it’s hard because there are many unknowns, and we are not just worried about one or two facts (where will it make landfall); we are worried about many facts - trajectories of facts
spaghetti plots visualize many trajectories and help us make decisions
what decision (or decision rule) is going to be the most effective given what we know?
in an ideal world, we’d know the exact trajectories under each decision rule. then we’d use these counterfactuals to generate a score for each choice
but we’re not clairvoyant; we only know (or can use ML to get, or can guess about) probabilities that describe what will happen at each step
a decision-observation sequence is an ordered set of decision steps (collect user input) and observation steps (update variable values based on model assumptions). plus an END, which computes the score given the state and decisions
observe node: can come from forecasts, predictions, confidence intervals from causal inference or experiments
list of functions which map (state, decisions) -> (state, decisions). observe nodes only change state, decision nodes only change decisions
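def simulate(starting_state, starting_decisions, steps, score_fxn):
    """
    Run each step in order, then score the final state and decisions.

    Each step maps (state, decisions) -> (state, decisions).
    """
    state = starting_state
    decisions = starting_decisions
    for current_step in steps:
        # Observe steps update state; decision steps update decisions
        state, decisions = current_step(state, decisions)
    score = score_fxn(state, decisions)
    return score, state, decisions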
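As a minimal sketch of how a decision-observation sequence might be run for the umbrella case: the step functions, the 30% chance of rain, and the scores (-10 for getting soaked, -1 for carrying an unneeded umbrella) are all made-up assumptions for illustration, not values from any real forecast.

```python
import random

def simulate(starting_state, starting_decisions, steps, score_fxn):
    """Run each step in order, then score the final state and decisions."""
    state, decisions = dict(starting_state), dict(starting_decisions)
    for current_step in steps:
        state, decisions = current_step(state, decisions)
    return score_fxn(state, decisions), state, decisions

def decide_umbrella(state, decisions):
    # Decision step (hypothetical): only changes decisions
    return state, {**decisions, "take_umbrella": True}

def make_observe_rain(p_rain, rng):
    # Observation step factory (hypothetical): only changes state
    def observe_rain(state, decisions):
        return {**state, "rain": rng.random() < p_rain}, decisions
    return observe_rain

def score_commute(state, decisions):
    # Assumed scores: soaked = -10, unneeded umbrella = -1, otherwise 0
    if state["rain"] and not decisions["take_umbrella"]:
        return -10
    if not state["rain"] and decisions["take_umbrella"]:
        return -1
    return 0

rng = random.Random(0)  # seeded so the run is repeatable
steps = [decide_umbrella, make_observe_rain(0.3, rng)]
score, final_state, final_decisions = simulate({}, {}, steps, score_commute)
print(score, final_state, final_decisions)
```

Each call produces one sampled trajectory; repeating it many times with different random draws gives a distribution of scores for a given decision rule.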
umbrella
variable: rain
variable: compare sampled states to actual
move from an ordered list to a directed graph
add a BRANCH step type, which changes the next node
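One way the graph version could look, as a rough sketch only: nodes live in a dict keyed by name, each node is a `(kind, function, next_node)` tuple, and a `"branch"` node returns the name of the next node instead of updating anything. The node names, the tuple layout, and the `max_steps` guard are assumptions made up for this example.

```python
def run_graph(nodes, start, state, decisions, score_fxn, max_steps=1000):
    """Walk the graph from `start` until END, then score the result."""
    current = start
    for _ in range(max_steps):
        if current == "END":
            return score_fxn(state, decisions), state, decisions
        kind, fn, next_node = nodes[current]
        if kind == "branch":
            # BRANCH: state and decisions are untouched; only the next node changes
            current = fn(state, decisions)
        else:
            # observe / decide: same (state, decisions) -> (state, decisions) contract
            state, decisions = fn(state, decisions)
            current = next_node
    raise RuntimeError("graph did not reach END within max_steps")

def route_on_forecast(state, decisions):
    return "pack" if state["forecast_rain"] else "END"

def pack_umbrella(state, decisions):
    return state, {**decisions, "take_umbrella": True}

nodes = {
    "route": ("branch", route_on_forecast, None),
    "pack": ("decide", pack_umbrella, "END"),
}

def score(state, decisions):
    # Assumed score: carrying the umbrella costs 1
    return -1 if decisions.get("take_umbrella") else 0

rainy = run_graph(nodes, "route", {"forecast_rain": True}, {}, score)
dry = run_graph(nodes, "route", {"forecast_rain": False}, {}, score)
print(rainy, dry)
```

The `max_steps` guard is there because a directed graph, unlike an ordered list, can contain cycles.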
The reason we look at data is so we can make better decisions. However, even after consulting the data, there is usually still some uncertainty about what will happen. We want to make choices that will be effective not just in the average situation, but across the whole range of plausible outcomes. In this post, we’ll cover some basic Decision Theory and use it to show how simulations can help us plan for situations when uncertainty is present.
Note from the Editors: The Editorial team here at Casual Inference: Data Analysis and Other Apocrypha™ reminds you to be responsible when drinking and calculating expected values. Alcohol may impair your ability to operate a Monte Carlo Simulation or pronounce the word “heteroskedasticity”. The Editorial Board does not endorse any of the views, recipes, or probability calculations presented in this article, which are solely those of the author.
Most of the time when we make a decision, the outcome of our decision isn’t guaranteed. Life is complicated, and our knowledge of what will happen is imperfect. I deal with a prototypical example of this every time I step out to catch the train to work - should I bring an umbrella with me? It’s annoying to carry one if it isn’t necessary, but it’s much more annoying to get caught in the rain as I scramble off the J train to try and make my first meeting of the morning (I really need to start leaving earlier). What should I do? An exhaustive analysis proves frustrating: if I bring the umbrella, I might not need it, but if I don’t bring it, I might get caught in the rain. So are both choices bad ones?
It seems unlikely that both choices are equally bad. After all, on most days I can usually make it past the umbrella rack without having a nervous breakdown, so there must be some more information for us to use. There are two important things to note about this choice:
Let’s try and organize the situation so we can analyze the outcomes of the choices we have. We’re going to use the framework laid out in Leonard J. Savage’s The Foundations of Statistics. In this framework, we model the decision as:
I’m going to replace the unwieldy “state of the world” with “scenario” in this writeup, since I’ve found that easier to get across to people. Plus, then you can tell your MBA-wielding VP that you did a scenario analysis, which I think you’ll agree sounds very business-y.
It’s often handy to write out the “consequences” part of the model as a scenario $\times$ action matrix:
| | Take umbrella | Don’t take umbrella |
|---|---|---|
| No rain | Mild inconvenience 🫤 | Status quo |
| Rain | Status quo | Caught in downpour 😢 |
Eagle-eyed readers will recognize this as looking a lot like the confusion matrix, which we’ve applied to optimal decision-making before. In that case, we realized that if we know the probability distribution over the rows of the matrix, we can compute the expected value of each action. If we were to assign each of the consequences a score to create a score matrix, then we could just calculate
\[\mathbb{E}[\text{Score} \mid \text{Action}] = \sum_i \text{Score}[\text{Scenario } i, \text{Action}] \times \mathbb{P}(\text{Scenario } i)\]

and pick the action with the larger expected value. This is already pretty useful! However, sometimes we want more visibility into what kind of outcome we get. Perhaps the outcome space is difficult to summarize with a single number, for example.
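To make the expected-value calculation concrete, here is a small sketch for the umbrella matrix. The scores and the 30% probability of rain are invented for illustration; in practice they would come from your own preferences and a forecast.

```python
# Assumed score matrix: keys are (scenario, action), values are made-up scores
scores = {
    ("no_rain", "take"): -1,    # mild inconvenience
    ("no_rain", "leave"): 0,    # status quo
    ("rain", "take"): 0,        # status quo
    ("rain", "leave"): -10,     # caught in downpour
}

# Assumed probability of each scenario (must sum to 1)
p_scenario = {"no_rain": 0.7, "rain": 0.3}

def expected_score(action):
    """Sum of Score[scenario, action] * P(scenario) over all scenarios."""
    return sum(scores[(scenario, action)] * p for scenario, p in p_scenario.items())

expected = {action: expected_score(action) for action in ("take", "leave")}
best_action = max(expected, key=expected.get)
print(expected, best_action)
```

With these particular numbers, taking the umbrella has the higher expected score; a smaller rain probability or a milder penalty for getting wet could flip the answer.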
Most problems are more complex than this, and it is often easier to simulate than to solve. Really, then, this is a Monte Carlo method.
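A minimal Monte Carlo sketch of the umbrella decision, under the assumption of a 30% chance of rain (a made-up number): rather than collapsing everything into one expected value, it samples many scenarios and tallies how often each consequence occurs for each action.

```python
import random
from collections import Counter

P_RAIN = 0.3      # assumed probability of rain, for illustration only
N_SIMS = 10_000   # number of simulated mornings

# Consequence for each (rain?, action) pair, taken from the matrix above
CONSEQUENCE = {
    (False, "take"): "mild inconvenience",
    (False, "leave"): "status quo",
    (True, "take"): "status quo",
    (True, "leave"): "caught in downpour",
}

def simulate_outcomes(action, n_sims, seed=0):
    """Sample n_sims scenarios and count the consequences of one action."""
    rng = random.Random(seed)
    outcomes = Counter()
    for _ in range(n_sims):
        rain = rng.random() < P_RAIN
        outcomes[CONSEQUENCE[(rain, action)]] += 1
    return outcomes

results = {action: simulate_outcomes(action, N_SIMS) for action in ("take", "leave")}
for action, counts in results.items():
    shares = {outcome: count / N_SIMS for outcome, count in counts.items()}
    print(action, shares)
```

For a problem this small the simulation only approximates what the formula gives exactly, but the same loop keeps working when the scenario has many interacting variables and no closed-form answer.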
some realistic aspects: estimating inputs required for demand; lots of uncertainty; complicated process and multi-dimensional;
defining the SAC
running the simulations
picking one that doesn’t fail
estimating the level of waste for our chosen decision
Where do the scenario parameters come from?