So, I’ve been sciencing data for the last ten years or so, and one of the most common questions I get from those starting out in Data Science is something like “what books should I be reading?” It’s hard to answer this question directly, since the answer really depends on what you’re looking to learn and what you already know. I could just list every book I’ve read, but as the old pretentious people of the past might say, ars longa, vita brevis - which is Latin for “I don’t have time to read all that, dude”. Instead, I’ve done the next best thing, which is to list some of my favorite books, which between them cover most of the fields I think about regularly. I’ve divided the list into a few sections:
Keep in mind that this list is non-exhaustive and highly subjective. It’s a guided tour of some resources I’ve found helpful, and perhaps you will too. It focuses more on things I care about (statistics, causal inference) and less on things in which I have only a little experience (Deep Learning, for example).
This book is a map of the most important definitions and theorems in Statistics. It starts from zero and builds up to rigorous definitions of plenty of recognizable statistical methods. The first ten chapters or so are a good, thorough introduction for an undergrad (or perhaps someone who has forgotten all the proofs from undergrad).
The same author has also written All of Nonparametric Statistics, though I haven’t
Cosma Shalizi
TALR
CI the mixtape, The Effect, CIFTBAT
Pearl’s Little Book
Morgan and Winship
Gelman and Hill regression, RAOS, Shalizi TALR
blogs?
shalizi
gelman
rina artstain
Tyranny of metrics
Superforecasting
Seeing like a state
Nate Silver (I know, sorry)