Lately I’ve been troubled by how little I actually knew about how Bayesian inference really worked. I could explain to you many other machine learning techniques, but with Bayesian modelling… well, there’s a model (which is basically the likelihood, I think?), and then there’s a prior, and then, um…

What actually happens when you run a sampler? What makes inference “variational”? And what is this automatic differentiation doing in my variational inference? Cue long sleepless nights, contemplating my own ignorance.

So to celebrate the new year[1], I compiled a list of things to read: blog posts, journal papers, books, anything that would help me understand (or at least, appreciate) the math and computation that happens when I press the Magic Inference Button™. To be clear, this reading list isn’t focused on how to use Bayesian modelling for a specific use case[2]; it’s focused on how modern computational methods for Bayesian inference work in general.

So without further ado…

Markov-Chain Monte Carlo

For the uninitiated

  1. MCMC Sampling for Dummies by Thomas Wiecki. A basic introduction to MCMC with accompanying Python snippets. The Metropolis sampler is used as an introduction to sampling.
  2. Introduction to Markov Chain Monte Carlo by Charles Geyer. The first chapter of the aptly-named Handbook of Markov Chain Monte Carlo.
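
To give a flavour of what these introductions build up to, here is a minimal sketch of a random-walk Metropolis sampler. It is not taken from either reference: the names `log_target` and `metropolis`, the standard normal target, and the default step size are all illustrative assumptions.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a standard normal (an illustrative target)."""
    return -0.5 * x * x


def metropolis(log_p, n_samples, x0=0.0, step=1.0, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Propose a move by perturbing the current state.
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(x)), computed in log space.
        log_ratio = log_p(proposal) - log_p(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples


samples = metropolis(log_target, n_samples=5000)
print(sum(samples) / len(samples))  # sample mean of the chain
```

Only the ratio of densities enters the acceptance step, which is why the normalizing constant of the target never has to be computed.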

Hamiltonian Monte Carlo and the No-U-Turn Sampler

  1. Hamiltonian Monte Carlo explained. A visual and intuitive explanation of HMC: great for starters.
  2. A Conceptual Introduction to Hamiltonian Monte Carlo by Michael Betancourt. An excellent paper for a solid conceptual understanding and principled intuition for HMC.
  3. Exercises in Automatic Differentiation using autograd and jax by Colin Carroll. This is the first in a series of blog posts that explain HMC from the very beginning. See also Hamiltonian Monte Carlo from Scratch and Step Size Adaptation in Hamiltonian Monte Carlo.
  4. The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo by Matthew Hoffman and Andrew Gelman. The original NUTS paper.
  5. MCMC Using Hamiltonian Dynamics by Radford Neal.
  6. Hamiltonian Monte Carlo in PyMC3 by Colin Carroll.
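
The core loop these references describe can be sketched in a few lines. The following is a bare-bones HMC sampler with a fixed step size and path length (so no NUTS and no adaptation), targeting a 1-D standard normal. All names (`neg_log_p`, `leapfrog`, `hmc`) and the default settings are assumptions made for illustration, not code from any of the libraries or posts above.

```python
import math
import random


def neg_log_p(q):
    """Potential energy: negative log-density of a standard normal (illustrative)."""
    return 0.5 * q * q


def grad_neg_log_p(q):
    """Gradient of the potential energy; written by hand here instead of using autodiff."""
    return q


def leapfrog(q, p, grad, step, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p -= 0.5 * step * grad(q)          # initial half step for momentum
    for i in range(n_steps):
        q += step * p                  # full step for position
        if i < n_steps - 1:
            p -= step * grad(q)        # full step for momentum, except at the end
    p -= 0.5 * step * grad(q)          # final half step for momentum
    return q, p


def hmc(n_samples, q0=0.0, step=0.2, n_steps=10, seed=0):
    rng = random.Random(seed)
    q = q0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)        # resample momentum from a unit Gaussian
        h_old = neg_log_p(q) + 0.5 * p * p
        q_new, p_new = leapfrog(q, p, grad_neg_log_p, step, n_steps)
        h_new = neg_log_p(q_new) + 0.5 * p_new * p_new
        # Metropolis correction for the integrator's energy error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


samples = hmc(2000)
print(sum(samples) / len(samples))  # sample mean of the chain
```

The gradient is the only model-specific ingredient beyond the density itself, which is where automatic differentiation comes in for real models.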

Sequential Monte Carlo and particle filters

  1. An Introduction to Sequential Monte Carlo Methods by Arnaud Doucet, Nando de Freitas and Neil Gordon. This chapter from the authors’ textbook on SMC provides motivation for using SMC methods, and gives a brief introduction to a basic particle filter.
  2. Sequential Monte Carlo Methods & Particle Filters Resources by Arnaud Doucet. A list of resources on SMC and particle filters: way more than you probably ever need to know about them.
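
As a rough idea of what a "basic particle filter" looks like, here is a sketch of a bootstrap filter for a toy model of my own choosing: a 1-D Gaussian random walk observed with Gaussian noise. The function name `bootstrap_filter`, the model, and every default parameter are assumptions for illustration and are not taken from the chapter.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500, process_sd=1.0,
                     obs_sd=0.5, seed=0):
    """Bootstrap particle filter for x_t = x_{t-1} + noise, y_t = x_t + noise.

    Returns the estimated filtering mean E[x_t | y_1..y_t] for each time step.
    """
    rng = random.Random(seed)
    # Initial particles drawn from an assumed N(0, 1) prior on the state.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # 1. Propagate each particle through the transition model.
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight each particle by the likelihood of the observation.
        log_w = [-0.5 * ((y - x) / obs_sd) ** 2 for x in particles]
        max_lw = max(log_w)  # subtract the max for numerical stability
        weights = [math.exp(lw - max_lw) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3. Estimate the filtering mean from the weighted particles.
        means.append(sum(w * x for w, x in zip(weights, particles)))
        # 4. Resample (multinomial) to fight weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


print(bootstrap_filter([0.9, 1.4, 2.1, 2.8, 3.0]))
```

Multinomial resampling at every step is the simplest choice; other resampling schemes exist and are covered in the resources above.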

Other sampling methods

  1. Chapter 11 (Sampling Methods) of Pattern Recognition and Machine Learning by Christopher Bishop. Covers rejection, importance, Metropolis-Hastings, Gibbs and slice sampling. Perhaps not as rampantly useful as NUTS, but good to know nevertheless.
  2. The Markov-chain Monte Carlo Interactive Gallery by Chi Feng. A fantastic library of visualizations of various MCMC samplers.

Variational Inference

For the uninitiated

  1. Deriving Expectation-Maximization by Will Wolf. The first blog post in a series that builds from EM all the way to VI. Also check out Deriving Mean-Field Variational Bayes.
  2. Variational Inference: A Review for Statisticians by David Blei, Alp Kucukelbir and Jon McAuliffe. A high-level overview of variational inference: the authors go over one example (performing VI on GMMs) in depth.
  3. Chapter 10 (Approximate Inference) of Pattern Recognition and Machine Learning by Christopher Bishop.
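
For orientation, the identity at the heart of all three references can be stated in one line. The notation is my own assumption: x is the observed data, z the latent variables, and q(z) the approximating distribution.

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\big[\log p(x, z)\big]
      - \mathbb{E}_{q(z)}\big[\log q(z)\big]}_{\mathrm{ELBO}(q)}
  + \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big)
```

Since the KL term is non-negative and the left-hand side does not depend on q, maximizing the ELBO over q is the same as minimizing the KL divergence from q to the true posterior. This turns inference into an optimization problem, which is the "variational" part.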

Automatic differentiation variational inference (ADVI)

  1. Automatic Differentiation Variational Inference by Alp Kucukelbir, Dustin Tran et al. The original ADVI paper.
  2. Automatic Variational Inference in Stan by Alp Kucukelbir, Rajesh Ranganath, Andrew Gelman and David Blei.

Open-Source Software for Bayesian Inference

There are many open-source software libraries for Bayesian modelling and inference, and it is instructive to look into the inference methods that they do (or do not!) implement.

  1. Stan
  2. PyMC3
  3. Pyro
  4. TensorFlow Probability
  5. Edward
  6. Greta
  7. Infer.NET
  8. BUGS
  9. JAGS

Further Topics

Bayesian inference doesn’t stop at MCMC and VI: there is bleeding-edge research being done on other methods of inference. While they aren’t ready for real-world use, it is interesting to see what they are.

Approximate Bayesian computation (ABC) and likelihood-free methods

  1. Likelihood-free Monte Carlo by Scott Sisson and Yanan Fan.

Expectation propagation

  1. Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data by Aki Vehtari, Andrew Gelman, et al.

Operator variational inference (OPVI)

  1. Operator Variational Inference by Rajesh Ranganath, Jaan Altosaar, Dustin Tran and David Blei. The original OPVI paper.

(I’ve tried to include as many relevant and helpful resources as I could find, but if you feel like I’ve missed something, drop me a line!)