My take on Bayesian and frequentist reasoning
Probability is confusing. It didn’t really make much sense to me (why are random variables neither random nor variables?) until I took my first course in measure theory, which really helped to put things into perspective.
Statistics is even more confusing, adding real-world aspects onto the back of theoretical probability. I hoped that there would be some sort of deeper picture here that would straighten things out for me, in much the same way that measure theory did for probability.
It turns out that there are two. And they don’t like each other very much.
Here is my attempt to decode frequentist and Bayesian mechanics:
Suppose you have a coin on a table in front of you. I tell you that the coin might be biased, and then ask you what the probability of heads is. You have never seen this coin before, but since it is just sitting there, you might decide to flip it a few times. At this point, it seems reasonable to guess that the probability of heads is just the proportion of flips that came up heads.
This is a very simple example of frequentist reasoning – the idea that we can get information about a statistical process by sampling it many times (we make decisions based on the frequency of observed events). This approach is not justified in every case, but it is accurate surprisingly often (including whenever the trials are independent, as above), and it is backed by a substantial body of theory. In our coin example, the strong law of large numbers guarantees that as the number of flips increases, the sample proportion of heads converges almost surely to the actual probability of heads.
Sadly, it is not possible to flip the coin infinitely many times. Tweaking the above approach to give useful answers with only a finite (but usually large) number of coin flips is largely case-specific, and is the job of the frequentist statistician.
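The frequentist estimate above is easy to sketch in code. Here is a toy simulation – the true bias of 0.7, the function name, and the use of Python’s `random` module are my own illustrative choices, not anything canonical:

```python
import random

def frequentist_estimate(true_p, n_flips, seed=0):
    """Estimate the probability of heads as the proportion of observed heads."""
    rng = random.Random(seed)
    # Each flip comes up heads with probability true_p.
    heads = sum(1 for _ in range(n_flips) if rng.random() < true_p)
    return heads / n_flips

# With more flips, the estimate tightens around the (unknown) true bias.
print(frequentist_estimate(0.7, 100))
print(frequentist_estimate(0.7, 100_000))
```

Running this a few times with different seeds shows the point of the previous paragraph: with only a handful of flips the estimate bounces around, and the statistician’s job is to quantify how much.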
We have seen how a frequentist might handle the scenario above. Now we will put a Bayesian in the same scenario and see what they might do.
The crux of Bayesian methods is what is called a “Bayesian prior” – a set of existing beliefs that we lay out before performing statistical trials. For example, maybe we believe strongly that our coin is fair, and our degree of belief in our coin having probability of heads p decreases as p strays from 1/2. We then sample – this could be a single coin flip or several – and update our beliefs based on what we observe.
Suppose that we do a bunch of coin flips and get a long sequence of heads. We would then update our beliefs to reflect this – we now believe more strongly that p is larger than 1/2, and the longer the sequence of heads, the stronger this belief becomes. We still don’t have a direct answer to the question we were asked, but we do have a representation of our degree of belief that the probability of heads is p, for each value of p. Based on this there are a few sensible answers we could give – pick the value of p that we believe in most, take an average weighted by degree of belief, etc.
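One standard way to make this concrete – my choice of machinery, not the only one – is to use a Beta distribution as the prior on p. A Beta(10, 10) prior encodes a fairly strong belief that the coin is fair, and because Beta is conjugate to the coin-flip likelihood, updating on observed flips just means adding counts to the parameters. The two “sensible answers” from above correspond to the posterior mode and the posterior mean:

```python
def bayesian_update(prior_a, prior_b, heads, tails):
    """Update a Beta(prior_a, prior_b) belief about p after observing flips.

    Beta is conjugate to the coin-flip likelihood, so the posterior is
    simply Beta(prior_a + heads, prior_b + tails).
    """
    return prior_a + heads, prior_b + tails

def posterior_mean(a, b):
    # Average of p weighted by our degree of belief in each value.
    return a / (a + b)

def posterior_mode(a, b):
    # The single value of p we believe in most strongly (requires a, b > 1).
    return (a - 1) / (a + b - 2)

# Start believing the coin is fair, then observe 8 heads and 2 tails.
a, b = bayesian_update(10, 10, heads=8, tails=2)
print(posterior_mean(a, b), posterior_mode(a, b))
```

Both summaries land a bit above 1/2: the run of heads pulls our beliefs upward, but the prior keeps them from jumping all the way to the raw sample proportion of 0.8.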
As you might have noticed, neither of these descriptions is complete – how does the frequentist approximate an infinite sequence of trials with only finitely many, and how does the Bayesian choose their prior? – but I think this gives a good idea of the philosophy behind each approach.