My attempt to reconcile Bayesian reasoning and critical rationalism.
In short: Bayesian inference is not a form of induction. If we have a definite set of specified alternatives, and if we can compute the probability of possible observations under each alternative, the subsequent modification of the probability distribution is logically entailed by our premises, which makes it a deductive inference. The logic of science does not consist of picking the most probable explanation from a set of alternatives known in advance: it consists of creating new ones. Furthermore, the probabilities computed by a model, no matter its predictive success, have nothing to do with the probability of the model itself being true.
All criticism welcome.
Why Bayesianism fails
“If anything is to be probable, then something must be certain.” — C. I. Lewis
Ferdinand Hodler, Le Lac Léman et le Mont Blanc au lever du soleil
Can evidence support an hypothesis?
Empirical support lies at the core of our idea of rationality. We ask for “evidence-based policies”; we admonish each other to “back up claims with data”; we reject statements that are not “supported by the facts”. And yet, as a matter of logic, the idea of empirical support is surprisingly difficult to pin down.
This is the problem of induction, most famously stated by David Hume, who believed it to be insoluble. Our knowledge of the world consists of theories that explain what we see in terms of things that we don’t see. How can we infer general theories from limited observations? We can’t deduce them from the evidence, since their very nature is to go beyond the evidence. Can a theory be confirmed by the evidence compatible with it, or made more probable? Does evidence allow us to feel more confident in our beliefs? If so, by which kind of logic?
These questions are central to a profound debate in philosophy of science. Steven Pinker made a passing reference to this debate in Enlightenment Now:
Our beliefs about empirical propositions should be calibrated by their fit to the world. When scientists are pressed to explain how they do this, they usually reach for Karl Popper’s model of conjecture and refutation, in which a scientific theory may be falsified by empirical tests but is never confirmed. In reality, science doesn’t much look like skeet shooting, with a succession of hypotheses launched into the air like clay pigeons and shot to smithereens. It looks more like Bayesian reasoning: a theory is granted a prior degree of credence, based on its consistency with everything else we know. That level of credence is then incremented or decremented according to how likely an empirical observation would be if the theory is true, compared with how likely it would be if the theory is false.
The first approach Pinker mentions, Popper’s, is arguably the most famous. According to Popper, evidence can never, in any way, support or justify a theory, or make it more probable. He believed that David Hume’s statement of the problem of induction was “a gem of priceless value for the theory of objective knowledge: a simple, straightforward, logical refutation of any claim that induction could be a valid argument, or a justifiable way of reasoning”.
Popper’s solution comes from the realization that we do not need induction to create knowledge. The fact that a scientific theory cannot be supported by evidence does not amount to a demonstration that it is false: whether or not a theory is true is independent of whether we can prove it. Science, according to Popper, is based on the logical asymmetry between verification and refutation. No amount of evidence can ever prove that a theory is true; however, if any statement deducible from a theory is false, the theory itself is false. We can create knowledge, therefore, by making unsupported and unjustified guesses, and seeing which ones withstand our attempts to refute them.
But Popper’s negative account of empiricism proved difficult to accept. The idea of supporting evidence is a resilient one. In Fashionable Nonsense, Sokal and Bricmont expressed a common criticism of Popper that resurfaced many times in the history of philosophy:
When a theory successfully withstands an attempt at falsification, a scientist will, quite naturally, consider the theory to be partially confirmed and will accord it a greater likelihood or a higher subjective probability. The degree of likelihood depends, of course, upon the circumstances: the quality of the experiment, the unexpectedness of the result, etc. But Popper will have none of this: throughout his life, he was a stubborn opponent of any idea of “confirmation” of a theory, or even of its “probability”. […]
Obviously, every induction is an inference from the observed to the unobserved, and no such inference can be justified using solely deductive logic. But, as we have seen, if this argument were to be taken seriously — if rationality were to consist only of deductive logic — it would imply also that there is no good reason to believe that the Sun will rise tomorrow, and yet no one really expects the Sun not to rise. With his method of falsification, Popper thinks that he has solved Hume’s problem, but his solution, taken literally, is a purely negative one: we can be certain that some theories are false, but never that a theory is true or even probable. Clearly, this “solution” is unsatisfactory from a scientific point of view.
The second approach mentioned by Pinker, Bayesian reasoning, is seen as a possible remedy. According to Bayesianism, probabilities represent degrees of belief in statements, which can then be incremented or decremented according to the evidence. The idea is simple. We start with a set of possible hypotheses, each with a given probability of being true. The probability distribution is supposed to incorporate all the relevant information we already have: if we know nothing else, all possibilities will have equal probability. Then, we look at the evidence, and ask ourselves: how probable was it to observe that evidence, given each possible hypothesis? Using a famous mathematical rule called Bayes’ theorem, we can then update the probability of each possible hypothesis in proportion to how probable it made the evidence. Reasoning in this way is also known as “inverse probability”, because instead of computing the probability of observations according to causes, we assign probabilities to possible causes, according to our observations.
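The update described above can be sketched in a few lines of Python. This is only an illustration under assumed numbers: the three coin hypotheses, their biases, and the function name `bayes_update` are hypothetical and do not come from any of the authors discussed here.

```python
from fractions import Fraction

def bayes_update(prior, likelihood, observation):
    """Return the posterior over hypotheses after one observation.

    prior: dict mapping hypothesis name -> P(H)
    likelihood: dict mapping hypothesis name -> {observation: P(observation | H)}
    """
    # Joint probability P(H) * P(E | H) for each hypothesis.
    joint = {h: prior[h] * likelihood[h][observation] for h in prior}
    # P(E): total probability of the observation under the whole model.
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}

# Hypothetical model: three candidate probabilities of heads for one coin.
heads_prob = {
    "tails-biased": Fraction(1, 4),
    "fair": Fraction(1, 2),
    "heads-biased": Fraction(3, 4),
}
# Knowing nothing else, every hypothesis gets the same prior probability.
prior = {h: Fraction(1, 3) for h in heads_prob}
likelihood = {h: {"heads": p, "tails": 1 - p} for h, p in heads_prob.items()}

posterior = bayes_update(prior, likelihood, "heads")
for name, prob in posterior.items():
    print(name, prob)
```

With these assumed inputs, one observation of heads moves the distribution from 1/3, 1/3, 1/3 to 1/6, 1/3, 1/2. Nothing in the function refers to anything beyond the prior, the likelihoods, and the observation.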
This is often seen as a rigorous, mathematically impeccable formalization of empirical support and rationality itself. Bayesianism was adopted by several popular science authors, including Sean Carroll and Nate Silver, and enthusiastically promoted by the online group of thinkers known as the “Rationalist community”, organized around the writings of Eliezer Yudkowsky and Scott Alexander.
In what could arguably be considered the Bible of Bayesianism, Probability Theory: The Logic of Science, the late E.T. Jaynes had some scathing criticism for Popper and others who have denied the possibility of induction. He refers to them as the “irrationalists” and criticizes Popper in these terms:
In denying the possibility of induction, Popper holds that theories can never attain a high probability. But this presupposes that the theory is being tested against an infinite number of alternatives. […] It is not the absolute status of an hypothesis embedded in the universe of all conceivable theories, but the plausibility of an hypothesis relative to a definite set of specified alternatives, that Bayesian inference determines. […] an hypothesis can attain a very high or very low probability within a class of well-defined alternatives. Its probability within the class of all conceivable theories is neither large nor small; it is simply undefined because the class of all conceivable theories is undefined. In other words, Bayesian inference deals with determinate problems — not the undefined ones of Popper — and we would not have it otherwise.
Popper always rejected the idea of searching for probable theories. On the contrary, because we want theories with high informative content that make specific predictions, he argued that a better theory will always mean a less probable theory. In a paper titled “A proof of the impossibility of inductive probability”, Popper and his collaborator David Miller set out to demonstrate, in a technical fashion, that the part of an hypothesis that is not deductively entailed by the evidence is always strongly counter-supported by it. According to them, “this result is completely devastating for the inductive interpretation of the calculus of probability”.
According to Jaynes, “written for scientists, this is like trying to prove the impossibility of heavier-than-air flight to an assembly of professional airline pilots.”
Although I am an adherent of the Bayesian approach to statistics and probability, and an admirer of Jaynes, my thesis here is that Popper was right. Rationality, including Bayesian reasoning, does indeed consist only of deductive logic. (As David Miller put it, “the use of Bayes theorem does not characterize Bayesianism any more than the use of Pythagoras’ theorem characterizes Pythagoreanism”.)
I believe the debate between Bayesians and Popperians comes from a misunderstanding of the word “induction” as used by Bayesians. Bayesian inference is not a form of induction: it is entirely deductive. If we have a “definite set of specified alternatives” with a probability distribution, and if we can use this model to compute the probability of future observations under each of those alternatives, the subsequent modification of the probability distribution is logically entailed by our premises — which makes it a deductive inference. We are not learning anything beyond what we already put into our model and what we subsequently observe: we move smoothly from a prior set of assumptions to a posterior set of conclusions, according to clear mathematical rules.
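One way to see the deductive character of the update is to write out the standard derivation of Bayes' theorem. The notation is an assumption of this sketch: H_1, ..., H_n stand for the specified alternatives, taken to be mutually exclusive and exhaustive, and E for the observation.

```latex
\begin{align*}
P(H_i \wedge E) &= P(H_i)\,P(E \mid H_i) = P(E)\,P(H_i \mid E)
  && \text{(product rule)} \\
P(E) &= \sum_{j=1}^{n} P(H_j)\,P(E \mid H_j)
  && \text{(the } H_j \text{ are exclusive and exhaustive)} \\
P(H_i \mid E) &= \frac{P(H_i)\,P(E \mid H_i)}{\sum_{j=1}^{n} P(H_j)\,P(E \mid H_j)}
  && \text{(requires } P(E) > 0\text{)}
\end{align*}
```

Every term on the right-hand side of the last line is a premise of the model: the prior P(H_j) and the likelihoods P(E | H_j). The posterior follows from them by algebra alone.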
It seems preposterous to suggest that such an important philosophical debate turns on a misuse of words, but I really believe that’s what’s happening here. We were misled into calling Bayesian inference “inductive probability” because it makes it look as though evidence can support an hypothesis without deductively entailing it. But in fact, the evidence only supports that hypothesis via a prior set of probabilistic assumptions that are not supported by the evidence.
This is how David Miller expresses the problem:
There is nothing at all inductive about Bayesian conditionalization. Statements of probability are not statements about the external world, and how they are amended in light of the new evidence is determined perfectly deductively. […] Discovering an item of evidence that makes an hypothesis more (or less) probable is not a scientific advance; it is simply a move.
More fundamentally, it is easy to see how Bayesianism fails as a philosophy of science. The logic of science does not consist of picking the most probable explanation from a set of preordained alternatives: it consists of creating new ones and putting them to the test. The set of all possible scientific explanations does not obey the probability calculus, simply because its members cannot be known in advance. As David Deutsch observed, the negation of a scientific explanation does not constitute an alternative explanation.
Jaynes seems to think it’s ridiculous to talk about the set of all possible scientific explanations, because such a set is not well-defined in terms of probability theory. But this is precisely the point. Anyone concerned with the truth must admit that the answers we are looking for may not already be contained in our existing models. Given a set of alternative hypotheses, the probabilities we assign to them depend upon the validity of that model — which remains mysterious. This is what makes Bayesianism a static philosophy of science. It is not compatible with the growth of knowledge — the creation of new explanations and new models.
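A small Python sketch may make the dependence on the set of alternatives concrete. Everything in it is a made-up illustration: the coin biases, the ten flips, and the function name `posterior_after` are assumptions, not an example taken from Jaynes or Popper.

```python
from fractions import Fraction

def posterior_after(prior, heads_prob, flips):
    """Update `prior` over hypotheses on a sequence of 'H'/'T' flips."""
    post = dict(prior)
    for flip in flips:
        joint = {
            h: post[h] * (heads_prob[h] if flip == "H" else 1 - heads_prob[h])
            for h in post
        }
        evidence = sum(joint.values())
        post = {h: joint[h] / evidence for h in joint}
    return post

flips = "HHHHHHHHHT"  # hypothetical data: nine heads, one tail

# Model A: only two alternatives have been conjectured.
biases_a = {"bias 1/4": Fraction(1, 4), "bias 1/2": Fraction(1, 2)}
prior_a = {h: Fraction(1, 2) for h in biases_a}
post_a = posterior_after(prior_a, biases_a, flips)

# Model B: same data, but a third alternative has since been conjectured.
biases_b = {**biases_a, "bias 9/10": Fraction(9, 10)}
prior_b = {h: Fraction(1, 3) for h in biases_b}
post_b = posterior_after(prior_b, biases_b, flips)

print(float(post_a["bias 1/2"]))
print(float(post_b["bias 1/2"]))
```

Under model A the fair-coin hypothesis ends up with probability 1024/1027, which is about 0.997. Under model B, on the same flips, it falls below 0.03. The data did not change; only the set of specified alternatives did, and the calculus itself has no way of producing the third hypothesis.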
Furthermore, the probabilities computed by a model have nothing to do with the probability of the model itself being true. If evidence can deductively change a probability distribution, via a framework of assumptions, in no way can it “support” that framework as a whole. Even if a Bayesian model achieves extraordinary predictive accuracy, that accuracy does not logically imply that the model contains any truth about the world (although you might conjecture that it does to explain why it works so well). There could always be better explanations. In the Popperian view, it’s the model as a whole, with its assumptions about the set of possibilities, that should be seen as conjectural, with its better alternatives waiting to be conjectured into existence. No amount of predictive success can tell you that your model is probably true — except, maybe, in light of another, more general model, subject to the same objection.
The most elegant statement of that argument comes from Jacob Bronowski:
Philosophers who have tried to quantify the weight of new evidence have often said that it increases the probability of the theory. But I have already remarked that Popper insists, and rightly insists, that we cannot assign a probability to a theory: for probabilities have to conform to a calculus which (he holds, and I hold) can only be made to apply consistently to physical events or logical statements about them. I put this by saying that probability requires the events which it subsumes to have a distribution, but a theory and all its possible alternatives do not have a unique distribution. It is true that a theory can contain a parameter whose possible values have a distribution, so that we can assign a probability to the hypothesis that the parameter has one range of values rather than another. But this is not the same thing as calculating a probability for the theory as a whole.
As a final note, I want to give an example of the misuse of probability theory to express epistemological truths. Sean Carroll and Nate Silver both remark that when a Bayesian thinker assigns a probability of 1 or 0 to a given statement, it means that no evidence will ever change their mind. Thus, to reflect the uncertain and revisable nature of scientific knowledge, they somehow imply that there is something irrational about thinking that something has a probability of one or zero. This idea is also known as Cromwell’s rule, after the famous quote from Oliver Cromwell: “I beseech you, in the bowels of Christ, think it possible that you may be mistaken.”
This, to me, is a misconception. If I fill an urn with black marbles, it is not irrational, based on my model, to say that there is a 100% chance that the next marble I draw will be black. It’s not an assertion of epistemic or metaphysical certainty, or a form of dogmatism. It’s a straightforward deduction from the information I have about the content of the urn. The model itself is still conjectural. Any result other than a black marble would flatly refute it. What’s irrational is not assigning probabilities of 1 or 0: it is holding on to models that don’t work, perhaps because they wrongly assigned probabilities of 1 and 0.
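The urn example can be written out as a minimal Python sketch. The names `predictive` and `update`, and the choice to raise an error on a zero-probability observation, are assumptions made for this illustration.

```python
from fractions import Fraction

# Hypothetical model: the urn was filled with black marbles only.
model = {"all black": {"black": Fraction(1), "white": Fraction(0)}}
prior = {"all black": Fraction(1)}

def predictive(prior, model, outcome):
    """P(outcome) deduced from the model: sum of P(H) * P(outcome | H)."""
    return sum(prior[h] * model[h][outcome] for h in prior)

def update(prior, model, outcome):
    """Bayes' theorem; raises ValueError if the model gave the outcome probability 0."""
    evidence = predictive(prior, model, outcome)
    if evidence == 0:
        # Conditioning on a probability-0 event is undefined within the model.
        raise ValueError("observation has probability 0 under the model")
    return {h: prior[h] * model[h][outcome] / evidence for h in prior}

# A deduction from the model's premises, not a claim of certainty about the world.
print(predictive(prior, model, "black"))

try:
    update(prior, model, "white")
except ValueError as err:
    print("model refuted:", err)
```

Drawing a white marble does not lead to a revised probability inside this model, because the update is undefined when the evidence has probability 0. The only coherent response is to discard the model and conjecture a new one, which is a step outside the probability calculus.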
I beseech you, in the bowels of Christ, to see the difference.
So, can data support an hypothesis? My answer: yes, in a deductive manner, given a well-specified set of all possibilities known in advance, and prior conjectures about what the evidence would look like under each of those possibilities.
The resilience of the idea of empirical support may be due to the fact that, since a rational thinker can only know a finite set of possible alternative explanations, the psychology of belief and our subjective sense of plausibility could reflect in some way the mathematics of Bayesian probability, in the sense described by Sokal and Bricmont. For practical purposes, it’s possible that the idea of evidential support for our beliefs cannot be uprooted from the human mind. However, we should be very clear about what we mean by that. Such support can only be deductive and mediated by models consisting of unproven and often implicit conjectures.