Reader's note: This post is an experiment in low-effort writing for my own enjoyment. Everything below is naught but extemporisation, and should not be taken too seriously.
All that is true or beautiful belongs to the world of ideas, and all that is mistaken or ugly to me.

Agh, how I love the preface paradox.

Solving the problem of induction

Try to predict the next values in each of these sequences:

Solutions:

Always 1: \[ 1111111111\color{green}1 \]

Alternate! \[ 0101010101\color{green}0 \]

Whether the \(n\)th digit in the decimal expansion of \(\pi\) is even: \[ \begin{aligned} &0100011000\color{green}1 \\ 3.&1415926535\color{green}8... \end{aligned} \]

The output of [random.random() > 0.5 for i in range(11)]: \[ 0111111000\color{green}0 \]

The first eleven bits of the SHA-256 hash of the string “exegesis”: \[ 1111011001\color{green}0 \]
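
For concreteness, here is a sketch of how the last two sequences could be generated (my reconstruction, not the original code - and the random one is non-deterministic, so your output will differ):

```python
import hashlib
import random

# Eleven coin flips, as in the list comprehension above.
flips = [random.random() > 0.5 for i in range(11)]
print("".join("1" if flip else "0" for flip in flips))

# First eleven bits of the SHA-256 hash of the string "exegesis".
digest = hashlib.sha256("exegesis".encode("utf-8")).digest()
bits = "".join(f"{byte:08b}" for byte in digest)
print(bits[:11])
```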

What's the best way to solve this problem, in general? Can it be solved at all?

From one of humanity's most wonderful online resources:

We generally think that the observations we make are able to justify some expectations or predictions about observations we have not yet made, as well as general claims that go beyond the observed. For example, the observation that bread of a certain appearance has thus far been nourishing seems to justify the expectation that the next similar piece of bread I eat will also be nourishing, as well as the claim that bread of this sort is generally nourishing. Such inferences from the observed to the unobserved, or to general laws, are known as “inductive inferences”.

The Problem of Induction - Stanford Encyclopedia of Philosophy

Grue

The philosopher Nelson Goodman illustrated a problem with induction using the following example:

Our evidence statements assert that emerald a is green, that emerald b is green, and so on; and each confirms the general hypothesis that all emeralds are green. So far, so good.

Now let me introduce another predicate less familiar than “green”. It is the predicate “grue” and it applies to all things examined before [the time] t just in case they are green but to other things just in case they are blue. Then at time t we have, for each evidence statement asserting that a given emerald is green, a parallel evidence statement asserting that that emerald is grue. And the statements that emerald a is grue, that emerald b is grue, and so on, will each confirm the general hypothesis that all emeralds are grue. Thus according to our definition, the prediction that all emeralds subsequently examined will be green and the prediction that all emeralds subsequently examined will be grue are alike confirmed by evidence statements describing the same observations.

New Riddle of Induction - Nelson Goodman

Goodman constructs a deviant predicate, “grue”, that behaves like “green” up to a certain time, and then switches to “blue” afterwards. Both hypotheses - that all emeralds are green, and that all emeralds are grue - are equally supported by the same evidence.

You can see this if you imagine the Bayes factor for the observation of a green emerald at some time before \(t\). Both hypotheses predict the observation with probability 1, and so if you were previously uncertain between them, you remain so afterwards.
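
Spelled out, writing \(E\) for the observation of a green emerald before time \(t\), Bayes' rule gives

\[ \frac{P(H_{\text{grue}} \mid E)}{P(H_{\text{green}} \mid E)} = \frac{P(E \mid H_{\text{grue}})}{P(E \mid H_{\text{green}})} \cdot \frac{P(H_{\text{grue}})}{P(H_{\text{green}})} = 1 \cdot \frac{P(H_{\text{grue}})}{P(H_{\text{green}})} \]

so the posterior odds between the two hypotheses are exactly the prior odds.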

As “grue” is parameterised by a time \(t\), many of our different ‘grue’s are constantly fizzing away as we observe more emeralds - each is falsified the moment we see a green emerald after its switch time - but we will never be completely rid of them, as for any time in the future we can always construct a new ‘grue’ that switches at that time.

The astute reader will have some uncomfortable feelings about this - there are infinitely many such deviant predicates we could construct, set against the single intuitive predicate “green”. Are we not swamped by silly, fizzy predicates? How can we justifiably be so dogmatic as to pick out this single way of describing emeralds?

The simplicity prior

A rebuttal to this particular riddle is provided by the simplicity prior, which captures the intuition that what is wrong with “grue” is its arbitrariness, or its complexity. The simplicity prior assigns probabilities to hypotheses (like “all emeralds are green” or “all emeralds are grue”) based on their Kolmogorov complexity - more precisely, in proportion to the inverse exponential of that complexity, so each extra bit of description halves a hypothesis's prior probability.
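
In symbols: writing \(K(h)\) for the length in bits of the shortest program that describes hypothesis \(h\) (relative to some fixed universal machine), the prior is, up to normalisation,

\[ P(h) \propto 2^{-K(h)}. \]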

In order to produce a ‘grue’, one must select a time at which to switch from green to blue - this is an extra piece of information that is not required to define “green”.

In order to produce even the curried form “grue at time \(t\)”, one must specify the switching procedure itself, which is more complex than simply “green”.

Thus, the single hypothesis ‘all emeralds are green’ takes up a “larger volume” of the prior probability space than all of the ‘grue’ hypotheses combined, and even though both kinds of hypothesis are equally supported by the evidence, the posterior probability of ‘all emeralds are green’ is much higher than that of any ‘all emeralds are grue’ hypothesis.
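
Here is a toy version of that calculation, as a sketch: the bit counts below are invented for illustration, not real Kolmogorov complexities, and the encoding of the switch time is deliberately crude.

```python
import math

# Invented description lengths, in bits (illustrative only).
GREEN_BITS = 10            # "all emeralds are green"
GRUE_OVERHEAD_BITS = 25    # the green-before-t, blue-after-t switching machinery

def grue_bits(t: int) -> float:
    """Bits to describe 'all emeralds are grue, switching at time t'."""
    # A self-delimiting encoding of t costs a little more than log2(t) bits;
    # log2 is close enough for a toy.
    return GRUE_OVERHEAD_BITS + math.log2(t + 2)

prior_green = 2.0 ** -GREEN_BITS
# Truncated to the first 10,000 switch times for the toy.
prior_grues = sum(2.0 ** -grue_bits(t) for t in range(10_000))

# Every emerald observed so far was green and was seen before any switch time,
# so both hypotheses assign the evidence probability 1: the likelihoods cancel
# and the posterior odds equal the prior odds computed here.
print(f"P(green) / P(all grues combined) ~ {prior_green / prior_grues:.0f}")
```

With these made-up numbers, “green” keeps a few thousand times more posterior mass than the whole truncated family of ‘grue’s put together; the exact ratio depends on the bit counts you choose, but the shape of the argument is the same.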

Conclusion

Many things can be ameliorated by the simplicity prior - it is a formalisation of Occam's razor, and captures the intuition that simpler explanations are more likely to be true.

A particular idiosyncratic view of mine (idiosyncratic in the sense that I am not a linguist, and should not be walking around having opinions on linguistics - this may be a totally normal view in linguistic circles!) is that it can be applied to concerns in linguistics about the poverty of the stimulus - that children are able to learn language from shockingly little data.

There are arguments against simplicity priors - perhaps we should want weakness instead? Perhaps the simplicity prior is actually out to get us?

I'm not sure. It seems to do pretty well by my lights.