The predictive processing account aspires to explain all of cognition using a single, unifying principle. Among the major challenges is to explain how brains are able to infer the structure of their generative models. Recent attempts to further this goal build on existing ideas and techniques from engineering fields, like Bayesian statistics and machine learning. While apparently promising, these approaches make specious assumptions that effectively confuse structure learning with Bayesian parameter estimation in a fixed state space. We illustrate how this leads to a set of theoretical problems for the predictive processing account. These problems highlight a need for developing new formalisms specifically tailored to the theoretical aims of scientific explanation. We lay the groundwork for a possible way forward.

Danaja Rutar and Erwin de Wolff contributed equally to this work.

The predictive processing account is a key theoretical player in present-day cognitive neuroscience. The account postulates that our brains make sense of the world through a cycle of making predictions based on a hierarchical, generative model and updating their internal states to reduce prediction errors (Clark,

The account’s explanatory and modelling successes—spanning domains like perception (Den Ouden et al.,

In this theoretical paper, we evaluate the commitments of predictive processing models of (Bayesian) structure learning. We observe that these models make the contentious assumption that the generative model’s state space is

The remainder of this paper is organized as follows. We first introduce formal concepts and notation from predictive processing and explain a predictive processing view of structure learning called

Predictive processing proposes that human (and other) brains make predictions about the world based on an internal, hierarchical, generative model. This generative model is taken to represent a person’s beliefs about concepts, relations, and transitions in the world (Clark,

A hierarchical generative model, with prediction layers at each level. Adapted with permission from Kwisthout et al. (

According to predictive processing, predictions form the basis of our interaction with the world. Due to the complexity and random nature of the world we live in, predictions made by a generative model are bound to have some degree of uncertainty. This is called
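One common way to quantify the mismatch between a predicted and an observed distribution, used in several predictive processing formalizations, is the Kullback-Leibler divergence. The sketch below is illustrative only; the outcome labels and probability numbers are invented:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in bits, between two discrete
    distributions given as dicts mapping outcomes to probabilities."""
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p if p[x] > 0)

# Predicted vs. observed distributions over three outcomes (toy numbers).
predicted = {"duck": 0.7, "goose": 0.2, "swan": 0.1}
observed  = {"duck": 0.1, "goose": 0.8, "swan": 0.1}

# A large divergence signals a large prediction error under this measure.
prediction_error = kl_divergence(observed, predicted)
```

Whether prediction error is best operationalized as KL divergence is itself a modelling choice; the point here is only that the mismatch admits a graded, quantitative measure.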

In the predictive processing literature,

In model expansion, generative models have two distinct types of hypotheses: uncommitted hypotheses {s_1, ..., s_n} (also referred to as "spare slots") and committed hypotheses {h_1, ..., h_m}. The full hypothesis space is the union of the two types of hypotheses, and after each observation e_t the distribution over this space is updated by Bayesian conditioning, i.e., P_{t+1}(·) = P_t(· | e_t).
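To make the fixed-state-space character of this update concrete, here is a minimal sketch of Bayesian updating over a hypothesis space that includes a "spare slot." All hypothesis names, priors, and likelihoods are invented for illustration:

```python
def bayes_update(prior, likelihood, observation):
    """One step of Bayesian updating over a fixed hypothesis space.
    prior: dict hypothesis -> P(h); likelihood: dict h -> dict obs -> P(obs|h)."""
    unnorm = {h: prior[h] * likelihood[h].get(observation, 0.0) for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Fixed state space: two committed hypotheses plus one uncommitted "spare slot".
prior = {"duck": 0.45, "goose": 0.45, "spare": 0.10}
likelihood = {
    "duck":  {"quack": 0.9, "honk": 0.1},
    "goose": {"quack": 0.1, "honk": 0.9},
    "spare": {"quack": 0.5, "honk": 0.5},  # spare slot: uniform likelihood
}

posterior = bayes_update(prior, likelihood, "honk")
# Probability mass shifts between hypotheses, but the set of hypotheses
# itself never changes: no update can add or remove an entry.
```

Note that every observation merely redistributes mass over the same three entries; this is the sense in which the state space is fixed.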

If the best hypothesis

We now describe two problems that emerge from a fixed-state-space model of structure learning (such as model expansion). We define the problems formally, then provide a real-world example where each problem can occur. We prove that, under a plausible assumption (that there are more categories in the world than hypotheses in a fixed state space), at least one of these two problems will inevitably occur (Fig.

A visual representation of the two problems discussed in this paper. Hypotheses are indicated with a letter h, categories with a letter c. In one panel, categories c_2 and c_3 are both represented by h_2; this is an example of category conflation. In the other panel, category c_3 is not represented by any hypothesis; this is an example of cognitive blindness.

To explain the problems, we need to introduce some definitions. We define the world as a set of categories

We define what it means for a category

The first problem that we consider is category conflation. Category conflation is the cognitive phenomenon where an agent represents two distinct categories c_1 and c_2 as instances of

Given the definition in Eq.

The second problem that we consider is cognitive blindness. Cognitive blindness is the cognitive phenomenon where a specific category
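The two definitions can be made concrete with a toy "representation map" from categories to hypotheses. The categories, hypothesis names, and map below are hypothetical, chosen only to exhibit each problem once:

```python
def diagnose(categories, represents):
    """Given a set of world categories and a map category -> hypothesis
    (or None), report conflation (two categories sharing one hypothesis)
    and blindness (a category mapped to no hypothesis)."""
    conflated, seen = set(), {}
    blind = {c for c in categories if represents.get(c) is None}
    for c in categories:
        h = represents.get(c)
        if h is not None:
            if h in seen:
                conflated.update({c, seen[h]})
            seen[h] = c
    return conflated, blind

categories = {"duck", "goose", "ostrich"}
represents = {"duck": "h1", "goose": "h1", "ostrich": None}  # hypothetical map
conflated, blind = diagnose(categories, represents)
# "duck" and "goose" are conflated under h1; "ostrich" is invisible.
```

The diagnosis is purely structural: it depends only on the shape of the map, not on any probabilities, which is why the proof below can proceed by counting.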

We will now prove that a generative model with a fixed state space will necessarily either conflate categories or be blind to categories if there are more categories in the world than the agent can represent in such a model. The assumption may seem bold, but we should consider the number of categories that we need to learn during our lifetime. All the animals, plants, tools, words, songs, types of food, faces, furniture and so much more need to be represented by a "spare slot." Furthermore, even if evolution conveniently provided us with the right number of such slots at a particular point in time, we would be missing a means of representing all the new categories that are constantly being developed and discovered around us (e.g., the notions of a "neutrino," "blockchain," "tweet," and "bitcoin" did not exist prior to 1930, 1998, 2006, and 2009 respectively).

Let

For step 1, we have the tautology that:

For step 2, suppose that

From 3, it follows that

Step 4 is equivalent to

For step 6, suppose that

From 1 and 6, it follows by the pigeonhole principle that

Step 7 is equivalent to

Combining steps 1, 2, 3, 5, 6, and 8, we conclude that every model must either suffer from category conflation or cognitive blindness, given that there are more categories in the world than there are concepts in the model.
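The skeleton of the argument can be restated compactly. The symbols C (the set of world categories), H (the fixed hypothesis space), and f (the representation map) are our notation for the quantities defined above:

```latex
\text{Assume } |C| > |H|. \text{ Let } f : C \rightharpoonup H
\text{ map each represented category to its hypothesis.} \\
\text{Case 1: } f \text{ is not total. Then some } c \in C
\text{ has no representing hypothesis: cognitive blindness.} \\
\text{Case 2: } f \text{ is total. Since } |C| > |H|,
\text{ by the pigeonhole principle } \exists\, c_1 \neq c_2
\text{ with } f(c_1) = f(c_2) \text{: category conflation.}
```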

Our proof shows that structure learning on a fixed state space, as proposed by, e.g.,

Arguably, humans may also conflate categories from time to time, and they can at times be cognitively blind to certain features in the world. Be that as it may, unlike the fixed models adopted in the model expansion account of structure learning, humans certainly do not appear to be stuck indefinitely when these states arise. When humans find themselves with conflated categories, or when they come across something unfamiliar to them, they will often readily introduce new concepts to try and explain (i.e., resolve) the problem. For instance, while a child may initially believe that ducks and geese are one and the same species "guck," they will learn at some point that the animals are different species. Similarly, an explorer who first sees an ostrich might consider all other bird hypotheses to be insufficiently explanatory, and will introduce a new hypothesis "ostrich" instead. New hypotheses added in these situations are often wrong, perhaps even most of the time, but regardless of the verity of the new concept, the cognitive problem is actively being solved. In fact, humans continually hone their structure-learning skills through formal education, scientific training, and explicit feedback from other people. For examples of successful structure learning, see the section below. So, if the predictive processing account aims to explain how the brain

In the previous section, we argued that current attempts to model structure learning in predictive processing run into problems due to their reliance on a fixed state space. Therefore, predictive processing models will need to incorporate operations that allow for changing the model structure itself in a way that cannot be explained by Bayesian (parameter) updating alone. Table

Options for structural changes to generative models formalized as Bayesian networks. Note that the possible preconditions mentioned here are mere suggestions. They should not be interpreted as a commitment to a particular stance

Structural changes | Real-life examples | Formalized transformations | Possible preconditions
---|---|---|---
Add a new variable | A child learns about the existence of bacteria. | … | An observation … to all variables
Remove a variable | A child stops believing in Santa Claus. | … | The prior probability of … is below some existence-threshold
Merge two variables | Physicists discover that "temperature" and "energy" are the same thing. | … where … | … same relations with the other variables in …
Split a variable | Psychologists realize that "sex" and "gender" are distinct. | … where … | Distinct groups of values of … relate sufficiently differently to other variables in …
Add a new causal connection | Students learn that thunderstorms are the result of unstable air mass. | … | A stable, sizeable improvement of the likelihood of observations given …
Remove a causal connection | A person concludes that the amount of water has no effect on the colour of aloe vera. | … | The difference in the likelihood of observations given … the complexity of the model compared to …
Add a new value for a variable | A headmaster learns that the "fist bump" is a new greeting. | { … } | An observation … classified as an instance of … is better explained by a new value for …
Remove a value of a variable | An incorrectly learned word is removed after a period of non-use. | { … } | Some decay of the weights of …
Merge two values of a variable | Two different sounds are concluded to be the same phoneme 'r'. | … | Minimal difference between the two likelihood functions of the values
Split a value of a variable | A culture is introduced to the blue/green distinction. | Replace … with … | … such that the merging of … exceeds some maximum variance.

The formal description takes the verbal description and translates it into a mathematical formalisation of the same change. To allow for these formal translations, we define the generative models postulated by the predictive processing account as Bayesian networks over a set of variables {V_1, ..., V_n}, each with its own set of values, and a single value as
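As a toy illustration of what such structural operations amount to (the variable names and the bare-bones network representation below are our own, not a committed formalism), the kinds of transformations listed in the table can be prototyped as follows:

```python
# Minimal Bayesian-network-like representation: each variable has a list of
# values and a set of parents. CPTs are omitted so the *structural*
# operations stay visible; none of these changes is Bayesian updating.
network = {
    "Diet":     {"values": ["healthy", "unhealthy"], "parents": []},
    "Exercise": {"values": ["often", "rarely"],      "parents": []},
    "Health":   {"values": ["good", "poor"],         "parents": ["Diet", "Exercise"]},
}

def add_variable(net, name, values):
    """Structural change: introduce a brand-new variable."""
    net[name] = {"values": list(values), "parents": []}

def add_value(net, var, value):
    """Structural change: extend a variable's state space with a new value."""
    net[var]["values"].append(value)

def add_edge(net, parent, child):
    """Structural change: add a causal connection parent -> child."""
    net[child]["parents"].append(parent)

add_variable(network, "Sleep", ["enough", "too little"])
add_value(network, "Diet", "vegetarian")
add_edge(network, "Sleep", "Health")
```

Each operation changes what the model *can* represent, which is exactly what parameter updating over a fixed structure cannot do.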

Example of a Bayesian network. The bubbles represent variables, and the arrows denote (causal) relations between those variables. Here, the variable “Diet” influences “Restaurant”: what kind of diet you are on limits the choice of restaurants you can go to. Both “Diet” and “Exercise” influence “Physical Health”: A healthy diet and more exercise lead to better physical health
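For concreteness, the network described in the figure can be written down with (invented) probability tables, with joint probabilities computed by the chain rule for Bayesian networks:

```python
# Prior probabilities for the root variables "Diet" and "Exercise";
# all numbers are invented for illustration.
p_diet = {"healthy": 0.4, "unhealthy": 0.6}
p_exercise = {"often": 0.3, "rarely": 0.7}

# Conditional probability table: P(health | diet, exercise).
p_health = {
    ("healthy", "often"):    {"good": 0.9, "poor": 0.1},
    ("healthy", "rarely"):   {"good": 0.6, "poor": 0.4},
    ("unhealthy", "often"):  {"good": 0.5, "poor": 0.5},
    ("unhealthy", "rarely"): {"good": 0.2, "poor": 0.8},
}

def joint(diet, exercise, health):
    """Chain rule for this network: P(d, e, h) = P(d) * P(e) * P(h | d, e)."""
    return p_diet[diet] * p_exercise[exercise] * p_health[(diet, exercise)][health]

p = joint("healthy", "often", "good")  # 0.4 * 0.3 * 0.9 = 0.108
```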

Lastly, the table lists a conceivable precondition for each simple change to occur (or for it to be justified to occur). These preconditions serve as inspiration and illustration, and should not be interpreted as definitive answers. The reason we provide only examples is that, for most of these changes, the preconditions have not yet been both formally defined and empirically investigated. What the real preconditions are is therefore an open question. This paper does not aim to answer these questions, because each would demand a research project, if not a body of research, to address. This, then, is the challenge: in order for predictive processing to account for structure learning proper, it must investigate the formal and empirical underpinnings of the preconditions under which our proposed changes occur in human structure learning. We stress that the ultimate goal here is not to find statistical or engineering solutions, but to formulate an explanatory scientific account of structure learning proper at Marr's (1982) computational level.

Before we close, we reflect on the relationship between the challenge that we pose and an existing approach in computational cognitive science that takes inspiration from Bayesian non-parametrics. Readers familiar with that approach may believe that this method already addresses the challenge that we pose. Here, we briefly explain how and why it does not.

Bayesian non-parametrics is a method firmly rooted in Bayesian statistics that has found appreciation in modelling cognition (Griffiths et al.,

The first limitation of Bayesian non-parametrics is that it only captures a proper subset of the possible structural changes listed in Table

The second, and arguably more important, limitation of Bayesian non-parametrics is that it commits to a very particular stance on when and why values are changed in a generative model. This view, if considered at Marr’s computational level (Marr,
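To illustrate the kind of fixed rule at stake, consider the Chinese Restaurant Process, a standard Bayesian non-parametric mechanism (our choice of example; the counts and concentration parameter below are invented). It introduces a new value with probability proportional to a concentration parameter, regardless of how well or poorly the existing values explain the current observation:

```python
import random

def crp_assign(counts, alpha, rng=random.Random(0)):
    """Chinese Restaurant Process step: assign the next observation to an
    existing value (probability proportional to its count) or to a brand-new
    value (probability proportional to alpha)."""
    n = sum(counts)
    weights = counts + [alpha]  # last slot stands for "create a new value"
    r = rng.uniform(0, n + alpha)
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if r <= cum:
            return i  # index == len(counts) means a new value is created
    return len(counts)

counts = [5, 2]  # two existing values with 5 and 2 observations each
idx = crp_assign(counts, alpha=1.0)
# When and why a new value appears is fixed entirely by this sampling rule;
# explanatory adequacy of the existing values plays no role.
```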

To summarize, while we appreciate the perspective that Bayesian non-parametrics offers on a relevant subset of the changes that make up structure learning proper, we conclude that there are still important changes that are unaccounted for. Furthermore, we argue that there are other plausible preconditions that are not captured by Bayesian non-parametrics. Our challenge to predictive processing theorists, thus, remains to investigate which of these preconditions best explains human structure learning.

Learning in predictive processing has mostly been conceptualized as

We showed that generative models with a fixed state space will inevitably lead to at least one of the two conceptual problems. These problems are

We presented an exhaustive set of minimal changes that can be made to a Bayesian generative model, defining structure learning

We take the position that current theories of structure learning should integrate the proposed proper structural changes into their explanatory toolbox. The importance of structural changes, like the ones we suggest, cannot be overstated as a means to explain the richness of human learning. If predictive processing wishes to live up to its ambition to explain all of learning, it needs to make the transition from its current formalisation of “effective” structure learning to structure learning proper.

DR was supported by a Donders Centre for Cognition grant awarded to JK, EdW was supported by a Donders Centre for Cognition grant awarded to JK and IvR. IvR acknowledges the support of a Distinguished Lorentz fellowship by the Netherlands Institute for Advanced Studies in the Humanities and Social Sciences (NIAS-KNAW) and the Lorentz Center.

At least, we do not think its challenges are any more insurmountable than, or even fundamentally different from, those faced by other approaches.

One may argue that most cognitive scientists do believe that human generative models are flexible, adapting their structural properties across time and circumstance. Whilst some predictive processing theorists may agree, their generative models do not embody this view, and instead make the contentious assumption that we noted (Smith et al.,

Our definition of structure learning differs from the more data-scientific perspective on structure learning, which focuses more on quantifying relations within data through algorithms. See, for example, Madsen et al. (

This dual updating is not a default choice. Sometimes, only the likelihood functions are updated (Smith et al.,

The Hidden Markov Models (HMM) commonly used in the predictive processing literature are a special case of Bayesian networks. For arguments for adopting the more general formalism see Kwisthout et al. (

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.