Philosophical aspects of chaos: definitions in mathematics, unpredictability, and the observational equivalence of deterministic and indeterministic descriptions.

DISSERTATION

Submitted for the degree of Doctor of Philosophy

CHARLOTTE SOPHIE WERNDL
St John's College, University of Cambridge
Cambridge, September 2009

This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. No part of this dissertation has been submitted for any other qualification. This dissertation does not exceed the word limit laid down by the Faculty of Philosophy.

Summary of the dissertation, Charlotte Sophie Werndl. Philosophical aspects of chaos: definitions in mathematics, unpredictability, and the observational equivalence of deterministic and indeterministic descriptions.

This dissertation is about some of the most important philosophical aspects of chaos research, a famous recent mathematical area of research about deterministic yet unpredictable and irregular, or even random behaviour. It consists of three parts.

First, as a basis for the dissertation, I examine notions of unpredictability in ergodic theory, and I ask what they tell us about the justification and formulation of mathematical definitions. The main account of the actual practice of justifying mathematical definitions is Lakatos's account of proof-generated definitions. By investigating notions of unpredictability in ergodic theory, I present two previously unidentified but common ways of justifying definitions. Furthermore, I criticise Lakatos's account as being limited: it does not acknowledge the interrelationships between the different kinds of justification, and it ignores the fact that various kinds of justification, not only proof-generation, are important.

Second, unpredictability is a central theme in chaos research, and it is widely claimed that chaotic systems exhibit a kind of unpredictability which is specific to chaos.
However, I argue that the existing answers to the question 'What is the unpredictability specific to chaos?' are wrong. I then go on to propose a novel answer, viz. the unpredictability specific to chaos is that for predicting any event all sufficiently past events are approximately probabilistically irrelevant.

Third, given that chaotic systems are strongly unpredictable, one is led to ask: are deterministic and indeterministic descriptions observationally equivalent, i.e., do they give the same predictions? I treat this question for measure-theoretic deterministic systems and stochastic processes, both of which are ubiquitous in science. I discuss and formalise the notion of observational equivalence. By proving results in ergodic theory, I first show that for many measure-preserving deterministic descriptions there is an observationally equivalent indeterministic description, and that for all indeterministic descriptions there is an observationally equivalent deterministic description. I go on to show that strongly chaotic systems are even observationally equivalent to some of the most random stochastic processes encountered in science. For instance, strongly chaotic systems give the same predictions at every observation level as Markov processes or semi-Markov processes. All this illustrates that even kinds of deterministic and indeterministic descriptions which, intuitively, seem to give very different predictions are observationally equivalent. Finally, I criticise the claims in the previous philosophical literature on observational equivalence.

Contents

Acknowledgements
1 Introduction
2 Setting the stage
  2.1 Deterministic systems
  2.2 Stochastic processes
3 Justifying definitions in mathematics
  3.1 Introduction
  3.2 Lakatos's proof-generated definitions
  3.3 Case study: notions of unpredictability in ergodic theory
  3.4 Kinds of justification of definitions
    3.4.1 Natural-world justification
    3.4.2 Condition justification
    3.4.3 Redundancy justification
    3.4.4 Occurrence of the kinds of justification
  3.5 Interrelationships between the kinds of justification
    3.5.1 One argument
    3.5.2 Different arguments
  3.6 Assessment of Lakatos's ideas on proof-generated definitions
  3.7 Conclusion
4 The unpredictability specific to chaos
  4.1 Introduction
  4.2 Unpredictability
  4.3 Chaos
    4.3.1 Defining chaos
    4.3.2 Defining chaos via strong mixing
  4.4 Criticism of answers in the literature
    4.4.1 Asymptotically unpredictable?
    4.4.2 Unpredictable due to rapid or exponential divergence of solutions?
    4.4.3 Macro-predictable and micro-unpredictable?
  4.5 A kind of unpredictability specific to chaos
    4.5.1 Approximate probabilistic irrelevance
    4.5.2 Sufficiently past events are approximately probabilistically irrelevant for predictions
  4.6 Conclusion
5 Determinism versus indeterminism
  5.1 Introduction
  5.2 Basic observational equivalence
    5.2.1 Deterministic systems simulated by stochastic processes
    5.2.2 Stochastic processes simulated by deterministic systems
    5.2.3 A mathematical definition of observational equivalence
  5.3 Advanced observational equivalence I
    5.3.1 Deterministic systems used in science which simulate stochastic processes used in science
  5.4 Advanced observational equivalence II
    5.4.1 The meaning of simulation at every observation level
    5.4.2 Stochastic processes used in science which simulate deterministic systems used in science at every observation level
  5.5 Previous philosophical discussion
    5.5.1 The significance of Theorem 5 and Theorem 10
    5.5.2 The role of chaotic behaviour
    5.5.3 Is the deterministic or the indeterministic description better?
  5.6 Conclusion
  5.7 Appendix: Proofs
    5.7.1 Proof of Theorem 1
    5.7.2 Proof of Theorem 2
    5.7.3 Proof of Theorem 3
    5.7.4 Proof of Theorem 4
    5.7.5 Proof of Proposition 1
    5.7.6 Proof of Theorem 5
    5.7.7 Proof of Proposition 2
    5.7.8 Proof of Theorem 8
    5.7.9 Proof of Theorem 9
    5.7.10 Proof of Theorem 12
    5.7.11 Proof of Theorem 14
    5.7.12 Proof of Theorem 15
    5.7.13 Proof of Proposition 3
    5.7.14 Proof of Proposition 4
6 Concluding remarks
List of Figures
Bibliography

Acknowledgements

First and foremost, I want to thank my supervisor Jeremy Butterfield. He really has been one of the best supervisors you can ever wish to have. I am indebted for his helpful comments, for his continued support, and, in particular, for getting up at 3am to read my work. I already miss our meetings! I am grateful to my shadow supervisor Peter Smith for fruitful suggestions and for his strategic advice and help. And I want to thank Roman Frigg for stimulating discussions and his encouragement. For valuable comments I also want to thank Robert Bishop, Adam Caulton, Stephan Hartmann, Cymra Haskell, Franz Huber, Brendan Larvor, Hannes Leitgeb, Mary Leng, Thomas Müller, Donald Ornstein, Amy Radunskaya, Maximilian Thaler, Jos Uffink and Paul Weingartner. I am grateful to St John's College Cambridge for financial support, which made it possible for me to study at the University of Cambridge. Finally, I want to express my gratitude to my fiancé Franz for his continued support and for showing me what is really important in life. I also want to thank my sister Kristina and my parents for their understanding and their encouragement.

Chapter 1
Introduction

This dissertation is about some of the most important philosophical aspects of chaos as understood in the mathematical field of chaos research. A system is deterministic just in case the state of the system at one time determines the state of the system at all times. And, intuitively speaking, a chaotic system is deterministic yet still shows unpredictable and irregular, or even random behaviour.
Examples of what is now called 'chaotic behaviour' were already discovered at the end of the nineteenth century. However, only from the 1960s onwards, catalysed by the development of electronic computers, was chaotic behaviour systematically investigated. An area of research called 'chaos research' developed, and chaotic behaviour was examined in several branches of mathematics and theoretical physics, such as in ergodic theory and topological dynamical systems theory. At the end of the twentieth century chaos research boomed, and important results continue to be produced. Because systems in Newtonian mechanics and statistical mechanics can show chaotic behaviour, chaos research has led to a renewed interest in these fields. Chaos research is now widely regarded as one of the most important scientific achievements of the second half of the twentieth century (cf. Aubin & Dahan-Dalmedico 2002).

In the sciences chaotic systems are employed to model many phenomena, from the movement of planets, the motion of billiard balls, the motion of gases, the spinning of waterwheels, turbulence, chemical reactions, weather dynamics, climate dynamics and population dynamics to the dynamics of the heartbeat (cf. Chernov & Markarian 2006; Hénon 1976; Kolář & Gumbs 1992; Laskar 1994; Lissauer 1999; Lorenz 1963; Lorenz 1964; May 1976; Ruelle & Takens 1971; Scott 1991; Skinner, Goldberger, Mayer-Kress & Ideker 1997; Szász 2000). In some contexts, such as for waterwheels, chaotic descriptions give relatively accurate predictions. Yet often, such as in population ecology and climate dynamics, the phenomena are so complicated that all scientists are able to derive are very simple chaotic models which help us to understand phenomena, but not so much to predict them.

Let me give an example of a chaotic system, namely a so-called billiard system with convex obstacles.

Figure 1.1: A billiard system with a convex obstacle
This is a system where a ball moves with constant speed on a rectangular table on which there are a finite number of convex obstacles with a smooth boundary. It is assumed that there is no friction and that there are perfectly elastic collisions (cf. Ornstein & Galavotti 1974). Figure 1.1 illustrates two key characteristics of chaotic behaviour with the help of the example of a billiard system with one convex obstacle. First, Figure 1.1(a) shows that solutions which start close together eventually separate considerably, causing the motion to be unpredictable. Second, Figure 1.1(b) illustrates that the motion exhibits irregular behaviour in the sense that a solution eventually visits every region on the billiard table.

From a philosophical point of view chaotic behaviour is relevant for the following reasons. First, unpredictability is a crucial philosophical theme because we want to be aware of the limitations of our predictions, and chaos research contributes to our understanding of the kinds of unpredictability scientists can encounter. Second, randomness is also a central theme in philosophy, and chaos research has led to a better understanding of the possible randomness of deterministic behaviour. Third, the question of whether the world is deterministic or indeterministic has always been a topic of philosophical debate. And chaos research provides new insights about how deterministic behaviour compares to indeterministic behaviour. Fourth, it is one of the main questions in the philosophy of science and also in metaphysics how probabilities can be understood, and chaos research sheds light on the emergence of probabilities and has suggested new interpretations of probabilities. Fifth and finally, chaotic behaviour is also of interest to foundational problems in physics.
In particular, there is the question whether chaos research can contribute to solving some of the vexing problems in statistical mechanics, such as how to derive an analogue of the second law of thermodynamics. Moreover, there is the hope that chaotic behaviour will help us to understand the emergence of classical physics from the quantum world.

Much philosophical research will be needed to answer all the philosophical questions raised by chaos research. This dissertation will mainly contribute to our understanding of unpredictability and of the topic of whether phenomena are deterministic or indeterministic, that is, the first and the third point, but it will also touch on the other points.

Chaos research is a part of dynamical systems theory, a general theory of deterministic behaviour. Dynamical systems theory broadly divides into two approaches: measure-theoretic dynamical systems theory, also called 'ergodic theory', and topological dynamical systems theory. This dissertation will be mainly about ergodic theory, although sometimes I will also invoke notions of topological dynamical systems theory. Ergodic theory describes not only chaotic behaviour but a wide class of deterministic behaviour: namely, it deals with all those deterministic systems which are endowed with a measure. For instance, all deterministic systems in Newtonian mechanics and statistical mechanics can be described by ergodic theory.

I focus on ergodic theory for two reasons. First and foremost, in ergodic theory deterministic systems are endowed with a measure, which can be interpreted as a probability density. As a consequence, only the measure-theoretic perspective allows for a connection to probability theory, to information theory, to extant probabilistic accounts of randomness, and to the theory of stochastic processes, and hence allows for a comparison of deterministic and indeterministic descriptions.
The notions of probability, randomness and determinism are central philosophical themes. Thus I believe that ergodic theory is a richer and more interesting field for philosophical investigations than topological dynamical systems theory. Second, in the philosophical literature on chaos there has been little work on the measure-theoretic approach and more work on the topological approach (e.g., Bishop 2003, Bishop 2008, Kellert 1993, Schurz 1996, Smith 1998, Stone 1989). One reason for this might be that ergodic theory is technically harder than topological dynamical systems theory. So even though ergodic theory seems to be a richer field for philosophical investigations, there has been less work on it.

The general outline of this dissertation on some of the most important philosophical aspects of chaos is as follows. First, I will examine mathematical notions of unpredictability in ergodic theory, and this examination will lead me to draw conclusions about the actual practice of how mathematical definitions are justified. On this basis, second, I will tackle the question of what kind of unpredictability is specific to chaotic systems. Finally, third, the fact that deterministic systems can be unpredictable and even random prompts the question of whether deterministic descriptions in ergodic theory and indeterministic descriptions can be observationally equivalent. I will reflect on this question, and, in particular, I will investigate what kinds of results on observational equivalence hold for chaotic behaviour.

More specifically, in Chapter 2 of this dissertation I will introduce the basic notions on which the discussion of this dissertation will be based, most notably deterministic systems and stochastic processes. In Chapter 3 I will investigate historically how notions of unpredictability in ergodic theory were formed and how they have been justified in the literature.
We will see that there is hardly any philosophical research on the actual practice of how mathematical definitions are justified apart from Lakatos (1976, 1978). On the basis of my case study of notions of unpredictability in ergodic theory, I will identify novel ways in which mathematical definitions can be justified, and I will criticise Lakatos's account of the justification of definitions. The discussion of notions of unpredictability in ergodic theory also serves the purpose of providing a background for the following chapters, where these definitions will be applied.

With this background I am ready to embark in Chapter 4 on the question of what is the unpredictability specific to chaos. From the beginning of chaos research, the unpredictability of chaotic systems has been of central interest, and so this question is one of the key questions about chaos and unpredictability. I will discuss the existing answers in the literature, and I will argue that they do not fit the bill. This prompts the search for an alternative answer, and I will propose a novel and general answer.

Given that deterministic systems can be unpredictable and even random, one can go a step further and ask: are deterministic and indeterministic descriptions observationally equivalent; that is, is it possible to describe some phenomena by deterministic as well as indeterministic descriptions? I will discuss this question in Chapter 5 with a special emphasis on observational equivalence involving chaotic behaviour. Once ergodic theory and the modern theory of stochastic processes had been developed, it was realised that by combining these two theories one can compare measure-theoretic deterministic descriptions with stochastic processes (the main indeterministic descriptions used in the sciences). Hence some mathematical results have been proven which shed light on the observational equivalence of deterministic and indeterministic descriptions.
I will review these results, which, surprisingly, have received hardly any philosophical attention, and I will extend them by proving several new theorems in ergodic theory. Furthermore, I will philosophically assess all these results on observational equivalence. Then in Chapter 6 I will briefly summarise the findings of this dissertation, and I will conclude with an outlook for future research in this area.

Finally, let me point to two issues this dissertation will not be concerned with. I will be concerned with deterministic descriptions in dynamical systems theory, which can be regarded as a special kind of description of classical physics. Therefore, first, I will not be concerned with quantum theory. In particular, I will not treat the question of how the classical realm emerges from the quantum world. There is, of course, a vast literature on this controversial question. Let me just cite two philosophical works that focus on the connection with chaos theory, namely Belot & Earman (1997) and Landsman (2007, sections 5-7). Second, deterministic descriptions in dynamical systems theory are mathematical models. I will explain in some detail in this dissertation that these mathematical models are often used in the sciences to model phenomena. But I will not tackle the questions of what constitutes a scientific model and whether scientific models accurately depict reality. Again, there is, of course, a vast literature on this: let me just cite a recent survey, Frigg & Hartmann (2006).

Let me now introduce the notions needed for the discussion in this dissertation.

Chapter 2
Setting the stage

In this chapter, in section 2.1, I will discuss the deterministic descriptions which will be needed throughout the dissertation, namely measure-theoretic deterministic systems and topological deterministic systems. After that, in section 2.2, I will introduce stochastic processes.
Apart from the notion of a Bernoulli process, which will be important throughout the dissertation, the notions introduced in section 2.2 will only be needed in Chapter 5.

2.1 Deterministic systems

In this dissertation I will be mainly concerned with measure-theoretic deterministic systems, but a few times also with topological deterministic systems; both kinds of deterministic descriptions are drawn from dynamical systems theory. Generally, deterministic systems as described in dynamical systems theory often model natural systems. Typically, a deterministic system is used to model a phenomenon that is only one among many phenomena which take place in the actual world. The assumption is made that the phenomenon under consideration is isolated from its environment. Of course, in the actual world this is not the case. But nevertheless the actual world is such that many phenomena can effectively be treated as isolated, and hence modeling phenomena with deterministic systems has proven to be very successful.

The two main elements of every deterministic system in dynamical systems theory are a set M of all possible states m, the phase space of the deterministic system, and a family of functions Tt : M → M mapping the phase space to itself, called the evolution functions. The parameter t is time, and Tt(m) is the state of the system that started in initial state m after t units of time. If t is an integer (i.e., t ∈ Z), the dynamics of the system is discrete and the system is said to be a discrete deterministic system. If t is a real number (i.e., t ∈ R), the dynamics of the system is continuous and the system is called a continuous deterministic system. The family Tt defining the dynamics of the deterministic system must have the structure of a group, where T_{t1+t2}(m) = T_{t2}(T_{t1}(m)) for all m ∈ M and for all t1, t2 either in Z (discrete time) or R (continuous time).
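As a supplement (my own illustration, not part of the dissertation), the group structure of the evolution functions can be sketched numerically. The map below, a rotation of the circle [0, 1) by a hypothetical angle ALPHA, is a minimal choice of a bijective evolution function; it verifies the group property T_{t1+t2}(m) = T_{t2}(T_{t1}(m)) for a sample state and sample times.

```python
# Minimal sketch (not from the dissertation): the group property of the
# evolution functions Tt, illustrated with a rotation of the circle [0, 1).

ALPHA = 0.2137  # hypothetical rotation angle, any irrational-like value works


def T(m, t=1):
    """Evolution function Tt for integer t: t-fold rotation of m by ALPHA."""
    return (m + t * ALPHA) % 1.0


m = 0.5
t1, t2 = 3, 4

# Group property: T_{t1+t2}(m) = T_{t2}(T_{t1}(m)), up to floating-point error.
assert abs(T(m, t1 + t2) - T(T(m, t1), t2)) < 1e-12

# Determinism in the canonical sense: the state at one time fixes the state
# at all future AND past times, since each Tt is invertible (T_{-t} undoes Tt).
assert abs(T(T(m, 7), -7) - m) < 1e-12
```

The rotation is in fact also measure-preserving with respect to the Lebesgue measure on [0, 1), so it is one of the simplest instances of the measure-preserving systems defined later in this section.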
For discrete deterministic systems all Tt are generated as iterative applications of the single bijective map T = T1 : M → M, because Tt(m) = T^t(m); and I refer to the T^t(m) as iterates of m. The discrete solution through m, m ∈ M, is the sequence s_m = (..., T^{-1}(m), m, T^{1}(m), ...). The continuous solution through m, m ∈ M, is the function s_m : R → M, s_m(t) = T(t, m), where T(t, m) = Tt(m). Continuous deterministic systems are also called flows, and they often arise as solutions to differential equations of motion, such as Newton's laws of motion.

It follows that all discrete and continuous deterministic systems are deterministic according to the canonical definition: any two solutions that agree at one instant of time agree at all future and past times (Butterfield 2005, Earman 1971, Earman 1986, Montague 1962).

I will mainly be concerned with measure-theoretic deterministic systems, but sometimes I will also need topological deterministic systems. So let me briefly introduce topological deterministic systems and then turn to measure-theoretic deterministic systems. A topological deterministic system is one that has a metric defined on M (cf. Petersen 1983, pp. 2–3). More specifically:

Definition 1 A discrete topological deterministic system is a triple (M, d, T) where M (the phase space) is a set, d is a metric on M, and T : M → M (the evolution function) is a bijective and continuous function.

Definition 2 A continuous topological deterministic system is a triple (M, d, Tt) where M (the phase space) is a set, d is a metric on M, and Tt : M → M (the evolution functions), t ∈ R, is a family of continuous functions which have the structure of the above group.

Assume that a continuous topological deterministic system (M, d, Tt) is given. Then (M, d, Tt0) for t0 ∈ R arbitrary, t0 ≠ 0, is a discrete topological deterministic system.
The evolution function of this discrete system is Tt0 : M → M, which means that you look at the continuous topological deterministic system (M, d, Tt) at points of time nt0, n ∈ Z. And I call these discrete deterministic systems (M, d, Tt0) the discrete versions of the continuous topological deterministic system (M, d, Tt).[1]

[1] Alternatively, continuous-time deterministic systems can be discretised by considering the successive hits of a solution on a suitable Poincaré section. All I say about discrete versions of continuous deterministic systems also holds true for discrete deterministic systems arising in this way (Berkovitz, Frigg & Kronz 2006, pp. 680–685; Smith 1998, pp. 92–93).

It is generally assumed in the literature (e.g., Devaney 1986, p. 51) that topological deterministic systems provide a possible framework for characterising chaos. This makes intuitive sense because it is often imagined that in the case of chaotic behaviour there is some way of measuring the distance between states in the phase space M, and thus that there is a metric defined on M. Moreover, to the best of my knowledge, there is always a natural metric for paradigmatic chaotic systems. Often the phase space is simply a subset of R^n, n ≥ 1, and the metric is the standard Euclidean metric.

A measure-theoretic deterministic system is one whose phase space is endowed with a measure (cf. Cornfeld, Fomin & Sinai 1982, pp. 3–5). Before I can proceed, recall the following canonical definitions. A measurable space is a pair (M, ΣM) where M is a set and ΣM is a σ-algebra on M. A measure space is a triple (M, ΣM, µ) where M is a set, ΣM is a σ-algebra on M and µ is a measure on (M, ΣM). For simplicity and to avoid some technical problems, I assume that any measure space is complete, i.e., every subset of a measurable set of measure zero is measurable. Furthermore, I assume that any measure space (M, ΣM, µ) is a Lebesgue space;[2] this is standard in the context of measure-theoretic dynamical systems theory.[3]

[2] A measure space (M, ΣM, µ) is called a Lebesgue space if, and only if, there is a measure space (K, ΣK, ν), where K = [a, b) ⊆ R is a (possibly empty) interval, there is a countable set of points mi ∈ M, i ≥ 1, there is a K̂ ⊆ K with ν(K̂) = 1, there is an M̂ ⊆ M with µ(M̂) = 1, and there is a bijective function φ : M̂ \ ∪i≥1{mi} → K̂ such that (i) φ(A) ∈ ΣK for all A ∈ ΣM with A ⊆ M̂ \ ∪i≥1{mi}, and φ^{-1}(B) ∈ ΣM for all B ∈ ΣK with B ⊆ K̂; and (ii) ν(φ(A)) = µ(A) for all A ∈ ΣM with A ⊆ M̂ \ ∪i≥1{mi} (see Petersen 1983, pp. 16–17).

[3] These two assumptions are not restrictive for the following reasons: first, every measure space can easily be made complete. Second, every example of a measure space which is of interest in the applications of dynamical systems theory, and more generally in the development of the mathematical theory of measure-theoretic dynamical systems, is a Lebesgue space (see Petersen 1983, Rudolph 1990).

Now I can define:

Definition 3 A discrete measure-theoretic deterministic system is a quadruple (M, ΣM, µ, T) where (M, ΣM, µ) is a measure space with µ(M) = 1 (M is the phase space) and T : M → M (the evolution function) is a bijective measurable function such that T^{-1} is also measurable.

Definition 4 A continuous measure-theoretic deterministic system is a quadruple (M, ΣM, µ, Tt) where (M, ΣM, µ) is a measure space with µ(M) = 1 (M is the phase space) and Tt : M → M (the evolution functions), t ∈ R, is a family of measurable functions which have the structure of the above group such that Tt^{-1} is also measurable for all t ∈ R.

I follow the common assumption that the measure of measure-theoretic deterministic systems is normalised: µ(M) = 1. The motivation for this is that normalised measures are probability measures, making it possible to use probability calculus. Several interpretations suggest interpreting the measure as probability. This is not one of the main topics of this dissertation, but I shall briefly explain at the end of this section some of the most popular interpretations which justify interpreting the measure as probability.

Given a discrete or continuous measure-theoretic deterministic system, when a property holds for all states m ∈ M̂ with µ(M \ M̂) = 0, I will say that the property holds for almost all points in M or that the property holds except for a set of measure zero.

Given a continuous measure-theoretic deterministic system (M, ΣM, µ, Tt), then (M, ΣM, µ, Tt0) for t0 ∈ R arbitrary, t0 ≠ 0, is a discrete measure-theoretic deterministic system. And I call these discrete deterministic systems (M, ΣM, µ, Tt0) the discrete versions of the continuous measure-theoretic deterministic system (M, ΣM, µ, Tt).

When observing a measure-theoretic deterministic system (M, ΣM, µ, T) or (M, ΣM, µ, Tt), one observes a value functionally dependent on, but maybe different from, the actual state. Hence observations can be modeled by an observation function, i.e., a measurable function Φ : M → MO from (M, ΣM) to (MO, ΣMO), where MO is a set and (MO, ΣMO) is a measurable space (cf. Ornstein & Weiss 1991, p. 16).

I will often be concerned with measure-preserving deterministic systems, defined as follows (cf. Cornfeld et al. 1982, pp. 3–5):

Definition 5 A discrete measure-preserving deterministic system is a discrete measure-theoretic deterministic system (M, ΣM, µ, T) where the measure µ is invariant, i.e., µ(T(A)) = µ(A) for all A ∈ ΣM.
A continuous measure-preserving deterministic system is a continuous measure-theoretic deterministic system (M, ΣM, µ, Tt) where the measure µ is invariant, i.e., µ(Tt(A)) = µ(A) for all A ∈ ΣM and all t ∈ R.

Measure-preserving deterministic systems are important models in physics but are also important in other sciences such as biology, geology etc. For, first, all deterministic Hamiltonian systems and deterministic statistical-mechanical systems, and their discrete versions, are measure-preserving; and the relevant invariant measure is the Lebesgue measure or a close cousin of it (Petersen 1983, pp. 5–6). A measure-preserving deterministic system is called volume-preserving if, and only if, the Lebesgue measure or a normalised Lebesgue measure is the invariant measure. A measure-preserving deterministic system which fails to be volume-preserving is called dissipative. Dissipative systems can also often be modeled as measure-preserving deterministic systems. More precisely, if (M, ΣM, λ, T) or (M, ΣM, λ, Tt) is dissipative (where λ is the Lebesgue measure), then often there exists a measure µ ≠ λ such that (M, ΣM, µ, T) or (M, ΣM, µ, Tt) is measure-preserving. The Lorenz system is a case in point (see Example 3, which will be introduced later in this section) (Luzzatto, Melbourne & Paccaut 2005). Generally, the long-term behaviour of a large class of deterministic systems can be modeled by measure-preserving deterministic systems (Eckmann & Ruelle 1985), and the potential scope of measure-preserving deterministic systems is quite wide: although some evolution functions cannot be modeled by invariant measures, for very wide classes of evolution functions invariant measures have been proven to exist. For instance, if T is a continuous function on a compact metric space, then there exists at least one invariant measure (Mañé 1987, p.
52).4 It is generally agreed in the literature that measure-preserving determi- nistic systems provide a possible framework for characterising chaos (e.g., Eckmann & Ruelle 1985). As already pointed out, for volume-preserving de- terministic systems the relevant invariant measure is the Lebesgue measure or a normalized Lebesgue measure. For dissipative deterministic systems, to the best of my knowledge, all systems that have ever been identified as chaotic have, or are believed to have, a relevant invariant measure—in the light of the following considerations. Many chaotic systems have attractors. For a discrete topological de- terministic system (M,d, T ) the set Λ ⊂ M is an attractor if, and only if, (i) T (Λ) = Λ; (ii) there is a neighbourhood U ⊃ Λ, called a ‘basin of attraction’, such that all solutions are attracted by Λ, i.e., for all y in U limt→∞ inf{d(T t(y), x) |x ∈ Λ} = 0; and (iii) no proper subset of Λ satisfies (i) and (ii). For a continuous topological deterministic system (M,d, Tt) the set Λ ⊂ M is an attractor if, and only if, (i) Tt(Λ) = Λ for all t ∈ R; (ii) there is a neighbourhood U ⊃ Λ, called a ‘basin of attraction’, such that for all y in U limt→∞ inf{d(Tt(y), x) |x ∈ Λ} = 0; and (iii) no proper subset of Λ satisfies (i) and (ii). Liouville’s theorem implies that only dissipative systems can have attractors (Schuster & Just 2005, p. 162).5 As I will show in section 4.3, for chaotic systems the evolution of any bundle of initial con- 4Topological deterministic systems and measure-theoretic deterministic systems are usually related in the following way: the σ-algebra ΣM of a measure-theoretic determi- nistic system is or at least includes the Borel σ-algebra of the metric space (M,d) of the topological deterministic system. The Borel σ-algebra of (M,d) is the σ-algebra generated by all open sets of M (cf. Man˜e´ 1987, pp. 2–3). Intuitively, it is the σ-algebra which arises from the metric space (M,d). 
5Some other definitions of ‘attractor’ allow that volume-preserving deterministic sys- tems can have attractors; yet these definitions are not standard in our context. CHAPTER 2. SETTING THE STAGE 21 ditions eventually enters every region of phase space. This is impossible for the motion approaching an attractor Λ since the attracted solutions never return arbitrarily close to where they originated. Hence chaotic behaviour can only occur on Λ. The chaotic motion is described by a deterministic system with phase space Λ, and the invariant measure is only defined on Λ. Generally, an attractor on which the motion is chaotic is called a ‘strange attractor ’. Of course, in practice one is often concerned with solutions approaching a strange attractor. Yet after a sufficiently long duration either the solutions enter the attractor or come arbitrarily near to the attractor. In the latter case, since the dynamics is typically continuous, when the solutions are suf- ficiently near to the attractor, they essentially behave like the solutions on the attractor. And in applications such solutions which are sufficiently near to a strange attractor are considered to be chaotic for practical purposes. In particular, in the latter case, the unpredictability or randomness of solutions very near to the attractor is practically indistinguishable from the unpre- dictability or randomness on the attractor. Consequently, for characterising the unpredictability or randomness of motion dominated by strange attrac- tors, it is widely acknowledged that it suffices to consider the dynamics on attractors, where relevant invariant measures can be defined (cf. Eckmann & Ruelle 1985). The following examples of a discrete measure-preserving deterministic system and the following two examples of a continuous measure-preserving determi- nistic system will accompany us throughout the dissertation. They are all also paradigmatic examples of chaotic systems. Example 1: The baker’s system. 
On the set M = [0, 1] × [0, 1] \ D, where D = {(x, y) ∈ [0, 1] × [0, 1] | x = j/2^n or y = j/2^n, n ∈ N, 0 ≤ j ≤ 2^n}, consider

T(x, y) = (2x, y/2) if 0 ≤ x < 1/2;  (2x − 1, (y + 1)/2) if 1/2 ≤ x ≤ 1.   (2.1)

I exclude the set D from [0, 1] × [0, 1] in order to be able to define a bijective function T.

Figure 2.1: The baker's system

Figure 2.1 illustrates that the baker's system first stretches the set M to twice its length and half its width; then it cuts the rectangle obtained on 0 ≤ y ≤ 1/2 in half and places the right half on top of the left. For the Lebesgue σ-algebra ΣM on M and the Lebesgue measure µ one obtains the measure-preserving deterministic system (M, ΣM, µ, T). This system also has physical meaning. It describes a particle which starts out at initial position (x, y) in M and moves with constant speed in a part of three-dimensional space which contains M. There it bounces on several mirrors, causing it to return to M at T(x, y) (cf. Pitowsky 1995, p. 166).

Example 2: A billiard system with convex obstacles.

Our first example of a continuous measure-preserving deterministic system is a billiard system with convex obstacles, as discussed in the Introduction (Chapter 1, see Figure 1.1). This is a system where a ball moves with constant speed on a rectangular table with a finite number of convex obstacles. It is assumed that there is no friction and that the collisions are perfectly elastic. Here M is the set of all possible positions and directions of the ball, ΣM is the Lebesgue σ-algebra on M, µ is the Lebesgue measure, and Tt(m), where m = (p, q), gives the position and the direction after t time units of the ball that starts out at initial position q and initial direction p (for details, see Ornstein & Gallavotti 1974).

Example 3: The Lorenz system.

Our second example of a continuous measure-preserving deterministic system is the Lorenz system. Consider the Lorenz equations

dx(t)/dt = σ(y(t) − x(t)),
dy(t)/dt = rx(t) − y(t) − x(t)z(t),   (2.2)
dz(t)/dt = x(t)y(t) − bz(t),

for the parameter values σ = 10, r = 28 and b = 8/3. These are the parameters Lorenz (1963) considered when proposing the Lorenz system as a simplified model of weather dynamics. The Lorenz equations have also been used to model waterwheels, and it has been found that the Lorenz system gives relatively accurate predictions of waterwheels (cf. Hilborn 2000; Kolář & Gumbs 1992; Strogatz 1994). For these parameter values it is proven that there is a strange attractor of Lebesgue measure zero such that all solutions originating in the basin of attraction U, which is of positive Lebesgue measure, approach but never enter the attractor. Hence the dynamics is modeled by a measure-preserving deterministic system, the phase space of which is the attractor (Luzzatto et al. 2005).

Figure 2.2: Numerical solution of the Lorenz equations for σ = 10, r = 28, b = 8/3

Figure 2.2 shows a numerical solution of these equations; one can vaguely discern the shape of the strange attractor, known as the Lorenz attractor, because the solution spirals toward it.

I have pointed out above that the measure of measure-theoretic deterministic systems is commonly interpreted as a probability density. This deep issue has been discussed in statistical mechanics but is not one of the main topics of this dissertation. But let me mention two interpretations that naturally suggest interpreting measures as probabilities: according to the time-average interpretation, the measure of a set A is the proportion of time the deterministic system spends in A; and according to the ensemble interpretation, the measure of a set A at time t is the fraction of solutions starting from some given set of initial conditions that are in A at t (see Falconer 1990, p. 254; Lavis 2010).
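The time-average interpretation just described can be made concrete with a small computation. The example below is my own toy illustration rather than one of the systems discussed in the text: it iterates the irrational rotation T(x) = x + √2 (mod 1) on [0, 1), an ergodic measure-preserving transformation, and checks that the finite-time version of the long-run time-average, i.e., the fraction of time a solution spends in A = [0, 1/2), approaches the Lebesgue measure µ(A) = 1/2.

```python
import math

def long_run_time_average(m, in_A, evolve, steps):
    # finite-time version of L_A(m): the fraction of the first `steps`
    # iterates of the initial condition m that lie in the set A
    hits, x = 0, m
    for _ in range(steps):
        hits += in_A(x)
        x = evolve(x)
    return hits / steps

ALPHA = math.sqrt(2) % 1.0            # irrational rotation angle
evolve = lambda x: (x + ALPHA) % 1.0  # T(x) = x + ALPHA (mod 1)
in_A = lambda x: x < 0.5              # A = [0, 1/2), with mu(A) = 1/2

L = long_run_time_average(0.123, in_A, evolve, 100_000)
# for this ergodic system L approximates mu(A) = 0.5,
# here in fact for every initial condition
```

For a chaotic system such as the baker's system the same convergence holds for Lebesgue-almost all initial conditions; the rotation is used here only because it is numerically well behaved in floating-point arithmetic.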
Let me say more about the time-average interpretation. For a discrete measure-preserving deterministic system (M, ΣM, µ, T) the long-run time-average of a solution starting at m relative to A, m ∈ M, A ∈ ΣM, is:

LA(m) = lim_{t→∞} (1/t) ∑_{i=0}^{t−1} χA(T^i(m)),   (2.3)

where χA(m) is the characteristic function of A.6 For a continuous measure-preserving deterministic system (M, ΣM, µ, Tt) the long-run time-average of a solution starting at m relative to A, m ∈ M, A ∈ ΣM, is:

LA(m) = lim_{t→∞} (1/t) ∫_0^t χA(Tτ(m)) dτ,   (2.4)

where χA(m) is the characteristic function of A and the measure on the time axis τ ∈ R+0 is the Lebesgue measure. For discrete and continuous time it follows from Birkhoff's (1931) so-called pointwise ergodic theorem that LA(m) exists for almost all states m ∈ M.

6 That is, χA(m) = 1 for m ∈ A and χA(m) = 0 for m ∈ M \ A.

Now from an observational viewpoint it is natural to demand that the long-run time-averages of almost all solutions (relative to the Lebesgue measure) of a deterministic system approximate the measure of the system. Such measures are called 'physical measures'. And, clearly, physical measures can be interpreted as probability densities in terms of the time-average interpretation of probability. Let us look at physical measures in more detail. We need to distinguish two methods by which they can be specified.

For discrete measure-preserving deterministic systems (M, ΣM, µ, T) with λ(M) > 0 or continuous measure-preserving deterministic systems (M, ΣM, µ, Tt) with λ(M) > 0, where λ is the Lebesgue measure, the following method identifies physical measures. (M1): (i) Take any A ∈ ΣM. (ii) Take an initial condition m ∈ M. (iii) Consider LA(m), the long-run time-average of a solution starting at m relative to A. (iv) Consider GA = {m ∈ M | LA(m) exists and LA(m) = µ(A)}.
Then µ is a physical measure if, and only if, for any A ∈ ΣM, Lebesgue-almost all initial conditions approximate the measure of A, i.e., λ(GA) = λ(M). If such a measure exists, it is unique (cf. Eckmann & Ruelle 1985, Young 2002).

What are physical measures for attractors (see the definition on p. 20)? I will be concerned with two kinds of attractors. First, there is the case where all solutions eventually enter an attractor Λ with λ(Λ) > 0. Clearly, here method (M1) can be applied directly, i.e., for M = Λ. Second, it can be that the solutions approach but never enter an attractor Λ with λ(Λ) = 0 but λ(U) > 0, where U is the basin of attraction of Λ. Here the method has to be slightly modified. (M2): (i) Take any measurable region A ⊆ Λ. (ii) Take an initial condition m ∈ U. (iii) Consider L̄A(m), the long-run time-average that the solution originating at m spends close to A. (iv) Consider ḠA = {m ∈ U | L̄A(m) exists and L̄A(m) = µ(A)}. Then µ is a physical measure if, and only if, for all A ∈ ΣM, Lebesgue-almost all initial conditions in U approximate the measure of A, i.e., λ(ḠA) = λ(U). If such a measure exists, it is unique (for more details, see Eckmann & Ruelle 1985, Young 2002).

To illustrate the time-average interpretation for chaotic systems, consider the baker's system (Example 1). Choose an initial condition m in the phase space M and draw a histogram of the fraction of iterates of m (up to an iterate T^t(m), t ≥ 1) which lie in a particular part of M. Then, for Lebesgue-almost all initial conditions we choose in M, we obtain what is illustrated in Figure 2.3: as t goes to infinity and the histogram becomes finer, the histograms approximate the uniform measure on M, that is, the Lebesgue measure. Hence this measure is physical according to method (M1). Also, recall Example 3 and Figure 2.2 of the Lorenz system.
Recall that here there is a strange attractor of Lebesgue measure zero such that all solutions in the basin of attraction U (of the attractor), which is of positive Lebesgue measure, approach but never enter the attractor. According to method (M2), the physical measure, which is the natural invariant measure on the attractor, is the unique measure with the following property: for Lebesgue-almost all initial conditions in the basin of attraction, the long-run time-average that the solution spends close to a set A on the attractor approximates the measure of A (cf. Luzzatto et al. 2005).

Figure 2.3: (a) histogram and (b) natural measure of the baker's system

These two examples illustrate what is generally true, namely that for deterministic systems proven to be chaotic physical measures exist. For first, as I will show in section 4.3, chaotic systems are ergodic.

Definition 6 A discrete measure-preserving deterministic system (M, ΣM, µ, T) is ergodic if, and only if, for all A ∈ ΣM with µ(A) > 0:

µ(∪_{t≥0} T^{−t}(A)) = 1.   (2.5)

Now for ergodic volume-preserving deterministic systems method (M1) yields that the Lebesgue measure is the physical measure (Eckmann & Ruelle 1985). Second, as I will explain in more detail in section 4.3, for dissipative systems proven to be chaotic, physical measures can be proven to exist. Moreover, for systems only conjectured to be chaotic, numerical evidence generally favours the existence of physical measures (Lyubich 2002; Young 1997; Young 2002).7

7 Also for nonergodic deterministic systems the time-average interpretation can be used to justify interpreting the measure as probability (see Lavis 2010).

Finally, let me introduce the definition of a partition, which I will need throughout the dissertation. Intuitively speaking, a partition of (M, ΣM, µ) is a collection of non-empty, non-intersecting sets that cover M.

Definition 7 α = {α1, . . . , αn}, n ∈ N, is a partition of (M, ΣM, µ), where (M, ΣM, µ) is a measure space, if, and only if, αi ∈ ΣM and µ(αi) > 0 for all i, 1 ≤ i ≤ n, αi ∩ αj = ∅ for all i ≠ j, 1 ≤ i, j ≤ n, and M = ∪_{i=1}^{n} αi. The αi are called atoms. A partition is nontrivial if, and only if, it has more than one element.

For a discrete measure-preserving deterministic system (M, ΣM, µ, T), if α is a partition, then T^t α = {T^t(α1), . . . , T^t(αn)}, t ∈ Z, is also a partition. Likewise, for a continuous measure-preserving deterministic system (M, ΣM, µ, Tt): if α is a partition, then Tt α = {Tt(α1), . . . , Tt(αn)}, t ∈ R, is also a partition. Given two partitions α = {α1, . . . , αn} and β = {β1, . . . , βm} of (M, ΣM, µ), the least common refinement α ∨ β is defined as the partition {αi ∩ βj | i = 1, . . . , n; j = 1, . . . , m} of (M, ΣM, µ).

2.2 Stochastic processes

Let me now introduce stochastic processes. Apart from Bernoulli processes, which will be important throughout the dissertation, the notions introduced in this section will only be needed to follow the discussion in Chapter 5.

A stochastic process is a process governed by probabilistic laws. Hence there is usually indeterminism in the time-evolution: if the process yields a specific outcome, there are different outcomes that might follow, and a probability distribution measures the likelihood of each of them. I call a sequence which describes a possible time-evolution of the stochastic process a realisation. Nearly all, but not all, indeterministic descriptions in science are stochastic processes.8

8 For instance, Norton's dome (which satisfies Newton's laws) is indeterministic because the time evolution fails to be bijective. Nothing in Newtonian mechanics requires us to assign a probability measure to the possible states of this system. It is possible to assign a probability measure, but the question is whether it is natural (cf. Norton 2003, pp. 8–9).

Let me formally define stochastic processes. A random variable is a measurable function Z : Ω → M̄ from a probability space (Ω, ΣΩ, ν), that is, a measure space (Ω, ΣΩ, ν) with ν(Ω) = 1, to a measurable space (M̄, ΣM̄). The probability measure PZ(A) = P{Z ∈ A} = ν(Z^{−1}(A)) for all A ∈ ΣM̄ on (M̄, ΣM̄) is called the distribution of Z. If A consists of one element, i.e., A = {a}, I often write P{Z = a} instead of P{Z ∈ A}.

Definition 8 A discrete stochastic process {Zt; t ∈ Z} is a one-parameter family of random variables Zt, t ∈ Z, which are defined on the same probability space (Ω, ΣΩ, ν) and take values in the same measurable space (M̄, ΣM̄).

Definition 9 A continuous stochastic process {Zt; t ∈ R} is a one-parameter family of random variables Zt, t ∈ R, which are defined on the same probability space (Ω, ΣΩ, ν) and take values in the same measurable space (M̄, ΣM̄) such that Z(t, ω) = Zt(ω) is jointly measurable in (t, ω).

The set M̄ is called the outcome space of the stochastic process. In the case of discrete time, a bi-infinite sequence rω = (. . . , Z−1(ω), Z0(ω), Z1(ω), . . .), for ω ∈ Ω arbitrary, is called a realisation of the stochastic process. For continuous time, the function rω : R → M̄, rω(t) = Z(t, ω), for ω ∈ Ω arbitrary, is called a realisation (cf. Doob 1953, pp. 4–46). Intuitively, t represents time, each ω ∈ Ω represents a possible history in all its details, and rω represents the description of that history by giving the 'score' at each t.

Assume a stochastic process {Zt; t ∈ Z or R} with outcome space M̄ is given. There can be situations where one observes a value which is dependent on, but maybe different from, the actual outcome of the stochastic process. Such situations can be modeled by an observation function Γ, i.e., a measurable function Γ : M̄ → M̄O, where M̄O is a set and (M̄O, ΣM̄O) is a measurable space. Clearly, the resulting observed stochastic process is {Γ(Zt); t ∈ Z or R}.
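As a toy illustration of an observation function (my own example, not from the text): let {Zt} be independent rolls of a fair six-sided die and let Γ map each outcome to its parity. The observed process {Γ(Zt)} is again a stochastic process, with outcome space {'odd', 'even'} and P{Γ(Zt) = 'even'} = 1/2.

```python
import random

def sample_process(n, rng):
    # n successive outcomes of the underlying process: i.i.d. fair die rolls
    return [rng.randint(1, 6) for _ in range(n)]

def Gamma(z):
    # observation function: only the parity of the actual outcome is observed
    return "even" if z % 2 == 0 else "odd"

rng = random.Random(1)
observed = [Gamma(z) for z in sample_process(100_000, rng)]
freq_even = observed.count("even") / len(observed)
# the observed process takes the value 'even' with relative frequency near 1/2
```

The observer here cannot distinguish outcomes with the same parity; exactly this kind of coarse-graining by Γ is what drives the observational-equivalence results discussed later in the dissertation.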
I will often deal with stationary stochastic processes:

Definition 10 A discrete stochastic process {Zt; t ∈ Z} is stationary if, and only if, the distribution of the multi-dimensional random variable (Zt1+h, . . . , Ztn+h) is the same as the one of (Zt1, . . . , Ztn) for all t1, . . . , tn ∈ Z, n ∈ N, and all h ∈ Z. A continuous stochastic process {Zt; t ∈ R} is stationary if, and only if, the distribution of the multi-dimensional random variable (Zt1+h, . . . , Ztn+h) is the same as the one of (Zt1, . . . , Ztn) for all t1, . . . , tn ∈ R, n ∈ N, and all h ∈ R (Doob 1953, p. 94).

It is perhaps needless to stress the importance of stochastic processes, and of stationary processes in particular: both are ubiquitous in science. The following examples of discrete stochastic processes and of continuous stochastic processes will be important in this dissertation. Example 4, a Bernoulli process, will be important throughout the dissertation; the other examples will be important later, in Chapter 5. Let me first introduce the examples of discrete stochastic processes.

Example 4: Bernoulli processes.

A Bernoulli process is a process where, intuitively, at each time point a (possibly biased) N-sided die is tossed, where the probability of obtaining side sk is pk, 1 ≤ k ≤ N, N ∈ N, with p1 + . . . + pN = 1, and each toss is independent of all the others. Bernoulli processes are important in all sciences, from physics and biology to the social sciences.

The mathematical definition proceeds as follows. The random variables X1, . . . , Xn, n ∈ N, are probabilistically independent if, and only if, P{X1 ∈ A1, . . . , Xn ∈ An} = P{X1 ∈ A1} · · · P{Xn ∈ An} for all A1, . . . , An ∈ ΣM̄. The random variables {Zt; t ∈ Z} are probabilistically independent if, and only if, any finite number of them are probabilistically independent.
Definition 11 The discrete stochastic process {Zt; t ∈ Z} is a Bernoulli process if, and only if, (i) its outcome space is a finite number of symbols M̄ = {s1, . . . , sN}, N ∈ N, and ΣM̄ = P(M̄), where P(M̄) is the power set of M̄; (ii) there is a set of numbers pk, 0 ≤ pk ≤ 1, 1 ≤ k ≤ N, with p1 + . . . + pN = 1 such that P{Zt = sk} = pk for all t ∈ Z and all k; and (iii) {Zt; t ∈ Z} are probabilistically independent.

Clearly, a Bernoulli process is stationary. In this definition the probability space Ω is not explicitly given. I now give a representation of Bernoulli processes where Ω is explicitly given. The idea is that Ω is the set of all possible realisations of the process. For a Bernoulli process with outcomes M̄ = {s1, . . . , sN}, N ∈ N, which have probabilities p1, . . . , pN, let Ω be the set of all bi-infinite sequences ω = (. . . , ω−1, ω0, ω1, . . .) with ωi ∈ M̄, ωi corresponding to one of the possible outcomes of the i-th trial in a doubly infinite sequence of trials. Let ΣΩ be the σ-algebra on Ω generated9 by the semi-algebra of cylinder-sets

C^{A1...An}_{i1...in} = {ω ∈ Ω | ωi1 ∈ A1, . . . , ωin ∈ An}, ij ∈ Z, i1 < . . . < in, Aj ∈ ΣM̄,

and let ν be the unique probability measure on ΣΩ arising from ν(C^{A1...An}_{i1...in}) = P̄(A1) · · · P̄(An), where P̄({sk}) = pk. Then the random variables Zt(ω) = ωt, t ∈ Z, define a Bernoulli process with outcomes s1, . . . , sN and probabilities p1, . . . , pN.

9 The σ-algebra on M generated by a set E ⊆ P(M) is the smallest σ-algebra on M containing E, that is, the σ-algebra (cf. Ash 1972)

∩_{Σ a σ-algebra on M with E ⊆ Σ} Σ.   (2.6)

Example 5: Markov processes.

Intuitively, a Markov process is a stochastic process with a finite number of possible outcomes where the next outcome depends only on the present outcome and on no other past outcomes; I will also assume that it is stationary. Markov processes are widely used in science to model phenomena.

Definition 12 A discrete stochastic process {Zt; t ∈ Z} is a Markov process if, and only if, (i) its outcome space consists of a finite number of symbols M̄ = {s1, . . . , sN}, N ∈ N, and ΣM̄ = P(M̄); (ii) P{Zt+1 = sj | Zt, Zt−1, . . . , Zk} = P{Zt+1 = sj | Zt} for any t, any k ∈ Z, k ≤ t, and any sj ∈ M̄; and (iii) {Zt; t ∈ Z} is stationary.

Write P^k(si, sj) for P{Zt+k = sj | Zt = si}, k ≥ 1. A Markov process is irreducible exactly if it can get from any outcome to any other outcome, i.e., for all si, sj ∈ M̄ there is a k ≥ 1 with P^k(si, sj) > 0. A Markov process is aperiodic exactly if for every possible outcome there is no periodic pattern in which the process can revisit that outcome. To be precise: the period di of an outcome si ∈ M̄, 1 ≤ i ≤ N, is defined by di = gcd{k ≥ 1 | P^k(si, si) > 0}, where 'gcd' denotes the greatest common divisor. An outcome si ∈ M̄ is aperiodic if, and only if, di = 1, and the Markov process is aperiodic if, and only if, all its possible outcomes are aperiodic.

Example 6: Multi-step Markov processes.

Multi-step Markov processes are Markov processes of order n, n ∈ N, and are a generalisation of Markov processes.
For Markov processes of order n the next outcome depends on the previous n outcomes but on no other outcomes. I will also assume that a Markov process of order n has finitely many possible outcomes and is stationary (hence Markov processes are Markov processes of order 1). Again, multi-step Markov processes are widely used in science to model phenomena.

Definition 13 A discrete stochastic process {Zt; t ∈ Z} is a Markov process of order n, n ∈ N, if, and only if, (i) its outcome space consists of a finite number of symbols M̄ = {s1, . . . , sN}, N ∈ N, and ΣM̄ = P(M̄); (ii) P{Zt+1 = sj | Zt, Zt−1, . . . , Zk} = P{Zt+1 = sj | Zt, . . . , Zt−n+1} for any t, any k ∈ Z, k ≤ t − n + 1, and any sj ∈ M̄; and (iii) {Zt; t ∈ Z} is stationary.

That a Markov process of order n is irreducible is defined exactly as for Markov processes; likewise, that an outcome si, 1 ≤ i ≤ N, of a Markov process of order n is aperiodic, and that the Markov process of order n itself is aperiodic, are defined exactly as for Markov processes.

Let me now introduce the examples of continuous-time stochastic processes.

Example 7: Semi-Markov processes.

Intuitively, a semi-Markov process is a continuous stochastic process with finitely many possible outcomes si; it takes the outcome si for a time u(si), and which outcome follows si depends only on si and on no other past outcomes.10 Semi-Markov processes are widely used in the sciences to model phenomena, from physics and biology to the social sciences. In particular, they play an important role in queuing theory (cf. Janssen & Limnios 1999).

A semi-Markov process is defined with the help of a discrete stochastic process {(Sk, Tk); k ∈ Z}. {Sk; k ∈ Z} describes the successive outcomes si visited by the semi-Markov process, and at time 0 the outcome of the semi-Markov process is S0.
T0 is the time interval after which there is the first jump of the semi-Markov process after time 0, T−1 is the time interval after which there is the last jump of the process before time 0, and all other Tk similarly describe the time-intervals between jumps of the stochastic process. Because at time 0 the semi-Markov process takes the outcome S0, and the process takes the outcome S0 for the time u(S0), it follows that T−1 = u(S0) − T0.

Technically, {Yk; k ∈ Z} = {(Sk, Tk); k ∈ Z} is a stochastic process which satisfies the following conditions: (i) Sk ∈ S = {s1, . . . , sN}, N ∈ N; Tk ∈ U = {u1, . . . , uN̄}, N̄ ∈ N, N̄ ≤ N, for k ≠ 0, −1, where ui ∈ R+, 1 ≤ i ≤ N̄; T0 ∈ (0, u(S0)], T−1 ∈ [0, u(S0)), where u : S → U, si ↦ u(si), is a surjective measurable function; and hence M̄ = S × [0, maxi ui]; (ii) ΣM̄ = P(S) × L([0, maxi ui]), where L([0, maxi ui]) is the Lebesgue σ-algebra on [0, maxi ui]; (iii) {Sk; k ∈ Z} is a Markov process with outcome space S (as defined in Example 5), and psi = P{S0 = si} > 0 for all i, 1 ≤ i ≤ N; (iv) Tk = u(Sk) for k ≥ 1, Tk = u(Sk−1) for k ≤ −2, and T−1 = u(S0) − T0; (v) for all i, 1 ≤ i ≤ N, P(T0 ∈ A | S0 = si) has a uniform density over (0, u(si)], i.e., P(T0 ∈ A | S0 = si) = ∫_A 1/u(si) dλ for all A ∈ L((0, u(si)]), where L((0, u(si)]) is the Lebesgue σ-algebra on (0, u(si)] and λ is the Lebesgue measure on (0, u(si)].

Definition 14 The continuous stochastic process {Zt; t ∈ R} with outcome space S and ΣS = P(S) constructed via a process {(Sk, Tk); k ∈ Z} as follows is called a semi-Markov process:

Zt = S0 for −T−1 ≤ t < T0,
Zt = Sk for T0 + . . . + Tk−1 ≤ t < T0 + . . . + Tk, k ≥ 1 (and thus t ≥ T0),
Zt = S−k for −T−1 − . . . − T−k−1 ≤ t < −T−1 − . . . − T−k, k ≥ 1 (and thus t < −T−1),

and for all i, 1 ≤ i ≤ N,

P(Z0 = si) = psi u(si) / (ps1 u(s1) + . . . + psN u(sN)).   (2.8)

10 The term 'semi-Markov process' is not used unambiguously in the literature. Our use of this term follows Ornstein & Weiss (1991).

It can be proven that semi-Markov processes thus defined are stationary stochastic processes (Ornstein 1970b; Ornstein 1974, pp. 56–61). I will later be concerned with semi-Markov processes where the Markov process {Sk; k ∈ Z} is irreducible and aperiodic and where the elements of the set U are irrationally related (ui and uj are called irrationally related if, and only if, ui/uj is not a rational number; and a set of elements {u1, . . . , uN̄} is called irrationally related if, and only if, for all i, j, i ≠ j, ui and uj are irrationally related). I will call such stochastic processes irrationally related semi-Markov processes.

Example 8: Multi-step semi-Markov processes.

Multi-step semi-Markov processes are semi-Markov processes of order n, n ∈ N, and are a generalisation of semi-Markov processes. A semi-Markov process of order n is a continuous stochastic process with a finite number of possible outcomes si; it takes the outcome si for a time u(si), and which outcome follows si depends only on the past n outcomes (hence semi-Markov processes are semi-Markov processes of order 1).11 Again, multi-step semi-Markov processes are widely used to model phenomena in science (cf. Janssen & Limnios 1999).

11 The term 'multi-step semi-Markov process' is not used unambiguously in the literature, and I follow the usage of Ornstein & Weiss (1991).

Definition 15 Semi-Markov processes of order n are defined as semi-Markov processes except that for the discrete stochastic process {(Sk, Tk); k ∈ Z} condition (iii) is replaced by the following condition: (iii') {Sk; k ∈ Z} is a Markov process of order n with outcome space S (as defined in Example 6), and psi = P{S0 = si} > 0 for all i, 1 ≤ i ≤ N.

Again, it can be proven that multi-step semi-Markov processes are stationary stochastic processes (Park 1982). In Chapter 5 I will be concerned with multi-step semi-Markov processes where the multi-step Markov process {Sk; k ∈ Z} is irreducible and aperiodic and where the elements of U are irrationally related. I will call such stochastic processes irrationally related multi-step semi-Markov processes.

After setting the stage, we are now ready to turn to the first substantial chapter of this dissertation, where I will investigate historically how notions of unpredictability in ergodic theory were formed and how they are justified in the mathematical literature.

Chapter 3

Justifying definitions in mathematics—going beyond Lakatos

3.1 Introduction

Mathematical practice suggests that mathematical definitions are not arbitrary: for definitions to be worth studying there have to be good reasons for them. Moreover, definitions are often regarded as important mathematical knowledge (cf. Tappenden 2008a and 2008b). Reasoning and knowledge are classical philosophical issues; hence reflecting on the reasons given for definitions is philosophically relevant. These considerations motivate the guiding question of this chapter: in what ways are definitions in mathematics justified, and are these kinds of justification reasonable? By a justification of a definition I mean a reason provided for the definition.

I will concentrate on explicit definitions, which introduce a new expression by stipulating that it be semantically equivalent to a definiens consisting of already-known expressions. I will not deal with their complement, implicit definitions, which assign meaning to expressions by imposing constraints on how to use sentences (or other longer expressions) containing them (Brown 1999, p. 97).

Generally, attempting to justify definitions is reasonable: as we will see, if definitions were not justified, the mathematics involving these definitions would be much less meaningful to us than mathematics involving definitions which were justified.
Thus given our limited resources, it is better to concentrate on definitions which we can justify.1

When a mathematician formulates a definition she or he has not known before, I speak of a formulation of the definition. The way the formulation of a definition is guided usually corresponds to the way the definition is justified when it is formulated. Thus all that will be said about the justification of definitions has a natural counterpart in terms of the guidance of the formulation of definitions. Since the guidance of the formulation of definitions derives from the justification, the latter is the main issue, and in what follows I will focus on the justification of definitions.2

In this chapter, in section 3.2, I will first discuss the state of the art of philosophical theorising about the actual mathematical practice of how definitions are justified in articles and books. There is hardly any philosophical discussion of this issue apart from Lakatos's ideas on proof-generated definitions, and hence I will concentrate on them. While Lakatos's ideas are important, this chapter aims to show how they are limited. My criticism of Lakatos will be based on a case study of notions of unpredictability in ergodic theory, which will be introduced in section 3.3. In section 3.4 I will discuss how notions of unpredictability in ergodic theory have been justified. And based on this, I will introduce three other ways in which definitions are commonly justified: natural-world justification, condition justification and redundancy justification; the latter two, to my knowledge, have not been discussed before. In section 3.5 I will clarify the interrelationships between the different kinds of justification, an issue which also has not been addressed before.
In particular, I argue that in different arguments the same definition can be justified in different ways. In section 3.6 I point out how Lakatos's ideas are limited: his ideas fail to show that often, and in particular for notions of unpredictability in ergodic theory, various kinds of justification are found, and that various kinds of justification can be reasonable. Furthermore, they fail to acknowledge the interplay between the different kinds of justification. Finally, in section 3.7 I summarise the findings of this chapter.

1 What this means for the ontology of mathematical definitions depends on the ontology adopted: platonists may hold that the entity defined by a definition is real regardless of whether we can justify the definition or not. Constructivists may hold that only those definitions that have been justified are constructed by us.

2 Strictly speaking, the justification and the guidance of formulation are conceptually distinct. For instance, it could be that a definition which captures an important preformal idea was randomly formulated by a computer; then there was no way the formulation of the definition was guided, but there is a convincing initial justification.

The research of this chapter is in the spirit of 'phenomenological philosophy of mathematics' as recently characterised by Larvor (2001, pp. 214–215) and Leng (2002, pp. 3–5): it looks at mathematics 'from the inside' and on this basis asks philosophical questions.

3.2 Lakatos's proof-generated definitions

In the relatively recent literature Larvor (2001, p. 218) at least mentions the importance of researching the justification of mathematical definitions. Corfield (2003, chapter 9) discusses the related issue of what makes concepts fundamental but does not provide conceptual reflection on our question. Tappenden (2008a, 2008b) treats the related issues of the naturalness of definitions and how to decide between different definitions.
In our context Tappenden's (2008a) conclusion is relevant: namely that judgments about definitions mainly depend not on the rules of logic but on detailed knowledge about the mathematics involved. Furthermore, several philosophers have argued that mathematical definitions should capture a valuable preformal idea (cf. Brown 1999, p. 109).

Apart from this, the main philosopher who has written on our guiding question in the light of mathematical practice is Lakatos (1976, 1978). Lakatos develops an approach to informal mathematics, which includes an account of mathematical progress called proofs and refutations. Most importantly, Lakatos is also concerned with how definitions are justified. His key idea is the notion of a proof-generated definition. Here his main examples are definitions of polyhedron which are justified because they are needed to make the proof of the Eulerian conjecture work: viz. that for every polyhedron the number of vertices minus the number of edges plus the number of faces equals two (V − E + F = 2).

What is a proof-generated definition? Unfortunately, Lakatos does not state exactly what he means by this. Clearly, mathematical definitions justified in any way are eventually involved in proofs. Therefore, the trivial idea that definitions are justified because they are involved in proofs cannot be what interested Lakatos. To find out more, consider the Carathéodory definition of measurable sets, another proof-generated definition Lakatos discusses. The mathematician Halmos (1950, p. 44) remarks on this definition: "The greatest justification of this apparently complicated concept is, however, its possibly surprising but absolutely complete success as a tool of proving the extension theorem". Lakatos (1976, p. 153) comments:

as we learn from the second part [Halmos's remark above], this concept is a proof-generated concept in Carathéodory's theorem about the extension of measures [...].
So whether it is intuitive or not is not at all interesting: its rationale lies not in its intuitiveness, but in its proof-ancestor.

This quote and the rest of the discussion of proof-generated definitions suggest that a proof-generated definition is a definition which is needed in order to prove a specific conjecture regarded as valuable (Lakatos 1976, pp. 88–92, pp. 127–133, pp. 144–154; Lakatos 1978, pp. 95–97). This idea is also hinted at by Polya (1949, p. 686; and 1954, p. 148). The final theorems which involve proof-generated definitions often, but not always, result from a series of trials and revisions.

Lakatos (1976, pp. 33–50, p. 127) rightly argues that lemma-incorporation produces proof-generated definitions: assume that a conjecture, known not to hold for all objects of a domain, should be established. Then if conditions which are needed in order to prove the conjecture are identified, i.e., lemmas are incorporated, proof-generated definitions arise. For instance, consider the conjecture that the limit function of a convergent sequence of continuous functions is continuous. This conjecture can be proven if 'convergent' is understood as uniformly convergent but not if it is understood as the more obvious, weaker pointwise convergent; hence the definition of uniformly convergent is proof-generated (Lakatos 1976, pp. 144–146).

Lakatos (1976, pp. 90–92, p. 128, pp. 148–149, p. 153) thinks that for his examples of proof-generated definitions the justification was reasonable because the corresponding conjectures are valuable. Generally, if the conjecture is mathematically valuable, proof-generation is a reasonable kind of justification.3 A proof-generated definition can be regarded as providing knowledge since it answers the question of which notion is needed to prove a specific conjecture. Lakatos (1976, pp. 14–33, pp. 83–87) also discusses four other ways of justifying definitions.
Imagine that counterexamples are presented to a conjecture of interest, and that the conjecture is defended by claiming that these are no 'real' counterexamples because a definition in the conjecture has been wrongly understood. Properly understood, it is argued, the definition excludes a class of objects which includes the alleged counterexamples, where the exclusions are made independently of any proof of the conjecture (and thus it is unknown whether the conjecture indeed holds true for the definition). Then the definition is justified via monster-barring. The second kind of justification is exception-barring. Here the definition is defended by excluding, with the extant definition, a class of objects which are, and which are regarded as, counterexamples to the conjecture; again, this is independent of any proof of the conjecture.4 The third kind of justification is monster-adjustment. Here the definition is defended by reinterpreting, independently of any proof of the conjecture, the terms of the extant definition such that counterexamples to the conjecture are no longer counterexamples. The fourth and final kind of justification is monster-including. Here the definition is defended by extending the definition to include a new class of objects; this class of objects is defined using properties which are shared by examples for which the conjecture holds true; and again, this is independent of any proof of the conjecture.

3 For the proof-generated definitions discussed in Lakatos (1976) and in this chapter it is argued why the conjectures are valuable. Yet answering the question of what constitutes valuable conjectures at a general level would require further research.

4 Contrary to exception-barring, in the case of monster-barring it is denied that the counterexamples are actual counterexamples. This is how monster-barring differs from exception-barring.

Monster-barring, exception-barring and monster-adjustment are all ways of dealing with counterexamples to conjectures. And I agree with Lakatos that for this purpose they are inferior to proof-generation because they do not take into account how the conjectures are proved; and therefore, it is even unclear whether the conjecture is true for the definition under consideration. Monster-including is a way of generalising conjectures. Yet again, since it neglects how conjectures are proved, I agree with Lakatos that for this purpose it is inferior to proof-generation. Furthermore, Lakatos thought that any of these kinds of justification were applied only because the better way of justifying definitions, namely proof-generation, was not known (Lakatos 1976, pp. 14–42, pp. 136–140). Because of their inadequacies and since they play no role in our case study, I shall not say any more about these kinds of justification in this chapter.

Unfortunately, Lakatos (1976) never explicitly states how widely he thinks that his ideas on proof-generated definitions apply. He seems to think that mathematicians discovered the method of justifying definitions via proof-generation in the 1840s (Lakatos 1976, p. 139). Apart from this, general claims such as

Progress indeed replaces naive classification by [...] proof-generated [...] classification. [...] Naive conjectures and naive concepts are superseded by improved conjectures (theorems) and concepts (proof-generated [...] concepts) growing out of the method of proofs and refutations (Lakatos 1976, pp. 91–92; see also p. 144, original emphasis).

suggest that mathematical definitions should be, and after mathematicians discovered the method of proof-generation, generally are, proof-generated; and some have interpreted him as saying this (Brown 1999, pp. 110–111).
However, as Larvor (1998) has pointed out, Lakatos stresses in his dissertation (Lakatos 1961), on which his (1976) book is based, that his account of informal mathematics does not apply to all of mathematics. What is clear is that Lakatos thought that there are many mathematical subjects with some proof-generated definitions and that there are many mathematical subjects with some definitions which should be proof-generated.5 Maybe Lakatos also believed something stronger, and this would explain strong claims such as the above quote: namely that there are many subjects where proof-generation should be the sole important way in which definitions are justified; and that there are many subjects created after mathematicians discovered the method of proof-generation where proof-generation is the sole important way in which definitions are justified. In what follows, I will show in which ways Lakatos's ideas on justifying definitions are limited; and for this it will not matter much whether or not he endorsed the stronger claim.

5 Of course, the question remains what a 'mathematical subject' is; I will say more about this later (see subsection 3.4.4).

Corfield (1997, pp. 111–115) argues that Lakatos did not think that his account of informal mathematics, which includes his ideas on justifying definitions, extends to established branches of mathematics of the twentieth century. Yet Corfield's claim is implausible. Lakatos (1976, p. 5, pp. 152–154) states that his ideas on informal mathematics apply to modern metamathematics and to Carathéodory's (1914) investigations on measurable sets. And substantial parts of established mathematics of the twentieth century are not any more formalised than that mathematics: e.g., ergodic theory, which will be relevant later. Thus Lakatos indeed thought that his ideas could apply to substantial parts of established branches of mathematics of the twentieth century. But I agree with Corfield's (1997) main point that Lakatos failed to see that his ideas are also relevant for highly formalised mathematics. For this reason, this chapter is not restricted to informal mathematics.

This discussion highlights that there is little work on the actual practice of how definitions are justified in articles and books. Furthermore, although Lakatos's account of proofs and refutations has been challenged (Corfield 1997, Leng 2002), his ideas on proof-generated definitions have hardly been criticised. My contribution on the guiding question and my criticism of Lakatos's ideas on justifying definitions will be based on a case study of notions of unpredictability in ergodic theory. Let me now introduce this case study.

3.3 Case study: notions of unpredictability in ergodic theory

My case study is on notions of unpredictability in ergodic theory. Ergodic theory originated from work in statistical mechanics, in particular Boltzmann's kinetic theory of gases. Some of Boltzmann's work relied on the assumption that the time average of a function equals its space average, but no acceptable argument was provided for this (cf. Uffink 2007). Generally, the possibly unpredictable motion of classical systems was a constant theme in statistical mechanics. Ergodic theory arose in the early 1930s when Birkhoff (1931) and von Neumann (1932a) proved the famous pointwise and mean ergodic theorems, respectively. Among other things, they found that ergodicity (cf. Definition 2.5) was the sought-after concept guaranteeing the equality of time and space averages for almost all states of the system. Motivated by these results, an investigation into the unpredictable behaviour of classical systems began. Of particular importance here was the study of unpredictability by a group of mathematicians around Kolmogorov in Russia.
From the 1960s onwards, ergodic theory became prominent, and was further developed, as a mathematical framework for studying chaotic behaviour. Overall, ergodic theory had less impact on statistical mechanics than expected, partly because of the doubts, and the difficulty of proving, that the relevant systems are ergodic. But it developed into a discipline with its own internal problems and had, and continues to have, considerable impact on probability theory and chaos research (Aubin & Dahan-Dalmedico 2002; Dahan-Dalmedico 2004; Mackey 1974).

Why do notions of unpredictability in ergodic theory constitute a valuable case study? First, several of Lakatos's assertions, e.g., that mathematics is driven by counterexamples, have been criticised in the following way: while they may be correct for older mathematics, they do not hold true for twentieth-century mathematics (Leng 2002, p. 10). As Lakatos (1976, pp. 136–140) also suggests, how definitions are justified may depend on when they were formulated, because reasoning changes with the advancement of mathematics. To ensure that claims on the justification of definitions escape the criticism of not applying to twentieth-century mathematics, I choose a branch of mathematics, viz. ergodic theory, which was created in the twentieth century. Second, concerning the justification of definitions, the picture for notions of unpredictability in ergodic theory appears different to that proposed by Lakatos, and this picture seems prevalent in mathematics. As widely acknowledged, the main notions of unpredictability in ergodic theory are (cf. Berkovitz, Frigg & Kronz 2006; Sinai 2000, p. 21, pp. 41–46; Walters 1982, pp. 39–41, pp. 86–87, pp.
105–107): weak mixing (three versions), strong mixing (two versions), Kolmogorov-mixing, Kolmogorov-system, Bernoulli system (two versions), Kolmogorov-Sinai entropy.6

In the remaining sections of this chapter, I will present the insights on the justification of definitions which derive from this case study. I will discuss the way the discrete-time and continuous-time versions of the italicised definitions of the above list are justified as notions of unpredictability in the literature and whether they are reasonably justified. I will also examine the way these definitions were initially justified.7 A detailed investigation of them will suffice to illustrate these insights. Hence, for the remaining listed definitions, I will just state how they are justified. Let me now discuss the kinds of justification which occur in this case study. They illustrate that not only proof-generation is important.

6 The definitions of weak mixing, strong mixing, being a Kolmogorov system and being a Bernoulli system are also sometimes referred to as the ergodic hierarchy.

7 I will not investigate the use of these definitions elsewhere in mathematics. The main reason for such an investigation would be to understand how the justification of definitions varies in different contexts. Yet I think that one can also find out about this by considering only how definitions were initially justified and later justified as notions of unpredictability. Going further would require an enormous amount of work without considerable gain.

3.4 Kinds of justification of definitions

3.4.1 Natural-world justification

I claim, first, that definitions in my case study are frequently justified because they capture a preformal idea regarded as valuable for describing or understanding the natural world. Here I will speak of natural-world-justified definitions.
Natural-world-justified definitions are a special case of the general idea discussed in the literature that mathematical definitions should capture a valuable preformal idea (cf. Brown 1999, p. 109). If the preformal idea is valuable for describing or understanding the natural world, natural-world-justification is reasonable. It is important to realise that natural-world-justification does not mean that there is a 'best' definition of a vague idea. There can be several different definitions expressing a vague idea without a clearly 'best' one. Natural-world-justified definitions can be regarded as providing knowledge in the following sense: they are a possible formalisation of a preformal idea which is valuable.

Many definitions in the list of notions of unpredictability (cf. section 3.3) are natural-world-justified: I will now discuss one version of weak mixing (for discrete and continuous time), one version of a Bernoulli system (for discrete time) and the Kolmogorov-Sinai entropy (for discrete and continuous time) in detail. For illustrating natural-world-justification, it would suffice to consider the Kolmogorov-Sinai entropy. But the discussion of the remaining two definitions is crucial in order to provide the necessary background for the next sections. Moreover, all versions of strong mixing (Berkovitz, Frigg & Kronz 2006, p. 676; Hopf 1932a, p. 205) and Kolmogorov-mixing (Sinai 1963, p. 66) are natural-world-justified.

Weak mixing

Definition 16 The discrete measure-preserving deterministic system (M, Σ_M, µ, T) is weakly mixing if, and only if, for all A, B ∈ Σ_M there is a P ⊆ N of density zero such that

lim_{t→∞, t∉P} µ(T^t(A) ∩ B) = µ(A)µ(B),

where P ⊆ N is of density zero if, and only if, lim_{t→∞, t∈N} #(P ∩ {i | i ≤ t, i ∈ N})/t = 0.
Definition 17 The continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) is weakly mixing if, and only if, for all A, B ∈ Σ_M there is a P ⊆ R+ of density zero such that

lim_{t→∞, t∉P} µ(T_t(A) ∩ B) = µ(A)µ(B),

where P ⊆ R+ is of density zero if, and only if, lim_{t→∞, t∈R+} λ(P ∩ (0, t])/t = 0, where λ is the Lebesgue measure on R.

For a discrete measure-preserving deterministic system (M, Σ_M, µ, T) or a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) and a set A ∈ Σ_M, define A_t as the event that the state of the deterministic system is in A at time t. For instance, for the baker's system (Example 1) you could be interested in the event that the state of the deterministic system at time t is on the left side of the unit square, i.e., you could be interested in the event A_t where A = [0, 1/2] × [0, 1] \ D. Because the exact state of the deterministic system may be unknown, I introduce p(A_t), the probability of the event A_t. Assume that the measure can be interpreted as time-independent probability. As explained in section 2.1, this is quite natural under certain interpretations. Then:

For all t and for all A ∈ Σ_M: p(A_t) = µ(A). (3.1)

This idea can be generalised to joint simultaneous events as follows:

For all t and for all A, B ∈ Σ_M: p(A_t & B_t) = µ(A ∩ B). (3.2)

This immediately implies:

For all t, t′ and all A, B ∈ Σ_M: p(A_t & B_t′) = µ(T^{t′−t}(A) ∩ B), (3.3)

since T^{t′−t}(A) is the evolution of the set A from t to t′.8

8 I can infer (3.3) from (3.2) as follows: T^{t′−t}(A) contains exactly those points that are in A at time t. Consequently, T^{t′−t}(A) ∩ B consists of exactly those points which pass through B at time t′ and go through A at time t, i.e., for which A_t & B_t′ is true. Thus from (3.2) it follows that p(A_t & B_t′) = µ(T^{t′−t}(A) ∩ B).

Definition 16 and Definition 17 express that for any A, B ∈ Σ_M and any ε > 0 there is a t′ ∈ N or t′ ∈ R+ and a set P of density zero with |µ(T^t(A) ∩ B) − µ(A)µ(B)| < ε for all t ≥ t′, t ∉ P. Now assume, without loss of generality, that the event you want to predict occurs at time 0. Then from equation (3.3) it follows that Definition 16 and Definition 17 capture the following idea of unpredictability: for any event B_0, B ∈ Σ_M, any A ∈ Σ_M and any ε > 0 there is a t′ ∈ N or R+ and a set P of density zero with |p(B_0 & A_{−t}) − p(B_0)p(A_{−t})| < ε for all t ≥ t′, t ∉ P. That is, given an arbitrary level of precision ε > 0, any event is approximately probabilistically independent of almost any event that is sufficiently past. Independence is understood here as in probability theory. This unpredictability might apply, for instance, to systems in meteorology and make it hard to predict them.

Von Neumann (1932b, p. 591, p. 594) lists the main statistical properties of classical deterministic systems discussed in ergodic theory at that time. In this context he remarks that Definition 16 captures the preformal idea of approximate independence of almost all events explained above. Thus he argues that it is natural-world-justified. This justification grew in importance with the rise of chaos research in the 1960s (see, e.g., Berkovitz, Frigg & Kronz 2006, p. 688). This justification also appears in a few standard books on ergodic theory (e.g., Walters 1982, p. 45), although in books often no justification is provided for weak mixing (e.g., Arnold & Avez 1968, pp. 21–22; Cornfeld et al. 1982, pp. 22–23; Sinai 2000, p. 21). Especially before the rise of chaos research, weak mixing appears to have been mostly not natural-world-justified. This will be shown in subsection 3.4.2, where I will also discuss the key contexts in which weak mixing was introduced.
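The decay of correlations expressed by the mixing definitions can be illustrated numerically. The following sketch is a hypothetical illustration, not part of the dissertation's argument: it applies the baker's map (the map underlying the baker's system of Example 1, here written without the measure-zero set D) and estimates µ(T^t(A) ∩ B) by Monte Carlo sampling, taking both A and B to be the left half of the unit square. The baker's system is in fact strongly mixing, so the convergence to µ(A)µ(B) holds without excluding a density-zero set of times.

```python
import random

def baker(x, y):
    """One step of the baker's map on the unit square; it preserves Lebesgue measure."""
    if x < 0.5:
        return 2 * x, y / 2
    return 2 * x - 1, (y + 1) / 2

def estimate_mixing(t, n=200_000, seed=0):
    """Monte Carlo estimate of mu(T^t(A) ∩ B) = mu(A ∩ T^{-t}(B)),
    with A = B = the left half [0, 1/2] x [0, 1] of the unit square."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        started_in_A = x < 0.5
        for _ in range(t):
            x, y = baker(x, y)
        if started_in_A and x < 0.5:  # point started in A and lies in B after t steps
            hits += 1
    return hits / n

# mu(A ∩ B) = 1/2 at t = 0, but the correlation decays towards mu(A)mu(B) = 1/4:
print(estimate_mixing(0), estimate_mixing(10))
```

Here the estimate at t = 0 is close to 1/2, while for moderate t it settles near 1/4: knowing that the system was in A sufficiently far in the past is approximately probabilistically irrelevant for predicting that it is in B now.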
The next definition relates to the important topic of equivalence of measure-preserving deterministic systems.

Discrete Bernoulli system

The idea of an infinite sequence of probabilistically independent trials of an N-sided die is a very old one. Kolmogorov (1933) gave the modern measure-theoretic formulation of probability theory and laid the foundations for the modern theory of stochastic processes (as introduced in section 2.2) (von Plato 1994, pp. 230–233). Recall that in this modern framework a doubly-infinite sequence of independent rolls of an N-sided die, where the possible outcomes are M̄ = {s_1, ..., s_N} and the probability of obtaining outcome s_k is p_k, 1 ≤ k ≤ N, Σ_{k=1}^{N} p_k = 1, is called a Bernoulli process; also, recall that a Bernoulli process can be represented as follows (see Example 4 in section 2.2): Ω is the set of realisations of the stochastic process, Σ_Ω is the σ-algebra generated by the semi-algebra of cylinder sets, ν is the extension of the pre-measure defined by the independence property on the cylinder sets, and Z_t : Ω → M̄, Z_t(ω) = ω_t (the t-th coordinate of ω). Then {Z_t; t ∈ Z} is a representation of the Bernoulli process.

Now I define a measure-preserving deterministic system: consider the following function, called a shift:

T : Ω → Ω, T((... ω_i ...)) = (... ω_{i+1} ...). (3.4)

The shift is easily seen to be measurable and measure-preserving.

Definition 18 The measure-preserving deterministic system (Ω, Σ_Ω, ν, T) as constructed above is called a Bernoulli shift with probabilities (p_1, ..., p_N).

The meaning of a Bernoulli shift is that it represents a Bernoulli process. For assume that one sees only the 0-th coordinate of the sequence ω, i.e., one applies the observation function Φ_0 : Ω → M̄, Φ_0(ω) = ω_0, to the Bernoulli shift (Ω, Σ_Ω, ν, T). Then the possible outcomes of the Bernoulli process are the possible observed values of the Bernoulli shift (Ω, Σ_Ω, ν, T).
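The interplay between the shift T and the observation function Φ_0 can be sketched in code. The sketch below is a hypothetical illustration working with finite windows of the doubly-infinite sequences (a computer cannot hold Ω itself): applying T t times and then reading off the 0-th coordinate with Φ_0 recovers the t-th outcome of the Bernoulli process.

```python
import random

def sample_realisation(outcomes, probs, n_min, n_max, seed=0):
    """Sample a finite window omega[n_min..n_max] of a Bernoulli-process realisation."""
    rng = random.Random(seed)
    return {t: rng.choices(outcomes, probs)[0] for t in range(n_min, n_max + 1)}

def shift(omega):
    """The shift T: (... ω_i ...) -> (... ω_{i+1} ...), i.e. T(ω)_i = ω_{i+1}."""
    return {i: omega[i + 1] for i in omega if i + 1 in omega}

def phi0(omega):
    """Observation function Φ_0: read off the 0-th coordinate."""
    return omega[0]

# Observing the orbit of the shift with Φ_0 reproduces the process outcomes in order:
omega = sample_realisation(['s1', 's2'], [0.5, 0.5], -5, 5)
observed = []
current = omega
for t in range(4):
    observed.append(phi0(current))
    current = shift(current)
print(observed == [omega[0], omega[1], omega[2], omega[3]])  # True
```

Because each step of the orbit is fully determined by the current sequence, the evolution is deterministic, yet the observed values are exactly as random as the underlying die rolls.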
It is clear that any realisation r_ω of the Bernoulli process, where r_ω generally denotes a realisation of a stochastic process (cf. section 2.2), is contained in the phase space Ω. And observing the solution s_{r_ω} of (Ω, Σ_Ω, ν, T) with Φ_0 exactly gives r_ω. Furthermore, the measure ν is defined by the probabilities which are assigned by the Bernoulli process to each cylinder set. Hence the probability distribution over the realisations of the Bernoulli process is the same as the one over the sequences of observed values of (Ω, Σ_Ω, ν, T). Thus a Bernoulli shift is a deterministic representation of a Bernoulli process.

In one of the first papers on ergodic theory, von Neumann (1932b) introduced the fundamental idea that measure-preserving deterministic systems are probabilistically equivalent, i.e., that their states can be put into one-to-one correspondence such that the corresponding solutions have the same probability distributions. He developed the definition of isomorphic deterministic systems to capture this idea (Sinai 1989, p. 833), and he called for a classification of measure-preserving deterministic systems up to isomorphism.

Definition 19 The discrete measure-preserving deterministic systems (M_1, Σ_{M_1}, µ_1, T_1) and (M_2, Σ_{M_2}, µ_2, T_2) are isomorphic if, and only if, there are measurable sets M̂_i ⊆ M_i with µ_i(M_i \ M̂_i) = 0 and T_i(M̂_i) ⊆ M̂_i (i = 1, 2), and there is a bijection φ : M̂_1 → M̂_2 such that (i) φ(A) ∈ Σ_{M_2} for all A ∈ Σ_{M_1}, A ⊆ M̂_1, and φ^{−1}(B) ∈ Σ_{M_1} for all B ∈ Σ_{M_2}, B ⊆ M̂_2; (ii) µ_2(φ(A)) = µ_1(A) for all A ∈ Σ_{M_1}, A ⊆ M̂_1; (iii) φ(T_1(m)) = T_2(φ(m)) for all m ∈ M̂_1.

For continuous measure-preserving deterministic systems (M_1, Σ_{M_1}, µ_1, T^1_t) and (M_2, Σ_{M_2}, µ_2, T^2_t) the definition of being isomorphic is the same except that condition (iii) is φ(T^1_t(m)) = T^2_t(φ(m)) for all m ∈ M̂_1 and all t ∈ R (cf. Petersen 1983, p. 4). One easily sees that 'being isomorphic' is an equivalence relation.
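Definition 19 concerns arbitrary measure spaces, but its conditions can be spelled out mechanically on finite toy systems, where every subset is measurable, the measure is uniform, and a bijection automatically satisfies condition (i). The sketch below is a hypothetical example of my own, not from the text: it verifies by brute force that a relabelling φ conjugates two four-point cyclic systems.

```python
from itertools import chain, combinations

# Two finite 'measure-preserving systems': uniform measure, T a cyclic permutation.
M1 = [0, 1, 2, 3]
T1 = {0: 1, 1: 2, 2: 3, 3: 0}
M2 = ['a', 'b', 'c', 'd']
T2 = {'a': 'c', 'c': 'b', 'b': 'd', 'd': 'a'}

# Candidate isomorphism phi: M1 -> M2 (a bijection, so condition (i) is automatic here).
phi = {0: 'a', 1: 'c', 2: 'b', 3: 'd'}

def mu(A, M):
    """Uniform probability measure of a subset A of the finite space M."""
    return len(A) / len(M)

def subsets(M):
    """All subsets of M (here the whole sigma-algebra is the power set)."""
    return chain.from_iterable(combinations(M, r) for r in range(len(M) + 1))

# Condition (ii): mu2(phi(A)) = mu1(A) for every measurable A.
cond2 = all(mu({phi[m] for m in A}, M2) == mu(set(A), M1) for A in subsets(M1))
# Condition (iii): phi(T1(m)) = T2(phi(m)) for all m.
cond3 = all(phi[T1[m]] == T2[phi[m]] for m in M1)
print(cond2 and cond3)  # True
```

The interesting mathematical work, of course, lies in infinite systems, where such brute-force checks are unavailable and isomorphism must be established or refuted by structural arguments such as the entropy argument discussed below.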
Consequently, we see that the following definition captures the idea of a deterministic system which is probabilistically equivalent to a deterministic system representing a Bernoulli process, e.g., throwing a die:

Definition 20 (M, Σ_M, µ, T) is a discrete Bernoulli system if, and only if, it is isomorphic to a Bernoulli shift.

In many articles Definition 20 is natural-world-justified as capturing the idea that a deterministic system is probabilistically equivalent to a deterministic representation of a Bernoulli process (Ornstein 1989, p. 4; Rohlin 1960, p. 5). Walters's (1982, p. 107; see also Ornstein 1974, p. 4) comment

Since a Bernoulli shift is really an independent identically distributed stochastic process indexed by the integers, we can think of a {discrete Bernoulli system} as an abstraction of such a stochastic process.9

shows that this justification is also found in standard books on ergodic theory. Yet some books do not provide any justification for Definition 20 (e.g., Shields 1973, p. 5). Clearly, the Bernoulli shifts given by choices of N and, for each N, the choices of p_1, ..., p_N are discrete Bernoulli systems. In the next paragraph about the Kolmogorov-Sinai entropy we will say more about when Bernoulli shifts are isomorphic. The next definition illustrates that a definition can be both natural-world-justified and proof-generated.

9 Square brackets indicate that the original notation has been replaced by the notation used in this dissertation. I will use this convention throughout.

Kolmogorov-Sinai entropy

Assume that a probability distribution P = (p_1, ..., p_n) is given over a set of possible symbols (x_1, ..., x_n), n ∈ N (that is, p_i ≥ 0 for all i and Σ_{i=1}^{n} p_i = 1). In information theory the amount of information gained when a symbol is received is understood to equal the amount of uncertainty reduced when a symbol is received.
The Shannon information S(P) = −Σ_{i=1}^{n} p_i log(p_i) measures the average amount of uncertainty reduced when a symbol is received or, equivalently, the average amount of information gained when a symbol is received (see Cover & Thomas 2006; Frigg & Werndl 2010; Klir 2006, section 2.2.3).10

10 Throughout the dissertation 'log' stands for the logarithm to base two. Also, 0 log(0) is defined to be 0.

Ergodic theory and information theory can be connected as follows: first, recall Definition 7 of a partition α. Given a discrete measure-preserving deterministic system (M, Σ_M, µ, T), each m ∈ M produces, relative to a partition α = {α_1, ..., α_k}, a bi-infinite string of symbols ... x_{−2} x_{−1} x_0 x_1 x_2 ... in an alphabet of k letters via the coding x_j = α_i if, and only if, T^j(m) ∈ α_i, j ∈ Z. Interpreting the measure-preserving deterministic system (M, Σ_M, µ, T) as the source, the outputs of the source are these strings ... x_{−2} x_{−1} x_0 x_1 x_2 .... If the measure is interpreted as probability density, one has a probability distribution over these strings. Hence the whole apparatus of information theory can be applied to these strings.

In particular, given a partition α = {α_1, ..., α_k} of (M, Σ_M, µ), H(α) = −Σ_{i=1}^{k} µ(α_i) log(µ(α_i)) is the Shannon information of P = (µ(α_1), ..., µ(α_k)) and measures the average information of the symbols α_i. Let us regard strings of length n, n ∈ N, produced by the deterministic system relative to a coding α as messages. The probability distribution of these possible strings of length n relative to α is µ(β_i), 1 ≤ i ≤ h, where β = {β_1, ..., β_h} = α ∨ T^{−1}α ∨ ... ∨ T^{−n+1}α. Hence

H_n(α, T) = (1/n) H(α ∨ T^{−1}α ∨ ... ∨ T^{−n+1}α) (3.5)

measures the average amount of information which the measure-preserving deterministic system produces per step over the first n steps relative to the coding α.
And the limit

H(α, T) = lim_{n→∞} H_n(α, T), (3.6)

which can be proven to exist, measures the average information which the measure-preserving deterministic system produces per step relative to α as time goes to infinity (Petersen 1983, pp. 233–240). Now:

Definition 21 E_KS(M, Σ_M, µ, T) = sup_α {H(α, T)} is the Kolmogorov-Sinai entropy of the discrete measure-preserving deterministic system (M, Σ_M, µ, T).

It is clear that it measures the highest average amount of information that the deterministic system can produce per step relative to a coding or, equivalently, the highest average amount of uncertainty that can be reduced per step relative to a coding. The Shannon information measures uncertainty, and this uncertainty can be regarded as a form of unpredictability (cf. Frigg 2004, Frigg 2006). Hence a positive Kolmogorov-Sinai entropy means that relative to some codings the behaviour of the system is unpredictable. For a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) it can be shown that for any t_0, −∞ < t_0 < ∞ (Sinai 2007):

E_KS(M, Σ_M, µ, T_{t_0}) = |t_0| E_KS(M, Σ_M, µ, T_1), (3.7)

where E_KS(M, Σ_M, µ, T_{t_0}) denotes the Kolmogorov-Sinai entropy of the discrete measure-preserving deterministic system (M, Σ_M, µ, T_{t_0}) and E_KS(M, Σ_M, µ, T_1) is the Kolmogorov-Sinai entropy of the discrete measure-preserving deterministic system (M, Σ_M, µ, T_1). Consequently:

Definition 22 The Kolmogorov-Sinai entropy of a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) is defined as E_KS(M, Σ_M, µ, T_1).

And it measures the average amount of information or uncertainty produced by the continuous deterministic system over one time unit. Having worked for several years on information theory, Kolmogorov (1958) was the first to apply information-theoretic ideas to ergodic theory. He introduced a definition of entropy only for what are nowadays called Kolmogorov-systems.
Based on Kolmogorov's work, Sinai (1959) introduced a different notion of entropy which applies to all measure-preserving deterministic systems, the now canonical Definition 21 and Definition 22. Sinai also proved (a big surprise at that time) that automorphisms on the torus have positive Kolmogorov-Sinai entropy and thus are unpredictable because they produce information. Kolmogorov and Sinai were motivated by finding a concept which characterises the amount of randomness or unpredictability of a system (Frigg & Werndl 2010, Shiryaev 1989, Sinai 2007, Werndl 2009c). More specifically, as Halmos (1961, p. 76) explains: "Intuitively speaking, the entropy {E_KS} is the greatest quantity of information obtainable about the universe per day [i.e., step] by repeated performances of experiments with a finite [...] number of possible outcomes". Hence Definition 21 is natural-world-justified by capturing the idea of the average amount of information produced per step explained above.

Also in some standard books on ergodic theory Definition 21 and Definition 22 are natural-world-justified in this way (Billingsley 1965, p. 63; Petersen 1983, pp. 233–240). It should, however, be mentioned that in many books Definition 21 and Definition 22 are not justified at all (e.g., Arnold & Avez 1968, pp. 35–50; Cornfeld et al. 1982, pp. 246–257; Sinai 2000, pp. 40–43).

Interestingly, Definition 21 of the Kolmogorov-Sinai entropy is also proof-generated. And, so far as I can see, it is the only notion of unpredictability in ergodic theory (cf. section 3.3) which is proof-generated. The central internal problem of ergodic theory is the following: which measure-preserving deterministic systems are isomorphic (cf. Definition 19)? Given a measure space (M, Σ_M, µ), consider L²(M, Σ_M, µ), the Hilbert space of real-valued square-integrable functions on (M, Σ_M, µ), where two functions
which differ by a set of measure zero are identified and the inner product is ⟨f, g⟩ = ∫_M f g dµ for any elements f, g of L²(M, Σ_M, µ). Now suppose that a discrete measure-preserving deterministic system (M, Σ_M, µ, T) is given. Then U_T : L²(M, Σ_M, µ) → L²(M, Σ_M, µ), U_T(f) = f(T(m)), is a linear operator. Likewise, given a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) and any t ∈ R, the map U_{T_t} : L²(M, Σ_M, µ) → L²(M, Σ_M, µ), U_{T_t}(f) = f(T_t(m)), is a linear operator. In fact, U_T and U_{T_t} are unitary operators. An operator V on a Hilbert space is called unitary if, and only if, (i) V is linear, (ii) V is invertible and (iii) ⟨V f, V g⟩ = ⟨f, g⟩ for all elements f, g of the Hilbert space.11 This was first discovered by Koopman (1931), and the investigation of measure-preserving deterministic systems by these operators is referred to as the spectral theory of deterministic systems (cf. Petersen 1983, section 2).

11Clearly, U_T and U_{T_t} are linear. And it is clear that U_T is invertible and that U_T^{−1}(f) = f(T^{−1}(m)), and that U_{T_t} is invertible for all t ∈ R and that U_{T_t}^{−1}(f) = f(T_{−t}(m)). Finally, the fact that (M, Σ_M, µ, T) and (M, Σ_M, µ, T_t) are measure-preserving implies that (cf. Petersen 1983, section 2):

⟨U_T(f), U_T(g)⟩ = ∫_M U_T(f) U_T(g) dµ = ∫_M f(T(m)) g(T(m)) dµ = ∫_M f(m) g(m) dµ = ⟨f, g⟩  (3.8)

and

⟨U_{T_t}(f), U_{T_t}(g)⟩ = ∫_M U_{T_t}(f) U_{T_t}(g) dµ = ∫_M f(T_t(m)) g(T_t(m)) dµ = ∫_M f(m) g(m) dµ = ⟨f, g⟩  (3.9)

is true for all characteristic functions, all combinations of characteristic functions and hence, by approximation, also for all f, g ∈ L²(M, Σ_M, µ).

Measure-preserving deterministic systems which are equivalent from this viewpoint are said to be spectrally isomorphic. Formally, the discrete measure-preserving deterministic systems (M_1, Σ_{M_1}, µ_1, T_1) and (M_2, Σ_{M_2}, µ_2, T_2) are spectrally isomorphic if, and only if, there exists a unitary operator V on L²(M_1, Σ_{M_1}, µ_1) such that V*U_{T_1}V = U_{T_2}, where V* is the adjoint of V. And the continuous measure-preserving deterministic systems (M_1, Σ_{M_1}, µ_1, T¹_t) and (M_2, Σ_{M_2}, µ_2, T²_t) are spectrally isomorphic if, and only if, there exists a unitary operator V on L²(M_1, Σ_{M_1}, µ_1) such that V*U_{T¹_t}V = U_{T²_t} for all t ∈ R.

In the 1950s it was known that deterministic systems with discrete spectrum are isomorphic if, and only if, they are spectrally isomorphic, and that this is not so for deterministic systems with mixed spectrum. Most important, however, is the case of a continuous spectrum, since measure-preserving deterministic systems typically have this property (Arnold & Avez 1968, pp. 27–32). Measure-preserving deterministic systems have continuous spectrum if, and only if, their only eigenfunctions are the constant functions. That is, for discrete time if, and only if, the only functions f ∈ L²(M, Σ_M, µ) satisfying U_T(f) = λf, where λ ∈ R is arbitrary, are the constant functions; and for continuous time if, and only if, the only functions f ∈ L²(M, Σ_M, µ) satisfying U_{T_t}(f) = λf for all t ∈ R, where λ ∈ R is arbitrary, are the constant functions. For measure-preserving deterministic systems with continuous spectrum, e.g., discrete Bernoulli systems, the conjecture emerged that spectrally isomorphic systems are not always isomorphic, but the problem resisted solution.

Kolmogorov (1958) and Sinai (1959) were motivated by making progress about this conjecture (Shiryaev 1989, pp. 914–915; Sinai 1989, pp. 834–836). And Kolmogorov’s (1958) main result is that this conjecture is true. As hinted at by Rohlin (1960, pp. 1–2, p. 8), the Kolmogorov-Sinai entropy can be justified as being precisely the definition which is needed to prove that conjecture, i.e., it is proof-generated. The argument, which goes back to Kolmogorov’s work, is as follows: isomorphic measure-preserving deterministic systems have the same Kolmogorov-Sinai entropy.
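As a numerical aside, the unitarity computation of footnote 11 (equations (3.8) and (3.9)) can be checked by Monte Carlo integration. The sketch below is my own illustration, using the doubling map T(x) = 2x mod 1, which preserves Lebesgue measure on [0, 1), and two arbitrary test functions: the estimate of ⟨U_T f, U_T g⟩ should agree with that of ⟨f, g⟩ up to sampling error.

```python
import math
import random

def inner(h1, h2, n_samples=400_000, seed=1):
    """Monte Carlo estimate of <h1, h2> = integral of h1(x) h2(x) over [0, 1)."""
    rng = random.Random(seed)
    return sum(h1(x) * h2(x)
               for x in (rng.random() for _ in range(n_samples))) / n_samples

def T(x):
    return (2 * x) % 1.0          # doubling map; preserves Lebesgue measure

def f(x):
    return math.sin(2 * math.pi * x)

def g(x):
    return x * x

# <U_T f, U_T g> = ∫ f(T(x)) g(T(x)) dx should equal <f, g> = ∫ f(x) g(x) dx
lhs = inner(lambda x: f(T(x)), lambda x: g(T(x)))
rhs = inner(f, g)
print(lhs, rhs)   # the two estimates agree up to Monte Carlo error
```

For these particular test functions the common value is ∫₀¹ x² sin(2πx) dx = −1/(2π) ≈ −0.159, so both estimates should land near that number.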
Now look at Bernoulli shifts, whose Kolmogorov-Sinai entropy is −∑_i p_i log(p_i) and hence takes a continuum of different values. Since all Bernoulli shifts are spectrally isomorphic, there is a continuum of measure-preserving deterministic systems being spectrally isomorphic but not isomorphic. Billingsley’s (1965, p. 65) comment

It is essential to understand the difference between H(α, T) and {E_KS(M, Σ_M, µ, T)} and why the latter is introduced. If the entropy of T were taken to be H(α, T) for some “naturally” selected α [...], then it would be useless for the isomorphism problem.

shows that the justification of Definition 21 as being proof-generated made it into standard books on ergodic theory too (see also Petersen 1983, p. 227, p. 246). Let us turn to the second kind of justification I have identified.

3.4.2 Condition justification

I claim that another kind of justification abounds in my case study: a definition is justified by the fact that it is equivalent in an allegedly natural way to a previously specified condition which is regarded as mathematically valuable. I speak here of condition-justified definitions.

If the previously specified condition is valuable and the kind of equivalence is natural, condition justification is a reasonable kind of justification.12 A condition-justified definition can be regarded as providing knowledge because it answers the question of which definition corresponds naturally to a previously specified condition.

The following notions of unpredictability in ergodic theory (cf. section 3.3) are condition-justified: all versions of weak mixing (for discrete and continuous time) and one version of being a discrete Bernoulli system (for discrete time). Let us discuss them now.

Weak mixing

Recall Definition 16 and Definition 17 of weak mixing. Two alternative equivalent definitions for discrete and continuous time are (Cornfeld et al. 1982, pp.
22–23; Petersen 1983, pp. 65–67):

Definition 23 A discrete measure-preserving deterministic system (M, Σ_M, µ, T) is weakly mixing if, and only if, for all A, B ∈ Σ_M

lim_{t→∞} (1/t) ∑_{i=0}^{t−1} |µ(T^i(A) ∩ B) − µ(A)µ(B)| = 0.

12For the condition-justified definitions of my case study we will see why the conditions are valuable and the equivalences are natural. Yet characterising what constitutes valuable conditions or natural kinds of equivalence at a general level would require further research.

Definition 24 A continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) is weakly mixing if, and only if, for all A, B ∈ Σ_M

lim_{t→∞} (1/t) ∫_0^t |µ(T_τ(A) ∩ B) − µ(A)µ(B)| dτ = 0,

where the measure on the time axis τ ∈ R⁺₀ is the Lebesgue measure.

Definition 25 The discrete measure-preserving deterministic system (M, Σ_M, µ, T) is weakly mixing if, and only if, for all f, g ∈ L²(M, Σ_M, µ)

lim_{t→∞} (1/t) ∑_{i=0}^{t−1} |∫ f(T^i(m))g(m) dµ − ∫ f(m) dµ ∫ g(m) dµ| = 0,

where L²(M, Σ_M, µ) is the Hilbert space of real-valued square integrable functions on (M, Σ_M, µ) where two functions which differ by a set of measure zero are identified.

Definition 26 The continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) is weakly mixing if, and only if, for all f, g ∈ L²(M, Σ_M, µ)

lim_{t→∞} (1/t) ∫_0^t |∫ f(T_τ(m))g(m) dµ − ∫ f(m) dµ ∫ g(m) dµ| dτ = 0,

where L²(M, Σ_M, µ) is the Hilbert space of real-valued square integrable functions on (M, Σ_M, µ) where two functions which differ by a set of measure zero are identified, and the measure on the time axis τ ∈ R⁺₀ is the Lebesgue measure.

I already argued that Definition 16 and Definition 17 of weak mixing can be natural-world-justified. The first three papers discussing weak mixing seem to be Hopf (1932a), Hopf (1932b), and Koopman & von Neumann (1932), which all discuss weak mixing for continuous deterministic systems. These papers show that there is more to say; for three reasons.
First, Hopf (1932a) starts by emphasising the importance of ergodicity for statistical mechanics (cf. Definition 2.5). He then considers a statistical property discussed by Poincaré: when initially a certain part of a fluid is coloured, experience shows that after a long time the colour uniformly dissolves in the fluid. Mathematically, Hopf expresses this by strong mixing.

Definition 27 A discrete measure-preserving deterministic system (M, Σ_M, µ, T) is strongly mixing if, and only if, for all A, B ∈ Σ_M

lim_{t→∞} µ(T^t(A) ∩ B) = µ(A)µ(B).

Definition 28 A continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) is strongly mixing if, and only if, for all A, B ∈ Σ_M

lim_{t→∞} µ(T_t(A) ∩ B) = µ(A)µ(B).

By looking at Definition 16 and Definition 17, we immediately see that any strongly mixing measure-preserving deterministic system is also weakly mixing. Interested in the interrelationship between strong mixing and ergodicity, Hopf indeed conjectures that a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) is strongly mixing if, and only if, for all t_0 ∈ R⁺ the discrete measure-preserving deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic. Yet he is unable to prove this (it was later shown to be false, see Lind 1975). As a result, Hopf attends to the question of which weaker statistical property is equivalent to the condition that for all t_0 ∈ R⁺ the discrete measure-preserving deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic. The answer he arrives at is Definition 26. Therefore, Definition 26 of weak mixing is condition-justified because its justification stems from it being equivalent in a natural way to a condition regarded as valuable.
This justification only works for continuous deterministic systems and not for discrete deterministic systems because it is not true that a discrete measure-preserving deterministic system (M, Σ_M, µ, T) is weakly mixing if, and only if, for all t_0 ∈ N the discrete deterministic system (M, Σ_M, µ, T^{t_0}) is ergodic.13

13The irrational rotation on the circle, which I will discuss in subsection 5.5.2, is a counterexample (Petersen 1983, p. 8).

Second, Hopf (1932b) is concerned with Gibbs’ fundamental hypothesis that any initial distribution tends toward statistical equilibrium, and he derives several conditions under which this hypothesis holds true. Within this context, the question arises how properties of a discrete measure-preserving deterministic system (M, Σ_M, µ, T) or a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) relate to the composite system (M×M, Σ_M⊗Σ_M, µ×µ, T×T) or (M×M, Σ_M⊗Σ_M, µ×µ, T_t×T_t) comprising two copies of the single system.14 Because of the importance of ergodicity, it is natural to ask: which property of the single system is equivalent to the composite system being ergodic? Hopf (1932b) provides the answer for continuous deterministic systems, namely Definition 26 of weak mixing. The same answer, namely weak mixing, is also true for discrete measure-preserving deterministic systems (Halmos 1949, pp. 1021–1022). Hence weak mixing is condition-justified, as Halmos (1949, p. 1022) stresses by referring to Definition 16 and Definition 23: an “indication that weak mixing is more than an analytic artificiality is in the assertion that T is weakly mixing if, and only if, its direct product with itself is indecomposable [ergodic]”.

Third, when discussing Definition 21 of the Kolmogorov-Sinai entropy, we encountered the property of a continuous spectrum which arises in spectral theory.
Koopman & von Neumann (1932) emphasise the naturalness of, and devote their paper to, this property. From the beginning of ergodic theory the correspondence of concepts from spectral theory and set-theoretic and integral-theoretic concepts from ergodic theory has been a core theme. Hence it was natural to address the question, as Koopman & von Neumann did, which set-theoretic or integral-theoretic definition is equivalent to having a continuous spectrum. The answer they arrived at for continuous deterministic systems is Definition 17 of weak mixing, and the same answer, namely weak mixing, is also true for discrete deterministic systems (Petersen 1983, p. 64). Thus, again, Definition 16 and Definition 17 of weak mixing are condition-justified.

14Here M×M is the Cartesian product of M with M; Σ_M⊗Σ_M is the product σ-algebra, that is, the σ-algebra generated by sets of the form A×B, where A, B ∈ Σ_M; µ×µ is the product measure, that is, the unique measure satisfying the property µ×µ(A×B) = µ(A)µ(B); T×T(m, q) = (T(m), T(q)) and T_t×T_t(m, q) = (T_t(m), T_t(q)).

I have found no book motivating the continuous version of weak mixing by the condition that for all t_0, t_0 ≠ 0, the discrete deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic. This might be because that characterisation does not hold for discrete systems. The other two interpretations of weak mixing as condition-justified appear in standard books on ergodic theory, e.g., Halmos (1956, p. 39) and Petersen (1983, p. 64). The latter comments:

That the concept of weak mixing is natural and important can be seen from the following theorem, according to which a transformation is weakly mixing if, and only if, its only measurable eigenfunctions are the constants.

To summarise, all versions of weak mixing are condition-justified because their justification stems from their being equivalent in a natural way to a condition regarded as valuable.
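To see what these mixing conditions say in a concrete case, here is a small Monte Carlo sketch (my own illustration; the sets, sample size and seed are arbitrary choices) for the baker's system of Example 1 with A = {x < 1/2} and B = {y < 1/2}. Since the map is invertible and measure-preserving, µ(T^i(A) ∩ B) = µ(A ∩ T^{−i}(B)), which the code estimates as the fraction of orbits that start in A and sit in B at time i; strong mixing (Definition 27) requires convergence to µ(A)µ(B) = 1/4, and strong mixing in turn implies the averaged condition of Definition 23.

```python
import random

def baker(x, y):
    """Baker's map on the unit square (cf. Example 1); it is invertible
    and preserves Lebesgue measure."""
    if x < 0.5:
        return 2 * x, y / 2
    return 2 * x - 1, (y + 1) / 2

def correlation(i, n_samples=200_000, seed=4):
    """Estimate µ(T^i(A) ∩ B) for A = {x < 1/2}, B = {y < 1/2} as the
    fraction of uniformly sampled orbits that start in A and lie in B
    after i iterations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        in_A = x < 0.5
        for _ in range(i):
            x, y = baker(x, y)
        hits += in_A and y < 0.5
    return hits / n_samples

for i in (0, 1, 5, 10):
    print(i, correlation(i))   # settles near µ(A)µ(B) = 0.25 as i grows
```

That the i = 1 term sits near 0.5 before later terms settle at 1/4 illustrates that Definition 27 is a limit claim; the absolute deviations |µ(T^i(A) ∩ B) − 1/4| entering Definition 23 are then averaged away.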
The next definition illustrates the danger of not appreciating that a definition is condition-justified.

Discrete Bernoulli system

Recall Definition 20 of a discrete Bernoulli system. The appeal to isomorphisms makes this definition indirect. Furthermore, most states of the deterministic systems encountered in the sciences, e.g., states of Newtonian systems, are not infinite sequences. Thus it is often easier to work without notions referring to infinite sequences. In investigating simple systems isomorphic to Bernoulli shifts, it became clear that proving an isomorphism amounts to finding a partition which can be used to code the dynamics. Hence it was natural to ask which condition that does not appeal to isomorphisms and infinite sequences, but to partitions, is equivalent to a discrete Bernoulli system.

Definition 29 The discrete measure-preserving deterministic system (M, Σ_M, µ, T) is a discrete Bernoulli system if, and only if, there is a partition α such that

(i) T^i α is an independent sequence, i.e., for any distinct i_1, ..., i_r ∈ Z, and not necessarily distinct α_j ∈ α, j = 1, ..., r (r ≥ 1): µ(T^{i_1}α_1 ∩ ... ∩ T^{i_r}α_r) = µ(α_1)···µ(α_r).

(ii) Σ_M is generated by {T^i α | i ∈ Z}.

Hence Definition 29 can be justified by the fact that it gives an answer to the above question, i.e., it is condition-justified. Standard books on ergodic theory also hint at this justification (Shields 1973, p. 8, p. 11; Sinai 2000, p. 47).

There have been attempts to justify Definition 29 as capturing a preformal idea of randomness or unpredictability. Interpreting the measure as time-independent probability, condition (i) captures the idea that any finite number of events of a specific partition at different times are probabilistically independent. Berkovitz et al.
(2006) argue that because condition (i) can be thus interpreted, discrete Bernoulli systems capture unpredictability;15 they do not say anything about condition (ii). Yet since (i) is only one part of this definition, this justification of Definition 29 fails.16 Generally, if a definition does not capture the idea it is said to capture, the justification fails because it is unclear why this definition is chosen.

15Actually, a slip occurred in Berkovitz et al.’s (2006, p. 667) interpretation of condition (i); (i) holds only for any finite number of events of a specific partition at different times, not for any events.

16For instance, the following measure-preserving deterministic system fulfills (i) but not (ii): let M = ([0, 1] × [0, 1] × [0, 1]) \ (D × [0, 1]), where D is defined as for the baker’s system (cf. Example 1). Let Σ_M be the Lebesgue σ-algebra on M and µ be the Lebesgue measure. Let T(m, y, z) = (2m, y/2, z) if 0 ≤ m < 1/2, and T(m, y, z) = (2m − 1, (y + 1)/2, z) if 1/2 ≤ m ≤ 1. Obviously, for (M, Σ_M, µ, T) condition (i) of Definition 29 holds for α = {{m ∈ M | 0 ≤ m < 1/2}, {m ∈ M | 1/2 ≤ m ≤ 1}}. But (M, Σ_M, µ, T) is not a discrete Bernoulli system since it is not even ergodic.

Batterman’s (1991) and Sklar’s (1993, pp. 238–239) motivation for Definition 29 is also that it captures a preformal idea of randomness or unpredictability. Their argument as expressed by Batterman (1991, pp. 249–250) is:

Now let us see just how random a Bernoulli system is. [...] The Bernoulli systems are those in which knowing the entire past history of box-occupations even relative to a partition (measurement) which is generating in the above sense, is insufficient (in the sense of being probabilistically independent) for improving the odds that the system will next be found in a given box.

As an interpretation of randomness or unpredictability this is puzzling.
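For contrast, what condition (i) does assert can be exhibited numerically. The following sketch (my own illustration, restricted to forward times since the simulation cannot iterate the map backwards) uses the baker's system with the left/right partition α = {{x < 1/2}, {x ≥ 1/2}}: the measure of the set of points whose orbit lies in prescribed cells of α at three distinct times should factorise into (1/2)³.

```python
import random

def baker(x, y):
    # baker's map on the unit square (cf. Example 1); invertible and
    # Lebesgue-measure-preserving
    if x < 0.5:
        return 2 * x, y / 2
    return 2 * x - 1, (y + 1) / 2

def cylinder_freq(times, cells, n_samples=200_000, seed=5):
    """Estimate the measure of {points whose orbit is in cell cells[k] of
    the partition α = {x < 1/2, x ≥ 1/2} at time times[k], for every k}.
    Condition (i) of Definition 29 (for forward times) says this should
    factorise into the product of the cell measures, here (1/2)**len(times)."""
    rng = random.Random(seed)
    wanted = dict(zip(times, cells))
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        ok = True
        for t in range(max(times) + 1):
            if t in wanted and (0 if x < 0.5 else 1) != wanted[t]:
                ok = False
                break
            x, y = baker(x, y)
        hits += ok
    return hits / n_samples

est = cylinder_freq([0, 3, 7], [0, 1, 0])
print(est)   # close to (1/2)**3 = 0.125
```

As the surrounding discussion stresses, such independence of finitely many partition events is only condition (i); condition (ii), that the partition generates Σ_M, does separate work.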
Even if it exactly corresponded to Definition 29,17 it is unclear, from the viewpoint of capturing a preformal idea of randomness or unpredictability, why independence is required relative to generating partitions; and I found no convincing justification for this.

17It does not. First, their interpretation does not make clear that the matter of concern is the existence of a partition satisfying (i) and (ii). Even if this is disregarded, their interpretation applies to more systems than discrete Bernoulli systems. This is so because it applies to every discrete measure-preserving deterministic system where there is a generating partition where any events constituting the entire history of box-occupation are of probability zero, and some of these deterministic systems are not Bernoulli (Ornstein 1974, pp. 93–95). The correct thing to say is: any finite number of events of a specific partition at different times are probabilistically independent, even though the partition is generating.

It seems that the difficulty stems from the fact that Definition 29 is really condition-justified. As we have seen for weak mixing, condition-justified definitions may in other contexts also capture a preformal idea valuable in some sense. However, often (and this is true for Definition 29 as discussed) this will not be the case. Then there is the danger of not appreciating that a definition is condition-justified and claiming that it captures a valuable preformal idea, when it does not. It seems that in interpreting Definition 29 Batterman and Sklar fell into this trap. This danger is similar to the one identified by Lakatos (1976, p. 153), viz. claiming that a proof-generated definition captures a valuable preformal idea when it does not. Let us now turn to the final kind of justification I have identified.

3.4.3 Redundancy justification

I call a definition which is justified because it eliminates as redundant at least one condition in an already accepted definition redundancy-justified. A redundancy-justified definition can be regarded as providing knowledge since it shows that specific conditions in an accepted definition are redundant. It is obviously desirable in mathematics to find out whether there are any redundant conditions in an already accepted definition. Typically, both the original definition, and the one in which the redundant conditions are eliminated, each have their own advantages. It depends on the definitions, but the former might be easier to understand or might allow for a more fine-grained analysis; the latter is simpler (in the sense of being more concise), and it might be that only the latter is easier to use in proofs, allows for natural generalizations, or suggests important analogies.

So when is it better to propound the original definition? And when is it better to introduce instead the new definition without the redundant conditions, i.e., when is redundancy justification a reasonable kind of justification? I think the answer depends on the definition and the context in which the definition is considered. For the purpose of an introductory textbook it might be better to propound the original definition because it is easier to understand. Conversely, for the purpose of a research article it might be better instead to use the new, concise definition, since it is easier to use in some proofs. Furthermore, in many cases it does not seem to matter much whether the original definition or the definition without the redundant conditions is introduced, so long as the origin of the definition and the redundant conditions are clearly pointed out.
As in the case of proof-generated and condition-justified definitions, there is the danger of not understanding that a definition is redundancy-justified and claiming that it captures a valuable preformal idea, when it does not.

Two definitions in the list of notions of unpredictability in ergodic theory (cf. section 3.3) are redundancy-justified: the continuous version of a Bernoulli system, which I will discuss for illustration, and a Kolmogorov-system (Sinai 1963, pp. 64–65; Uffink 2007, pp. 94–96).

Continuous Bernoulli system

We have seen that Kolmogorov (1958) and Sinai (1959) established that isomorphic discrete Bernoulli systems have the same Kolmogorov-Sinai entropy (cf. subsection 3.4.1). A decade later Ornstein (1970a, 1971) proved the converse, i.e., that discrete Bernoulli systems with equal entropy are isomorphic. Having established that celebrated result, Ornstein became interested in finding an analogous definition of a Bernoulli system for continuous time, and he asked whether the Kolmogorov-Sinai entropy could be used to classify them too. The most obvious definition of a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) describing an independent process is that for all t_0 ∈ R, t_0 ≠ 0, the discrete measure-preserving deterministic system (M, Σ_M, µ, T_{t_0}) is a discrete Bernoulli system. Ornstein (1973a) first introduces this definition of a continuous Bernoulli system, and then he shows that there are redundant conditions in this definition because it is equivalent to the following definition:

Definition 30 The continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) is a continuous Bernoulli system if, and only if, the discrete measure-preserving deterministic system (M, Σ_M, µ, T_1) is a discrete Bernoulli system.

Hence Definition 30 is redundancy-justified because it eliminates redundant conditions. In this way it seems to be justified in Ornstein’s (1974, p.
56) book too.18 Ornstein (1973b) indeed showed that two continuous Bernoulli systems are isomorphic if, and only if, they have the same Kolmogorov-Sinai entropy.

From Ornstein’s result it immediately follows that even more holds, namely that up to a scaling of the time t any two continuous Bernoulli systems are isomorphic. Let me explain this. For any continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) the Kolmogorov-Sinai entropy of the discrete deterministic system (M, Σ_M, µ, T_{t_0}), t_0 ∈ R arbitrary, t_0 ≠ 0, is |t_0| times the Kolmogorov-Sinai entropy of the discrete deterministic system (M, Σ_M, µ, T_1) (cf. equation (3.7)). So assume that two continuous Bernoulli systems (M, Σ_M, µ, T_t) and (M_2, Σ_{M_2}, µ_2, T²_t) with Kolmogorov-Sinai entropy E_KS(M, Σ_M, µ, T_t) and E_KS(M_2, Σ_{M_2}, µ_2, T²_t), respectively, are given. Now make the transformation t′ = ct, for c = E_KS(M_2, Σ_{M_2}, µ_2, T²_t) / E_KS(M, Σ_M, µ, T_t). Then we obtain that (M, Σ_M, µ, T_t) is isomorphic to (M_2, Σ_{M_2}, µ_2, T²_{t′}), since the Kolmogorov-Sinai entropy of the continuous measure-preserving deterministic system (M_2, Σ_{M_2}, µ_2, T²_{t′}) is the Kolmogorov-Sinai entropy of the discrete measure-preserving deterministic system (M_2, Σ_{M_2}, µ_2, T²_{1/c}), which is (1/c) E_KS(M_2, Σ_{M_2}, µ_2, T²_t) = E_KS(M, Σ_M, µ, T_t).

18Ornstein (1974, p. 56) expresses this indirectly by introducing continuous Bernoulli systems as follows: “We will call a flow {(M, Σ_M, µ, T_t)} a {continuous Bernoulli system} if {(M, Σ_M, µ, T_1)} is a {discrete Bernoulli system}. (We will prove later that if {(M, Σ_M, µ, T_1)} is a {continuous Bernoulli system}, then {(M, Σ_M, µ, T_{t_0})} for each fixed t_0 is a {discrete Bernoulli system}).”

3.4.4 Occurrence of the kinds of justification

To sum up: in addition to Lakatos’s proof-generated definitions, I have identified three kinds of justification of definitions.
To my knowledge, condition justification and redundancy justification have not been identified before. I do not claim that the kinds of justification I have discussed are the only ones at work in mathematics. Further studies might unveil yet other ones.

Two more general comments about justifying definitions should be added here. First, for any kind of justification there are three possibilities: (i) a definition is reasonably justified in this way; (ii) it is justified but not reasonably justified in this way; (iii) it is not justified in this way. As regards (ii), for instance, if the idea of being equivalent in a measure-theoretic sense to an independent process like throwing a die were not valuable, Definition 20 would be natural-world-justified but not reasonably justified. Second, an already justified definition sometimes has additional good features which support this definition but which do not by themselves constitute a sufficient justification. These features may also be important in deciding between different definitions. For instance, it is often said that a merit of the Kolmogorov-Sinai entropy is its neat connection to other notions of unpredictability such as being a Kolmogorov-system. These are good features but not sufficient justifications; since if there were no further reasons for studying the definition, there would still remain the question why we should regard it as worth considering (cf. Smith 1998, pp. 174–175).

How widely do the kinds of justification I have discussed occur? To answer this, I first comment on the notion of a mathematical subject. I think that regardless of which plausible understanding of ‘subject’ is adopted, my claims are true. But a possible way to operationalise this idea is the following: with the subjects identified by the Mathematics Subject Classification19 it would be possible to create a list of subjects within mathematics from the nineteenth century up to today. Then the definitions of my case study (notions of unpredictability in ergodic theory) belong to the mathematical subject ‘strange attractors, chaotic dynamics’.

19This is a five-digit classification scheme of subjects formulated by the American Mathematical Society; see www.ams.org/msc. For our purposes subjects concerned with education, history or experimental studies have to be excluded.

Based on my knowledge of mathematics, I endorse the following claims about mathematics produced in the twentieth century and up to the present day:20 all the kinds of justifications I have discussed are widespread. More specifically, proof-generated, condition-justified, and redundancy-justified definitions are all found in the majority of mathematical subjects with explicit definitions. Also, for nearly all mathematical subjects with explicit definitions which (among other things) aim at describing or understanding the natural world, natural-world-justified definitions are found. This includes subjects not only from what is called applied mathematics but also from pure mathematics, e.g., measure theory. Furthermore, as in my case study, for nearly all mathematical subjects with explicit definitions many different ways of justifying definitions are found and are reasonable. Indeed, I would be surprised if one subject could be found where only one kind of justification is important. Clearly, my case study shows that for the subject ‘strange attractors, chaotic dynamics’ these claims hold true.

For my case study the argumentation involved in justifying definitions is typically not explicitly stated but is merely hinted at or merely implicit in the mathematics. Because of the conventional style of mathematical writing, this appears to be generally the case in mathematics, as also Lakatos (1976, pp. 142–144) claimed.
Also, it should be mentioned that detailed knowledge of parts of ergodic theory is necessary to assess how definitions are justified in my case study. This confirms Tappenden’s claim that judgments about definitions require detailed knowledge of the relevant mathematics (cf. section 3.2).

Let us reflect on the interrelationships between the kinds of justification, an issue which seems not to be discussed in the literature.

20Starting with the twentieth century is somewhat arbitrary. All the here-discussed kinds of justification appear also important in nineteenth century mathematics. Yet older mathematics may be significantly different. Hence a close investigation would be necessary to identify the role the kinds of justification play in older mathematics.

3.5 Interrelationships between the kinds of justification

In what follows when I speak of an argument for a definition I mean that a reason is provided for a definition which cannot be split into two separate reasons for this definition. Now I first ask about the interrelationships in one argument: assume that a specific argument establishes that a definition is justified according to one kind of justification. Can it be that this argument implies that the definition is at the same time also justified according to another kind of justification? Intuitively, one might think that in an argument a definition can only be justified according to one kind of justification. Yet, as we will see, the matter is more complicated. Second, I ask about the interrelationships between the kinds of justification in different arguments: if different arguments justify the same definition, what combination of kinds of justification do we find? I will discuss these two cases in the next two subsections.

3.5.1 One argument

Clearly, there are arguments where a definition is only proof-justified, natural-world-justified, condition-justified or redundancy-justified.
For example, uniform convergence as discussed by Lakatos (1976, pp. 131–133) is only proof-justified, Definition 20 of a discrete Bernoulli system as capturing the idea of a measure-preserving system being equivalent to an independent process is only natural-world-justified, weak mixing as corresponding to ergodicity of the composite system is only condition-justified, and Definition 30 of a continuous Bernoulli system as eliminating redundant conditions is only redundancy-justified.

By going back to the characterisation of the kinds of justification, we see that the intuition that in an argument a definition can only be (reasonably) justified according to one kind of justification is correct except for one case. Namely, in rare cases condition-justified definitions are at the same time proof-generated in an argument. This is so if, and only if, the kind of equivalence is regarded as natural because it occurs in the formulation of a conjecture that should be established. For example, assume the following conjecture is regarded as valuable: each function in a convergent sequence of functions is continuous if, and only if, the limit function of the convergent sequence is continuous. Further, assume that sequences of pointwise convergent continuous functions without continuous limit functions are known. Then mathematicians might ask: how does the notion of convergence have to be changed such that the sequence of continuous functions is convergent if, and only if, the limit function is continuous? The definition answering this question would be clearly condition-justified. But it would also be proof-generated since it is needed in order to prove the above conjecture. Let us now turn to the interrelationships in different arguments.
3.5.2 Different arguments

In our case study different arguments establish that weak mixing is condition-justified: weak mixing corresponds to ergodicity of the composite deterministic system, to the set-theoretic or integral-theoretic condition equivalent to having a continuous spectrum, and, for continuous measure-preserving deterministic systems, to the condition that for all t0 ∈ R+ the discrete measure-preserving deterministic system (M, ΣM, µ, Tt0) is ergodic. Generally, one and the same definition can be (reasonably) justified in the same way in different arguments by referring to different conjectures, preformal ideas etc. For proof-generated definitions Lakatos (1976, pp. 127–128) also recognises this pattern. What is more, we have seen that in different arguments Definition 16 and Definition 17 of weak mixing are justified in different ways: as mentioned above, these definitions are condition-justified but also natural-world-justified, expressing the idea that given an arbitrary level of precision ε > 0 any event is approximately independent of almost any event that is sufficiently past. Likewise, the discrete version of the Kolmogorov-Sinai entropy is natural-world-justified, expressing the idea of the highest average amount of information produced per step relative to a coding; but it is also proof-generated concerning the conjecture that spectrally isomorphic systems are not always isomorphic. Generally, one and the same definition can be (reasonably) justified in different ways in different arguments. Finally, a definition which is justified in any way can be used to (reasonably) justify a definition in an arbitrary way. In this sense the different kinds of justification are closely connected. For example, the natural-world-justified Definition 20 of a discrete Bernoulli system is used to justify the condition-justified Definition 29 of a Bernoulli system.
A special case of this is when for proof-generated definitions preformal ideas shine through (which can be, but does not have to be, the case). For instance, consider definitions of polyhedron as discussed by Lakatos (1976). Early definitions of polyhedron, which seem to be justified because they capture the preformal idea of a solid with plane faces and straight edges, were eventually replaced by definitions which are needed to prove the Euler conjecture. For these proof-generated definitions, to some extent, the preformal idea of the old definitions still shines through. Hence Lakatos’s (1976, p. 90) claim “In the different proof-generated theorems we have nothing of the naive concept” is an unfortunate exaggeration. I now return to Lakatos’s ideas on justifying definitions.

3.6 Assessment of Lakatos’s ideas on proof-generated definitions

First, in focusing on proof-generated definitions, Lakatos did not recognise the interplay between the different kinds of justification of definitions, which I discussed in section 3.5. In particular, Lakatos never indicates that in different arguments the same definition can be justified in different ways. Second, Lakatos did not show, as I did for notions of unpredictability in ergodic theory, that often various kinds of justification are important and that a variety of kinds of justification can be reasonable. I argued that Lakatos may have believed the following (cf. section 3.2): there are many mathematical subjects where proof-generation should be the sole important way that definitions are justified; and there are many subjects created after mathematicians discovered the method of proof-generation where proof-generation is the sole important way that definitions are justified. From our claim that for nearly all mathematical subjects many different ways of justifying definitions are found and are reasonable, it follows that this must be wrong (cf.
subsection 3.4.4). That is, subjects which were created after mathematicians discovered the method of proof-generation and in which solely proof-generated definitions are found and are reasonable appear to be exceptional. Indeed, Lakatos could have shown with his case studies that often various kinds of justification are found and that various kinds of justification can be reasonable. To demonstrate this, I will now show that even for the subjects discussed by Lakatos (1976), not only proof-generation but also other kinds of justification are important. To keep the discussion brief, I show this here only for the subjects to which the definition of uniform convergence and the Carathéodory definition of measurable sets belong. But this claim can easily be seen to hold also for the subjects to which the other proof-justified definitions Lakatos discusses (namely the definitions of polyhedron, bounded variation and the Riemann integral) belong. Lakatos (1976, pp. 144–146) argues that uniform convergence is proof-generated, also by referring to textbooks. This definition falls under the subject of the Mathematical Subject Classification ‘convergence and divergence of series and sequences of functions’. A definition discussed in this subject is the radius of convergence of a power series. A power series is of the form ∑_{k=0}^∞ a_k (x − x_0)^k, where a_k, x_0 and x ∈ R.

Definition 31 Its radius of convergence is the unique number R ∈ [0, ∞] such that the series converges absolutely if |x − x_0| < R and diverges if |x − x_0| > R.

The radius of convergence is often defined differently as follows. The root test is a powerful criterion for the convergence of infinite series. Hence the question arises whether there is a definition which is equivalent to the radius of convergence as defined above but which gives an explicit way to calculate this radius by referring to the root test.
The answer is yes, namely:

Definition 32 For a power series the radius of convergence is R = 1 / lim sup_{k→∞} |a_k|^{1/k}.

Thus Definition 32 is condition-justified, as, for example, hinted at in Marsden and Hoffman’s (1974, pp. 289–290) standard analysis textbook: “The reason for the terminology in {Definition 32} is brought out by the following result [that by applying the root test, Definition 32 is equivalent to Definition 31].” Lakatos (1976, pp. 152–154), mainly by referring to Halmos’s (1950) book, argues that the Carathéodory definition of measurable sets is proof-generated. This definition falls under the subject of the Mathematical Subject Classification ‘classes of sets, measurable sets, Suslin sets’. The definition of a σ-algebra clearly belongs to this subject. The basic idea of a σ-algebra is to have a collection of subsets of X including X which is closed under countable set-theoretic operations. Thus a usual definition is (Cohn 1980, pp. 1–2):

Definition 33 A set Σ of subsets of X is a σ-algebra if, and only if, (i) X ∈ Σ, (ii) for all A ⊆ X, if A ∈ Σ, then X \ A ∈ Σ, (iii) for all sequences (A_k)_{k≥0}, if A_k ∈ Σ for all k ≥ 0, then ⋃_{k=0}^∞ A_k ∈ Σ, (iv) for all sequences (A_k)_{k≥0}, if A_k ∈ Σ for all k ≥ 0, then ⋂_{k=0}^∞ A_k ∈ Σ.

Now one can easily see that the conditions (i), (ii) and (iii) imply (iv). Consequently, many use the following definition because it eliminates a redundant condition.

Definition 34 A set Σ of subsets of a set X is a σ-algebra if, and only if, (i), (ii) and (iii) hold.

Clearly, this definition is redundancy-justified as, for instance, in Ash’s (1972, p. 4) standard book on measure theory. To conclude, even for the subjects discussed by Lakatos various kinds of justification are found and are reasonable.
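The equivalence of Definition 31 and Definition 32 can be checked numerically for concrete coefficient sequences (an illustrative sketch of my own; a single large k stands in for the lim sup, which suffices when |a_k|^{1/k} converges):

```python
import math

def radius_root_test(a, k):
    """Approximate Definition 32, R = 1 / limsup_k |a_k|^(1/k),
    by evaluating |a_k|^(1/k) at one large k (illustration only)."""
    return 1.0 / (abs(a(k)) ** (1.0 / k))

# a_k = 3^k: sum a_k x^k is a geometric series in 3x, so R should be 1/3.
print(radius_root_test(lambda k: 3 ** k, 400))        # ~0.3333

# a_k = e^(-k): here |a_k|^(1/k) = 1/e, so R should be e.
print(radius_root_test(lambda k: math.exp(-k), 400))  # ~2.71828
```

For the geometric example the root test confirms the behaviour required by Definition 31: the series converges for |x − x_0| < 1/3 and diverges beyond it.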
3.7 Conclusion

Mathematical practice suggests that there have to be good reasons for definitions to be worth studying, i.e., mathematical practice suggests that mathematical definitions are justified. And this chapter has addressed the actual practice of how definitions in mathematics are justified in articles and books and whether the justification is reasonable. After some introductory remarks, in section 3.2 I discussed the main account of these issues, namely Lakatos’s ideas on proof-generated definitions. Lakatos claims that in many subjects mathematical definitions are and should be ‘proof-generated’, by which he means that the definition is needed to prove a specific conjecture regarded as valuable. While important, this chapter has shown how Lakatos’s ideas are limited. My assessment of Lakatos and my thoughts on justifying definitions are based on a case study of notions of unpredictability in ergodic theory, which was introduced in section 3.3. In section 3.4 I identified three other important and common ways of justifying definitions: natural-world-justification, condition justification and redundancy justification. A condition-justified definition is a definition which is justified because it is equivalent in a natural way to a previously specified condition regarded as valuable. A redundancy-justified definition is a definition which is justified because it eliminates redundant conditions. To my knowledge, condition justification and redundancy justification have not been discussed so far. Also, I showed that awareness of the ways definitions are justified is important for mathematical understanding and for avoiding mistakes. Then in section 3.5 I discussed the interrelationships between the different kinds of justification of definitions, an issue which has not been addressed before. In particular, I argued that in different arguments the same definition can be justified in different ways.
Finally, in section 3.6 I pointed out how Lakatos’s ideas are limited. Lakatos did not recognise the interplay between the different kinds of justification. Furthermore, his ideas fail to show that often various kinds of justification are found and that a variety of kinds of justification can be reasonable. I substantiated this claim by showing that even for the subjects Lakatos discusses, proof-generation is not the only important kind of justification. With this background on notions of unpredictability in ergodic theory, we are now ready to tackle one of the key questions about chaos and unpredictability, namely the question of what is the unpredictability which is specific to chaotic behaviour.

Chapter 4 The unpredictability specific to chaos

4.1 Introduction

Since the beginnings of systematically investigating chaos until today, the unpredictability of chaotic systems has been at the centre of interest. There is widespread belief in the philosophy, mathematics and physics communities (and it has been claimed in various articles and books) that there is a kind of unpredictability specific to chaotic systems, meaning that chaotic systems are unpredictable in a way other deterministic systems are not. More specifically, what is usually believed is that there is at least one kind of unpredictability specific to chaotic systems that is shown by all chaotic systems. The physicist James Lighthill, commenting on the impact of chaos on unpredictability, expresses this point as follows:

We are all deeply conscious today that the enthusiasm of our forebears for the marvellous achievements of Newtonian mechanics led them to make generalizations in this area of predictability which, indeed, we may have generally tended to believe before 1960, but which we now recognize were false (Lighthill 1986, p. 38).

These features connected with predictability that I shall describe from now on, then, are characteristic of absolutely all chaotic systems (Ibid., p. 42).
Similarly, Weingartner (1996, p. 50) says that “the new discovery now was that [...] a dynamical system obeying Newton’s laws [...] can become chaotic in its behaviour and practically unpredictable”. Thus the question ‘What is the unpredictability specific to chaos?’ appears natural, and one might well suppose that it has already been satisfactorily answered. However, this is not the case. On the contrary, there is a lot of confusion about what exactly the unpredictability specific to chaotic behaviour is. Several answers have been proposed, but, as we will see, none of them fits the bill. Fundamental questions about the limits of predictability have always been of concern to philosophy. So the widespread belief and the various flawed accounts about the unpredictability specific to chaotic systems demand clarification. The aim of this chapter is to critically discuss existing accounts and to propose a novel and more satisfactory answer. My answer will be based on two insights. First, I will show that chaos can be defined in terms of strong mixing. Although strong mixing is occasionally mentioned in connection with chaos, I have not found a publication in print arguing that chaos can be thus defined. Second, I will argue that strong mixing has a natural interpretation as a particular form of approximate probabilistic irrelevance, which is a form of unpredictability. On this basis, I will propose a general novel answer: a kind of unpredictability specific to chaotic systems is that for predicting any event at any level of precision, all sufficiently past events are approximately probabilistically irrelevant. The structure of the chapter is as follows. In section 4.2 I will discuss the concepts of unpredictability relevant for this chapter. Section 4.3 will be about chaotic behaviour. Here I will show that chaotic behaviour can be defined in terms of strong mixing.
After that, in section 4.4 I will examine the existing answers to the question of what is the unpredictability specific to chaotic systems, and I will dismiss them as mistaken. In section 4.5 I propose a general answer that does not suffer from the shortcomings of the other answers.

4.2 Unpredictability

There are different conceptual accounts of unpredictability for deterministic systems. I will introduce two concepts of unpredictability which will be needed in this chapter. According to the first concept of unpredictability, a deterministic system is unpredictable when any bundle of initial conditions spreads out more than a specific diameter representing the prediction accuracy of interest (usually of larger diameter than the one of the bundle of initial conditions). When this happens, the deterministic system is unpredictable in the sense that the prediction based on any bundle of initial conditions is so imprecise that it is impossible to determine the outcome of the deterministic system with the desired prediction accuracy.1 A well-known example is a deterministic system in which, due to exponential divergence of solutions, any bundle of initial conditions of at least a specific diameter spreads out over short time periods more than a diameter of interest. The second concept of unpredictability is probabilistic. It says that for practical purposes any bundle of initial conditions is irrelevant, i.e., makes it neither more nor less likely that the state is in a region of phase space of interest. According to this concept, it is not only impossible to predict with certainty in which region the deterministic system will be, but in addition, for practical purposes knowledge of the possible initial conditions neither heightens, nor lowers, the probability that the state is in a given region of phase space.
An example is that knowledge of any bundle of sufficiently past initial conditions is practically irrelevant for predicting that the state of the deterministic system is in a region of phase space. Eagle (2005, p. 775) defines randomness as a strong form of unpredictability: an event is random if, and only if, the probability of the event conditional on evidence equals the prior probability of the event. This idea relativised to practical purposes is at the heart of our second concept. Consequently, this second concept can also be regarded as a form of randomness. Clearly, the first and second concepts of unpredictability are different and cannot be expressed in terms of each other since the notions of ‘diameter’ and ‘probability’ are not expressible in terms of each other.

1 Schurz (1996, pp. 133–139) discusses several variants of this form of unpredictability.

Figure 4.1: evolution of a small bundle of initial conditions I under the baker’s system

4.3 Chaos

4.3.1 Defining chaos

I base the discussion of defining chaos on the following assumption, which is widely accepted in the literature (e.g., Brin & Stuck 2002, p. 23; Devaney 1986, p. 51). A formal definition of chaos is adequate if, and only if, (i) it captures the main pretheoretic intuitions about chaos, and (ii) it is extensionally correct (i.e., correctly classifies essentially all systems which, according to the pretheoretic understanding, are uncontroversially chaotic or non-chaotic). Let us first direct our attention to (i). Roughly, chaotic systems are deterministic systems showing irregular behaviour and sensitive dependence to initial conditions, or even random behaviour. Sensitive dependence to initial conditions (SDIC) means that small errors in initial conditions lead to totally different solutions.
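SDIC in this sense can be illustrated numerically. The sketch below uses the doubling map x ↦ 2x mod 1, the expanding x-dynamics underlying the baker's system (the map, initial condition and thresholds are my illustrative choices, not the text's): two initial conditions a distance 10^-9 apart become macroscopically far apart within a few dozen iterations.

```python
def doubling(x):
    """Doubling map x -> 2x mod 1, the expanding direction of the baker's system."""
    return (2.0 * x) % 1.0

def steps_to_separate(x0, delta, eps, max_steps=60):
    """Smallest t with |T^t(x0) - T^t(x0 + delta)| >= eps, or None if never reached."""
    x, y = x0, x0 + delta
    for t in range(max_steps + 1):
        if abs(x - y) >= eps:
            return t
        x, y = doubling(x), doubling(y)
    return None

# An initial error of 10^-9 roughly doubles each step, so two practically
# indistinguishable initial conditions separate by 0.25 within about 28 iterations.
print(steps_to_separate(0.123456789, 1e-9, 0.25))
```

Halving the initial error delays the separation by only about one iteration, which is the practical face of sensitive dependence: no realistic gain in initial-condition accuracy buys much prediction time.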
Recall the baker’s system, our example of a discrete measure-preserving deterministic system (Example 1), and recall a billiard system with convex obstacles, one of our main examples of a continuous measure-preserving deterministic system (Example 2). Figure 4.1 shows the second, fourth and sixth iterates of a small bundle of initial conditions I of the baker’s system and suggests that any bundle of initial conditions spreads out in phase space. Likewise, Figure 1.1(a) suggests that any bundle of initial conditions of a billiard system with convex obstacles spreads out in phase space (cf. Chapter 1). Thus these deterministic systems appear to exhibit SDIC. Moreover, Figure 4.1 suggests that for the baker’s system, and Figure 1.1(b) suggests that for billiard systems with convex obstacles, the motion exhibits irregular behaviour in the following sense: any bundle of initial conditions eventually intersects with any other region in phase space, a property called denseness. It is widely agreed that SDIC and denseness are necessary conditions for chaos (Nillsen 1999, pp. 14–15; Peitgen, Jürgens & Saupe 1992, pp. 509–521; Smith 1998, pp. 167–169). This motivates the following criterion: a definition captures the main pretheoretic intuitions about chaos if, and only if, it implies SDIC and denseness. Let us now discuss (ii), the requirement of extensional correctness. Imagine we are concerned with a pretheoretic property P. Further, assume that we are faced with a class of objects some of which uncontroversially have property P, others uncontroversially fail to have property P, and yet others are borderline cases or controversial in some sense. The task is to find an unambiguous definition of P. Then it is natural to say that an unambiguous definition of the property P is extensionally correct if, and only if, it classifies correctly all objects which uncontroversially have or do not have property P.
For the borderline objects it is unimportant how they are classified, and I defer to the definition. Being chaotic is such a property because the pretheoretic idea of chaos is somewhat vague. Among the deterministic systems whose behaviour is mathematically well understood, there is a broad class of uncontroversially chaotic systems and a broad class of uncontroversially non-chaotic systems. Moreover, there are a few borderline cases, for example the system discussed by Martinelli, Dang & Seph (1998, p. 199), where it is not clear whether they are chaotic (Brin & Stuck 2002, p. 23; Robinson 1995, pp. 81–85; Zaslavsky 2005, pp. 53–54). Consequently, I say that a formal definition of chaos is extensionally correct if, and only if, it correctly classifies essentially all mathematically well understood uncontroversially chaotic and non-chaotic behaviour. Several definitions of chaos have been proposed (cf. Lichtenberg & Lieberman 1992, pp. 302–309; Robinson 1995, pp. 81–86). While these definitions are very similar, they are all inequivalent. For want of space I cannot discuss all these definitions here and instead focus on a definition of chaos in terms of strong mixing, which will be crucial later on.

4.3.2 Defining chaos via strong mixing

Recall Definition 27 and Definition 28 of strong mixing (see subsection 3.4.2). Intuitively speaking, the fact that a deterministic system is strongly mixing means that any bundle of solutions spreads out in phase space like a drop of ink in a glass of water. Strong mixing is occasionally mentioned in connection with chaos, usually only in the context of volume-preserving deterministic systems (e.g., Lichtenberg & Lieberman 1992, pp. 302–303; Schuster & Just 2005, p. 177). Yet, to the best of my knowledge, there is no publication arguing that chaos can be defined in terms of strong mixing.
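Strong mixing demands that μ(T^{-n}(A) ∩ B) converges to μ(A)·μ(B) for all measurable sets A and B: after enough iterations, landing in A becomes approximately independent of having started in B, which is the ink-drop intuition made precise. This can be probed by Monte Carlo for the doubling map x ↦ 2x mod 1 with Lebesgue measure (the map, the sets and the sample size are my illustrative choices, not examples from the text):

```python
import random

def doubling(x):
    """Doubling map x -> 2x mod 1; strongly mixing w.r.t. Lebesgue measure."""
    return (2.0 * x) % 1.0

def mixing_estimate(n, trials=200_000, seed=1):
    """Monte Carlo estimate of mu(T^-n(A) intersect B) for A = [0, 1/3),
    B = [0, 1/2): the fraction of points starting in B that land in A
    after n iterations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.random()
        if x < 0.5:                # started in B
            for _ in range(n):
                x = doubling(x)
            if x < 1 / 3:          # landed in A after n steps
                hits += 1
    return hits / trials

# Strong mixing predicts convergence to mu(A) * mu(B) = (1/3) * (1/2) = 1/6.
print(mixing_estimate(10))         # close to 0.1667
```

The estimate sits near the product 1/6, illustrating that where a bundle started carries essentially no information about where it ends up.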
I will argue for this and propose that a possible definition of chaos is in terms of strong mixing: a deterministic system is chaotic if, and only if, it is strongly mixing. Since strong mixing was introduced before the 1960s, the beginning of the systematic investigation of chaos, it might seem puzzling that chaos can be adequately defined via strong mixing. However, many formal definitions and measures of chaos were invented before the 1960s (Dahan-Dalmedico 2004, p. 70), but rather few deterministic systems were known to which these notions apply. What was novel from the 1960s onwards was that many different interesting deterministic systems, surprisingly also very simple systems, were found to which these concepts apply. Let us first discuss whether strong mixing captures the pretheoretic intuitions. Strong mixing implies denseness: first, strongly mixing discrete measure-preserving deterministic systems are ergodic (Cornfeld et al. 1982, p. 25). By looking at Definition 2.5 of ergodicity, one sees that it follows that any region, naturally interpreted as a set of positive measure, eventually visits every region in phase space. Second, it is clear that strongly mixing continuous measure-preserving deterministic systems are weakly mixing. And as we have seen in subsection 3.4.2, if a continuous deterministic system (M, ΣM, µ, Tt) is weakly mixing, then for all t0 ∈ R+ the discrete measure-preserving deterministic system (M, ΣM, µ, Tt0) is ergodic. Hence, again, by looking at Definition 2.5, one sees that also for continuous deterministic systems any region, naturally interpreted as a set of positive measure, eventually visits every region in phase space. Strong mixing also implies SDIC. This can be seen as follows. Strong mixing implies that any bundle of initial conditions spreads out uniformly over the phase space. Therefore, any bundle eventually spreads out considerably, thus exhibiting SDIC.
Formally, assume that a strongly mixing discrete measure-preserving deterministic system (M, ΣM, µ, T) is given, where a metric d is defined on M and ΣM contains every open set of (M, d). Further, assume that every open set has positive measure.2 Consider two open sets O1 and O2 with 0 < ε = inf_{m∈O1, y∈O2} d(m, y). Strong mixing implies that for any open set O ⊆ M there is a t ∈ N0 such that T^t(O) ∩ O1 ≠ ∅ and T^t(O) ∩ O2 ≠ ∅. But this means that ε ≤ sup_{m,y∈T^t(O)} d(m, y). Hence the following condition holds, which in definitions like Devaney chaos is taken to be the SDIC implied by discrete chaotic motion (see Devaney 1986, p. 51; Werndl 2009d):

There is an ε > 0 such that for all m ∈ M and for all δ > 0 there is a y ∈ M and a t ∈ N0 with d(m, y) < δ and d(T^t(m), T^t(y)) ≥ ε. (4.1)

Likewise, assume that a strongly mixing continuous measure-preserving deterministic system (M, ΣM, µ, Tt) is given, where a metric d is defined on M, ΣM contains every open set of (M, d) and every open set has positive measure. Again, consider two open sets O1 and O2 with 0 < ε = inf_{m∈O1, y∈O2} d(m, y). Strong mixing implies that for an arbitrary open set O ⊆ M there is a t ∈ R+0 such that Tt(O) ∩ O1 ≠ ∅ and Tt(O) ∩ O2 ≠ ∅. Consequently, ε ≤ sup_{m,y∈Tt(O)} d(m, y). Therefore, the following condition holds, which is often taken to indicate the SDIC of continuous chaotic motion:

There is an ε > 0 such that for all m ∈ M and for all δ > 0 there is a y ∈ M and a t ∈ R+0 with d(m, y) < δ and d(Tt(m), Tt(y)) ≥ ε. (4.2)

2 This is standardly assumed and, to the best of my knowledge, applies to all paradigmatic chaotic systems.

As SDIC is often linked to positive Lyapunov exponents, let us now turn to a discussion of this issue.
For a discrete measure-preserving deterministic system (M, ΣM, µ, T), where M ⊆ R is an open set and T is continuously differentiable, the Lyapunov exponent of m ∈ M is

λ(m) = lim_{n→∞} (1/n) ∑_{i=0}^{n−1} log(|T′(T^i(m))|), (4.3)

where T′ is the derivative of T (for a general definition for discrete deterministic systems and for a definition for continuous measure-preserving deterministic systems see Mañé 1987, p. 263, and Oseledec 1968). For ergodic deterministic systems the Lyapunov exponent exists and is equal for all points except for a set of measure zero (Oseledec 1968; Robinson 1995, p. 86). Hence one can speak of the Lyapunov exponent of a deterministic system. Accordingly, one definition of chaos that has been suggested is that the deterministic system is ergodic and has a positive Lyapunov exponent. From a positive Lyapunov exponent it is commonly concluded that the SDIC shown by chaos consists of the exponential spreading of inaccuracies over finite time periods (e.g., Lighthill 1986, p. 46; Ott 2002, p. 140; Smith 1998, p. 15).3 However, this is mistaken. Positive Lyapunov exponents imply that for almost all points m in phase space the average over all i ≥ 0 of log(|T′(T^i(m))|)—the exponential growth rate of an inaccuracy at the point T^i(m)—is positive. Here the average is taken for the solution starting from m over an infinite time period. But positive on average exponential growth rates over an infinite time period do not imply that nearby solutions diverge exponentially or rapidly over finite time periods. The growth rate over finite time periods can be anything; inaccuracies can even shrink (Smith, Ziehmann & Fraedrich 1999, pp.
2861–2861).4 Furthermore, it is not true that inaccuracies of chaotic systems spread exponentially or rapidly over finite time periods: for paradigmatic chaotic systems like the Lorenz system (Example 3) there are regions where inaccuracies even shrink over finite time periods, and numerical evidence suggests such regions for many chaotic systems (Smith et al. 1999, p. 2881; Zaslavsky 2005, p. 315; Ziehmann, Smith & Kurths 1986, pp. 10–11).

3 With the qualification that the time periods have to be small enough such that the inaccuracy does not eventually saturate at the diameter of the deterministic system.

4 Moreover, Lyapunov exponents only measure the average growth rate of an infinitesimal inaccuracy around m, which is defined as the growth rate of a small ball of radius ε > 0 with centre m as ε → 0; yet in practice the uncertainty is finite and may not behave like the infinitesimal one (cf. Bishop 2008, p. 8).

Strongly mixing deterministic systems need not have positive Lyapunov exponents, and thus inaccuracies need not grow exponentially on average as time goes to infinity. Is this a problem for strong mixing as a definition of chaos? No. First, there is no agreement in the literature whether chaotic behaviour should show this on-average exponential growth. Some definitions do indeed demand it; others, such as Devaney chaos, do not. Second, the arguments for requiring positive Lyapunov exponents are not convincing. The standard rationale is that the SDIC shown by chaotic systems has to be exponential divergence of nearby solutions over finite time periods. But as shown above, this is not implied by a positive Lyapunov exponent and also does not generally hold for chaotic systems. Another possible argument is that for chaotic behaviour inaccuracies should spread out rapidly.
Yet the rate of divergence of strongly mixing deterministic systems which do not have positive Lyapunov exponents can, for arbitrarily long time periods, be much faster than for systems with positive Lyapunov exponents; thus it is not clear why positive Lyapunov exponents should be required (Berkovitz et al. 2006, p. 689; Wiggins 1990, p. 615). To conclude, strong mixing captures the pretheoretic intuitions about chaos. It remains to show that the definition of chaos in terms of strong mixing is extensionally correct. To do this, I have to consider the main classes of uncontroversially chaotic and non-chaotic behaviour.5

5 Obviously, I cannot discuss every single deterministic system regarded as clearly chaotic or non-chaotic. Yet our discussion covers all main examples.

I start with uncontroversially chaotic behaviour and first discuss volume-preserving deterministic systems. There are (i) Hamiltonian systems which are chaotic on the whole hypersurface of constant energy. Three types of continuous measure-preserving deterministic systems are mainly discussed here: first, chaotic billiards, such as billiards with convex obstacles (Example 2), which are strongly mixing (Chernov & Markarian 2006; Ott 2002, p. 296); second, hard-sphere systems, which describe the motion of a number of hard spheres undergoing elastic reflections at the boundary and collisions amongst each other, e.g., the motion of N hard balls on the m-torus for N ≥ 2 and m ≥ N; hard-sphere systems are important in statistical mechanics because they are a model of the ideal gas, and they are either proven or conjectured to be strongly mixing (Berkovitz et al. 2006, pp. 679–680; Ornstein & Weiss 1974, pp.
8–9; see also Szász 2000); third, geodesic flows on spaces of negative Gaussian curvature, i.e., frictionless motion of a particle moving with unit speed on a compact manifold with everywhere negative curvature, which are strongly mixing too (Schuster & Just 2005, p. 181). Another class are (ii) Hamiltonian systems to which the KAM-theorem applies, e.g., the Hénon-Heiles system or the standard map. This class includes simplified versions of Poincaré maps of continuous measure-preserving deterministic systems to which the KAM-theorem applies. The KAM-theorem describes what happens when integrable systems are perturbed by a nonintegrable perturbation. It says that tori with sufficiently irrational winding number survive the perturbation. Between the stable motion on surviving tori there appear to be regions of unpredictable motion. As the perturbation increases, these regions become larger and often eventually cover nearly the entire hypersurface of constant energy. For these deterministic systems the phase space is separated into regions, each of which has its own dynamics: in some of them the motion appears unpredictable and in others it is stable. Because of this separation into regions, unpredictable behaviour can only be found within a region. Consequently, as is widely acknowledged, proper chaotic motion can only occur on a region (Ott 2002, pp. 267–295; Schuster & Just 2005, pp. 165–174). Thus I have to show that the mathematically well-understood unpredictable motion in a region is strongly mixing. Yet the conjectured chaotic motion of KAM-type systems is understood only poorly (Zaslavsky 2005, p. 139). It has only been proven that there is chaotic behaviour near hyperbolic fixed points, where the motion is indeed strongly mixing (Moser 1973, chapter 3). Apart from this, some numerical evidence suggests that the motion conjectured to be chaotic is strongly mixing (e.g., Chirikov 1979). Thus Lichtenberg & Lieberman (1992, p.
303) comment that we “expect that the stochastic orbits that we have encountered in previous sections are strongly mixing over the bounded portion of phase space for which they exist”.

I should mention that numerical experiments suggest that for a few KAM-type maps there are sets on which the motion seems somewhat random, but these sets consist of n ≥ 2 component areas, each of which is mapped successively onto another, returning to itself after n iterations. There is no agreement whether such motion, which cannot be strongly mixing, should be called ‘chaotic’ (e.g., Belot & Earman 1997, p. 154, vs. Ott 2002, p. 300). If it is, chaos can still be defined via strong mixing: one can say that a deterministic system is chaotic if, and only if, it is ergodic (cf. Definition 2.5) and its phase space is decomposable into n ≥ 1 sets with disjoint interior such that the n-th iterate is strongly mixing on each of these sets. I call this the ‘broad definition of chaos via strong mixing’. There are numerical experiments which suggest that the behaviour mentioned above is chaotic according to this definition (Ott 2002, p. 303).

Next in line are (iii) chaotic volume-preserving non-Hamiltonian systems. Here the main examples discussed are discrete. First, the baker’s system (Example 1) and volume-preserving Anosov diffeomorphisms such as the cat map are strongly mixing (Arnold & Avez 1968, p. 75; Lichtenberg & Lieberman 1992, p. 303). Second, paradigmatic chaotic systems are expanding piecewise maps such as the tent map, which are strongly mixing too (Bowen 1977).

I now turn to dissipative systems and first discuss strange attractors. One class are (iv) strange attractors where the attracted solutions never enter the attractor. Three main groups are treated here: first, for Smale’s Solenoid and generalised Solenoid systems there is a measure on which the motion is strongly mixing (Mayer & Roepstorff 1983).
Second, for the Lorenz system investigated by Lorenz (1963) (see Example 3) and the Lorenz model, and generalised versions thereof, which have been used to model weather phenomena and waterwheels, there is a physical measure on which the motion is strongly mixing (see the end of section 2.1 for a discussion of physical measures) (Luzzatto et al. 2005). Third, for generalised Hénon systems such as the Hénon map, which has been proposed as a simple model of weather dynamics, there exists a physical measure such that the motion on the attractor is strongly mixing (Benedicks & Young 1993; Hénon 1976).

Also important is the (v) visible chaotic behaviour of generalised versions of the logistic map; the logistic map has been endorsed as a simplified model of population dynamics and climate dynamics (Lorenz 1964; Lyubich 2002; May 1976). For these measure-preserving deterministic systems, for most parameter values the solutions enter an attractor with a physical measure on which the motion is either strongly mixing or chaotic according to the broad definition via strong mixing. But for a few parameter values there is chaotic behaviour on an entire interval; in these cases there is also a physical measure on which the motion is strongly mixing (Jacobson 1981; Lyubich 2002).

Finally, another class of uncontroversially chaotic behaviour is (vi) repelling chaotic behaviour on Cantor sets. Two main kinds of discrete deterministic systems are discussed here: first, geometric horseshoe systems such as Smale’s horseshoe, which are strongly mixing (Robinson 1995, pp. 249–274). The second example is chaotic motion on Cantor sets for the logistic map with parameter greater than 4, which is also strongly mixing (Robinson 1995, p. 33).6

Let us now turn to uncontroversially non-chaotic motion. I again start with volume-preserving deterministic systems.
[Footnote 6: This follows because these deterministic systems are isomorphic to a Bernoulli shift.]

A paradigmatic class are (i) integrable Hamiltonian systems, where there is periodic or quasi-periodic motion on tori, which is not strongly mixing (Arnold & Avez 1968, pp. 210–214).

Another class is the (ii) motion on clearly non-chaotic regions of KAM-type systems. Again, this class includes simplified versions of Poincaré maps of KAM-type deterministic systems. As already discussed, for KAM-type systems the phase space is separated into regions, and on some regions the motion is stable. Thus I have to show that the stable motion is not strongly mixing. And indeed, the behaviour in these regions, e.g., the motion on surviving tori or the motion near specific elliptic periodic points, is not strongly mixing (Arnold & Avez 1968, pp. 86–90; Lichtenberg & Lieberman 1992, chapters 3–5).

I now turn to dissipative measure-preserving deterministic systems. Important here are (iii) non-chaotic attractors. These are attracting periodic cycles and fixed points and also quasi-periodic attractors as discussed by Ott (2002, chapter 7), which obviously cannot be strongly mixing. Moreover, the motion approaching such attractors, e.g., the behaviour around stable nodes or stable foci, clearly cannot be strongly mixing (cf. Robinson 1995, p. 105).7

Finally, let us mention two further very broad classes of clearly non-chaotic deterministic systems. Since strong mixing captures SDIC, (iv) systems not exhibiting any kind of SDIC, e.g., the identity function, cannot be strongly mixing. Moreover, since strong mixing captures denseness, (v) motion showing SDIC but where, in any sense, typical solutions do not come arbitrarily near to any region in phase space cannot be strongly mixing.
Examples here are discrete-time deterministic systems where the evolution function is T(m) = cm for c > 1 on (0, ∞), or the motion around unstable nodes or unstable foci (cf. Robinson 1995, p. 105).7 [Footnote 7: Here there sometimes exists no invariant measure of interest.]

In sum, I have first demonstrated that strong mixing captures the pretheoretic intuitions about chaos. After that I have briefly shown that a definition of chaos in terms of strong mixing is extensionally correct in the sense explained above. Consequently, chaos can be adequately defined in terms of strong mixing. With this knowledge about chaos we are ready to critically discuss the answers suggested in the literature to our main question.

4.4 Criticism of answers in the literature

4.4.1 Asymptotically unpredictable?

Let us first discuss an answer based on the concept of asymptotic unpredictability. Roughly, systems whose asymptotic behaviour cannot be predicted with arbitrary accuracy for all times, even if the bundle of initial conditions is made arbitrarily small, are said to be asymptotically unpredictable. Formally, given a topological deterministic system, let ε be the desired prediction accuracy and let δ be the diameter of the bundle of initial conditions. For a discrete topological deterministic system (M, d, T) and an m ∈ M the solution sm is asymptotically predictable if, and only if,

∀ε > 0 ∃δ > 0 ∀y ∈ M ∀t ∈ N₀ (d(m, y) < δ → d(T^t(m), T^t(y)) < ε). (4.4)

The discrete topological deterministic system (M, d, T) is asymptotically unpredictable if, and only if, for all m ∈ M the solution sm is not asymptotically predictable.8 Likewise, for a continuous topological deterministic system (M, d, Tt) and an arbitrary m ∈ M the solution sm is asymptotically predictable if, and only if,

∀ε > 0 ∃δ > 0 ∀y ∈ M ∀t ∈ R₀⁺ (d(m, y) < δ → d(Tt(m), Tt(y)) < ε).
(4.5)

The continuous topological deterministic system (M, d, Tt) is asymptotically unpredictable if, and only if, for all m ∈ M the solution sm is not asymptotically predictable. In terms of the distinction introduced in section 4.2, this is clearly a version of the first concept of unpredictability.

Miller (1996, pp. 106–107) and Stone (1989, p. 127) argue that the unpredictability specific to chaotic systems is that chaotic systems are asymptotically unpredictable. Indeed, all chaotic systems discussed in the literature are asymptotically unpredictable, and standard definitions of chaos imply asymptotic unpredictability. For instance, (4.1) and (4.2), a condition of Devaney chaos and, as we have seen, a consequence of strong mixing, clearly implies asymptotic unpredictability.

However, as Smith (1998, p. 58) has pointed out, many non-chaotic deterministic systems, e.g., one showing only SDIC, as happens for the evolution function T(m) = cm for c > 1 on (0, ∞) (class (v) of clearly non-chaotic behaviour), are asymptotically unpredictable. Hence this answer is wrong.

But maybe the account can be strengthened in the following way: the unpredictability specific to chaotic systems is that they are asymptotically unpredictable and bounded. I maintain that this is not correct either: there are unbounded chaotic systems (Smith 1998, pp. 168–169), a point which is reflected in the usual definitions of chaos, which do not require boundedness. Furthermore, for many bounded integrable systems (part of class (i) of the clearly non-chaotic behaviour) the solutions loop around tori in such a way that they are asymptotically unpredictable (Arnold & Avez 1968, pp. 210–214). [Footnote 8: Bishop (2003, pp. 174–177) also aims to formalise asymptotic unpredictability. However, he does not list the most obvious notion presented here.]
Hence there are examples of non-chaotic, bounded and asymptotically unpredictable deterministic systems.

I conclude that the sole connection between asymptotic unpredictability and chaos is this: while only some non-chaotic deterministic systems are asymptotically unpredictable, every chaotic system is asymptotically unpredictable.

4.4.2 Unpredictable due to rapid or exponential divergence of solutions?

It is widely believed and often claimed that the unpredictability specific to chaotic systems is the following: due to rapid or exponential divergence of nearby solutions, bundles of initial conditions spread out a distance more than a diameter of interest over short time periods (e.g., Ruelle 1997, pp. 27–28); often it is added that this is so despite the fact that the deterministic systems are bounded (e.g., Lighthill 1986, p. 46). In terms of the distinction introduced in section 4.2, this is a form of the first concept of unpredictability.

As many unbounded non-chaotic deterministic systems show, such as a discrete deterministic system with evolution function T(m) = cm, c > 1, on (0, ∞) (part of class (v) of clearly non-chaotic behaviour), rapid or exponential divergence everywhere is ‘nothing new’ (Smith 1998, p. 15). Thus the version not requiring boundedness cannot be true. But the version requiring boundedness is wrong too: as mentioned above, there are unbounded chaotic systems. Furthermore, as argued in section 4.3, it is often not true that nearby solutions of chaotic systems diverge rapidly or exponentially over finite time periods, as is so widely believed in the philosophy, physics and mathematics communities (e.g., Eagle 2005, p. 767; Schurz 1996, p. 140; Smith 1998, p. 15). Hence this is not the sought-after unpredictability specific to chaotic systems.

Why is it so widely believed that inaccuracies of chaotic systems spread rapidly or exponentially over finite time periods?
One plausible reason is that very simple chaotic systems such as the baker’s system (Example 1) or the cat map show this property, and the claim is then wrongly generalised to all chaotic systems. Also, the wrong belief stems at least in part from misinterpreting Lyapunov exponents. As pointed out in section 4.3, positive on-average exponential growth rates over an infinite time period are wrongly taken to imply that inaccuracies spread exponentially over relatively short finite time periods.

The only connection between the unpredictability of chaotic systems and the rapid or exponential increase of inaccuracies over finite time periods seems to be this: it is more often the case for chaotic than for non-chaotic deterministic systems that bundles of initial conditions spread out more than a diameter of interest over short time periods.

4.4.3 Macro-predictable and micro-unpredictable?

Macro-predictable yet micro-unpredictable behaviour is a broad and interesting topic in physics. For instance, in statistical mechanics deterministic systems are often macro-predictable but micro-unpredictable. Here I concentrate only on whether there is any combination of macro-predictability and micro-unpredictability in chaotic systems that other deterministic systems do not have.

To gain an understanding of this proposed answer, recall the Lorenz system (Example 3 and Figure 2.2). This system exhibits macro-predictability: the solutions are attracted by an attractor, a small region of phase space. There is also micro-unpredictability, since the motion on the attractor exhibits SDIC.
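This combination can be seen numerically. The following is a minimal sketch (my illustration, not part of the text): it integrates two Lorenz solutions whose initial conditions differ by 10⁻⁸, using a simple fourth-order Runge-Kutta scheme and the standard parameter values σ = 10, ρ = 28, β = 8/3.

```python
import numpy as np

def lorenz_step(s, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One fourth-order Runge-Kutta step of the Lorenz equations."""
    def f(v):
        x, y, z = v
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])   # a tiny perturbation of the initial condition
max_sep, max_norm = 0.0, 0.0
for _ in range(4000):                # integrate for 40 time units
    a, b = lorenz_step(a), lorenz_step(b)
    max_sep = max(max_sep, np.linalg.norm(a - b))
    max_norm = max(max_norm, np.linalg.norm(a))

print(max_norm)   # stays bounded: the solution is confined near the attractor
print(max_sep)    # the 1e-8 difference is inflated to the size of the attractor
```

Both solutions remain in a bounded region of phase space, which is the macro-predictability; at the same time the tiny initial difference grows until it is of the order of the attractor itself, which is the micro-unpredictability due to SDIC.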
Smith (1998) argues that this combination of macro-predictability and micro-unpredictability is a kind of unpredictability specific to chaotic systems:

This type of combination of large-scale order with small scale disorder, of macro-predictability with the micro-unpredictability due to sensitive dependence, is one paradigm of what has come to be called chaos. [...] So error inflation by itself is entirely old-hat. The novelty in the new-fangled chaotic cases that will concern us is, to repeat, the combination of exponential error inflation with the tight confinement of trajectories by an attractor (Smith 1998, pp. 13–15, original emphasis).

Here macro-predictability means that the deterministic system eventually shows the behaviour corresponding to the motion on the attractor, a proper subset of phase space. Micro-unpredictability is understood as the unpredictability implied by exponential error inflation. Yet, as shown in section 4.3, solutions of chaotic systems need not diverge exponentially or rapidly over finite time periods. Therefore, micro-unpredictability has to be interpreted as a weaker notion, e.g., asymptotic unpredictability (cf. subsection 4.4.1).

As becomes clear from the Lorenz system (Example 3), strange attractors imply this combination of macro-predictability and micro-unpredictability. However, this combination is not a kind of unpredictability specific to chaotic systems, since there are many chaotic systems without attractors. As already pointed out, all chaotic volume-preserving deterministic systems, such as chaotic Hamiltonian systems or the baker’s system (classes (i), (ii) and (iii) of uncontroversially chaotic behaviour), cannot have attractors. And some chaotic dissipative systems, e.g., repelling chaotic motion on Cantor sets or the logistic map on [0, 1] (class (vi) and a part of class (v) of uncontroversially chaotic behaviour), have no attractors.
Hence these deterministic systems are not macro-predictable in the above sense, viz. the sense that appeals to attractors.

It could be that Smith (1998) only meant to say that this combination of macro-predictability and micro-unpredictability found in strange attractors is a novelty for deterministic systems with attractors. But this would not help. Clearly, this claim would be no satisfying answer to our main question because it does not apply to essentially all chaotic systems. Furthermore, non-chaotic deterministic systems can also be macro-predictable and micro-unpredictable in the sense discussed here. For instance, in the plane let R be the region enclosed by a circle of radius r around the origin (boundary included). Imagine that all solutions in R go in circles around the origin and that all solutions outside R are attracted by the periodic motion in R, such that all solutions are continuous. Such non-chaotic attractors (part of class (iii) of clearly non-chaotic behaviour) obviously imply macro-predictability and micro-unpredictability. Thus this combination of macro-predictability and micro-unpredictability is not even a kind of unpredictability specific to deterministic systems with attractors.

Of course, there are also other concepts of macro-predictability and micro-unpredictability (e.g., Smith 1998, pp. 60–61). However, to the best of my knowledge, none of them provides a combination of macro-predictability and micro-unpredictability that is characteristic of chaotic behaviour.

To conclude, strange attractors are macro-predictable and micro-unpredictable in the above specified sense. However, it is not the case that a combination of macro-predictability and micro-unpredictability constitutes a kind of unpredictability specific to chaotic behaviour.

None of the answers examined so far have proven to be correct.
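The planar counterexample above (solutions circling inside R, attracted from outside) can be made concrete. The sketch below is one possible realisation of it in polar coordinates, with a disk of radius 1: outside the disk the radius decays towards 1, inside the disk solutions circle the origin. The additional assumption that the angular speed equals the radius is mine, not the text's; it makes nearby solutions inside the disk drift apart in phase, so the motion is (weakly) unpredictable in the long run although clearly non-chaotic.

```python
def step(r, theta, dt=0.001):
    """Euler step in polar coordinates: radii above 1 are attracted to the
    unit disk, and the angular speed depends on the radius (so the phases
    of nearby solutions drift apart)."""
    dr = -(r - 1.0) if r > 1.0 else 0.0
    return r + dr * dt, theta + r * dt

# Macro-predictability: a solution starting outside is drawn to the unit disk.
r, th = 5.0, 0.0
for _ in range(20000):   # 20 time units
    r, th = step(r, th)
print(r)                 # close to 1

# Micro-unpredictability: two nearby solutions inside the disk drift apart
# in phase, however close they start, so long-term prediction fails.
(r1, t1), (r2, t2) = (0.5, 0.0), (0.500001, 0.0)
for _ in range(20000):
    r1, t1 = step(r1, t1)
    r2, t2 = step(r2, t2)
print(abs(t1 - t2))      # the phase difference grows linearly in time
```

The combination is exactly the one at issue: every solution ends up in (or stays in) the disk, yet arbitrarily small differences in the initial condition eventually produce a noticeable phase difference, and no periodic or quasi-periodic motion of this kind is chaotic.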
There is one more answer suggested in the literature: some physicists, e.g., Ford (1989), have defined chaos by the condition that almost all solutions have positive algorithmic complexity. In other words, they have argued that the unpredictability implied by positive algorithmic complexity is specific to chaotic systems. However, Batterman & White (1996) and Smith (1998, p. 160) have made it clear that chaos cannot be defined via algorithmic complexity, since many deterministic systems without SDIC (part of class (iv) of clearly non-chaotic behaviour) have positive algorithmic complexity too. Consequently, this is not a kind of unpredictability which is specific to chaotic behaviour, and this is all we need to know. In sum, the answers in the literature do not fit the bill.

4.5 A kind of unpredictability specific to chaos

4.5.1 Approximate probabilistic irrelevance

The answer I propose starts from the idea that strong mixing goes along with loss of information, as recently discussed by Berkovitz et al. (2006). First of all, let us introduce approximate probabilistic irrelevance, the notion of unpredictability which will be crucial for our claim.

Recall the definition of an event and the definition of the probability of an event as introduced when discussing weak mixing in subsection 3.4.1 (see also Berkovitz et al. 2006, pp. 670–672; Werndl 2009e): given a discrete measure-preserving deterministic system (M, ΣM, µ, T) or a continuous measure-preserving system (M, ΣM, µ, Tt), A^t is defined as the event that the state of the deterministic system is in A at time t, for arbitrary A ∈ ΣM and t ∈ Z or R. And p(A^t) is the probability that the event A^t obtains. Let me introduce conditional probabilities: p(B^t′ | A^t), for arbitrary A, B ∈ ΣM with µ(A) > 0, is the probability that B^t′ obtains given that A^t obtains. By the usual definition, p(B^t′ | A^t) = p(B^t′ & A^t)/p(A^t).
Because the measure is interpreted as a probability density, the probability of events is given by equations (3.1), (3.2) and (3.3) (see subsection 3.4.1).

Now recall the second conception of unpredictability of section 4.2. For this conception I have to say what it means that knowledge that the deterministic system is in a region A at t is practically irrelevant for predicting that it will be in region B at t′. I say that this is so if the probability of the event B^t′ given knowledge of the event A^t approximately equals the unconditionalised probability of the event B^t′. Let ε > 0 be the level at which probabilities differing by less than ε are considered as practically equivalent. Further, assume that p(A^t) > 0; I will later explain why I am justified in doing so. Then formally this is captured by the following definition:9

A^t is approximately probabilistically irrelevant for predicting B^t′ (t, t′ ∈ Z or R) at level ε > 0 if, and only if, |p(B^t′ | A^t) − p(B^t′)| < ε. (4.6)

Or equivalently, but more simply (still assuming that p(A^t) > 0):

A^t is approximately probabilistically irrelevant for predicting B^t′ (t, t′ ∈ Z or R) at level ε > 0 if, and only if, |p(B^t′ & A^t) − p(B^t′)p(A^t)| < ε. (4.7)

[Footnote 9: I use what is basically the difference measure in confirmation theory to define approximate probabilistic irrelevance. I should point out that my claims are independent of the measure involved, i.e., they would remain the same if I used any other measure with the indisputable property that it is continuous when the unpredictability is highest, i.e., when p(B^t′ | A^t) = p(B^t′). Berkovitz et al. (2006, p. 672) interpret the difference measure of events as a general measure of unpredictability. However, they do not justify this choice or address whether their results are independent of the measure.]

In the next section we will see how approximate probabilistic irrelevance relates to chaos, and I will finally propose an answer to our question.
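Definition (4.7) can be illustrated numerically for a concrete strongly mixing system. The following Monte Carlo sketch (my illustration; the sets A and B and all numerical parameters are illustrative choices) uses the logistic map T(x) = 4x(1−x) on [0, 1], whose invariant measure has density 1/(π√(x(1−x))). Since the measure is preserved, the probability that the system is in A at one time and in B a time t later equals the probability that a point sampled from the invariant measure lies in A and its t-step image lies in B.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample initial conditions from the invariant measure of the logistic map
# T(x) = 4x(1-x): if u is uniform on [0,1), sin^2(pi*u/2) has exactly the
# invariant density 1/(pi*sqrt(x(1-x))).
u = rng.random(200_000)
x = np.sin(np.pi * u / 2.0) ** 2

A = (x > 0.0) & (x < 0.2)          # the event A: state in [0, 0.2]
pA = A.mean()

def irrelevance(t):
    """Estimate |p(B^t & A^0) - p(B^t) p(A^0)| for B = [0.5, 0.9]."""
    y = x.copy()
    for _ in range(t):
        y = 4.0 * y * (1.0 - y)    # iterate the logistic map t times
    B = (y > 0.5) & (y < 0.9)
    return abs((A & B).mean() - B.mean() * pA)

print(irrelevance(0))    # A and B are disjoint, so this is about p(A)*p(B) ~ 0.09
print(irrelevance(12))   # after 12 steps: close to 0
```

In this run the quantity in (4.7) starts near p(A)p(B), i.e., knowing A is highly relevant, and after a dozen iterations it falls close to 0, i.e., below any practically interesting level ε: the event A has become approximately probabilistically irrelevant for predicting B.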
4.5.2 Sufficiently past events are approximately probabilistically irrelevant for predictions

The argument I put forward to answer the main question of the chapter is as follows. (P1) Chaos can be defined in terms of strong mixing. (P2) Strongly mixing deterministic systems exhibit a particular pattern of approximate probabilistic irrelevance, which constitutes a form of unpredictability. Therefore: (C) a kind of unpredictability specific to chaotic systems is the particular pattern of approximate probabilistic irrelevance arising from strong mixing.

In subsection 4.3.2 we have seen that premise (P1) is true. Let us now argue for premise (P2). Recall the definition of strong mixing (Definition 27 and Definition 28). I assume without loss of generality that the event we want to predict occurs at time 0. Then, assuming (3.1) and (3.3), it follows that a discrete measure-preserving deterministic system (M, ΣM, µ, T) or a continuous measure-preserving deterministic system (M, ΣM, µ, Tt) is strongly mixing if, and only if,

lim_{t→∞} [p(B^0 & A^{−t}) − p(B^0)p(A^{−t})] = 0, (4.8)

for all A, B ∈ ΣM with µ(A) > 0. This equation holds for all, i.e., discrete and continuous, measure-preserving deterministic systems. Berkovitz et al. (2006, p. 676) also show (4.8), but they interpret their results as applying only to Hamiltonian deterministic systems. Many chaotic systems, e.g., all strange attractors (classes (iv) and (v) of uncontroversially chaotic behaviour), are not Hamiltonian. Since I am interested in the unpredictability implied by chaos, it is important to realise that (4.8) holds for all deterministic systems.

From the definition of the limit, I obtain that (4.8) can be expressed as:

For any event B^0, any ε > 0 and any A ∈ ΣM with µ(A) > 0, there is a t′ ∈ N or R₀⁺ such that for all t ≥ t′: |p(B^0 & A^{−t}) − p(B^0)p(A^{−t})| < ε.
(4.9)

Hence strong mixing means that for predicting an arbitrary event at an arbitrary level of precision ε > 0, any sufficiently past event is approximately probabilistically irrelevant. Notice that due to the impossibility of determining initial conditions precisely, scientists always consider regions of phase space corresponding to possible initial conditions. Since these regions are not of measure zero, I am justified in assuming that µ(A) > 0. In terms of the distinction introduced in section 4.2, this pattern of probabilistic irrelevance is a version of the second concept of unpredictability. Hence strongly mixing measure-preserving deterministic systems exhibit a particular pattern of approximate probabilistic irrelevance, which constitutes a form of unpredictability: i.e., premise (P2) is true.10

Now that I have argued for the premises (P1) and (P2) of the above argument, I conclude: (C) a general kind of unpredictability specific to chaotic systems is that for predicting any event at any level of precision ε > 0, all sufficiently past events are approximately probabilistically irrelevant.

To fully understand this conclusion, consider the following: for strange attractors this claim applies in a strict sense only to events on the attractor. Yet for practical matters there is chaotic behaviour when solutions are very near to the strange attractor (cf. section 2.1); then my claim means that for predicting any event on or very near the attractor Λ at any level of precision ε > 0, all sufficiently past events in the basin of attraction U ⊃ Λ are approximately probabilistically irrelevant. For KAM-type systems my claim applies, as one would like it, to each chaotic region.
Moreover, as explained in subsection 4.3.2 in discussing the uncontroversially chaotic behaviour, some may want to adopt the broad definition of chaos via strong mixing, i.e., that the measure-preserving deterministic system is ergodic and its phase space is decomposable into n ≥ 1 regions with disjoint interior such that the n-th iterate is strongly mixing on each of these regions. When n > 1, my claim (C) has to be adapted in the following way: the unpredictability of strong mixing applies to the n-th iterate on the region of interest. This means that for predicting any event in the region of interest at any level of precision ε > 0, all sufficiently past events that could have evolved to the region of interest are approximately probabilistically irrelevant.

[Footnote 10: This claim can be generalised. The discrete measure-preserving deterministic system (M, ΣM, µ, T) or the continuous measure-preserving deterministic system (M, ΣM, µ, Tt) is strongly mixing if, and only if, for any probability measure ρ absolutely continuous with respect to µ and any square-integrable function f ∈ L2(M, ΣM, µ):

lim_{t→∞} ∫ f(m) dρt = ∫ f(m) dµ, (4.10)

where ρt is the evolved measure after t units of time (t ∈ Z or R). Interpret µ as probability and ρ as measuring our knowledge of the initial condition. Then, assuming absolute continuity of ρ, strong mixing means that for arbitrary knowledge of the initial condition, after a sufficiently long time the prediction obtained by evolving the measure is practically no better than if we had no knowledge whatsoever of the initial conditions (cf. Berger 2001, pp. 126–132).]

On the one hand, the unpredictability involved in my answer is strong: sufficiently distant events are practically as probabilistically independent as coin tosses. On the other hand, it is weak, since only sufficiently past measurements are approximately probabilistically irrelevant.
Restricting my claim to sufficiently past events is essential: first, many chaotic systems are continuous, and continuity makes it impossible that for all past times all events are approximately probabilistically irrelevant for predictions. Second, we have seen that to require rapid divergence of nearby solutions for chaotic behaviour is untenable.

What is novel about my claim? Granted, in a few publications on chaos the notion of ‘irrelevance’ is discussed. In fact, there are two main foci; but neither yields my claim. First, there is Berkovitz et al.’s (2006) explication of the ergodic hierarchy. Yet recall our main argument (cf. the beginning of this subsection). As pointed out, Berkovitz et al. interpret their results as applying only to Hamiltonian systems. Hence they do not argue for the general premise (P2), and, most importantly, they do not argue for premise (P1). Therefore, they could not arrive at the conclusion (C). Second, it is sometimes asserted that for chaos the input is irrelevant in the sense that prediction is exponentially expensive in the initial data, meaning that for an input string of length n all information is lost after n steps, at which point we are totally unsure what happens next (Leiber 1998, p. 361; Smith 1998, p. 53). However, as argued in subsection 4.4.2, predictions for chaotic systems need not be exponentially expensive in the initial data; the irrelevance shown by chaotic systems is more subtle.

4.6 Conclusion

The unpredictability of chaotic systems is one of the issues that has attracted most interest in chaos research. Nonetheless, nearly half a century after the start of the systematic investigation of chaos, there has been much confusion about, and no correct answer to, the question: what is the unpredictability specific to chaos? I have tackled this question in this chapter.
After some introductory remarks, in section 4.2 I introduced two conceptual accounts of unpredictability relevant for the discussion. After that, in section 4.3 I showed that chaos can be defined in terms of strong mixing, i.e., that strong mixing captures the main pretheoretic intuitions about chaos and correctly classifies the various classes of uncontroversially chaotic and non-chaotic behaviour. This has never been explicitly argued for in the literature.

Then, in section 4.4 I criticised the answers in the literature to the above question. First, I rejected the answer that chaotic systems are asymptotically unpredictable on the grounds that many non-chaotic deterministic systems are also asymptotically unpredictable. Second, I rejected the answer that chaotic systems are unpredictable in the sense of exponential or rapid divergence of nearby solutions (often claimed with the added condition of boundedness). For, when boundedness is not required, many non-chaotic deterministic systems are also unpredictable in this sense. Furthermore, when boundedness is required, there are unbounded chaotic systems and, though unacknowledged in the philosophy literature, chaotic systems need not be unpredictable in the sense of having exponential or rapid divergence of solutions. Third, I dismissed the answer that chaotic systems show a specific combination of macro-predictability and micro-unpredictability: there are chaotic systems which are not macro-predictable and non-chaotic systems which also show this combination of macro-predictability and micro-unpredictability.

This prompted the search for an alternative answer. In section 4.5, based on defining chaos via strong mixing, I proposed a novel general answer: a kind of unpredictability specific to chaotic systems is that for predicting any event at any level of precision ε > 0 all sufficiently past events are approximately probabilistically irrelevant.
Chaotic behaviour is multi-faceted and takes various forms. Yet if the aim is to identify a general kind of unpredictability specific to chaotic systems, I think this is the best we can get.

In this and the previous chapter we have seen that deterministic systems can be unpredictable and even random. This raises the question of whether measure-theoretic deterministic descriptions and indeterministic descriptions can be observationally equivalent. Let us embark on this question in the next chapter.

Chapter 5

Determinism vs. indeterminism: are deterministic and indeterministic descriptions observationally equivalent?

5.1 Introduction

There has been a lot of philosophical debate about the question of whether the world is deterministic or indeterministic. Within this context, there is often the implicit belief (cf. Weingartner & Schurz 1996, p. 203) that deterministic and indeterministic descriptions are not observationally equivalent. However, the question of whether these descriptions are observationally equivalent has hardly been discussed. This chapter aims to contribute to filling this gap.

Namely, the central questions of this chapter are the following: are deterministic mathematical descriptions and indeterministic mathematical descriptions observationally equivalent? And what is the philosophical significance of the various results on observational equivalence? The deterministic and indeterministic descriptions of concern in this chapter are measure-theoretic deterministic systems and stochastic processes, respectively, both of which are ubiquitous in science.

More specifically, by saying that a measure-theoretic deterministic system and a stochastic process are observationally equivalent, I will mean the following: the deterministic system, when observed, gives the same predictions as the stochastic process.
And when I say that a stochastic process can be simulated by a measure-theoretic deterministic system, or conversely, I will mean that they are observationally equivalent.

This chapter proceeds as follows. In section 5.2 I will show that measure-theoretic deterministic systems and stochastic processes can often be simulated by each other. Despite this, one might guess that it is impossible to simulate stochastic processes of the kinds in fact used in science by measure-theoretic deterministic systems that are used in science. I will show in section 5.3 that this guess is wrong. Given this, one might still guess that it is impossible to simulate measure-theoretic deterministic systems of the kinds in fact used in science at every observation level by stochastic processes that are used in science. By proving some results in ergodic theory, I will show in section 5.4 that this guess is also wrong. Therefore, even stochastic processes and measure-theoretic deterministic systems which, intuitively, seem to give very different predictions are in fact observationally equivalent. Finally, in section 5.5 I will criticise the claims of the previous philosophical papers Suppes (1993), Suppes & de Barros (1996), Suppes (1999) and Winnie (1998) on observational equivalence. Then, in section 5.6 I will summarise my results.

5.2 Basic observational equivalence

I will first discuss some results about observational equivalence which are basic in the sense that they concern the question whether, given a measure-theoretic deterministic system, it is possible to find any stochastic process which is observationally equivalent to it, and conversely.

How can a stochastic process and a measure-theoretic deterministic system yield the same predictions?
When a measure-theoretic deterministic system is observed, one only sees how one observed value follows the next observed value. Because the observation function can map two or more actual states to the same observed value, the same present observed value can lead to different future observed values. And so a stochastic process can be observationally equivalent to a measure-theoretic deterministic system only if it is assumed that the deterministic system is observed with an observation function which is many-to-one. Yet this assumption is usually unproblematic, the main reason being perhaps that measure-theoretic deterministic systems used in science typically have an infinitely large phase space, whereas scientists can only observe finitely many different values.

A probability measure is defined on a measure-theoretic deterministic system. Hence the predictions derived from a deterministic system are the probability distributions over sequences of possible observations. And similarly, the predictions obtained from a stochastic process are the probability distributions over sequences of possible outcomes. Consequently, the most natural meaning of the phrase 'a stochastic process and a measure-theoretic deterministic system are observationally equivalent' is: (i) the set of possible outcomes of the stochastic process is identical to the set of possible observed values of the deterministic system,¹ and (ii) the realisations of the stochastic process and the solutions of the deterministic system coarse-grained by the observation function have the same probability distribution.

Let me now investigate when deterministic systems can be simulated by stochastic processes. Then I will investigate when stochastic processes can be simulated by deterministic systems.

5.2.1 Deterministic systems simulated by stochastic processes

Let (M, Σ_M, µ, T) be a discrete measure-theoretic deterministic system.
According to the canonical Definition 8, Z_t(m) = T^t(m) is a discrete stochastic process with exactly the same predictions as the discrete deterministic system. Likewise, given a continuous deterministic system (M, Σ_M, µ, T_t), according to the canonical Definition 9, Z_t(m) = T_t(m) is a continuous stochastic process with exactly the same predictions as the continuous deterministic system. However, these processes are evidently equivalent to the original deterministic system, and the transition probabilities, i.e., the probabilities that one outcome leads to another one, are trivial (0 or 1). Hence they are still really deterministic systems. So this is the mathematical formalisation of the idea, known in the philosophy literature, that a deterministic system is the special case of a stochastic process where all probabilities are zero or one (cf. Butterfield 2005, Earman 1986).

Footnote 1: From a probabilistic viewpoint, outcomes with probability zero or observed values with probability zero are irrelevant. Hence, more precisely, condition (i) is: the set of possible outcomes with positive probability is identical to the set of possible observed values with positive probability.

But one can do better by appealing to observation functions as explained above; and, to my knowledge, these results are unknown in philosophy. Assume the discrete measure-theoretic deterministic system (M, Σ_M, µ, T) is observed with an observation function Φ : M → M_O. Then {Z_t = Φ(T^t); t ∈ Z} is a discrete stochastic process. Likewise, assume the continuous measure-theoretic deterministic system (M, Σ_M, µ, T_t) is observed with an observation function Φ : M → M_O. Then {Z_t = Φ(T_t); t ∈ R} is a continuous stochastic process. These processes are constructed by applying the observation function to the measure-theoretic deterministic system.
Hence for any of these stochastic processes the following holds: the outcomes of the stochastic process are the observed values of the corresponding deterministic system; and the realisations of the stochastic process and the solutions of the corresponding deterministic system coarse-grained by the observation function have the same probability distribution. Consequently, according to the characterisation above, (M, Σ_M, µ, T) observed with Φ is observationally equivalent to the stochastic process {Φ(T^t); t ∈ Z}, and (M, Σ_M, µ, T_t) observed with Φ is observationally equivalent to the stochastic process {Φ(T_t); t ∈ R}.

But the important question is whether {Φ(T^t); t ∈ Z} and {Φ(T_t); t ∈ R} are nontrivial. Indeed, they are often nontrivial. I now give a theorem for discrete time and a theorem for continuous time which show this by characterising a class of measure-theoretic deterministic systems as systems that yield stochastic processes which are nontrivial in a certain sense. Besides, several other results also indicate this (cf. Cornfeld et al. 1982, pp. 178–179).²

Footnote 2: For instance, if discrete Kolmogorov systems or continuous Kolmogorov systems are observed with a finite-valued observation function, one obtains nontrivial stochastic processes because for Kolmogorov systems the entropy H(α, T) or H(α, T_1) of any finite partition α (see equation (3.6)) is positive (cf. Cornfeld et al. 1982, pp. 280–283; Petersen 1983, p. 83).

Recall Definition 7 of a partition. Let me make the realistic assumption that the observations have finite accuracy, i.e., that only finitely many values are observed. Then one has a finite-valued observation function Φ; i.e., Φ(m) = Σ_{i=1}^n o_i χ_{α_i}(m), M_O = {o_i | 1 ≤ i ≤ n}, for some partition α of (M, Σ_M, µ) and some n ∈ N, where χ_A denotes the characteristic function of A (cf. Cornfeld et al. 1982, p. 179). A finite-valued observation function is called nontrivial if, and only if, its corresponding partition is nontrivial.

The following two theorems show that under certain conditions the stochastic processes {Φ(T^t); t ∈ Z} and {Φ(T_t); t ∈ R} are nontrivial in the following sense: for any time k ∈ N or k ∈ R⁺ there is an observed value o_i ∈ M_O such that for all observed values o_j ∈ M_O the probability of moving in k time steps from o_i to o_j is smaller than 1. Hence there are two or more observed values that one can reach in k time steps from o_i; and the probability that o_i moves to any of these observed values is between 0 and 1. These are strong results because irrespective of how closely one looks at the measure-theoretic deterministic systems, one always obtains nontrivial stochastic processes.

Theorem 1 If, and only if, for the discrete measure-preserving deterministic system (M, Σ_M, µ, T) there does not exist an n ∈ N and a C ∈ Σ_M, 0 < µ(C) < 1, such that, except for a set of measure zero, T^n(C) = C, then the following holds: for every nontrivial finite-valued observation function Φ : M → M_O, M_O = {o_1, …, o_r}, r ∈ N, every k ∈ N and the stochastic process {Z_t = Φ(T^t); t ∈ Z} there is an o_i ∈ M_O such that for all o_j ∈ M_O, P{Z_{t+k} = o_j | Z_t = o_i} < 1.³

Footnote 3: For a random variable Z to a measurable space (M̄, Σ_M̄) where M̄ is finite, the conditional probability is defined as usual: P{Z ∈ A | Z ∈ B} = P{Z ∈ A ∩ B}/P{Z ∈ B} for all A, B ∈ Σ_M̄ with P{Z ∈ B} > 0.

For a proof of this theorem, see subsection 5.7.1.

Theorem 2 If, and only if, for the continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) there does not exist an n ∈ R⁺ and a C ∈ Σ_M, 0 < µ(C) < 1, such that, except for a set of measure zero, T_n(C) = C, then
the following holds: for every nontrivial finite-valued observation function Φ : M → M_O, M_O = {o_1, …, o_r}, r ∈ N, every k ∈ R⁺ and the stochastic process {Z_t = Φ(T_t); t ∈ R} there is an outcome o_i ∈ M_O such that for all possible outcomes o_j ∈ M_O, P{Z_{t+k} = o_j | Z_t = o_i} < 1.

For a proof of this theorem, see subsection 5.7.2.

Now recall Definition 2.5 of being ergodic. An alternative and equivalent definition of ergodicity is the following (Cornfeld et al. 1982, pp. 14–15):

Definition 35 A discrete measure-preserving deterministic system (M, Σ_M, µ, T) is ergodic if, and only if, there is no set A ∈ Σ_M, 0 < µ(A) < 1, such that, except for a set of measure zero, T(A) = A.

And note the following: the assumption of Theorem 1 that there does not exist an n ∈ N and a C ∈ Σ_M, 0 < µ(C) < 1, such that, except for a set of measure zero, T^n(C) = C, is equivalent to the condition that the discrete measure-preserving deterministic system (M, Σ_M, µ, T^n) is ergodic for all n ∈ N. And the assumption of Theorem 2 that there does not exist an n ∈ R⁺ and a C ∈ Σ_M, 0 < µ(C) < 1, such that, except for a set of measure zero, T_n(C) = C, is equivalent to the condition that the discrete measure-preserving deterministic system (M, Σ_M, µ, T_n) is ergodic for all n ∈ R⁺.

Both discrete and continuous measure-preserving deterministic systems are typically what is called 'weakly mixing' (cf. Definition 23 and Definition 24) (Halmos 1944, Halmos 1949). It is easy to see that any discrete weakly mixing deterministic system satisfies the assumption of Theorem 1 (in fact, weak mixing is stronger than this assumption).⁴ In the continuous case, as I have explained in subsection 3.4.2, the condition that there does not exist an n ∈ R⁺ and a C ∈ Σ_M, 0 < µ(C) < 1, such that, except for a set of measure zero, T_n(C) = C, is equivalent to the measure-preserving deterministic system being weakly mixing (Hopf 1932b).
Hence weak mixing is strictly stronger than the assumption of Theorem 1 but equivalent to the assumption of Theorem 2, and this indicates a difference between my results for continuous time and my results for discrete time. So we conclude that Theorem 1 and Theorem 2 show that for typical measure-preserving deterministic systems any finite-valued observation function yields a nontrivial stochastic process.

Footnote 4: First, assume that for a weakly mixing discrete measure-preserving deterministic system there exists an n ∈ N and a C ∈ Σ_M, 0 < µ(C) < 1, such that, except for a set of measure zero, T^n(C) = C. But then equation (23) cannot hold for A = C and B = C. In subsection 5.5.2 I will show that the irrational rotation on the circle satisfies the assumption of Theorem 1 but is not weakly mixing.

Yet this does not say much about whether the measure-preserving deterministic systems encountered in science fulfill the assumptions of Theorem 1 or Theorem 2, because the measure-preserving deterministic systems encountered in science constitute a small class of all measure-preserving deterministic systems. Indeed, recall the discussion of the KAM theorem in subsection 4.3.2. The KAM theorem says that the phase space of integrable Hamiltonian deterministic systems which are perturbed by a small nonintegrable perturbation breaks up into stable regions and regions with unpredictable behaviour. With increasing perturbation the regions with unpredictable behaviour become larger and often eventually cover nearly the entire hypersurface of constant energy. Because according to the KAM theorem the solutions of a system are often confined to a region of positive measure smaller than 1, these systems, and their discrete versions, do not satisfy the assumptions of Theorem 1 or Theorem 2 (cf. Berkovitz et al. 2006, section 4).
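The conclusion of Theorem 1 can be illustrated numerically. The following sketch is mine, with the map, the partition and all parameters as illustrative choices: the irrational rotation on the circle, which satisfies the assumption of Theorem 1 without being weakly mixing, already yields transition probabilities strictly between 0 and 1 when observed with the two-cell partition {[0, 1/2), [1/2, 1)}.

```python
# Sketch (mine): estimate the transition probabilities of the observed
# process for the rotation x -> x + alpha (mod 1) with irrational alpha.
# By Theorem 1, every nontrivial finite-valued observation should give,
# for each observed value, transition probabilities smaller than 1.

def transition_probs(alpha=0.6180339887498949, N=100_000, x0=0.05):
    counts = [[0, 0], [0, 0]]           # counts[i][j]: observed i followed by j
    x = x0
    prev = 0 if x < 0.5 else 1
    for _ in range(N):
        x = (x + alpha) % 1.0           # the rotation
        cur = 0 if x < 0.5 else 1       # the two-cell observation function
        counts[prev][cur] += 1
        prev = cur
    return [[c / max(sum(row), 1) for c in row] for row in counts]

P = transition_probs()
```

Every entry of the estimated transition matrix lies strictly between 0 and 1, so the observed process is nontrivial even though the underlying dynamics is as regular as a rigid rotation.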
Despite this, Theorem 1 applies to several deterministic systems encountered in science. For recall that there are several physically relevant discrete and continuous chaotic systems and that chaotic systems are strongly mixing (cf. subsection 4.3.2). It is clear that any strongly mixing measure-preserving deterministic system is also weakly mixing. Therefore, there are several physically relevant discrete and continuous deterministic systems which are weakly mixing (later, in subsection 5.3.1, I will say more about which kinds of stochastic processes one obtains from observing measure-theoretic deterministic systems encountered in science). For instance, the baker's system (Example 1) is weakly mixing; thus it satisfies the assumption of Theorem 1. Billiards with convex obstacles (Example 2) are also weakly mixing and thus satisfy the assumption of Theorem 2. Consequently, for the baker's system or a billiard system with convex obstacles any finite-valued observation function gives rise to a nontrivial stochastic process. Moreover, in subsection 5.5.2 it will be shown that there are even deterministic systems which are neither chaotic nor chaotic on a region of phase space but which satisfy the assumption of Theorem 1.

Second, even if the whole measure-theoretic deterministic system does not satisfy the assumption of Theorem 1 or Theorem 2, the motion of the deterministic system restricted to some regions of phase space might well satisfy this assumption. In fact, Theorem 1 and Theorem 2 immediately imply the following results. Assume that for a discrete measure-preserving deterministic system (M, Σ_M, µ, T) there is an A ∈ Σ_M, µ(A) > 0, such that the deterministic system restricted to A⁵ fulfills the assumption of Theorem 1. Then all observations which discriminate between values in A lead to nontrivial stochastic processes.
That is, for any observation function Φ(m) = Σ_{i=1}^n o_i χ_{α_i}(m) where there are h, l, h ≠ l, such that µ(A ∩ α_h) ≠ 0 and µ(A ∩ α_l) ≠ 0, we have that for all k ∈ N there is an outcome o_i ∈ M_O such that for all outcomes o_j ∈ M_O it holds that P{Z_{t+k} = o_j | Z_t = o_i} < 1. Likewise, assume that for a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) there is an A ∈ Σ_M, µ(A) > 0, such that Theorem 2 applies to the deterministic system restricted to A. Then for any Φ(m) = Σ_{i=1}^n o_i χ_{α_i}(m) where there are h, l, h ≠ l, such that µ(A ∩ α_h) ≠ 0 and µ(A ∩ α_l) ≠ 0, we have that for all k ∈ R⁺ there is an o_i ∈ M_O such that for all o_j ∈ M_O it holds that P{Z_{t+k} = o_j | Z_t = o_i} < 1. In particular, although mathematically little is known, it is conjectured that the motion restricted to unstable regions of KAM-type systems is weakly mixing (cf. section 4.3.2). If this is true, then my argument shows that for many observation functions of KAM-type systems one obtains nontrivial stochastic processes.

Footnote 5: That is, the measure-preserving deterministic system (A, Σ_M∩A, µ_A, T_A), where Σ_M∩A = {B ∩ A | B ∈ Σ_M}, µ_A(X) = µ(X)/µ(A), and T_A denotes T restricted to A. (By assumption, T_A : A → A is bijective.)

Theorem 1 and Theorem 2 show that several measure-theoretic deterministic systems, regardless of which finite-valued observation function is applied, yield nontrivial stochastic processes. To appreciate this result, and for what follows later, it is important to note the following. For discrete time, assume that the stochastic process {Φ(T^t); t ∈ Z}, where (M, Σ_M, µ, T) is a measure-theoretic deterministic system and Φ is an observation function, matches our observations and is trivial (the transition probabilities are zero or one); for continuous time, assume that the stochastic process {Φ(T_t); t ∈ R}, where (M, Σ_M, µ, T_t) is a measure-theoretic deterministic system and Φ is an observation function, matches our observations and is trivial (the transition probabilities are zero or one). That a trivial stochastic process is obtained does not imply that the observations derive from a deterministic system, because the trivial stochastic process may arise from an observed nontrivial stochastic process. Trivial stochastic processes can derive both from observing deterministic systems and from observing nontrivial stochastic processes. Let me explain this with two examples.

Consider the measure-preserving deterministic system (M, Σ_M, µ, T) consisting of two copies of the baker's system (Example 1), where M = ([0,1] × [0,1] \ D) ∪ ([2,3] × [0,1] \ D′) with D′ = {(x, y) ∈ [2,3] × [0,1] | x = 2 + j/2^n or y = j/2^n, n ∈ N, 0 ≤ j ≤ 2^n}, Σ_M is the Lebesgue σ-algebra on M, µ the normalised Lebesgue measure on M, T restricted to [0,1] × [0,1] \ D is the baker's system, and T restricted to [2,3] × [0,1] \ D′ is the baker's system shifted to the right by (2, 0). Consider the partition {ζ_1, ζ_2} = {[0,1] × [0,1] \ D, [2,3] × [0,1] \ D′} and the observation function Φ(m) = o_1 χ_{ζ_1}(m) + o_2 χ_{ζ_2}(m). So Φ merely tells us which of the two copies of the baker's system the state of the system is in. Then, clearly, all transition probabilities of the stochastic process {Φ(T^t); t ∈ Z} are zero or one. Now let γ = α ∪ β be a partition of M, where α = {α_1, …, α_n} is a nontrivial partition of [0,1] × [0,1] \ D and β = {β_1, …, β_h} is a nontrivial partition of [2,3] × [0,1] \ D′, and define Ψ(m) = Σ_{i=1}^n u_i χ_{α_i}(m) + Σ_{j=1}^h v_j χ_{β_j}(m). Because the baker's system is weakly mixing, {Ψ(T^t); t ∈ Z} is a nontrivial stochastic process.
Now define the observation function Γ : {u_1, …, u_n, v_1, …, v_h} → {o_1, o_2}, Γ(u_i) = o_1 for all i and Γ(v_j) = o_2 for all j. Γ tells us whether the outcome is one of the u_i or one of the v_j, and so Γ(Ψ(m)) tells us which of the two copies of the baker's system the state is in. Therefore, for all t ∈ Z we have Φ(T^t) = Γ(Ψ(T^t)), and thus {Φ(T^t); t ∈ Z} is identical to {Γ(Ψ(T^t)); t ∈ Z}. Consequently, the trivial stochastic process {Φ(T^t); t ∈ Z} is obtained from observing the nontrivial stochastic process {Ψ(T^t); t ∈ Z} with the observation function Γ.

Or, to start from a stochastic process, consider the nontrivial Markov process {Z_t; t ∈ Z} (cf. Example 5) with outcome space {s_1, s_2, s_3, s_4}, where P{Z_t = s_i} = 1/4 for all i, 1 ≤ i ≤ 4, P{Z_t = s_i | Z_{t−1} = s_j} = 1/2 for all i, j, 1 ≤ i, j ≤ 2, and P{Z_t = s_i | Z_{t−1} = s_j} = 1/2 for all i, j, 3 ≤ i, j ≤ 4. This means that the outcomes s_1 and s_2 can be reached from each other but not from the outcomes s_3 or s_4, and, likewise, that the outcomes s_3 and s_4 can be reached from each other but not from the outcomes s_1 or s_2. Thus the Markov process can be split into two parts: the dynamics involving s_1 and s_2 and the dynamics involving s_3 and s_4.⁶ Consider the observation function Γ : {s_1, s_2, s_3, s_4} → {o_1, o_2}, where Γ(s_1) = Γ(s_2) = o_1 and Γ(s_3) = Γ(s_4) = o_2. Γ tells us whether the outcome of the Markov process is in {s_1, s_2} or in {s_3, s_4}. So, clearly, {Γ(Z_t); t ∈ Z} is a trivial stochastic process (all transition probabilities are 0 or 1). But it is obtained from observing the nontrivial Markov process {Z_t; t ∈ Z} with the observation function Γ.

5.2.2 Stochastic processes simulated by deterministic systems

I have shown that measure-theoretic deterministic systems, when observed, can yield nontrivial stochastic processes. But can one find, for every stochastic process, a measure-theoretic deterministic system which produces this process?
The following idea of how to simulate stochastic processes by deterministic systems is well known in the technical literature (Petersen 1983, pp. 6–7)⁷ and is known to philosophers (Butterfield 2005); I also need to discuss it for what follows later. The underlying thought is that for each realisation r_ω, one sets up a deterministic system with phase space {r_ω}.

Footnote 6: Hence, technically, the Markov process {Z_t; t ∈ Z} is not irreducible (see Example 5).

Footnote 7: Petersen discusses it only for stationary stochastic processes; I consider stochastic processes generally.

So start with a discrete stochastic process {Z_t; t ∈ Z} from (Ω, Σ_Ω, ν) to (M̄, Σ_M̄). Let M be the set of all bi-infinite sequences m = (… m_{−1} m_0 m_1 …) with m_i ∈ M̄, i ∈ Z, and let m_t be the t-th coordinate of m, t ∈ Z. Let Σ_M be the σ-algebra generated by the semi-algebra of cylinder sets C^{A_1…A_n}_{i_1…i_n} = {m ∈ M | m_{i_1} ∈ A_1, …, m_{i_n} ∈ A_n}, A_j ∈ Σ_M̄, i_j ∈ Z, i_1 < … < i_n, n ∈ N.

Let me now explain what it means for a measure-preserving deterministic system and a stochastic process to give the same predictions at observation level ε > 0, ε ∈ R. There are two aspects. First, one imagines that in practice, for sufficiently small ε_1, one cannot distinguish states of the deterministic system which are less than the distance ε_1 apart. The second aspect concerns probabilities: in practice, for sufficiently small ε_2, one will not be able to observe differences in probabilities of less than ε_2. Assume that ε is smaller than ε_1 and ε_2. Then we can define a measure-preserving deterministic system and a stochastic process to give the same predictions at observation level ε if the following holds: the solutions of the measure-preserving deterministic system can be put into one-to-one correspondence with the realisations of the stochastic process in such a way that the actual state of the deterministic system and the corresponding outcome of the stochastic process are at each time point less than ε apart, except for a set whose probability is smaller than ε.
One can think of this notion of giving the same predictions at observation level ε as a kind of shadowing result: for each solution of the measure-preserving deterministic system the corresponding realisation of the stochastic process shadows this solution, in the sense that at each time point the state of the deterministic system and the outcome of the stochastic process are within ε (except for a set whose probability is smaller than ε).

Mathematically, this idea is captured by the notion of ε-congruence. To define it, one needs to speak of distances between states in the phase space M of the deterministic system; hence one assumes a metric d_M defined on M. We need to find a stochastic process whose outcome is within distance ε of the actual state of the deterministic system; hence one assumes that each possible outcome of the stochastic process is a subset of the phase space of the deterministic system. Now recall Definition 36 and Definition 37 of the deterministic representation and Definition 19 of being isomorphic. So finally, I can define:

Definition 39 Let (M, Σ_M, µ, T) be a discrete measure-preserving deterministic system, where (M, d_M) is a metric space. Let (M_2, Σ_{M_2}, µ_2, T_2, Φ_0) be the deterministic representation of the stationary stochastic process {Z_t; t ∈ Z} with outcomes in (M, d_M), i.e., Φ_0 : M_2 → M. (M, Σ_M, µ, T) is ε-congruent to {Z_t; t ∈ Z} if, and only if, (M, Σ_M, µ, T) is isomorphic via a function φ : M → M_2 to (M_2, Σ_{M_2}, µ_2, T_2) and d_M(m, Φ_0(φ(m))) < ε for all m ∈ M except for a set of measure < ε in M. For continuous measure-preserving deterministic systems, where (M, d_M) is a metric space, and continuous stationary stochastic processes {Z_t; t ∈ R} with outcomes in (M, d_M), ε-congruence is defined analogously (cf. Ornstein & Weiss 1991, pp. 22–23).
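The flavour of ε-congruence can be conveyed with a small sketch of mine, in which all concrete choices (the map, the number of bits, the initial condition) are illustrative assumptions: for the doubling map T(x) = 2x mod 1, coding each state by the midpoint of its n-bit dyadic cell yields an outcome sequence that stays within ε = 2⁻ⁿ of the deterministic solution, which is the shadowing behaviour that ε-congruence demands.

```python
# Sketch (mine): each outcome is a point of the phase space (the midpoint
# of the dyadic cell containing the state), and it shadows the solution
# within eps = 2^(-n). Fractions give exact arithmetic, so the doubling
# map orbit does not collapse to 0 as it would in floating point.
from fractions import Fraction

def doubling_orbit(x0, steps):
    x, orbit = x0, []
    for _ in range(steps):
        orbit.append(x)
        x = (2 * x) % 1                 # the doubling map, computed exactly
    return orbit

n = 10
eps = Fraction(1, 2 ** n)
orbit = doubling_orbit(Fraction(1234567, 9999991), 200)

# outcome: midpoint of the dyadic cell [k/2^n, (k+1)/2^n) containing x
outcomes = [(int(x * 2 ** n) + Fraction(1, 2)) / 2 ** n for x in orbit]
max_dist = max(abs(x - o) for x, o in zip(orbit, outcomes))
```

Here max_dist is at most 2^(-(n+1)) < ε, so every point of the symbolic realisation lies within ε of the corresponding state, with no exceptional set at all in this simple case.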
By generalising over ε, one obtains a natural meaning of the phrase that stochastic processes of a certain type simulate a measure-preserving deterministic system at every observation level, namely: for every ε > 0 there is a stochastic process of this type which gives the same predictions at observation level ε. Or technically: for every ε > 0 there exists a stochastic process of this type which is ε-congruent to the measure-preserving deterministic system. This notion, referring to ε-congruence, is the standard notion of simulation at every observation level discussed in the literature (Ornstein & Weiss 1991; Suppes 1999).

Note that ε-congruence does not assume that the measure-preserving deterministic system is observed with an observation function: the actual states of the deterministic system, and not states observed with an observation function, are compared with the outcomes of the stochastic process. To arrive at a notion of observational equivalence, no observation functions are invoked; instead it is asked whether the actual state of the deterministic system and the corresponding outcome of the stochastic process are less than distance ε apart.

At this point I should mention that it follows from the discussion at the end of subsection 5.2.3 that if a discrete measure-preserving deterministic system and a discrete stochastic process {Z_t; t ∈ Z} are ε-congruent, then {Φ_0(φ(T^t)); t ∈ Z}, where Φ_0(φ(T^t)) can take arbitrary values in M̄ for m ∈ M \ M̂, is the stochastic process {Z_t; t ∈ Z}. Likewise, for continuous time it follows that if a continuous measure-preserving deterministic system and a continuous stochastic process {Z_t; t ∈ R} are ε-congruent, then {Φ_0(φ(T_t)); t ∈ R}, where Φ_0(φ(T_t)) can take arbitrary values in M̄ for m ∈ M \ M̂, is the stochastic process {Z_t; t ∈ R}.
Technically, Φ_0(φ) is an observation function of the deterministic system, but for ε-congruence it is not interpreted in this way. Instead, the meaning of Φ_0(φ) is as follows: when it is applied to the deterministic system, the resulting process is the stochastic process whose realisations shadow the solutions of the measure-preserving deterministic system (at precision ε).

Sometimes we might want to know what stochastic processes are obtained if specific observation functions are applied to a deterministic system, and, as explained, the notion of ε-congruence does not help us in answering this question. For this reason, I will now introduce two other meanings of simulation at every observation level which compare measure-preserving deterministic systems as observed with observation functions to stochastic processes. Whether one prefers a notion of simulation at every observation level that (i) is based on the assumption that one cannot distinguish states which are within distance ε (such as the notion based on Definition 39), or (ii) one that tells us what stochastic processes are obtained if specific observation functions are applied to a deterministic system (such as the notions based on Definition 40 and Definition 41), will depend on the modeling process and the phenomenon under consideration.

A new meaning based on strong (Φ, ε)-simulation

To introduce the second meaning of simulation at every observation level, I have to start by explaining what it means for a stochastic process and a measure-preserving deterministic system as observed with an observation function Φ to give the same predictions relative to accuracy ε > 0, ε ∈ R⁺, where ε indicates that we cannot distinguish differences in probabilistic predictions of less than ε.
It is plausible that this means that the possible observed values of the measure-preserving deterministic system and the possible outcomes of the stochastic process are the same, and that the probabilistic predictions of the deterministic system as observed with Φ and the probabilistic predictions of the stochastic process are the same or differ by less than ε.

Technically, this idea is captured by the definition of strong (Φ, ε)-simulation (the reason for 'strong' will become clear soon). Since in practice scientists can only observe finitely many values, I will assume that Φ is a finite-valued observation function.

Definition 40 A discrete stochastic process {Z_t; t ∈ Z} strongly (Φ, ε)-simulates a discrete measure-preserving deterministic system (M, Σ_M, µ, T) observed with Φ, where Φ : M → M̄ is a surjective finite-valued observation function, if, and only if, there is a surjective measurable function Ψ : M → M̄ such that (i) Z_t = Ψ(T^t) for all t ∈ Z, and (ii) µ({m ∈ M | Ψ(m) ≠ Φ(m)}) < ε. That a continuous stochastic process {Z_t; t ∈ R} strongly (Φ, ε)-simulates a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) observed with Φ, where Φ : M → M̄ is a surjective finite-valued observation function, is defined analogously.

If ε is small enough, the notion of strong (Φ, ε)-simulation captures the idea that in practice the observed measure-preserving deterministic system and the stochastic process give the same predictions. By generalising over Φ and ε, we obtain a plausible meaning of the phrase that stochastic processes of a certain type simulate a measure-preserving deterministic system at any observation level, namely: for every finite-valued observation function Φ and every ε there is a stochastic process of this type which strongly (Φ, ε)-simulates the deterministic system.
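Condition (ii) of Definition 40 can be made concrete with a small sketch of mine, in which the phase space, the cut points and δ are all illustrative assumptions: on M = [0, 1) with the Lebesgue measure, an observation function Ψ cutting at 1/2 + δ disagrees with a Φ cutting at 1/2 exactly on [1/2, 1/2 + δ), a set of measure δ, which can be pushed below any ε; condition (i) would then define the process via Z_t = Ψ(T^t).

```python
# Sketch (mine): verify mu({m : Psi(m) != Phi(m)}) < eps by Monte Carlo
# for two nearby two-valued observation functions on [0, 1) with the
# Lebesgue measure. The labels 'o1'/'o2' and delta = 0.01 are illustrative.
import random

def Phi(m):
    return 'o1' if m < 0.5 else 'o2'

def make_Psi(delta):
    return lambda m: 'o1' if m < 0.5 + delta else 'o2'

def measure_of_disagreement(f, g, samples=100_000, seed=0):
    """Monte Carlo estimate of mu({m : f(m) != g(m)}) on [0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        m = rng.random()
        if f(m) != g(m):
            hits += 1
    return hits / samples

eps = 0.05
Psi = make_Psi(0.01)                 # disagreement set has measure 0.01 < eps
mu_diff = measure_of_disagreement(Phi, Psi)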
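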
This notion of simulation at every observation level seems very natural because it allows that the deterministic system is observed with any finite-valued observation function. Yet, to my knowledge, it has not really been discussed in the literature.

A new meaning based on weak (Φ, ε)-simulation

The notion of strong (Φ, ε)-simulation tells us what stochastic process we obtain when we apply an observation function Φ to the measure-preserving deterministic system. This notion can be relaxed by allowing that what one obtains when one observes the measure-preserving deterministic system with an observation function Φ is an observed stochastic process. That is, I require that there is a stochastic process and an observation function Γ of this stochastic process such that the stochastic process as observed with Γ gives the same predictions as the measure-preserving deterministic system as observed with Φ for accuracy ε > 0, where ε ∈ R⁺ (as before, ε indicates that we cannot distinguish differences in probabilistic predictions of less than ε). More specifically, I require that there is an observation of the stochastic process such that the possible observed outcomes of the stochastic process are the possible observed values of the measure-preserving deterministic system, and that the probabilistic predictions of the stochastic process observed with Γ and the probabilistic predictions of the deterministic system observed with Φ are the same or differ by less than ε.

Technically, this idea is captured by the notion of weak (Φ, ε)-simulation. Again, since in practice scientists can only observe finitely many values, I will assume that Φ is a finite-valued observation function.
Definition 41 A discrete stochastic process {Z_t; t ∈ Z} weakly (Φ, ε)-simulates a discrete measure-preserving deterministic system (M, Σ_M, µ, T) observed with Φ, where Φ : M → M̄ is a surjective finite-valued observation function, if, and only if, there is a surjective measurable function Ψ : M → S and a surjective observation function Γ : S → M̄ such that (i) Γ(Z_t) = Ψ(T^t) for all t ∈ Z, and (ii) µ({m ∈ M | Ψ(m) ≠ Φ(m)}) < ε. That a continuous stochastic process {Z_t; t ∈ R} weakly (Φ, ε)-simulates a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) observed with Φ, where Φ : M → M̄ is a surjective finite-valued observation function, is defined analogously.

I call it weak (Φ, ε)-simulation because if a stochastic process strongly (Φ, ε)-simulates a measure-preserving deterministic system, then it is apparent that it also weakly (Φ, ε)-simulates the deterministic system (we can simply choose Γ to be the identity function, that is, we let Γ : S → S, Γ(s) = s). It is also clear that the converse is generally not true.

If ε is small enough, weak (Φ, ε)-simulation captures the idea that the observed stochastic process and the deterministic system as observed with Φ give the same predictions. Again, by generalising over Φ and ε, we obtain a plausible meaning of the phrase that stochastic processes of a certain type simulate a measure-preserving deterministic system at any observation level, namely: for every finite-valued observation function Φ and every ε there is a stochastic process of this type which weakly (Φ, ε)-simulates the deterministic system. To my knowledge, this notion of simulation at every observation level has not been discussed in the literature before. Compared to the second notion of simulation at every observation level, this third notion only requires that the data could derive from some observed stochastic process.
For this reason, the second notion might look more attractive. Still, according to all three notions of simulation at every observation level, regardless of how the measure-preserving deterministic system is observed, the data could derive from the measure-preserving deterministic system or a stochastic process of a certain type. Hence for all three notions it will be worthwhile to see in the next subsection what results we obtain.

5.4.2 Stochastic processes used in science which simulate deterministic systems used in science at every observation level

The discrete-time case

For a Bernoulli process the next outcome of the process is probabilistically independent of its previous outcomes. So, intuitively, it seems clear that discrete measure-preserving deterministic systems used in science, for which the next state of the system is constrained by its previous state (because of the underlying determinism at the level of states), cannot be simulated by Bernoulli processes at every observation level. Smith (1998, pp. 160–162) also hints at this idea but does not substantiate it with a proof. The following two theorems and the following proposition show that for our three notions of simulation at every observation level (respectively) this idea is indeed correct. Consequently, these results show a limitation on the observational equivalence of discrete measure-theoretic deterministic systems and discrete stochastic processes.

Theorem 3 Let (M, Σ_M, µ, T) be a discrete measure-preserving deterministic system where Σ_M contains all open balls of the metric space (M, d_M),[12] T is continuous at some point x ∈ M, every open ball around x has positive measure, and there is a set D ∈ Σ_M, µ(D) > 0, with d(T(x), D) = inf{d(T(x), m) | m ∈ D} > 0. Then there is some ε > 0 for which there is no Bernoulli process to which (M, Σ_M, µ, T) is ε-congruent.

For a proof of this theorem, see subsection 5.7.3.
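The intuition behind these results — that determinism at the level of states leaves a trace in any sufficiently fine observation — can be illustrated numerically. The following sketch (a hypothetical illustration, not part of the chapter's formal apparatus: the doubling map T(x) = 2x (mod 1) stands in for a measure-preserving deterministic system, observed with a four-cell observation function) shows that the observed process has forbidden transitions, so its next outcome is not independent of the current one, as it would have to be for a Bernoulli process:

```python
import random

def doubling(x):
    """The doubling map T(x) = 2x (mod 1); Lebesgue measure is invariant,
    so it serves here as a stand-in measure-preserving deterministic system."""
    return (2.0 * x) % 1.0

def observe(x):
    """Hypothetical finite-valued observation function: index of the cell
    [k/4, (k+1)/4) containing x."""
    return min(int(4.0 * x), 3)

random.seed(1)
counts = [[0] * 4 for _ in range(4)]
for _ in range(100_000):
    x = random.random()  # initial condition sampled from the invariant measure
    counts[observe(x)][observe(doubling(x))] += 1

# From cell 0 (= [0, 1/4)) the image 2x lies in [0, 1/2), so the observed
# process can never jump from cell 0 to cell 2 or cell 3: the next outcome
# is constrained by the current one, which no Bernoulli process allows.
forbidden = counts[0][2] + counts[0][3]
```

With only the two-cell observation {[0, 1/2), [1/2, 1)} the observed process of the doubling map would in fact be a Bernoulli process of fair coin tosses; the point of the theorems above is that some sufficiently fine observation always reveals the deterministic constraint.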
The assumptions of this theorem are very mild and always hold for measure-preserving deterministic systems used in science.

Theorem 4 Let (M, Σ_M, µ, T) be a discrete measure-preserving deterministic system. Then there is a finite-valued observation function Φ and an ε > 0 such that no Bernoulli process strongly (Φ, ε)-simulates (M, Σ_M, µ, T).

For a proof, see subsection 5.7.4.

Proposition 1 Let (M, Σ_M, µ, T) be a discrete measure-preserving deterministic system. Then there is a finite-valued observation function Φ and an ε > 0 such that no Bernoulli process weakly (Φ, ε)-simulates (M, Σ_M, µ, T).

For a proof of Proposition 1, see subsection 5.7.5.

[12] An open ball with centre y and radius ε > 0, y ∈ M, is defined as the set {m ∈ M | d(m, y) < ε}.

Given these results, it is natural to ask (which, incidentally, Smith 1998 does not do) whether discrete measure-preserving deterministic systems used in science can be simulated at every observation level by other stochastic processes used in science. The answer is ‘yes’. Besides, all one needs are Markov processes (Example 5) or multi-step Markov processes (Example 6), which are widely used in science. Markov processes are often regarded as random; in particular, Bernoulli processes are regarded as the most random stochastic processes and Markov processes as the next most random (Eagle 2005; Ornstein & Weiss 1991, p. 38 and p. 66). The following two theorems and one proposition show that discrete Bernoulli systems (cf. Definition 20) can be simulated at every observation level by irreducible and aperiodic Markov processes or by irreducible and aperiodic multi-step Markov processes (concerning respectively the three notions defined in subsection 5.4.1).

Theorem 5 Let (M, Σ_M, µ, T) be a discrete Bernoulli system where the metric space (M, d_M) is separable[13] and where Σ_M contains all open balls of (M, d_M).
Then for any ε > 0 there is an irreducible and aperiodic Markov process such that (M, Σ_M, µ, T) is ε-congruent to this Markov process.

For a proof, see subsection 5.7.6. The assumptions in this theorem are fulfilled by all discrete Bernoulli systems used in science.

Theorem 6 Let (M, Σ_M, µ, T) be a discrete Bernoulli system. Then for every finite-valued observation function Φ and every ε > 0 there is an n such that an irreducible and aperiodic Markov process of order n strongly (Φ, ε)-simulates (M, Σ_M, µ, T).

For a proof of this theorem, see Radunskaya (1992, chapter 4).

Proposition 2 Let (M, Σ_M, µ, T) be a discrete Bernoulli system. Then for every finite-valued observation function Φ and every ε > 0 there is an irreducible and aperiodic Markov process which weakly (Φ, ε)-simulates (M, Σ_M, µ, T).

For a proof of this proposition, see subsection 5.7.7.

[13] (M, d_M) is separable if, and only if, there exists a countable set M̈ = {m_n | n ∈ N} with m_n ∈ M such that every nonempty open subset of M contains at least one element of M̈.

For example, consider the baker’s system (M, Σ_M, µ, T) (Example 1) with the Euclidean metric d_M. It is a discrete Bernoulli system. Thus for every ε > 0 there is a Markov process such that the baker’s system is ε-congruent to this Markov process. And for every finite-valued observation function Φ and every ε > 0 there is an n such that an irreducible and aperiodic Markov process of order n strongly (Φ, ε)-simulates the baker’s system. And finally, for every finite-valued observation function Φ and every ε > 0 there is an irreducible and aperiodic Markov process which weakly (Φ, ε)-simulates the baker’s system.
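The baker's case can be made concrete numerically. The following sketch is an illustration under assumptions of my own (the standard two-branch form of the baker's map and a quadrant observation function, neither taken from the chapter's formal definitions): the estimated transition frequencies of the observed quadrant process display the stationary transition structure of a finite-state Markov chain — the next quadrant's y-half always equals the current quadrant's x-half, and the next x-half behaves like a fair coin:

```python
import random

def baker(x, y):
    """One step of the baker's map on the unit square, in its standard
    two-branch form (an assumption of this illustration)."""
    if x < 0.5:
        return 2.0 * x, y / 2.0
    return 2.0 * x - 1.0, (y + 1.0) / 2.0

def quadrant(x, y):
    """Quadrant observation function, encoded as 2*(x-half bit) + (y-half bit)."""
    return 2 * int(x >= 0.5) + int(y >= 0.5)

random.seed(2)
counts = [[0] * 4 for _ in range(4)]
for _ in range(200_000):
    x, y = random.random(), random.random()  # Lebesgue-distributed initial condition
    counts[quadrant(x, y)][quadrant(*baker(x, y))] += 1

# Transitions to a quadrant whose y-half bit differs from the current x-half
# bit never occur; the two permitted successors each occur with frequency
# close to 1/2 — a four-state Markov-like transition structure.
```

Up to sets of measure zero, the observed quadrant process is in fact a four-state irreducible and aperiodic Markov chain (it is the two-block presentation of a fair Bernoulli shift), in the spirit of what Theorem 5 promises.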
Now one might ask whether not only discrete Bernoulli systems but maybe also other discrete measure-preserving deterministic systems used in science can be simulated at every observation level by irreducible and aperiodic Markov processes or by irreducible and aperiodic multi-step Markov processes. As the following theorem (Theorem 7) shows, according to our first notion of simulation at every observation level, indeed only Bernoulli systems can be simulated at every observation level by irreducible and aperiodic Markov processes. For the second and third notion of simulation at every observation level the complete picture is unknown. But I will give two theorems (Theorem 8 and Theorem 9) which show that two important classes of discrete measure-preserving deterministic systems cannot be simulated at every observation level by irreducible and aperiodic Markov processes or by irreducible and aperiodic multi-step Markov processes. Namely, these classes are: (i) discrete measure-preserving deterministic systems with zero Kolmogorov-Sinai entropy, and (ii) discrete measure-preserving deterministic systems which are ergodic, which have finite Kolmogorov-Sinai entropy and which are not discrete Bernoulli systems (recall that nearly all deterministic systems in science have finite Kolmogorov-Sinai entropy; see subsection 5.3.1). The classes (i) and (ii) include many discrete deterministic systems used in science, e.g., all discrete versions of integrable Hamiltonian systems, all discrete versions of the motion on clearly non-chaotic regions of KAM-type systems, periodic motion and fixed points (Arnold & Avez 1968, pp. 86–90 and pp. 210–214; Lichtenberg & Lieberman 1992, chapters 3–5; Petersen 1983, p. 245).

So let me first state the theorem about the first notion of simulation at every observation level.
Theorem 7 The deterministic representation of any irreducible and aperiodic multi-step Markov process (and thus the deterministic representation of any irreducible and aperiodic Markov process) is a discrete Bernoulli system.

For a proof of this deep theorem, see Ornstein (1974, pp. 45–47). Let (M, Σ_M, µ, T) be a discrete measure-preserving deterministic system, and assume that for all ε > 0 there is an irreducible and aperiodic Markov process which is ε-congruent to (M, Σ_M, µ, T). Then the deterministic representation of any of these Markov processes is isomorphic to (M, Σ_M, µ, T). Hence Theorem 7 implies that (M, Σ_M, µ, T) is a discrete Bernoulli system.

Let me now state the theorems about the second and third notion of simulation at every observation level.

Theorem 8 Assume that (M, Σ_M, µ, T) is a discrete measure-preserving deterministic system with zero Kolmogorov-Sinai entropy or a discrete ergodic measure-preserving deterministic system with finite Kolmogorov-Sinai entropy which is not a discrete Bernoulli system. Then there is a finite-valued observation function Φ and an ε > 0 such that no irreducible and aperiodic multi-step Markov process strongly (Φ, ε)-simulates (M, Σ_M, µ, T).

See subsection 5.7.8 for a proof of this theorem.

Theorem 9 Assume that (M, Σ_M, µ, T) is a discrete measure-preserving deterministic system with zero Kolmogorov-Sinai entropy or a discrete ergodic measure-preserving deterministic system with finite Kolmogorov-Sinai entropy which is not a discrete Bernoulli system. Then there is a finite-valued observation function Φ and an ε > 0 such that no irreducible and aperiodic Markov process weakly (Φ, ε)-simulates (M, Σ_M, µ, T).

For a proof of this theorem, see subsection 5.7.9.

The continuous-time case

So far we have only discussed discrete stochastic processes and discrete measure-preserving deterministic systems. What about continuous stochastic processes and continuous measure-preserving deterministic systems?
It turns out that analogous results hold here too. Namely, as the following three theorems show, according to our three notions of simulation at every observation level, continuous Bernoulli systems can be simulated at every observation level by irrationally related semi-Markov processes (Example 7) or by irrationally related multi-step semi-Markov processes (Example 8).

Theorem 10 Let (M, Σ_M, µ, T_t) be a continuous Bernoulli system where the metric space (M, d_M) is separable and Σ_M contains all open balls of (M, d_M). Then for any ε > 0 there is an irrationally related semi-Markov process such that (M, Σ_M, µ, T_t) is ε-congruent to this semi-Markov process.

For a proof of this theorem, see Ornstein & Weiss (1991, pp. 93–94). The assumptions in this theorem are fulfilled by all continuous Bernoulli systems used in science.

Theorem 11 Let (M, Σ_M, µ, T_t) be a continuous Bernoulli system. Then for every finite-valued observation function Φ and every ε > 0 there is an n such that an irrationally related semi-Markov process of order n strongly (Φ, ε)-simulates (M, Σ_M, µ, T_t).

For a proof of this theorem, see Ornstein & Weiss (1991, pp. 94–95).

Theorem 12 Let (M, Σ_M, µ, T_t) be a continuous Bernoulli system. Then for every finite-valued observation function Φ and every ε > 0 there is an irrationally related semi-Markov process {Z_t; t ∈ R} which weakly (Φ, ε)-simulates (M, Σ_M, µ, T_t).

For a proof of this theorem, see subsection 5.7.10.

For instance, consider a billiard system with convex obstacles (Example 2) with the Euclidean metric d_M, and recall that it is a continuous Bernoulli system. Hence for every ε > 0 there is an irrationally related semi-Markov process such that the billiard system with convex obstacles is ε-congruent to this semi-Markov process.
And it holds that for every finite-valued observation function Φ and every ε > 0 there is an n such that an irrationally related semi-Markov process of order n strongly (Φ, ε)-simulates the billiard system with convex obstacles. And finally, for every finite-valued observation function Φ and every ε > 0 there is an irrationally related semi-Markov process which weakly (Φ, ε)-simulates the billiard system with convex obstacles.

As in the discrete case, you might wonder whether not only continuous Bernoulli systems but maybe also other measure-preserving deterministic systems used in science can be simulated at every observation level by irrationally related semi-Markov processes or by irrationally related multi-step semi-Markov processes. Again, here results analogous to the ones for discrete time can be shown. Namely, as the following theorem (Theorem 13) shows, according to the first notion of simulation at every observation level, only continuous Bernoulli systems can be simulated at every observation level by irrationally related semi-Markov processes. For the second and third notion of simulation at every observation level the complete picture is unknown. But below are two theorems (Theorem 14 and Theorem 15) which show that two important classes of continuous measure-preserving deterministic systems cannot be simulated at every observation level by irrationally related multi-step semi-Markov processes or by irrationally related semi-Markov processes.
Namely, these classes are: (i) continuous measure-preserving deterministic systems with zero Kolmogorov-Sinai entropy, and (ii) continuous measure-preserving deterministic systems (M, Σ_M, µ, T_t) with finite Kolmogorov-Sinai entropy which are not continuous Bernoulli systems and where for some t_0 ∈ R, t_0 ≠ 0, the discrete deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic (recall that nearly all deterministic systems in science have finite Kolmogorov-Sinai entropy; see subsection 5.3.1). The classes (i) and (ii) include many continuous deterministic systems used in science, e.g., all integrable Hamiltonian systems, the motion on clearly non-chaotic regions of KAM-type systems and any periodic motion (Arnold & Avez 1968, pp. 86–90 and pp. 210–214; Lichtenberg & Lieberman 1992, chapters 3–5).

Let me first present the theorem about the first notion of simulation at every observation level.

Theorem 13 The deterministic representation of every irrationally related multi-step semi-Markov process (and thus the deterministic representation of any irrationally related semi-Markov process) is a continuous Bernoulli system.

See Park (1982) and Ornstein (1974, pp. 56–61) for a proof of this theorem. Let (M, Σ_M, µ, T_t) be a continuous measure-preserving deterministic system. Assume that for all ε > 0 there is an irrationally related semi-Markov process which is ε-congruent to (M, Σ_M, µ, T_t). Then the deterministic representation of any of these semi-Markov processes is isomorphic to (M, Σ_M, µ, T_t). Consequently, it follows from Theorem 13 that (M, Σ_M, µ, T_t) is a continuous Bernoulli system.

Let me now present the theorems about the second and third notion of simulation at every observation level.
Theorem 14 Let (M, Σ_M, µ, T_t) be a continuous measure-preserving deterministic system with zero Kolmogorov-Sinai entropy or a continuous measure-preserving deterministic system which is not a continuous Bernoulli system and where for some t_0 ∈ R\{0} the discrete measure-preserving deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic. Then there is a finite-valued observation function Φ and an ε > 0 such that no irrationally related multi-step semi-Markov process strongly (Φ, ε)-simulates (M, Σ_M, µ, T_t).

For a proof of this theorem, see subsection 5.7.11.

Theorem 15 Let (M, Σ_M, µ, T_t) be a continuous measure-preserving deterministic system with zero Kolmogorov-Sinai entropy or a continuous measure-preserving deterministic system which is not a continuous Bernoulli system and where for some t_0 ∈ R\{0} the discrete measure-preserving deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic. Then there is a finite-valued observation function Φ and an ε > 0 such that no irrationally related semi-Markov process weakly (Φ, ε)-simulates (M, Σ_M, µ, T_t).

See subsection 5.7.12 for a proof of this theorem.

To summarise the most important points: the results of this section show that discrete Bernoulli systems can be simulated at every observation level by irreducible and aperiodic Markov processes and by irreducible and aperiodic multi-step Markov processes, respectively. And continuous Bernoulli systems can be simulated at every observation level by irrationally related semi-Markov processes and by irrationally related multi-step semi-Markov processes, respectively. Recall that Markov processes, multi-step Markov processes, semi-Markov processes and multi-step semi-Markov processes are widely used in science to model phenomena.
Also recall that several discrete deterministic systems used in science are discrete Bernoulli systems and that several continuous deterministic systems used in science are continuous Bernoulli systems (see subsection 5.3.1). Consequently, I conclude that the conjecture advanced at the beginning of this subsection is wrong: it is possible to simulate measure-theoretic deterministic systems used in science at every observation level by stochastic processes used in science; sometimes even by Markov processes, which are regarded as the next most random stochastic processes after Bernoulli processes. All this shows that even kinds of stochastic processes and kinds of deterministic systems which intuitively seem to give very different predictions can be observationally equivalent.

5.5 Previous philosophical discussion

Let me discuss the previous philosophical papers about the topic of this chapter that I have been able to find. Suppes & de Barros (1996) and Suppes (1999) discuss an instance of Theorem 5, namely that for discrete versions of billiard systems with convex obstacles and for every ε > 0 there is a Markov process such that the billiard system is ε-congruent to this Markov process. Suppes (1993) (albeit with only half a page on the topic of this chapter) and Winnie (1998) discuss the theorem that for continuous Bernoulli systems and for every ε > 0 there is an irrationally related semi-Markov process which is ε-congruent to the deterministic system (Theorem 10). And Hoefer’s (2008) entry briefly summarises and comments on the debate between Suppes (1993) and Winnie (1998).

My discussion of the previous philosophical literature will focus on three issues: the significance of Theorem 5 and Theorem 10, the role of chaos in results on observational equivalence, and the question of whether the deterministic or the stochastic description is the better one. Let me start with the first issue.
5.5.1 The significance of Theorem 5 and Theorem 10

Suppes & de Barros (1996, p. 196), Suppes (1999, pp. 181–182) and Winnie (1998, p. 317) claim that the philosophical significance of Theorem 10 and of the above-mentioned instance of Theorem 5 is that for chaotic motion and every observation level one can choose between a deterministic description used in science and a stochastic description. For instance, Suppes & de Barros (1996, p. 196) comment on the significance of these results:

What is fundamental is that independent of this variation of choice of examples or experiments is that [sic] when we do have chaotic phenomena [...] then we are in a position to choose either a deterministic or stochastic model.

However, I submit that these claims are weak, and Theorem 5 and Theorem 10 show more. As discussed in subsection 5.2.1, the basic results on observational equivalence already show that for many measure-preserving deterministic systems, including several deterministic systems used in science, the following holds: for every finite-valued observation function one can choose between a nontrivial stochastic description and a deterministic description (cf. Theorem 1 and Theorem 2). And as one would expect, the following two propositions show that this implies the following: according to our first notion of simulation at every observation level, many deterministic systems, namely all those to which either Theorem 1 or Theorem 2 applies and which additionally have finite Kolmogorov-Sinai entropy, can be simulated at every observation level by nontrivial stochastic processes. (As discussed in subsection 5.3.1, nearly all measure-preserving deterministic systems used in science have finite Kolmogorov-Sinai entropy.)

Proposition 3 Let (M, Σ_M, µ, T) be a discrete measure-preserving deterministic system where (M, d_M) is separable and where Σ_M contains all open balls of (M, d_M).
Assume that (M, Σ_M, µ, T) satisfies the assumption of Theorem 1 and has finite Kolmogorov-Sinai entropy. Then for every ε > 0 there is a stochastic process {Z_t; t ∈ Z} with outcome space M̄ = {o_1, ..., o_h}, h ∈ N, such that {Z_t; t ∈ Z} is ε-congruent to (M, Σ_M, µ, T), and for all k ∈ N there is an outcome o_i ∈ M̄ such that for all o_j ∈ M̄, 1 ≤ j ≤ h, P{Z_{t+k} = o_j | Z_t = o_i} < 1.

This proposition is easy to establish. For a proof, see subsection 5.7.13.

Proposition 4 Let (M, Σ_M, µ, T_t) be a continuous measure-preserving deterministic system where (M, d_M) is separable and where Σ_M contains all open balls of (M, d_M). Assume that (M, Σ_M, µ, T_t) satisfies the assumption of Theorem 2 and has finite Kolmogorov-Sinai entropy. Then for every ε > 0 there is a stochastic process {Z_t; t ∈ R} with outcome space M_O = {o_1, ..., o_h}, h ∈ N, such that {Z_t; t ∈ R} is ε-congruent to (M, Σ_M, µ, T_t), and for all k ∈ R+ there is an outcome o_i ∈ M_O such that for all o_j ∈ M_O, 1 ≤ j ≤ h, P{Z_{t+k} = o_j | Z_t = o_i} < 1.

Again, this proposition is easy to establish. See subsection 5.7.14 for a proof.

Also, clearly, the basic results immediately imply the following for the second and third notion of simulation at every observation level: every measure-preserving deterministic system to which Theorem 1 or Theorem 2 applies, and thus many measure-preserving deterministic systems (including deterministic systems used in science), can be simulated at every observation level by nontrivial stochastic processes. This is so because the definition of the second or third notion of simulation at every observation level quantifies over all finite-valued observation functions Φ. Given a finite-valued observation function Φ and a discrete measure-preserving deterministic system (M, Σ_M, µ, T), the stochastic process {Φ(T^t); t ∈ Z} is nontrivial by Theorem 1. And, obviously, {Φ(T^t); t ∈ Z} strongly (Φ, ε)-simulates (M, Σ_M, µ, T) and weakly (Φ, ε)-simulates (M, Σ_M, µ, T).
Likewise, given a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t), the stochastic process {Φ(T_t); t ∈ R} is nontrivial by Theorem 2, and {Φ(T_t); t ∈ R} strongly (Φ, ε)-simulates (M, Σ_M, µ, T_t) and weakly (Φ, ε)-simulates (M, Σ_M, µ, T_t). Hence all measure-preserving deterministic systems to which Theorem 1 or Theorem 2 applies are simulated at every observation level by nontrivial stochastic processes.

And similar results for chaotic systems were known long before the ε-congruence results were proved (cf. subsection 5.2.1). Hence the fact that at every observation level one has a choice between a measure-preserving deterministic system used in science and a stochastic process was known long before the ε-congruence results (the instance of Theorem 5 and Theorem 10) were proved; and so this cannot be the philosophical significance of these results, as claimed by these authors. As I have argued in subsection 5.4.1, the significance of these results is something stronger: namely, that it is possible to simulate measure-preserving deterministic systems used in science at every observation level by stochastic processes used in science.

Moreover, Suppes & de Barros (1996, pp. 196–198) and Suppes (1999, p. 189 and p. 192) wrongly think that what it means for a measure-preserving deterministic system to be ε-congruent to a certain type of stochastic process for every ε > 0 (the first notion of simulation at every observation level) is the following: the deterministic system observed with any finite-valued observation function yields a stochastic process of a certain type (that is, something like my second notion of simulation at every observation level). As discussed in subsection 5.4.1, the first and the second notion of simulation at every observation level are quite different (for instance, only the latter tells us what happens if we apply any arbitrary observation function to a deterministic system).
And in particular, as we have seen in subsection 5.4.2, the first and second notion give rise to different results.[14]

[14] The reader should also be warned that there are several technical lacunae in Suppes & de Barros (1996) and Suppes (1999). For instance, according to their definition, any two measure-preserving deterministic systems whatsoever are ε-congruent (let the metric space simply consist of one element). Also, these authors do not seem to be aware that the results about simulation at every observation level by semi-Markov processes (Theorem 10) require the measure-preserving deterministic system to be a Bernoulli system and so do not generally hold for ergodic measure-preserving deterministic systems. And in these papers it is wrongly assumed that the notion of isomorphism requires that the measure-preserving deterministic system is looked at through a finite-valued observation function (Suppes & de Barros 1996, p. 198; Suppes 1999, pp. 189–192).

There is hardly any conceptual or philosophical discussion in the mathematics literature on those mathematical results presented in this chapter which were already proven before. The main exception is the following comment by Ornstein & Weiss (1991, pp. 39–40):

Our theorem [Theorem 10] also tells us that certain semi-Markov systems could be thought of as being produced by Newton’s laws (billiards seen through a deterministic viewer) or by coin-flipping. This may mean that there is no philosophical distinction between processes governed by roulette wheels and processes governed by Newton’s laws. {The popular literature emphasizes the distinction between “deterministic chaos” and “real randomness”.} In this connection we should note that our model for a stationary process (§ 1.2) [the deterministic representation] means that random processes have a deterministic model.
This model, however, is abstract, and there is no reason to believe that it can be endowed with any special additional structure. Our point is that we are comparing, in a strong sense, Newton’s laws and coin flipping.[15]

[15] The text enclosed in braces is in a footnote.

It is hard to tell what this comment expresses because it is vague and unclear. For instance, why do Ornstein & Weiss highlight coin flipping even though Theorem 10 does not tell us anything about Bernoulli processes but only about semi-Markov processes? Disregarding that, possibly Ornstein and Weiss think that semi-Markov processes are random, and hence this comment expresses that deterministic systems as well as stochastic processes can be random. This is true and in fact widely acknowledged in the philosophy literature (e.g., Eagle 2005). Or maybe Ornstein & Weiss want to say that measure-preserving deterministic systems used in science, when observed with specific observation functions, can be observationally equivalent to stochastic processes used in science or, if semi-Markov processes are random, even to random stochastic processes.[16]

[16] As explained in subsection 5.4.1, if a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) and a semi-Markov process {Z_t; t ∈ R} are ε-congruent, then there is a finite-valued observation function Φ such that {Φ(T_t); t ∈ R} is the same semi-Markov process.

This is true and an important insight. Yet, as discussed in subsection 5.3.1, this insight was generally known before Theorem 10 and related results were proven, and it has been established by theorems which are weaker than Theorem 10. One might have expected Ornstein & Weiss to say that Theorem 10 shows that measure-preserving deterministic systems used in science can be simulated at every observation level by stochastic processes used in science (cf. subsection 5.4.2). But they do not seem to say this here: because, if they did,
it would be unclear why the deterministic representation is mentioned; and also they do not talk about all possible observation levels.

In any case, it goes without saying that even if Theorem 10 shows that deterministic and stochastic descriptions are observationally equivalent in some sense, it is not true that “this may mean that there is no philosophical distinction between processes governed by roulette wheels and processes governed by Newton’s laws” in the sense that this may mean that there is no conceptual distinction between a deterministic description and a stochastic description (as a kind of indeterministic description). Regardless of any results on observational equivalence, there will remain this conceptual distinction.

5.5.2 The role of chaotic behaviour

Let us now turn to the second issue, namely the role of chaos in results on observational equivalence. Hoefer (2008) is not aware, and Suppes & de Barros (1996), Suppes (1999) and Winnie (1998) do not seem to be aware, that also for non-chaotic systems there is a choice between a deterministic and a stochastic description (at every observation level). To show this, it will suffice to show that Theorem 1 also applies to deterministic systems which are uncontroversially neither chaotic nor chaotic when restricted to a region of phase space. Consider the measure-preserving deterministic system (M, Σ_M, µ, T) where M = [0, 1) represents the unit circle, i.e., each m ∈ M represents the point e^{2πim}, Σ_M is the Lebesgue σ-algebra on M, µ is the Lebesgue measure, and T is the rotation T(m) = m + α (mod 1), where α ∈ R is irrational. (M, Σ_M, µ, T) is called an irrational rotation on the circle. It is uncontroversial that this measure-preserving deterministic system is neither chaotic nor chaotic on a region of phase space because all solutions are stable, i.e., nearby solutions stay close for all times.
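A quick numerical sketch makes this vivid (the particular rotation number and the two-cell observation function below are chosen purely for illustration and are not part of the argument): coarse observation of this manifestly non-chaotic rotation already yields transition frequencies strictly between 0 and 1.

```python
import random

# A float approximation of (sqrt(5) - 1) / 2, an irrational rotation number
# chosen for illustration.
ALPHA = 0.6180339887498949

def rotate(m):
    """One step of the rotation T(m) = m + alpha (mod 1)."""
    return (m + ALPHA) % 1.0

def observe(m):
    """Hypothetical two-cell observation function: 0 on [0, 1/2), 1 on [1/2, 1)."""
    return 0 if m < 0.5 else 1

random.seed(0)
counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
for _ in range(100_000):
    m = random.random()  # initial condition from the invariant (Lebesgue) measure
    counts[(observe(m), observe(rotate(m)))] += 1

# Estimated probability of observing the same cell one step later; for this
# rotation number it is strictly between 0 and 1, so the observed process
# is nontrivial even though the underlying dynamics is non-chaotic.
p_same = {a: counts[(a, a)] / (counts[(a, 0)] + counts[(a, 1)]) for a in (0, 1)}
```

For this rotation number both estimated self-transition frequencies come out at about 0.24 (the exact value is 2α − 1 ≈ 0.236): genuinely probabilistic-looking predictions from a stable, non-chaotic system.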
However, one easily sees that it satisfies the assumption of Theorem 1.[17] Consequently, for any nontrivial finite-valued observation function the measure-preserving deterministic system (M, Σ_M, µ, T) yields a nontrivial stochastic process. Furthermore, any irrational rotation on a circle has zero entropy (Petersen 1983, p. 245). Thus, according to any of our three notions of simulation at every observation level, any irrational rotation (M, Σ_M, µ, T) is simulated at every observation level by nontrivial stochastic processes (see Proposition 3 and the paragraph following this proposition).[18]

[17] Any irrational rotation on the circle is ergodic (cf. Definition 2.5) (Petersen 1983, p. 49). Hence there can be no n ∈ N and C ∈ Σ_M, 0 < µ(C) < 1, such that, except for a set of measure zero, T^n(C) = C, since this would imply that there is an irrational rotation on the circle which is not ergodic.

5.5.3 Is the deterministic or the indeterministic description better?

Let me now turn to the third issue: if there is a choice between a deterministic and a stochastic description, which one is better or preferable? In a way, if you aim to describe the world at a specific level, it is uncontroversial that if the phenomenon under consideration is really stochastic at this level, the stochastic description is preferable; and if the phenomenon is really deterministic at this level, the deterministic description is preferable. But really of concern here is the question of which description is preferable when you cannot know for sure whether the phenomenon is deterministic or stochastic. So which description is then preferable in the sense of being preferable relative to our current knowledge and evidence? This question has not been the topic of this chapter.
Rather, the topic of this chapter has been whether measure-theoretic deterministic systems and stochastic processes are observationally equivalent, and whether even kinds of stochastic processes and kinds of deterministic systems which intuitively seem to give very different predictions can be observationally equivalent. Still, this question arises from our discussion, and so I will address it. Because of lack of space, it will not be possible to treat the question in all its details. But I will criticise the previous literature about this question, namely Hoefer (2008), Suppes (1993) and Winnie (1998), and I will conclude that a more careful treatment is needed.

[18] This example can be generalised: any rationally independent rotation on a torus is uncontroversially non-chaotic but fulfils the assumption of Theorem 1 (cf. Petersen 1983, p. 51).

Before I turn to the previous literature on this question, note the following. Consider a discrete measure-theoretic deterministic system (M, Σ_M, µ, T) or a continuous measure-theoretic deterministic system (M, Σ_M, µ, T_t), and consider an observation function Φ: M → M_O which is many-to-one. Then the deterministic description ((M, Σ_M, µ, T) or (M, Σ_M, µ, T_t) observed with Φ) is more informative than the stochastic description ({Z_t = Φ(T^t); t ∈ Z} or {Z_t = Φ(T_t); t ∈ R}) in the following sense: while (M, Σ_M, µ, T) or (M, Σ_M, µ, T_t) tells us where each state m ∈ M evolves, {Z_t; t ∈ Z} or {Z_t; t ∈ R} only gives us the probability distributions over all possible sequences of outcomes in M_O. Yet this extra information might not be desirable or relevant, as, for instance, for the deterministic representation. Thus, suppose you have a choice between a stochastic process and its deterministic representation.
Even though the deterministic representation is more informative in this sense, you might argue that the stochastic process is preferable because, from a philosophical perspective, the deterministic representation is a cheat (cf. subsection 5.2.2). Thus the fact that, in this sense, the deterministic description is more informative than the stochastic one does not imply that the deterministic description is the better description.

Let me now consider arguments in the literature which purport to show that by observing a phenomenon at different observation levels, you can find out that the measure-preserving deterministic system is the correct description. Consider the following claim by Hoefer (2008):

It may well be true that there are some deterministic dynamical systems that, when viewed properly, display behavior indistinguishable from that of a genuinely stochastic process. For example, using the billiard table above [a billiard system with convex obstacles], if one divides its surface into quadrants and looks at which quadrant the ball is in at 30-second intervals, the resulting sequence is no doubt highly random. But this does not mean that the same system, when viewed in a different way (perhaps at a higher degree of precision) does not cease to look random and instead betrays its deterministic nature [original emphasis].19

19 Hoefer (2008) uses the word 'random' synonymously with 'stochastic'.

Our previous discussion shows that this claim is misguided for two reasons. First, for any discretised version of any billiard system with convex obstacles every finite-valued observation function yields a nontrivial stochastic process (cf. Theorem 1). Hence there will never be trivial transition probabilities, contrary to what Hoefer suggests.
Second, assume that the stochastic process {Φ(T^t); t ∈ Z}, where (M, Σ_M, µ, T) is a discrete measure-theoretic deterministic system and Φ is an observation function, is in accordance with the observations and is trivial (the transition probabilities are zero or one). Or assume that the discrete stochastic process {Φ(T_{t t_0}); t ∈ Z}, where (M, Σ_M, µ, T_t) is a continuous measure-theoretic deterministic system, Φ is an observation function and t_0 ∈ R^+, is in accordance with the observations and is trivial. This does not imply, as the quote suggests, that the observations derive from a deterministic system. As argued, trivial stochastic processes can also derive from observing a nontrivial stochastic process (cf. the end of subsection 5.2.1).

Another argument in this direction has been put forward by Winnie (1998).20 For the baker's system (M, Σ_M, µ, T) (Example 1) we consider the relation between two observations of the system. Consider the observation function Φ(m) = o_1 χ_{α_1}(m) + o_2 χ_{α_2}(m), where α_1 = [0,1] × [0,1/2] \ D and α_2 = [0,1] × (1/2,1] \ D, and consider the observation function Ψ(m) = Σ_{i=1}^{4} q_i χ_{β_i}(m), where β_1 = [0,1/2] × [0,1/2] \ D, β_2 = (1/2,1] × [0,1/2] \ D, β_3 = [0,1/2] × (1/2,1] \ D, β_4 = (1/2,1] × (1/2,1] \ D. It is clear that if you observe q_1 (with Ψ), the probability that you will next observe o_1 (with Φ) is 1; if you observe q_2, the probability that you will next observe o_2 is 1; if you observe q_3, the probability that you will next observe o_1 is 1; and if you observe q_4, the probability that you will next observe o_2 is 1. Thus there are trivial transition probabilities from the observation modeled by Ψ to the coarser observation modeled by Φ. Winnie (1998, pp. 314–315) comments on this:

20 Winnie (1998) does not clearly distinguish between random and stochastic behaviour as a form of indeterministic behaviour. As a consequence, the discussion sometimes suffers from ambiguities.
It is uncontroversial that stochastic processes are processes governed by probabilistic laws. Random behaviour is usually regarded as different from stochastic behaviour, but there are various accounts of what randomness amounts to (see, for instance, the recent survey Eagle 2005).

Thus, the fact that a chaotic deterministic system [...] has some partitioning that yields a set of random or stochastic observations in no way undermines the distinction between deterministic and stochastic behaviour for such systems. [...] As successive partitionings are exemplified [...] the determinism underlying the preceding, coarser observations emerges. To be sure, at any state of the above process, the system may be modeled stochastically, but the successive stages of that modeling process provide ample—inductive—reason for believing that the deterministic model is correct [original emphasis].

In order to understand this quote, note the following. From the fact that, in the discrete case, there are trivial transition probabilities from an observation (modeled by Ψ) to a coarser observation (modeled by Φ), or that, in the continuous case, there are trivial transition probabilities from an observation (modeled by Ψ) to a coarser observation (modeled by Φ) when the observations are made at the time points n t_0, n ∈ Z, t_0 ∈ R^+, it does not follow that the observed phenomenon is deterministic; and Winnie does not claim this. It may well be that {Ψ(T^t); t ∈ Z} or {Ψ(T_t); t ∈ R}, or any stochastic process at a smaller scale, really governs the phenomenon under consideration.

The argument Winnie seems to make in the quote is the following. Assume that you can make observations at finer levels (that is, observations where there is at least one value of the coarser observation function to which two or more values of the finer observation function correspond).
Further, assume that you find that for observations at finer levels you need stochastic processes at a smaller scale to explain the observational data (that is, stochastic processes where there is at least one outcome of the stochastic process at the larger scale to which two or more outcomes of the stochastic process at the smaller scale correspond). Then this provides inductive evidence that the phenomenon under consideration is deterministic, and hence that the deterministic description is better. Let me call this argument the 'nesting argument'. I think that, contrary to what Winnie's quote suggests, the nesting argument is independent of whether there are trivial transition probabilities from an observation to a coarser observation. For instance, consider again the baker's system (M, Σ_M, µ, T) (Example 1). Let the observation function Ψ(m) = Σ_{i=1}^{4} q_i χ_{β_i}(m) be as above, and consider the observation function Φ(m) = o_1 χ_{γ_1}(m) + o_2 χ_{γ_2}(m), where γ_1 = [0,1/2] × [0,1] \ D and γ_2 = (1/2,1] × [0,1] \ D. Clearly, for all i, 1 ≤ i ≤ 4, and all j, 1 ≤ j ≤ 2, the probability that q_i will be followed by o_j is 1/2. Still, Φ is coarser than Ψ, and all that matters for the nesting argument is that for observations at finer levels you need stochastic processes at a smaller scale to explain the data.

Before I continue the discussion of the nesting argument, let me mention another view in the literature about which description is preferable. Namely, Suppes (1993, p. 254), without providing any arguments, simply claims that if there is a choice between a deterministic description used in science and a stochastic description, both descriptions are equally good. And Winnie presents the nesting argument also as a criticism of this claim by Suppes. I want to argue that neither Suppes's (1993) nor Winnie's (1998) view is tenable.
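The two baker's-system calculations above can be checked numerically. The following is a sketch under my own conventions (the measure-zero set D is ignored, and the coding of the cells is mine): transitions from the quadrant observation Ψ to the upper/lower-half observation Φ come out trivial, while transitions from Ψ to the left/right-half observation Φ all come out near 1/2.

```python
import random

# The baker's map on the unit square (the measure-zero set D is ignored).
def baker(x, y):
    return (2 * x, y / 2) if x < 0.5 else (2 * x - 1, (y + 1) / 2)

def psi(x, y):        # quadrants beta_1..beta_4, coded 1..4
    return 1 + (x >= 0.5) + 2 * (y >= 0.5)

def phi_halves(y):    # o1 = lower half, o2 = upper half (cells alpha_1, alpha_2)
    return 1 if y < 0.5 else 2

def phi_sides(x):     # o1 = left half, o2 = right half (cells gamma_1, gamma_2)
    return 1 if x < 0.5 else 2

random.seed(0)
next_halves = {q: set() for q in (1, 2, 3, 4)}
side_counts = {(q, o): 0 for q in (1, 2, 3, 4) for o in (1, 2)}
for _ in range(100_000):
    x, y = random.random(), random.random()
    q = psi(x, y)
    nx, ny = baker(x, y)
    next_halves[q].add(phi_halves(ny))    # which halves can follow quadrant q
    side_counts[(q, phi_sides(nx))] += 1  # left/right transition counts

# Trivial transitions Psi -> upper/lower: each quadrant determines the next
# outcome uniquely (q1, q3 -> o1; q2, q4 -> o2). Nontrivial transitions
# Psi -> left/right: each empirical probability is close to 1/2.
side_probs = {k: v / (side_counts[(k[0], 1)] + side_counts[(k[0], 2)])
              for k, v in side_counts.items()}
```

The simulation reproduces both claims: the upper/lower observation is deterministic given the quadrant, while the left/right observation is maximally uncertain given the quadrant.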
Note that both Suppes's and Winnie's claims are very general and are not based on any arguments about the state of the art concerning which scientific theories best describe the observed phenomena or which interpretation of a scientific theory is correct. Thus, to refute these claims, it will suffice to show that there could be situations in science (regardless of whether this is the current situation in science) where (contra Suppes) the two descriptions are not equally good, and where the premises of the nesting argument are true but where (contra Winnie) the deterministic description is not preferable. As already pointed out above, in a way, if the aim is to describe the world at a specific level and if the phenomenon under consideration is really stochastic at this level, the stochastic description is preferable, even if the stochasticity is at a very small scale and thus you find that for observations at a finer level you need stochastic processes at a smaller scale to explain the data. Likewise, if the phenomenon is really deterministic at this level, the deterministic description is preferable. But the real concern is the following question: which description is preferable in the sense of being preferable relative to our current knowledge and evidence?

Before I can explain why I think that here too neither Winnie's nor Suppes's view is tenable, let me point out that an answer to this question depends on many factors, such as the kind of phenomenon under consideration, the state of the art of scientific theories, and metaphysical predilections, including views about how models relate to reality.
For instance, first, a stochastic description can be preferable if the following holds: there is no theory from which the deterministic description is derivable; the stochastic description is derivable from a well-confirmed theory T; and there is evidence which is not derivable from the specific deterministic or stochastic description but which confirms the stochastic theory T and hence provides evidence for the stochastic description. Or, second, suppose that a discrete measure-theoretic deterministic system (M, Σ_M, µ, T) or a continuous measure-theoretic deterministic system (M, Σ_M, µ, T_t) can be derived from Newton's equations of motion. And suppose that there is confusion about the more fundamental theory, but there is a general consensus that it might well be that in reality there is a stochastic process at a small scale of the form {Φ(T^t); t ∈ Z} or {Φ(T_t); t ∈ R}. Because it is unknown which exact stochastic process might be an alternative description, the scientist might reasonably decide to work with the deterministic description.

The first example, i.e. that a well-confirmed theory suggests that a stochastic process is correct and hence that the stochastic description is preferable, provides a counterexample to both Suppes's (1993) and Winnie's (1998) claims. Here the stochastic process which is believed to be the real one might be at a very small scale, and thus you find that for observations at a finer level you need stochastic processes at a smaller scale to explain the data. That is, the premises of the nesting argument are true (but the conclusion is not). At one point in the text Winnie (1998, p. 318) says that if there were some in-principle limitations on observational accuracy, then the deterministic description might not be the better one.
But he quickly dismisses this thought, arguing that the deterministic descriptions in dynamical systems theory are deterministic descriptions in Newtonian mechanics and that there are no in-principle limitations on observational accuracy in Newtonian mechanics. But this misses the point: even if there are no such limitations in Newtonian mechanics, there might be, or there might be evidence for, in-principle limitations on observational accuracy in the actual world; for instance, because in the actual world the phenomenon is governed, or believed to be governed, by a stochastic process at a very small scale.21

To conclude, the question of whether the deterministic or the stochastic description is preferable depends on many factors. Neither Hoefer's (2008), Suppes's (1993) nor Winnie's (1998) view is tenable, and a more careful treatment of this question is needed.

5.6 Conclusion

The central question of this chapter has been: are deterministic and indeterministic descriptions observationally equivalent in the sense that deterministic descriptions, when observed, and indeterministic descriptions give the same predictions?

After some introductory remarks, in section 5.2 I demonstrated that every stochastic process is observationally equivalent to a measure-theoretic deterministic system, and that many measure-theoretic deterministic systems are observationally equivalent to stochastic processes; and I formally defined what it means for a measure-preserving deterministic system, observed with an observation function, and a stochastic process to be observationally equivalent. Still, one might guess that the measure-theoretic deterministic systems which are observationally equivalent to stochastic processes used in science do not include any measure-theoretic deterministic systems used in science.
In section 5.3 I showed this to be false, because some discrete measure-theoretic deterministic systems used in science even produce Bernoulli processes, and some continuous measure-theoretic deterministic systems even produce semi-Markov processes. Despite this, one might guess that measure-theoretic deterministic systems used in science cannot give the same predictions at every observation level as stochastic processes used in science. I have introduced three plausible technical notions of simulation at every observation level. In section 5.4 I showed that there is indeed a limitation on observational equivalence, namely that discrete measure-preserving deterministic systems used in science cannot give the same predictions at every observation level as Bernoulli processes. However, the guess is still wrong because I have shown the following: several discrete measure-theoretic deterministic systems used in science give the same predictions at every observation level as Markov processes or multi-step Markov processes; and several continuous measure-theoretic deterministic systems used in science, including Newtonian systems, give the same predictions at every observation level as semi-Markov processes or multi-step semi-Markov processes. The general insight of all these results is that even kinds of deterministic systems and kinds of stochastic processes which, intuitively, seem to give very different predictions are observationally equivalent. Finally, in section 5.5 I criticised the previous philosophical literature.

21 Furthermore, dynamical systems theory is applied not only in Newtonian mechanics but in many other scientific fields. Hence Winnie would have to extend his argument to all the other applications of dynamical systems theory.
Suppes & de Barros (1996), Suppes (1999) and Winnie (1998) argue that the philosophical significance of the result which says that some continuous measure-preserving deterministic systems can be simulated at every observation level by semi-Markov processes is that for chaotic motion one can choose at every observation level between a stochastic and a deterministic description. However, this is already shown by the basic results in section 5.2. The philosophical significance of these results is really something stronger, namely that there are measure-preserving deterministic systems used in science that give the same predictions at every observation level as stochastic processes used in science. Moreover, these authors seem not to be aware that there are also uncontroversially non-chaotic deterministic systems which can be simulated at every observation level by nontrivial stochastic processes. Furthermore, I argued that the viewpoints in the literature on the question of whether the deterministic or the stochastic description is preferable, namely those of Hoefer (2008), Suppes (1993) and Winnie (1998), are untenable. I concluded that this question needs more careful consideration.

5.7 Appendix: Proofs

5.7.1 Proof of Theorem 1

Theorem 1 If, and only if, for the discrete measure-preserving deterministic system (M, Σ_M, µ, T) there does not exist an n ∈ N and a C ∈ Σ_M, 0 < µ(C) < 1, such that, except for a set of measure zero, T^n(C) = C, then the following holds: for every nontrivial finite-valued observation function Φ : M → M_O, M_O = {o_1, . . . , o_r}, r ∈ N, every k ∈ N and the stochastic process {Z_t = Φ(T^t); t ∈ Z} there is an o_i ∈ M_O such that for all o_j ∈ M_O, P{Z_{t+k} = o_j | Z_t = o_i} < 1.
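The conclusion of Theorem 1 can be illustrated numerically for one system satisfying its hypothesis. The doubling map and the two-cell observation below are my illustrative choices, not part of the proof.

```python
import random

# The doubling map T(x) = 2x mod 1 on [0, 1) preserves Lebesgue measure
# and has no set C with 0 < mu(C) < 1 and T^n(C) = C (all its powers are
# ergodic), so the hypothesis of Theorem 1 holds. For the observation
# function with cells [0, 1/2) and [1/2, 1), every k-step transition
# probability comes out near 1/2; in particular none equals 1, as the
# conclusion of Theorem 1 requires.
random.seed(0)

def obs(x):
    return 0 if x < 0.5 else 1

def transition_probs(k, n=100_000):
    # Estimate P(Z_{t+k} = j | Z_t = i) by sampling initial states
    # uniformly (this avoids the floating-point degeneracy of iterating
    # 2x mod 1 from a single orbit).
    counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for _ in range(n):
        x = random.random()
        counts[(obs(x), obs((2 ** k * x) % 1.0))] += 1
    return {(i, j): counts[(i, j)] / (counts[(i, 0)] + counts[(i, 1)])
            for i in (0, 1) for j in (0, 1)}

results = {k: transition_probs(k) for k in (1, 2, 3)}
```

For every lag k tested, each row of the estimated transition matrix is close to (1/2, 1/2), so no transition probability reaches 1, in agreement with the theorem.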
Proof: Notice that it suffices to prove the following:

(∗) If, and only if, for (M, Σ_M, µ, T) it is not the case that there exists an n ∈ N and a C ∈ Σ_M, 0 < µ(C) < 1, such that, except for a set of measure zero (esmz.), T^n(C) = C, then the following holds: for any nontrivial partition α = {α_1, . . . , α_r}, r ∈ N, and any k ∈ N there is an i ∈ {1, . . . , r} such that for all j, 1 ≤ j ≤ r, µ(T^k(α_i) \ α_j) > 0.

Recall that finite-valued observation functions are of the form Φ(m) = Σ_{l=1}^{r} o_l χ_{α_l}(m), where α = {α_1, . . . , α_r} is a partition and M_O = {o_1, . . . , o_r} (cf. subsection 5.2.1). Hence the conclusion of (∗) says that for any nontrivial finite-valued observation function Φ : M → M_O and any k ∈ N there is an outcome o_i ∈ M_O such that for all possible outcomes o_j ∈ M_O it holds that P{Z_{t+k} = o_j | Z_t = o_i} < 1, t ∈ Z arbitrary.

⇐: Assume that there is an n ∈ N and a C ∈ Σ_M, 0 < µ(C) < 1, such that, esmz., T^n(C) = C. Then for the partition α = {C, M \ C} we have µ(T^n(C) \ C) = 0 and µ(T^n(M \ C) \ (M \ C)) = 0.

⇒: So assume that the conclusion of (∗) does not hold, i.e., there exists a nontrivial partition α and a k ∈ N such that for each α_i there exists an α_j with, esmz., T^k(α_i) ⊆ α_j. Now recall the definition of a deterministic system being ergodic (Definition 35). It can be shown (cf. Petersen 1983, section 2.4) that a discrete measure-preserving deterministic system (M, Σ_M, µ, T) is ergodic if, and only if, for all A, B ∈ Σ_M:

lim_{n→∞} (1/n) Σ_{i=0}^{n−1} (µ(T^i(A) ∩ B) − µ(A)µ(B)) = 0.   (5.3)

As already pointed out, the assumption that there does not exist an n ∈ N and a C ∈ Σ_M, 0 < µ(C) < 1, such that, esmz., T^n(C) = C, implies that (M, Σ_M, µ, T^k) is ergodic for all k ∈ N.

Case 1: For all i there is a j such that, esmz., T^k(α_i) = α_j. Then ergodicity of (M, Σ_M, µ, T^k) (equation (5.3)) implies that there is an h ∈ N such that, esmz., T^{kh}(α_1) = α_1.
But this contradicts the assumption that it is not the case that there exists an n ∈ N and a C ∈ Σ_M, 0 < µ(C) < 1, such that, esmz., T^n(C) = C.

Case 2: For some i there is a j with, esmz., T^k(α_i) ⊂ α_j and µ(α_i) < µ(α_j). Ergodicity of (M, Σ_M, µ, T^k) (equation (5.3)) implies that there exists an h ∈ N such that, esmz., T^{hk}(α_j) ⊆ α_i. Hence it holds that µ(α_j) ≤ µ(α_i), yielding a contradiction, viz. µ(α_i) < µ(α_j) ≤ µ(α_i).

5.7.2 Proof of Theorem 2

Theorem 2 If, and only if, for the continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) there does not exist an n ∈ R^+ and a C ∈ Σ_M, 0 < µ(C) < 1, such that, except for a set of measure zero, T_n(C) = C, then the following holds: for every nontrivial finite-valued observation function Φ : M → M_O, M_O = {o_1, . . . , o_r}, r ∈ N, every k ∈ R^+ and the stochastic process {Z_t = Φ(T_t); t ∈ R} there is an outcome o_i ∈ M_O such that for all possible outcomes o_j ∈ M_O, P{Z_{t+k} = o_j | Z_t = o_i} < 1.

Proof: This proof uses the same ideas as the proof of the analogous discrete-time result (Theorem 1). It suffices to prove the following:

(∗∗) If, and only if, for (M, Σ_M, µ, T_t) there does not exist an n ∈ R^+ and a C ∈ Σ_M, 0 < µ(C) < 1, such that, esmz., T_n(C) = C, then the following holds: for any nontrivial partition α = {α_1, . . . , α_r}, r ∈ N, and all k ∈ R^+ there is an i ∈ {1, . . . , r} such that for all j, 1 ≤ j ≤ r, µ(T_k(α_i) \ α_j) > 0.

Recall that finite-valued observation functions are of the form Φ(m) = Σ_{l=1}^{r} o_l χ_{α_l}(m), where α = {α_1, . . . , α_r} is a partition and M_O = {o_1, . . . , o_r} (cf. subsection 5.2.1). Consequently, the conclusion of (∗∗) expresses that for any nontrivial finite-valued observation function Φ : M → M_O and all k ∈ R^+ there is an outcome o_i ∈ M_O such that for all outcomes o_j ∈ M_O, P{Z_{t+k} = o_j | Z_t = o_i} < 1.

⇐: Assume that there is an n ∈ R^+ and a C ∈ Σ_M, 0 < µ(C) < 1, such that, esmz., T_n(C) = C.
Then for the partition α = {C, M \ C} it holds that µ(T_n(C) \ C) = 0 and µ(T_n(M \ C) \ (M \ C)) = 0.

⇒: So assume that the conclusion of (∗∗) does not hold, and hence that there is a nontrivial partition α and a k ∈ R^+ such that for each α_i there is an α_j with, esmz., T_k(α_i) ⊆ α_j. From the assumptions it follows that for every k ∈ R^+ the discrete measure-preserving deterministic system (M, Σ_M, µ, T_k) is ergodic (cf. Definition 35).

Case 1: For all i there is a j such that, esmz., T_k(α_i) = α_j. Because the discrete measure-preserving deterministic system (M, Σ_M, µ, T_k) is ergodic (equation (5.3)), it follows that there is an h ∈ N such that, esmz., T_{kh}(α_1) = α_1. But this contradicts the assumption that it is not the case that there exists an n ∈ R^+ and a C ∈ Σ_M, 0 < µ(C) < 1, such that, esmz., T_n(C) = C.

Case 2: For some i there is a j with, esmz., T_k(α_i) ⊂ α_j and µ(α_i) < µ(α_j). Because the discrete-time deterministic system (M, Σ_M, µ, T_k) is ergodic (equation (5.3)), there is an h ∈ N such that, esmz., T_{hk}(α_j) ⊆ α_i. Hence it follows that µ(α_j) ≤ µ(α_i). But this yields the contradiction µ(α_i) < µ(α_j) ≤ µ(α_i).

5.7.3 Proof of Theorem 3

Theorem 3 Let (M, Σ_M, µ, T) be a discrete measure-preserving deterministic system where Σ_M contains all open balls of the metric space (M, d_M), T is continuous at a point x ∈ M, every open ball around x has positive measure, and there is a set D ∈ Σ_M, µ(D) > 0, with d(T(x), D) = inf{d(T(x), m) | m ∈ D} > 0. Then there is some ε > 0 for which there is no Bernoulli process to which (M, Σ_M, µ, T) is ε-congruent.

Proof: For m ∈ M, E ⊆ M and ε > 0 let the ball of radius ε around m be B(m, ε) = {y ∈ M | d(y, m) < ε}, and let B(E, ε) = ∪_{m∈E} B(m, ε). Since d(T(x), D) > 0, one can choose γ > 0 and β > 0 such that B(T(x), 2γ) ∩ B(D, 2β) = ∅. Because T is continuous at x, one can choose δ > 0 such that T(B(x, 4δ)) ⊆ B(T(x), γ).
Recall that µ(B(x, 2δ)) = ρ_1 > 0 and that µ(D) = ρ_2 > 0. Let ε > 0 be such that ε < ρ_1ρ_2/8, ε < δ, ε < β and ε < γ. I am going to show that there is no Bernoulli process such that (M, Σ_M, µ, T) is ε-congruent to this Bernoulli process.

Assume that (M, Σ_M, µ, T) is ε-congruent to a Bernoulli process, and let (Ω, Σ_Ω, ν, S, Φ_0) be the deterministic representation of this Bernoulli process. This implies that (M, Σ_M, µ, T) is isomorphic (via φ : M̂ → Ω̂) to the Bernoulli shift (Ω, Σ_Ω, ν, S) and hence that (M, Σ_M, µ, T) is a discrete Bernoulli system. Let α_{Φ_0} = {α_{Φ_0}^1, . . . , α_{Φ_0}^s}, s ∈ N, be the partition of (Ω, Σ_Ω, ν) corresponding to the observation function Φ_0 (cf. subsection 5.2.1). Let M̌ = M \ M̂ and Ω̌ = Ω \ Ω̂. Clearly, φ^{−1}(α_{Φ_0}) = {φ^{−1}(α_{Φ_0}^1 \ Ω̌) ∪ M̌, φ^{−1}(α_{Φ_0}^2 \ Ω̌), . . . , φ^{−1}(α_{Φ_0}^s \ Ω̌)} is a partition of (M, Σ_M, µ).

Consider all the sets in φ^{−1}(α_{Φ_0}) which are assigned values in B(x, 3δ), i.e., all the sets a ∈ φ^{−1}(α_{Φ_0}) with Φ_0(φ(m)) ∈ B(x, 3δ) for almost all m ∈ a. Denote these sets by A_1, . . . , A_n, n ∈ N, and let A = ∪_{i=1}^{n} A_i. Because (M, Σ_M, µ, T) is ε-congruent to (Ω, Σ_Ω, ν, S, Φ_0), it follows that µ(A \ B(x, 4δ)) < ε and µ(A ∩ B(x, 2δ)) ≥ ρ_1/2.

Now consider all the sets in φ^{−1}(α_{Φ_0}) which are assigned values in B(D, β), i.e., all the sets c ∈ φ^{−1}(α_{Φ_0}) with Φ_0(φ(m)) ∈ B(D, β) for almost all m ∈ c. Denote these sets by C_1, . . . , C_k, k ∈ N, and let C = ∪_{i=1}^{k} C_i. Because (M, Σ_M, µ, T) is ε-congruent to (Ω, Σ_Ω, ν, S, Φ_0), I have µ(C ∩ D) ≥ ρ_2/2 and µ(C ∩ B(T(x), γ)) < ε.

Because (Ω, Σ_Ω, ν, S) is a Bernoulli shift isomorphic to (M, Σ_M, µ, T), it must hold that µ(T(A_i) ∩ C_j) = µ(A_i)µ(C_j) for all i, j, 1 ≤ i ≤ n, 1 ≤ j ≤ k. Hence also µ(T(A) ∩ C) = µ(A)µ(C). But it follows that µ(A)µ(C) ≥ ρ_1ρ_2/4 and that µ(T(A) ∩ C) < ε + ε, and this yields the contradiction ρ_1ρ_2/4 < 2ε < ρ_1ρ_2/4, since it was assumed that ε < ρ_1ρ_2/8.

5.7.4 Proof of Theorem 4

Theorem 4 Let (M, Σ_M, µ, T) be a discrete measure-preserving deterministic system.
Then there is a finite-valued observation function Φ and an ε > 0 such that no Bernoulli process strongly (Φ, ε)-simulates (M, Σ_M, µ, T).

Proof: Assume you observe the deterministic system (M, Σ_M, µ, T) with a surjective finite-valued observation function Φ : M → {o_1, o_2}. Then either for every ε > 0 there is a Bernoulli process which strongly (Φ, ε)-simulates (M, Σ_M, µ, T), or not. In the latter case we are done. In the former case there is a Θ(m) = o_1 χ_{α_1}(m) + o_2 χ_{α_2}(m), with {α_1, α_2} a partition of (M, Σ_M, µ), such that {X_t = Θ(T^t); t ∈ Z} is a Bernoulli process with probabilities p_1 = µ(α_1), p_2 = µ(α_2). Now consider the partition β = {β_1, . . . , β_l} = α ∨ Tα ∨ T^{−1}α and an observation function Φ(m) = Σ_{i=1}^{l} q_i χ_{β_i}(m), where q_i ≠ q_j for i ≠ j, 1 ≤ i, j ≤ l. I now show that the stochastic process {Z_t = Φ(T^t); t ∈ Z} is not a Bernoulli process. First note that for all t it holds that

P{X_{t+1} = o_1, X_t = o_1, X_{t−1} = o_1} = P{Z_t = q_i} for some q_i, 1 ≤ i ≤ l.   (5.4)

It follows that

P{Z_t = q_i} = P{X_{t+1} = o_1, X_t = o_1, X_{t−1} = o_1} = p_1^3 < p_1 = p_1^4/p_1^3 = P{X_{t+1} = o_1, X_t = o_1, X_{t−1} = o_1, X_{t−2} = o_1} / P{X_t = o_1, X_{t−1} = o_1, X_{t−2} = o_1} = P{Z_t = q_i | Z_{t−1} = q_i},   (5.5)

and hence that {Z_t; t ∈ Z} is not a Bernoulli process. And we cannot change Φ on a set of arbitrarily small measure such that the resulting stochastic process is a Bernoulli process. For let ε > 0, and consider an arbitrary surjective measurable function Ψ : M → {q_1, . . . , q_l} with µ({m ∈ M | Ψ(m) ≠ Φ(m)}) < ε. For the stochastic process {Y_t = Ψ(T^t); t ∈ Z} it holds that

P{Y_t = q_i | Y_{t−1} = q_i} > (p_1^4 − 2ε)/(p_1^3 + 2ε) and P{Y_t = q_i} < p_1^3 + ε.   (5.6)

Because p_1 > p_1^3, it follows that for sufficiently small ε > 0:

(p_1^4 − 2ε)/(p_1^3 + 2ε) > p_1^3 + ε.   (5.7)

Hence I can conclude that P{Y_t = q_i} < P{Y_t = q_i | Y_{t−1} = q_i} and that {Y_t; t ∈ Z} cannot be a Bernoulli process.
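The counting argument behind (5.4)–(5.5) can be checked numerically. A sketch with p_1 = 1/2 (my illustrative choice): reading off the block (X_{t−1}, X_t, X_{t+1}) of a Bernoulli process mimics observing with the refined partition α ∨ Tα ∨ T^{−1}α, and the refined process has P{Z_t = (1,1,1)} ≈ p_1³ but P{Z_t = (1,1,1) | Z_{t−1} = (1,1,1)} ≈ p_1, so it is not Bernoulli.

```python
import random

# Bernoulli(p) coin flips X_t; the refined observation reads off the
# block (X_{t-1}, X_t, X_{t+1}). p = 1/2 is an illustrative choice.
random.seed(0)
p = 0.5
X = [1 if random.random() < p else 0 for _ in range(200_000)]
Z = [(X[t - 1], X[t], X[t + 1]) for t in range(1, len(X) - 1)]

target = (1, 1, 1)
marginal = sum(z == target for z in Z) / len(Z)   # close to p**3 = 0.125
hits = [t for t in range(1, len(Z)) if Z[t - 1] == target]
conditional = sum(Z[t] == target for t in hits) / len(hits)  # close to p = 0.5

# marginal << conditional: the refined process has memory, hence it is
# not a Bernoulli process, matching the inequality in (5.5).
```

The strict inequality p_1³ < p_1 is exactly what the simulation exhibits: the marginal frequency is near 1/8 while the conditional frequency is near 1/2.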
5.7.5 Proof of Proposition 1

Proposition 1 Let (M, Σ_M, µ, T) be a discrete measure-preserving deterministic system. Then there is a finite-valued observation function Φ and an ε > 0 such that no Bernoulli process weakly (Φ, ε)-simulates (M, Σ_M, µ, T).

Proof: Assume that {Z_t; t ∈ Z} is a Bernoulli process with outcome space S. Let Γ : S → M̄, where M̄ = {q_1, . . . , q_N}, N ∈ N, be a surjective observation function. I will now show that {Y_t = Γ(Z_t); t ∈ Z} is a Bernoulli process too. Clearly, this result and Theorem 4 immediately imply that for the deterministic system (M, Σ_M, µ, T) there is a finite-valued observation function Φ and an ε > 0 such that no Bernoulli process weakly (Φ, ε)-simulates (M, Σ_M, µ, T). All I have to show is that the {Y_t; t ∈ Z} are probabilistically independent. Label the elements S = {s_{1,1}, s_{1,2}, . . . , s_{1,l_1}, . . . , s_{N,1}, . . . , s_{N,l_N}}, l_i ∈ N, 1 ≤ i ≤ N, such that

Γ(s_{1,1}) = q_1, Γ(s_{1,2}) = q_1, . . . , Γ(s_{1,l_1}) = q_1, . . . , Γ(s_{N,1}) = q_N, . . . , Γ(s_{N,l_N}) = q_N.   (5.8)

Now for all m ∈ N, all t_1, . . . , t_m ∈ Z and all q_{j_1}, . . . , q_{j_m} ∈ M̄:

P{Y_{t_1} = q_{j_1}, . . . , Y_{t_m} = q_{j_m}} = Σ_{all possible k_1,...,k_m} P{Z_{t_1} = s_{j_1,k_1}, . . . , Z_{t_m} = s_{j_m,k_m}} = Σ_{all possible k_1,...,k_m} P{Z_{t_1} = s_{j_1,k_1}} · · · P{Z_{t_m} = s_{j_m,k_m}} = P{Y_{t_1} = q_{j_1}} Σ_{all possible k_2,...,k_m} P{Z_{t_2} = s_{j_2,k_2}} · · · P{Z_{t_m} = s_{j_m,k_m}} = . . . = P{Y_{t_1} = q_{j_1}} · · · P{Y_{t_m} = q_{j_m}},   (5.9)

and from this it follows that the {Y_t; t ∈ Z} are probabilistically independent.

5.7.6 Proof of Theorem 5

Theorem 5 Let (M, Σ_M, µ, T) be a discrete Bernoulli system where the metric space (M, d_M) is separable and where Σ_M contains all open balls of (M, d_M). Then for any ε > 0 there is an irreducible and aperiodic Markov process such that (M, Σ_M, µ, T) is ε-congruent to this Markov process.

Proof: I need the following definition.
Definition 42 A partition α of (M, Σ_M, µ) is generating for (M, Σ_M, µ, T) if, and only if, for every A ∈ Σ_M and every ε > 0 there is an n ∈ N and a set C which is a union of elements in ∨_{j=−n}^{n} T^j(α) such that µ((A \ C) ∪ (C \ A)) < ε (cf. Petersen 1983, p. 244).

By assumption, the deterministic system (M, Σ_M, µ, T) is isomorphic via a function φ : M̂ → Ω̂ to the deterministic representation (Ω, Σ_Ω, ν, S, Φ_0) of a Bernoulli process with outcome space M̄. Let α_{Φ_0} = {α_{Φ_0}^1, . . . , α_{Φ_0}^k}, k ∈ N, be the partition of (Ω, Σ_Ω, ν) corresponding to the observation function Φ_0 (cf. subsection 5.2.1). Let M̌ = M \ M̂ and Ω̌ = Ω \ Ω̂. Then φ^{−1}(α_{Φ_0}) = {φ^{−1}(α_{Φ_0}^1 \ Ω̌) ∪ M̌, φ^{−1}(α_{Φ_0}^2 \ Ω̌), . . . , φ^{−1}(α_{Φ_0}^k \ Ω̌)} is a partition of (M, Σ_M, µ).

Since (M, d_M) is separable, there exist an r ∈ N and m_i ∈ M, 1 ≤ i ≤ r, such that µ(M \ ∪_{i=1}^{r} B(m_i, ε/2)) < ε/2. Because for a discrete Bernoulli system φ^{−1}(α_{Φ_0}) is generating for (M, Σ_M, µ, T) (Petersen 1983, p. 275), for each B(m_i, ε/2) there is an n_i ∈ N and a C_i which is a union of elements in ∨_{j=−n_i}^{n_i} T^j(φ^{−1}(α_{Φ_0})) such that µ(D_i) < ε/(2r), where D_i = (B(m_i, ε/2) \ C_i) ∪ (C_i \ B(m_i, ε/2)). Define n = max{n_i}. For Q = {q_1, . . . , q_l} = ∨_{j=−n}^{n} S^j(α_{Φ_0}) let Φ_0^Q : Ω → M, Φ_0^Q(ω) = Σ_{i=1}^{l} o_i χ_{q_i}(ω), where o_i ∈ φ^{−1}(q_i \ Ω̌). Note that o_i ≠ o_j for i ≠ j, 1 ≤ i, j ≤ l. Then

d_M(m, Φ_0^Q(φ(m))) < ε except for a set in M of measure < ε.   (5.10)

{Φ_0^Q(S^t); t ∈ Z} is a stochastic process from (Ω, Σ_Ω, ν) to (M, Σ_M); let (X, Σ_X, λ, R, Θ_0) be its deterministic representation. This process is a Markov process, since for any k ∈ N and any A, B_1, . . . , B_k ∈ M̄^{2n+1}:

ν({ω ∈ Ω | (ω_{−n} . . . ω_n) = A and (ω_{−n+1} . . . ω_{n+1}) = B_1}) / ν({ω ∈ Ω | (ω_{−n+1} . . . ω_{n+1}) = B_1}) = ν({ω ∈ Ω | (ω_{−n} . . . ω_n) = A and (ω_{−n+1} . . . ω_{n+1}) = B_1, . . . , (ω_{−n+k} . . . ω_{n+k}) = B_k}) / ν({ω ∈ Ω | (ω_{−n+1} . . . ω_{n+1}) = B_1, . . . , (ω_{−n+k} . . . ω_{n+k}) = B_k}),   (5.11)

if ν({ω ∈ Ω | (ω_{−n} . . . ω_n) = A and (ω_{−n+1} . . . ω_{n+1}) = B_1, . . . , (ω_{−n+k} . . . ω_{n+k}) = B_k}) > 0.
Because S is a shift, one sees that for all i, j, 1 ≤ i, j ≤ l, there is a k ≥ 1 such that P^k(o_i, o_j) > 0, and hence that the Markov process is irreducible. One also sees that there exists an outcome o_i, 1 ≤ i ≤ l, such that P^1(o_i, o_i) > 0. Hence d_{o_i} = 1; and since all outcomes of an irreducible Markov process have the same periodicity (Cinlar 1975, p. 131), it follows that the Markov process is also aperiodic.

Consider ψ : Ω → X, ψ(ω) = . . . Φ_0^Q(S^{−1}(ω)), Φ_0^Q(ω), Φ_0^Q(S(ω)) . . ., for ω ∈ Ω. Clearly, there is an X̂ ⊆ X with λ(X̂) = 1 such that ψ : Ω → X̂ is bijective and measure-preserving and R(ψ(ω)) = ψ(S(ω)) for all ω ∈ Ω. Hence (Ω, Σ_Ω, ν, S) is isomorphic to (X, Σ_X, λ, R) via ψ, and thus (M, Σ_M, µ, T) is isomorphic to (X, Σ_X, λ, R) via θ = ψ ◦ φ. Now, because of (5.10):

d_M(m, Θ_0(θ(m))) < ε except for a set in M of measure < ε.   (5.12)

5.7.7 Proof of Proposition 2

Proposition 2 Let (M, Σ_M, µ, T) be a discrete Bernoulli system. Then for every finite-valued observation function Φ and every ε > 0 there is an irreducible and aperiodic Markov process which weakly (Φ, ε)-simulates (M, Σ_M, µ, T).

Proof: Let (M, Σ_M, µ, T) be a discrete Bernoulli system. Let Φ : M → {q_1, . . . , q_N}, N ∈ N, be an arbitrary surjective finite-valued observation function and let ε > 0 be arbitrary. Theorem 6 implies that there is an n and a surjective measurable function Θ : M → Q, Θ(m) = Σ_{i=1}^{N} q_i χ_{α_i}(m), for a partition α, such that {Z_t = Θ(T^t); t ∈ Z} is a Markov process of order n which strongly (Φ, ε)-simulates (M, Σ_M, µ, T). Define β = {β_1, . . . , β_l} = α ∨ Tα ∨ . . . ∨ T^{n−1}α, and let Ψ : M → {o_1, . . . , o_l}, Ψ(m) = Σ_{j=1}^{l} o_j χ_{β_j}(m), with o_i ≠ o_j for i ≠ j, 1 ≤ i, j ≤ l. Let the surjective observation function Γ : {o_1, . . . , o_l} → Q be defined as follows: for any arbitrary r, 1 ≤ r ≤ N, any o_i and any o_j, 1 ≤ i, j ≤ l, such that β_i ⊆ α_r and β_j ⊆ α_r are assigned the same value, namely Γ(o_i) = Γ(o_j) = q_r, where q_r is the value Θ takes for all states in α_r. By construction, Z_t = Γ(Ψ(T^t)) and, since {Z_t; t ∈ Z} strongly (Φ, ε)-simulates (M, Σ_M, µ, T), µ({m ∈ M | Γ(Ψ(m)) ≠ Φ(m)}) < ε. Consequently, {Y_t = Ψ(T^t); t ∈ Z} weakly (Φ, ε)-simulates (M, Σ_M, µ, T). So it remains only to show that {Y_t; t ∈ Z} is an irreducible and aperiodic Markov process. By construction, for all t and all i, 1 ≤ i ≤ l, there are q_{i,0}, . . . , q_{i,n−1} ∈ Q such that

P{Y_t = o_i} = P{Z_t = q_{i,0}, Z_{t+1} = q_{i,1}, . . . , Z_{t+n−1} = q_{i,n−1}}.   (5.13)

Therefore, for all k ∈ N and all i, j_1, . . . , j_k, 1 ≤ i, j_1, . . . , j_k ≤ l:

P{Y_{t+1} = o_i | Y_t = o_{j_1}, . . . , Y_{t−k+1} = o_{j_k}} = P{Z_{t+1} = q_{i,0}, . . . , Z_{t+n} = q_{i,n−1} | Z_t = q_{j_1,0}, . . . , Z_{t+n−1} = q_{i,n−2}, Z_{t−1} = q_{j_2,0}, . . . , Z_{t−k+1} = q_{j_k,0}} = P{Z_{t+1} = q_{i,0}, . . . , Z_{t+n} = q_{i,n−1} | Z_t = q_{j_1,0}, . . . , Z_{t+n−1} = q_{i,n−2}} = P{Y_{t+1} = o_i | Y_t = o_{j_1}},   (5.14)

if P{Y_{t+1} = o_i, Y_t = o_{j_1}, . . . , Y_{t−k+1} = o_{j_k}} > 0. Hence {Y_t; t ∈ Z} is a Markov process. Every discrete Bernoulli system is strongly mixing (cf. Definition 27) (Petersen 1983, p. 58). Consequently, (M, Σ_M, µ, T) is strongly mixing, and this immediately implies that the Markov process {Y_t; t ∈ Z} is irreducible and aperiodic.

5.7.8 Proof of Theorem 8

Theorem 8 Assume that (M, Σ_M, µ, T) is a discrete measure-preserving deterministic system with zero Kolmogorov-Sinai entropy or an ergodic discrete measure-preserving deterministic system with finite Kolmogorov-Sinai entropy which is not a discrete Bernoulli system. Then there is a finite-valued observation function Φ and an ε > 0 such that no irreducible and aperiodic multi-step Markov process strongly (Φ, ε)-simulates (M, Σ_M, µ, T).
Proof: Case 1: Assume that (M, Σ_M, µ, T) is a discrete measure-preserving deterministic system with zero Kolmogorov-Sinai entropy. Assume that for some finite-valued observation function Ψ(m) = Σ_{i=1}^n o_i χ_{α_i}, where α is a partition, {Ψ(T^t); t ∈ Z} is an irreducible and aperiodic multi-step Markov process. The deterministic representation of this Markov process has Kolmogorov-Sinai entropy E > 0 because the deterministic representation of any irreducible and aperiodic multi-step Markov process is a Bernoulli system (cf. Theorem 7). This implies that H(α, T) ≥ E > 0 (where H(α, T) is the entropy relative to the partition α; see equation (3.6) in subsection 3.4.1). Hence the Kolmogorov-Sinai entropy of (M, Σ_M, µ, T) is positive. But this cannot be the case. Therefore, there can be no finite-valued observation function Ψ such that {Ψ(T^t); t ∈ Z} is an irreducible and aperiodic multi-step Markov process. Consequently, there is a finite-valued observation function Φ and an ε > 0 such that no irreducible and aperiodic multi-step Markov process strongly (Φ, ε)-simulates (M, Σ_M, µ, T).

Case 2: Assume that (M, Σ_M, µ, T) is an ergodic discrete measure-preserving deterministic system with finite Kolmogorov-Sinai entropy which is not a discrete Bernoulli system. I have to show that there is a finite-valued observation function Φ and an ε > 0 such that no irreducible and aperiodic multi-step Markov process strongly (Φ, ε)-simulates (M, Σ_M, µ, T). It suffices to show the following claim (C): assume that an ergodic discrete measure-preserving deterministic system with finite Kolmogorov-Sinai entropy is given where for every ε > 0 and every finite-valued observation function Φ there is an n such that an irreducible and aperiodic Markov process of order n strongly (Φ, ε)-simulates (M, Σ_M, µ, T); then (M, Σ_M, µ, T) is a discrete Bernoulli system. So assume the assumptions of claim (C).
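Case 1 turns on the entropy H(α, T) relative to a partition: a system of zero Kolmogorov-Sinai entropy cannot yield a positive-entropy (Bernoulli) symbolic process. As a rough numerical illustration (an aside, not part of the proof), one can estimate the entropy of a symbolic orbit from empirical block frequencies; the maps, initial point, and parameters below are arbitrary stand-ins.

```python
import math
from collections import Counter

def block_entropy_rate(symbols, n):
    """Plug-in estimate of entropy relative to a partition: the Shannon
    entropy (in bits) of observed length-n blocks, divided by n."""
    blocks = [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    total = len(blocks)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(blocks).values()) / n

def itinerary(step, x0, length):
    """Symbolic orbit under the two-cell partition {[0, 1/2), [1/2, 1)}."""
    out, x = [], x0
    for _ in range(length):
        out.append(0 if x < 0.5 else 1)
        x = step(x)
    return out

# Logistic map (a Bernoulli system, entropy one bit per step) versus an
# irrational rotation (zero entropy); initial points chosen arbitrarily.
chaotic = itinerary(lambda x: 4 * x * (1 - x), 0.123456789, 20000)
regular = itinerary(lambda x: (x + math.sqrt(2) - 1) % 1.0, 0.123456789, 20000)
h_chaos = block_entropy_rate(chaotic, 5)
h_rotation = block_entropy_rate(regular, 5)
```

The chaotic orbit's estimate sits near 1 bit per step, while the rotation's stays well below it (a rotation produces at most about 2n distinct n-blocks, so its block entropy grows only logarithmically).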
A theorem by Krieger (1970) implies that there is a partition β = {β_1, ..., β_r}, r ∈ N, of (M, Σ_M, µ) which is generating for (M, Σ_M, µ, T) (cf. Definition 42). I need the following theorem (Ornstein 1973a; Petersen 1983, pp. 274–275):

(+) Let (K, Σ_K, µ_K, R) be a discrete measure-preserving deterministic system, and let Π(k) = Σ_{i=1}^l o_i χ_{α_i}(k), l ∈ N, o_i ≠ o_j for i ≠ j, 1 ≤ i, j ≤ l, where the partition α = {α_1, ..., α_l} is generating for (K, Σ_K, µ_K, R). Assume that for all ε > 0 there is a surjective measurable function Θ : K → {u_1, ..., u_s}, s ≥ l, and a surjective measurable function Γ : {u_1, ..., u_s} → {o_1, ..., o_l} with µ_K({k ∈ K | Π(k) ≠ Γ(Θ(k))}) < ε such that the deterministic representation of {Θ(R^t); t ∈ Z} is a discrete Bernoulli system. Then (K, Σ_K, µ_K, R) is a discrete Bernoulli system.

Let Φ(m) = Σ_{i=1}^r q_i χ_{β_i}(m), q_i ≠ q_j for i ≠ j, 1 ≤ i, j ≤ r. Then for every ε > 0 there is an n and an irreducible and aperiodic Markov process of order n which strongly (Φ, ε)-simulates (M, Σ_M, µ, T). The deterministic representation of every irreducible and aperiodic multi-step Markov process is a discrete Bernoulli system (Theorem 7). Consequently, theorem (+) implies that (M, Σ_M, µ, T) is a discrete Bernoulli system.

5.7.9 Proof of Theorem 9

Theorem 9. Assume that (M, Σ_M, µ, T) is a discrete measure-preserving deterministic system with zero Kolmogorov-Sinai entropy or an ergodic discrete measure-preserving deterministic system with finite Kolmogorov-Sinai entropy which is not a discrete Bernoulli system. Then there is a finite-valued observation function Φ and an ε > 0 such that no irreducible and aperiodic Markov process weakly (Φ, ε)-simulates (M, Σ_M, µ, T).

Proof: The proof is essentially the same as the proof of Theorem 8 (cf. subsection 5.7.8).

Case 1: Assume that (M, Σ_M, µ, T) is a discrete measure-preserving deterministic system with zero Kolmogorov-Sinai entropy. An irreducible and aperiodic Markov process is an irreducible and aperiodic Markov process of order 1. Hence Case 1 of the proof of Theorem 8 shows that there is a finite-valued observation function Φ and an ε > 0 such that no irreducible and aperiodic Markov process weakly (Φ, ε)-simulates (M, Σ_M, µ, T).

Case 2: Let (M, Σ_M, µ, T) be an ergodic measure-preserving deterministic system with finite Kolmogorov-Sinai entropy which is not a discrete Bernoulli system. I have to show that there is a finite-valued observation function Φ and an ε > 0 such that no irreducible and aperiodic Markov process weakly (Φ, ε)-simulates (M, Σ_M, µ, T). Again it suffices to show the following claim (C): assume that an ergodic discrete measure-preserving deterministic system (M, Σ_M, µ, T) with finite Kolmogorov-Sinai entropy is given where for every ε > 0 and every finite-valued observation function Φ an irreducible and aperiodic Markov process weakly (Φ, ε)-simulates (M, Σ_M, µ, T); then (M, Σ_M, µ, T) is a discrete Bernoulli system. So assume that the assumptions of claim (C) are fulfilled.

The theorem by Krieger (1970) implies that there is a partition β = {β_1, ..., β_r}, r ∈ N, which is generating for (M, Σ_M, µ, T). Define Φ(m) = Σ_{i=1}^r q_i χ_{β_i}(m), q_i ≠ q_j for i ≠ j, 1 ≤ i, j ≤ r. Then for every ε > 0 there is an irreducible and aperiodic Markov process which weakly (Φ, ε)-simulates (M, Σ_M, µ, T). Therefore, from theorem (+) (as stated in the proof of Theorem 8) and the fact that the deterministic representation of every irreducible and aperiodic Markov process is a discrete Bernoulli system, it follows that (M, Σ_M, µ, T) is a discrete Bernoulli system.

5.7.10 Proof of Theorem 12

Theorem 12. Let (M, Σ_M, µ, T_t) be a continuous Bernoulli system.
Then for every finite-valued observation function Φ and every ε > 0 there is an irrationally related semi-Markov process {Z_t; t ∈ R} which weakly (Φ, ε)-simulates (M, Σ_M, µ, T_t).

Proof: Let (M, Σ_M, µ, T_t) be a continuous Bernoulli system, let Φ : M → S, S = {s_1, ..., s_N}, N ∈ N, be an arbitrary surjective finite-valued observation function, and let ε > 0 be arbitrary. Theorem 11 implies that there is an n ∈ N and a surjective observation function Θ : M → S, Θ(m) = Σ_{i=1}^N s_i χ_{α_i}(m), for a partition α, such that {Y_t = Θ(T_t); t ∈ R} is an irrationally related semi-Markov process of order n with outcomes s_i and corresponding times u(s_i), 1 ≤ i ≤ N, which strongly (Φ, ε)-simulates (M, Σ_M, µ, T_t).

I need the following definition:

Definition 43. The discrete deterministic system (M_2, Σ_{M_2}, µ_2, T_2) is a factor of the discrete deterministic system (M_1, Σ_{M_1}, µ_1, T_1) (where both systems are assumed to be measure-preserving) if, and only if, there are measurable sets M̂_i ⊆ M_i with µ_i(M_i \ M̂_i) = 0 and T_i M̂_i ⊆ M̂_i (i = 1, 2), and there is a function φ : M̂_1 → M̂_2 such that (i) φ^{-1}(B) ∈ Σ_{M_1} for all B ∈ Σ_{M_2}, B ⊆ M̂_2; (ii) µ_1(φ^{-1}(B)) = µ_2(B) for all B ∈ Σ_{M_2}, B ⊆ M̂_2; (iii) φ(T_1(m)) = T_2(φ(m)) for all m ∈ M̂_1.

For continuous measure-preserving deterministic systems (M_1, Σ_{M_1}, µ_1, T^1_t) and (M_2, Σ_{M_2}, µ_2, T^2_t) the definition of a factor is the same except that condition (iii) is φ(T^1_t(m)) = T^2_t(φ(m)) for all m ∈ M̂_1 and all t ∈ R (cf. Petersen 1983, p. 11).^22

Note that the deterministic representation (X, Σ_X, µ_X, W_t, Λ_0) of this semi-Markov process of order n is a factor of (M, Σ_M, µ, T_t) (via the function φ(m) = r_m, where r_m is the realisation of m of the stochastic process {Y_t; t ∈ R}) (cf. Ornstein & Weiss 1991, p. 18). Now I construct a continuous measure-preserving deterministic system (K, Σ_K, µ_K, R_t) as follows.
Let (Ω, Σ_Ω, µ_Ω, V, Ξ_0), Ξ_0(ω) = Σ_{i=1}^N s_i χ_{β_i}(ω), where β is a partition, be the deterministic representation of {S_k; k ∈ Z}, the irreducible and aperiodic Markov process of order n corresponding to {Y_t; t ∈ R} (see Example 5). Let f : Ω → {u_1, ..., u_N}, f(ω) = u(Ξ_0(ω)). Define K as ∪_{i=1}^N K_i = ∪_{i=1}^N (β_i × [0, u(s_i))). Let Σ_{K_i}, 1 ≤ i ≤ N, be the product σ-algebra (Σ_Ω ∩ β_i) × L([0, u(s_i))), where L([0, u(s_i))) is the Lebesgue σ-algebra of [0, u(s_i)). Let µ_{K_i} be the product measure

µ_{K_i} = (µ_Ω^{Σ_Ω ∩ β_i} × λ_{[0, u(s_i))}) / Σ_{j=1}^N u(s_j) µ_Ω(β_j),  (5.15)

where λ_{[0, u(s_i))} is the Lebesgue measure on [0, u(s_i)) and µ_Ω^{Σ_Ω ∩ β_i} is the measure µ_Ω restricted to Σ_Ω ∩ β_i. Now define Σ_K as the completion of the σ-algebra generated by ∪_{i=1}^N Σ_{K_i}. Define a pre-measure µ̄_K on the semi-algebra

H = (∪_{i=1}^N (Σ_Ω ∩ β_i × L([0, u(s_i))))) ∪ K  (5.16)

by µ̄_K(K) = 1 and µ̄_K(A) = µ_{K_i}(A) for A ∈ Σ_{K_i}, and let µ_K be the unique extension of this pre-measure to a measure on Σ_K. Finally, R_t is defined as follows: let the state of the deterministic system at time zero be (k, v) ∈ K, k ∈ Ω, v < f(k); the state moves vertically with unit velocity, and just before it reaches (k, f(k)) it jumps to (V(k), 0) at time f(k) − v; then it again moves vertically with unit velocity, and just before it reaches (V(k), f(V(k))) it jumps to (V^2(k), 0) at time f(V(k)) + f(k) − v, and so on. (K, Σ_K, µ_K, R_t) is a continuous measure-preserving deterministic system (called a ‘flow built under the function f’), and it has been shown that (X, Σ_X, µ_X, W_t) is isomorphic (via a function ψ) to (K, Σ_K, µ_K, R_t) (Ambrose 1941; Park 1982; Rudolph 1976).

Footnote 22: Clearly, if measure-preserving deterministic systems are isomorphic (Definition 19), then they are factors of each other; but if a measure-preserving deterministic system is a factor of another deterministic system, this does not imply that they are isomorphic.

Exactly as in the proof of Proposition 2 we see that for γ = {γ_1, ..., γ_l} = β ∨ Vβ ∨ ... ∨ V^{n-1}β and Π(ω) = Σ_{j=1}^l q_j χ_{γ_j}(ω), q_j ≠ q_i for i ≠ j, 1 ≤ i, j ≤ l, the discrete stochastic process {B_t = Π(V^t(ω))} is an irreducible and aperiodic Markov process. Now consider ∆(k) = Σ_{i=1}^l q_i χ_{γ_i × [0, u(q_i))}(k), where u(q_i), 1 ≤ i ≤ l, is defined as follows: u(q_i) = u(s_r) where γ_i ⊆ β_r. Then it follows immediately that the stochastic process {X_t = ∆(R_t); t ∈ R} is an irrationally related semi-Markov process.

Let the surjective measurable function Ψ : M → {q_1, ..., q_l} be defined as follows: Ψ(m) = ∆(ψ(φ(m))) for m ∈ M̂ and q_1 otherwise. Recall that (X, Σ_X, µ_X, W_t) is a factor (via φ) of (M, Σ_M, µ, T_t) and that (X, Σ_X, µ_X, W_t) is isomorphic (via ψ) to (K, Σ_K, µ_K, R_t). Therefore, it follows that {Z_t = Ψ(T_t); t ∈ R} is an irrationally related semi-Markov process with outcomes q_i and corresponding times u(q_i), 1 ≤ i ≤ l. Consider the surjective finite-valued observation function Γ : {q_1, ..., q_l} → S, where Γ(q_i), 1 ≤ i ≤ l, is defined as follows: Γ(q_i) = s_r where γ_i ⊆ β_r. By construction, we obtain that, except on a set of measure zero, Γ(Ψ(T_t(m))) = Y_t(m) for all t ∈ R. Hence, because {Y_t; t ∈ R} strongly (Φ, ε)-simulates (M, Σ_M, µ, T_t), µ({m ∈ M | Γ(Ψ(m)) ≠ Φ(m)}) < ε.

5.7.11 Proof of Theorem 14

Theorem 14. Let (M, Σ_M, µ, T_t) be a continuous measure-preserving deterministic system with zero Kolmogorov-Sinai entropy or a continuous measure-preserving deterministic system which is not a continuous Bernoulli system and where for some t_0 ∈ R \ {0} the discrete measure-preserving deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic. Then there is a finite-valued observation function Φ and an ε > 0 such that no irrationally related multi-step semi-Markov process strongly (Φ, ε)-simulates (M, Σ_M, µ, T_t).

Proof: The proof parallels the proof of the analogous discrete-time result (Theorem 8).

Case 1: Let (M, Σ_M, µ, T_t) be a continuous measure-preserving deterministic system with zero Kolmogorov-Sinai entropy.
Assume that there is a finite-valued observation function Ψ(m) = Σ_{i=1}^n o_i χ_{α_i}, where α is a partition, such that {Ψ(T_t); t ∈ R} is an irrationally related multi-step semi-Markov process. The deterministic representation of this multi-step semi-Markov process has Kolmogorov-Sinai entropy E > 0 because the deterministic representation of any irrationally related multi-step semi-Markov process is a continuous Bernoulli system (cf. Theorem 13). Hence H(α, T_1) ≥ E > 0 (where H(α, T_1) is the entropy relative to the partition α; see equation (3.6) in subsection 3.4.1). But this means that the Kolmogorov-Sinai entropy of (M, Σ_M, µ, T_t) is positive, which contradicts the assumption. Therefore, there can be no finite-valued observation function Ψ such that {Ψ(T_t); t ∈ R} is an irrationally related multi-step semi-Markov process. Consequently, there is a finite-valued observation function Φ and an ε > 0 such that no irrationally related multi-step semi-Markov process strongly (Φ, ε)-simulates (M, Σ_M, µ, T_t).

Case 2: Assume that the continuous deterministic system (M, Σ_M, µ, T_t) has finite Kolmogorov-Sinai entropy, is not a continuous Bernoulli system, and that for some t_0 ∈ R \ {0} the discrete system (M, Σ_M, µ, T_{t_0}) is ergodic. I have to show that there is a finite-valued observation function Φ and an ε > 0 such that no irrationally related multi-step semi-Markov process strongly (Φ, ε)-simulates (M, Σ_M, µ, T_t). It suffices to show the following claim (C): assume that a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) is given which has finite Kolmogorov-Sinai entropy and where for some t_0 ∈ R \ {0} the discrete deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic; further assume that for every ε > 0 and every finite-valued observation function Φ there is an n ∈ N such that an irrationally related semi-Markov process of order n strongly (Φ, ε)-simulates (M, Σ_M, µ, T_t); then (M, Σ_M, µ, T_t) is a continuous Bernoulli system. So assume that the assumptions of claim (C) are satisfied.

I need the following definition:

Definition 44. A partition α = {α_1, ..., α_n} of (M, Σ_M, µ) is generating for (M, Σ_M, µ, T_t) if, and only if, for every A ∈ Σ_M and every ε > 0 there is a τ ∈ R^+ and a set C of unions of elements in ⋃_{all m} ⋂_{t=-τ}^{τ} T^{-t}(α(T^t(m))), where α(m) is defined as the set α_j ∈ α with m ∈ α_j, such that µ((A \ C) ∪ (C \ A)) < ε.

Because the discrete deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic, the theorem by Krieger (1970) implies that there is a partition β = {β_1, ..., β_r}, r ∈ N, which is generating for (M, Σ_M, µ, T_{t_0}) and thus also generating for the continuous deterministic system (M, Σ_M, µ, T_t). I need the following theorem (Ornstein & Weiss 1991, p. 66; Petersen 1983, pp. 274–275):

(++) Let (K, Σ_K, µ_K, R_t) be a continuous measure-preserving deterministic system, and let Π(k) = Σ_{i=1}^l o_i χ_{α_i}(k), l ∈ N, o_i ≠ o_j for i ≠ j, 1 ≤ i, j ≤ l, where the partition α = {α_1, ..., α_l} is generating. Assume that for all ε > 0 there is a surjective measurable function Θ : K → {u_1, ..., u_s}, s ≥ l, and a surjective measurable function Γ : {u_1, ..., u_s} → {o_1, ..., o_l} with µ_K({k ∈ K | Π(k) ≠ Γ(Θ(k))}) < ε such that the deterministic representation of {Θ(R_t); t ∈ R} is a continuous Bernoulli system. Then (K, Σ_K, µ_K, R_t) is a continuous Bernoulli system.

Let Φ(m) = Σ_{i=1}^r q_i χ_{β_i}(m), q_i ≠ q_j for i ≠ j, 1 ≤ i, j ≤ r. It follows that for every ε > 0 there is an n ∈ N and an irrationally related semi-Markov process of order n which strongly (Φ, ε)-simulates (M, Σ_M, µ, T_t). The deterministic representation of every irrationally related multi-step semi-Markov process is a continuous Bernoulli system (Theorem 13). Consequently, theorem (++) implies that (M, Σ_M, µ, T_t) is a continuous Bernoulli system.
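The continuous-time constructions in these proofs rest on the 'flow built under a function': a base transformation together with a height function f, where the state rises with unit velocity and, on reaching the roof, jumps to the image point at height zero. A minimal sketch (not from the text), with a hypothetical two-state base map and heights chosen merely to be irrationally related:

```python
import math

def flow_under_function(base_map, f, state, v, t):
    """Advance a 'flow built under the function f' by time t: the point
    (state, v) moves vertically with unit velocity and, on reaching height
    f(state), jumps to (base_map(state), 0), as in the construction of R_t."""
    v += t
    while v >= f(state):
        v -= f(state)
        state = base_map(state)
    return state, v

# Toy base dynamics (illustrative only): a two-state swap with
# irrationally related heights 1 and sqrt(2).
height = {0: 1.0, 1: math.sqrt(2)}.get
swap = lambda s: 1 - s
```

Iterating the base map only when the roof is hit is what makes the flow a continuous-time suspension of the discrete base dynamics.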
5.7.12 Proof of Theorem 15

Theorem 15. Let (M, Σ_M, µ, T_t) be a continuous measure-preserving deterministic system with zero Kolmogorov-Sinai entropy or a continuous measure-preserving deterministic system which is not a continuous Bernoulli system and where for some t_0 ∈ R \ {0} the discrete measure-preserving deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic. Then there is a finite-valued observation function Φ and an ε > 0 such that no irrationally related semi-Markov process weakly (Φ, ε)-simulates (M, Σ_M, µ, T_t).

Proof: The proof is essentially the same as the proof of Theorem 14 (cf. subsection 5.7.11).

Case 1: Because an irrationally related semi-Markov process is an irrationally related semi-Markov process of order 1, Case 1 of the proof of Theorem 14 shows that there is a finite-valued observation function Φ and an ε > 0 such that no irrationally related semi-Markov process weakly (Φ, ε)-simulates (M, Σ_M, µ, T_t).

Case 2: Let (M, Σ_M, µ, T_t) be a continuous measure-preserving deterministic system with finite Kolmogorov-Sinai entropy which is not a continuous Bernoulli system. Assume that for some t_0 ∈ R \ {0} the discrete deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic. It needs to be shown that there is a finite-valued observation function Φ and an ε > 0 such that no irrationally related semi-Markov process weakly (Φ, ε)-simulates (M, Σ_M, µ, T_t). For this it suffices to show the following claim (C): assume that a continuous measure-preserving deterministic system (M, Σ_M, µ, T_t) is given which has finite Kolmogorov-Sinai entropy and where for some t_0 ∈ R \ {0} the discrete system (M, Σ_M, µ, T_{t_0}) is ergodic; further assume that for every ε > 0 and every finite-valued observation function Φ an irrationally related semi-Markov process weakly (Φ, ε)-simulates (M, Σ_M, µ, T_t); then (M, Σ_M, µ, T_t) is a continuous Bernoulli system. So assume that the assumptions of claim (C) are satisfied.
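Krieger's theorem, invoked repeatedly in these proofs, supplies a generating partition: one whose itineraries separate points. What 'generating' means can be illustrated with the doubling map T(x) = 2x mod 1 and the partition {[0, 1/2), [1/2, 1)}, where the itinerary is just the binary expansion of the point; this toy example is for illustration only and does not appear in the text.

```python
def itinerary(x, length):
    """Symbolic orbit of the doubling map T(x) = 2x mod 1 under the
    partition {[0, 1/2), [1/2, 1)}: the binary digits of x."""
    bits = []
    for _ in range(length):
        bits.append(0 if x < 0.5 else 1)
        x = (2 * x) % 1.0
    return bits

def point_from_itinerary(bits):
    """Recover the point from its symbols: x = sum b_k 2^{-(k+1)}.
    That the symbols determine the point is what 'generating' amounts to."""
    return sum(b * 2.0 ** -(k + 1) for k, b in enumerate(bits))

x0 = 0.8125  # dyadic rational, so the floating-point orbit is exact
```

Here the partition is generating because distinct points eventually land in different cells, so their itineraries differ.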
According to the theorem by Krieger (1970), there is a partition β = {β_1, ..., β_r}, r ∈ N, which is generating for (M, Σ_M, µ, T_{t_0}) and thus also generating for (M, Σ_M, µ, T_t). Let Φ(m) = Σ_{i=1}^r q_i χ_{β_i}(m), q_i ≠ q_j for i ≠ j, 1 ≤ i, j ≤ r. It follows that for every ε > 0 there is an irrationally related semi-Markov process which weakly (Φ, ε)-simulates (M, Σ_M, µ, T_t). The deterministic representation of every irrationally related semi-Markov process is a continuous Bernoulli system. Consequently, theorem (++) (as stated in the proof of Theorem 14) implies that (M, Σ_M, µ, T_t) is a continuous Bernoulli system.

5.7.13 Proof of Proposition 3

Proposition 3. Let (M, Σ_M, µ, T) be a discrete measure-preserving deterministic system where (M, d_M) is separable and where Σ_M contains all open balls of (M, d_M). Assume that (M, Σ_M, µ, T) satisfies the assumption of Theorem 1 and has finite Kolmogorov-Sinai entropy. Then for every ε > 0 there is a stochastic process {Z_t; t ∈ Z} with outcome space M̄ = {o_1, ..., o_h}, h ∈ N, such that {Z_t; t ∈ Z} is ε-congruent to (M, Σ_M, µ, T), and for all k ∈ N there is an outcome o_i ∈ M̄ such that for all o_j ∈ M̄, 1 ≤ j ≤ h, P{Z_{t+k} = o_j | Z_t = o_i} < 1.

Proof: Recall that if (M, Σ_M, µ, T) satisfies the assumptions of Theorem 1, then (M, Σ_M, µ, T) is ergodic (cf. subsection 5.7.1). Hence the theorem by Krieger (1970) implies that there is a partition α which is generating for (M, Σ_M, µ, T) (cf. Definition 42). Let ε > 0. Since (M, d_M) is separable, there exists an r ∈ N and m_i ∈ M, 1 ≤ i ≤ r, such that µ(M \ ∪_{i=1}^r B(m_i, ε/2)) < ε/2. Because α is generating, for each B(m_i, ε/2) there is an n_i ∈ N and a union C_i of elements in ∨_{j=-n_i}^{n_i} T^j(α) such that µ((B(m_i, ε/2) \ C_i) ∪ (C_i \ B(m_i, ε/2))) < ε/(2r). Define n = max{n_i}, β = {β_1, ..., β_l} = ∨_{j=-n}^{n} T^j(α) and Ψ(m) = Σ_{i=1}^l o_i χ_{β_i}(m) with o_i ∈ β_i. Ψ is finite-valued, and Theorem 1 implies that for the process {Z_t = Ψ(T^t); t ∈ Z}, for all k ∈ N there is an outcome o_i such that for all o_j, 1 ≤ j ≤ l, P{Z_{t+k} = o_j | Z_t = o_i} < 1. Furthermore, because α is generating, β is generating. Therefore, (M, Σ_M, µ, T) is isomorphic (via a function φ) to the deterministic representation (M_2, Σ_{M_2}, µ_2, T_2, Φ_0) of {Z_t; t ∈ Z} (Petersen 1983, p. 274). By construction, d_M(m, Φ_0(φ(m))) < ε except for a set in M of measure smaller than ε.

5.7.14 Proof of Proposition 4

Proposition 4. Let (M, Σ_M, µ, T_t) be a continuous measure-preserving deterministic system where (M, d_M) is separable and where Σ_M contains all open balls of (M, d_M). Assume that (M, Σ_M, µ, T_t) satisfies the assumption of Theorem 2 and has finite Kolmogorov-Sinai entropy. Then for every ε > 0 there is a stochastic process {Z_t; t ∈ R} with outcome space M_O = {o_1, ..., o_h}, h ∈ N, such that {Z_t; t ∈ R} is ε-congruent to (M, Σ_M, µ, T_t), and for all k ∈ R^+ there is an outcome o_i ∈ M_O such that for all o_j ∈ M_O, 1 ≤ j ≤ h, P{Z_{t+k} = o_j | Z_t = o_i} < 1.

Proof: The proof uses the same ideas as the proof of the analogous discrete-time result. By assumption, there is a t_0 ∈ R \ {0} such that the discrete deterministic system (M, Σ_M, µ, T_{t_0}) is ergodic. Then the theorem by Krieger (1970) implies that there is a partition α which is generating for (M, Σ_M, µ, T_{t_0}) and thus also generating for (M, Σ_M, µ, T_t) (cf. Definition 44). Since (M, d_M) is separable, for every ε > 0 there is an r ∈ N and m_i ∈ M, 1 ≤ i ≤ r, such that µ(M \ ∪_{i=1}^r B(m_i, ε/2)) < ε/2. Because α is generating for (M, Σ_M, µ, T_{t_0}), for each B(m_i, ε/2) there is an n_i ∈ N and a union C_i of elements in ∨_{j=-n_i}^{n_i} T_{j t_0}(α) such that µ((B(m_i, ε/2) \ C_i) ∪ (C_i \ B(m_i, ε/2))) < ε/(2r). Let n = max{n_i}, β = {β_1, ..., β_l} = ∨_{j=-n}^{n} T_{j t_0}(α) and Ψ(m) = Σ_{i=1}^l o_i χ_{β_i}(m) with o_i ∈ β_i. Since Ψ is a finite-valued observation function, Theorem 2 implies that for the stochastic process {Z_t = Ψ(T_t); t ∈ R}, for all k ∈ R^+ there is an outcome o_i, 1 ≤ i ≤ l, such that for all o_j, 1 ≤ j ≤ l, P{Z_{t+k} = o_j | Z_t = o_i} < 1.
Because β is generating for (M, Σ_M, µ, T_t), (M, Σ_M, µ, T_t) is isomorphic (via a function φ) to the deterministic representation (M_2, Σ_{M_2}, µ_2, T^2_t, Φ_0) of {Z_t; t ∈ R} (Petersen 1983, p. 274). And, by construction, d_M(m, Φ_0(φ(m))) < ε except for a set in M of measure smaller than ε.

Chapter 6

Concluding remarks

This dissertation has been about some of the most important philosophical aspects of chaos research, a famous recent area of research about deterministic yet unpredictable and irregular, or even random behaviour. I have treated chaos from a measure-theoretic point of view because only this viewpoint provides a connection to probability theory and to the theory of stochastic processes, contributing to many topics of philosophical relevance. Let me briefly summarise this dissertation.

I started by examining mathematical notions of unpredictability in ergodic theory. On this basis, I drew conclusions about the actual practice of how mathematical definitions are justified. More specifically, I introduced the main account of this issue, namely Lakatos's (1976, 1978) proof-generated definitions. After that I presented two previously unidentified but common ways of justifying definitions which play an important role for notions of unpredictability in ergodic theory, namely condition-justification and redundancy-justification. I argued that these two kinds of justification are among the most important ones in mathematics. Also, I analysed the interrelationships between the different kinds of justification. Then I criticised Lakatos's theory. I argued that it does not acknowledge the interrelationships between the different kinds of justification, and that it ignores the fact that various kinds of justification, not only proof-generation, are important.

With this background on notions of unpredictability, we were ready to tackle the question of what is the unpredictability specific to chaos. There is a widespread belief that chaotic systems are unpredictable in a way that other deterministic systems are not. Hence one might expect that this question has already been answered in a satisfactory way. However, I argued that this is not so: the answers in the literature are defective. This prompted the search for a better answer. An event is called 'probabilistically irrelevant' for predicting another event if knowledge of the former neither heightens nor lowers the probability of the latter. Based on defining chaos via strong mixing, I proposed a novel answer: the unpredictability specific to chaotic systems is that for predicting any event at any level of precision, all sufficiently past events are approximately probabilistically irrelevant.

Finally, the fact that some deterministic systems are unpredictable and random raised the question of whether deterministic systems and stochastic processes can be observationally equivalent. I showed that for many measure-theoretic deterministic systems there is a stochastic process which is observationally equivalent to the deterministic system; and conversely, that for all stochastic processes there is a measure-theoretic deterministic system which is observationally equivalent to the stochastic process. Still, one might guess that the deterministic systems which are observationally equivalent to stochastic processes used in science do not include any deterministic systems used in science. I argued that this is not so because deterministic systems used in science give rise to Bernoulli processes and to semi-Markov processes. Despite this, one might guess that deterministic systems used in science cannot give the same predictions at every observation level as stochastic processes used in science.
By proving new results in ergodic theory, I showed that this guess too is misguided: there are deterministic systems used in science which give the same predictions at every observation level as Markov processes or n-step Markov processes (for discrete time) and semi-Markov processes or n-step semi-Markov processes (for continuous time). Therefore, even kinds of stochastic processes and kinds of deterministic systems which intuitively seem to give very different predictions are observationally equivalent. Furthermore, I criticised the previous philosophical literature on observational equivalence, namely Hoefer (2008), Suppes (1993), Suppes & de Barros (1996), Suppes (1999) and Winnie (1998). These authors fail to see the philosophical significance of the results on observational equivalence, and they do not seem to be aware that non-chaotic deterministic systems too can be simulated at every observation level by stochastic processes. Furthermore, the viewpoints of these authors on the question of whether the deterministic or the stochastic description is preferable are untenable, and I have argued that this question needs more careful consideration.

This summary illustrates that this dissertation makes a contribution to the literature at two levels. First, the mathematical theorems and the discussion about how to define chaos contribute to the general mathematical field of dynamical systems theory, and hence are also of relevance to the special sciences where dynamical systems theory is applied, from physics and biology to the social sciences. But, of course, the contributions of this dissertation are not only of a mathematical nature. Primarily, this dissertation, with its conceptual reflection on the mathematical results, advances our knowledge of important philosophical themes such as the justification of definitions, unpredictability, and the question of whether phenomena are deterministic or indeterministic.
To conclude this dissertation, let me give an outlook on important open questions related to my dissertation. Let me first point out four issues which are directly related to the topics I have treated.

First, there has traditionally been little philosophical reflection on the actual practice of mathematics, and in particular on the mathematical practice of justifying definitions (for some recent notable work on the actual practice of mathematics see, for instance, Corfield 2003, Larvor 2001, Leng 2002, Mancosu 2008). So I think that there is much more of philosophical interest that could be said about the justification of definitions, and more generally about mathematical practice, such as what makes theorems deep as opposed to shallow.

Second, philosophers distinguish between process randomness, i.e., randomness of the dynamics of a system, and product randomness, i.e., randomness of its output (Earman 1986, p. 145). Ergodic theorists agree that chaotic processes (and not just outputs) can be random. For instance, the ergodic hierarchy, a series of mathematical definitions, is often claimed to provide a hierarchy of increasing levels of deterministic process randomness (for more on some of the notions of the ergodic hierarchy, see sections 3.3 and 4.3). Yet there is hardly any philosophical literature on deterministic process randomness. There is the question of what account of randomness is endorsed in ergodic theory, and how this account adds to our philosophical understanding. To my knowledge, this question has not been treated apart from Berkovitz et al.'s (2006) analysis of the ergodic hierarchy. Yet two of the levels of the ergodic hierarchy do not correspond to the mathematical characterisation of randomness they propose. Therefore, I have doubts that their characterisation of the randomness involved in the ergodic hierarchy succeeds.
The underlying thought in ergodic theory seems to be that there are certain properties which make stochastic processes random, and that chaotic deterministic systems can share these properties and hence can be random. But the details are unclear and worthy of exploration.

Third, I think that there is scope for proving further philosophically relevant mathematical results on the observational equivalence of deterministic and indeterministic descriptions. For instance, one might prove further results about limitations on observational equivalence, similar to my theorems saying that discrete deterministic systems used in science cannot be simulated at every observation level by Bernoulli processes. Furthermore, if there is a choice between a deterministic and an indeterministic description, the question arises which description is preferable. As already highlighted in subsection 5.5, this question deserves a more careful treatment.

Fourth, as explained in some detail in section 2.1, invariant measures are often interpreted as probability densities. There are still many open questions about this issue. For instance, there are interpretations of measures as probability densities which, to the best of my knowledge, have not been philosophically assessed, such as the so-called Kolmogorov measures. These measures are defined as follows: add to a given deterministic system a small random noise ε. The resulting stochastic process usually has just one stationary measure µ_ε. The invariant measure µ = lim_{ε→0} µ_ε often exists and is interpreted as a probability density since it derives from stochastic processes (Eckmann & Ruelle 1985, p. 626), but it is still unclear whether these measures justify the appellation 'probability'. Also, there has been no philosophical work on the interesting question of which measure one should choose if two methods of identifying invariant measures suggest different measures.

Furthermore, there has been relatively little philosophical discussion even about the most popular interpretations of invariant measures as probability densities, such as the time-average interpretation. Hence here too there is a need for further research, such as on the topic of how the time-average interpretation is best understood for nonergodic systems (cf. Lavis 2010). This gap is all the more important as all the extant philosophical literature on this issue is about classical statistical mechanics, which lacks the more exotic measures of dynamical systems theory, such as physical measures on strange attractors.

Finally, let me point to three open questions more generally about dynamical systems theory and chaos. First, our understanding of how chaotic behaviour arises from the quantum world is still incomplete (there is, of course, a vast literature; for two recent articles see Belot & Earman 1997 and Landsman 2007, sections 5–7). Thus it would be desirable to see more foundational work on this issue.

Second, there are many open questions about the significance of chaotic behaviour in statistical mechanics. Generally, it is still debated what exactly statistical mechanics accomplishes, and it is poorly understood why the various schemes of statistical mechanics, such as Gibbs' phase space averaging, work (Uffink 2007). In particular, there are many open questions about the role of chaotic behaviour in explaining the second law of thermodynamics or in explaining why in Gibbsian mechanics one can take phase averages of observables. For instance, recent accounts of typicality purport to derive an analogue of the second law of thermodynamics by appealing to chaotic behaviour and ergodicity (Goldstein 2001, Lebowitz 1993): yet it remains unclear whether this derivation indeed goes through (Frigg 2009a, Frigg 2009b).
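The small-noise construction of Kolmogorov measures described earlier in this chapter can be illustrated numerically: perturb a map by noise of amplitude ε, record the occupation frequencies of a long orbit as an empirical stand-in for the stationary measure µ_ε, and shrink ε. This is a sketch under assumptions of my own choosing; the logistic map, bin count, noise model, and seed are arbitrary illustrative choices, not taken from the text.

```python
import random

def noisy_histogram(eps, n_steps=50000, bins=10, seed=1):
    """Occupation frequencies of a long orbit of the logistic map
    x -> 4x(1 - x) perturbed by uniform noise of amplitude eps: an
    empirical stand-in for the stationary measure mu_eps whose
    eps -> 0 limit defines the Kolmogorov measure."""
    rng = random.Random(seed)
    x, counts = 0.3, [0] * bins
    for _ in range(n_steps):
        x = 4 * x * (1 - x) + eps * (2 * rng.random() - 1)
        x = min(max(x, 0.0), 1.0 - 1e-12)   # keep the noisy point in [0, 1)
        counts[int(x * bins)] += 1
    return [c / n_steps for c in counts]
```

For small ε the histogram approaches the familiar invariant density of the logistic map, 1/(π√(x(1 − x))), which piles up mass near the endpoints of the interval.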
Third, chaos research, and more generally dynamical systems theory, is applied in disciplines such as meteorology and the climate sciences. Policy recommendations and also policies are sometimes based on predictions which were derived from models of dynamical systems theory. Yet this only makes sense if these models are not prone to model error, that is, if approximately the same results are obtained when the model is changed slightly. If model error prevails, then we need to be cautious with conclusions based on these models. Leading climate researchers are aware of this issue (e.g., Smith 2007) and would like to see philosophical as well as mathematical research on the role of model error.

Most of these open questions are also of theoretical importance for the specific sciences, and some of them are relevant to policy. Yet because these questions are conceptual or foundational, scientists tend not to reflect on them carefully. Philosophical research, in particular research in the philosophy of science including the philosophy of the special sciences, can and should fill these gaps. To conclude, there is still much interesting work to be done about the philosophical aspects of chaos and the topics of this dissertation. Exciting work for the future!

List of Figures

1.1 A billiard system with a convex obstacle (p. 10)
2.1 The baker's system on 0 ≤ y ≤ 1/2 (p. 22)
2.2 Numerical solution of the Lorenz equations for σ = 10, r = 28, b = 8/3 (p. 23)
2.3 (a) Histogram and (b) natural measure of the baker's system (p. 26)
4.1 Evolution of a small bundle of initial conditions I under the baker's system (p. 74)

Bibliography

Ambrose, W. (1941), 'Representation of ergodic flows', Annals of Mathematics, 2nd Series 42, 723–739.
Arnold, V. & Avez, A.
(1968), Ergodic Problems of Classical Mechanics, W. A. Benjamin, New York.
Ash, R. (1972), Measure, Integration and Functional Analysis, Academic Press, New York and London.
Aubin, D. & Dahan-Dalmedico, A. (2002), 'Writing the history of dynamical systems and chaos: Longue durée and revolution, disciplines and cultures', Historia Mathematica 29, 273–339.
Batterman, R. (1991), 'Randomness and probability in dynamical theories: on the proposals of the Prigogine school', Philosophy of Science 58, 241–263.
Batterman, R. & White, H. (1996), 'Chaos and algorithmic complexity', Foundations of Physics 26, 307–336.
Belot, G. & Earman, J. (1997), 'Chaos out of order: quantum mechanics, the correspondence principle and chaos', Studies in History and Philosophy of Modern Physics 28, 147–182.
Benedicks, M. & Young, L.-S. (1993), 'Sinai-Bowen-Ruelle measures for certain Hénon maps', Inventiones Mathematicae 112, 541–567.
Berger, A. (2001), Chaos and Chance, an Introduction to Stochastic Aspects of Dynamics, De Gruyter, New York.
Berkovitz, J., Frigg, R. & Kronz, F. (2006), 'The ergodic hierarchy, randomness and Hamiltonian chaos', Studies in History and Philosophy of Modern Physics 37, 661–691.
Billingsley, P. (1965), Ergodic Theory and Information, John Wiley and Sons, New York.
Birkhoff, G. (1931), 'Proof of the ergodic theorem', Proceedings of the National Academy of Sciences of the United States of America 17, 656–660.
Bishop, R. (2003), 'On separating predictability and determinism', Erkenntnis 58, 169–188.
Bishop, R. (2008), 'What could be worse than the butterfly effect?', Canadian Journal of Philosophy 38, 519–547.
Bowen, R. (1977), 'Bernoulli maps of the interval', Israel Journal of Mathematics 28, 161–168.
Brin, M. & Stuck, G. (2002), Introduction to Dynamical Systems, Cambridge University Press, Cambridge.
Brown, J. (1999), Philosophy of Mathematics: an Introduction to the World of Proofs and Pictures, Routledge, London.
Butterfield, J. (2005), 'Determinism and indeterminism', Routledge Encyclopaedia of Philosophy Online.
Carathéodory, C. (1914), 'Über das lineare Maß von Punktmengen – eine Verallgemeinerung des Längenbegriffs', Nachrichten der königlichen Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-physikalische Klasse, pp. 404–426.
Chernov, N. & Markarian, R. (2006), Chaotic Billiards, American Mathematical Society, Providence.
Chirikov, B. (1979), 'A universal instability of many-dimensional oscillator systems', Physics Reports 52, 264–379.
Cinlar, E. (1975), Introduction to Stochastic Processes, Prentice Hall, Englewood Cliffs, New Jersey.
Cohn, D. (1980), Measure Theory, Birkhäuser, Boston.
Corfield, D. (1997), 'Assaying Lakatos's philosophy of mathematics', Studies in History and Philosophy of Science 28, 99–121.
Corfield, D. (2003), Towards a Philosophy of Real Mathematics, Cambridge University Press, Cambridge.
Cornfeld, I., Fomin, S. & Sinai, Y. (1982), Ergodic Theory, Springer, Berlin.
Cover, T. & Thomas, J. (2006), Elements of Information Theory, second edn, Wiley, New York.
Dahan-Dalmedico, A. (2004), Chaos, disorder, and mixing: a new fin-de-siècle image of science?, in M. Wise, ed., 'Growing Explanations: Historical Perspective on the Sciences of Complexity', Duke University Press, Durham, pp. 67–94.
Devaney, R. (1986), An Introduction to Chaotic Dynamical Systems, Addison-Wesley, New York et al.
Doob, J. L. (1953), Stochastic Processes, John Wiley & Sons, New York.
Eagle, A. (2005), 'Randomness is unpredictability', The British Journal for the Philosophy of Science 56, 749–790.
Earman, J. (1971), 'Laplacian determinism, or is this any way to run a universe?', Journal of Philosophy 68, 729–744.
Earman, J. (1986), A Primer on Determinism, D. Reidel, Dordrecht.
Eckmann, J.-P. & Ruelle, D. (1985), 'Ergodic theory of chaos and strange attractors', Reviews of Modern Physics 57, 617–654.
Falconer, K.
(1990), Fractal Geometry: Mathematical Foundations and Applications, John Wiley & Sons, New York.
Ford, J. (1989), What is chaos that we should be mindful of it, in P. Davies, ed., 'The New Physics', Cambridge University Press, Cambridge, pp. 348–371.
Frigg, R. (2004), 'In what sense is the Kolmogorov-Sinai entropy a measure for chaotic behaviour? Bridging the gap between dynamical systems theory and communication theory', The British Journal for the Philosophy of Science 55, 411–434.
Frigg, R. (2006), 'Chaos and randomness: an equivalence proof of a generalised version of the Shannon entropy and the Kolmogorov-Sinai entropy for Hamiltonian dynamical systems', Chaos, Solitons and Fractals 28, 26–31.
Frigg, R. (2009a), 'Typicality and the approach to equilibrium in Boltzmannian statistical mechanics', Philosophy of Science (Supplement), forthcoming.
Frigg, R. (2009b), Why typicality does not explain the approach to equilibrium, in M. Suárez, ed., 'Probabilities, Causes and Propensities in Physics', forthcoming, Springer, Berlin.
Frigg, R. & Hartmann, S. (2006), Models in science, in E. Zalta, ed., 'The Stanford Encyclopedia of Philosophy (Spring 2006 Edition)', http://plato.stanford.edu/archives/spr2006/entries/models-science/, Stanford.
Frigg, R. & Werndl, C. (2010), Entropy – a guide for the perplexed, in C. Beisbart & S. Hartmann, eds, 'Probabilities in Physics', forthcoming, Oxford University Press, Oxford.
Goldstein, S. (2001), Boltzmann's approach to statistical mechanics, in J. Bricmont, D. Dürr, M. Galavotti, G. Ghirardi, F. Petruccione & N. Zanghi, eds, 'Chance in Physics: Foundations and Perspectives', Springer, Berlin and New York, pp. 39–54.
Halmos, P. (1944), 'In general a measure-preserving transformation is mixing', The Annals of Mathematics 45, 786–792.
Halmos, P. (1949), 'Measurable transformations', Bulletin of the American Mathematical Society 55, 1015–1043.
Halmos, P.
(1950), Measure Theory, Van Nostrand, New York and London.
Halmos, P. (1956), Lectures on Ergodic Theory, Chelsea Publishing Company, New York.
Halmos, P. (1961), 'Recent progress in ergodic theory', Bulletin of the American Mathematical Society 67, 70–80.
Haskell, C. (1992), Brownian Motion and Billiards on the Torus, PhD thesis, Stanford University, Stanford.
Hénon, M. (1976), 'A two-dimensional mapping with a strange attractor', Communications in Mathematical Physics 50, 69–77.
Hilborn, R. (2000), Chaos and Nonlinear Dynamics, an Introduction for Scientists and Engineers, Oxford University Press, Oxford.
Hoefer, C. (2008), Causal determinism, in E. Zalta, ed., 'The Stanford Encyclopaedia of Philosophy (Winter 2008 Edition)', http://plato.stanford.edu/archives/win2008/entries/determinism-causal/, Stanford.
Hopf, E. (1932a), 'Complete transitivity and the ergodic principle', Proceedings of the National Academy of Sciences of the United States of America 18, 204–209.
Hopf, E. (1932b), 'Proof of Gibbs' hypothesis on the tendency toward statistical equilibrium', Proceedings of the National Academy of Sciences of the United States of America 18, 333–340.
Jacobson, M. (1981), 'Absolutely continuous invariant measures for one-parameter families of one-dimensional maps', Communications in Mathematical Physics 81, 39–88.
Janssen, J. & Limnios, N. (1999), Semi-Markov Models and Applications, Kluwer Academic Publishers, Dordrecht, the Netherlands.
Kellert, S. (1993), In the Wake of Chaos, University of Chicago Press, Chicago.
Klir, G. (2006), Uncertainty and Information: Foundations of Generalized Information Theory, Wiley, Hoboken, New Jersey.
Kolář, M. & Gumbs, G. (1992), 'Theory for the experimental observation of chaos in a rotating waterwheel', Physical Review A 45, 626–637.
Kolmogorov, A. (1933), Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer, Berlin.
Kolmogorov, A.
(1958), 'A new metric invariant of transitive dynamical systems and automorphisms of Lebesgue spaces', Dokl. Acad. Nauk SSSR 119, 861–864.
Koopman, B. (1931), 'Hamiltonian systems and transformations in Hilbert space', Proceedings of the National Academy of Sciences of the United States of America 17, 315–318.
Koopman, B. & von Neumann, J. (1932), 'Dynamical systems of continuous spectra', Proceedings of the National Academy of Sciences of the United States of America 18, 255–263.
Krieger, W. (1970), 'On entropy and generators of measure-preserving transformations', Transactions of the American Mathematical Society 149, 453–456.
Lakatos, I. (1961), Essays in the Logic of Mathematical Discovery, PhD thesis, University of Cambridge, Cambridge.
Lakatos, I. (1976), Proofs and Refutations: The Logic of Mathematical Discovery, edited by John Worrall and Elie Zahar, Cambridge University Press, Cambridge.
Lakatos, I. (1978), Mathematics, Science and Epistemology, Philosophical Papers Volume 2, edited by John Worrall and Elie Zahar, Cambridge University Press, Cambridge.
Landsman, K. (2007), Between classical and quantum, in J. Butterfield & J. Earman, eds, 'Philosophy of Physics (Handbooks of the Philosophy of Science A)', North-Holland, Amsterdam, pp. 417–553.
Larvor, B. (1998), Lakatos: an Introduction, Routledge, London and New York.
Larvor, B. (2001), 'What is dialectical philosophy of mathematics?', Philosophia Mathematica 9, 212–229.
Laskar, J. (1994), 'Letter to the editor: Large-scale chaos in the solar system', Astronomy and Astrophysics 287, 9–12.
Lavis, D. (2010), An objectivist account of probabilities in statistical physics, in C. Beisbart & S. Hartmann, eds, 'Probabilities in Physics', forthcoming, Oxford University Press, Oxford.
Lebowitz, J. (1993), 'Macroscopic laws, microscopic dynamics, time's arrow and Boltzmann's entropy', Physica A 194, 1–27.
Leiber, T.
(1998), 'On the actual impact of deterministic chaos', Synthese 113, 357–379.
Leng, M. (2002), 'Phenomenology and mathematical practice', Philosophia Mathematica 10, 3–25.
Lichtenberg, A. J. & Lieberman, M. A. (1992), Regular and Chaotic Dynamics, Springer, Berlin and New York.
Lighthill, J. (1986), 'The recently recognized failure of predictability in Newtonian dynamics', Proceedings of the Royal Society of London, Series A 407, 35–50.
Lind, D. (1975), 'A counterexample to a conjecture of Hopf', Duke Mathematical Journal 42, 755–757.
Lissauer, J. (1999), 'Chaotic motion in the solar system', Reviews of Modern Physics 71, 835–845.
Lorenz, E. (1963), 'Deterministic nonperiodic flow', Journal of the Atmospheric Sciences 20, 130–141.
Lorenz, E. (1964), 'The problem of deducing the climate from the governing equations', Tellus XVI, 1–11.
Luzzatto, S., Melbourne, I. & Paccaut, F. (2005), 'The Lorenz attractor is mixing', Communications in Mathematical Physics 260, 393–401.
Lyubich, M. (2002), 'Almost every real quadratic map is either regular or stochastic', Annals of Mathematics 156, 1–78.
Mañé, R. (1987), Ergodic Theory and Differentiable Dynamics, Springer, Berlin.
Mackey, G. (1974), 'Ergodic theory and its significance for statistical mechanics and probability theory', Advances in Mathematics 12, 178–268.
Mancosu, P. (2008), The Philosophy of Mathematical Practice, Oxford University Press, Oxford.
Marsden, J. & Hoffman, M. (1974), Elementary Classical Analysis, W. H. Freeman and Company, New York.
Martinelli, M., Dang, M. & Seph, T. (1998), 'Defining chaos', Mathematics Magazine 71, 112–122.
May, R. (1976), 'Simple mathematical models with very complicated dynamics', Nature 261, 459–467.
Mayer, D. & Roepstorff, G. (1983), 'Strange attractors and asymptotic measures of discrete-time dissipative systems', Journal of Statistical Physics 31, 309–326.
Miller, D. (1996), The status of determinism in an uncontrollable world, in P.
Weingartner & G. Schurz, eds, 'Law and Prediction in the Light of Chaos Research', Springer, Berlin, pp. 103–114.
Montague, R. (1962), Deterministic theories, in D. Wilner, ed., 'Decisions, Values and Groups', Pergamon Press, New York, pp. 325–370.
Moser, J. (1973), Stable and Random Motions in Dynamical Systems, Yale University Press, New Haven.
Nillsen, R. (1999), 'Chaos and one-to-oneness', Mathematics Magazine 72, 14–21.
Norton, J. (2003), 'Causation as folk science', Philosophers' Imprint 3, 1–22.
Ornstein, D. (1970a), 'Bernoulli shifts with the same entropy are isomorphic', Advances in Mathematics 4, 337–352.
Ornstein, D. (1970b), Imbedding Bernoulli shifts in flows, in A. Dold & B. Eckmann, eds, 'Contributions to Ergodic Theory and Probability, Proceedings of the First Midwestern Conference on Ergodic Theory held at the Ohio State University, March 27–30', Lecture Notes in Mathematics, vol. 160, Springer, Berlin, pp. 178–218.
Ornstein, D. (1971), 'Some new results in the Kolmogorov-Sinai theory of entropy and ergodic theory', Bulletin of the American Mathematical Society 77, 878–890.
Ornstein, D. (1973a), 'An application of ergodic theory to probability theory', The Annals of Probability 1, 43–58.
Ornstein, D. (1973b), 'The isomorphism theorem for Bernoulli flows', Advances in Mathematics 10, 124–142.
Ornstein, D. (1974), Ergodic Theory, Randomness, and Dynamical Systems, Yale University Press, New Haven and London.
Ornstein, D. (1989), 'Ergodic theory, randomness and "chaos"', Science 243, 182–187.
Ornstein, D. & Gallavotti, G. (1974), 'Billiards and Bernoulli schemes', Communications in Mathematical Physics 38, 83–101.
Ornstein, D. & Weiss, B. (1991), 'Statistical properties of chaotic systems', Bulletin of the American Mathematical Society 24, 11–116.
Oseledec, V.
(1968), 'A multiplicative ergodic theorem. Lyapunov characteristic numbers for dynamical systems', Transactions of the Moscow Mathematical Society 19, 197–221.
Ott, E. (2002), Chaos in Dynamical Systems, Cambridge University Press, Cambridge.
Park, K. (1982), 'A special family of ergodic flows and their d̄-limits', Israel Journal of Mathematics 42, 343–353.
Peitgen, H.-O., Jürgens, H. & Saupe, D. (1992), Chaos and Fractals: New Frontiers of Science, Springer, New York.
Petersen, K. (1983), Ergodic Theory, Cambridge University Press, Cambridge.
Pitowsky, I. (1995), 'Laplace's demon consults an oracle: the computational complexity of prediction', Studies in History and Philosophy of Modern Physics 27, 161–180.
Polya, G. (1949), 'With or without motivation', The American Mathematical Monthly 56, 684–691.
Polya, G. (1954), Patterns of Plausible Inference, Volume II of Mathematics and Plausible Reasoning, Princeton University Press, Princeton.
Radunskaya, A. (1992), Statistical Properties of Deterministic Bernoulli Flows, PhD thesis, Stanford University, Stanford.
Robinson, C. (1995), Dynamical Systems: Stability, Symbolic Dynamics and Chaos, CRC Press, Tokyo.
Rohlin, V. (1960), 'New progress in the theory of transformations with invariant measure', Russian Mathematical Surveys 15, 1–22.
Rudolph, D. (1976), 'A two-valued step coding for ergodic flows', Mathematische Zeitschrift 150, 201–220.
Rudolph, D. (1990), Fundamentals of Measurable Dynamics: Ergodic Theory on Lebesgue Spaces, Oxford University Press, Oxford.
Ruelle, D. (1997), 'Chaos, predictability, and idealizations in physics', Complexity 3, 26–28.
Ruelle, D. & Takens, F. (1971), 'On the nature of turbulence', Communications in Mathematical Physics 20, 167–192.
Schurz, G. (1996), Kinds of unpredictability in deterministic systems, in P. Weingartner & G. Schurz, eds, 'Law and Prediction in the Light of Chaos Research', Springer, Berlin, pp. 123–141.
Schuster, H. G. & Just, W.
(2005), Deterministic Chaos: an Introduction, Wiley-VCH Verlag, Weinheim.
Scott, S. (1991), Chemical Chaos, Clarendon Press, Oxford.
Shields, P. (1973), The Theory of Bernoulli Shifts, University of Chicago Press, Chicago.
Shiryaev, A. (1989), 'Kolmogorov: life and creative activities', The Annals of Probability 17, 866–944.
Sinai, Y. (1959), 'On the concept of entropy for dynamical systems', Dokl. Acad. Nauk SSSR 124, 768–771.
Sinai, Y. (1963), 'Probabilistic ideas in ergodic theory', American Mathematical Society Translations 31, 62–84.
Sinai, Y. (1989), 'Kolmogorov's work on ergodic theory', The Annals of Probability 17, 833–839.
Sinai, Y. (2000), Dynamical Systems, Ergodic Theory and Applications, Springer, Berlin.
Sinai, Y. (2007), 'Kolmogorov-Sinai entropy', Scholarpedia, retrieved from the World Wide Web on January 24, 2008: www.scholarpedia.org/article/Kolmogorov-Sinai entropy.
Skinner, J., Goldberger, A., Mayer-Kress, G. & Ideker, R. (1997), 'Chaos in the heart: implications for clinical cardiology', Bio/Technology 8, 1018–1024.
Sklar, L. (1993), Physics and Chance: Philosophical Issues in the Foundations of Statistical Mechanics, Cambridge University Press, Cambridge.
Smith, L. (2007), A Very Short Introduction to Chaos, Oxford University Press, Oxford.
Smith, L., Ziehmann, C. & Fraedrich, K. (1999), 'Uncertainty dynamics and predictability in chaotic systems', Quarterly Journal of the Royal Meteorological Society 125, 2855–2886.
Smith, P. (1998), Explaining Chaos, Cambridge University Press, Cambridge.
Stone, M. (1989), 'Chaos, prediction and Laplacian determinism', American Philosophical Quarterly 26, 123–131.
Strogatz, S. (1994), Nonlinear Dynamics and Chaos, with Applications to Physics, Biology, Chemistry, and Engineering, Addison Wesley, New York.
Suppes, P. (1993), 'The transcendental character of determinism', Midwest Studies in Philosophy 18, 242–257.
Suppes, P.
(1999), 'The noninvariance of deterministic causal models', Synthese 121, 181–198.
Suppes, P. & de Barros, A. (1996), Photons, billiards and chaos, in P. Weingartner & G. Schurz, eds, 'Law and Prediction in the Light of Chaos Research', Springer, Berlin, pp. 189–201.
Szász, D. (2000), Hard Ball Systems and the Lorentz Gas, Encyclopaedia of Mathematical Sciences 101, Springer, Berlin.
Tappenden, J. (2008a), Mathematical concepts and definitions, in P. Mancosu, ed., 'The Philosophy of Mathematical Practice', Oxford University Press, Oxford, pp. 256–275.
Tappenden, J. (2008b), Mathematical concepts: fruitfulness and naturalness, in P. Mancosu, ed., 'The Philosophy of Mathematical Practice', Oxford University Press, Oxford, pp. 276–301.
Uffink, J. (2007), Compendium to the foundations of classical statistical physics, in J. Butterfield & J. Earman, eds, 'Philosophy of Physics (Handbooks of the Philosophy of Science B)', North-Holland, Amsterdam, pp. 923–1074.
von Neumann, J. (1932a), 'Proof of the quasi-ergodic hypothesis', Proceedings of the National Academy of Sciences of the United States of America 18, 70–82.
von Neumann, J. (1932b), 'Zur Operatorenmethode in der klassischen Mechanik', The Annals of Mathematics 33, 587–642.
von Plato, J. (1994), Creating Modern Probability, Cambridge University Press, Cambridge.
Walters, P. (1982), An Introduction to Ergodic Theory, Springer, New York.
Weingartner, P. (1996), Under what transformations are laws invariant?, in P. Weingartner & G. Schurz, eds, 'Law and Prediction in the Light of Chaos Research', Springer, Berlin, pp. 47–88.
Weingartner, P. & Schurz, G. (1996), Law and Prediction in the Light of Chaos Research, Springer, Berlin.
Werndl, C. (2009a), 'Are deterministic descriptions and indeterministic descriptions observationally equivalent?', Studies in History and Philosophy of Modern Physics 40, 232–242.
Werndl, C.
(2009b), Deterministic versus indeterministic descriptions: Not that different after all?, in A. Hieke & H. Leitgeb, eds, ‘Reduction, Abstraction, Analysis, Proceedings of the 31st International Ludwig Wittgenstein-Symposium’, Ontos, Frankfurt, pp. 63–78. Werndl, C. (2009c), ‘Justifying definitions in matemathics—going beyond Lakatos’, Philosophia Mathematica doi:10.1093/philmat/nkp006. Werndl, C. (2009d), On the justification and formulation of mathematical definitions: the case of deterministic chaos, in M. Dorato, M. Re´dei & BIBLIOGRAPHY 182 M. Sua´rez, eds, ‘Proceedings of the First Conference of the European Philosophy of Science Association’, forthcoming, Springer, Berlin. Werndl, C. (2009e), ‘What are the new implications of chaos for unpredicta- bility?’, The British Journal for the Philosophy of Science 60, 195–220. Wiggins, S. (1990), Introduction to Applied Nonlinear Dynamical Systems and Chaos, Text in Applied Mathematics 2, Springer. Winnie, J. (1998), Deterministic chaos and the nature of chance, in J. Ear- man & J. Norton, eds, ‘The Cosmos of Science – Essays of Exploration’, Pittsburgh University Press, Pittsburgh, pp. 299–324. Young, L.-S. (1997), ‘Ergodic theory and chaotic dynamical systems, XII- th international congress of mathematical physics’, Reviews of Modern Physics 57, 617–654. Young, L.-S. (2002), ‘What are SRB measures, and which dynamical systems have them?’, Journal of Statistical Physics 108, 733–754. Zaslavsky, G. (2005), Hamiltonian Chaos and Fractional Dynamics, Oxford University Press, Oxford. Ziehmann, C., Smith, L. & Kurths, J. (1986), ‘Localized Lyapunov exponents and the prediction of predictability’, Physics Letters A 271, 1–15.