Quantum theory from the perspective of general probabilistic theories Sabri Walid Al-Safi Clare College University of Cambridge July 2014 This dissertation is submitted for the degree of Doctor of Philosophy 2 Acknowledgements The period during which this work was carried out would have been con- siderably less enjoyable without the intervention of many people. I am ines- timably grateful for the unflinching love, support and tolerance of my par- ents, Linda and Walid, and my brother Khalid. I owe special thanks to Jassy, whose passion, awareness and integrity I will always aspire to, and whose counsel has always proved judicious. Thank you all for your patience. I couldn't have undertaken this research without the energy and genial- ity of my supervisor, Tony Short, just as I couldn't have made it intelligible without the enthusiasm of my colleagues at the Centre for Quantum Informa- tion and Foundations in Cambridge, as well as many like-minded researchers whom I've met along the way. Thank you for helping me to see the way for- wards. I would also like to thank - in no particular order, not even a partial one - friends who have given me a great deal of pleasure during this time: Jack, Matt, Danny & Ellie, Zach, Laurence, John, Alex, Ilan, Sam, David, Dan, Sarah, Pippa, Will, Cat & Tom, Terry F. and Matt L., who brightened Pav. F basement; and tea, for inducing just the right kind of daze. 3 4 To Jassy 5 6 Abstract This thesis explores various perspectives on quantum phenomena, and how our understanding of these phenomena is informed by the study of general probabilistic theories. Particular attention is given to quantum non- locality, and its interaction with areas of physical and mathematical interest such as entropy, reversible dynamics, information-based games and the idea of negative probability. We begin with a review of non-signaling distribu- tions and convex operational theories, including “black box” descriptions of experiments and the mathematics of convex vector spaces. In Chapter 3 we derive various classical and quantum-like quasiproba- bilistic representations of arbitrary non-signaling distributions. Previously, results in which the density operator is allowed to become non-positive [1] have proved useful in derivations of quantum theory from physical require- ments [2]; we derive a dual result in which the measurement operators in- stead are allowed to become non-positive, and show that the generation of any non-signaling distribution is possible using a fixed separable state with negligible correlation. We also derive two distinct “quasi-local” models of non-signaling correlations. Chapter 4 investigates non-local games, in particular the game known as Information Causality. By analysing the probability of success in this game, we prove the conjectured tightness of a bound given in [3] concerning how well entanglement allows us to perform the task of random access coding, and introduce a quadratic bias bound which seems to capture a great deal of information about the set of quantum-achievable correlations. By refor- mulating Information Causality in terms of entropies, we find that a sensi- ble measure of entropy precludes many general probabilistic theories whose non-locality is stronger than that of quantum theory. Chapter 5 explores the role that reversible transitivity (the principle that any two pure states are joined by a reversible transformation) plays as a char- acteristic feature of quantum theory. It has previously been shown that in Boxworld, the theory allowing for the full set of non-signaling correlations, any reversible transformation on a restricted class of composite systems is merely a composition of relabellings of measurement choices and outcomes, and permutations of subsystems [4]. We develop a tabular description of Boxworld states and effects first introduced in [5], and use this to extend this reversibility result to any composite Boxworld system in which none of the subsystems are classical. 7 8 Preface This dissertation is the result of my ownwork and includes nothingwhich is the outcome of work done in collaboration except where specifically indi- cated in the text. 9 10 Contents 1 Introduction 19 1.1 Overview of the Thesis . . . . . . . . . . . . . . . . . . . . . . . 23 2 General Probabilistic Theories 27 2.1 Bell's Theorem and the CHSH game . . . . . . . . . . . . . . . . 29 2.2 Black boxes and non-signaling distributions . . . . . . . . . . . . 34 2.3 Examples of probabilistic theories . . . . . . . . . . . . . . . . . 42 2.3.1 Classical theory . . . . . . . . . . . . . . . . . . . . . . . 44 2.3.2 Quantum theory . . . . . . . . . . . . . . . . . . . . . . 45 2.3.3 Boxworld . . . . . . . . . . . . . . . . . . . . . . . . . . 47 2.4 The non-signaling polytope . . . . . . . . . . . . . . . . . . . . . 51 2.5 Convex vector space representations . . . . . . . . . . . . . . . . 54 2.5.1 Convex geometry . . . . . . . . . . . . . . . . . . . . . . 56 2.5.2 Abstract state spaces . . . . . . . . . . . . . . . . . . . . 62 2.5.3 Composite Systems . . . . . . . . . . . . . . . . . . . . . 67 2.5.4 Transformations . . . . . . . . . . . . . . . . . . . . . . 71 2.5.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 73 3 Quasiprobability representations 83 3.1 Quasiprobability distributions . . . . . . . . . . . . . . . . . . . 85 3.2 Quantum theory with quasiprobabilities . . . . . . . . . . . . . . 89 3.2.1 Non-positive density matrices . . . . . . . . . . . . . . . 92 3.2.2 Commuting operators . . . . . . . . . . . . . . . . . . . . 98 3.3 Classical theory with quasiprobabilities . . . . . . . . . . . . . . 100 3.3.1 Non-positive mixtures . . . . . . . . . . . . . . . . . . . 102 3.3.2 Non-positive effects . . . . . . . . . . . . . . . . . . . . 107 3.3.3 Quantum corollaries . . . . . . . . . . . . . . . . . . . . 110 3.4 Further quasiprobabilistic quantum models . . . . . . . . . . . . . 112 11 3.4.1 Non-positive observables acting on ρN,d . . . . . . . . . . 113 3.4.2 Approximate product states . . . . . . . . . . . . . . . . 120 3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 4 Information Causality 127 4.1 Non-local games . . . . . . . . . . . . . . . . . . . . . . . . . . 128 4.1.1 The CHSH game revisited . . . . . . . . . . . . . . . . . 132 4.1.2 Inner product games . . . . . . . . . . . . . . . . . . . . 134 4.1.3 The information causality game . . . . . . . . . . . . . . 137 4.2 Probability of success in the information causality game . . . . . 145 4.2.1 Random Access Codes . . . . . . . . . . . . . . . . . . . 146 4.2.2 Optimal success probability of EARACs . . . . . . . . . 148 4.3 The role of entropy . . . . . . . . . . . . . . . . . . . . . . . . . 154 4.3.1 An entropic information causality . . . . . . . . . . . . . 157 4.3.2 Entropy in general physical theories . . . . . . . . . . . . 162 4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 5 Reversible Boxworld dynamics 172 5.1 Identical subsystems: review . . . . . . . . . . . . . . . . . . . . 177 5.1.1 Fiducial effect vectors . . . . . . . . . . . . . . . . . . . 179 5.1.2 Induced fiducial effect string permutations . . . . . . . . 182 5.1.3 Characterisation of reversible transformations . . . . . . . 185 5.2 Effect tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 5.2.1 Reversible transformations on effect tables . . . . . . . . 200 5.2.2 Diagrammatic proof for general bipartite systems . . . . . 202 5.3 Decompositions of effects . . . . . . . . . . . . . . . . . . . . . 208 5.3.1 Identical subsystems revisited . . . . . . . . . . . . . . . 215 5.4 Main result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 5.5 Polytopic models with non-trivial reversible dynamics . . . . . . 220 5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229 6 Conclusion and Outlook 234 12 List of Figures 2.1 Quantum measurement operators for CHSH violation . . . . . . . 33 2.2 “Black box” description of experimental statistics. . . . . . . . . . 37 2.3 Multipartite experimental statistics. . . . . . . . . . . . . . . . . . 38 2.4 The non-signaling polytope . . . . . . . . . . . . . . . . . . . . . 53 2.5 Polygon state spaces . . . . . . . . . . . . . . . . . . . . . . . . 81 3.1 Qn hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.1 Non-local games and distributed computations . . . . . . . . . . . 129 4.2 Violating Information Causality with PR-box states . . . . . . . . 143 5.1 Improper tripartite effect not detected by pure product states . . . 199 13 14 List of Notation General H Finite-dimensional, complex Hilbert space. L (H) Vector space of linear operators onH. |ψ〉 Unit vector inH representing a pure quantum state. ρ Hermitian, positive semi-definite, unit-trace (density) operator onH. ρ˜ Hermitian, unit-trace operator onH. Ei Hermitian, positive semi-definite (POVM) operator onH, satisfying Tr (E) ≤ 1 Ma|x POVM operator corresponding to measuring x for and obtaining outcome a. M˜a|x Non-positive measurement operator corresponding to measuring x and obtaining outcome a. [N ] The set {1, . . . , N}. Hc({p}) Classical entropy of a probability distribution {p} = {p1, . . . pn)}. Ic({p} : {q}) Classical mutual information between distributions {p} and {q}. 15 Outcome distributions P (a1, . . . , aN |x1, . . . , xN) Multipartite probability of outcome ai being obtained on subsystem i, conditional on having locally measured xi Ω A subset of the set of subsystems [N ] over which the outcome distribution is defined. P (aΩ|xΩ) Marginal outcome distribution over the subsystems belonging to Ω. M (i) The number of fiducial measurement choices at subsystem i. K(i)xi The number of possible outcomes for of fiducial measurement xi on subsystem i. P (a|x) Abbreviated form of the above, where a = (a1, . . . , aN) and x = (x1, . . . , xN). Ex1x2 Bias of a g-bit system, conditional on measurements x1 and x2 being performed on subsystems 1 and 2. C / C1 The CHSH value of an outcome distribution. PΛ(λ) Joint probability distribution over local variables. P˜Λ(λ) Joint quasiprobability distribution over local variables. P (i)(ai|xi, λi) Local outcome function, determining probability of obtaining outcome ai on subsystem i, conditional on having locally measured xi and λi being the value of the local variable. P˜ (i)(ai|xi, λi) Quasiprobabilistic local outcome function. 16 General probabilistic theories V Vector space representing the theory. S Subset of V representing normalised states. S+ Cone of V generated by S. E Subset of V representing effects. E+ Cone of V generated by E . U Vector in V representing the unit effect. UN Unit effect vector for a composite system comprising N subsystems. S[N ] The set of sub-unit effects of a composite system. S{i} The set of i-sub-unit effects, for subsystem i of a composite system. Xa|x Fiducial effect corresponding to measuring x and obtaining outcome a. X(i)ai|xi Local fiducial effect on subsystem i. T Transformation of states in V . T † Transformation of effects in V (adjoint of T ). M Set of effects corresponding to a measurement. M∗ The set of fine-grained measurements of a system in the theory. Hme(s) Measurement entropy of a state s. Hmi(s) Mixing entropy of a state s. 17 18 Chapter 1 Introduction Physics is mathematical not because we know so much about the physical world, but because we know so little; it is only its mathe- matical properties that we can discover. “An Outline of Philosophy” Bertrand Russell 19 The history of quantum theory is replete with results, commonly referred to as “no-go theorems”, which assert that a certain physical state of affairs is impossi- ble, as long as one assumes that the mathematical formalism of quantum theory accurately describes the workings of Nature. No-go theorems often have ramifi- cations for the practical application of quantum theory: the No-Cloning Theorem for example implies that it is impossible to construct a machine which can make copies of arbitrary quantum states [6]. On the other hand, many no-go theorems also have ramifications for our understanding of Nature: the Kochen-Specker the- orem for example states that quantum mechanical properties cannot be embedded into a more fundamental set of properties of reality which, unlike quantum ob- servables, are mutually compatible [7]. Invariably however, no-go theorems offer crucial insight into phenomena which are central to a full understanding of quan- tum theory, such as indeterminacy, entanglement, uncertainty relations, and the measurement process. Arguably themost enduring and provocative phenomenon in the study of quan- tum foundations, with profound implications both practically and philosophically, is that of non-locality, encapsulated by Bell in 1964 with a no-go theorem known today as Bell's Theorem [8]. Basing his ideas on a 1930's thought experiment by Einstein et al. [9] later refined by Bohm [10], Bell proved that it is impossible for any physical theory obeying a well-defined notion of local causality to accu- rately reproduce all the experimental predictions of quantum theory. In particular, correlations in the local measurement statistics of a separated, but entangled, pair of electrons cannot be accurately reproduced by a physical theory which assumes certain - apparently reasonable - locality relations between the involved systems. Such “local” theories obey what are now known as Bell inequalities: restrictions on experimental outcome statistics which can be significantly violated by quan- tum measurements. Assuming that quantum theoretic predictions are accurate for Bell's proposed experiment (as verified by numerous experimental realisations to date, e.g. [11-19]), then local theories cannot be representative of Nature; the uni- verse itself is non-local. 20 Bell's arguments opened up more fascinating questions than they resolved, and paved the way for extensive research into non-locality that continues to this day. Philosophical debate has raged over how Bell's original assumptions are to be in- terpreted, and whether there exist hidden assumptions in his argument that one might prefer to drop rather than locality [20-24]. Repeated realisations of Bell's experiment have attempted to close down various loopholes proposed over the years as a way for local theories to “get around” tests done so far, for example due to instrumental inefficiencies [14, 15, 25, 26] or sub-luminal influences between sites [11, 12, 19]. The practical utilization of non-locality has also been explored, with the development of a key distribution protocol that relies on violation of Bell inequalities as proof against eavesdropping [27]. Much of this thesis focuses on the technical study of non-locality as a quantifi- able feature of measurement outcomes on quantum systems, and the intuition that it gives us about the mathematical structure of quantum theory. One of the greatest insights of Bell's Theorem is a focus on the measurement results, rather than the underlying physical state of affairs (or the mathematical language used to codify that state of affairs). Unlike the related phenomenon of entanglement, which ad- mits only an algebraic characterization in terms of Hilbert spaces, non-locality is defined in terms of the statistics of experimental outcomes, i.e. the results which are noted in the observer's logbook, and how they correlate with another logbook some distance away. The purity of this approach is useful in multiple ways. Firstly, it opens a direct link to information theory, which ultimately cares about the stor- age and transmission of information that is accessible to humans - even quantum information theory in the end is concerned with the eventual classical readout of results. Secondly, it grants us a language with which to discuss hypothetical mod- els of nature in terms of the measurement results they generate; these models are of special interest if quantum theory cannot replicate those results. Lastly, it is a potential means to putting quantum theory onto a more intuitive footing, and stripping it of the opaque Hilbert-space axiomatisation which pervades any intro- ductory course in quantum theory. 21 The study of general probabilistic theories [2, 28-40] has been developed in recent years in order to capture and explore the idea of adopting a purely opera- tional description of experiments. By “operational”, we mean that the objects of the theory are meaningful only insofar as they influence experimental outcomes. For example, if two states of a system cannot be distinguished via the statistics of any measurement on that system, then they are to be regarded as the same state. By making a sparse and reasonable set of operational assumptions, a mathematical framework may be developed in which any model of nature may be represented, as long as that model admits a basic notion of systems, whose states inform the probabilities of measurement outcomes. Thus, quantum theory and classical the- ory (the meaning of which we will be more precise about in Chapter 2) each have their place in the set of general probabilistic theories, as do a whole host of theories which are capable of producing a greater amount of non-locality than is achievable in quantum theory. A surprising number of features of quantum theory, which have previously been regarded as especially quantum, turn out to be quite generic within this larger set of general probabilistic theories. Features such as mixed states, entanglement, non-locality, and the use of the tensor product for composite systems in fact apply to all general probabilistic theories except classical theory, and can be seen as fol- lowing from natural assumptions about measurement outcomes [29, 32, 41]. The No-Cloning and No-Broadcasting Theorems have analogues which also hold in all non-classical general probabilistic theories [31], for which the proofs are as intu- itive as the quantum theoretic proofs (if not more so) and from which the quantum theoretic versions easily follow. Even quantum information theoretic protocols are fair game: the result that two copies of a Werner state cannot be deterministically purified follows from a more general result that holds in all general probabilistic theories satisfying a few reasonable conditions. [42]. Clearly, general probabilistic theories provide a valuable shift in perspective which allows for novel and illumi- nating proofs. In this thesis we hope to convince the reader that it is also helpful in shaping our understanding of fundamental concepts in quantum theory. 22 1.1 Overview of the Thesis This thesis is divided into a further five chapters. In Chapter 2 we introduce and develop the framework of general probabilistic theories. Chapter 2 can be further divided into two parts: firstly, the characterization of a general probabilistic theory as a set of constraints on probability distributions over experimental outcomes, and secondly, an explanation of how such a theory can be represented using the math- ematics of convex subsets of vector spaces. In both cases, we provide archetypal examples in order to clarify the meaning and importance of the concepts intro- duced. No new results are included in this chapter, however the author is not aware of a similarly comprehensive presentation of the mathematical framework of general probabilistic theories in the literature, which has an overview of the relevant convex and conic geometry, and encompasses individual and composite systems. In Chapter 3, we derive a set of results that apply when one relaxes the positiv- ity of outcome probabilities, and demonstrate that this allows for a “quasi-local” simulation of arbitrarily strong non-local states. In many situations, a local model is more convenient, since it bears similarity with the well-explored and intuitive territory of classical probability theory. We also discuss how quasiprobabilities, despite their counter-intuitive nature, have been been useful in the history of quan- tum mechanics, and may be used in future. The content of Chapter 3 is based mostly on a collaboration with Anthony J. Short [43], with additional discussion and some extended results. In 2005,Wim van Dam demonstrated that superstrong non-locality is sufficient to reduce the communication complexity of any distributed computation to a triv- iality, revealing a close relationship between non-locality and information theory. From the perspective of general probabilistic thoeries, this result gives a neat phys- ical principle which allows one to “rule out” correlations which maximally violate Bell's inequality, although not those which violate it to a non-maximal, yet post- quantum - degree. Ruling out this latter class of correlations has been achieved by a recent principle known as Information Causality; in Chapter 4, we examine 23 Information Causality from two distinct perspectives. Firstly, by analysing the probability of success in the Information Causality game and a closely related “in- ner product game”, we prove that a bound on useful entanglement for the task of random access coding given in [3] is tight, as conjectured, and derive a quadratic bias bound on measurement probabilities which seems to capture a great deal of information about the structure of the set of quantum-achievable correlations. Sec- ondly, we show that Information Causality might be more easily interpreted when reformulated in terms of entropies rather than mutual information, and show that the existence of a sensible measure of entropy is just as powerful in helping to “rule out” very non-local correlations. Chapter 4 features a detailed review of the main result of [1]; the remaining results are based on a collaboration with Anthony J. Short [44], with an extended discussion of the results. Quantum phenomena such as entanglement and the tensor product are now seen to be generic features of general probabilistic theories. To understand what makes quantum theory stand out from other general probabilistic theories, it is beneficial to focus on the those features of quantum theory that appear to be rare amongst this set. One such feature is the ability to transform any pure state to any other via a reversible transformation. In Chapter 5, we develop the study of reversible dynamics in the general probabilistic theory known as Boxworld. In order to develop the intuition behind our results, we make use of a matrix-like formulation of Boxworld states and effects which was introduced in [5] to study measurements in Boxworld. We show that the reversible dynamics of composite Boxworld systems are composed merely of relabellings of measurement choices and outcomes, and permutations of subsystems. This extends a remarkable result by Gross et al. [4], which applies to Boxworld systems made up of identical sub- systems. We then discuss the potential for, and the potential limits of, applying the techniques we develop to large classes of general probabilistic theories. This chap- ter features a review of the original result [4], a development of the above men- tioned matrix-like formalism for general probabilistic theories introduced in [5], and the results obtained in [45], in collaboration with Anthony J. Short. 24 In the final chapter, we summarise the results obtained in Chapters 3 - 5, draw out parallels and contrasts between their approaches, and discuss how future work might build upon this research. 25 26 Chapter 2 General Probabilistic Theories In other contexts, physicists have been able to take words from ordi- nary language and use them as technical terms with no great harm done. Take for example the “strangeness”, “charm”, and “beauty” of elementary particle physics. No one is taken in by this “baby talk”…. Would that it were so with “measurement”. But in fact the word has had such a damaging effect on the discussion, that I think it should now be banned altogether in quantum mechanics. “Against “Measurement”” John S. Bell 27 As a model of nature, quantum theory has met with unrivaled experimental verification and widespread acceptance. Part of its appeal lies in its unified, coher- ent account of various phenomena which are inexplicable in the classic Newtonian world-view. Whether talking about the position of a particle or the polarisation of a light-wave, experimental observations in quantum theory are predicted according to a standardised set of operations, involving a standardised set of mathematical objects. Whatever else it is, quantum theory is unarguably a useful and powerful tool for calculating the probabilistic outcomes of experiments. Yet the underlying theory as it is usually presented - the quantum theory of Hilbert spaces and positive definite operators - is plagued with a multitude of in- terpretational issues. It is natural to wonder whether many of these these issues are due simply to the language in which quantum theory is currently written, or whether they are somehow inherent in the probabilities that quantum theory pre- dicts, and thereby intrinsic to the universe. Recent years have seen the develop- ment of general probabilistic theories [2,28-40], providing a framework in which the outcome statistics of experiments become the main focus of attention rather than the underlying physical theory which actually generates those statistics, be it Newtonian physics, quantum theory, or some post-quantum theory that has yet to be discovered. A major benefit of this abstract perspective is that one can make statements not just about quantum theory, but about any conceivable physical the- ory that makes probabilistic predictions about experimental outcomes, including those that may eventually supersede quantum theory. This has profound impli- cations for the study of nature: as discussed in Chapter 1, Bell's famous theorem concerning local theories tells us not only that quantum theory is non-local, but that any theory which generates the same predictions is non-local, and hence na- ture itself is non-local. In this chapter we introduce the framework of general probabilistic theories in two stages. Sections 2.1-2.4 make up the first stage, in which hypothetical physi- cal theories are characterised by the outcome distributions which they predict are achievable in many-party experiments. Section 2.5 makes up the second stage, in 28 which we borrow various results from the mathematics of convex vector spaces in order to construct the framework of convex vector space representations. This framework provides a neat geometric picture of outcome distributions, rendering general probabilistic theories in a language that permits a muchmore general treat- ment of states, measurements and transformations. This allows for many features of quantum theory to be generalised to the full set of general probablistic theories, and consequently allows for the comparison of quantum theory to other theories. 2.1 Bell's Theorem and the CHSH game In this section we outline a proof of Bell's Theorem and provide a more rigorous footing to non-locality. Althoughmost readers will be familiar with the arguments, it is useful to introduce the language and notation that will be employed throughout this thesis. One of the simplest reformulations of Bell's Theorem, introduced by Clauser et al. in 1969 [46], introduces a “CHSH value”, a measure of the correla- tions on the inputs and outputs of a very simple 2-player game, the “CHSH game”. The basic set-up for this game is as follows: suppose that Alice and Bob choose, independently and uniformly at random, one of two inputs, and without communi- cating, generate in some way one of two possible outputs. It is convenient to label both the inputs and outputs using binary digits 0 and 1. Denoting Alice and Bob's inputs by x, y ∈ {0, 1} respectively, and their outputs a, b ∈ {0, 1}, the statistics of the game are characterised by the conditional probability distribution: P (a, b|x, y) . (2.1) The CHSH value is a linear function of this binary, bipartite conditional prob- ability distribution, and plays the role of the figure of merit in the game: the value which Alice and Bob are both attempting to maximise. In order to define it, it is first convenient to introduce a measure of the correlation of the outputs for a 29 specific choice of inputs x and y: Exy = P (0, 0|x, y) + P (1, 1|x, y)− P (0, 1|x, y)− P (1, 0|x, y) = P (a = b|x, y)− P (a 6= b|x, y) . (2.2) (Note that if we had labelled the outputs as 1 and−1 instead of 0 and 1 respec- tively, then Exy would be the expected value of the product a · b, given the choices x and y). The CHSH value is the following linear function of correlators: C = E00 + E01 + E10 − E11. (2.3) Thus, in order to maximise the CHSH value, Alice and Bob should attempt to produce correlated outputs whenever either x or y is equal to 0, but produce anti- correlated outputs whenever x = y = 1. Writing Exy = 2P (a = b|x, y) − 1 = 1−2P (a 6= b|x, y), and using the fact that Alice and Bob each choose their inputs independently and uniformly randomly, the CHSH expression may be simplified: E00 + E01 + E10 − E11 = 2 (P (a = b|0, 0) + P (a = b|0, 1) + P (a = b|1, 0) + P (a 6= b|1, 1))− 4 = 8P (a⊕ b = x · y)− 4, (2.4) where “⊕” and “·” denote addition and multiplication modulo 2. Suppose that in order to process their inputs and generate their outputs, Al- ice and Bob perform measurements on a bipartite system, such that the particular measurement performed is determined by their inputs, and the outcome of themea- surement determines their outputs. Despite having used the word “system” with all its connotations, we are making no claims about what kind of system is being used. For example, we are not committed to the measurement of a quantum sys- tem, although the reader is welcome to imagine that this is the case. We must also give heed to Bell's quotation at the beginning of this chapter, and be careful with what we mean by the word “measurement”. It is not assumed that a pre-existing 30 property of the system is being revealed, merely that each subsystem interacts with some classical system representing the input, and some classical system represent- ing the output. For the moment, as long as a single, definite value is assigned to the input and output for each run of the experiment, then what happens inbetween is none of our concern. Now, if Alice and Bob's subsystems were to operate completely independently of one another, then we should expect that outcome probability depends solely on the local input, and that the joint output probability is given by the product of the local probabilities, P (a, b|x, y) = P (a|x) · P (b|y) . (2.5) If the two subsystems are physically separated at the time of measurement, and we have no reason to believe they are related by some common cause, then we would expect the statistics to behave as a product distribution, obeying (2.5). Now suppose that we want to allow for the fact that a common cause is influ- encing the statistics of both subsystems, despite them being separate at the time of measurement. Perhaps, for example, they interacted in the past in such a way that the outputs are random but perfectly correlated. We can codify this by introducing a random variable λ, which takes values in some setΛwith probability PΛ(λ), and such that given any individual value of λ, the subsystems behave independently, i.e. P (a, b|x, y) = P (a|x, λ) ·P (b|y, λ). The actual experiment statistics are now obtained by averaging over λ, P (a, b|x, y) = ∑ λ∈Λ PΛ(λ) [P (a|x, λ) · P (b|y, λ)] . (2.6) Naively, we might expect that this state of affairs completely characterises the outcome statistics that are possible on physically separated subsystems. If neither can be influenced directly by the other then, up to some common cause, we expect them to behave independently. In short, we expect the outcome distribution to be local, one that can be written in the form 2.6 for some set of local variables Λ. However, it can be shown that any local outcome distribution is itself merely an 31 average over the set of deterministic product distributions (see Section 4.1 for a rigorous proof of this) - there are only 16 such distributions, and a straightforward check demonstrates that all of these obeyP (a⊕b = x·y) ∈ {1 4 , 3 4 } , therefore have CHSHvalues C = ±2. Themaximal CHSHvalue obtainable by local distributions is therefore 2. We now show that Alice and Bob may achieve a CHSH value greater than 2 by employing quantum measurements on a shared entangled state. Let |ψ〉 be the maximally entangled state ( |0〉|0〉+|1〉|1〉√ 2 ) on a bipartite system composed of two qubits, and suppose that Alice and Bob's inputs x and y determine which measure- ment they make on their individual qubit. We will consider measurements of the form n·σ for some unit vector n, whereσ = (σx, σy, σz) is a vector whose compo- nents are the 2-dimensional Pauli operators, and employ the following expectation values: 〈ψ|σx ⊗ σx|ψ〉 = 1 〈ψ|σx ⊗ σz|ψ〉 = 0 〈ψ|σz ⊗ σx|ψ〉 = 0 〈ψ|σz ⊗ σz|ψ〉 = 1 Note that n ·σ is itself a Hermitian operator with eigenvalues±1: suppose that Alice chooses from two possible measurement operators {A0, A1}, and B chooses from {B0, B1}, and each measurement operator is of the form n · σ. Suppose further that their outputs a and b are 0 or 1 depending on whether the outcome of the measurement is +1 or −1 respectively on Alice's or Bob's subsystem. Recall that in this case the correlatorExy is the expectation of the product of the outcomes, Exy = (−1)a+b, which can also be written as 〈ψ|Ax ⊗By|ψ〉. 32 Suppose that Alice and Bob make the following measurement choices: Input 0 1 Alice σx σz Bob ( σx+σz√ 2 ) ( σx−σz√ 2 ) (2.7) Observe that the unit vector n is always zero in the y-coordinate, and the operators Ax and By are represented by unit vectors in the two-dimensional plane spanned by σx and σz. Figure 2.1: Unit vectors representing Alice and Bob's measurement choices. The correlators can be calculated using the above expectation values and lin- 33 earity of the inner product: E00 = 〈ψ|A0 ⊗B0|ψ〉 = 1√ 2 E01 = 〈ψ|A0 ⊗B1|ψ〉 = 1√ 2 E10 = 〈ψ|A1 ⊗B0|ψ〉 = 1√ 2 E11 = 〈ψ|A1 ⊗B1|ψ〉 = − 1√ 2 giving C = E00 + E01 + E10 − E11 = 4√2 = 2 √ 2 > 2. This simple quantum me- chanical operation on entangled qubits generates a correlation between two distant parties that is impossible under the assumption that measurements made at physi- cally separate locations are governed by local distributions. The beauty of the CHSH value is that it is an easily calculated quantity which provides a succinct demonstration of how quantum theory departs from the clas- sical regime, as well as a rough measure of the strength of non-locality which is achievable in any general probabilistic theory. By exploring the maximum obtain- able value that a theory allows for quantities such as the CHSH value, we can set up a hierarchy of general probabilistic theories, based on the degree of non-locality they allow for. Local theories cannot exceed a CHSH value of 2, whereas this may be surpassed by quantum theory, up to a maximum of 2 √ 2 [47]. These values lie well below the algebraic maximum of 4, which is achieved in the maximally non- local theory known as “Boxworld” [48]. 2.2 Black boxes and non-signaling distributions In this section we attempt to extract some of the insights of Bell's Theorem, and begin to explore what can be said about Nature purely in terms of experimental out- comes. Recall our discussion in the previous Section of the word “measurement”, in particular that we are not too concerned with what is really going on between 34 the inputs and the outputs. In fact, it is useful at this stage to think of a system as a “black box” into which the inputs go, and out of which come the outputs. When we later introduce general probabilistic theories, it may be helpful to think of each system as just such a black box; it may also be surprising, from this perspective, to see just how much of the seemingly internal workings of quantum theory admits a faithful operational analogue, i.e. how much can be described purely in terms of input/output probabilities. A basic set of physical notions, or operational primitives, is clearly required in order to say anything about experimental outcomes: we assume that there exist various types of system, that each systemmay be prepared in one of various states, and that one of various measurements may be performed on a system. It goes without saying that the possible states and measurements depend on the type of system in play, and that the likelihood of a given outcome occurring is governed by the state. In adopting an operational attitude, we are interested not in the intrinsic qual- ities of states of systems, such as their Hilbert-space dimension, but merely its observable qualities, i.e. how the state informs the outcomes of measurements. It is therefore useful to introduce the concept of an effect, which is no more than the occurrence of a particular outcome. An effect must belong to the set of pos- sible outcomes for at least one measurement, but in principle multiple measure- ments may share a single effect. The state of a system is completely and uniquely characterised by the probabilities it assigns to individual effects; any two states which assign identical probabilities to the set of effects are considered to be the same state. This assumption neatly encapsulates the operational stance of general probabilistic theories: we should not suppose there to exist distinct states which nevertheless have exactly the same observable qualities. It is common at this point to make a number of finiteness assumptions; these bring convenient simplifications, andmay generally be justified on physical grounds. Firstly, a measurement comprises a finite set of effects, so that states assign dis- crete (rather than continuous) outcome distributions - probability distributions over 35 the outcome-sets of measurements. This restriction makes sense in the context of real-world quantum measurements: the result of an experiment, even after a large number of repeats with extremely refined instruments, can only be recorded to a certain number of significant figures. Although the underlying theory may pre- dict a continuum of outcomes, for example in infinite dimensional Hilbert spaces, the outcome set is in practice always restricted to a finite set. Even if we did have access to infinitesimally accurate instruments, it would still be of interest to under- stand measurement statistics resulting from finite dimensional quantum systems, and this restriction would still be a reasonable one to make. A second finiteness assumption is that for any system there exists a finite set of fiducial measurementswhose outcome distributions suffice to uniquely identify the state of that system. The effects belonging to fiducial measurements are known as fiducial effects; a state is therefore completely characterised by the probabilities it assigns to the finite set of fiducial effects. This means that we restrict ourselves to considering only the statistics generated in finite dimensional quantum theory. Although this is not fully general, many of the quantum phenomena we wish to illuminate, such as entanglement, non-locality and non-contextuality, occur just as prominently in finite dimensions. In this thesis we are interested in seeing what can be demonstrated in this simpler setting, and leave the infinite-dimensional considerations to others. We now make rigorous the above descriptions of measurement and outcome. An individual system, in which an experiment is modelled, allows for a certain number of measurement choices, indexed by a variable x which takes values in the finite set {1, . . . ,M}. For each measurement choice x, a certain number of outcomes are permitted, indexed by a variable a which takes values in the finite set {1, . . . , Kx} (note that the number of possible outcomes may depend on which measurement is performed). For convenience we will make an exception to this rule in the case thatM = K = 2, and will instead label the measurement choices and outcomes using the set {0, 1}. An experiment involving this system is char- acterised by a set of real numbers P (a|x), giving the probability that a is output, 36 conditioned on x having been input (see Figure 2.2). Figure 2.2: “Black box” description of experimental statistics. Now consider a multipartite system comprising N subsystems, each of which can be regarded as an individual system in its own right, with its own set of mea- surements. Suppose that the measurements at subsystem i are indexed by some variable xi which takes values in the finite set {1, . . . ,M (i)}. Each measurement choice allows for one of a finite number of outcomes, indexed by the variable ai which takes values in the finite set {1, . . . , K(i)xi }. Imagine that a measurement is performed individually on each subsystem, simultaneously and without any com- munication between subsystems. This global measurement on the multipartite system is described by a set of real numbers P (a1, . . . , aN |x1, . . . , xN), giving the probability that outcomes a1, . . . , aN occur at systems 1, . . . , N respectively, conditioned on measurement choices x1, . . . , xN having been performed on those systems (see Figure 2.3). Just as a specification of the outcome distributions P (a|x) uniquely charac- terises the experiment on a single system, we assume that a specification of the values P (a1, . . . , aN |x1, . . . , xN) is sufficient to characterise the multipartite ex- periment. This assumption, known variously as local tomography [2, 34], local distinguishability [33], the Local Observability Principle [41] or the Global State 37 Figure 2.3: Multipartite experiment composed of N black boxes. Assumption [29], asserts that there are no additional degrees of freedom in a com- posite system which are not captured by the differing measurement choices for the individual subsystems. The values P (a1, . . . , aN |x1, . . . , xN) correspond to a conditional probability distribution, and in order for them to be physically meaningful they must obey the normal laws of probability. Since each individual value is a probability, it must be positive: 0 ≤ P (a1, . . . , aN |x1, . . . , xN) . (2.8) For any fixed choice of measurements x1, . . . , xN on all subsystems, the probabil- ities for all possible values of outcomes a1, . . . , aN must sum to 1, i.e. the outcome distribution for that choice of local measurements is normalised: ∑ a1,...,aN P (a1, . . . , aN |x1, . . . , xN) = 1. (2.9) (This also implies that each value P (a1, . . . , aN |x1, . . . , xN) is at most 1). We have stated that the conditional probability P (a1, . . . , aN |x1, . . . , xN) re- sults from individual measurements made on separate subsystems, although we have not defined what we mean by “separate”. This operational framework con- cerns only experimental outcomes, and has no a priori notion of distance or simul- taneity. Nevertheless, our goal is to discuss the outcomes of experiments made in the real world, and so we make the assumption that information cannot be trans- 38 mitted from one set of subsystems to another through the process of measurement. A justification for this is that the subsystems may represent physically separated real-world structures; since the local measurements in principle may be carried out simultaneously, there should be no influence from one set of subsystems to another during the measurement procedure. To ensure that information cannot be sent between the subsystems, we demand that a no-signaling condition is satisfied by the outcome distribution P . This con- dition asserts that the outcome statistics for any subsetΩ of theN subsystemsmust not be affected by measurement choices made on those subsystems not belonging to Ω, i.e. there is no way for one subsystem to signal information to any other subsystems. In mathematical terms, we demand that for all Ω ⊂ [N ], the reduced value, ∑ ai:i/∈Ω P (a1, . . . , aN |x1, . . . , xN) , (2.10) is well-defined and independent of the value of each xi for systems i /∈ Ω. If the no-signaling condition is obeyed by P , then we say that P is a non-signaling distribution. Non-signaling distributions have the following property: whatever the choice of measurement made by systems outside of any subset Ω, the marginal outcome distribution on the systems inside Ω is invariant. In fact, the no-signaling condition is equivalent to the existence of a well-defined reduced outcome distri- bution for any subset of N ; writing Ω = {i1, . . . , ir} ⊂ N , the reduced outcome distribution is defined: P (ai1 , . . . , air |xi1 , . . . , xir) = ∑ ai:i/∈Ω P (a1, . . . , aN |x1, . . . , xN) . (2.11) It should be stressed that the no-signaling condition is distinct from the impos- sibility of superluminal propagation of information or matter, which holds in the special theory of relativity. The no-signaling condition is a reasoned assumption stated in terms of outcome statistics, and does not rely on any concept of space- time. On the other hand, the impossibility of superluminal signaling is deduced 39 from the principles of special relativity, and is therefore true in our best model of spacetime. It is difficult to imagine reasonable theories of nature which allow for states that violate the no-signaling condition. The possibility that the reduced state of a subsystem might immediately change according to choices made in a distant location violates any intuitive feeling one has about what the reduced state of a subsystem even means. Conversely, it is not difficult to conceive of theories in which superluminal propagation is in principle possible: non-relativistic quantum mechanics provides one such example, in which the no-signaling condition holds but there is no reason that the movement of a particle should be hampered by the speed of light. In order to simplify notation, we write a and x for the strings (a1, . . . , aN) and (x1, . . . , xN). When discussing non-signaling distributions we use the no- tation aΩ and xΩ for the “reduced strings” (ai1 , . . . , air) and (xi1 , . . . , xir), and write P (aΩ|xΩ) for the reduced outcome distribution defined by (2.11). The no- signaling condition can then be expressed in a compact form: for any Ω ⊂ [N ], measurement-choice strings x, x′ satisfying xΩ = x′Ω, and fixed choice of outcomes aΩ, ∑ ai:i/∈Ω P (a|x) = ∑ a′i:i/∈Ω P (a′|x′) = P (aΩ|xΩ) , (2.12) where a′ denotes the string whose components are ai for i ∈ Ω and a′i for i /∈ Ω. The following Proposition demonstrates that in order to checkwhether a known outcome distribution is non-signaling, it suffices to verify that the non-signaling condition holds for all Ω with |Ω| = 1, i.e. that it is not possible for any single subsystem to signal to the rest via the choice of measurement on that subsystem. Proposition 1. Suppose that the outcome distribution P (a|x) satisfies the condi- tion that for all subsystems i and distinct measurement choices xi 6= x′i on subsys- 40 tem i, Kxi ∑ ai=1 P (a1, . . . , ai, . . . , aN |x1, . . . , xi, . . . xN) = Kx′i ∑ a′i=1 P (a1, . . . , a′i, . . . , aN |x1, . . . , x′i, . . . xN) . (2.13) Then P is a non-signaling outcome distribution. Proof. Suppose that x and x′ are measurement-choice strings, satisfying xΩ = x′Ω for a given subset Ω of the N subsystems. Fix a choice of outcomes aΩ for the subsystems belonging toΩ. Wewish to show that summing over the set of possible outcomes for all subsystems not inΩ leads to the same value, whether the outcome distribution is taken with respect to x or with respect to x′, i.e. ∑ ai:i/∈Ω P (a|x) = ∑ a′i:i/∈Ω P (a′|x′) . (2.14) As strings of numbers, x and x′ differ in at most s = N − |Ω| components (those not belonging to Ω). Writing [N ] \ Ω = {i1, . . . , is}, it is possible to change s components of x one at a time, terminating at x′: x = x0 → xi1 → · · · → xis = x′. (2.15) However, by varying the order of summation on the LHS of (2.14) and using 41 property (2.13), we have that, ∑ ai:i/∈Ω P (a|x) = ∑ ai:i/∈Ω∪{i1} ∑ ai1 P (a|x) = ∑ ai:i/∈Ω∪{i1} ∑ ai1 P ( a|xi1 ) = ∑ ai:i/∈Ω∪{i2} ∑ ai2 P ( a|xi2 ) ... ... = ∑ ai:i/∈Ω∪{is} ∑ ais P ( a|xis ) = ∑ a′i:i/∈Ω P (a′|x′) . (2.16) 2.3 Examples of probabilistic theories Any theory of nature must provide some means for predicting experimental out- comes. By virtue of this fact, any experiment described by that theory may con- sequently be described by a “black box” scenario as outlined in the previous Sec- tion, in which we ignore the internal workings of the box (i.e. the mathematical operations which generate predictions in that theory). A state in the theory must have some mechanism for specifying outcome probabilities for any measurement allowed by the theory; thus it assigns probabilities to effects, or possible measure- ment outcomes. Furthermore, we assume that any two distinct states must be dis- tinguishable via their outcome statistics for at least one measurement; thus distinct states assign distinct probabilities to at least one effect. As we have previously dis- cussed, we will assume that there exists a finite set of fiducial measurements, each consisting of a finite set of fiducial effects, such that distinct states assign distinct probabilities to at least one fiducial effect. 42 Note that the positivity (2.8), normalisation (2.9) and non-signaling (2.10) con- ditions are all clearly preserved under convex combinations. That is to say, if the number of subsystems, measurements and outcomes is fixed, the outcome distri- butions P1, . . . , Pr satisfy all three conditions, and {p1, . . . , pr} is any probability distribution, then the outcome distribution defined by: P (a|x) = r ∑ i=1 piPi (a|x) , (2.17) also satisfies all three conditions. Hence the set of non-signaling outcome distri- butions forms a convex set, which we will explore in more detail in Section 2.4. It is often convenient to specify a physical theory by explicitly delineating the set of outcome distributions which are obtainable within that theory; generally speaking, this involves a threefold specification: firstly, which types of individual system are described by the theory, including their fiducial measurements; sec- ondly, for each individual system a specification of which outcome distributions are obtainable on the fiducial measurements; and finally a specification of which joint distributions are permitted in composite systems. A general probabilistic theory is just such a specification of systems and outcome distributions. We will demand that the set of distributions in a general probabilistic theory is always con- vex: the reason for this is that for any set of states -- with outcome distributions P1, . . . , Pr -- that can be prepared in a system, it should be possible to prepare the state whose outcome distribution is Pi with probability pi, so that (2.17) gives the outcome distribution after such a preparation, and hence also corresponds to a state in the theory. By analogy to the pure states of quantum theory, the extreme states of a general physical theory are those that give rise to extremal points of the convex set of outcome distributions: they are those states which cannot be written as a convex sum of distinct states of the theory (see Section 2.5.1 for a rigorous treatment of these notions). It should finally be noted that there is a distinction between “black boxes” and full general probabilistic theories. The black box scenario is simply a descrip- 43 tion of inputs, outputs, and the subsequent outcome distributions. This is useful, for example, for defining the non-signaling principle and the CHSH value purely in terms of such distributions. General probabilistic theories, on the other hand, whilst often described in terms of outcome distributions, include a characterisa- tion (usually based on physical principles) of which local and composite outcome distributions are possible, as well as concepts like states, effects, fiducial measure- ments and so on. As in quantum theory, general probabilistic theories often allow for measurements of effects other than fiducial effects. 2.3.1 Classical theory In classical theory, individual systems have a single fiducial measurement, and any outcome distribution on that measurement constitutes a permitted state. The fiducial measurement captures full information about the system. An example of a classical system is a die: the fiducial measurement corresponds to observing the upturned face and a fair die (once rolled) represents the uniformly distributed (or completely mixed) state. A general state of the system is given by a probability distribution (p1, . . . , p6). The extreme states are the deterministic states si, which give outcome iwith certainty, e.g. the state s1 = (1, 0, . . . , 0): it is clear that these cannot be written as a convex combination of distinct states, and conversely that any state is a convex combination of the extreme states. The joint states of composite classical systems are convex combinations of product states. Product states are given by a product of outcome distributions at each site: P (a|1) = P (1) (a1|1) · · ·P (N) (aN |1) = N ∏ i=1 P (i) (ai|1) , (2.18) where xi = 1 for all i since we only have one fiducial measurement. Clearly, this distribution can be achieved simply by preparing the state P (i) at each subsystem 44 i. A convex combination of states of the form (2.18) can be described by a ran- dom variable λ = (λ1, . . . , λN) and an associated probability distribution PΛ(λ). Each λi is a local variable taking values from some finite set Λi, and each value of λi corresponds to some outcome distribution P (i) (ai|1, λi) that is allowed at subsystem i. The joint states of multipartite classical systems are exactly those of the form: P (a|1) = ∑ λ PΛ(λ) N ∏ i=1 P (i) (ai|1,λi) . (2.19) 2.3.2 Quantum theory In quantum theory, outcome distributions are generated bymeasurements on quan- tum states. A quantum system, withM fiducial measurements and Kx outcomes for measurement x, consists of a finite-dimensional Hilbert space H and for each x a set of POVM elements {Ma|x} satisfyingMa|x ≥ 0 and ∑Kx a=1Ma|x = I. The outcome distributions P (a|x) allowed in the system are generated from density operators ρ ∈ D (H) according to the Born trace rule: P (a|x) = Tr ( Ma|xρ ) (2.20) On any finite-dimensional Hilbert space there always exists a finite set of fidu- cial POVM measurements, such that distinct density operators will generate dis- tinct probabilities for at least one fiducial POVM element [49]. In general, at least D2 independent POVM matrices are required to specify a single density opera- tor in a D-dimensional Hilbert space - however, the choice of POVM elements and how they are split up into distinct measurements is not determined by the di- mension D. In order to define a quantum system in terms of general probabilistic theories, therefore, one needs to specify both the dimension of the Hilbert space, and the particular fiducial measurements being employed. 45 The simplest example of a quantum system is a qubit, with one possible set of fiducial measurements being the three projective measurements onto the eigen- bases of the Pauli operators σx, σy and σz. Listing these measurements as x = 1, 2 and 3 respectively, and listing their two possible outcomes as a = 1 or 2, the state of the qubit is completely specified by the vector: ( P (1|1) , P (2|1) P (1|2) , P (2|2) P (1|3) , P (2|3) ) . (2.21) A qubit that is in the “up” state of the x-direction is represented by the vector (1, 0|12 , 1 2 | 1 2 , 1 2), whereas the vector (1, 0|1, 0|1, 0) is not achievable by any qubit state. Multipartite quantum systems consist of a collection of N quantum systems with Hilbert spaces H1, . . . ,HN , and with the fiducial measurements at system i given by the POVM operators {M (i)ai|xi}. The set of multipartite outcome distri- butions on the composite system is generated by local POVM measurements on a joint quantum state ρ ∈ D (H1 ⊗ · · · ⊗ HN): P (a|x) = Tr (( M (1)a1|x1 ⊗ · · · ⊗M (N) aN |xN ) ρ ) . (2.22) Any outcome distribution resulting from a multipartite quantum measurement is non-signaling: this is due to the fact that ∑ ai M (i) ai|xi = I (i), regardless of i and the choice of measurement xi. Indeed, fixing i and xi, we see that: Kxi ∑ ai=1 Tr (( M (1)a1|x1 ⊗ · · · ⊗M (N) aN |xN ) ρ ) = Tr    M (1)a1|x1 ⊗ · · · ⊗   Kxi ∑ ai=1 M (i)ai|xi  ⊗ · · · ⊗M (N)aN |xN   ρ   = Tr (( M (1)a1|x1 ⊗ · · · ⊗ I (i) ⊗ · · · ⊗M (N)aN |xN ) ρ ) , (2.23) which is clearly independent of the vaule of xi. It follows from Proposition 1 that 46 P (a|x) is non-signaling. The set of outcome distributions generated by a fixed composite quantum sys- tem is subtly different from the set of non-signaling outcome distributions (with a set number of measurement choices and outcomes) which can be achieved by measurements of arbitrary quantum systems. In the former case, we have already explicitly defined each subsystem (i.e. their Hilbert spaces and fiducial measure- ments); on the other hand, we may say that an outcome distribution P (a|x) for some pre-defined number of subsystems, measurement choices and outcomes is quantum achievable if there exist some set of finite-dimensional Hilbert spaces H1, . . . ,HN , POVM operators {M (i)ai|xi} and some state ρ ∈ D (H1 ⊗ · · · ⊗ HN) such that (2.22) holds. 2.3.3 Boxworld In an individual Boxworld system, any set of distributions over the outcomes of the fiducial measurements is permitted: the state space is constrained only by the positivity and normalisation conditions: any conditional probability distri- bution P (a|x) corresponds to an allowed state. Thus the outcome distribution (1, 0|1, 0|1, 0) which was not allowed in the qubit system, is a perfectly valid Boxworld distribution. The set of Boxworld distributions for a single system is a convex polytope, whose extremal vertices are the deterministic states, i.e. those which, conditional on the measurement choice, produce one output with certainty. AnyBoxworld systemwith only one fiducial measurement is equivalent to a classi- cal system. The Boxworld system with two fiducial measurements, each of which has two possible outcomes is referred to as a g-bit system, which stands for gener- alised bit. This systemmay loosely be seen as a Boxworld analogue of the classical bit and quantum qubit systems. As with classical theory, the joint state space of composite Boxworld systems should include at least those states which are convex combinations of products of local states. A multipartite Boxworld state with outcome distribution P (a|x) is 47 local if there exists a random variableλ = (λ1, . . . , λN) taking values in a product of finite sets Λ1 × · · · ×ΛN , an associated probability distribution PΛ(λ), and for each value of λi ∈ Λi a local outcome distribution P (i) (ai|x1, λi) such that: P (a|x) = ∑ λ PΛ(λ) N ∏ i=1 P (i)(ai|xi, λi). (2.24) The set of local Boxworld outcome distributions also forms a convex polytope, whose extremal vertices are given by products of deterministic states; such states are known as pure product states. Any outcome distribution which cannot be writ- ten as a convex combination of pure product states (2.24) is said to be non-local. A multipartite density operator ρ is said to be a local quantum state if any local POVMmeasurement on ρ results in a local outcome distribution being generated: thus, separable states are local quantum states, whereas the Bell states are non- local quantum states (since there exist local POVM measurements on them which violate the CHSH inequality). Not all valid non-signaling distributions in composite Boxworld systems ad- mit a local description (2.24). For example, the outcome distribution generated by CHSH-violating measurements on a Bell state is non-signaling, but non-local. Boxworld is the general probabilistic theorywhose systems are as described above, and whose multipartite states are all those outcome distributions which obey the positivity (2.8), normalisation (2.9) and non-signaling (2.10) conditions. Since these three conditions are represented by sets of linear inequalities, the set of out- come distributions in a composite Boxworld system forms a convex polytope. Any composite quantum systemwith has the same number of fiducial measurements on each local subsystem is capable of generating a subset of the Boxworld outcome distributions (as we will see, this turns out to be a strict subset). The first known example of a non-signaling, non-quantum-achievable out- come distribution is the celebrated PR-box state on a bipartite system comprising two g-bit subsystems [48]. It is convenient to label the measurement choices and outcome sets of a g-bit as binary numbers {0, 1} rather than {1, 2}, though this 48 change in labeling does not affect the outcome probabilities, and therefore has no operational significance. In binary notation, the PR-box is defined by: P (a1, a2|x1, x2) = 1 2 δ(a1 ⊕ a2 = x1 · x2), (2.25) where “⊕” and “·” denote addition and multiplication modulo 2. This outcome distribution is clearly seen to be non-signaling, as the outcome on a single system is always uniformly random, regardless of the measurement choice on that or the other system. Note also that a1 and a2 are perfectly correlated for every choice of fiducial measurement except for x1 = x2 = 1, in which case they are perfectly anti-correlated. Hence the CHSH value takes its algebraic maximum of 4 on the PR-box state: C1 = E00 + E01 + E10 − E11 = 1 + 1 + 1− (−1) = 4. (2.26) where Exy was defined in Section 2.1 as P (a = b|x, y)− P (a 6= b|x, y) The PR-box state is in fact one of the extremal vertices of the convex polytope of states described by Boxworld for two g-bit systems [30]. This is the simplest example of a composite system in which the sets of locally achievable, quantum- achievable and non-signaling outcome distributions are distinct; we explore the geometry of these sets in more detail in Section 2.4. The state of a system of two g-bits may be depicted by means of a table or matrix, whose rows correspond to outcomes of fiducial measurements on system 1, andwhose columns correspond to outcomes of fiducial measurements on system 2. The entry in the row correspond- ing to (a1, x1) and column corresponding to (a2, x2) is the value P (a1, a2|x1, x2). The PR-box state is then represented by the following table: 49 x2 = 0 x2 = 1 a2 = 0 a2 = 1 a2 = 0 a2 = 1 x1 = 0 a1 = 0 1/2 0 1/2 0 a1 = 1 0 1/2 0 1/2 x1 = 1 a1 = 0 1/2 0 0 1/2 a1 = 1 0 1/2 1/2 0 (2.27) Increasing the number of measurements on a subsystem will generate more blocks of the table, whereas increasing the number of outcomes specific to a mea- surement will increase the size of the blocks corresponding to that measurement. An arbitrary bipartite Boxworld system need not have the same number of blocks across as it has down, and those blocks need not be square. Certain properties of the table must remain invariant however: all entries must be positive, and the sum of the entries in an individual block must be equal to 1 (due to the positivity and normalisation conditions). Each row of the table is subdivided into a set of smaller “block-rows”; in the above example the third row is subdivided into two block-rows, (1/2, 0) and (0, 1/2). Due to the non-signaling condition the sum of entries across each of the block-rows is invariant across a row (and is equal to 1 2 for all rows in the above PR-box example). The same is true for columns and block-columns. Whilst 2-dimensional tables are sufficient to describe all bipartite Boxworld distributions, an n-partite Boxworld system will generally be described by an n- dimensional array. This representation thus becomes infeasible for obvious rea- sons when the number of subsystems exceeds 3; however, since many of the inter- esting Boxworld phenomena occur with 3 or fewer subsystems, this does not pose too great a problem. 50 2.4 The non-signaling polytope The full set of outcome distributions P (a|x) for a composite Boxworld system can be naturally represented by vectors, whose components are indexed by all possible strings (a|x) of inputs and outputs at each subsystem. The allowed set of non-signaling vectors then forms a bounded polytope, due to the linear con- straints imposed by the positivity, normalisation and non-signaling conditions, which is often known as the Non-signaling Polytope. In this naive representa- tion of the Non-signaling Polytope, the vectors are of length ∏N i=1 ( ∑M(i) xi=1Kxi ) , but generally belong to a lower-dimensional vector subspace. Even for the bipar- tite g-bit system in which the PR-box lives, these vectors have 16 components, but lie in an 8-dimensional vector subspace [30], which is, for example, orthogonal to the vector e1 + e2 − e3 − e4, since the non-signaling condition demands that P (0, 0|0, 0) + P (0, 1|0, 0) = P (0, 0|0, 1) + P (0, 1|0, 1). The extremal local states of the Non-signaling Polytope, i.e. the pure product states, are represented by vectors whose only non-zero entries are equal to 1. In a system of two g-bits, exactly four of the entries of a pure product state are equal to 1, since each of the four possible global measurement choices (x1, x2) results deterministically in an outcome (a1, a2). The number of extremal local states is equal to the product of the number of deterministic states on each individual g-bit system. There are four deterministic states of a g-bit system, corresponding to the four possible maps from {0, 1} to itself; this gives 16 extremal local states: P µνστL (a1, a2|x1, x2) = δ ( a1 = [µx1]⊕ [ν] ) · δ ( a2 = [σx2]⊕ [τ ] ) (2.28) µ, ν, σ, τ ∈ {0, 1}. Such states, being {0, 1}-valued, are clearly extremal in the set of non-signaling states; however, they do not constitute the entire set of extremal non-signaling states. The PR-box state described in Section 2.3.3 is an example of an extremal non-local vertex of the Non-signaling Polytope [30]. By performing a simple lo- cal operation - for example, flipping the bit x1 at subsystem 1 - the PR-box is con- 51 verted into a new outcome distribution - in this case defined by P (a1, a2|x1, x2) = 1 2δ(a1 ⊕ a2 = [x1 · x2]⊕ x2). This is also an extremal state, in which the outputs are anti-correlated for all choices of measurement except x1 = 0, x2 = 1. Via this type of local operation on inputs and outputs, 8 distinct extremal non-local distributions can be obtained: P µνσNL (a1, a2|x1, x2) = 1 2 δ ( a1 ⊕ a2 = [x1 · x2]⊕ [µx1]⊕ [νx2]⊕ [σ] ) (2.29) µ, ν, σ ∈ {0, 1}. The CHSH value C1 takes a maximal value of 2 in the set of local outcome distributions, a maximal value of 2 √ 2 in the set of quantum-achievable outcome distributions, and a maximal value of 4 in the full set of non-signaling outcome distributions. Note that for each value of (x1, x2) the bias, Ex1x2 = P (0, 0|x1, x2) + P (1, 1|x1, x2)− P (0, 1|x1, x2)− P (1, 0|x1, x2) , (2.30) is a linear functional on the set of vectors, hence the CHSH value is itself a linear functional. So too is the similarly defined figure of merit: C2 = E00 − E01 + E10 + E11, (2.31) for which the maximal values taken by local, quantum and non-signaling outcome distributions are the same as for C1 (but are not necessarily achieved on the same outcome distributions). For example, C1 achieves its algebraic maximum of 4 and minimum of -4 on the non-local states P 000NL and P 001NL respectively, whereas C1 achieves the same values on the non-local states P 010NL and P 011NL respectively. All other extremal non-local states achieve 0 for both C1 and C2, whereas all extremal local states achieve±2 for both C1 and C2. By plotting C1 and C2 on x- and y- axes, one obtains a 2-dimensional projection, or slice, of the Non-signaling Polytope (see figure 2.4). The set of quantum-achievable states in this projection is described by the for- 52 mula C21 + C22 ≤ 8. A derivation of this inequality from Information Causality is given in Section 4.2.2, although it can also be derived from the fact that the set of biases Exy is quantum achievable if and only if: |E00E01 − E10E11| ≤ ∑ y=0,1 √ (1− E20y)(1− E21y). (2.32) Figure 2.4: 2-dimensional projection of the non-signaling polytope for two g-bits. Many of the beguiling aspects of quantum non-locality are captured by the simple diagram illustrated in Figure 2.4. The CHSH and Tsirelson inequalities demonstrate that for some figures of merit, the optimal quantum value lies strictly between the optimal local value and the maximum algebraic value. Figure 2.4 reveals a more complex story: even in this simple two g-bit case, the quantum achievability of correlations describes a convex set whose boundary traces a mys- terious path between the local boundary and the non-signaling boundary; touching both at various points, but for the most part lying somewhere in the middle. What governs the shape traced out by the quantum boundary? What aspects 53 of nature can elucidate its form, not just in the simple 2-dimensional slice of two g-bit systems illustrated above, but for any number of systems and measurement choices? A definitive answer to this question is not yet known, though various at- tempts have led to partial answers which shed light on the issue, notably the prin- ciples known as Macroscopic Locality [50] and Information Causality [51], and arguments based on the non-local computation of binary-valued functions [52]. The author's own contribution to this research is described in Chapter 4, in which an investigation of the Information Causality principle yields two distinct perspec- tives on why the quantum boundary takes the shape that it does. 2.5 Convex vector space representations In Section 2.4 we described a natural embedding of the states of a general proba- bilistic theory into a vector space. This method is just one possible (and usually inefficient) vector space representation of a system, although it does serve the useful purpose of illustrating that such a representation always exists, and can be realised in a real, finite-dimensional vector space as long as the set of fiducial measurements is finite. Note that a very elegant description of this form of finite- dimensional quantum theory has long been established: the well-known Bloch- sphere representation of a qubit system [53], in which a 2-dimensional density operator corresponds to a vector with modulus less than or equal to one. The Bloch-sphere representation differs from the embedding discussed in Section 2.4, in which each component is equal to a fiducial outcome probability; for example, the completely mixed state is represented by a three-dimensional zero vector in the Bloch-sphere, and by a six-dimensional vector with every entry equal to 12 in the representation given in Section 2.4. Nevertheless, these representations are closely related, and describe the same basic set of quantum states. In order to understand the structure of a given general probabilistic theory, and to compare it with quantum theory, it is beneficial to study vector space represen- tations of that theory, thus describing it in a language closely resembling that of 54 quantum theory, about which we already have a good deal of intuition. By ex- amining theories from this perspective, we can highlight those features which are central to a good description of real life experiments, and those features which differ markedly from quantum theory. It is often the case, for example, that quan- tum phenomena which appear special when contrasted with classical physics are partly or entirely generic in the set of general probabilistic theories; teleportation, entanglement-swapping and the impossibility of cloning are but a few such phe- nomena [31, 54]. On the other hand, many properties are particular to a select group of theories. One such property is the existence of a rich class of reversible transformations, such that a pure state of a system may be mapped to any other pure state by a reversible transformation [2, 55] (in quantum theory, by a unitary operator). An- other such property is self-duality, in which the set of vectors representing states and the set of vectors representing effects are essentially the same [56, 57] (in quantum theory, these are the positive operators of the underlying Hilbert space). These properties are not at all obvious from the description of systems in terms of fiducial measurements, but become much more approachable using vector space representations. By skipping the black box scenario and constructing probabilistic models directly from their vector space representation, it is straightforward to find theories which share some but not all of the properties of quantum theory, so that conjectures concerning these properties can be more easily generated and tested (the so-called polygon models, which link non-locality and the property of strong self-duality, are a good example [36]). Exploring which features of general proba- bilistic theories are possessed by quantum theory helps to build our intuition about its capabilities, which is an invaluable step towards understanding its information processing potential. Various constructions of general probabilistic theories and their vector space representations exist, each differing slightly from the others due to different op- erational assumptions, conventions or mathematical convenience. The treatment given here tries to be as general as possible, whilst retaining the ability to describe 55 classical, quantum and Boxworld theories on an equal footing so that their respec- tive features can be highlighted and contrasted. In Section 2.5.1 we introduce the mathematical prerequisites for this framework, most of which belong to the realm of convex geometry (a similar and very readable introduction can be found in [37]). In Sections 2.5.2 and 2.5.3 we describe how one or more systemsmay be described via a real, finite-dimensional vector space. In Section 2.5.4 we then discuss how the notion of transformations can be introduced in this setting and demonstrate that unlike quantum theory, Boxworld does not permit entanglement to be reversibly generated. We then provide some concrete examples of these representations in Section 2.5.5. It is worth mentioning that the convex vector space approach to general prob- abilistic theories is but one node in a network of efforts to put quantum theory on a more fundamental footing than is provided by the complex Hilbert space for- malism. Other notable instances include the quantum logic approach pioneered by Piron, Foulis and Randall [58, 59], and the categorical approach pioneered by Abramsky and Coecke [60]. The former of these sets a precedent for analysing Hilbert-space quantum mechanics as a probability calculus with axioms differing from standard probability but of interest in its own right. The latter formulates a “categorical semantics” for quantum theory, in which quantum theory is inter- preted as a dagger compact category, whose morphisms correspond to physical processes. Both these approaches are novel, and present a fascinating vantage point, but it can be argued that they lack the conservatism and intuitive physical appeal of the approach we now outline. 2.5.1 Convex geometry Much of the discussion of general probabilistic theories relies on fundamental con- cepts arising in the geometry of convex subsets of real vector spaces. Our interest in convexity stems from a simple argument that any set of vectors corresponding to preparable states of a system must be convex: if the vectors s1 and s2 correspond to two distinct states in which the system may be prepared, then ps1 +(1− p)s2 is 56 the state corresponding to having prepared the system in state s1 with probability p, and in state s2 with probability 1− p. Definition. Let C be a subset of Rn. • The convex hull of C is conv(C) = { ∑m i=1 λici : λi ≥ 0, ∑ λi = 1, ci ∈ C}. • The conic hull of C is cone(C) = { ∑m i=1 λici : λi ≥ 0, ci ∈ C}. • C is convex if C = conv(C), and C is a cone if C = cone(C) For example, the positive quadrant of Rn is convex and a cone. All cones are convex, however bounded convex sets, such as the unit ball, are convex without being a cone. In the Bloch-sphere, pure quantum states are assigned vectors on the surface of the sphere: these are the extreme points of the sphere. In order to generalise this notion to all general probabilistic theories it will be important to define extreme points of convex sets, and extreme rays of cones. Definition. Let C ⊂ Rn be convex; • x ∈ C is an extreme point if, whenever c1, c2 ∈ C and 0 < λ < 1 are such that x = λc1 + (1− λ)c2, then c1 = c2 = x. • Denote by ext(C) the set of extreme points of C. Definition. Let C ⊂ Rn be a cone; • For x ∈ Rn, the ray generated by x is defined Rx = {λx : λ ≥ 0}. For x ∈ C, Rx is an extreme ray if, whenever c1, c2 ∈ cone(C) and λ1, λ2 > 0 are such that λ1c1 + λ2c2 = x, then c1 and c2 both belong to Rx. We say that x generates an extreme ray of C. • Denote by extray(C) the set of extreme rays of C. 57 The following well-known Theorems concerning extreme points are very use- ful when discussing convex sets. Recall that in Rn, C is compact iff it is closed and bounded. Theorem 1 (originally due to Minkowski, see [61]). Let C ⊂ Rn be compact and convex. Then, C = conv(ext(C))). in particular, ext(C) is non-empty, and its convex hull is the whole of C. Theorem 2 (Carathéodory, [62]). Let C ⊂ Rn be compact. Then conv(C) is also compact. Theorem 3 (Carathéodory, [62]). Let P ⊂ Rn and let x ∈ conv(P ). Then there exists a subset Px ⊆ P with |Px| ≤ n+ 1 such that x ∈ conv(Px). Clearly the unit sphere has infinitely many extreme points, whereas a tetrahe- dron or a square has only finitely many. Theories in which sets of states possess only finitely many extreme points are of interest since they are easy to define, and yet their non-locality properties are often non-trivial [36]. Definition. A subsetC ⊂ Rn is a polytope (resp. polytopic cone) if it is the convex (resp. conic) hull of finitely many points. Theorems 1 and 2 tell us that equivalently, a compact, convex subset C ⊂ Rn is a polytope if ext(C) is finite. Note also that if P is a finite set of points, then ext(conv(P )) ⊂ P . Hence one way of defining a polytope is to give a finite set of points which makes up its extreme points - this is known as the V-representation of a polytope. Another standard way to define a polytope is by giving a finite set of half-spaces of Rn whose intersection is bounded - this is known as the H- representation of a polytope [63]. Definition. A half-space of Rn is a set of the form H = {v ∈ Rn : 〈x, v〉 ≤ b} for some x ∈ Rn and b ∈ R. If b = 0 thenH is a linear half-space. 58 Theorem 4 (see [63]). A compact subset C ⊂ Rn is convex if and only if it can be written as an intersection of half-spaces. It is a polytope if and only if it can be written as the intersection of a finite set of half-spaces. Definition. Let C be any subset of Rn. The dual cone C∗ is the cone defined by C∗ = {x ∈ V : 〈x, y〉 ≥ 0 ∀y ∈ C} (2.33) For example, the dual cones of rays are half-spaces, and vice versa. The concept of a dual cone becomes useful when talking about effects, and particu- larly when constructing states and effects of composite systems. Recall that Rn comes equiped with the standard, Euclidean inner product, and the correspond- ing Euclidean norm: this allows us to define a topology on Rn via the metric d(x, y) = ‖x− y‖2. In this context, C∗ is a closed cone for any C ⊂ V (even if C itself is neither a cone nor closed), and hence (C∗)∗ is a closed cone. Moreover, since every vector in C has non-negative inner product with every vector in C∗, we also have that C ⊆ (C∗)∗ . If C itself is a closed cone then it turns out that C = (C∗)∗; this will be useful later in establishing duality relations between sets of states and effects. Proposition 2. Let C be a closed cone in Rn. Then (C∗)∗ = C. Proof. As argued above, clearly C ⊆ (C∗)∗. In order to prove that (C∗)∗ ⊆ C it suffices to show that x /∈ C ⇒ x /∈ (C∗)∗; equivalently, for x /∈ C there exists z ∈ C∗ such that 〈z, x〉 < 0. Given x /∈ C, the function f : V → R f(v) = ‖x− v‖, is continuous and bounded below by 0. Moreover, picking an arbitrary c ∈ C, the setD = C ∩ {d ∈ Rn : f(d) ≤ f(c)} is the intersection of two closed sets, and is bounded. Being closed and bounded in a finite-dimensional space, D is therefore compact. Hence f attains its minimum value inD at a point y: clearly f(y) is the 59 minimum value of f over C also. Let z = y − x: either y = 0 or it must be that y is the closest non-zero point to z on the ray Ry, hence 〈z, y〉 = 0. Moreover, since, 0 < ‖y − x‖2 = 〈y − x, y〉 − 〈y − x, x〉 = −〈z, x〉, (2.34) it must also be that 〈z, x〉 < 0. It remains to show that z is indeed a member of C∗, i.e. for all v ∈ C, 〈z, v〉 > 0. Suppose for contradiction that y′ ∈ C satisfies 〈z, y′〉 < 0. Define the vector, yε = (1− ε)y + εy′, (2.35) noting that for all 0 ≤ ε ≤ 1, yε is a member of C. We will show that for suf- ficiently small ε, ‖yε − x‖2 < ‖y − x‖2, i.e. f(yε) < f(y), contradicting the definition of y. More precisely, we demand ε < 2〈z, y − y ′〉 ‖y − y′‖2 , (2.36) noting that both numerator and denominator are necessarily positive. By the Law of Cosines, ‖yε − x‖2 = ‖y − x‖2 + ‖y − yε‖2 − 2〈y − x, y − yε〉 = ‖y − x‖2 + ε2‖y − y′‖2 − 2ε〈z, y − y′〉 < ‖y − x‖2, (2.37) as desired. Definition. Let C be a cone in Rn. The generalised inequality associated with C is the partial order ≤C on the vectors in Rn given by x ≤C y if ∃z ∈ C such that x+ z = y. For example, ifC is the cone defined by the positive quadrant ofRn, then x ≤c y iff every component of x has a lesser or equal value than the same component of y. The partial order induced by any cone C is transitive (in the sense that x ≤C y and 60 y ≤C z implies x ≤C z), reflexive (in the sense that x ≤C x) and antisymmetric (in the sense that x ≤C y and y ≤C x implies x = y). Conversely, any such partial ordering inRn induces a coneC = {v ∈ Rn : v ≥ 0}, however this is not relevant to our discussion. Once a partial order is established, an analogue of an interval on the real line may be defined, viz. Definition. Let C ⊂ Rn be a cone. For any x, y ∈ Rn, the order interval [x, y]C is the set {z ∈ Rn : x ≤C z ≤C y}. For example, if C is the cone defined by the positive quadrant in Rn, then the interval [0, 1] between the zero vector and the vector with 1 in every component is the hypercube of vectors for which every component has a value between 0 and 1. On the other hand, the interval [0, e1], where e1 is the first standard basis vector, is simply the line segment between 0 and e1. Clearly, some order intervals have a richer structure than others - the following definition characterises those vectors u which generate interesting order intervals [0, u]. Definition. LetC ⊂ Rn be a cone. u ∈ C is an order unit ofC if, for any v ∈ Rn, there exists a λ > 0 such that v ≤C λu. In some sense an order unit provides a benchmark by which the “order mag- nitude” of any vector can be evaluated, in much the same way as the unit element 1 does on the real line. If u is an order unit in C then for all other v ∈ C, there exists a λ > 0 such that 1λv is contained in the interval [0, u], hence this interval inherits something of the structure of the cone C. Not all cones possess an order unit: for example, a ray contains an order unit only in 1-dimensional space. Proposition 3. Let C ⊂ Rn be a closed cone. If u is an order unit in C∗, then D = {c ∈ C : 〈u, c〉 = 1} is a compact, convex subset of C. Proof. SinceD is an intersection of closed and convex sets, it is closed and convex itself. It remains to show that D is bounded: suppose for contradiction that there exists a sequence {di} ⊂ D such that ‖di‖ → ∞. For any vector v, there exists a λ such that v ≤C∗ λu, i.e. 〈v, d〉 ≤ λ for all d ∈ D. The sequence of real numbers 61 〈d1,di〉 ‖d1‖‖di‖ therefore tends to zero as i tends to infinity, hence, defining i1 = 1, for any ε > 0 there exists a positive integer i2 such that 〈di1 ,dj〉 ‖di1‖‖dj‖ < ε for all j > i2. Continuing iteratively, for any  > 0 it is possible to find a set of n+1 vectors {d′1, . . . , d′n+1} such that 〈d′i,d′j〉 ‖d′i‖‖d′j‖ < ε for all 1 ≤ i, j ≤ n+ 1. The angle between d′i and d′j is given by: θij = cos−1 ( 〈d′i, d′j〉 ‖d′i‖‖d′j‖ ) (2.38) Thus there exist sets of n+1 vectors whose pairwise angles can be taken to be arbi- trarily close to pi2 , contradicting the fact that we are in an n-dimensional Euclidean space. 2.5.2 Abstract state spaces We established in Section 2.4 that the states of any system with finitely many fiducial measurements may be represented by vectors in a real, finite-dimensional vector space. Abstract state spaces provide a broad framework for describing sys- tems in general probabilistic theories in terms of real, finite-dimensional vector spaces, without first having to introduce canonical sets of fiducial measurements and the outcome distributions which correspond to allowed states. Whilst any sys- tem defined in terms of fiducial measurements admits at least one representation in this manner, the appeal of the abstract state space is that there is no preferred set of fiducial measurements. Rather, fiducial sets of measurements can be in- ferred from the description of state and effect vectors within the real vector space. Before defining an abstract state space, it is helpful to discuss how the mathemat- ical machinery of Section 2.5.1 corresponds to operational assumptions about the system. To begin with, some convex subset S ⊂ V corresponds to the set of preparable states of the system. By restricting attention to a vector subspace if necessary, the set of states can be assumed to span the entire vector space. Wewill also assume for convenience that the set of states is closed: a physical justification for this is that whenever {si}i≥0 is a sequence of states such that si → s for some vector s, then 62 the physical properties of the states s can be implemented to arbitrary accuracy by preparing the states si, hence we may as well assume that s is a state. The state cone S+ = cone(S) is therefore a closed cone in V : this has a convex subset the sub-normalised states S≤1 = {λs : 0 ≤ λ ≤ 1, s ∈ S} = conv(S ∪ {0}). Sub-normalised states play an important role in the discussion of measurement outcomes, and we wish them to be distinct from the set of allowed states, hence we demand that for all s ∈ S and 0 ≤ λ < 1, λs /∈ S . This prohibits the zero vector from being a member of S. This in turn implies that λS ∩ S = ∅ for all λ 6= 1, and that S+ 6= V . In fact the condition λs /∈ S does not lead to a loss of generality: if it is not satisfied, it is possible to add an extra dimension to the vector space in order to make it so, whilst keeping the property that S spans the whole of V . An effect takes the form of a function eˆ from the set of states to the interval [0, 1], such that the value eˆ(s) is the probability of the outcome corresponding to e occurring, given that the system is in state s. An effect function must respect the convexity of the set of states, i.e. for any states si ∈ S and pi ∈ [0, 1] satisfying ∑ i pi = 1, eˆ ( ∑ i pisi) = ∑ i pieˆ(si). Such a function is said to be convex-linear on S. The physical motivation for convex-linearity is as follows: if a system is prepared in state si with probability pi, then for any measurement involving e, the probability of the outcome corresponding to eˆmust be the associated average over the probabilities that would be obtained for each preparation. It turns out that this condition is enough to extend eˆ to a linear function on the whole of V . Proposition 4. Let S ⊂ V be a convex subset of a real vector space, such that span(S) = V and λS ∩ S = ∅ for all λ 6= 1, and let W be a real vector space. Any convex-linear function eˆ : S → W is uniquely extensible to a linear function eˆ : V → W . Proof. Any linear function which is an extension of eˆ must respect linear combi- 63 nations involving members of S: eˆ ( ∑ i λisi ) = ∑ i λieˆ(si). (2.39) Since span(S) = V , this defines a linear function on the whole of V , assuming that the function is in fact well-defined. To show that it is well-defined, we must show that if a vector v is decomposed in distinct ways as a linear combination of members of S , then the above definition of eˆ is independent of which combination is used. Suppose then that v = ∑ i λisi = ∑ j λ′js′j , and firstly assume that the λi and λ′j are all positive, i.e. v ∈ cone(S). Letting ∑ i λi = Λ, we have 1Λv = ∑ i λi Λ si ∈ S, since S is convex. By the same reasoning, 1 Λ′v ∈ S where Λ ′ = ∑ j λ′j , thus Λ = Λ′. Since eˆ ( 1 Λ v ) = eˆ ( ∑ i λi Λ si ) = eˆ ( ∑ j λ′j Λ s′j ) , (2.40) is uniquely defined, so is eˆ(v) = Λeˆ ( 1 Λv ) , hence eˆ extends in a unique and well- defined way to the whole of cone(S). Now suppose that some of the λi and λ′j are negative; we may re-arrange the equality so that all coefficients are positive: ∑ λi>0 λisi − ∑ λ′j<0 λ′jsj = ∑ λ′j>0 λ′jsj − ∑ λi<0 λisi. (2.41) Since the LHS and RHS are both elements of cone(S), we have from our previous argument that, ∑ λi>0 λieˆ(si)− ∑ λ′j<0 λ′j eˆ(sj) = ∑ λ′j>0 λ′j eˆ(sj)− ∑ λi<0 λieˆ(si). (2.42) Re-arranging again gives that ∑ i λieˆ(si) = ∑ j λ′j eˆ(s′j), which verifies that our linearly extended function is indeed well-defined on the whole of V . 64 Therefore an effect function eˆ : S → [0, 1] is associated with a unique vector e such that 〈e, s〉 = eˆ(s) for all states s, i.e. the set of effects generates an effect cone E+ = S∗+ which is dual to the state cone. It follows from Proposition 2 that also S+ = E∗+. From here on we will use the term “effect” both for a non-negative, convex-linear functional eˆ on S and for the effect vector e which corresponds to that functional. We make an assumption at this point that any such effect vec- tor e corresponds to a physically realisable effect. That is to say, there exists a measurement outcome whose probability of occurring is equal to 〈e, s〉 whenever that measurement is physically performed on a system in the state represented by s. The duality of the cones E+ and S+ is a direct result of this assumption. This duality property holds in finite-dimensional quantum theory, as we will see, but is difficult to justify on physical grounds; if the assumption is not made, then S+ and E+ must be defined separately, and many of the results concerning multipartite systems will not follow, or would require modification. Ameasurement in an abstract state space consists of a set of effects {e1, . . . , en} such that ∑n i=1〈ei, s〉 = 1 for all s ∈ S. This implies that the vector U = ∑ i ei is the vector uniquely defined by linear extension of the function which maps all vectors in S to 1. Just as we assume that all valid effect vectors are physically realisable, we assume that any set of effects which sum to give U corresponds to a possible measurement that can physically be performed (again, this is in principle true in finite-dimensional quantum theory). The vector U corresponds to the unit effect, which itself can be seen as a trivial measurement with only one outcome. Note that all physically realisable effects must therefore obey e ≤E+ U , where “≤E+” is the generalised inequality associated with E+. Definition. An abstract state space is a 3-tuple (V,S+,U), where V is a real, finite-dimensional vector space, S+ ⊂ V is a closed cone which spans V , and U is an order unit in S∗+. Note that the set of normalised states is not an explicit part of the definition, rather it is defined as the set S = {s ∈ S+ : 〈U , s〉 = 1}, which by Proposition 3 is compact and convex. The effect cone is also not explicitly mentioned, but is 65 implicitly the dual cone to S+. By analogy with normalised states, e ∈ E+ is said to be a proper effect if 〈e, s〉 ≤ 1 ∀s ∈ S , and we denote by E the set of proper effects. Note that E can equivalently be defined as the order interval [0,U ]E+ . The elements of E+ \ E are said to be improper effects. Definition. Let (V,S+,U) be an abstract state space. Then, • The pure states are those that belong to ext(S); • The pure effects are those that belong to ext([0,U ]E+); • The extreme ray effects are those that generate rays in extray(E+); • A pure measurement is one for which the outcomes correspond to distinct extreme ray effects. The condition λS ∩ S = ∅ for λ 6= 1 ensures that the pure states are exactly those states which generate the extreme rays of S+, hence there is no need to distinguish between the extreme vectors of S and extreme rays of S+. As we will see in Section 2.5.5 however, there often exist pure effects which are not extreme ray effects. Pure measurements are those that provide maximal information about the system, in the sense that the outcome statistics of any measurement may be deduced from the outcome statistics of a related pure measurement. To see this, note that any set of effects lies in the conic hull of the extreme ray effects, so it is always possible to “split up” an outcome into a disjoint union of outcomes corresponding to extreme ray effects. We can define this “splitting up” procedure more rigorously, and equivalently define the pure measurements as those which admit no non-trivial refinements. Definition. Let M1 = {e1, . . . , en} and M2 = {f1, . . . , fm} be two measure- ments on an abstract state space. M1 is a refinement of M2 if there exists a surjective mapping φ : [n] → [m] such that, fi = ∑ j:φ(j)=i ej ∀i. (2.43) 66 Conversely,M2 is a coarse-graining ofM1. A trivial refinement is one for which, φ(j) = i ⇐⇒ ei = λfj. (2.44) An abstract state space may be defined in two equivalent ways: firstly, by giving the vector space V , the state cone S+ and an order unit U of S∗+, from which the sets E+ and E can be deduced. Secondly, by giving the vector space V , the effect cone E+ and order unit U ∈ E+, from which S+ is the dual cone to E+ and S is the compact, convex subset of S+ having unit inner product with U . These complementary methods allow us to construct lower and upper bounds on the set of states of composite systems. 2.5.3 Composite Systems In order to construct a composite system in terms of its subsystems, wemust define the vector space, state (or effect) cone and unit effect that characterise its abstract state space. Recall that the principle of local tomography tells us that any joint state is characterised by the outcome statistics of joint fiducial measurements, i.e. the normalised set of conditional probabilities P (a1, . . . , aN |x1, . . . , xN). How- ever, if the systems are not Boxworld or classical systems, we must be careful that the reduced states of an individual system are in fact allowed on that sys- tem, i.e. ∑ aj :j 6=i P (a|x) is an allowed state of system i, whatever the choice of {xj : j 6= i}. As we will shortly see, it turns out that all possible joint states can be neatly represented in the tensor product of the vector spaces representing each individual system. Definition. LetC(1), . . . , C(N) be cones in the real, finite-dimensional vector spaces V (1), . . . , V (N) respectively. The tensor coneC(1)⊗˜ · · · ⊗˜C(N) is the cone inV (1)⊗ · · · ⊗ V (N) defined by cone({c(1) ⊗ · · · ⊗ c(N) : c(i) ∈ C(i)}). Let {(V (i),S(i)+ ,U (i))}Ni=1 be a set of abstract state spaces representing systems in some general probabilistic theory, and consider theN -fold tensor product space 67 VN = ⊗N i=1 V (i). Suppose that on each of the systems, the state s(i) is prepared: then it is natural to define the global state corresponding to this joint preparation as the product state s(1) ⊗ · · · ⊗ s(N). Moreover, instead of being prepared in- dependently, the systems may agree to prepare states according to some shared classical correlation, i.e. there is a shared probability distribution {pj} such that with probability pj , the state s(i)j is prepared on subsystem i. The global state after this preparation is then a convex combination of product states: ∑ j pjs(1) ⊗ · · · ⊗ s(N)j (2.45) The cone generated by all convex combinations of product states is the tensor cone Smin+ = S (1) + ⊗˜ · · · ⊗˜S (N) + . Whatever construction we employ for the set of states of the composite system, it is physically reasonable to demand that the composite state at least contains Smin+ . Define UN = U (1) ⊗ · · · ⊗ U (N), and note that this has unit inner product with the convex hull of product states. Note that the set of states S(i) on each system i span the vector space V (i), therefore the set of product states spans VN , and UN is the unique vector with unit inner product on all product states. We therefore take UN to be the N -fold unit effect. Designating the set of allowed composite states to be Smin = {s ∈ Smin+ : 〈UN , s〉 = 1} defines a perfectly valid composite system. Measurements on the composite system then correspond to sets of effects belonging to the effect cone Emin+ = ( Smin+ )∗, and the proper effects are those belonging to the order interval [0,UN ]Emin+ . Definition. Let {(V (i),S(i)+ ,U (i))}Ni=1 be a set of abstract state spaces. The mini- mal tensor product or (min tensor product) of systems 1, . . . , N is represented by the abstract state space V min = ( N ⊗ i=1 V (i),Smin+ ,UN ) (2.46) Notice that convex combinations of product states do not allow for any notion 68 of entanglement or non-locality, both of which are present in quantum theory and Boxworld, so there must in general be larger consistent sets of states that can be prepared on composite systems. Clearly, any composite state s should obey the following properties: 〈e(1) ⊗ · · · ⊗ e(N), s〉 ≥ 0 ∀e(i) ∈ E (i)+ (2.47) 〈UN , s〉 = 1. (2.48) In quantum and classical theory, composite states have well-defined reduced states, which determine the local outcome statistics of local measurements, regard- less of operations performed on the remaining subsystems. Suppose that a joint measurement is made on a state s, and the outcomes on systems 2, . . . , N corre- spond to the effects e(2), . . . , e(N). In principle, the choice of measurement at sys- tem 1 may be made after these measurements occur; the reduced state s˜(1) ∈ S(1)≤1 relative to the fixed outcomes e(2), . . . , e(N) is such that for any e(1) ∈ E (1)+ , 〈e(1), s˜(1)〉 = 〈e(1) ⊗ e(2) ⊗ · · · ⊗ e(N), s〉. (2.49) This defines a unique vector s˜(1) ∈ V which, since the RHS is strictly between 0 and 1, must lie in S(1)≤1 . s˜(1) can be interpreted as the state of subsystem 1 post- measurement, conditioned on the outcomes corresponding to effects e(2), . . . , e(N) being recorded at subsystems 2, . . . , N . In order to normalise the state s˜(1), it must be divided by the positive real num- ber 〈U (1), s˜(1)〉: this value can be interpreted as the marginal probability of ob- taining the outcomes e(2), . . . , e(N) for any global measurement which includes the outcomes corresponding to these effects on each subsystem. Suppose that we complete this global measurement using an additional set of effects on each sub- system i ≥ 2 which, when summed together with e(i), give the unit effect. Then the average over the resulting (normalised) reduced states, relative to each pos- sible outcome on subsystems 2, . . . N , and weighted according to their marginal 69 probabilities, is the unique state s(1) ∈ S(1) such that for any e(1) ∈ E (1)+ , 〈e(1), s(1)〉 = 〈e(1) ⊗ U (2) ⊗ · · · ⊗ U (N), s〉, (2.50) known as the reduced state on subsystem 1. This reduced state provides a full description of the outcome statistics for a local measurement at subsystem 1, given that the state of the global system is s. As a consequence of our tensor product construction, any allowed state satisfying (2.47) and (2.48) also leads to reduced states which are genuine states on each individual subsystem. This set of states may be explicitly constructed by defining the cones of effects and states in the following way, Emax+ = E (1) + ⊗ˆ · · · ⊗ˆE (N) + Smax+ = ( Emax+ )∗ , (2.51) thus defining Smax = {s ∈ Smax+ : 〈UN , s〉 = 1} as the largest possible set of states for the composite system. Definition. Let {(V (i),S(i)+ ,U (i))}Ni=1 be a set of abstract state spaces. The maxi- mal tensor product or (max tensor product) of systems 1, . . . , N is represented by the abstract state space V max = ( N ⊗ i=1 V (i),Smax+ ,UN ) (2.52) As well as describing an abstract state space for each type of individual sys- tem, a general probabilistic theory must provide a description of how those sys- tems may combine into composite systems. For any set {(V (i),S(i)+ ,U (i))}Ni=1 of permitted abstract state spaces, the theory must offer a composite state cone Smin+ ⊆ (S+)N ⊆ Smax+ , or a composite effect cone Emax+ ⊆ (E+)N ⊆ Emin+ . 70 2.5.4 Transformations Quantum theory is not just a static description of what kinds of measurements can be performed, and what kind of outcome statistics we can expect from those mea- surements. It also provides an explicit model of how one state may transform into another. These transformations are described by unitary operators on the Hilbert space of the system, or at least of a larger system containing it. Any transformation, indeed any evolution of part of the universe as portrayed by quantum mechanics, is described by a reversible transformation from the perspective of anyone who regards that part of the universe as a closed system. Transformations also fit naturally into the abstract state space framework de- scribed in Sections 2.5.2 and 2.5.3. Suppose that (V,S+,U) is an abstract state space, and consider a mapping T from S into itself (i.e. a transformation from allowed states to allowed states). Recall from Section 2.5.2 that effects must be convex-linear on S in order to respect probabilistic preparations of states. Much the same reasoning can be applied here: the transformation acting on the state ps1+(1−p)s2 is expected to have the operational properties of the corresponding average of transformed states, i.e. T (ps1 + (1− p)s2) = pT (s1) + (1− p)T (s2). It follows from Proposition 4 that T can be seen as a linear map from V to it- self, which also maps S into itself. An allowed transformation is any such linear map: just as we assume that any linear functional which maps states to the inter- val [0, 1] corresponds to a physically realisable effect, we assume that any allowed transformation may be realised by some physical operation on the system. Suppose now that a system is comprised of N subsystems, with vector spaces V (1), . . . , V (N) admitting state spaces S(1), . . . ,S(N), and that an allowed trans- formation T (i) is locally performed on each subsystem i. We would expect that there is an allowed global transformation T of the composite system which occurs as a result of these local transformations being performed. Consider the effect of this global transformation on a product state s(1) ⊗ · · · ⊗ s(N), and then perform- ing a measurement locally on each subsystem. The state which gives the correct outcome probabilities for any local measurement is T (1)(s(1))⊗ · · · ⊗ T (N)(s(N)). 71 T = T (1) ⊗ · · · ⊗ T (N) is the unique linear extension to ⊗ V (i) of the map which acts in this manner on the set of product states. “Product transformations” of this kind must therefore be allowed transformations of the composite system: observe that this will always hold in min- and max- tensor product spaces, since product transformationsmap the setsSmin andSmax into themselves. Note, if a local trans- formation is performed on only a subset of theN subsystems, then we consider the identity transformation I(i) to have been performed on the remaining subsystems. In quantum theory, one may view transformations as acting on measurement operators rather than states, by moving to the Heisenberg picture. An analogous trick may be performed here. Since, in the operational perspective, a state is de- fined by its outcome statistics, the action of a transformation is defined by how it influences the inner product between states and effects, i.e. T is determined by the set of values 〈e, T (s)〉. Recall that the adjoint T † of a linear map T is the linear map for which satisfies 〈T †(x), y〉 = 〈x, T (y)〉 for all x, y ∈ V . Thus T is de- termined equivalently by its adjoint transformation on effects: note that if T is an allowed transformation then T † maps the effect cone E+ to itself. Proposition 5. Let (V,S+,U) be an abstract state space, and T an allowed trans- formation. Then T †(U) = U . Proof. By definition of the adjoint, we have that for all states s ∈ S , 〈T †(U), s〉 = 〈U , T (s)〉 = 1. (2.53) Since S spans V , any two linear functionals which are equivalent on S are equiva- lent on the whole of V . T †(U) has the same inner product as U does for all s ∈ S , thus T †(U) = U . An allowed transformation is reversible if there exists an allowed transforma- tion S such that TS(s) = s for all states s in S . This implies that S = T−1 when viewed as linear maps on the whole of V . If a transformation is reversible, then so is its adjoint, which satisfies ( T † )−1 = (T−1)†. The property of being 72 reversible imposes strong restrictions on T . Clearly, for example, it must be a bi- jective mapping from S to itself, otherwise T−1 will not be a well-defined allowed transformation. As the following Proposition demonstrates, this means that the set of pure states must be permuted by T . Proposition 6. Let (V,S+,U) be an abstract state space, and T an allowed, re- versible transformation. Then T maps pure states to pure states, T † maps pure effects to pure effects and T † maps extreme ray effects to extreme ray effects. Proof. Suppose that T (s) = ps1 + (1− p)s2, where 0 < p < 1 and s1, s2 ∈ S . T has a linear inverse T−1, so we have that s = pT−1(s1) + (1− p)T−1(s2). Since s is a pure state, this implies T−1(s1) = T−1(s2) = s, hence s1 = s2 = T (s). Therefore T (s) is also a pure state. The proof that T † maps pure effects to pure effects is identical, and the proof that T † maps extreme ray effects to extreme effects is very similar (it simply involves switching the convex combination for a conic combination). 2.5.5 Examples So far we have introduced machinery for dealing with abstract state space repre- sentations of general probabilistic theories, and derived results about their struc- ture in this abstract setting; we now supplement this with concrete examples. We begin by recapping and constructing the abstract state spaces for quantum theory, classical theory and Boxworld, then describe polygon models introduced in [36], whose state spaces take the shape of regular polygons. Quantum Theory In the standard presentation, quantum theory is not introduced by means of a real vector space, but instead by means of a complex Hilbert spaceH, in which the states are given by density operators andmeasurements by sets of POVMelements. However, the set of density operators form a convex subset of the real vector space herm (H) of Hermitian operators over H. Defining V = herm (H) to be our 73 vector space, and S+ = P (herm (H)) to be the cone of positive operators on H (i.e. Hermitian operators P for which 〈x|P |x〉 > 0 for all non-zero x ∈ H), we almost have a full abstract state space. Before defining U , we need to have an inner product on V : this is provided by the Hilbert-Schmidt inner product on linear operators, 〈A,B〉 = Tr ( A†B ) . With this inner product, the unit effect is simply the identity I, and the cone of effects is identical to the cone of states. The proper effects are positive operators E for which E ≤ I in the operator norm. The convex set of normalised states S is the set {ρ ∈ P (H) : Tr (ρ) = 1} (i.e. those that have unit inner product with I). Pure states are given by the set of projectors onto 1-dimensional subspaces. Pure effects are given by projectors of any rank, however extreme ray effects are again given by the set of projectors onto 1-dimensional subspaces (this is in agreement with the fact that E+ = S+, and our previous comment that the pure states generate the extreme rays of S+). Observe that whenever H is of dimension larger than 2, there will exist infinitely many pure effects which are not extreme ray effects. Reversible transformations are maps ρ → UρU †, where U is a unitary operator. As expected, these map the cone of positive operators bijectively to itself, and pure states onto pure states. The adjoint mapping also takes pure effects to pure effects, and extreme ray effects to extreme ray effects. The set of states of composite quantum systems, the unit-trace positive oper- ators in the tensor product of the Hilbert spaces, lies strictly between the sets of states described by the min and max tensor products. The states of the min tensor product are the separable states, i.e. those which can be written as a convex com- bination of pure product states: this also permits effects that are not allowed in standard quantum theory, for example entanglement witnesses, which give posi- tive outcomes for any product state, but are negative for some entangled states. On the other hand, composite quantum systems with the max tensor product include states not allowed in the standard formulation, for example the partial transpose of many entangled states generates an operator which is positive for tensor products of POVM elements, but has global negative eigenvalues. 74 Classical Theory Recall that classical systems are those for which there is a single fiducial mea- surement, over which any outcome distribution corresponds to an allowed state. For a fiducial measurement with K outcomes, this can be represented in RK by setting the fiducial effects to be the standard basis vectors {e1, . . . , eK}. These are the extreme ray effects, hence the effect cone is the positive quadrant. This implies that the ith component of a vector representing a state is the probability of obtaining outcome iwhen the fiducial measurement is performed on the system in that state. Thus all state vectors have unit inner product with the unit effect, which is the vector containing 1 in every component. The set of normalised states forms a simplex, whose vertices are also the standard basis vectors. The pure state ei with a 1 in the ith component (and 0 elsewhere) is the state which deterministically as- signs outcome i when the measurement is performed. Like quantum theory, the state and effect cones are identical. Reversible transformations of a classical system must permute the fiducial ef- fects, which can be interpreted as a “relabelling” of themeasurement outcome once the measurement has been performed. Note that any such transformation maps the standard basis to itself, and hence extends to a valid linear map on RK . The min and max tensor products of composite classical systems coincide, and thus there is only one reasonable construction of composite states and effects: the only allowed states are classically correlated, and there is no entanglement. To see this, it is enough to observe that the only composite product effect with which the pure product state e(1)i1 ⊗ · · · ⊗ e (N) iN has non-zero product, is in fact the effect given by the same vector. Hence any linear combination of product effects must have only positive coefficients in order to have positive inner product with all pure product states. Since the product effects span the tensor product space, the only possible composite effects are given by Emax+ , which must therefore be equal to Emin+ . 75 Boxworld In a Boxworld system it is again simplest to construct the effect cone first, then obtain the state space via duality. For a system with M measurements, for which measurement x hasKx outcomes, recall that any set of distributions on the outcomes for each measurement is an allowed state. This can be encapsulated in a real vector space V of dimension d = 1 + ∑ x (Kx − 1). Choose a linearly independent set of vectors {U} ∪ {Xa|x} for 1 ≤ x ≤ M and 1 ≤ a ≤ Kx − 1, where the vectors of the formXa|x are fiducial effects, corresponding to outcome a occurring when fiducial measurement x is performed. These vectors form a basis for V . The remaining fiducial effect vectors are then restricted by the fact that summing the effects of a fiducial measurement must give the unit effect: XKx|x = U − Kx−1 ∑ a=1 Xa|x. (2.54) For example, a g-bit system can be represented in a 3-dimensional real vector space, where U , X0|0 and X0|1 take the place of the standard basis vectors, so that e.g. X1|1 = (1, 0,−1). Any vector s which has a positive inner product with all fiducial effect vectors and unit inner product with the unit effect U corresponds to the state with outcome distribution P (a|x) = 〈Xa|x, s〉. These inner products always have a value at most one since 〈U , s〉 = 1. Pure states are those which have inner product only 0 or 1 with all fiducial effects: these deterministically assign a single outcome to each choice of measurement (for example, the state which always produces outcome 1 is a pure state). Min tensor product Boxworld states are convex combinations of tensor prod- ucts of these pure states. There exist effects in Emin+ \ Emax+ which are positive for all such states: for example, the effect, X1|0 ⊗X1|0 +X0|0 ⊗X1|1 +X1|1 ⊗X0|0 −X1|1 ⊗X1|1, (2.55) gives inner product 0 or 1 for all pure product states, but is not a positive linear 76 combination of tensor products of fiducial effects. However, for a variant of the PR-box state, in which the outcomes are exactly correlated (with probability 1/2 each for both being 0 or both being 1) whenever both inputs are equal to 0, but anti-correlated (again with probability 1/2 for the two possibilities) otherwise, the probability of obtaining the above effect is 3/2, which is clearly not permitted. Min tensor product Boxworld outcome distributions clearly cannot violate the CHSH inequality, and are not often the subject of much interest. When referring to composite Boxworld systems, it is usually assumed that the max tensor product is being used; this is the convention that we will follow in this thesis. A Box- world state on a composite system -whoseN subsystems have abstract state spaces (V (i),S(i)+ ,U (i)) - is defined by a non-signaling outcome distribution P (a|x), and corresponds to a unique vector s ∈ Smax, such that 〈U , s〉 = 1 (i.e. the state is normalized) and P (a1, . . . , aN |x1, . . . , xN) = 〈X(1)a1|x1⊗· · ·⊗X (N) aN |xN , s〉 [4]. Pure product states are those of the form s(1) ⊗ · · · ⊗ s(N) where s(i) is a deterministic state on subsystem i. For a system of two g-bits, all pure states are either pure product states or variants of the PR-box; we do not yet have a full understanding of the pure states of composite Boxworld systems, although computational enu- merations have been carried out for some cases [64, 65]. In quantum theory, the Pauli measurements are not the only measurements that can be made on a qubit system, despite constituting a fiducial set of measurements; any positive semi-definite operatorP ≤ I corresponds to an outcome of some care- fully constructed measurement. Similarly, in Boxworld the fiducial measurements are not the only measurements that can be performed, and the fiducial effects are not the only vectors which correspond to measurement outcomes. However, since the fiducial measurements fully characterise a state s, the probability of obtaining any non-fiducial effect must be a function of the values P (a1, . . . , aN |x1, . . . , xN) associated with s. The set of allowed N -partite effects in Boxworld is the set of vectors e ∈ ⊗ V (i) such that 〈e, s〉 ∈ [0, 1] for all s ∈ S. This includes the N - partite fiducial effects, as well as all linear combinations of the fiducial effects that are positive and sub-normalised for all states. These non-fiducial effects can in 77 principle be measured by ignoring particular measurement outcomes, forgetting which of several outcomes occurred, or performing measurements probabilisti- cally. The same “relabelling”' idea introduced for classical systems can be employed in Boxworld to generate reversible transformations. For example, on a single sys- tem, relabelling by switching the measurement choices x and x′ corresponds to the permutation of fiducial effectsXa|x ↔ Xa|x′ for all a (we must haveKx′ = Kx for this to work). Relabelling by switching the measurement outcomes a and a′ for the same measurement corresponds to a single switch of the fiducial effect Xa|x with Xa′|x. Since these transformations can be validly defined on the basis made up ofU and a subset of the fiducial effects, they extend to linear maps on the whole vector space. Chapter 5 is concerned with proving that all allowed, reversible Boxworld transformations are composed of these local relabellings, as well as permutations of subsystems, so long as no subsystem is classical. This is no easy task; however, it is possible to prove relatively easily that in Boxworld, reversible transformations map pure product states to pure product states. Proposition 7. Let {(V (i),S(i)+ ,U (i))Ni=1} be subsystems of a composite Boxworld system. Let T be a reversible transformation on the composite system. Then for any pure product state s, T (s) is also a pure product state. Proof. This proof closely follows that given in [4]. Recall that the adjoint T † must permute the extreme rays of the effect cone: for a Boxworld system these are the product fiducial effects on each system. Hence 〈e, T (s)〉 = 〈T †(e), s〉 ∈ {0, 1} for any product fiducial effects e. A joint local measurement involving measurement choices xi on subsystem i comprises all product fiducial effects of the form Xa1|x1 ⊗ · · · ⊗ XaN |xN , where each ai ranges from 1 to Kxi . Since the sum of these vectors is UN = U (1) ⊗ · · ·⊗U (N), T (s)must have inner product 1 with exactly one such vector, and zero with the others. Thus for each string of measurement choices (x1, . . . , xN), T (s) deterministically assigns some string of outcomes (a1, . . . , aN). 78 Suppose now that for two measurement strings x and x′ with xi = x′i for any fixed i, T (s) assigns the outcome strings a and a′ for which ai 6= a′i. This implies that T (s) has inner product 1 with both the effects, U (1) ⊗ · · · ⊗X(i)ai|xi ⊗ · · · ⊗ U (N) U (1) ⊗ · · · ⊗X(i)a′i|xi ⊗ · · · ⊗ U (N). However, these distinct effects are both present in the same decomposition of UN : UN = ∑ a U (1) ⊗ · · · ⊗X(i)a|xi ⊗ · · · ⊗ U (N), thus T (s) has inner product at least 2 with UN , contradicting the assumption that T takes normalised states to normalised states. Polygon Models Polygon models, first introduced in [36], are specifically constructed so that the normalized state space of a single system takes the shape of a regular n-gon (or n-sided polygon) in 3-dimensional real Euclidean space. In contrast to the above theories, whose state spaces result from mathematical restrictions on the outcome distributions, polygon systems are defined by their state space, and their outcome distributions follow accordingly. Note that as n becomes large, the state space begins to resemble a circle, which itself represents a “2-dimensional slice” of the Bloch sphere of a qubit, in which only the σx and σz measurements are considered. An analysis of polygon models for varying n potentially gives us insight into the significance of the shape of the quantum local state space for phenomena such as non-locality. For fixed n, the normalized state space S of the n-gon system is the convex hull of the extremal states ω1, . . . , ωn, defined by: 79 ωn =    rn cos (2ipi n ) rn cos (2ipi n ) 1    , (2.56) where rn = √ sec (pi n ) . Recall that the unit effect is the unique vector whose inner product with all states is equal to 1. Clearly, this is the vector: U =    0 0 1    . (2.57) The extremal effect vectors must be defined separately in the cases where n is odd or even. For even n, they are the vectors e1, . . . , en defined by: ei = 1 2     rn cos ( (2i−1)pi n ) rn cos ( (2i−1)pi n ) 1     . (2.58) For odd n, there are 2n extremal effect vectors. The first n are defined by: ei = 1 1 + r2n    rn cos (2ipi n ) rn cos (2ipi n ) 1    (1 ≤ i ≤ n), (2.59) and the remaining n are defined by en+i = U − ei for 1 ≤ i ≤ n. Note that in the case of even n, U − ei is equal to e(i+n/2)modn, whereas in the case of odd n, this process generates 2n distinct effect vectors. Compare this with the features of odd- and even- sided polygons: the first n vectors correspond to normal planes which are parallel to and touch the sides of the polygon. The second set of n vectors correspond to normal planes which touch the polygon diametrically opposite to its sides. The sides of even polygons are arranged in diametrically opposing pairs, so that the first and second sets of effect vectors correspond to the same st of planes. 80 On the other hand, the sides of odd polygons are diametrically opposite to their vertices, so that the second set of effect vectors corresponds to an entirely new set of planes. Figure 2.5 (taken from [36]) depicts the state spaces of the polygon models for some values of n, with state and effect vectors projected onto the 2-dimensional polygon. Note that setting n = 3 gives a classical “triangle” system whose single fiducial measurement has 3 outcomes, and setting n = 4 gives a Boxworld system with binary inputs and outputs. For n = 5 and beyond, however, this set of systems do not overlap with classical, quantum or Boxworld systems. Figure 2.5: Illustration of polygon state spaces and the first n extremal effects (ΩQM denotes the quantum state space). Taken from [36]. Composite states of bipartite polygon systems may be described by a 3x3 ma- trix S, such that eTSf ≥ 0 whenever e and f are local extremal effects or unit vectors of the first and second systems respectively. The value eTSf gives the 81 probability of obtaining the local outcomes corresponding to e and f , whenever a local measurement is performed which contains these effects on each system. It can be shown that the states defined by the following matrices are entangled pure states (except for the case n = 3): even n : φ =    cos (pi n ) sin (pi n ) 0 − sin (pi n ) cos (pi n ) 0 0 0 1    (2.60) odd n : φ =    1 0 0 0 1 0 0 0 1    (2.61) By maximising over local binary measurements of the form {ei,U − ei}, it is possible to violate Tsirelson's bound on the CHSH value for arbitrarily large, even values of n. For odd values of n on the other hand, Tsirelson's bound is always obeyed by these measurements. Moreover, as n increases, the optimal values for both odd and even n appear to approach Tsirelson's bound from below and above respectively. Intriguingly, Tsirelson's bound provides a kind of dividing line between the optimal CHSH values obtained by odd and even polygons. Yet, even when n is so large that the polygon approximates a circular “slice” of the Bloch sphere, it is still possible to violate quantum predictions. 82 Chapter 3 Quasiprobability representations We must by all means stick to our sensations, that is, simply to the present impressions whether of the mind or of any criterion whatever, and similarly to our actual feelings, in order that we may have the means of determining that which needs confirmation and that which is obscure. “Letter to Herodotus” Epicurus 83 The notion of negative probabilities has a history that is both surprisingly long, and surprisingly closely intertwined with the history of quantum mechanics. Ideas about the relation of negative probabilities to quantum theory arose not long after its own development: in his Bakerian Lecture to the Royal Society in 1942 [66], Dirac remarked that negative energies and negative probabilities arose naturally in his calculations regarding relativistic extensions of quantum mechanics. How- ever, when calculating final measurement probabilities, the negative values were invariably “averaged out” to produce positive quantities and probabilities of ob- served events, and were physically significant only in their contribution to these observed values. He commented, “Negative energies and probabilities should not be considered as nonsense, … [but] simply as things which do not appear in ex- perimental results” [66]. Dirac conceived that those particles associated with negative quantities could be seen as belonging to some hypothetical world differing markedly from our own, but in which analogous transition laws applied, and in which the “averaged out” fi- nal probabilities were in complete agreement with the probabilities we observe. In the paper's introduction, in which he discussed Heisenberg's operational take on quantum mechanics, Dirac appears to propose the following viewpoint: as long as we are still ultimately talking about natural observations, it is no great sin if negative quantities are encountered along the way. Even earlier, in 1932, Wigner gave a striking precedent to this attitude in his paper introducing the well-known Wigner phase-space distribution [67]. Whilst a classical joint probability distribu- tion is generally incapable of describing simultaneously a wave function's position and momentum, the Wigner distribution nearly accomplishes this by allowing the values to be negative for some choices of position and momentum. Averaging over all choices of position gives the correct formula for the momentum, and vice versa. Wigner noted, “negative values … must not hinder the use of [the Wigner distribution] in calculations, as an auxiliary function” [67]. In the context of quantum information theory, interest in quasiprobability dis- tributions has seen a marked revival, including several applications and results in 84 quantum information and quantum foundations (a comprehensive review of de- velopments in this topic can be found in [68]). These have tended to focus on quasiprobabilistic representations of quantum states, with a view to progressing our calculational abilities and intuition regarding them. In this chapter we take a different perspective, and are concerned with simulating all non-signaling theo- ries via quasiprobabilistic extensions of both quantum and classical theory. The work herein is largely based on [1] and [43]; Subsection 3.2.2 and Section 3.4 are exceptions to this, and constitute original, unpublished work. Section 3.3 is the result of collaboration with Anthony Short. In Section 3.1we introduce and discuss quasiprobability distributions in aman- ner that is directly applicable to the discussion of outcome statistics of multipartite experiments. In Section 3.2 we give a detailed discussion of a recent result by Acín et al [1] which demonstrates that a specific quasiprobabilistic extension of quantum theory allows the generation of arbitrary non-signaling distributions. In Section 3.3 we show that the same can be achieved for local, classical probability theory in two distinct ways. This results in four (two quantum-like, two classical- like) distinct quasiprobabilistic models for non-signaling correlations; we discuss several interesting features of these models as well as their relevance to the study of quantum theory. In Section 3.4 we then discuss further quasiprobabilistic ex- tensions of quantum theory, focusing on one that employs zero entanglement and an arbitrarily small degree of classical correlation. 3.1 Quasiprobability distributions Definition. A function P˜ : A1×· · ·×AN → R, for finite setsA1, . . . , AN , is said to be a (multipartite) quasiprobability distribution if: ∑ a1,...aN P˜ (a1, . . . , aN) = 1. (3.1) Since we will generally be studying the multipartite case, the word “multi- 85 partite” preceding the phrase “quasiprobability distribution” will be dropped. For convenience we will also use bold symbols to denote strings of values, for exam- ple a denotes (a1, . . . , aN), and consequently P˜ (a) denotes P˜ (a1, . . . , aN). Like ordinary probability distributions, quasiprobabilities may be conditioned on some other set of variables: Definition. A function P˜ : A1 × · · · × AN × X1 × · · ·XN → R is said to be a conditional quasiprobability distribution if, for all fixed choices of x ∈ X1×· · ·× XN : ∑ a1,...aN P˜ (a|x) = 1, (3.2) where the notation P˜ (a|x) is used in place of P˜ (a1, . . . , aN , x1, . . . , xN). Conditional quasiprobability distributions form a direct analogue of outcome distributions. Indeed, we can define a more general form of the non-signaling condition which applies to quasiprobability distributions. Due to the difficulty in finding a convincing physical interpretation for negative probabilities, it is difficult to physically motivate this non-signaling condition in terms of communication between distant parties. Nevertheless, this form of the non-signaling condition forms an important mathematical step in the derivation of our results. Definition. The conditional quasiprobability distribution P˜ (a|x) is non-signaling if, for all i and choices of x1, . . . , xN , the expression Kxi ∑ ai=1 P˜ (a1, . . . , aN |x1, . . . , xN) (3.3) is independent of the particular value of xi. The only assumption we have made about the outcome sets A1, · · · , AN so far is that they are finite. Without loss of generality, we may as well assume that Ai = {1, . . . , K(i)xi }, where K (i) xi is the number of possible local outcomes when measurement xi is performed locally at subsystem i. For a string of outcome val- ues a and a subset Ω = {i1, . . . , ik} ⊆ [N ] with i1 < · · · < ik, we will use the 86 notation aΩ, xΩ to denote the reduced strings (ai1 , . . . , aik), (xi1 , . . . , xik). Just as with conventional outcome distributions, a non-signaling conditional quasiproba- bility distribution haswell-definedmarginal distributions P˜ (aΩ|xΩ) for any choice of Ω. The following technical definition will be helpful in characterizing condi- tional quasiprobability distributions according to a restricted subset of its marginal values. Definition. The outcome ai on system i is maximal with respect to xi if it takes on the greatest value permitted by measurement choice xi, i.e. ai = K(i)xi . For a subset Ω ⊆ [N ], the string (aΩ|xΩ) is nowhere-maximal if, for all i ∈ Ω, ai is not maximal with respect to xi. We now state and prove a lemma concerning non-signaling quasiprobability distributions, of which great use will be made throughout this chapter. The moral of the lemma is as follows: if one wants to generate a non-signaling quasiproba- bility distribution P , then essentially one needs only focus on the set of nowhere- maximal, marginal values of P . As long as one's generated distribution is itself non-signaling and agrees with P for these inputs, then it is guaranteed to agree with P for all inputs. Ideas very similar to this have been utilised implicitly in the literature of non-signaling correlations [1,69], but the author does not know of any explicit statement of the result. Lemma 1. A non-signaling, conditional quasiprobability distribution P˜ (a|x) is uniquely characterized by the set of non-maximal marginal values { P˜ (aΩ|xΩ) } as (aΩ|xΩ) runs over the set of nowhere-maximal strings and all subsets ∅ 6= Ω ⊆ [N ]. Proof. Suppose that P˜ (a|x) and P˜ ′ (a|x) are non-signaling conditional quasiprob- ability distributions, whosemarginals P˜ (aΩ|xΩ) and P˜ ′ (aΩ|xΩ) agree for all nowhere- maximal choices of (aΩ|xΩ). We argue that P˜ (aΩ|xΩ) and P˜ ′ (aΩ|xΩ) actually agree for all choices of (aΩ|xΩ), by induction on n = #{i ∈ Ω : ai maximal}. The case n = 0 holds by assumption. Consider a string (aΩ|xΩ) with n > 0 maximal ai; let Ω = {i1, . . . , iM} and without loss of generality suppose that 87 ai1 = K (i1) xi1 (the proof is no different otherwise). Since P˜ (a1, . . . , aN |x1, . . . , xN) is non-signaling, so are all of its marginal distributions, therefore applying the non-signaling condition to system i1, K(i1)xi1 ∑ ai1=1 P˜ (aΩ|xΩ) = P˜ (ai2 , . . . , aiM |xi2 , . . . , xiM ) . (3.4) Rewriting this, for our particular choice of (aΩ|xΩ), the marginal P˜ (aΩ|xΩ) can be expressed in terms of quantities where fewer of the aij are maximal: P˜ ( K(i1)i1 , . . . , aiM |xi1 , . . . , xiM ) = P˜ (ai2 , . . . , aiM |xi2 , . . . , xiM ) − ∑ ai1not maximal P˜ (ai1 , . . . , aiM |xi1 , . . . , xiM ) . (3.5) The number of maximal ai is now smaller than n for each term on the RHS, so by the induction hypothesis we can switch P˜ with P˜ ′ for each term individually: P˜ ( K(i1)i1 , . . . , aiM |xi1 , . . . , xiM ) = P˜ ′ (ai2 , . . . , aiM |xi2 , . . . , xiM ) − ∑ ai1not maximal P˜ ′ (ai1 , . . . , aiM |xi1 , . . . , xiM ) . (3.6) P˜ ′ is also non-signaling, hence equation (3.4) is still valid after replacing P˜ with P˜ ′. Therefore the reverse of the procedure used to obtain equation (3.5) can be applied to P˜ ′ on the RHS, giving P˜ ( K(i1)i1 , . . . , aiM |xi1 , . . . , xiM ) = P˜ ′ ( K(i1)i1 , . . . , aiM |xi1 , . . . , xiM ) ⇒ P˜ (aΩ|xΩ) = P˜ ′ (aΩ|xΩ) . (3.7) Example. A simple example may help to illustrate the central idea of the proof, 88 which is that we can exploit the non-signaling condition in an iterative fashion in order to express a particular value P (a|x) in terms of marginals for which the ai are nowhere maximal. Suppose that P is the outcome distribution for a 2-party, 2-input and 2-output experiment, i.e. one possible state of a g-bit system. Recall that for a g-bit system, we will prefer to use binary notation {0, 1} to label inputs and outputs, rather than the set {1, 2}. The probability P (1, 1|0, 0) has a maximal outcome value for both parties, but admits the following reduction using the non- signaling and normalisation conditions: P (1, 1|0, 0) = P (a2 = 1|x2 = 0)− P (0, 1|0, 0) = [ 1− P (a2 = 0|x2 = 0) ] − [ P (a1 = 0|x1 = 0)− P (0, 0|0, 0) ] = 1 + P (0, 0|0, 0)− [ P (a1 = 0|x1 = 0) + P (a2 = 0|x2 = 0) ] . 3.2 Quantum theory with quasiprobabilities An outcome distribution P is quantum-achievable if there exist Hilbert spaces (of any dimension) H1, . . . ,HN , positive operators M (i)ai|xi ∈ P (Hi) satisfying ∑ ai M (i) ai|xi = I (i) , and a density matrix ρ ∈ D (H1 ⊗ · · · ⊗ HN) such that, for all choices of ai and xi, P (a1, . . . , aN |x1, . . . , xN) = Tr ( M (1)a1|x1 ⊗ · · · ⊗M (N) aN |xN ρ ) . (3.8) The set of quantum-achievable outcome distributionsQ inhabits a specific por- tion of the full set of non-signaling distributions. If quantum theory - or some (perhaps relativistic) extension of it - provides an accurate description of natu- ral phenomena, then the correlations belonging to Q are exactly those which, in principle, it is possible to generate between spatially separated parties in the real world. It therefore seems a worthwhile endeavour for two reasons to explore what defining characteristics single out Q as a subset of non-signaling distributions. Firstly, from a practical vantage, it is beneficial to have a good understanding 89 of exactly which correlations it is possible to generate between distant parties, and which correlations are unattainable. Any additional intuition gained from devel- oping a broad treatment of correlations (rather than the formal paradigm of quan- tum theory) is also a useful aid in developing algorithms that solve information- theoretic problems with real-world applications [70, 71]. Secondly, on a more foundational level, analysing the principles or rules that delimitQ from other non- signaling distributions [50, 52, 72, 73] gives us a greater appreciation of what it is that makes quantum theory so uniquely successful at describing Nature. Even if quantum theory is superseded by a more complete or more accurate physical the- ory, abstract notions concerning which correlations are allowable will persist. One approach to this issue that has met with some success is to ask the ques- tion “given a non-signaling distribution P(a|x), are there simple criteria by which one can determine whether P is quantum-achievable or not?”. Even when the number of parties is small, this turns out to be a more difficult task than might be anticipated: in 2007 Navascués et al introduced a hierarchy of conditions which are necessarily satisfied by bipartite quantum-achievable correlations [74], and later demonstrated that any non-signaling distribution which lies outside of Q violates one of these conditions (and therefore all subsequent conditions) [75]. Specifically, they introduce a sequence {Qn}n≥1 of sets of outcome distributions which satisfies Qn+1 ⊆ Qn and ∩∞n=1Qn = Q (see Figure 3.1). Determining whether a given outcome distribution belongs to the setQn can be formulated as a semi-definite program; if the distribution fails this test then it is not quantum- achievable. However, in general it cannot be confirmed that a distribution is quantum-achievable after finitely many tests. Another perspective is to instead ask “in what ways does the setQ stand out?”. Acín et al [1] have shown that the standard Born trace rule can be extended to generate any non-signaling correlation, by allowing the quantum state (usually represented by a positive density operator) to be a non-positive operator. Explic- itly, given any non-signaling distribution P (a|x) on N parties, there exist Hilbert spaces Hi, POVM elements M (i)ai|xi ∈ L (Hi) and a Hermitian, unit-trace but not 90 Figure 3.1: Hierarchy of sets which approximates the set of quantum-achievable outcome distributions. Taken from [75]. necessarily positive operator ρ˜ such that: P (a|x) = Tr ( M (1)a1|x1 ⊗ · · · ⊗M (N) aN |xN ρ˜ ) . (3.9) This result provides a universal, quantum-like manner in which all non-signaling correlations can be written. In other words, the set of non-signaling correlations is exactly that which can be generated using the formula (3.9). Moreover, the set of quantum-achievable correlations can be neatly recovered by demanding a single additional constraint: that the operator ρ˜ be positive. The set of classical or local correlations can also be neatly recovered, by demanding that ρ˜ be positive and separable. As well as being of standalone interest, this formulation has recently been used to prove the following result: any physical theory for which the local structure is identical to that of qubits, and which admits at least one continuous, reversible interaction, must have the global structure specified by quantum theory [2]. The ability to represent the correlations of a broad class of theories in a similar way to quantum or classical correlations may provide a powerful tool in analysing such theories, and identifying which properties of quantum theory are special within this class of theories. In Section 3.2.1 we reproduce the proof of the result of Acín et al in detail, generalising the published version to allow for outcome sets that vary according 91 to the subsystem i and measurement choice xi. We then make the original and interesting remark that with some simplemodifications, the operator ρ˜ can bemade so that it and the set of POVM operatorsM (1)a1|x1⊗· · ·⊗M (N) aN |xN pairwise commute. This will lead us naturally into Section 3.3, in which we study quasiprobabilistic extensions of the local classical formalism, rather than the quantum formalism. 3.2.1 Non-positive density matrices We now give a detailed exposition of the procedure described by Acín et al [1] for representing arbitrary non-signaling outcome distributions in terms of local POVM operators acting on a Hermitian, unit-trace, but not necessarily positive density operator ρ˜. In fact, the published proof assumes that all measurements have the same number of outcomes and that all subsystems have the same number of measurements, and gives the explicit form of the constructed operators only in the caseN = 2 (but sketches themultipartite generalisation). The proof given here provides a concise and rigorous construction in the multipartite case and allows for differing numbers of measurement outcomes. Lemma 2. Let H be a complex, d-dimensional Hilbert space, and let V be the real, d2-dimensional vector space of Hermitian matrices acting onH. Then there exists a basis {Eα} of V satisfying the following properties: (i) Eα is positive semidefinite for all α; (ii) E1 = I; (iii) I− ∑ α>1Eα is a positive semi-definite operator. Proof. Let {|ei〉} be a basis forH, and define the d · (d− 1)/2 vectors |fjk〉 = |ej〉+ |ek〉 k < l (3.10) and the d · (d− 1)/2 vectors |gmn〉 = |em〉+ i|en〉 m < n. (3.11) 92 The set of d2 rank-1 projectors {|ei〉〈ei|, |fjk〉〈fjk|, |gmn〉〈gmn|} forms a lin- early independent subset of the real vector space V . To see this, consider the decompositions of these projectors into matrices of the form |ej〉〈ek|. Note that |fjk〉〈fjk| is the only projector to produce an |ej〉〈ek| cross-term with a real coef- ficient, whilst |gmn〉〈gmn| is the only projector to produce an |em〉〈en| cross-term with a complex coefficient. Since V is of dimension d2, this set of projectors forms a basis for V . Suppose that {Eα} denotes a relabelling of the above basis for V such that E1 = |ed〉〈ed|. Note that the identity can be decomposed as I = ∑d i=1 |ei〉〈ei|; therefore E1 = |ed〉〈ed| lies in the span of the set {I, Eα}α>1. Setting E1 equal to I instead of |ed〉〈ed| forms a new set which therefore also spans V . Hence we may assume without loss of generality that E1 = I. Since this new set also has d2 elements, it again forms a basis for V , thus conditions (i) and (ii) are satisfied. To additionally satisfy condition (iii), note that the matrix E = ∑ α>1Eα is also positive semi-definite, and so by compactness the real number ω = max |||ψ〉||=1 〈ψ|E|ψ〉 (3.12) is both positive and finite. After scaling every Eα by 1ω for α > 1, we have that I− ∑ α>1Eα is also a positive operator. Theorem 5. An outcome distribution P is non-signaling if and only if there exist Hilbert spaces (of any dimension)H1, . . . ,HN , operatorsM (i)ai|xi ∈ P (Hi) satisfy- ing ∑ ai M (i) ai|xi = I (i) , and aHermitian, unit-trace operator ρ˜ ∈ L (H1 ⊗ · · · ⊗ HN) such that P (a1, . . . , aN |x1, . . . , xN) = Tr ( M (1)a1|x1 ⊗ · · · ⊗M (N) aN |xN ρ˜ ) . (3.13) Proof. The “if” direction of the proof is little more than an algebraic substitution of the condition ∑ ai M (i) ai|xi = I (i). Indeed, for an arbitrary choice of x1, summing 93 over a1 and using linearity of the trace operation gives: K(1)x1 ∑ a1=1 P (a1, . . . , aN |x1, . . . , xN) = K(1)x1 ∑ a1=1 Tr ( M (1)a1|x1 ⊗ · · · ⊗M (N) aN |xN ρ˜ ) = Tr     K(1)x1 ∑ a1=1 M (1)a1|x1  ⊗ · · · ⊗M (N)aN |xN ρ˜   = Tr ( I(1) ⊗ · · · ⊗M (N)aN |xN ρ˜ ) (3.14) Equation (3.14) clearly has no dependence on the value of x1, hence any outcome distribution P that can be represented in this way is non-signaling. To prove the converse, we construct local measurement operatorsM (i)ai|xi and a Hermitian operator ρ˜ of unit tracewhich generates a non-signaling quasi-distribution P˜ ′, then show that P˜ ′ agrees with P for all nowhere-maximal marginal strings (aΩ|xΩ). Using Lemma 1, this is sufficient to conclude that P˜ ′ is identical to the outcome distribution P , and therefore that P is generated according to (3.13). Recall that system i has M (i) measurement choices, with measurement x al- lowing for K(i)x possible outcomes. LetHi ∼= Cdi , where di is large enough that d2i ≥ 1 + M(i) ∑ x=1 ( K(i)x − 1 ) . (3.15) Let {Eα} be the basis of positive semi-definite matrices defined in Lemma 2 for the vector space V of Hermitian matrices acting onHi. Note that V has a real inner product 〈A,B〉 = Tr (AB). Let {E˜α} be the dual basis to {Eα}, defined by 〈Eα, E˜β〉 = Tr ( EαE˜β ) = δαβ . For 1 ≤ xi ≤ M (i) and 1 ≤ ai ≤ K(i)x − 1, define {M (i)ai|xi} (the set of non-maximal POVM elements) to be some subset of {Eα}α>1; note that this is possible as long as di is large enough to satisfy (3.15). For each matrixM (i)ai|xi , let M˜ (i) ai|xi denote the corresponding dual matrix, which is 94 Hermitian but not necessarily positive semi-definite. To complete the POVM set for each measurement xi, define M (i)K(i)xi |xi = I(i) − ∑K(i)xi −1 ai=1 M (i) ai|xi which, due to condition (iii) of Lemma 2, is also a positive semi-definite operator. Note that the POVM elements have been defined purely in terms of the number of measurement choices and outcomes, and do not depend on the specific outcome distribution P . Our non-positive density operator ρ˜ ∈ H1 ⊗ · · · ⊗ HN , on the other hand, does depend on P . In giving a concise formula for ρ˜, a slight abuse of notation is needed: for a subset Ω ⊂ [N ] of the N systems, ˆ ⊗ i∈ΩM˜ (i) ai|xi will denote an N -fold tensor product, in which the matrix I˜(i) is inserted for components i not belonging to Ω. For example, ˆ⊗ i∈{2,3} M˜ (i)ai|xi = I˜ (1) ⊗ M˜ (2)a2|x2 ⊗ M˜ (3) a3|x3 . (3.16) Note that since P is assumed to be non-signaling, the marginal distributions P (aΩ|xΩ) are well-defined for all ∅ 6= Ω ⊆ [N ]. For the empty set Ω = ∅, we will also define P (a∅|x∅) = 1. The operator ρ˜ is then given by the formula ρ˜ = ∑ Ω⊆[N ] ∑ nowhere-maximal strings (aΩ|xΩ) P (aΩ|xΩ) ⊗ i∈Ω M˜ (i)ai|xi . (3.17) Since each M˜ (i)ai|xi is not guaranteed to be positive semi-definite, neither is the operator ρ˜. However, by construction they are all Hermitian, hence so is ρ˜. Sup- pose now that P˜ ′ is the (non-signaling) quasiprobability distribution generated by the POVM elementsM (i)ai|xi acting on the state ρ˜, according to the Born trace rule (3.13). Given a subsetΩ ⊆ [N ] and nowhere-maximal string (aΩ|xΩ), themarginal value P˜ ′ (aΩ|xΩ) is obtained by taking the trace of the product of ρ˜with the matrix ⊗ i∈ΩM (i) ai|xi . Using the linearity of the trace, the fact that Tr (A⊗B) = Tr (A) ·Tr (B), and the duality of the constructed basis sets, it is clear that these marginal values are exactly P (aΩ|xΩ). Therefore by Lemma 1, P˜ ′ is in fact the outcome distribution P , as desired. By post-multiplying ρ˜ by the N -partite identity matrix ⊗N i=1 I(i), a 95 similar procedure verifies that ρ˜ is of unit trace: Tr (ρ˜) = Tr ( ρ˜ ( N ⊗ i=1 I(i) )) = 1. (3.18) Example The construction outlined in the previous proof will now be applied to the canonical PR-box state. Recall that the PR-box is a special state of the system comprised of two g-bit subsystems, i.e. each subsystem has two fiducial measure- ment choices and two possible outcomes for each fiducial measurement. Hence it is sufficient to set d1 = d2 = 2 in order to satisfy (3.15). By following the proce- dure in Lemma 2, before performing the rescaling, we obtain the following basis of positive semi-definite matrices for the vector space of 2x2 complex Hermitian matrices: E1 = ( 1 0 0 1 ) , E2 = ( 1 0 0 0 ) , E3 = ( 1 1 1 1 ) , E4 = ( 1 i −i 1 ) . (3.19) In order to satisfy property (iii) of Lemma 2, it is straightforward to check that the following matrix is positive semi-definite, E1 − 1 ω (E2 + E3 + E4) , (3.20) as long as ω ≥ 52 . For convenience, we will set ω = 4, generating the follow- ing set of POVM operators (recall that for the PR-box, measurement choices and 96 outcomes belong to the set {0, 1}): M0|0 = ( 1 4 0 0 0 ) , M1|0 = ( 3 4 0 0 1 ) , M0|1 = ( 1 4 1 4 1 4 1 4 ) , M1|1 = ( 3 4 − 1 4 −14 3 4 ) . (3.21) Since the POVM elements are taken to be identical for both systems, the su- perscript (i) is unnecessary. and has been omitted. The relevant dual elements to the POVM operators are then, I˜ = ( 0 −1+i2 −1−i 2 1 ) , M˜0|0 = ( 4 0 0 −4 ) , M˜0|1 = ( 0 2 2 0 ) . (3.22) Recall that the PR-box outcome statistics are described by the formulaP (a1, a2|x1, x2) = 1 2δ(a1⊕a2 = x1·x2 (mod 2)). In this case, the marginal distributions are uniformly distributed for both systems, and the formula for ρ˜ reduces to, ρ˜ = 1 ∑ x1,x2=0 1 2 δ ( x1 · x2 = 0 ) M˜ (1)0|x1 ⊗ M˜ (2) 0|x2 + 1 2 ( M˜ (1)0|0 ⊗ I˜ (2) + M˜ (1)0|1 ⊗ I˜ (2) ) + 1 2 ( I˜(1) ⊗ M˜ (2)0|0 + I˜ (1) ⊗ M˜ (2)0|1 ) + I˜(1) ⊗ I˜(2). (3.23) Substituting the matrices obtained in (3.22) gives, ρ˜ =       8 3 + i 3 + i −1 + i2 3− i −6 −12 −5−i 2 3− i −12 10 −5−i 2 −1− i2 −5+i 2 −5+i 2 −11       (3.24) 97 Note that ρ˜ is Hermitian, of unit trace, and non-positive (for example 〈e4|ρ˜|e4〉 = −11). It can be checked that the POVM elements (3.21) do indeed generate the correct probabilities when acting on the matrix ρ˜ even in the case where the string (a|x) is not nowhere-maximal, for example, Tr ( M0|1 ⊗M1|1 ρ˜ ) = 1 2 (3.25) Tr ( M1|1 ⊗M1|1 ρ˜ ) = 0. (3.26) This concludes the example of generating PR-box statistics using a non-positive density operator. 3.2.2 Commuting operators It turns out that with a simple modification of the proof of Theorem 5, the same result can be achieved but with the additional constraint that all operators (both the POVM elements and ρ˜) commute. The cost of such a modification is an increase in the dimension of the Hilbert space at each system - in fact we require that di = 1+ ∑M(i) x=1 ( K(i)x − 1 ) , so that dimension of the Hilbert space is roughly the square of that in the proof of Theorem 5. The construction of the POVM elements is then particularly simple. Let {|e0〉} ∪ {|eai|xi〉 |1 ≤ xi ≤M (i), 1 ≤ a ≤ K(i)xi − 1}, (3.27) be a basis for the Hilbert space Cdi , and for non-maximal pairs (ai, xi), define M (i)ai|xi to be the projector |eai|xi〉〈eai|xi|. Writing the identity as I (i) = |e0〉〈e0| + ∑ |eai|xi〉〈eai|xi|, it is clear for each xi that the matrix given by, M (i) K(i)xi |xi = I(i) − ∑ ai 1, write a = a(0)a(1) as a concatenation of two bit-strings of length n/2, and let (k1, . . . , kr) be a bit-string representation of (k− 1), so that the value of the rth bit kr indicates whether ak belongs to either a(0) or a(1). Assume that the protocol has been constructed for the case n = 2r−1, and suppose that Alice and Bob perform the protocol for both of Alice's strings a(0) and a(1) - with Bob using the input represented by (k1, . . . , kr−1) in both cases - but that Alice does not transmit any messages to Bob. Suppose that Alice would normally communicate the single bit x(0) to Bob had they been performing the protocol only for the string a(0), and similarly the bit x(1) had they been performing the protocol only for a(1). Note that Bob now wishes to know the value of exactly one of x(0) or x(1), depend- ing on whether kr is 0 or 1. This reduces the task to the r = 1 case, hence by the previous paragraph, using one extra PR-box Bob may obtain the value of bit x(kr), and complete the protocol for the (n/2)-bit string a(kr). Figure 4.2 depicts this 142 protocol in the case where r = 2 and k = 3; note that whilst the general protocol requires three PR-box states, Bob only makes use of two of the outputs. Figure 4.2: How to violate the Information Causality protocol using perfect PR- box states (r = 2, k = 3). Adapted from [51]. Now suppose that Alice and Bob use a noisy PR-box, which on inputs x, y outputs bits a and b, whose individual values are uniformly random, such that with probability p, a⊕ b = x · y and with probability 1− p, a⊕ b = (x · y)⊕ 1 (in fact, any non-signaling outcome distribution can be brought to this form by means of local operations, without affecting its overall CHSH value [85]). Tsirelson's bound is violated by such a box if and only if p > 2+ √ 2 4 [47]. In the protocol with perfect PR-boxes, Alice and Bob compute their final answer by a modulo 2 summation of the outputs of r PR-boxes. Therefore in the noisy case, the correct answer is obtained not just when all the PR-boxes function correctly, but if and only if an even number of them output the wrong answer. Thus the probability of 143 Bob having the correct answer at the end of the protocol is given by: b r2 c ∑ j=0 ( r 2j ) pr−2j(1− p)2j = 1 2 [1 + (2p− 1)r] = 1 + E r 2 , (4.26) where E = p − (1 − p) is the bias of the PR-box. The figure of merit I can be manipulated into a form into which we can subsitute (4.26). Expanding it in terms of the classical entropy, I = n ∑ k=1 Ic(ak : β(k)) = n ∑ k=1 H(ak)−Hc(ak|β(k)). (4.27) Since a is uniformly distributed, Hc(ak) = 1 for all k. If one has access to β(k), then ak has the same distribution as ak ⊕ β(k), hence Hc(ak|β(k)) = Hc(ak ⊕ β(k)|β(k)), so: I ≥ n ∑ k=1 1−Hc(ak ⊕ β(k)) = n ∑ k=1 1− h(Prob(ak = β(k))), (4.28) (4.29) where h(p) is the binary entropy of the distribution {p, 1 − p}. Now substituting 144 the success probability (4.26) and using the inequality 1− h(1+y2 ) ≥ y2 2 ln 2 : I ≥ n ∑ k=1 1− h ( 1 + Er 2 ) (4.30) ≥ 1 2 ln 2 n ∑ k=1 E2r (4.31) = (2E 2)r 2 ln 2 (4.32) In order for this last value not to become arbitrarily large with r, and exceed 1 (the value of m in this protocol), we require that E2 < 12 , i.e. p < 2+ √ 2 4 . Hence, Tsirelson's bound gives the precise threshold beyond which the violation of Infor- mation Causality is possible. 4.2 Probability of success in the information causal- ity game As mentioned at the beginning of the previous Section, the Information Causality game differs from other non-local games in that one attempts tomaximise a strange function I on the inputs and outputs, rather than the probability of success. On the other hand, Tsirelson's bound is commonly formulated as an upper limit on the probability of success at the CHSH game. This contrast raises some natural questions about Information Causality, and how one uses it to derive Tsirelson's bound. If quantum theory cannot improve on the classical bound on I , does this also imply that quantum theory also cannot improve on the optimal probability of success? If not, how are probability of success and I (defined as in (4.18)) related? In fact, it can easily be seen that the optimal quantum probability of success in the Information Causality game is sometimes greater than the optimal classical probability of success. For example, in a simple version of the game in which n = 2 and m = 1, the optimal classical probability of success is 34 (e.g. when 145 Alice sends Bob x = a1 and he guesses β(k) = x regardless of k, they always win when k = 1 and win half the time when k = 2). However, by exploiting quan- tum measurements which saturate Tsirelson's bound, Alice and Bob can achieve a success probability of 2+ √ 2 4 . To do this, Alice and Bob first generate bits a ′ and b′ satisfying a′ ⊕ b′ = (a1 ⊕ a2) · (k − 1) with probability 2+ √ 2 4 . Then Alice sends Bob x = a′ ⊕ a1 and Bob outputs β(k) = b′ ⊕ x. It is also possible to obtain very different values of I for strategies with the same probabilities of success. As above, Alice can send Bob her first bit to obtain I = 1 and probability of success 34 ; alternatively, Alice and Bob can uniformly “mix” this strategy with one where Alice sends Bob her second bit and he out- puts that, so that the overall probability of success is the same but they win with probability 34 if either k = 1 or k = 2. For this strategy, I = Ic(a1 : β(1)) + Ic(a2 : β(2)) = 2 (Hc(β(1))−Hc(β(1)|a1)) = 2 ( 1− h ( 3 4 )) ≈ 0.38. (4.33) The figure I seems to take heavily into considerationwhether the likelihood of suc- cess is “shifted” into a few of Alice's components, or is evenly distributed across them. Consider mixing the first-bit strategy with a small amount of noise: since I is a continuous function, the value of I will be greater than 0.38. However, the probability of success will now be slightly weighted toward a half, i.e. strictly smaller. Thus, I and the probability of success are not even related monotonically. 4.2.1 Random Access Codes With probability of success as the figure of merit, the Information Causality game becomes very similar to the task of Random Access Coding [71]. In an (n,m, p) Random Access Code (RAC), Alice has a random bit string a of length n, which 146 she codes into a bit string of lengthm < n which is transmitted to Bob. Bob must then be able to recreate any one of Alice's bits ak with probability at least p. Much of the research into Random Access Coding so far concerns the case wherem = 1 and where the bit string a is uniformly chosen, and attempts to find the optimal success probability p as a function of n. When Alice and Bob are restricted to classical strategies, the optimal success probability in the case m = 1 is known; it is attained by using a “majority-vote” strategy, in which Alice simply sends Bob the bit that most frequently occurs in her string [71]. This gives an analytical success probability which yields a simple approximation for large n using Stirling's formula: PCsuccess = 1 2 ( 1 + 1 2n−1 ( n− 1 bn−12 c )) ≈ 1 2 ( 1 + √ 2 pin ) . (4.34) If Alice and Bob are given access to measurements on quantum states, there are two further classes of RAC that can be considered. In an Entanglement-Assisted RAC (EARAC) [3], Alice and Bob may have an initial supply of entangled states upon which to perform measurements. In a Quantum RAC (QRAC), Alice and Bob do not share any prior entanglement, however instead of transmittingm clas- sical bits to Bob, Alice transmitsm qubits to Bob. Both scenarios have been inves- tigated in the m = 1 case; for both EARACs and QRACs, the following formula has been shown to be an upper bound on the probability of success [3, 71]. PQsuccess ≤ 1 2 ( 1 + √ 1 n ) . (4.35) Pawloski et al construct an EARAC protocol in [3] which attains the bound (4.35) exactly when n is of the form 2j3k, for integers j and k. We show in Section 4.2.2 that this bound is achievable for all n, using strategies converted from a related inner product game. For QRACs it is still not known for which n (4.35) is achievable, however the optimal success probability is known to be lower-bounded 147 by the following formula [71]: PQsuccess ≥ 1 2 ( 1 + √ 1 6pin ) . (4.36) 4.2.2 Optimal success probability of EARACs Since the Information Causality game is not itself judged in terms of probability of success, it is interesting to explore inmore depth how it is used to derive a bound on the probability of success at winning the CHSH game. With this in mind, a crucial step in the argument appears to be the relation of entropy to PR-box probabilities, in the approximated bound I ≥ N ∑ k=1 1−H(Pk) ≥ 1 2 ln 2 N ∑ k=1 E2k , (4.37) where Ek is the success bias of the game, conditioned on Bob being given the number k: Ek = Prob(ak = β(k))− (1− Prob(ak = β(k))) = 2Prob(ak = β(k))− 1. (4.38) (Note that Ek is not the same as the bias E of a noisy PR-box state, as defined in Section 4.1.3.) The protocol designed in Section 4.1.3 can also be seen as a derivation of a quadratic bound for quantum strategies in the Information Causality game: N ∑ k=1 E2k ≤ 2 ln 2. (4.39) It turns out to be very useful to look at a more direct way of deriving a similar bound via the rules of quantum theory, rather than using the properties of mutual information. Recall the Inner Product game defined in Section 4.1.2, in particular the fact that any strategy can be converted to a strategy for the Distributed Inner 148 Product Computation with m = 1. Note also that the setup for the Information Causality game is in fact an example of a Distributed Inner Product Computation. Bob attempts to determine the value of a ·y, with the following additional promise on the distribution of Bob's string y: with uniform probability, exactly one of the bits (i.e. the kth bit) is 1, whilst the rest are non-zero. In this Section we derive a quantum upper bound on the Inner Product game which is very similar in form to the quadratic bias bound (4.39), and show that there exist strategies that achieve this bound. By the above reasoning, such strategies can be converted to strategies for Information Causality: we then show that one of these converted strategies attains the optimal probability of success for EARACs, given by (4.35), and hence gives the optimal probability of success for Information Causality withm = 1. The following Lemma, proved in [86], is crucial in deriving our results about the Inner Product Game. Lemma 4. For any sub-normalized vectors u1, . . . , us and v1, . . . , vt in a real Euclidean space of dimension min (s, t), there exists a density matrix ρ on a finite dimensional Hilbert space H = H1 ⊗ H2, with ±1-valued Hermitian operators A1, . . . , As on H1 and B1, . . . , Bt on H2 such that Tr((Ai ⊗Bj)ρ) = 〈ui, vj〉 for all 1 ≤ i ≤ s, 1 ≤ j ≤ t. Theorem 9. Suppose that Alice and Bob play the Non-local Inner Product Game, where Alice's input x is chosen uniformly from bit strings of length n, and Bob's input y is chosen according to any random distribution from bit strings of length n. Let Ey be the bias of the game, conditioned on the value of y. Then the set of biases {Ey} is achievable if, and only if, ∑ yE2y ≤ 1. Proof. We first prove the “only if” direction. Since Alice and Bob do not com- municate in the game, the outcome distribution P (a, b|x, y) of their outputs given their inputs must result from a local POVMmeasurement on a shared density ma- trix ρ. That is to say, for each value of x Bob receives, there is a pair of positive operators {M (1)0|x ,M (1) 1|x } which sum to the identity I(1), and for each value of y a 149 similar pair {M (2)0|y ,M (2) 1|y } for Alice, such that: P (a, b|x, y) = Tr ( M (1)a|x ⊗M (2) b|y ρ ) (4.40) The POVM measurement (4.40) can be realised as a standard (projective) quantum measurement on a density matrix ρ˜ belonging to a Hilbert space of larger dimension; moreover, the state ρ˜ can be purified into a state |ψ〉 of even larger dimension. Thus we can assume that Alice and Bob begin with the entangled pure state |ψ〉, and their outputs are obtained by measurement of Hermitian operators aˆx and bˆy respectively with eigenvalues in {0, 1}. For this set of measurements, let Pxy denote the probability that a⊕ b = x · y given that Alice and Bob are given the particular strings x and y. Suppose that a +1 value is associated to the game if a ⊕ b = x · y (i.e. they succeed), and a −1 value is associated to it otherwise. Then the bias Exy = Pxy − (1 − Pxy) can be seen as the expected value of the game, conditioned on the inputs x and y. In terms of the state |ψ〉 and operators aˆx and bˆy, the bias has a simple form as the expected value of the operator (−1)aˆx+bˆy+x·y. Exy = 〈ψ|(−1)aˆx+bˆy+x·y|ψ〉. (4.41) The bias of the game conditioned only on Bob's input y is given by the average over Alice's inputs. Since Alice's bit string is chosen uniformly at random, this is Ey = 12n ∑ xExy. To derive the quadratic bound on Bob's biases, define the normalized states: |A〉 = 1√ 2n ∑ x (−1)aˆx|ψ〉 ⊗ |x〉 (4.42) |By〉 = 1√ 2n ∑ x (−1)bˆy+x·y|ψ〉 ⊗ |x〉 (4.43) 150 so that {|By〉} forms an orthonormal set satisfying 〈By|By′〉 = δyy′ . It follows that 〈A| ( ∑ y |By〉〈By| ) |A〉 = ∑ y 〈A|By〉2 = ∑ y ( 1 2n ∑ x ∑ x′ 〈x|x′〉〈ψ|(−1)aˆx(−1)bˆy+x′·y|ψ〉 )2 = ∑ y ( 1 2n ∑ x 〈ψ|(−1)aˆx+bˆy+x·y|ψ〉 )2 = ∑ y E2y . (4.44) Since ∑ y |By〉〈By| is a projector, and |A〉 is normalised, ∑ y E2y = 〈A| ( ∑ y |By〉〈By| ) |A〉 ≤ 1. (4.45) We now prove the “if” direction: suppose that the real numbers By satisfy ∑ yB2y ≤ 1, and let the vectors ey be the standard basis vectors of some real vector space. Define the sub-normalised vectors ux = ∑ y(−1)x·yByey and vy = ey. By Lemma 4 (after purifying the state ρ), there exists a state |ψ〉 and 0, 1- valued operators aˆx and bˆy (where e.g. (−1)aˆx = Ax) such that 〈ψ|(−1)aˆx+bˆy|ψ〉 = 〈ux, vy〉 = (−1)x·yBy, and hence Ey = 1 2n ∑ x Exy = 1 2n ∑ x (−1)x·y〈ψ|(−1)aˆx+bˆy |ψ〉 = 1 2n ∑ x By = By 151 Corollary 3. There exists a quantum strategy for the Information Causality game withm = 1 which succeeds with probability 12 ( 1 + 1√n ) . Proof. Suppose that Alice and Bob play the Non-local Inner Product game where Alice's input string is chosen uniformly at random, but Bob's bit string is chosen uniformly from the set of n-bit strings which have exactly one non-zero bit. By the “if” direction of Theorem 9, it is possible to find a quantum strategy such thatEy = 1√ n for all such y (since this choice of Ey satisfies ∑ yE2y = 1). Consequently, for each string y with one non-zero bit their probability of success conditioned on y is Py = 1 + Ey 2 = 1 2 ( 1 + 1√ n ) . (4.46) The total probability of success is given by averaging Py over the distribution on Bob's string y. However, since Py is the same for all y, the probability of success is simply 12 ( 1 + 1√n ) . Recall that any strategy for this version of the Non-local Inner Product game can be transferred to a strategy for m = 1 Information Causality with the same probability of success. The probability of success in Corollary 3 is equal to the upper bound (4.35) on probability of success for EARACs, hence this is the optimal probability of success and (4.35) can be saturated for all integersn; moreover, any optimal strategy for the modified inner product game converts to an optimal strategy inm = 1 Information Causality. For larger values ofm, the optimal success probability for EARACs is not generally known; however, it's worth noting that in deriving Tsirelson's bound from Information Causality, the protocol given in Section 4.1.3 only involves one bit being transmitted from Alice to Bob, who then computes the direct product with his own output. If in fact the Information Causality game specified that at most one bit is transmitted to Bob, and the corresponding condition I ≤ 1 was used instead of I ≤ m, then the same results follow. Although it is interesting to consider the case of generalm, the crux of the argument lies in them = 1 scenario. 152 The quadratic bias bound ∑ yE2y ≤ 1 given in Theorem 9 for the Non-local Inner Product game appears to capture a great deal about the set of quantum cor- relations. In the case where Bob's input y is chosen uniformly from all bit strings of length n, the probability of success conditioned on y obeys the bound: Py = 1 2 ( 1 + 1 2n ∑ y Ey ) ≤ 1 2  1 + √ 1 2n ∑ y E2y   ≤ 1 2 ( 1 + 1√ 2n ) . (4.47) Again, since this value is constant, the total probability of success also obeys this bound. In the case n = 1, the Non-local Inner Product game is the CHSH game, and the formula (4.47) reduces to Tsirelson's bound. In fact, in this case the in- equality E20 + E21 ≤ 1 describes the exact quantum boundary in a particular 2-D slice of the non-signaling polytope. However, it is not known whether either the quadratic bias bound or the Information Causality principle is sufficient to recover the complete boundary on the set of quantum correlations. Just as with the Information Causality principle, note that it is possible to sat- urate the quadratic bias bound using only classical correlations. If Alice and Bob pick some fixed value y = y′, and Alice outputs x · y′ whilst Bob outputs 0, then they win every time Bob's input is y′, hence Ey′ = 1; clearly this implies Ey = 0 for y 6= y′. Unfortunately, although the quadratic bias bound is appealing for its relatively simple proof and obvious connection to success probabilities, it lacks the direct physical intuition of the Information Causality principle. 153 4.3 The role of entropy In Section 4.1 we derived the Information Causality principle from the existence of a measure of mutual information I˜ which satisfies four conditions: Consistency with classical mutual information, a Data Processing inequality, the Chain Rule and Symmetry. This in itself suggests an interesting perspective: any physical theory which admits a “reasonable” measure of mutual information (where “rea- sonable” here means “satisfying at least the above four conditions”) must obey the Information Causality principle, and therefore must obey any bounds on non- signaling correlations which follow from the Information Causality principle (in- cluding Tsirelson's bound). In Boxworld, where strongly non-local states such as the PR-box are permitted, clearly no such measure of mutual information can be attributed to pairs of systems. The usual classical and quantum mutual informations, denoted Ic and Iq re- spectively, are known to obey all four of the above conditions [53]. However, neither of these represent the most basic informational primitive of their respec- tive theory; rather, they are often both defined as the same function of the classical or quantum entropy: I(X : Y ) = H(X) +H(Y )−H(X, Y ). (4.48) This suggests that it is worth exploring the principle of Information Causality for measures of mutual information which are defined in a similar way, i.e. I˜(X : Y ) = H˜(X) + H˜(Y ) − H˜(X,Y ), where H˜ is some measure of entropy in a general probabilistic theory. Measures of entropy in the framework of general probabilistic theories have already received attention in the literature, including their relation to mutual information and Information Causality [38-40]. Unsur- prisingly, given the above comments, the measures of entropy considered have some intuitive properties, but tend to disobey one or more natural principles. Suppose then that a mutual information I˜ between two systems is defined in terms of an entropy function H˜ on individual systems as in (4.48). What properties 154 must the entropy satisfy in order that I˜ satisfies the four assumptions required in the derivation of Information Causality? Since (4.48) is symmetric in X and Y , I˜ will automatically satisfy Symmetry. If H˜ reduces to the classical entropy Hc whenX is classical, then I˜(X : Y )will also satisfy Consistency. The Chain Rule, which we earlier commented was without immediate physical significance, also follows straight from (4.48): I˜(X, Y : Z)− I˜(X : Z) = I˜(Y : X,Z)− I˜(X : Y ) ⇐⇒ [ H˜(X, Y ) + H˜(Z)− H˜(X,Y, Z) ] − [ H˜(X) + H˜(Z)− H˜(X,Z) ] = [ H˜(Y ) + H˜(X,Z)− H˜(Y,X,Z) ] − [ H˜(X) + H˜(Y )− H˜(X,Y ) ] ⇐⇒ H˜(X) + H˜(X, Y ) + H˜(X,Z)− H˜(X,Y, Z) = H˜(X) + H˜(X, Y ) + H˜(X,Z)− H˜(X,Y, Z) (4.49) Thus the awkward-looking Chain Rule simply disappears, when everything is formulated in terms of entropies rather than mutual information. We are left with the Data Processing inequality; for any two systems X and Y , and an allowed transformation τ on system Y alone: I˜(X : Y ) ≥ I˜(X : τ(Y )) ⇐⇒ H˜(X) + H˜(Y )− H˜(X,Y ) ≥ H˜(X) + H˜(τ(Y ))− H˜(X, τ(Y )) ⇐⇒ H˜(X, τ(Y ))− H˜(τ(Y )) ≥ H˜(X,Y )− H˜(Y ) ⇐⇒ H˜(X|τ(Y )) ≥ H˜(X|Y ) (4.50) where we identify the generalised conditional entropy, H˜(X|Y ) = H˜(X, Y )− H˜(Y ), (4.51) 155 and interpret it as the uncertainty about the state of systemX , given knowledge of the state of system Y . The inequality (4.50) has an intuitive physical interpretation in this sense: one cannot learn more about systemX by performing a transforma- tion on system Y alone, hence one's uncertainty about X , given Y , should not decrease just because of a local transformation performed at Y . The constraint (4.50) can also be written in the following way: H˜(X, τ(Y ))− H˜(X,Y ) ≥ H˜(τ(Y ))− H˜(Y ) ⇐⇒ ∆τH˜(X,Y ) ≥ ∆τH˜(Y ), (4.52) where τ is a local operation at Y , and ∆τH˜ represents the change in entropy due to τ . In this form, the inequality is a statement about changes in entropy rather than about the conditional entropy, thus has the advantage that one does not need to introduce the idea of conditional entropy. The inequality (4.52) also has an intuitive physical interpretation: local transformations may generate uncertainty on a bipartite level which is greater than the uncertainty generated only locally (for example, the transformation may destroy correlations between the systems). We have thus demonstrated that the Information Causality principle holds in any theory that permits an entropy H˜ satisfying two very reasonable properties: (i) Consistency If system X is classical, H˜(X) reduces to the classical entropy of the probability distribution on X . (ii) Ancillary Evolution For any local transformation τ on system Y , H˜(X|τ(Y )) ≥ H˜(X|Y ). (4.53) For the Information Causality principle to follow from these principles, it is in fact also necessary for the theory to contain certain local transformations. In Section 2.5.4 we assumed that any transformation which maps the set of allowed states to itself in a convex-linear fashion is part of the theory. However, there is no overriding reason why the theory should not contain a strict subset of all such 156 transformations. Indeed, one might conceive of a physical theory whose states are the full set of Boxworld states, yet for which no transformations exist, either from a system to itself or from one system to another. In such a theory, Ancillary Evolution holds for everymeasure of entropy. Therefore, even though Information Causality is violated by this theory, both Consistency and Ancillary Evolution hold for the measure of entropy which assigns the classical entropy to states of classical systems, and 0 to states of all other systems. It is interesting to determine the minimal set of transformations which a theory must allow, alongside admitting an entropy that obeys Consistency and Ancil- lary Evolution, in order for the Information Causality principle to hold. This is achieved by examining where the Data Processing inequality is used in the proof of the Information Causality principle in terms of the mutual information. There are three types of transformation to which the Data Processing inequality is ap- plied: ``tracing out'' of a system; ``constructing'' a system, i.e. a transformation from a classical system (possibly an ``empty'' system denoted by ∅) to any other type of system; and ``measuring'' a system, i.e. a transformation from any type of system to a classical system. So long as all permissible transformations belonging to these classes are contained in the theory, then Information Causality will indeed follow from Consistency and Ancillary Evolution. Nevertheless, this reduction in the number and complexity of properties moti- vates an attempt to reformulate - and derive - the Information Causality principle purely in terms of entropies; this is the topic of Section 4.3.1. In Section 4.3.2 we then give a brief overview of some attempts at introducing measures of entropy into general physical theories, and discuss how they relate to our own perspective on the Information Causality principle. 4.3.1 An entropic information causality Before deriving an entropic version of the Information Causality principle, we deduce a number of standard inequalities which must hold for any measure of entropy satisfying the Consistency and Ancillary Evolution properties from the 157 previous section: Subadditivity For systems X and Y , if τ : Y → ∅ is the “tracing out” map, then by Ancillary Evolution: H˜(X|∅) ≥ H˜(X|Y ). (4.54) The entropy ofX given a traced-out system must be the same as the entropy of X , hence: H˜(X) ≥ H˜(X, Y )− H˜(Y ) ⇒ H˜(X, Y ) ≤ H˜(X) + H˜(Y ). (4.55) IfX and Y are independent systems, then there is a local map corresponding to the preparation of the state of Y independent ofX , ω : ∅ → Y . Ancillary Evolution then implies thatH(X|Y ) ≥ H(X|∅); combining this with (4.55) gives H˜(X,Y ) = H˜(X) + H˜(Y ). Strong Subadditivity For three systems X1, X2 and Y , if τ : (X2, Y ) → Y is the marginalization map, then it follows from Ancillary Evolution that H˜(X1|X2, Y ) ≤ H˜(X1|Y ). Hence, H˜(X1, X2|Y ) = H˜(X1, X2, Y )− H˜(Y ) = H˜(X1, X2, Y )− H˜(X2, Y ) + H˜(X2, Y )− H˜(Y ) = H˜(X1|X2, Y ) + H˜(X2|Y ) ≤ H˜(X1|Y ) + H˜(X2|Y ). (4.56) The process may be iterated for a larger number of systems: H˜(X1, . . . , Xn|Y ) ≤ H˜(X1|Y ) + . . .+ H˜(Xn|Y ). (4.57) Classical Positivity One's uncertainty about the state of a classical system X is 158 never negative, even when one conditions on another system Y (which may or may not be independent of X , or classical itself): System X is classical⇒ H˜(X|Y ) ≥ 0. (4.58) To prove this, suppose that ψ is the state of system X,. Note that the state ofX is described by a probability distribution on a finite set E of outcomes for a single fiducial measurement. For each outcome e ∈ E, there is an associated probability pe, a deterministic state xe on systemX which outputs e with certainty, and a corresponding reduced state ψe of Y . The state ψ is therefore the convex combination of the product states xe · ψe (i.e. the product state whereX is in state xe and Y is in state ψe) weighted according to pe. Suppose that the systemX ′ is identical toX , and that the joint systemX,X ′ is initially in a convex combination of the product states xe · x′e, weighted according to pe. On system X ′, let τ : X ′ → Y be the transformation described by performing the fiducial measurement E on system X ′, and preparing system Y in state ψe if outcome e is obtained. The state of system X,Y is now the convex combination of the product states xe · ψe weighted according to pe, i.e. after the transformation τ , the state of system X, Y is ψ. By the Consistency postulate, before the transformation τ is performed, the conditional entropy H˜(X|X ′) reduces to the classical conditional entropy: H˜(X|X ′) = Hc(X|X ′) = Hc(X,X ′)−Hc(X ′). (4.59) Since X ′ is perfectly correlated with X , Hc(X,X ′) = Hc(X ′) therefore H˜(X|X ′) = 0. Then by the Ancillary Evolution postulate: H˜(X|Y ) = H˜(X|τ(X ′)) ≥ H˜(X|X ′) = 0. (4.60) 159 The Subadditivity, Strong Subadditivity and Classical Positivity properties must be obeyed by any measure of entropy which satisfies the Consistency and Ancillary Evolution postulates. Using these three natural properties, it is quite easy to derive an Entropic Information Causality principle. Recall that in the In- formation Causality game, Alice is given a bit-string a of length n and Bob is given a number 1 ≤ k ≤ n. After Alice transmits an m-bit message x to Bob, Bob outputs β(k), his best guess at the value of ak, the kth bit of Alice's string. Unlike in Section 4.1.3, we will no longer assume that Alice's input a is uniformly chosen. Suppose that Alice and Bob share a state of some system A,B, in a general probabilistic theory that admits some measure of entropy H˜ satisfying the Con- sistency and Ancillary Evolution postulates. Whatever their strategy, Bob's final guess β(k) is ultimately deduced from his system B and the classical system X which holds the message x; therefore there exists a transformation from X, B to β(k), and so by Consistency and Ancillary Evolution: ∑ k Hc(ak|β(k)) = ∑ k H˜(ak|β(k)) ≥ ∑ k H˜(ak|x, B). (4.61) Writing a as a multipartite state a1, . . . , an and using the iterated Strong Subaddi- tivity property (4.57), ∑ k H˜(ak|x, B) ≥ H˜(a|x, B) = H˜(a, x, B)− H˜(x, B). (4.62) Then by Subadditivity on the bipartite system x, B, H˜(a, x, B)− H˜(x, B) ≥ H˜(a, x, B)− H˜(B)− H˜(x). (4.63) 160 Introducing a term H˜(a)− H˜(a) and using the fact that H˜(a, B) = H˜(a)+ H˜(B) since a and B are independent, H˜(a, x, B)− H˜(B)− H˜(x) = H˜(a, x, B)− H˜(B)− H˜(a) + H˜(a)− H˜(x) = H˜(x, a, B)− H˜(a, B) + H˜(a)− H˜(x) = H˜(x|a, B) + H˜(a)− H˜(x). (4.64) Since a and x are classical, and using the positivity of H˜ on classical systems, H˜(x|a, B) + H˜(a)− H˜(x) ≥ Hc(a)−Hc(x). (4.65) Putting this all together gives an entropic formulation of the Information Causality principle: ∑ k Hc(ak|β(k)) ≥ Hc(a)−Hc(x). (4.66) The standard Information Causality principle ∑N k=1 Ic(ak : β(k)) ≤ m can be deduced from (4.66) as follows: since x is anm-bit string m ≥ Hc(x) ≥ Hc(a)− ∑ k Hc(ak|β(k)). (4.67) If the string a is uniformly random, then Hc(a) = ∑ kHc(ak): m ≥ ∑ k [Hc(ak)−Hc(ak|β(k))] = ∑ k Ic(ak : β(k)). (4.68) This entropic formulation of Information Causality has several advantages: firstly, it deals in entropy, which is a more basic informational quantity thanmutual information, and has a more immediate interpretation as the uncertainty inherent in the state of a system. Secondly, it leads to an even more pleasing statement re- 161 garding measures of information in general physical theories. If a general physical theory admits some measure of entropy satisfying Consistency and Ancillary Evo- lution (which are both very reasonable demands), then the Information Causality principle (and subsequent bounds on non-locality) must hold in that theory. To put it more strongly, Tsirelson's bound cannot be violated in any physical theory which admits an extension of classical entropy that behaves reasonably under dy- namics. Though this fact already follows from the proofs contained in the original Information Causality paper [51], it is striking to see it presented in this way. Both the entropic Information Causality principle (4.66) and the quadratic bias bound (4.39) can be seen as fundamental restrictions on information processing which, when applied to the full set of non-signaling correlations, recovers many of the bounds on quantum non-locality. However, whereas the quadratic bias bound suffers from a lack of a simple, physical interpretation, it is relatively easy to mo- tivate equation (4.66) from physical considerations. After his guess β(k), the re- maining uncertainty that Bob has about Alice's inputs should be no less than his original uncertainty Hc(a), minus the information in the message x. 4.3.2 Entropy in general physical theories We now describe some measures of entropy which have previously been intro- duced for general probabilistic theories such as Boxworld. These measures are invariably seen as a generalisation of the Shannon entropy for classical systems, and the Consistency property will hold. Hence, in strongly non-local theories such as Boxworld, it is expected that Ancillary Evolution will not hold, and that this measure of entropy will behave poorly under some dynamical transformation. It is most convenient to introduce these entropies using the convex set rep- resentation of general probabilistic theories, described in Chapter 2. Recall that states and effects are assigned vectors s and e respectively, such that the probabil- ity of obtaining the outcome corresponding to e, for a system in state s, is given by the inner product 〈e, s〉. A measurement M is described by a set of effects {e1, . . . er} which sum to the unit effect: ∑ i ei = u. 162 Measurement entropy For any state s, a measurement M defines a classical probability distribution M(s) = {〈ei, s〉}; it is tempting to define a measure of entropy which is the minimum of the classical Shannon entropy of this distribution over all measure- ments M. However, this poses a problem: an entropy defined in this way could always be decreased by simply “merging” effects together and considering the cor- responding outcomes to be the same - at the extreme, the unit measurement {U} consisting of just the unit effect, will always give zero entropy. In order to counter this problem, recall the notion of a refinement of a mea- surement given in Section 2.5.2, and note that “merging” effects as described above leads to a coarse-graining of the measurement. Therefore we wish to con- sider Shannon entropies of outcome distribution only over pure measurements, i.e. those which are made up of (non-parallel) extreme-ray effects. We will recap the notion of refinement for convenience: measurementM2 = {f1, . . . , fs} is a refinement ofM1 if there exists a function φ : M2 → M1 such that each ei is the sum of those fj which map to it: ei = ∑ j:φ(fj)=ei fj. (4.69) Thus, every measurement is a refinement of the unit measurement, which is com- pletely non-informative; in general, the refined measurementM1 will be strictly more informative than M2. However, simply splitting up e1 into two effects f1 = f2 = 12e1 will result in a refinement which is not more informative (this could be achieved in practice by a post-processing of results: if outcome 1 is ob- tained, the actual output is uniformly assigned one of two distinct, new values). To avoid this issue, the refinement is said to be trivial if φ(fj) = ei ⇒ fj = λjei for real numbers λj . A measurement is fine-grained if it admits no non-trivial fine-grainings. LetM∗ denote the set of all fine-grained measurements. The measurement entropy Hme of a state is the infimum of the Shannon en- 163 tropies of the outcome distributions for all fine-grained measurements [38, 40]: Hme(s) = inf M∈M∗ Hc(M(s)). (4.70) In a classical system, a fine-grained measurement consists only of vectors which are scalar multiples of the effects belonging to the single fiducial measure- ment. It is easily checked that such a measurement is a trivial refinement of the fiducial measurement itself, and that the outcome entropy is therefore minimized for the fiducial measurement. Thus the measurement entropy reduces to the Shan- non entropy for classical systems. In a quantum system, a fine-grained measure- ment consists of measurement operators which are scalar multiples of rank-one projectors; it turns out also that the measurement entropy reduces to the von Neu- mann entropy [38, 40], however this is not immediately obvious. In Boxworld, a fine-grained measurement is one in which all effects are tensor products of fiducial effect vectors, i.e. ei = X(1)a1|x1 ⊗ · · · ⊗X (n) an|xn . All three of the above theories share the property that whenever e(1) is an extreme-ray effect on system 1 and e(2) is an extreme-ray effect on system 2, then e(1) ⊗ e(2) is an extreme-ray effect of the joint system (in Boxworld, as with all max-tensor product theories, these are the only joint extreme-ray effects). Sup- pose that M(1) = {e(1)i } and M(2) = {e (2) j } are fine-grained measurements on systems 1 and 2 respectively. For any joint state s, the outcome distribution M(1) ⊗M(2)(S) = {〈e(1)i ⊗ e (2) j , s〉} has as its marginal on system 1 the outcome distribution M(1)(s) (where s(1) is the reduced state on system 1), and similarly for system 2. Therefore, by subadditivity of the Shannon entropy, Hc(M(1) ⊗M(2)(s)) ≤ Hc(M(1)(s(1))) +Hc(M(2)(s(2))). (4.71) By considering tensor products of fine-grained measurements which approach the measurement entropy to arbitrary precision on the RHS, it is clear that the mea- surement entropy is indeed subadditive in theories for which the aforementioned property holds. 164 On the other hand, the measurement entropy fails to obey strong subadditivity for some tripartite Boxworld systems [38, 40]. This is demonstrated in [38] using a similar procedure as employed in Section 4.1.3 in order for Bob to perfectly guess one of a pair of random bits given to Alice. Suppose that the random bits given to Alice are represented by two classical systems A1 and A2, whilst Bob is in possession of a g-bit system B. In order to obtain a random bit string (a1, a2), Alice simply measures her systems using the single fiducial measurements x1 and x2. Depending on whether Bob wishes to obtain the value of the first bit or the second, his measurement choice is given by x3 = 1 or x3 = 2 respectively. For Bob's output a3, suppose that the tripartite system is in the state represented by: P (a1, a2, a3|x1, x2, x3) = 1 4 δ(a3 = ax3). (4.72) The only system with more than one fiducial measurement choice is B, hence to show that this is non-signaling, it suffices to observe that the reduced systemA1A2 is in a completely uniform state, no matter the value of x3. The reduced state on B is the completely mixed state of a g-bit, hence Hme(B) = 1. Since Bob can choose his measurement in order to correlate exactly with either a1 or a2, then Hme(A1B) = Hme(A2B) = 1. Any measurement on the total system A1A2B generates 4 possible outcomes with uniform probability, henceHme(A1A2B) = 2. Therefore, Hme(A1A2B) +Hme(B) = 3 > 2 = Hme(A1B) +Hme(A2B). (4.73) The measurement entropy does not obey Classical Positivity in Boxworld, ei- ther. Consider a bipartite system composed of a classical bit system A, and a sys- tem B with two measurement choices, each of which admits three outputs, which is in the state represented by the following matrix: 165 x2 = 1 x2 = 2 a2 = 1 a2 = 2 a2 = 3 a2 = 1 a2 = 2 a2 = 3 a1 = 1 1/2 0 0 0 1/4 1/4 a1 = 2 0 1/4 1/4 1/2 0 0 Now consider a bipartite measurement which takes the following form: first measure system A to obtain a1, then measure system B using the measurement choice x1 = a1. This guarantees the output a2 = 1 on the second system, and a1 takes each of its possible valueswith probability 12 . HenceH me(AB) ≤ Hc({12 , 1 2}) = 1. However, each of the fine-grained measurements on the reduced state on sys- tem 2 results in an outcome distribution {12 , 1 4 , 1 4}, which has entropy 1.5. Thus Hme(A|B) = Hme(A|B) − Hme(B) ≤ −0.5, in violation of the Classical Posi- tivity principle. Mixing Entropy A second measure of entropy for general probabilistic theories which reduces to the Shannon and von Neumann entropies on classical and quantum systems re- spectively, is the mixing entropy [38, 40]. Every decomposition of a state s into pure states: s = r ∑ i=1 pisi (4.74) corresponds to a probability distribution p = {pi}. Suppose that P(s) is the set of probability distributions that result from decompositions of s in this manner. The mixing entropy Hmi of s is the infimum of the classical entropy over this set: Hmi(s) = inf p∈P(s) Hc(p). (4.75) 166 As shown in [38], the mixing entropy is not subadditive (let alone strongly subadditive) in Boxworld. An explicit counterexample is the following state s of a system comprising two g-bits: x2 = 1 x2 = 2 a2 = 1 a2 = 2 a2 = 1 a2 = 2 x1 = 1 a1 = 1 1/4 3/8 3/8 1/4 a1 = 2 3/8 0 1/4 1/8 x1 = 2 a1 = 1 1/4 3/8 1/4 3/8 a1 = 2 3/8 0 3/8 0 The reduced state on either system is the state which outputs 1 with probability 5 8 and 2 with probability 3 8 , regardless of the measurement choice. This confirms that the state is non-signaling, and demonstrates that Hmi(A) = Hmi(B) = 0.95... . (4.76) Recall that there are 16 local deterministic and 8 non-local PR-box variants making up the extreme states of the bipartite system AB (the PR-box variants are obtained from the PR-box by local relabellings of the inputs and outputs). Out of these 24 extreme states, only 4 of the local deterministic states and 1 PR-box variant have positive entries only where s also has positive entries. For each of these extreme states si, it can then be checked that for pisi to have all its positive entries less than or equal to those of s, we must have pi ≤ 14 for every i. This implies that any decomposition of s consists of at least 4 extreme states, none of whose coefficients is greater than 14 . Hence Hmi(AB) ≥ 2 > Hmi(A) +Hmi(B). (4.77) 167 However, Classical Positivity is satisfied by the mixing entropy [40]. If system A is classical, then for any system B, the extreme states of system AB are of the form ωA ⊗ ωB, where ωA are ωB are reduced states on systems A and B respectively. Then any decomposition of a state s of system AB takes the form s = ∑ i,j pijωAi ⊗ ωBj = ∑ j ( ∑ i pijωAi ) ⊗ ωBj , (4.78) where ωAi correspond to the outcomes of the single fiducial measurement on sys- tem A. Writing qj = ∑ i pij , one decomposition of the reduced state on system B is sB = ∑ j qjωBj . (4.79) Any decomposition of s generates an entropy Hc(p) ≥ Hc(q) ≥ Hmi(B), there- fore Hmi(AB) ≥ Hmi(B) and so Hmi(A|B) ≥ 0. Relation to Information Causality Both the measurement and mixing entropies are extensions of the von Neu- mann entropy for quantum systems, and therefore obey all three of Subadditivity, Strong Subadditivity and Classical Positivity when confined to only classical or quantum theory. In Boxworld however, they both violate two of these natural properties; since they both obey Consistency, they both must certainly violate An- cillary Evolution. The entropic formulation of Information Causality gives two interesting extra perspectives on this: firstly, because Boxworld allows violations of Tsirelson's bound, there simply can be no measure of entropy that will be “per- fect” in the sense of obeying both Ancillary Evolution and Consistency. Even if there exists some measure of entropy in Boxworld which satisfies Subadditivity, Strong Subadditivity and Classical Positivity, it will inevitably violate some other natural consequence of Ancillary Evolution and Consistency. Secondly, despite already possessing many natural properties, both the measurement and mixing en- tropies will always disobey Ancillary Evolution in any theory allowing a violation 168 of Tsirelson's bound. On one hand, this is slightly disappointing: without natural extensions of en- tropies which obey natural laws, we may find it more difficult to explore informa- tion processing in more general physical paradigms. Ordinarily, counter-intuitive phenomena provide us with something to learn from rather than to be scared of, however in the case of Boxworld it seems that even the most fundamental tenets of information processing run into intractable obstacles. On the other hand, this provides exciting evidence that entropy plays a deep role in limiting the amount of non-locality which is achievable in nature. The existence of a reasonable mea- sure of entropy may even be a more fundamental physical requirement than the standard Hilbert-space quantum postulates. 4.4 Discussion We have explored two different perspectives on the Information Causality game presented in [51]. The first perspective is to consider the probability of success as the figure of merit: we see that quantum theory gives an advantage which is not reflected in the original figure of merit I , for which quantum and classical theory perform equally well. Investigating how these probabilities are involved in deriving Tsirelson's bound from Information Causality leads us to a quadratic quantum bound on the biases achieved, given the different inputs for Bob in the non-local inner product game: ∑ y E2y ≤ 1. (4.80) This inequality holds for any probability distribution over Bob's inputs, and hence applies to the version of Information Causality without any communication from Alice to Bob. This is another example of a bound which quantum and classi- cal correlations can both saturate, but stronger non-local correlations can violate, and from which one can recover the same sections of quantum boundary within 169 two-dimensional slices of the non-signaling polytope. Furthermore, the fact that quantum correlations allow one to achieve any set of biases satisfying this rule suggests that it captures a significant amount of information about the set of quan- tum correlations. The question remains whether quadratic inequalities such as this are sufficient to completely recover the quantum boundary, and whether other quadratic inequalities can tell us useful things about quantum capabilities in other non-local tasks. The second perspective is to view the Information Causality inequality not just as a constraint on possible physical theories but also as a consequence of the existence of a 'good' measure of entropy. In this direction, we have shown that In- formation Causality can be derived given any extension of the entropy from clas- sical to more general systems which satisfies H˜(X|τ(Y )) ≥ H˜(X|Y ) under local transformations on system Y . Conversely, any theory which violates Information Causality (such as Boxworld) cannot have anymeasure of entropy which obeys the above evolution law and agrees with the Shannon entropy for classical systems. In particular, any general probabilistic theory which violates Tsirelson's bound can- not possess a measure of entropy which obeys these conditions; this strange link between non-locality and entropy is definitely worth exploring further. Given the results of Section 4.3, as well as those of [38-40], it seems that the existence of a “good” entropy for quantum theory, which shares so many of the properties of the classical entropy, is very special within the class of general prob- abilistic theories. Are there other theories for which one can define a “reasonable” measure of entropy satisfying Consistency and Ancillary Evolution, or is this a defining feature of quantum theory? Of course, we could consider a general prob- abilistic theory in which states and measurements are represented by the same Hilbert-space objects as quantum theory, but which allows only a strict subset of quantum states. Trivially, the restriction of the von Neumann entropy would sat- isfy the same properties as for full quantum theory. Amore interesting question is whether a reasonable entropy is definable in the- ories which cannot be simulated by quantum theory, for example those which have 170 non-local correlations that are unattainable via measurements of quantum states. Since Information Causality may be deduced from simple entropic properties, the existence of a reasonable entropy potentially places stronger bounds on quantum theory than Information Causality does alone. It would be interesting to look for other games in which possessing a reasonable measure of entropy precludes one from performing better than is possible classically. 171 Chapter 5 Reversible Boxworld dynamics To do mathematics is to engage in an act of discovery and conjecture, intuition and inspiration; to be in a state of confusion— not because it makes no sense to you, but because you gave it sense and you still don't understand what your creation is up to; to have a breakthrough idea; to be frustrated as an artist; to be awed and overwhelmed by an almost painful beauty; to be alive, damn it. “A Mathematician's Lament” Paul Lockhart 172 In this chapter we explore the class of reversible dynamics in the general prob- abilistic theory known as Boxworld. Recall from Section 2.5.4 that the allowed transformations of a general probabilistic theory are determined once its state space has been defined, and that reversible transformations are those for which there is an allowed inverse transformation. Thus this chapter is concerned with the set of reversible transformations that are permitted by the state space structure of Box- world. As this state space is constrained only by the conditions of non-signaling and local tomography for composite systems (as well as the probabilistic laws of positivity and normalization), Boxworld represents something of a canonical ex- ample of a general probabilistic theory, and is a natural benchmark against which to analyse various features of quantum theory. One such feature is the principle of reversible transitivity, which states that any pure state of either an individual or composite system may be reversibly transformed to any other. This holds triv- ially in quantum theory; indeed, any set of n orthonormal states in n-dimensional Hilbert space may be linearly mapped to any other such set, and this mapping extends naturally to a unitary transformation on the Hilbert space. The importance of reversible transitivity as a defining feature of quantum the- ory is indicated by its ubiquitous use (or the use of stronger versions of it) in various so-called “derivations” of quantum theory from reasonable axioms [2, 28, 33-35, 87]. Roughly speaking, in these derivations the authors consider reasonable (or physically motivated) constraints on the systems of general probabilistic theories, much the same as the types of systems we described in Chapter 2. It is then shown that some small set of these constraints is sufficient to single out quantum theory. By “single out” we mean that in any theory which conforms to these constraints, the systems represent finite-dimensional quantum systems, and individual systems combine to form composite systems in exactly the manner proscribed by the tensor product of their respective Hilbert spaces. Reversible transitivity is one such constraint that applies to systems of general probabilistic theories, and tends to play a crucial role in the process of singling out quantum theory. For example, if reversible transitivity is assumed, then every 173 system has a well-defined maximally mixed state analogous to the quantum max- imally mixed state. To show this, it is often first argued on physical grounds that the set of reversible transformations on any system forms a compact, topological group T , which therefore admits a Haar measure. This allows one to define on any system a unique state, ω = ∫ T T (s) dµ(T ), (5.1) for an arbitrary choice of pure state s (note that ω is independent from s, again due to reversible transitivity). The maximally mixed state ω is equivalent in various ways to the well-known quantum maximally mixed state 1nI in a Hilbert space of dimension n: ω is invariant under reversible transformations, ω gives a non-zero outcome probability for every non-zero effect, and no state s is perfectly distin- guishable from ω in a single experiment. We have stated that reversible transitivity is a useful constraint, but have not yet shown why it might be a reasonable one. To motivate reversible transitivity on physical grounds, one might appeal to the notion of conservation of information [28, 33], i.e. that information is ultimately neither lost nor gained in the course of any physical process. If a systemA is in a pure state sA, then it is always possible to transform A into some other pure state tA: if necessary, one can perform a “throw away and replace” operation, in which every state inA is irreversiblymapped to tA. However, conservation of information stipulates that -- in order for information not to be lost -- the system A must be coupled with the environment E in such a way that the global transformation on AE is reversible. Before the transformation, the global systemAE is in some state sAE whose marginal on systemA is sA; after the transformation, the global systemAE is in state tAE , whose marginal on systemA is tA. Therefore the state of system A may be transformed from an arbitrary pure state sA to any other pure state tA in a manner which is ultimately reversible. It must be conceded that this does not quite grant us a reason to expect re- versible transitivity in the form we have defined it, since it does not guarantee that 174 a reversible transformation exists on system A alone which maps state s to state t. In fact, the author is not aware of a conclusive, natural reason to expect that the principle of reversible transitivity is obeyed by the universe; despite this, the prin- ciple of reversible transitivity holds a far greater immediate appeal than the stan- dard Hilbert-space quantum postulates. Moreover, the conservation of information does at least suggest that it is interesting to study which reversible transformations are permitted in general probabilistic theories, since it implies that all transforma- tions must be reversible as long as the system is taken to be “large enough”. A demand for reversibility on some level would also open up links to the field of thermodynamics, in which closed systems undergo reversible processes (the links between quantum information theory, quantum foundations and thermodynamics is a recent area of investigation, for example see [88-90]). As demonstrated by Proposition 7 in Section 2.5.5, the property of reversible transitivity does not hold in Boxworld. Proposition 7 shows that reversible trans- formations map pure product states to pure product states: therefore, a pure prod- uct state cannot be reversibly mapped to a pure entangled state (such as the PR- box state). However, this result does not give us a full characterisation of re- versible Boxworld transformations: entanglement cannot be reversibly generated, but do there exist interesting reversible dynamics that begin and end with entan- gled states? Recently, it has been shown that, for a restricted class of composite Boxworld systems, the set of reversible transformations does not allow for any interesting dynamics [4]. Specifically, for composite systems in which all subsys- tems have the same number of local fiducial measurements, and the same number of outcomes for each measurement, the only allowed reversible transformations are compositions of relabellings of measurement choices and outcomes, and per- mutations of subsystems. These transformations can be described easily via the outcome distribution representing the state. For example, the transformation of a system comprising two g-bit subsystems, which permutes the two subsystems 175 then switches the outputs of subsystem 2, is described by: P (a1, a2|x1, x2) → P (a2, a1 ⊕ 1|x2, x1) . (5.2) These transformations are trivially convex-linear, i.e. they respect probabilistic mixtures of states, and they map allowed states to allowed states in a reversible manner. That they are the only possible transformations in the Boxworld systems considered is surprising, and suggests a powerful link between a theory's state space and its set of reversible transformations. What is it about these systems that reduces the reversible transformations to trivial relabellings? Is it related to the polytopic nature of the state space, its symmetries, or simply the fact that so many states are allowed? By developing techniques to extend this result to the case of non-identical subsystems, we can hope to shed some light on the answers to these questions, and to learn something about the significance of the reversible transitivity which quantum theory enjoys. We begin in Section 5.1 by recapping themethod of proof used in [4] to demon- strate that all reversible dynamics are trivial in the case that all subsystems have the same number of measurements, each with the same number of outcomes. In Sec- tion 5.2 we introduce and develop a “tabular” representation of Boxworld states and effects, which has previously been used in [5] to explore the structure of Box- world measurements. By considering the action of reversible transformations in this tabular regime, it is not difficult to recover the bipartite version of the general- isation of [4] to non-identical subsystems. By considering these tabular represen- tations, one builds up an intuition with which the full generalisation to multipartite systems becomesmuchmore tractable. In Section 5.3we use this intuition to set up some important results about decompositions of Boxworld effects, then in Section 5.4 we employ these results to obtain Theorem 13, the main result of the chap- ter. In Section 5.5 we explore the limits of extending this result to other general probabilistic theories: namely, we answer in the negative the question of whether a lack of interesting reversible dynamics is due solely to the polytopic structure of the state space. 176 Much of the work in this chapter was undertaken collaboratively with Anthony Short and published in [43], although the recognition of the importance of multi- form effects, and the final proof of the result in Section 5.4, are due to the author. The probabilistic model constructed in Section 5.5 is a result of the author's origi- nal, unpublished work. 5.1 Identical subsystems: review Recall from Chapter 2 that an individual Boxworld system is characterised by a finite number of fiducial measurements, each of which has a fixed, finite number of possible outcomes. Boxworld systems combine under the max-tensor product, so that the joint states of composite Boxworld systems are all those non-signaling states which are normalised and produce positive probabilities for all local fiducial effects. In the case where all subsystems are identical, and all fiducial measure- ments have the same number of outcomes, we can assign constants M and K to be the number of fiducial measurements per subsystem and the number of out- comes per measurement respectively. For each subsystem we will assign a set of M ·K fiducial effect vectorsX(i)a|x = Xa|x of the abstract state space for each sub- system, which in our construction will not depend on i. This set of vectors will correspond to a valid representation of Boxworld as long as the set U ∪ {Xa|x} (where 1 ≤ x ≤ M and 1 ≤ a ≤ K − 1) is linearly independent, and for each x, ∑ aXa|x = U . In Boxworld, the composite effect cone is generated by tensor products of the extreme rays of the local effect cones. These extreme rays are the local fiducial effect vectors, hence the composite fiducial effects e are the tensor products of local fiducial effects: e = Xa1|x1 ⊗ · · · ⊗XaN |xN . (5.3) Recall from Proposition 6 of Section 2.5.4 that reversible adjoint transformations must map extreme rays of the effect cone to other extreme rays. Each reversible transformation is therefore a permutation of the set of composite fiducial effects. 177 Since each fiducial effect may be described as a “string” of local fiducial effects, it is convenient to consider a one-to-one correspondence between composite fidu- cial effects and strings of fiducial effects, such that the above composite effect corresponds to the fiducial effect string, e˜ = (Xa1|x1 , . . . , Xan|xN ). (5.4) The ith component of e˜ is therefore e˜i = Xai|xi . This notation allows for a conve- nient notion of distance between fiducial effect strings. Definition. For fiducial effect strings e˜ and f˜ , the Hamming distance between e˜ and f˜ is defined to be the number of components in which they differ, i.e. dH(e˜, f˜) = #{i ∈ [N ] : e˜i 6= f˜i}. (5.5) With these comments in hand, the proof proceeds via the following steps: • The local fiducial effect vectors are explicitly constructed in a vector space V so that any reversible adjoint transformation T † corresponds to an orthog- onal matrix on the composite tensor product space. • The inner product between composite fiducial effects is then the product of the component-wise inner products between local fiducial effects, and is preserved by T †. • For any pair of composite fiducial effects who differ in exactly one compo- nent, their images under T † similarly differ in exactly one component. • The permutation induced by T † on the fiducial effect strings thus preserves Hamming distance 1. A combinatorial theorem implies that any such per- mutation must be composed of permutations of components, and local per- mutations. 178 • T † therefore acts on the composite fiducial effects in an identical manner, thus it is trivial, in the sense of being composed of permutations of sub- systems and local permutations. If T † is trivial for this particular choice of the fiducial effect vectors, then T † is trivial as a mapping of the composite outcome distributions P (a1, . . . , aN |x1, . . . , xN). We now proceed through these steps in more detail, closely following the ar- gument given in [4]. 5.1.1 Fiducial effect vectors We now construct explicit real vectors for the fiducial effects Xai|xi of a Box- world system that hasM fiducial measurement choices andK outcomes for each fiducial measurement. We first construct, for each fiducial measurement choice, a (K−1)-simplex whoseK vertices loosely represent theK outcomes of that mea- surement, then form a direct sum of the spaces containing these simplices in order to represent the full outcome set. We add to this direct sum a one-dimensional space to represent the unit effect, and ensure that each fiducial effect overlaps symmetrically with this extra dimension so that summing over the effects of any one measurement gives the unit effect. A (K − 1)-simplex is formed ofK vertices in RK−1, whose positions may be defined as a set of vectors {v1, . . . , vK} equidistant from the origin, with constant pairwise inner products. To construct such a set of vectors, consider the standard basis vectors ei ofRK , and define c = 1K ∑K 1 ei to be their barycentre. Then using the fact that 〈ei, c〉 = 〈c, c〉 = 1K for all i it is straightforward to check that the K 179 vectors defined by vi = (ei − c) satisfy the following equalities, 〈vi, vj〉 = { K−1 K (i = j) − 1K (i 6= j) (5.6) 〈vi, c〉 = 0 (5.7) K ∑ i=1 vi = 0. (5.8) The hyperplane orthogonal to c contains the vi and is isometrically isomorphic to RK−1, hence our desired set of vectors exists also in a space of one less dimension, and moreover {v1, . . . , vK−1} is a basis for RK−1. Note that for all 0 ≤ j, k ≤ K − 1, 〈vj| ( K ∑ i=1 |vi〉〈vi| ) |vk〉 = { K−1 K (j = k) − 1K (j 6= k) = 〈vj, vk〉. (5.9) Since the vi span RK−1, it follows that K ∑ i=1 |vi〉〈vi| = I. (5.10) Let {ei} be the standard basis of the vector space RM . We associate each measurement x with the basis vector ex, so that the simplex corresponding to that measurement is embedded in a separate vector space ex ⊗ RK−1. We wish to construct the local fiducial effect vectors in the space ( RM ⊗ RK−1 ) ⊕ R, and associate the rightmost R with the unit effect by setting U = (0 ⊗ 0) ⊕ 1. The local fiducial effect vectors are: Xa|x = ( √ M K ex ⊗ va ) + ( 1 K U ) . (5.11) 180 From (5.8) it can be seen that for each x, we have ∑ aXa|x = U as required. Moreover, since {ei} and {v1, . . . , vK−1} are linearly independent sets, so is the set U ∪ {Xa|x} for 1 ≤ x ≤ M and 1 ≤ a ≤ K − 1. Therefore this choice of fiducial effect vectors is a valid representation of a single Boxworld system. From (5.10) it also holds that, ∑ a,x |Xa|x〉〈Xa|x| = M K ∑ x ( |ex〉〈ex| ⊗ ( ∑ a |va〉〈va| )) + 1 K2 ∑ a,x |U〉〈U| = M K ( ∑ a,x |ex〉〈ex| ⊗ |va〉〈va|+ |U〉〈U| ) = M K I. (5.12) Similarly, for the composite fiducial effect vectors, ∑ a,x |Xa1|x1〉〈Xa1|x1 | ⊗ · · · ⊗ |XaN |xN 〉〈XaN |xN | = ( M K )N I (5.13) This immediately implies that any reversible adjoint transformation T † is orthog- onal, since it acts to permute the composite fiducial effects, T †T = ( K M )N T † ( ∑ a,x |Xa1|x1〉〈Xa1|x1 | ⊗ · · · ⊗ |XaN |xN 〉〈XaN |xN | ) T = ( K M )N ( ∑ a,x |Xa1|x1〉〈Xa1|x1 | ⊗ · · · ⊗ |XaN |xN 〉〈XaN |xN | ) = I. (5.14) Therefore T † preserves the inner product between any pair of composite fidu- cial effects, which is determined by the component-wise inner products between 181 local fiducial effects: 〈Xa|x, Xa′|x′〉 = 1 K2      1 (x 6= x′) 1−M (x = x′, a 6= a′) 1 +M · (K − 1) (x = x′, a = a′) (5.15) 5.1.2 Induced fiducial effect string permutations Recall that any reversible adjoint transformation T † induces a permutation on the fiducial effect strings, i.e. strings whose entries are the local fiducial effects. When M andK are constant across subsystems, and in particular whenM ≥ 2 (ensuring that the subsystems are non-classical), then we can use the vector representation of Section 5.1.1 to show that these permutations are highly restricted in form. The assumption that M ≥ 2 guarantees that the inner product between any two lo- cal fiducial effect vectors is non-zero, and is strictly negative exactly when they correspond to different outcomes of the same measurement. Lemma 5. On a composite Boxworld system withM ≥ 2 local measurements and K local outcomes, suppose that T is a reversible transformation, with an induced permutation T˜ † on fiducial effect strings. Then T˜ † preserves a Hamming distance of 1 between pairs of fiducial effect strings, i.e., for any composite fiducial effects e and f with corresponding fiducial effect strings e˜, f˜ , dH(e˜, f˜) = 1 =⇒ dH(T˜ †(e˜), T˜ †(f˜)) = 1 (5.16) Proof. Suppose e˜ = ( Xa1|x1 , . . . , XaN |xN ) and f˜ = ( Xa′1|x′1 , . . . , Xa′N |x′N ) are two fiducial effect strings that differ only in component j. The inner product between the composite fiducial effects e and f is the product of the component-wise inner products between local fiducial effects: 〈e, f〉 = N ∏ i=1 〈Xai|xi , Xa′i|x′i〉. (5.17) 182 From (5.15), the greatest value that K2n〈e, f〉 can take is (1 + M(K − 1))N , which occurs when ai = a′i and xi = x′i for all i. Since the only negative term in (5.15) is 1 − M (when x = x′, a 6= a′), the least value that can be taken is (1 − M)(1 + M(K − 1))N−1. As M ≥ 2, this occurs if and only if e and f differ in one component, on which they represent different outcomes of the same measurement. In this case, since T is orthogonal, K2n〈T †(e), T †(f)〉 = (1−M)(1 +M(K − 1))N−1, (5.18) therefore T˜ †(e˜) and T˜ †(f˜) must also differ in exactly one component. It remains to show that if e˜ and f˜ differ in one component, on which they represent outcomes of different measurements, i.e. xj 6= x′j , then again T˜ †(e˜) and T˜ †(f˜) differ in exactly one component. In this case K2n〈e, f〉 = K2n〈T †(e), T †(f)〉 = (1 +M · (K − 1))N−1. (5.19) This alone does not suffice to show that T †(e) differs in only one component from T †(f), since for example it may be that (1 −M)2 = (1 +M · (K − 1)). How- ever, consider any local pure state s(j) on subsystem j which gives outcome aj for measurement xj and outcome a′j for measurement x′j (outcomes for the remaining measurement choices may be assigned arbitrarily). For the remaining subsystems with i 6= j, let s(i) be any pure state which gives outcome ai for measurement xi. Then the pure product state s = s(1) ⊗ · · · ⊗ s(N) is constructed such that 〈s, e〉 = 〈s, f〉 = 1, hence e+ f E+ UN . Recall from Section 2.5.4 that a reversible adjoint transformation T † maps Emax+ linearly and bijectively to itself such that T †(UN) = UN ; hence T †(e) + T †(f) E+ UN . It follows that for at least one component k, the local fiducial ef- fects T˜ †(e˜)k and T˜ †(f˜)k correspond to outcomes of different measurements. This fact, combined with (5.19), implies that T˜ †(e˜) differs from T˜ †(f˜) in exactly one component. We now apply a general combinatorial theorem concerning permutations of 183 strings. We will consider strings belonging to some Cartesian product of finite alphabets A1 × · · · × AN ; in this Section each alphabet will comprise theM ·K distinct local fiducial effect vectors and will therefore be identical, however we state and prove the theorem without the assumption of identical alphabets, as this will become useful when we apply the theorem in the case that not all subsystems are identical. In order to prove the theorem, it is convenient to make rigorous our notion of local operations and permutations of components, at least in regards to fiducial effect strings. A permutation Q of A1 × · · · × AN is a local operation if there is some per- mutation Qi of the set Ai such that Q : (a1, . . . , ai, . . . , aN) 7→ (a1, . . . , Qi(ai), . . . , aN). (5.20) Q is a component permutation if there is some permutation σ of the set [N ] such that Q : (a1, . . . , aN) 7→ (aσ(1), . . . , aσ(N)). (5.21) Note that if i, j ∈ [N ] such that σ : i 7→ j, then Q is a valid permutation only if Ai = Aj . We may as well assume for the time being that an alphabet of size n is simply the set {0, . . . , n− 1}. Denote by 0 the string (0, . . . , 0), and define the Hamming weight of a string a to be the number of non-zero components, i.e. WH(a) = dH(a, 0). Theorem 10. Let A1, . . . ,AN be finite alphabets, and Q be a permutation of the set A1 × · · · × AN . If Q preserves a Hamming distance of 1 between pairs of strings, then it is a composition of operations which permute components, and operations which act independently on each component (local operations). Proof. Let Ai = {0, . . . , ni − 1}. We may assume (by pre-composing Q with local operations if necessary) that Q maps the string 0 to itself. By assumption, Q acts as a permutation on the set of strings with exactly one non-zero component. Let Li denote the set of strings which are non-zero only in the ith component: the 184 components of each Li are at Hamming distance 1 from each other, so Q maps each Li to some Lj , and moreover maps Li to Lj only if ni = nj . Suppose that we pre-composeQ with the component permutation which maps component j to component i just when Q maps Li to Lj , to obtain a new map Q1 which is maps each set Li to itself. Suppose further that we pre-compose Q1 with a local operation for every component i, which acts as the inverse of Q1 on the set Li, to obtain a new map Q2 which therefore fixes each string of Hamming weight 1. Note that if Q2 can be written as a combination of local operations and component permutations then so can Q, hence we may assume without loss of generality that Q fixes strings of Hamming weight 1. Given that Q fixes strings of Hamming weight 1, we show by induction that it fixes all strings. Suppose that it fixes strings of weight less than W , and let a be a string of weightW . However, a is uniquely specified by all those strings of Hamming distance 1 from it, that have weightW −1. Since those strings are fixed by Q, so too must a be. Hence, Q fixes strings of weightW and by induction, all strings. 5.1.3 Characterisation of reversible transformations It follows from Sections 5.1.1 and 5.1.2 that reversible adjoint transformations permute the composite fiducial effect vectors in a way which is a composition of local subsystem operations and permutations of subsystems. Thus the transforma- tion itself acts in a similar manner on states: it is a composition of local subsystem operations and permutations of subsystems. Since the systems are identical, we will see that any permutation of subsystems is a valid reversible transformation. However, local operations are further restricted by the following Lemma which, importantly, applies in the general Boxworld scenario, without assuming that all measurements have the same number of outcomes. Lemma 6. The only allowed reversible transformations of a single boxworld sys- tem are relabellings of measurement choices and measurement outcomes. 185 Proof. Let T be a reversible transformation, so that the adjoint T † permutes the local fiducial effect vectors Xa|x. Suppose that the system hasM measurements, and that measurement x has Kx outcomes. If x 6= x′, then for any choice of 1 ≤ a ≤ Kx and 1 ≤ a′ ≤ Kx′ there exists a pure state s which gives outcome a when x is measured and outcome a′ when x′ is measured. This state obeys 〈Xa|x, s〉 = 〈Xa′|x′ , s〉 = 1, hence Xa|x + Xa′|x′ E+ UN . However, for any 1 ≤ a, a′ ≤ Kx we have that Xa|x +Xa′|x ≤E+ UN , since ∑ aXa|x = U , and any conic combination of the fiducial effect vectors is a member of E+. Now, Xa|x +Xa′|x ≤E+ UN ⇐⇒ T †(Xa|x) + T †(Xa′|x) ≤E+ UN , (5.22) hence two fiducial effect vectors correspond to outcomes of the same measure- ment x only if their images under T † also correspond to outcomes of the same measurement x′. Therefore the permutation is a composition of permutations of the form: Xa|x 7→ Xa′|x (5.23) corresponding to relabelling of outcomes, and permutations of the form: Xa|x 7→ Xa|x′ (5.24) corresponding to relabelling of measurement choices. Note that one measurement choice may be mapped onto another only if they have the same number of out- comes. Combining the results of the previous Sections leads to the main result of [4]. Theorem 11. In a composite Boxworld system, withM ≥ 2 local measurements for each subsystem and K local outcomes for each measurement, the only re- versible transformations are permutations of subsystems, and local relabellings of measurement choices and outcomes. Furthermore, all such transformations are allowed. 186 Proof. The fact that these are the only possible reversible transformations follows from Sections 5.1.1 and 5.1.2, and Lemma 6. It remains to show that any such transformation is allowed. This follows easily from viewing these transformations acting on the outcome distributions representing states. For example, permuting subsystems 1 and 2 is given by the mapping: P (a1, a2, . . . , aN |x1, x2, . . . , xN) 7→ P (a2, a1, . . . , aN |x2, x1, . . . , xN) , (5.25) and relabelling the outcomes of measurement x1 on subsystem 1 via the permuta- tion σ of the set [K] is given by the mapping: P (a1, . . . , aN |x1, . . . , xN) 7→ P (σ(a1), . . . , aN |x1, . . . , xN) . (5.26) All of these mappings map allowed states to allowed states in a reversible and convex-linear fashion, hence correspond to allowed reversible transformations. We conclude this Section with some remarks about attempting a direct general- ization of this result to composite systemswith differing numbers of measurements on subsystems, and differing numbers of outcomes per measurement. A natural first step would be to relax the demand that M and K are constant across sub- systems, but retain the demand that each local fiducial measurement has the same number of outcomes for a given subsystem. The local fiducial effect vectors for subsystem i can still be constructed as in Section 5.1.1, given that there areM (i) fiducial measurement choices, each of which hasK(i) possible outcomes. The in- ner product between any pair of vectors is still given by (5.15) -- but withM and K replaced by M (i) and K(i) -- and composite fiducial effects are still given by tensor products of the local fiducial effects. The proof that reversible transforma- tions are orthogonal carries through, and so it seems worth examining whether a consideration of inner products will again demonstrate that a Hamming distance of 1 is preserved. If so, due to the fact that Theorem 10 applies to arbitrarily-sized alphabets, it follows that all reversible transformations are compositions of local 187 operations and subsystem permutations. Unfortunately however, the proof of Lemma 5 does not carry through so nicely ifM and K are allowed to vary across subsystems. A simple counterexample is provided by a composite system comprising 3 subsystems where the number of measurement outcomes K(1) = K(2) = K(3) = 2 is constant but M (1) = 2, M (2) = 2 and M (3) = 10. In this case, for any two effects whose components agree except for the third subsystem, where they correspond to outcomes of dif- ferent measurements, the inner product will be: (1 +M (1) · (K(1) − 1))(1 +M (2) · (K(2) − 1)) (K(1) ·K(2) ·K(3))2 = 3× 3 12 = 3 4 . (5.27) On the other hand, a pair of effects with different outcomes of the same mea- surement on components 1 and 3, and outcomes of different measurements on component 2, will generate the same inner product: (1−M (1))(1−M (3)) (K(1) ·K(2) ·K(3))2 = (−1)× (−9) 12 = 3 4 . (5.28) Any reversible transformation whichmaps a pair of effects of the first kind to a pair of the second kind would necessarily be non-trivial, since it would not preserve the Hamming distance. However, as we prove later in this chapter, it is impossible to extend this transformation in a consistent, linear manner to a full permutation of the composite fiducial effect vectors; this system does not in fact allow for non- trivial reversible transformations. It may be observed that there is a lot of play in the variablesM (i) andK(i) in attempting to generate these “pathological” systems, and that as the number of subsystems increases there will certainly be infinitely many pathological systems (for example, combining any number of copies of the above system will result in a system which will have many more overlaps in the inner product). It does not appear to be a promising task, therefore, to simply go through and check each case where Lemma 5 might go wrong. A natural second attempt to extend the proof is to embed an arbitrary Box- 188 world system into one which is of the identical-subsystem form considered in this Section, in the hope that the smaller system will inherit its reversible transforma- tions from the larger. Suppose, for example, that it was possible to “add in” a new measurement with just two outcomes on any one subsystem, in such a way that the reversible transformations of the old system are induced by those of the new. By repeatedly performing this procedure, one can arrive at a system for which subsys- tems have the same number of measurements. Suppose also that it was possible to also “add in” outcomes to measurements in a similar way, so that one may arrive at a system for whichM andK are both constant across subsystems. Having char- acterised the reversible transformations of this larger system, we would perhaps be able to infer from this procedure that the reversible transformations of the smaller system are trivial, since they are inherited from the larger system. However, this embedding procedure does not proceed as straightforwardly as hoped. Consider for example a system with any number of subsystems, measure- ments and outcomes, and suppose we added in an extra outcome to the first fidu- cial measurement of the first subsystem (which previously hadK outcomes, say). This generates a set of new composite fiducial effects, whose first component is X(1)K+1|1. For any reversible transformation on the larger system including this set of effects, we might attempt to induce a transformation on the smaller system by simply restricting the permutation of fiducial effect strings to those which do not have X(1)K+1|1 in the first component. However, two problems are encountered: firstly, the transformation may map fiducial effects not in the above set to those which are; secondly, the induced permutation is unlikely to respect the linearities of the smaller set of fiducial effects, such as K(1)1 ∑ a=1 X(1)a|1 ⊗ · · · ⊗X (N) aN |xN = K(1)2 ∑ a=1 X(1)a|2 ⊗ · · · ⊗X (N) aN |xN (5.29) Given these difficulties, it seems that a fresh approach is needed to demonstrate that reversible transformations preserve Hamming distance 1 between pairs of ef- fects. Note that proving this in the general case is sufficient to generalise the fact 189 that all reversible transformations are trivial, since the remaining steps of the proof do not rely on the number of measurements and outcomes being constant across subsystems. In the rest of this Chapter we develop a proof of this property of re- versible transformations, which is not based on considering specific orthogonal representations of Boxworld systems as before, but on the basic convex structure of the state and effect spaces. It should be noted that our demands concerning classical systems must be adapted somewhat: in the case where M is constant, simply demanding M ≥ 2 ensures that no subsystem is classical. However, ifM is allowed to vary then we also need to rule out the possibility that some subsystems are classical and others are not. In the remainder of this Chapter we will demand thatM (i) ≥ 2 for all i, i.e. that all subsystems are non-classical. If this were not the case, then non-trivial reversible transformations do in general exist. As a simple example, consider a bipartite system with M (1) = 1, K(1)1 = 2 so that subsystem 1 is classical, and M (2) = 2, K(2)1 = K (2) 2 = 2 so that subsystem 2 is a g-bit system. The reversible “C-NOT” operation on this system is defined by: X(1)1|1 ⊗X (2) a|x → X (1) 1|1 ⊗X (2) a|x (5.30) X(1)2|1 ⊗X (2) a|x → X (1) 2|1 ⊗X (2) a⊕1|x. (5.31) To check that this transformation is indeed linear, one need only verify that the unit effect U (1) ⊗U (2) is mapped to itself, and that effects of the form U (1) ⊗X(2)a|x and X(1)a|x ⊗ U (2) have a well-defined image, independent of how the local unit effect is decomposed on each subsystem. This turns out to be true because there is only a single decomposition of the unit effect of the classical subsystem into local fiducial effects. Since the operation is a linear permutation of composite fiducial effects, it is an allowed transformation. A similar example can be constructed for any system with at least one classical subsystem, in which a local operation is performed, conditioned on the outcome obtained on the classical subsystem. 190 5.2 Effect tables Recall from Section 2.3.3, in which we first introduced Boxworld, that under the assumption of local tomography, a composite Boxworld state can be represented as a multi-dimensional tabular array. This idea was first introduced and discussed in [5] in the context of Boxworld measurements; we will develop our own inves- tigation of this idea, as it provides a neat diagrammatic portrayal of many of the results we will prove before achieving the main result. Our development of these tabular representations is original work that has not been published. The number of dimensions or axes is the number of subsystems, the rows in a given axis corre- spond to outcomes of measurements on that subsystem, and each entry gives the probability of obtaining the outcomes corresponding to the row containing that en- try for each respective subsystem. For example, the PR-box state of two g-bits, in which outcomes are randomly correlated as long as the local fiducial measurement choices are not both equal to 1, and randomly anti-correlated otherwise, has the following tabular representation: x2 = 1 x2 = 2 a2 = 1 a2 = 2 a2 = 1 a2 = 2 x1 = 1 a1 = 1 1/2 0 1/2 0 a1 = 2 0 1/2 0 1/2 x1 = 2 a1 = 1 1/2 0 0 1/2 a1 = 2 0 1/2 1/2 0 (5.32) We will refer to the subset of entries corresponding to the outcomes of only a single measurement choice from each subsystem as a block of the tabular array. Diagrammatically, a block is a set of entries enclosed by lines that separate local measurement choices. For example, by specifying the second measurement on both subsystems in (5.32), we get the bottom-right block: 0 1/2 1/2 0 (5.33) 191 From hereon in we will omit for convenience the first rows and columns which label the local measurements and outcomes, since these can be deduced from the number of blocks and rows per block, and include only the outcome probabilities, viz. 1/2 0 1/2 0 0 1/2 0 1/2 1/2 0 0 1/2 0 1/2 1/2 0 (5.34) A pure product state is represented by a table whose only non-zero entries are 1. For each measurement choice on each axis, a particular row is selected (the outcome determined by themeasurement choice) -- the 1s in the table occur exactly where these selected rows overlap. For example, the following table represents a state of two subsystems with three fiducial measurements each, where in the first subsystem the state assigns outcome 1 for the first measurement and 2 for the second and third measurements, and in the second subsystem the state assigns the outcome equal in value to the measurement choice. 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 (5.35) Composite effects can also be represented using tables, which we will refer to as effect tables, so that the probability of obtaining that effect is given by the entry-wise dot product between its table and the table representing the state of 192 the system. The composite fiducial effects, which are tensor products of local fiducial effects, have only a single 1 in the entry corresponding to the measurement outcome specified on each subsystem. General composite effects, which are conic combinations of the composite fiducial effects, have effect tables that are non- negative in every entry (a table can correspond to a valid effect even if some entries are negative, but in this case there always exists another table with all positive entries, which corresponds to the same effect). The tabular representation of effects is over-complete, in the sense that some effects may correspond to more than one possible table. This over-completeness is codified by a set of linearities which are imposed on the set of effect tables, due to the set of non-signaling conditions on the system. For example, in a two g-bit system, the condition that the reduced outcome probability of obtaining outcome 0 for measurement 0 on subsystem 2 is independent of the measurement choice on subsystem 1 is given by the equation: P (0, 0|0, 0) + P (0, 1|0, 0) = P (0, 0|0, 1) + P (0, 1|0, 1) . (5.36) This equation is reflected by demanding the following equivalence between effect tables: 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ≡ 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 (5.37) Both of these tables are representations of the effect U (1) ⊗ X(2)0|0 ; demanding that they are equivalent is no more than demanding consistency with the equation, X(1)0|0 ⊗X (2) 0|0 +X (1) 1|0 ⊗X (2) 0|0 = X (1) 0|01 ⊗X (2) 0|0 +X (1) 1|1 ⊗X (2) 0|0 , (5.38) 193 where each side corresponds to a distinct decomposition of the local unit effectU (1) into local fiducial effects. In this manner, the linearities inherent in the local effect cone of the abstract state space automatically engender the single-system non- signaling conditions which are respected by the set of composite states Smax. By Proposition 1, we know that the full set of non-signaling conditions is generated by those non-signaling conditions involving single subsystems. Hence, the linearities of the local effect cone give rise to the full set of non-signaling conditions on the composite state space. The effect tables which interest us the most are the binary effect tables, those consisting of 0s and 1s, which represent effects that are sums of composite fidu- cial effects (with all coefficients equal to 1). A binary effect table is said to be multiform if it is equivalent to some other binary effect table, as in (5.37). We will use the term sub-row to refer to binary effect tables which are similar to the effect tables in (5.37) in that their non-zero entries lie only along one row of their constituent blocks. As long as every subsystem is non-classical, then every axis splits up into at least 2 blocks, so that sub-rows (and any tables covering a sub- row) are always multiform. Moreover, each equivalence between sub-rows is due to a decomposition of the local unit effect on the corresponding subsystem, as in (5.38). We will say that a binary effect table covers another binary effect table if the non-zero entries of the latter are a subset of the non-zero entries of the former: clearly, any effect table which covers a sub-row is again multiform, for example: 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ≡ 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 (5.39) In order to check whether two effect tables are equivalent, it suffices to show that they have the same dot product with all pure product states. This is because the pure product states span the vector space of the composite abstract state space [29], hence any two effects which the same dot product with all of them must 194 be identical. In cases where both tables cover sub-rows, it may be possible to determine an equivalence between binary effect tables by applying various sub- row equivalences. This is achieved by “sliding” a sub-row across to another sub- row contained in the same row. For example, in (5.40) below, the two horizontal sub-rows of the top left block can be slid across to the right, and then the two vertical sub-rows slid downwards to form the effect table on the RHS. 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 ≡ 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 (5.40) However this sliding process will not always be a useful indicator of equiva- lence, as demonstrated in the following example which does not cover any sub- rows: 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 ≡ 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 (5.41) The equivalence (5.41) can be verified by dot product with pure product states. However, a similar consideration demonstrates that this is not in fact an equiva- lence between valid effects. The binary table on the LHS of (5.41) has dot product 2 with the pure product state s that deterministically outputs 0 for every fiducial measurement choice on both subsystems, hence this effect table represents an im- proper effect. This is illustrated in (5.42), where arrows are used to highlight which rows and columns correspond to the outcomes determined by s; the entries of the effect table which are picked out by s are those lying in the overlap of these rows and columns, and their sum is the dot product of the effect with s, which is 2 in this case. The results described in this chapter will rely on the multiform tables which do correspond to proper effects and give genuine probabilities for all states, and thus examples like (5.41) will not generally present a major obstacle. 195 ↓ ↓ → 1 0 0 0 0 0 0 1 → 0 0 1 0 0 1 0 0 ; 1 + 0 + 0 + 1 = 2 (5.42) Some further observations will help to familiarise the reader with effect tables and their properties. As mentioned above, sub-rows are always multiform. In the bipartite g-bit examples given, all sub-rows have the same total number of 1s, however this will not be true if any two local fiducial measurements have differing numbers of outcomes. If in fact there are two fiducial measurements with differing numbers of outcomes K1 and K2 on the same subsystem, then there will exist a sub-row containingK1 1swhich is equivalent to a sub-row containingK2 1s, hence equivalent binary effect tables need not have the same number of non-zero entries. Calculating dot products between binary effect tables and pure product states is an invaluable tool; as we have mentioned, it determines whether two distinct tables correspond to the same effect or not. In bipartite systems, it also allows for a useful characterisation of exactly when a binary effect table corresponds to a proper effect. In (5.42) we see that the presence of non-zero entries in diagonal blocks allows for the construction of a pure product state, which selects the rows and columns necessary to “hit” those two entries, and, if necessary, completes the lattice by selecting arbitrary rows and columns from remaining measurements on each subsystem. Note that the blocks do not have to be on the same diagonal line, merely unaligned (i.e. not in the same block-row or block-column). Below is an example of this kind of improper binary effect table, and one of the six pure product states for which it gives dot product greater than 1. 196 ↓ ↓ ↓ → 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 → 0 0 0 0 1 0 (5.43) A binary effect table also corresponds to an improper effect if it has distinct non-zero entries in the same row, but not within the same block of that row. If this is the case, then a similar pure product state construction can be used to hit both non-zero entries, again with an arbitrary completion of the lattice for that state if necessary. Below is an example of this kind of improper binary effect table, and one of the eight pure product states for which it gives dot product greater than 1. ↓ ↓ ↓ → 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 → 0 0 0 0 0 0 (5.44) These two constraints are the only obstacles to a binary, bipartite effect table being a proper effect. Indeed, as long as all non-zero entries are contained in the same row of blocks (or block-row), but never in distinct blocks of the same row, then the effect is proper. To see this, suppose that an effect table corresponding to an effectE ∈ E+ satisfies these conditions, and suppose without loss of generality that it has non-zero entries only in the block row corresponding to the first fiducial measurement on subsystem 1 (i.e. the leftmost column of blocks). In each of these blocks, fill in extra 1s if necessary to get an effect F ≥E+ E such that every non-zero entry lies in a vertical sub-row. Then slide all sub-rows up to the topmost block (the subrowswill not clash since no two non-zero entries lie in distinct blocks of the same row) to obtain an effect which is either the unit effect, or a strict subset of it. Now E ≤E+ F ≤E+ U , hence E must be a proper effect. This process is 197 illustrated by the following operations on a binary, bipartite effect table satisfying these conditions: 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 → 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 → 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 Thus, a binary, bipartite effect table is proper if and only if all its non-zero entries lie in a single block-row, without any two entries lying in distinct blocks of a single row. It is tempting to try to extend these conditions in order to characterise proper binary effect tables of multipartite systems. Unfortunately however, it is not the case in multipartite Boxworld systems that an improper effect can be “detected” using pure product states in the same way; there exist improper effectsE for which 〈E, s〉 ∈ {0, 1} for all pure product states s. In fact, this occurs even in tripartite systems, as demonstrated by the tripartite effect E depicted in Figure 5.1. In this representation of E, each of the smallest cubes represents an entry of the effect table; shaded cubes represent entries with a 1 in them, whereas unshaded cubes represent zero entries. Assigning subsystems 1, 2 and 3 to the x-, y- and z- axes respectively, it can be seen that the algebraic form of E is, E =X(1)1|1 ⊗X (2) 1|1 ⊗X (3) 1|1 +X (1) 2|1 ⊗X (2) 1|2 ⊗X (3) 2|2 +X(1)1|2 ⊗X (2) 2|1 ⊗X (3) 1|2 +X (1) 2|2 ⊗X (2) 2|2 ⊗X (3) 2|1 , (5.45) where the ordering of the summands agrees with the alphabetical ordering of the cubes' labels in Figure 5.1. Similarly to bipartite systems, specifying a tripartite pure product state corre- sponds to picking out a (three-dimensional) grid of entries in the effect table. This grid is obtained by choosing one particular outcome for each local fiducial mea- 198 Figure 5.1: Improper tripartite binary effect table which is not detected by pure product states. surement on each subsystem, and selecting the entries that lie in the intersection of the subsequent rows. For example, the pure product state which gives outcome 1 for every local fiducial measurement will result in a grid that contains the shaded cube A in Figure 5.1, but none of the other shaded cubes. By inspection it can be verified that any pure product state will pick out at most one shaded cube, hence 〈E, s〉 ∈ {0, 1} for all pure product states s: in short, we would not be able to de- duce that E is improper by looking at its inner products with pure product states. We must resort to other methods to prove that E is actually improper. We first argue that if any further cube were to be shaded in Figure 5.1, there would be some pure product state that picks out two of the shaded cubes. To see this, note that whenever two cubes are completely unaligned (i.e. correspond to different mea- surement choices in all three subsystems), or are in the same plane but diagonally opposing blocks (i.e. correspond to the same local effect on one subsystem, but different measurement choices on the remaining subsystems), then it is possible to construct a pure product state which hits both of them. In fact, a similar observation shows that if any entry corresponding to an un- shaded cube is assigned a positive, non-zero entry, then there will be a pure product state whose inner product with the resulting effect is larger than 1. Moreover, for any shaded cube, setting the corresponding entry of the effect table to be larger 199 than 1 will generate in an improper effect (one need only consider a pure product state which hits the corresponding fiducial effect). It follows that for any fiducial effect F and positive number ε, the effect E + εF is an improper effect. There- fore, either E is actually the unit effect, or else it does not lie in the order interval [0,U ]E+ , as no conic combination of fiducial effects can be added to it to obtain U . It remains to show simply that E is not the unit effect. This follows from the existence of a pure product state s for which 〈E, s〉 = 0. Let s(1) and s(2) both be equal to the g-bit state which deterministically assigns outcome 2 to both fiducial measurements, and let s(3) be the g-bit state which assigns outcome 1 for the first fiducial measurement and outcome 2 for the second fiducial measurement. It can then be straightforwardly verified that 〈E, s(1)⊗s(2)⊗s(3)〉 = 0, either by directly using the decomposition (5.45) or by observing that the grid of entries specified by s(1) ⊗ s(2) ⊗ s(3) does not hit any shaded cubes in Figure 5.1. Hence E is not the unit effect and is therefore improper, despite the fact that 〈E, s〉 ∈ {0, 1} for all pure product states s. 5.2.1 Reversible transformations on effect tables Since reversible adjoint transformations permute the set of composite fiducial ef- fects, they induce a permutation on the entries of binary effect tables, such that entry a is mapped to entry b if and only if the fiducial effect which is represented by a single 1 in entry a is mapped to the fiducial effect which is represented by a single 1 in entry b. Effect tables provide a compact visualisation of the reversible transformations which correspond to relabellings of measurements and outcomes, and permutations of subsystems. These are given by block permutations (along an axis), row permutations and transpose-like operations on tables respectively. We give some examples to illustrate this procedure. A relabelling of the measurement inputs on subsystem 1 will permute the blocks in line with the first axis, but leave the ordering of the rows within each block invariant. For effects in a bipartite system of two g-bits, switching the mea- surement inputs on subsystem 1 is pictured below. 200 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 → 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 (5.46) A relabelling of the outcomes of a measurement in a subsystem are given by permuting the rows along the axis corresponding to the subsystem, but doing so only within the block that corresponds to the measurement in question. For exam- ple, switching the measurement outcomes for the first measurement on the second subsystem of the table on the RHS of (5.46) is given below. 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 → 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 (5.47) Permuting the subsystems corresponds to permuting the axes corresponding to those subsystems (equivalently, permuting the coordinates of each entry). In the case of bipartite systems, this is the same as taking the transpose of the table (imagining it were a matrix). Note that the table need not be symmetric, and no two blocks need be the same size, so in general one must be careful to transpose the lines delineating the blocks as well as the entries themselves. If we apply a permutation of the two subsystems to the table on the RHS of (5.47), we obtain the mapping below. 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 → 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 (5.48) 201 5.2.2 Diagrammatic proof for general bipartite systems We now give a proof that the only reversible transformations allowed in a bipar- tite Boxworld system are composed of local operations and permutations of the subsystems, so long as none of the subsystems are classical. This proof applies even to the case where the subsystems are non-identical, and is therefore already a novel result, not implied by the proof outlined in Section 5.1. Bipartite Boxworld effects are represented by 2-dimensional effect tables, for which local operations correspond to permutations of rows along a single axis, and a permutation of the subsystems corresponds to transposing the table (note that this transpose is only valid if the two subsystems are identical). Assuming that every measurement is non-trivial and hasmore than one outcome, each block in the effect table has height and width both at least 2 cells. As long as none of the subsystems are classical, then along both the horizontal and vertical axes there are at least 2 blocks. Recall from Section 5.1.3 that we need only demonstrate that reversible trans- formations preserve a Hamming distance of 1 between pairs of fiducial effect strings. We will obtain this result for binary, bipartite Boxworld systems in a stepwise manner, supplementing many of the steps with an effect table illustration which provides a visualisation of the argument. In the bipartite setting a satisfac- tory proof of each step can be given by reference to effect table diagrams, since they take a much simpler form. In the general multipartite setting the same cannot be said, however the stepwise process used will be very similar, hence this proof of the bipartite case can be seen as setting up the necessary ideas and intuition that will then be applied to the multipartite case. Central to the following argument are the notions of sub-rows and multiform effect tables. Recall that a sub-row is a binary effect table where the entries along a single block of a single row are exactly the non-zero entries. A binary effect table is multiform if there exists a distinct binary effect table which gives the same inner product for all states. Binary effect tableE (strictly) covers binary effect table F if the non-zero components of F are a strict subset of the non-zero components ofE. 202 Step 1. The only proper binary effect tables equivalent to a sub-row are other sub- rows in the same row. Suppose that E is a sub-row, and that F is an effect table which is not a sub- row in the same row. Without loss of generality, suppose that E is a sub-row for some column of the table, i.e. E represents an effect of the form X(1)a|x ⊗ U (2). To demonstrate that E and F cannot represent equivalent effects, it suffices to construct a pure product state which has differing inner products with the effects represented by E and F . Recall that a pure, bipartite product state is uniquely specified by selecting one column from each block along the horizontal axis, and one row from each block along the vertical axis. If F has a non-zero entry which is in a distinct row, then it is possible to select outcomes for each measurement of subsystem 1 so that the corresponding set of columns contains non-zero entries ofF but not ofE. To complete the specification of the pure product state, select a row from each block along the second axis, such that the previous non-zero entry of F is contained in at least one of them. The resulting state has inner product 1 with F and 0 with E. In the below table, the relevant non-zero entry of F is underlined; by selecting the second and third columns, and e.g. the second and third rows, we get a pure product state which distinguishes E from F . ↓ ↓ 1 0 0 0 → 1 0 1 0 → 0 0 0 0 0 0 0 0 (5.49) If instead F has non-zero entries only in the same column that E does, then either F strictly covers a sub-row, in which case it is improper, or F has non-zero entries only for a strict subset of each block of that column. In the latter case, it is possible to select a set of rows of the table, one for each block along the 203 vertical axis, such that the set of rows contains non-zero entries of E but not of F . Selecting a set of columns which includes the one in which both E and F have non-zero entries will generate a pure product state which has inner product 1 with E and 0 with F . This process is again demonstrated in the below diagram, where the underlined entries are the non-zero entries of F . ↓ ↓ → 0 1 0 0 0 1 0 0 → 0 0 0 0 0 1 0 0 0 1 0 0 → 0 0 0 0 (5.50) This concludes the proof of Step 1. Step 2. A proper, binary effect table which does not cover any sub-rows is not multiform. LetE be a binary, bipartite effect table which does not cover any sub-rows, and suppose for contradiction that it is equivalent to a distinct, binary, bipartite effect table F . Then there must exist some component (i, j) of the table for which E is zero and F is non-zero (here i represents the column and j the row, unlike conven- tional matrix notation). We aim to construct a pure product state which “hits” this (i, j), but none of the entries ofE. Recall that the effectE can be proper only if all its non-zero entries lie in some block-row - without loss of generality suppose this is the first column of blocks, i.e. every non-zero entry of E is associated with the first measurement choice on subsystem 1. If the non-zero entry of F does not lie in this column of blocks, then by moving leftwards from this entry we must arrive at a cell (i′, j) in the first column of blocks for whichE is zero (otherwiseE would cover a horizontal sub-row). Select a set of columns which includes i and i′, and a 204 set of rows which includes j and a row from each remaining block such that E is zero on the i′th component of that row (this is possible since otherwise E would cover a vertical sub-row). The resulting pure product state has inner product 0 with E and 1 with F . In the below diagram, the underlined entry is the non-zero entry of F , so that i = 3 and j = 5; thus we obtain i′ = 2 and also select row 3. ↓ ↓ 0 1 0 0 0 1 0 0 → 0 0 0 0 0 0 0 0 → 1 0 1 0 (5.51) Suppose instead that entry (i, j) belongs to the first column of blocks. In this case it is similarly possible to select a set of rows which includes j and a row from each remaining block such that E is zero on the ith component of that row. Any set of columns which includes i will now result in a pure product state which has inner product 0 with E and 1 with F . Step 3. Reversible transformations map sub-rows to sub-rows. Let E be a sub-row; E is multiform, being equivalent to all other sub-rows contained in the same row as E (there is at least one other if all systems are non- classical). Let F be another sub-row contained in the same row: thus E and F are binary effect tables representing the same proper effect, but whose non-zero com- ponents are disjoint. Let T be a permutation of the effect table components, which represents an allowed reversible transformation on the Boxworld system. The im- age T (E) of E under this transformation is a proper binary effect table which is equivalent to T (F ), hence is multiform. It follows from Step 2 that T (E) covers a sub-row. The diagram below is an example of sub-rows E and F (underlined) 205 transforming under an arbitrary permutation of components - in this diagram the permutation does not in fact correspond to an allowed transformation, since it is ill-defined on effects which have multiple effect table representations. 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 → 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 (5.52) Suppose that T (E) is not itself a sub-row, i.e. that it strictly covers a sub-row. LetG denote this sub-row, and letG′ be another sub-row in the same row asG. The image T−1(G) of G under the inverse permutation T−1 is a binary effect which is strictly covered by E, and which is equivalent to the effect T−1(G′). Hence T−1(G) is multiform. However, since T−1(G) is a strict subset of a sub-row, this contradicts Step 2. Therefore T (E) is itself a sub-row, and T maps sub-rows to sub-rows. Step 4. Reversible transformations preserve Hamming distance 1 between pairs of fiducial effect strings. Suppose that e and f are composite fiducial effects, which have Hamming distance 1 between their respective fiducial effect strings. e and f are uniquely represented by binary effect tables E and F which have only a single non-zero entry, such that these two non-zero entries lie in some row or column of the table. E and F may be extended uniquely to sub-rows E ′ and F ′ which lie in this row and cover E and F respectively. If the E ′ and F ′ are in fact the same sub-row, then by Step 3 their image under an allowed reversible transformation T is some sub-row T (E ′). Thus the images T (E) and T (F ) of E and F under T also belong to the same sub-row, therefore their corresponding fiducial effect strings also have a Hamming distance of 1. If E ′ and F ′ are not the same sub-row, they are nevertheless equivalent sub-rows, 206 and by Step 3 must be transformed to equivalent sub-rows under T . However, according to Step 1, sub-rows can only be equivalent if they belong to the same row, hence again T (E) and T (F ) correspond to fiducial effect strings which have a Hamming distance of 1. Step 5. The only reversible transformations allowed in bipartite boxworld sys- tems, where none of the subsystems are classical, are compositions of relabellings of measurement choices and outcomes, and permutations of subsystems (assum- ing they are identical). FromStep 4 and Theorem 10 (which applies to arbitrary alphabets in each com- ponent), we have that any reversible transformation must be a composition of local operations and permutation of the subsystems. By Lemma 6, which also applies in the case where different measurements may have different numbers of outcomes, we have that the only local operations allowed are relabellings of measurement choices and outcomes. Note that permuting the states of the two subsystems cor- responds to a valid operation only if the measurement choices on each subsystem can be matched up, so that the paired measurements have the same number of out- comes. It is not possible to permute the state of the subsystems if, for example, they have a different number of fiducial measurements or if the first subsystem has a measurement withM outcomes, but none of the measurements on the sec- ond subsystem haveM outcomes. However, if the first subsystem has two fiducial measurements with 2 and 3 outcomes, and the second subsystem has two fiducial measurements measurements with 3 and 2 outcomes, then the states of these sub- systems may be permuted as long as the first measurement of the first subsystem is matched with the second measurement of the second subsystem. 207 5.3 Decompositions of effects In Section 5.2.2 we demonstrated, by means of diagrammatic representations, that the allowed reversible transformations of a bipartite Boxworld system are triv- ial, so long as none of the subsystems are classical. In generalising this result to general multipartite systems, it will be helpful to shift our primary focus to the algebraic, rather than diagrammatic, representations of effects, but to keep these diagrammatic notions at the back of one's mind as an aid to understanding the method of proof. The proof proceeds in a stepwise fashion largely similar to the bipartite case, though some care has to be taken to make rigorous some facts which are less evidently true in tabular arrays of larger dimension. We begin by recapping the necessary features of Boxworld. AnN -partite sys- tem is made up of N subsystems, for which M (i) is the number of local fiducial measurement choices on subsystem i, andK(i)xi is the number of outcomes for mea- surement xi on subsystem i. We will always assume that these values are at least equal to 2, so that all subsystems are non-classical and all measurements are non- trivial. To each subsystem i belongs a set of effects E (i), whose extreme rays are the fiducial effects X(i)ai|xi which correspond to outcome ai being obtained when measurement xi is performed. The composite fiducial effects of an N -partite sys- tem are those of the form e = X(1)a1|x1 ⊗ · · · ⊗ X (N) aN |xN : a general N -partite effect is a conic combination of the composite fiducial effect vectors, and a general N - partite state is a vector which has non-negative inner product with all the composite fiducial effect vectors. The pure product states of anN -partite system are those of the form s = s(1)⊗ · · · ⊗ s(N), where s(i) is a pure state of subsystem i. Pure states of an individual subsystem assign specific outcomes for each fiducial measurement choice; we will use the notation s(i)xi to index this outcome, with 1 ≤ s (i) xi ≤ K (i) xi . If the state s and effect e satisfy 〈e, s〉 = 1, we will say that s hits e. Note that if s = s(1)⊗· · ·⊗s(N) is a pure product state and e = X(1)a1|x1⊗· · ·⊗X (N) aN |xN is a composite fiducial effect, then s hits e if, and only if, s(i)xi = ai for all i. Recall that reversible transformations permute the set of composite fiducial 208 effects, hence induce permutations on the components of the effect table repre- senting the system. In particular, binary effect tables, whose non-zero entries are all equal to 1, are mapped to other binary effect tables. Binary effect tables rep- resent effects which are sums of composite fiducial effects, i.e. which are of the formE = ∑ α eα, where each eα is a composite fiducial effect. This type of effect merits special attention in our treatment. Definition. Suppose that for a set of composite fiducial effects {eα} and an effect E we have that E = ∑ α eα. Then {eα} is a decomposition of E. We may also use the terminology: E admits the decomposition {eα}. We will tend to use lowercase letters for composite fiducial effects, and uppercase for sums of extreme ray effects. Definition. An effect E is multiform if it can be written E = ∑ α eα = ∑ β fβ where {eα} and {fβ} are distinct sets of composite fiducial effects. In other words, the distinct binary effect tables which have 1s in the entries corresponding to effects eα and fβ respectively, are equivalent, both representing the effect E. Definition. A sub-unit effect is an effect of the form X(1)a1|x1 ⊗ · · · ⊗ U (i) ⊗ · · · ⊗ X(N)aN |xN , where exactly one component of the tensor product is the unit effect, and the remainder are local fiducial effects. A sub-unit effect is an i-sub-unit effect if its ith component is the unit effect. Sub-unit effects are represented by the binary effect tables previously referred to as sub-rows; for bipartite systems, 1-sub-unit effects are represented by hori- zontal sub-rows and 2-sub-unit effects are represented by vertical sub-rows. Just as sub-rows are multiform, so are sub-unit effects, with the number of decompo- sitions of an i-sub-unit effect being at leastM (i), the number of fiducial measure- ment choices on subsystem i. Fiducial and sub-unit effects can uniquely be written as a tensor products of vectors, such that each component i of the tensor product is equal either to some 209 X(i)ai|xi or to U (i). Therefore we may refer to their ith component in a well-defined sense; for example, if E = X(1)a1|x1 ⊗ U (2) ⊗X(3)a3|x3 , (5.53) then E(1) = X(1)a1|x1 and E (2) = U (2). For a subset Ω ⊆ [N ] we will write EΩ = ⊗ i∈ΩE(i), so that in the above exampleE{2,3} = U (2)⊗X (3) a3|x3 . Finally, in analogy to one binary effect table covering another if the non-zero components of the latter are a subset of the non-zero components of the former, we have the following definition. Definition. A set of fiducial effects {eα}α∈A (strictly) covers the effect E if there is some (strict) subset B ⊆ A such that ∑ α∈B eα = E. The following two lemmas (and their corollaries) establish useful facts about effects and the ways in which they may be decomposed into sums of composite fiducial effects. We are primarily concerned with effects that are sums of compos- ite fiducial effects with coefficient 1, and hence which may in principle be repre- sented by binary effect tables (although these may well exist in many-dimensional spaces). In terms of binary effect tables, Lemma 7 is the algebraic version of Step 1 of Section 5.2.2, demonstrating that the only binary effect tables equivalent to a sub-row are other sub-rows belonging to the same row. Being rather technical in nature, the proof of Lemma 7 merits some discus- sion (the following proofs are no less technical, but the discussion here will serve equally well to illuminate them also). In essence, the linear relations (or lack of) between the fiducial effect vectors are exploited in order to derive a categorisa- tion of the ways in which sub-unit effects may be decomposed as sums of fiducial effects. However, instead of working directly with the notion of linear indepen- dence, we employ the trick of considering the inner products between fiducial effects and pure product states. This strategy is hinted at in [4] and, after some thought, is unsurprising: we know that the pure states on a single subsystem are the deterministic states, and we know also that distinct effects must have distinct 210 inner products for at least one pure product state. Indeed, one useful perspective to bear in mind is the following: demanding that every valid conditional distribution over the measurements corresponds to an allowed state is equivalent to demanding the precise linear dependencies that exist between the fiducial effect vectors. Consider a pair of distinct fiducial effects in an individual non-classical Box- world system. If this pair of effects belongs to the same measurement, for example the pair X1|1 and X2|1, then the state which deterministically outputs 1 for every fiducial measurement will hit the first effect, but not the second. If the pair belongs to distinct measurements, for example the pairX1|1 andX1|2, then any state which outputs 1 for the first measurement, 2 for the second measurement, and anything for the remaining measurements will again hit the first effect but not the second. It will be noticed that the ability to distinguish distinct effects with pure states is not specific to the examples chosen; this is central to proving the first part of the Lemma. Suppose again that a pair of fiducial effects belongs to different measurements, for exampleX1|1 andX1|2. Then it is just as easy to find a pure product state which hits both effects: the state s which always outputs 1 will again suffice. Therefore 〈X1|1 +X1|2, s〉 = 2, implying that X1|1 +X1|2 is an improper effect. Thus, X1|1 and X1|2 cannot belong to the same decomposition of the unit effect. Again, this is not specific to the example chosen: it follows almost immediately that the only fiducial effect decompositions of the unit effect are given by the set of outcomes belonging to a single measurement. Lemma 7. Let E = ∑ α eα be an i-sub-unit effect. Then each composite fiducial effect eα satisfies e(j)α = E(j) for all components j 6= i. Moreover, the set of ith components {e(i)α } forms a local fiducial measurement on subsystem i. Proof. We will prove the lemma by contradiction. Suppose first that e(j)α′ 6= E(j) for some α′ and some j 6= i. Let E(j) = X(j)aj |xj and e (j) α′ = X (j) a′j |x′j . Either xj 6= x′j , or xj = x′j but aj 6= a′j , so we can construct a pure product state s(j) on system j such that s(j)xj 6= aj and s (j) x′j = a′j , so that s(j) hits e (j) α′ but not E(j). Then there 211 exists a pure product state s whose jth component is s(j), so that 〈E, s〉 = 0, but for which 〈eα′ , s〉 = 1, contradicting the fact that 〈eα′ , s〉 ≤ 〈E, s〉. {e(i)α } is a set of fiducial effects satisfying ∑ α e (i) α = U (i), hence any pure state s(i) on system i must hit exactly one of the e(i)α . If any two of the e(i)α are effects corresponding to different measurements, then there is a pure state s(i) which hits both of them. Hence the effects all belong to the same fiducial measurement x; if {e(i)α } is not the full set of outcomes of measurement x, then there is a pure state s(i) which hits none of them. It follows that {e(i)α } forms a fiducial measurement on subsystem i. Corollary 4. Let E = ∑r α=1 eα be a sub-unit effect. Then {eα} does not strictly cover a multiform effect. Proof. From Lemma 1 it follows that e(j)α = E(j) for all components j 6= i, and that the set {e(i)α } corresponds to the full set of outcomes for a local fiducial mea- surement on subsystem i. In other words, the decompositions ofE are in a one-to- one correspondence with the local fiducial effect decompositions of U (i), each of which is obtained by fixing a measurement choice xi, and summing all the fiducial effect vectors which correspond to an outcome for that measurement: U (i) = ∑ xi X(i)ai|xi . (5.54) In particular, there are only a finite number of such local decompositions of U (i), and the sets of local fiducial effects making up these decompositions are pairwise disjoint. Likewise, there are only a finite number of decompositions of E, and the sets of composite fiducial effects making up these decompositions are pairwise disjoint. Suppose that {eα} covers a multiform effect, i.e. (after relabelling the eα if necessary) for some integer s < r and some integer t, there exist distinct sets {eα}sα=1, {fβ}tβ=1 of fiducial effects such that ∑s α=1 eα = ∑t β=1 fβ . Then {f1, . . . , ft, es+1, . . . er} is a decomposition ofE distinct from {e1, . . . , er}. How- 212 ever, both these sets contain er, contradicting the fact that the decompositions of E are pairwise disjoint. Therefore {eα} does not cover a multiform effect. Just as Lemma 7 was the algebraic version of Step 1 of 5.2.2, Lemma 8 is the algebraic version of Step 2; however, since it is difficult to categorise which effects are proper when the number of subsystems is large, a slightly different approach must be used. Specifically, let r be the least number of non-zero entries over all possible sub-rows of the system. Then Lemma 8 states that any binary effect table which has at most r non-zero entries, and which does not cover any sub-rows, is not multiform. It turns out that, along with Lemma 7, this is enough to prove the desired result in the case that all subsystems have at least one measurement with r outcomes - for the general case, however, we will require a more sophisticated approach. For convenience we will assume from here on that the subsystems are arranged in order of increasing numbers of measurement outcomes, i.e. K(i)j ≤ K (i) j+1 and K(i)1 ≤ K (i+1) 1 ; this amounts to a relabelling of subsystems and measurement choices. K(1)1 is therefore the smallest number of outcomes possible for any fidu- cial measurement. Lemma 8. For r ≤ K(1)1 suppose that {eα}rα=1 does not cover any sub-unit effects. Then for any fiducial effect f /∈ {eα}, there is a pure product state which hits f but none of the eα. Proof. Let f = X(1)a1|x1 ⊗ · · · ⊗X (N) aN |xN . We proceed by induction on the number of subsystems N . When N = 1 set sx1 = a1 to ensure that s hits X (1) a1|x1 . The conditions imply that no partial sum of {eα} equals the unit effect, hence for each other choice of measurement x′ 6= x1, it must be possible to choose sx′ such that Xsx′ |x′ /∈ {eα}. By construction s hits X (1) a1|x1 but none of the eα. WhenN > 1, note that for any fiducial effect g on system 1, the set {e{2,...,N}α : e(1)α = g} is a decomposition of some effect on the remaining N − 1 subsystems. This decomposition has size most K(1)1 ≤ K (2) 1 , and also cannot cover any sub- unit effects, hence, by the induction hypothesis applied to the case N − 1, there 213 exists a pure product state s(2) ⊗ · · · ⊗ s(N) which hits f {2,...N} but none of the set {e{2,...,N}α : e(1)α = g}. Again, it is necessary to set s(1)x1 = a1. Consider the set {e (1) α }, and the outcomes for fiducial measurement choices other than x1 on system 1. One of two cases must occur: (a) The set {e(1)α } fills none of the other measurements, i.e. for every x′ 6= x1, there is an ax′ such that X(1)ax′ |x′ /∈ {e (1) α }. For each such x′ set s(1)x′ = ax′ so that s can hit eα only if e(1)α = f (1). However, using the inductive hypothesis, there exists a pure product state s(2)⊗· · ·⊗s(N) which hits f {2,...,N} but none of the set {e{2,...,N}α : e(1)α = f (1)}. (b) There exists a measurement x′ 6= x1 on system 1 with K(1)x′ = r which is filled by the set {e(1)α }, i.e. (after reordering) e(1)α = X(1)α|x′ for 1 ≤ α ≤ r. {eα} covers no sub-unit effects, so there must be some α′ and some system i 6= 1 such that e(i)α′ 6= f (i). Set s (1) x′ = α′ so that s does not hit any eα with α 6= α′; the remaining components of s(1) may be chosen arbitrarily. By the inductive hypothesis, there exists a pure product state s(2) ⊗ · · · ⊗ s(N) which hits f {2,...,N} but not the single effect e{2,...,N}α′ . In both cases, by construction s = s(1) ⊗ · · · ⊗ s(N) hits f but none of the eα. Corollary 5. The only multiform effects which have a decomposition with exactly K(1)1 elements are sub-unit effects. Proof. Suppose E = ∑r α=1 eα = ∑s β=1 fβ are distinct decompositions, with r = K(1)1 , and suppose without loss of generality that f1 /∈ {eα}rα=1. Every pure product state which hits f1 must also hit one of the eα, so it follows from Lemma 8 that {eα} covers a sub-unit effect. By Lemma 7, every decomposition of a sub-unit effect has at leastK(1)1 elements, hence E is itself a sub-unit effect. 214 5.3.1 Identical subsystems revisited We now demonstrate that Lemmas 7 and 8 are sufficient to prove that all re- versible Boxworld transformations are trivial, in the case that all subsystems are non-classical, and that there is a fixed positive integer r such that the least number of outcomes for any local fiducial measurement on each subsystem i is r. This is a weaker condition than demanding that all subsystems are identical, which is weaker still than demanding in addition that all fiducial measurements on all sub- systems have the same number of outcomes. This last condition is exactly that assumed in [4], hence the result we recover in this section, whilst not fully gen- eral, is already stronger than has previously been demonstrated. Assuming that the least number of fiducial measurement outcomes is fixed across subsystems, consider an i-sub-unit effect E for arbitrary i. Note that E is a multiform effect with at least one decomposition with exactly r = K(1)1 elements (corresponding to the measurement on subsystem i which has r outcomes). It follows that T †(E) is also a multiform effect with at least one decomposition with exactlyK(1)1 elements, hence by Corollary 5, T †(E) is a sub-unit effect. Thus, T † maps sub-unit effects to sub-unit effects. Now let e = X(1)a1|x1⊗· · ·⊗X (N) aN |xN and e ′ = X(1)a′1|x′1⊗· · ·⊗X (N) a′N |x ′ N be composite fiducial effects whose fiducial effect strings differ only in component i. Observe that both effects belong to a decomposition of the i-sub-unit effect E = X(1)a1|x1 ⊗ · · · ⊗ U (i) ⊗ · · · ⊗X(N)aN |xN . (5.55) Suppose that E is mapped to a j-sub-unit effect, for some j (not necessarily equal to i). It follows that T †(e) and T †(e′) are distinct composite fiducial effects which belong to some decomposition of T †(E). Lemma 7 then implies that T †(e) and T †(e′) differ only in component j. Therefore T † preserves a Hamming distance of 1 between fiducial effect strings, and we may apply Theorem 10 and Lemma 6 to deduce that T † must be some relabelling of subsystems, measurement choices and measurement outcomes. 215 Theorem 12. The allowed reversible transformations of a Boxworld system with non-classical subsystems, such that the least possible number of measurement out- comes is constant across subsystems, are relabellings of subsystems, and local relabellings of measurements and measurement outcomes. Proof. We have already argued that these are the only possible reversible trans- formations. It remains to show that any such transformation is allowed, as long as the number of outcomes of each local fiducial measurement is respected. In particular, a relabelling of measurements on a single subsystem is valid only if whenever measurement x is mapped to measurement x′, then both x and x′ have the same number of outcomes. A relabelling of subsystems is similarly valid only if whenever subsystem i is mapped to subsystem j, then the measurement choices in i are individually mapped to measurement choices in j which have the same number of outcomes. Modulo these considerations, it is clear that any such relabelling is a well-defined mapping on outcome distributions of the form P (a1, . . . , aN |x1, . . . , xN). Thus valid relabellings are reversible and convex- linear mappings of allowed states to allowed states. 5.4 Main result We now relax the condition that K(i)1 = K (1) 1 for all subsystems i, and demand merely that each subsystem is non-classical (has more than one fiducial measure- ment), and each local fiducial measurement is non-trivial (has more than one out- come). This makes the task more complicated; not all sub-unit effects have de- compositions withK(1)1 elements, and so Corollary 5 cannot universally be applied in the same way. However, Corollary 5 can still be applied for each subsystem i that does obey K(i)1 = K (1) 1 , so that T † permutes the set of sub-unit effects within this set of subsystems. Now consider a system j for which K(j)1 is the next possible greater value than K(1)1 . A j-sub-unit effect E must be transformed to something which is multiform with some decomposition {eα} involving K(j)1 elements. We will 216 demonstrate that if (a) ∑ α eα is not a sub-unit effect, and (b) T † permutes sub-unit effects on subsystems with K(i)1 = K (1) 1 , then for any other decomposition {fβ} of T †(E) there exists a pure product state s which hits f1 but none of the effects {eα}. This sets up an iterative process, at each stage assuming that T † permutes the sub-unit effects on subsystems with smaller numbers of outcomes. The iteration terminates when K(j)1 takes on its maximal value, and T † thus permutes the full set of sub-unit effects in the system. The following definition is convenient for discussing the set of sub-units which belong to one of several subsystems: Definition. In a multipartite Boxworld system comprising N subsystems, S{i} is the set of sub-unit effects at system i ∈ [N ]. If Ω ⊆ [N ] is a subset of the N subsystems, then SΩ = ∪i∈ΩS{i}. In order to prove that sub-unit effects are mapped to sub-unit effects, the fol- lowing lemmas will prove useful. Lemma 9. Let Ω ⊆ [N ] and suppose that T † is an allowed reversible transforma- tion which permutes the set SΩ. Then the images of two composite fiducial effects will be identical for all components outside Ω if and only if the original effects were. i.e.: eΩ¯1 = eΩ¯2 ⇐⇒ ( T †(e1) )Ω¯ = ( T †(e2) )Ω¯ where Ω¯ = [N ] \ Ω. Proof. Suppose firstly that the composite fiducial effects e1 and e2 differ only in one component i ∈ Ω. Observe that e1 and e2 belong to (possibly different) decompositions of a unique sub-unit effect E ∈ S{i}. By assumption T †(E) is a j-sub-unit effect for some j ∈ Ω; T †(e1) and T †(e2) belong to decompositions of T †(E), hence by Lemma 7 can only differ in component j. Suppose now that e1, e2 satisfy e(k)1 = e (k) 2 for all k /∈ Ω, but that they differ in any number of components belonging to Ω. Then it is possible to move from e1 217 to e2 by changing one component at a time (each component belonging to Ω). At each step, T † maps the corresponding pair of effects to a pair which differ only in components belonging to Ω. Hence T †(e1)(k) = T †(e2)(k) for all k /∈ Ω. To prove the converse direction, note that if T † is an allowed reversible trans- formation which permutes the set SΩ, then so is (T †)−1. Lemma 10. Suppose that E = ∑r α=1 eα is a sub-unit effect, and that {T †(eα)} covers a sub-unit effect F . Then T †(E) = F . Proof. Without loss of generality let ∑s α=1 T †(eα) = F for s ≤ r, and let ∑ β fβ be a distinct decomposition ofF . ThenE covers themultiform effect (T †)−1(F ) = ∑s α=1 eα = ∑ β=1(T †)−1(fβ). It follows from Corollary 4 that s = r and that T †(E) = F . Lemma 11. Suppose that {eα} does not cover any sub-unit effects, but that there exists some subsystem i for which ∑ α e (i) α = U (i). Then for any fiducial effect f /∈ {eα}, there exists a pure product state which hits f but none of the eα. Proof. Let f (i) = X(i)a|x and Ωi = [N ] \ {i}. Note that {e (i) α } is the complete set of outcomes for some fiducial measurement x′ on subsystem i: without loss of generality, e(i)α = X(i)α|x′ . If x′ = x, then f (i) = e(i)a . Set s(i)x = a and choose the remaining components of s(i) arbitrarily, so that s(i) hits e(i)a but none of the other e(i)α . Note that fΩi and eΩia must be distinct fiducial effects, so by Lemma 8 there exists a pure product state sΩi which hits fΩi but not eΩia . If x′ 6= x, then since ∑ eα is not a sub-unit effect, there existsα′ and i′ 6= i such that e(i ′) α′ 6= f (i ′). Set s(i)x = a and s(i)x′ = α′, and choose the remaining components of s(i) arbitrarily. Again, by Lemma 8 there is a pure product state sΩi which hits fΩi but not the single fiducial effect eΩiα′ . In both cases, combining s(i) with sΩi gives a pure product state s which hits f but none of the eα. Lemma 12. Reversible Boxworld transformations map sub-unit effects to sub-unit effects, so long as none of the subsystems are classical. 218 Proof. We begin by considering the action of T † on a 1-sub-unit effect E. T †(E) is a multiform effect with a decomposition containing K(1)1 elements, hence by Corollary 5 it is a j-sub-unit effect for some subsystem j with K(j)1 = K (1) 1 . By the same reasoning T † permutes the set SΩ, where Ω = {j : K(j)1 = K (1) 1 }. We now show iteratively for each positive integer r > K(1)1 that T † permutes the set SΩr , where Ωr = {i : K (i) 1 = r}. Let i ∈ Ωr, let ∑r α=1 eα = ∑s β=1 e′β be distinct decompositions of an i-sub-unit effect E and assume that T † permutes the set SΩ, where Ω = {j : K(j)1 < r}. Note that T †(E) is also multiform, since T †(E) = ∑r α=1 T †(eα) = ∑ β T †(e′β). Write fα = T †(eα) and g = T †(e′1), noting that g /∈ {fα}. Assuming that {fα} does not cover a sub-unit effect, our aim is to construct a pure product state s that hits g but none of the fα, giving a contradiction. Hence {T †(eα)}must cover a sub-unit effect. It then follows from Lemma 10 that T †(E) is itself an i′-sub-unit effect for some i′ ∈ Ωr. By continuing the iteration, we complete the proof of the lemma. To obtain the desired contradiction, assume that {fα} does not cover a sub-unit effect. Let Ω¯ = [N ] \ Ω and consider the set {f Ω¯α }rα=1 (recall that for a fiducial effect f , f Ω¯ is the tensor product of all those components of f belonging to Ω¯). Since e′(i)1 is distinct from e (i) α for all α, and i ∈ Ω¯, we have that e′Ω¯1 /∈ {eΩ¯α}. It follows from Lemma 9 that gΩ¯ /∈ {f Ω¯α }. Suppose that there exists a subsystem i′ ∈ Ω¯ such that {f (i ′) α }rα=1 covers the local unit effect U (i′), i.e. ∑ α f (i′) α ≥E(i′)+ U (i′). Recall that the fiducial effect de- compositions of U (i′) are obtained by fixing a fiducial measurement on subsystem i′ and taking the set of local fiducial effect vectors which correspond to an out- come of that measurement. Since all fiducial measurements on subsystem i′ have at least r outcomes, it follows that ∑ α f (i) α = U (i), and by Lemma 11 there exists a state which hits g but none of the fα. Suppose instead that there is no subsystem i′ ∈ Ω¯ for which {f (i ′) α }rα=1 covers U (i′). Then the set {f Ω¯α } -- considered as a collection of fiducial effects over the maximal tensor product of all subsystems belonging to Ω¯ -- does not cover any 219 sub-unit effects. By Lemma 8 applied to the subsystems belonging to Ω¯, there exists a pure product state sΩ¯ which hits gΩ¯ but none of the f Ω¯α . Combining sΩ¯ with any pure state sΩ which hits gΩ gives a pure product state s which hits g but none of the fα. Having proved that reversible transformations of Boxworld systems map sub- unit effects to sub-unit effects, it is straightforward once again to show that they must be trivial. Theorem 13. The only reversible transformations of non-classical systems al- lowed in Boxworld are relabellings of subsystems, and local relabellings of mea- surement choices and measurement outcomes. Proof. To complete the proof, we again need only check that all relabellings of this form are allowed transformations, as long as subsystems are only permuted only if they are identical, and measurement choices are permuted only if they have the same number of outcomes. Again, this is obvious from considering the action of such transformations on distributions in the form P (a1, . . . , aN |x1, . . . , xN). 5.5 Polytopic models with non-trivial reversible dy- namics In this Section we introduce and discuss a new probabilistic model which does not belong to Boxworld, but shares some similarities with it. At several points in the proof of Theorem 13 above, we have relied on two essential features of Boxworld which do not hold in quantum theory; that the number of extreme-ray vectors of the local state and effect cones is finite (that is to say, the model is polytopic), and that local systems combine under the maximal tensor product. The combination of these features implies that the extreme-ray vectors of the composite effect cone are exactly the tensor products of the local extreme-ray effect vectors. Consequently, the adjoint of any reversible transformation of the composite space must map composite fiducial effects to composite fiducial effects. 220 The obvious question is then: are these features sufficient to derive the con- clusion of Theorem 13? Does there exist any probabilistic model whose local state-cones are polytopic cones, and whose systems combine under the maximal tensor product, yet which has reversible transformations on its composite systems which are not made up of local operations and relabelling of subsystems? In this Section we demonstrate that such models do exist, by way of an explicit example. Before introducing this example, it is worth discussing polytopic models in more detail. The local systems of a polytopic model will still be characterised by a finite set of fiducial measurements, however it will not generally be the case that any outcome distribution on those measurements is permitted. Instead, some further set of restrictions are imposed on the space of possible outcome distribu- tions, so that they form a strict subset of the full set of outcome distributions that would be present in a Boxworld system which has the same number of fiducial measurements and outcomes. To see this in a concrete example, recall the “poly- gon models” of Chapter 2, in which the local state space takes the form of a regular n-gon, for some integer n. In the case n = 5, the set of pairs of fiducial effects {ei, en+i}5i=1 form a set of 5 measurements that is fiducial for the state space. Thus it is possible to characterise a state by assigning outcome probabilities for each of these measurements, rather than specifying its position on the pentagon: in this way the set of outcome distributions is naturally a subset of the set of outcome distributions of a Boxworld systems with 5 measurements and 2 outcomes. This subset is strict, since unlike a Boxworld system it is not possible for a state in a pentagon system to assign a definite outcome to every fiducial measurement (such a vector would have to have inner product 1 with 5 fiducial effect vectors, as well as the unit effect, which it can easily be checked is impossible). Recall that an individual Boxworld systemmay be defined by specifying a unit effectU and a set of fiducial effect vectors {Xa|x}Kx−1a=1 , all of which are linearly in- dependent. The remaining fiducial effect vectors are defined via the normalization conditions XKx|x = U − ∑Kx−1 a=1 Xa|x. Thus a g-bit (binary input/output) system is represented by a 3-dimensional state space, wherein the set of normalised states 221 lie in the subspace {v : 〈U , v〉 = 1}. Suppose now that we “squash” the g-bit state space, by introducing one extra restriction on the outcome statistics: namely, that the probability of obtaining outcome a = 1 is invariant of whether measurement choice x = 1 or x = 2 was performed. The vector of probabilities representing the outcome distribution must then take the form: (p, 1− p | p, 1− p). (5.56) In fact, this is a representation of a classical bit, since only one fiducial measure- ment (x = 1 for example) is sufficient to characterise any such state, and any out- come distribution on measurement 1 is possible. In order to make this “squashing” procedure more interesting we can apply it to a g-trit, i.e. a Boxworld system with two fiducial measurements, each of which has three outcomes. In this case, the vector of outcome probabilities takes the following form: (p, q, 1− p− q | p, q′, 1− p− q′). (5.57) The effect cone which is dual to this set of states may be constructed in the following way: in a real vector space of dimension 4, choose a set of linearly independent vectors {U , X1, X2|1, X2|2}. Unlike Boxworld, the vector X1 corre- sponds to the obtaining the first outcome for both measurements x = 1 and x = 2. The remaining fiducial effect vectors X3|2 and X3|3 are defined according to the normalisation relations: U = X1 +X2|1 +X3|1 = X1 +X2|2 +X3|2. (5.58) In this Section we will refer to such a system as a squashed g-trit. Proposition 8. Suppose that a composite system is the max-tensor product of two subsystems, the first subsystem being a squashed g-trit and the second subsystem being a standard g-bit. Then there is an allowed adjoint transformation which switches the composite fiducial effectsX(1)1 ⊗X (2) 1|1 andX (1) 1 ⊗X (2) 2|1 , whilst leaving 222 all other composite fiducial effects invariant. Moreover, this transformation is reversible and non-trivial (not a composition of local operations and permutations of subsytems). Proof. Recall that a g-bit system has fiducial effect vectors {X1|1, X2|1, X1|2, X2|2}, such thatX1|1 +X2|1 = X1|2 +X2|2 = U . The given transformation is essentially a kind of C-NOT operation on the systems involved: by mapping X(1)1 ⊗X (2) 1|1 ↔ X(1)1 ⊗ X (2) 2|1 we are imposing that the outcomes of measurement x = 1 on sub- system 2 are switched, conditional on outcome 1 occurring on subsystem 1. In order to prove the proposition, we must verify that the given transformation obeys three properties: that it is allowed (i.e. maps allowed states to allowed states in a convex-linear fashion), that it is reversible, and that it is non-trivial. 1. Allowed. Suppose that vector spaces V1 and V2 represent the squashed g-trit and g-bit subsystems respectively. It suffices to show that there exists a linear operator T † on V1⊗V2 which permutes the set of composite fiducial effect vectors in the required manner. Take bases B(1) = {U , X1, X2|1, X2|2} and B(2) = {U , X1|1, X1|2} for V1 and V2 respectively. By expanding tensor products of these basis elements in terms of the composite fiducial effect vectors, and applying the given permutation of fiducial effect vectors (the “C-NOT” operation), we may derive the action that T † must take on the tensor product basis elements. For example, we may begin by demanding that, T † ( X(1)1 ⊗X (2) 1|1 ) = X(1)1 ⊗X (2) 2|1 = X(1)1 ⊗ U (2) −X (1) 1 ⊗X (2) 1|1 . (5.59) 223 As required, T † leaves the composite unit effect invariant: T † ( U (1) ⊗ U (2) ) = T † ([ X(1)1 +X (1) 2|1 +X (1) 3|1 ] ⊗ [ X(2)1|1 +X (2) 2|1 ]) = T † ( X(1)1 ⊗ [ X(2)1|1 +X (2) 2|1 ] + [ X(1)2|1 +X (1) 3|1 ] ⊗ [ X(2)1|1 +X (2) 2|1 ]) = X(1)1 ⊗ [ X(2)2|1 +X (2) 1|1 ] + [ X(1)2|1 +X (1) 3|1 ] ⊗ [ X(2)1|1 +X (2) 2|1 ] = U (1) ⊗ U (2). (5.60) Note that alternatively decomposing U (1) asX(1)1 +X (1) 2|2 +X (1) 3|2 , or decomposing U (2) as X(2)1|2 + X (2) 2|2 (or both), will not affect this calculation, which reassures us that the action of T † on the composite unit effect is well-defined. We find that the vector U (1) ⊗X(2)1|1 is transformed in the following manner: T † ( U (1) ⊗X(2)1|1 ) = T † ( X(1)1 ⊗X (2) 1|1 +X (1) 2|1 ⊗X (2) 1|1 +X (1) 3|1 ⊗X (2) 1|1 ) = X(1)1 ⊗X (2) 2|1 +X (1) 2|1 ⊗X (2) 1|1 +X (1) 3|1 ⊗X (2) 1|1 = X(1)1 ⊗ [ U (2) −X(2)1|1 ] + [ U (1) −X(1)1 ] ⊗X(2)1|1 = X(1)1 ⊗ U (2) + U (1) ⊗X (2) 1|1 − 2X (1) 1 ⊗X (2) 1|1 . (5.61) Again, this is well-defined in the sense that we get the same result if the alternative fiducial effect decomposition of U (1) is used. It turns out that X(1)1 ⊗ X (2) 1|1 and U (1)⊗X(2)1|1 are the only two members of the tensor product basis not kept invariant by T †, as can be checked by similar calculations. For example, T † ( X(1)1 ⊗ U (2) ) = T † ( X(1)1 ⊗ [ X(2)1|1 +X (2) 2|1 ]) = X(1)1 ⊗ [ X(2)2|1 +X (2) 1|1 ] = X(1)1 ⊗ U (2). (5.62) At this point we have defined a linear map T †, defined on V1⊗V2 by its action on the tensor product basis resulting from the bases B(1) and B(2). Specifically, T † 224 maps: X(1)1 ⊗X (2) 1|1 → X (1) 1 ⊗ U (2) −X (1) 1 ⊗X (2) 1|1 U (1) ⊗X(2)1|1 → X (1) 1 ⊗ U (2) + U (1) ⊗X (2) 1|1 − 2X (1) 1 ⊗X (2) 1|1 , (5.63) and leaves all other basis vectors invariant. We must now check that T † correctly permutes the composite fiducial effects in the manner originally specified, i.e. that it switchesX(1)1 ⊗X (2) 1|1 andX (1) 1 ⊗X (2) 2|1 , whilst leaving all other composite fiducial effects invariant. This will in turn demonstrate that the specified permutation is allowed, since it necessarily corresponds to a linear map that maps the joint effect cone Emax+ into itself. Note that many of the composite fiducial effects are also members of the tensor product basis, but not those with any component belonging to the set {X(1)3|1 , X (1) 3|2 , X (2) 2|1 , X (2) 2|2}. (5.64) However, we may expand these non-basis composite fiducial effects in the tensor product basis, and hence check that T † acts correctly on them. The effectsX(1)1 ⊗ X(2)1|1 and X (1) 1 ⊗X (2) 2|1 are indeed switched: T † ( X(1)1 ⊗X (2) 1|1 ) = X(1)1 ⊗ U (2) −X (1) 1 ⊗X (2) 1|1 = X (1) 1 ⊗X (2) 2|1 (5.65) T † ( X(1)1 ⊗X (2) 2|1 ) = T † ( X(1)1 ⊗ [ U (2) −X(2)1|1 ]) = X(1)1 ⊗ U (2) − ( X(1)1 ⊗ U (2) −X (1) 1 ⊗X (2) 1|1 ) = X(1)1 ⊗X (2) 1|1 (5.66) Moreover, the remaining composite fiducial effect vectors are invariant under T †. 225 For example, T † ( X(1)3|1 ⊗X (2) 2|1 ) = T † ([ U (1) −X(1)1 −X (1) 2|1 ] ⊗ [ U (2) −X(2)1|1 ]) = [ U (1) −X(1)1 −X (1) 2|1 ] ⊗ U (2) +X(1)2|1 ⊗X (2) 1|1 + T † ( X(1)1 ⊗X (2) 1|1 − U (1) ⊗X(2)1|1 ) = [ U (1) −X(1)1 −X (1) 2|1 ] ⊗ U (2) +X(1)2|1 ⊗X (2) 1|1 +X(1)1 ⊗X (2) 1|1 − U (1) ⊗X(2)1|1 = X(1)3|1 ⊗ [ X(2)1|1 +X (2) 2|1 ] −X(1)3|1 ⊗X (2) 2|1 = X(1)3|1 ⊗X (2) 1|1 . (5.67) It is worth stressing the order of steps taken in this part of the proof. We were given a permutation of the joint fiducial effects, but with no guarantee that it cor- responds to an allowed transformation. We then derived the form of the linear operation which - if the given permutation of fiducial effect vectors corresponds to an allowed transformation - must be the unique linear operator that performs that permutation. It was then necessary to verify that the linear operator does in- deed permute the fiducial effect vectors in the correct manner. 2. Reversible. Given that T † as defined above is allowed, and permutes composite fiducial effects as in the statement of the Proposition, it is trivial to see that is is reversible. Composing T † with itself gives an operator that leaves all composite fiducial ef- fect vectors invariant. Since this set of vectors spans the space V1 ⊗ V2, it must be that T † is its own inverse. 3. Non-trivial. We now demonstrate that T † is not a composition of local operations and swap- ping of subsystems (although, since the subsystems are non-identical, the latter is 226 impossible anyway). Recall our use of the Hamming distance between pairs of fiducial effect strings in Section 5.1, which is defined as the number of compo- nents in which those strings differ. Clearly, local operations and permutations of subsystems must preserve the Hamming distance between pairs of effects. How- ever, consider the composite fiducial effects e = X(1)1 ⊗X (2) 1|1 and f = X (1) 2|1⊗X (2) 1|1 , which are mapped to the composite fiducial effects T †(e) = X(1)1 ⊗ X (2) 2|1 and T †(f) = f . The Hamming distance between e and f is 1, whereas the Hamming distance between T †(e) and T †(f) is 2. Hence T † is non-trivial. Further insight into why T † is an allowed transformation can be gained by considering the tabular representation of the max-tensor product of a squashed g- trit and a g-bit system. Suppose that, instead of a squashed g-trit, a similar C-NOT transformation is applied to a max-tensor product of a g-trit and a g-bit system. A slight change of notation is needed, so that the composite fiducial effects X(1)1|1 ⊗ X(2)1|1 andX (1) 1|1 ⊗X (2) 2|1 are switched; optionally, the effectsX (1) 1|2 ⊗X (2) 1|1 andX (1) 1|2 ⊗ X(2)2|1 may also be switched. Determining the linear form of such a transformation becomes problematic when considering the tensor product basis vectorU (1)⊗X(2)1|1 , due to the fact that a multiform effect is transformed into a non-multiform effect (recall that a bipartite binary effect table in Boxworld is multiform if and only if it contains a sub-row): 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 → 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . (5.68) However, the effect table on the RHS of (5.68) ismultiform if it represents the max-tensor product of a squashed g-trit and g-bit system. It is equivalent to the 227 following effect table: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 , (5.69) which can be seen from the fact that they are identical on pure product states (for which the first and fourth lines of the corresponding state table are identical). It is not difficult to give an example in which T † generates classical correla- tions, giving a second proof that T † is non-trivial. The product state (1/2, 1/2, 0 | 1/2, 1/2, 0)⊗ (1, 0|1, 0), (5.70) is transformed by T † in the following manner: 1/2 0 1/2 0 1/2 0 1/2 0 0 0 0 0 1/2 0 1/2 0 1/2 0 1/2 0 0 0 0 0 → 0 1/2 1/2 0 1/2 0 1/2 0 0 0 0 0 0 1/2 1/2 0 1/2 0 1/2 0 0 0 0 0 . (5.71) The state on the RHS is not a product state, but is rather the following convex combination of pure product states: 1 2 (1, 0, 0|1, 0, 0)⊗ (0, 1|0, 1) + 1 2 (0, 1, 0|0, 1, 0)⊗ (1, 0|1, 0). (5.72) On the other hand, it can be shown that T † does not generate any entanglement in this model. Recall that any reversible transformation must map pure states to pure states: for entanglement to be generated reversibly, at least one pure product state must be transformed to a pure entangled state, otherwise the set of convex 228 combinations of pure product states will be mapped into itself. We already know that the pure states of a g-bit system are the 4 states that assign deterministic out- comes for both fiducial measurement choices; a similar argument applies to the squashed g-trit, whose pure states are the 5 states that assign deterministic out- comes and give equal probabilities for the first outcome of both fiducial measure- ments. Therefore the state table representations of the pure product states can be enumerated, and it can be verified that T † permutes the set of pure product states. Indeed, we can follow this line of reasoning to conclude that no reversible transformation of the bipartite system we have described is capable of generating entanglement. Note that the state tables representing pure product states are made up of only 1s and 0s, and conversely that any state table consisting only of 1s and 0s represents a pure product state. In a similar vein to Proposition 7, we may therefore deduce that s is a pure product state if and only if 〈e, s〉 ∈ {0, 1} for all composite fiducial effects e. The adjoint of any reversible transformation T permutes the set of composite fiducial effects, hence for any pure product state s we have that 〈e, T (s)〉 = 〈T †(e), s〉 ∈ {0, 1} for all composite fiducial effects e. Thus every reversible transformation T maps pure product states to pure product states, and does not generate entanglement. 5.6 Discussion We have refined and extended the result of [4], and demonstrated that - as long as no subsystem is classical - reversible dynamics in general composite Boxworld systems always take the form of permutations of subsystems and local relabellings of measurement choices and outcomes on individual subsystems. This character- isation, and in particular the method of proof we develop, may be useful as tools for exploring which theories are reversible transitive. Note that we were already able to show in Proposition 7 of Section 2.5.5 that Boxworld is not a reversibly transitive theory, by proving that reversible transformations permute the set of pure product states. It is therefore worth highlighting in what ways it is useful to 229 provide an explicit characterisation of reversible Boxworld transformations. One very useful outcome of this work is in developing a broader understanding of reversible dynamics; in particular, developing the study of reversible transfor- mations in general probabilistic theories, and a framework for exploring the ap- plication of these techniques to theories other than Boxworld. In Section 5.5 we gave a result which limits the extent to which Theorem 13 may be generalised -- such a result would have been significantly harder to come by without the insights and perspective gained from our construction of the proof of Theorem 13 and the tabular formalism developed along the way. We have consequently shown that the analogous result to Theorem 13 does not generally hold in theories for which the number of local fiducial effects is finite, and for which composite systems combine under the max-tensor product. However, the local squashed g-trit system is not reversibly transitive, as the state which assigns outcome 1 to both fiducial measurements cannot be reversibly transformed to the state which assigns 2 to both fiducial measurements (it can be checked that the adjoint transformation will not be well-defined on the fiducial effect X(1)1 , using the notation of Section 5.5). A natural next step in this direction would be to investigate the set of allowed reversible transformations of the polygonal models given in Section 2.5.5. Such models also admit a finite number of fiducial effects, and in Section 5.5 we saw that an n-gon system may be viewed as a subset of the outcome distributions of a Boxworld system with n fiducial measurements, each of which has two outcomes. The case of even n presents an even greater similarity to Boxworld in that the extreme-ray effects are identical with the fiducial effects, thus improving the ap- plicability of the techniques we have developed. Unfortunately, this equivalence between fiducial and extreme-ray effects is not true for odd n, where only half of the fiducial effects are extreme-ray effects. At the beginning of the present chapter we discussed the centrality of reversible transitivity in derivations of quantum theory; an interesting open conjecture is that this feature alone is sufficient as a “physical axiom” fromwhich to derive quantum theory. In order to prove such a conjecture it would be necessary to demonstrate 230 that any general probabilistic theory which is not in a well-defined sense “embed- ded in” quantum theory, is not reversibly transitive either. We may say that a the- ory is “embedded in” quantum theory if the systems of that theory can be mapped to quantum systems, and the effects and states mapped to positive operators and density operators respectively on those systems, in such a manner that composite systems are mapped to the corresponding compositions of quantum systems, and that the predictions (both local and composite) of the theory may equivalently be calculated by taking the trace of the product of the relevant quantum operators. For example, Boxworld is not contained within quantum theory, since it allows for a strictly greater violation of the CHSH inequality than quantum theory does. From this perspective, the squashed g-trit example given in Section 5.5 may be seen as providing a counter-example to the strongest of three possible versions of this conjecture. In order of strongest to weakest, these versions of the conjecture may be explicitly stated as: Strong Any theory with non-trivial reversible joint dynamics is embedded in quan- tum theory. Medium Any theory which is reversibly transitive on local systems and has non- trivial, reversible, joint dynamics is embedded in quantum theory. Weak Any theory which is reversibly transitive on local and composite systems is embedded in quantum theory. Note that the bipartite system consisting of a max-tensor product of a squashed g-trit and a g-bit is not embedded in quantum theory, since it allows for maximal CHSH violations in much the same way that a max-tensor product of two g-bits does, via fiducial measurements on the following allowed state: 231 0 0 0 0 1/2 0 1/2 0 0 1/2 0 1/2 0 0 0 0 1/2 0 0 1/2 0 1/2 1/2 0 (5.73) Thus the squashed g-trit provides a counter-example to the strong version of the conjecture, but not to the medium or weak versions. As already mentioned, it would be interesting to look at whether polygon models (which are locally re- versibly transitive) give any insight into whether the medium conjecture holds in general. One result by Masanes et al [2] is in a similar vein to the Medium conjec- ture, and states that any general probabilistic theory whose local systems are qubit systems, and which admits at least one non-trivial, continuous, reversible interac- tion between systems, must also be identical to quantum theory on its composite systems. This suggests that structure of the local state space, in conjunction with reversible transitivity, may have a lot to tell us about the nature of quantum theory. If it is true that all reversibly transitive theories are embedded in quantum the- ory, then quantum theory is the maximal reversibly transitive theory that could possibly describe Nature; this would go some way towards settling the matter of understanding why quantum theory is the most accurate description of Nature. On the other hand, if a “foil” theory is discovered, which is reversibly transitive but which makes predictions that quantum theory cannot, then this is of great interest in itself. Perhaps such a counterexample will eventually supersede quantum the- ory as a theory of Nature; at the very least it will give us some indication of what further axioms are necessary (rather than simply sufficient) to distinguish quan- tum theory from the full set of general probabilistic theories. Whichever outcome turns out to be true, the question of reversible transitivity is undoubtedly worthy of further investigation. 232 233 Chapter 6 Conclusion and Outlook The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve. We should be grateful for it and hope that it will remain valid in future research and that it will extend, for better or for worse, to our pleasure, even though perhaps also to our bafflement, to wide branches of learning. "The Unreasonable Effectiveness of Mathematics" Eugene Wigner 234 In this thesis we have derived various results which attempt to provide a more intuitive and comprehensive understanding of some of the stranger features of quantum theory. In Chapter 3 we investigated the remarkable fact that if one re- laxes either one of two assumptions in quantum theory's Hilbert space formalism - that states be represented by positive operators, or that observables be represented by positive operators - then the predictions of quantum theory extend to the full class of non-signaling distributions, and hence encompass the full class of gen- eral probabilistic theories. We have also shown that the same feat is possible by introducing quasiprobabilies into local, classical outcome distributions, even in the case where almost no correlation exists between the involved systems. These results are useful in that they can be applied to the study of general probabilistic theories (for example, in deriving quantum theory from physical principles [2]) and that they provide a neat categorisation of quantum-achievable correlations as those which are generated so long as we do enforce the positivity of all operators. Chapter 4 explores the uniqueness of quantum theory from the perspective of information theory. We prove the conjectured tightness of a bound on how well entanglement allows us to perform random access codes, and introduce a quadratic bias bound ∑ yE2y ≤ 1 which seems to capture a great deal of information about the set of quantum-achievable correlations. We also argue that the existence of a sensible measure of entropy precludes many general probabilistic theories whose non-locality is stronger than that of quantum theory, and have discussed how this relates to recent research on entropies in general probabilistic theories. Both the quadratic bias bound and the suggested link between entropy and non-locality throw up many intriguing open questions which may in future lead to fruitful lines of research. Chapter 5 explores the role that reversible transitivity plays as a characteristic and fundamental feature of quantum theory. It is demonstrated that the triviality of reversible Boxworld transformations, previously only known in the case that all subsystems are identical [4], extends to the most general case, so long as none of the subsystems are classical. A key insight in this proof was to consider a special 235 class of effects known as sub-unit effects, and to show that the set of sub-unit effects is mapped to itself by the adjoint of any reversible transformation. As a visual aid to motivate the central idea of the proof, and to provide a simpler proof for the bipartite case, we introduced and further developed a tabular formalism of Boxworld states and effects first introduced in [5]. The techniques developed in this chapter may provide useful tools in exploring the reversible dynamics of other general probabilistic theories. We give one example of a model which is more non-local than quantum theory, but does allow for non-trivial reversible dynamics, although this model is not reversibly transitive. There exists a fascinating interplay between how non-local a theory is and the richness of its reversible dynamics. Further research might illuminate how the dynamical properties of the universe inform the strength of correlations allowed between distant parties. The reader may make the justified observation that the content of each chap- ter is somewhat disparate, in that there is little interplay between their respective results, and in that they do not appear to constitute a linear progression towards some unified goal. Nevertheless, in the interests of tying together several years spent thinking about the same physical theory, it is worth analysing what similar- ities can be found amongst these chapters. There is one quite conspicuous feature that they share in common: each chapter constitutes an investigation of quantum theory “from the outside”; that is to say, as a special member of the class of general probabilistic theories. We may go one step further and say that the main results of each chapter are an attempt to mark out quantum theory from all other theories; to discern what makes quantum theory special in the first place. Chapters 4 and 5 are each concerned with a fundamental property of quantum theory, the imposition of which renders one or more alternative theories as unnat- ural, or at least inconvenient. One moral that can be drawn equally well from both chapters is thus: if the world were not as described by quantum theory, then one or another fundamental physical aspect of the universe would be violated. These results strive towards a characterisation of quantum theory as the only reasonable theory that could possibly describe nature. Of course, we should always keep at 236 the back of our minds the possibility that quantum theory is not the most accurate description of nature, or indeed that the attempt to model nature in a way which is somehow agreeable to our common sense is futile, since nature might simply be too weird for us to grasp in a way that we find intuitive. Despite this reservation, we can take the optimistic view from these chapters (and other recent results in quantum foundations) that there is still more ground to be gained in obtaining an intuitive and reasonable explanation of quantum phenomena. Chapter 3 stands apart from Chapters 4 and 5 in that it makes no real judge- ment about non-quantum theories. Rather, the results in this chapter explore the relation between the outcome distributions achievable in classical theory, quan- tum theory and Boxworld, by means of extending the normal rules of probability. The overarching moral of the chapter takes on a more negative tone concerning the uniqueness of quantum theory: it is not likely a special feature of quantum theory that all non-signaling distributions admit quasiprobabilistic quantum repre- sentations, and negative probabilities should probably be regarded as a useful cal- culational tool more than as a profound insight into the workings of the universe. However, it is very possible that future work on quasiprobabilistic representations will, for example, draw a stronger connection between non-locality and negative probabilities, in such a way as to provide a reasonable principle by which strongly non-local outcome distributions are ruled out. Even if this does not turn out to be the case, it is still useful to have a unified, quantum-like framework for non- signaling correlations, and to have a variety of local quasiprobabilistic models for these correlations. Quantum non-locality has played almost as prominent a role in our discus- sions as have general probabilistic theories. Each chapter draws out interactions between non-locality and some other object of physical or mathematical interest, be it negative probability, information-based games, entropy, or reversible dynam- ics. An interesting take on the local quasiprobabilistic distributions of Chapter 3 is the idea that one is replacing the weirdness of non-locality with the weirdness of negative probabilities. Assigning positive probabilities to all possible outcomes 237 is clearly an assumption of Bell's Theorem; we have shown that Bell's Theorem does not hold if one just slightly relaxes to the extent that negative probabilities are assigned to outcomes which never actually occur. Despite its treatment by various notable physicists including Dirac [66] and Feynman [91], the idea of negative probabilities has never gained much traction. Is it simply a lesser de- gree of discomfort that seems to incline us towards accepting non-locality over negative probabilities? Perhaps our comfort with locality is the result of cultural and historical familiarity; if Newtonian physics had been written in a different conceptual language, would the language of quantum theory - being heavily in- fluenced by Newtonian mechanics - have developed differently too, perhaps even having a general probabilistic theory slant from the beginning? These questions are difficult to answer, but as long as the presiding theories of nature admit distinct interpretations and distinct mathematical representations, it is in our own nature to differentiate these interpretations and representations according to our sense of reason, usefulness and beauty. 238 Bibliography [1] A. Acín, R. Augusiak, D. Cavalcanti, C. Hadley, J. K. Korbicz, M. Lewen- stein, Ll. Masanes, andM. Piani. Unified framework for correlations in terms of local quantum observables. Phys. Rev. Lett., 104:140404, 2010. [2] G. de la Torre, L. Masanes, A. J. Short, and M. P. Müller. Deriving quantum theory from its local structure and reversibility. Phys. Rev. Lett., 109:090403, 2012. [3] M. Pawlowski and M. Zukowski. Entanglement-assisted random access codes. Phys. Rev. A, 81:042326, 2010. [4] D. Gross, M. P. Mueller, R. Colbeck, and O. C. O. Dahlsten. All re- versible dynamics in maximally non-local theories are trivial. Phys. Rev. Lett., 104:080402, 2010. [5] A.J. Short and J. Barrett. Strong nonlocality: a trade-off between states and measurements. New J. Phys., 12:033034, 2010. [6] W.K.Wootters andW.H. Zurek. A single quantum cannot be cloned. Nature, 299:802, 1982. [7] S. Kochen and E. P. Specker. The problem of hidden variables in quantum mechanics. Indiana Univ. Math. J., 17:59, 1967. [8] J. S. Bell. On the Einstein-Podolsky-Rosen paradox. Physics, 1:195, 1964. 239 [9] A. Einstein, B. Podolsky, and N. Rosen. Can quantum-mechanical descrip- tion of physical reality be considered complete? Phys. Rev., 47:777, 1935. [10] D. Bohm and Y. Aharonov. Discussion of experimental proof for the paradox of Einstein, Rosen, and Podolsky. Phys. Rev., 108:1070, 1957. [11] A. Aspect, P. Grangier, and G. Roger. Experimental realization of Einstein- Podolsky-Rosen-Bohm gedankenexperiment: A new violation of Bell's in- equalities. Phys. Rev. Lett., 49:91, 1982. [12] G.Weihs, T. Jennewein, C. Simon, H.Weinfurter, and A. Zeilinger. Violation of Bell's inequality under strict Einstein locality conditions. Phys. Rev. Lett., 81:5039, 1998. [13] M. A. Rowe, D. Kielpinski, V. Meyer, C. A. Sackett, W. M. Itano, C. Mon- roe, and D. J. Wineland. Experimental violation of a Bell's inequality with efficient detection. Nature, 409:791, 2001. [14] D. N. Matsukevich, P. Maunz, D. L. Moehring, S. Olmschenk, and C. Mon- roe. Bell inequality violation with two remote atomic qubits. Phys. Rev. Lett., 100:150404, 2008. [15] M. Giustina, A. Mech, S. Ramelow, B. Wittman, J. Kofler, J. Beyer, A. Lita, B. Calkins, T. Gerrits, S. W. Nam, R. Ursin, and A. Zeilinger. Bell viola- tion using entangled photons without the fair-sampling assumtption. Nature, 497:227, 2013. [16] S. Groblacher, T. Paterek, R. Kaltenbaek, C. Brukner, M. Zukowski, M. As- pelmeyer, and A. Zeilinger. An experimental test of non-local realism. Na- ture, 446:871, 2007. [17] N. Gisin and H. Zbinden. Bell inequality and the locality loophole: Active versus passive switches. Phys. Lett. A, 264(2–3):103, 1999. 240 [18] D. Salart, A. Baas, J. A. W. van Houwelingen, N. Gisin, and H. Zbinden. Spacelike separation in a Bell test assuming gravitationally induced col- lapses. Phys. Rev. Lett., 100:220404, 2008. [19] T. Scheidl, R. Ursin, J. Kofler, S. Ramelow, X.-S. Ma, T. Herbst, L. Ratschbacher, A. Fedrizzi, N. K. Langford, T. Jennewein, and A. Zeilinger. Violation of local realism with freedom of choice. Proc. Natl. Acad. Sci. U.S.A., 107(46):19708, 2010. [20] G. Blaylock. The E. P. R. paradox, Bell's inequality, and the question of locality. Am. J. Phys., 78:111, 2009. [21] T. Maudlin. What Bell proved: A reply to Blaylock. Am. J. Phys., 78:121, 2010. [22] P. H. Eberhard. Bell's theorem without hidden variables. Nuovo Cimento B, 38:75, 1977. [23] T. Norsen. Against ‘realism’. Found. Phys., 37(3):311--340, 2007. [24] T. Maudlin. Space-time in the quantum world. In Bohmian mechanics and quantum theory: an appraisal. Kluwer Academic Publishers, 1996. [25] P. M. Pearle. Hidden-variable example based upon data rejection. Phys. Rev. D, 2:1418, 1970. [26] A. Garg and N. D. Mermin. Detector inefficiencies in the Einstein-Podolsky- Rosen experiment. Phys. Rev. D, 35:3831, 1987. [27] A. K. Ekert. Quantum cryptography based on Bell's theorem. Phys. Rev. Lett., 67:661, 1991. [28] L. Hardy. Quantum theory from five reasonable axioms, 2001. arXiv:quant- ph/0101012. 241 [29] J. Barrett. Information processing in generalized probabilistic theories. Phys. Rev. A, 75:032304, 2007. [30] J. Barrett, N. Linden, S. Massar, S. Pironio, S. Popescu, and D. Roberts. Nonlocal correlations as an information-theoretic resource. Phys. Rev. A, 71:022101, 2005. [31] H. Barnum, J. Barrett, M. Leifer, and A. Wilce. Cloning and broadcasting in generic probabilistic theories, 2006. arXiv:quant-ph/0611295. [32] G. Chiribella, G. M. D'Ariano, and P. Perinotti. Probabilistic theories with purification. Phys. Rev. A, 81:062348, 2010. [33] G. Chiribella, G. M. D'Ariano, and P. Perinotti. Informational derivation of quantum theory. Phys. Rev. A, 84:012311, 2011. [34] L. Masanes and M. P. Müller. A derivation of quantum theory from physical requirements. New J. Phys., 13:063001, 2011. [35] H. Barnum, M. P. Mueller, and C. Ududec. Higher-order interfer- ence and single-system postulates characterizing quantum theory, 2014. arXiv:1403.4147 [quant-ph]. [36] P. Janotta, C. Gogolin, J. Barrett, and N. Brunner. Limits on nonlocal corre- lations from the structure of the local state space. New J. Phys., 13:063024, 2011. [37] C. Pfister. One simple postulate implies that every polytopic state space is classical, 2012. (Master's Thesis, Institute for Theoretical Physics, ETH Zurich) arXiv:1203.5622 [quant-ph]. [38] A. J Short and S.Wehner. Entropy in general physical theories. New J. Phys., 12:033023, 2010. [39] G. Kimura, K. Nuida, and H. Imai. Distinguishability measures and entropies for general probabilistic theories. Rep. M. Phys., 66:175, 2010. 242 [40] H. Barnum, J. Barrett, L. O. Clark, M. Leifer, R. Spekkens, N. Stepanik, A. Wilce, and R. Wilke. Entropy and information causality in general prob- abilistic theories. New J. Phys., 12:033024, 2010. [41] G. M. D'Ariano. No-signaling, dynamical independence, and the local ob- servability principle. J. Phys. A, 40:8137, 2007. [42] A. J. Short. No purification for two copies of a noisy entangled state, 2008. arXiv:0809.2622 [quant-ph]. [43] S. W. Al-Safi and A. J. Short. Simulating all nonsignaling correlations via classical or quantum theory with negative probabilities. Phys. Rev. Lett., 111:170403, 2013. [44] S. W. Al-Safi and A. J. Short. Information causality from an entropic and a probabilistic perspective. Phys. Rev. A, 84:042323, 2011. [45] S. W. Al-Safi and A. J. Short. Reversible dynamics in strongly non-local boxworld systems. J. Phys. A: Math. Theor., 47:325303, 2014. [46] J. Clauser, M. Horne, A. Shimony, and R. Holt. Proposed experiment to test local hidden-variable theories. Phys. Rev. Lett., 23:880, 1969. [47] B. Tsirelson. Quantum generalizations of Bell's inequality. Lett. Math. Phys., 4:93, 1980. [48] S. Popescu and D. Rohrlich. Quantum nonlocality as an axiom. Found. Phys., 24:379, 1994. [49] C. A. Fuchs. Quantum mechanics as quantum information (and only a little more), 2002. arXiv:quant-ph/0205039. [50] T. H. Yang, M. Navascués, L. Sheridan, and V. Scarani. Quantum bell in- equalities from macroscopic locality. Phys. Rev. A, 83:022105, 2011. 243 [51] M. Pawlowski, T. Paterek, D. Kaszlikowski, V. Scarani, A. Winter, and M. Zukowski. Information causality as a physical principle. Nature, 461:1101, 2009. [52] N. Linden, S. Popescu, A. J. Short, and A. Winter. Quantum nonlocality and beyond: Limits from nonlocal computation. Phys. Rev. Lett., 99:180502, 2007. [53] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Infor- mation. Cambridge University Press, Cambridge, 2000. [54] H. Barnum, J. Barrett, M. Leifer, and A. Wilce. Teleportation in general probabilistic theories. In Mathematical Foundations of Information Flow (Proceedings of the Clifford Lectures 2008), page 25. American Mathemati- cal Society, 2012. [55] M. P. Müller and C. Ududec. The structure of reversible computation de- termines the self-duality of quantum theory. Phys. Rev. Lett., 108:130401, 2012. [56] H. Barnum, C. P. Gaebler, and A. Wilce. Ensemble steering, weak self- duality, and the structure of probabilistic theories, 2009. arXiv:0912.5532 [quant-ph]. [57] A. Wilce. Symmetry, self-duality and the jordan structure of quantum me- chanics, 2011. arXiv:1110.6607 [quant-ph]. [58] C. Piron. Axiomatique quantique. Helv. Phys. Acta, 37:439, 1964. [59] D.J. Foulis and C.H. Randall. Empirical logic and quantum mechanics. Syn- these, 29(1-4):81, 1974. [60] B. Coecke S. Abramsky. A categorical semantics of quantum protocols. In Proceedings of the 19th IEEE conference on Logic in Computer Science, page 415. IEEE Computer Science Press, 2004. 244 [61] R. Webster. Convexity. Oxford University Press, Oxford, 1994. [62] C. Carathéodory. Über den variabilitätsbereich der fourier’schen konstan- ten von positiven harmonischen funktionen. Rend. Circ. Mat. Palermo, 32(1):193, 1911. [63] B. Grünbaum. Convex Polytopes. Interscience Publishers, 1967. [64] N. S. Jones and L. Masanes. Interconversion of nonlocal correlations. Phys. Rev. A, 72:052312, 2005. [65] S. Pironio, J.-D. Bancal, andV. Scarani. Extremal correlations of the tripartite no-signaling polytope. J. Phys. A: Math Theor., 44:065303, 2011. [66] P. A. M. Dirac. Bakerian lecture. the physical interpretation of quantum me- chanics. Proc. R. Soc. A, 180(980):1, 1942. [67] E.Wigner. On the quantum correction for thermodynamic equilibrium. Phys. Rev., 40:749, 1932. [68] C. Ferrie. Quasi-probability representations of quantum theory with appli- cations to quantum information science. Rep. Prog. Phys., 74(11):116001, 2011. [69] T. J. Barnea, J.-D. Bancal, Y.-C. Liang, and N. Gisin. Tripartite quantum state violating the hidden-influence constraints. Phys. Rev. A, 88:022123, 2013. [70] H. Buhrman,M. Christandl, F. Unger, S.Wehner, andA.Winter. Implications of superstrong nonlocality for cryptography. Proc. Roy. Soc. A, 462:1919, 2006. [71] A. Ambainis, D. Leung, L. Mancinska, and M. Ozols. Quantum random access codes with shared randomness, 2008. (Master's thesis, University of Waterloo) arXiv:0810.2937 [quant-ph]. 245 [72] G. Brassard, H. Buhrman, N. Linden, A. A. Méthot, A. Tapp, and F. Unger ̇Limit on nonlocality in any world in which communication complexity is not trivial. Phys. Rev. Lett., 96:250401, 2006. [73] J. Allcock, N. Brunner, M. Pawlowski, and V. Scarani. Recovering part of the boundary between quantum and nonquantum correlations from information causality. Phys. Rev. A, 80:040103, 2009. [74] M. Navascués, S. Pironio, and A. Acín. Bounding the set of quantum corre- lations. Phys. Rev. Lett., 98:010401, 2007. [75] M. Navascues, S. Pironio, and A. Acín. A convergent hierarchy of semidef- inite programs characterizing the set of quantum correlations. New J. Phys., 10:073013, 2008. [76] R. Franco and V. Penna. Discrete wigner distribution for two qubits: a char- acterization of entanglement properties. J. Phys. A, 39(20):5907, 2006. [77] J. Degorre, M. Kaplan, S. Laplante, and J. Roland. The communication com- plexity of non-signaling distributions. Quantum Info. Comput., 11:649, 2011. [78] E. R. Loubenets. Local quasi hidden variable modelling and violations of bell-type inequalities by a multipartite quantum state. J. Math. Phys., 53:022201, 2012. [79] E. R. Loubenets. Nonsignaling as the consistency condition for local quasi classical probability modelling of a general multipartite correlation scenario. J. Phys. A: Math. Theor., 45:185306, 2012. [80] R. Cleve, P. Høyer, B. Toner, and J. Watrous. Consequences and limits of nonlocal strategies. In 19th Annual IEEE Conference on Computational Complexity, page 236. IEEE, 2004. 246 [81] P. K. Aravind. A simple demonstration of bell's theorem involving two observers and no probabilities or inequalities, 2002. arXiv:quant- ph/0206070v2. [82] J. Briet, H. Buhrman, T. Lee, and T. Vidick. Multiplayer xor games and quantum communication complexity with clique-wise entanglement, 2009. arXiv:0911.4007 [quant-ph]. [83] M. L. Almeida, J.-D. Bancal, N. Brunner, A. Acín, N. Gisin, and S. Pironio. Guess your neighbor's input: A multipartite nonlocal game with no quantum advantage. Phys. Rev. Lett., 104:230404, 2010. [84] W. van Dam. Implausible consequences of superstrong nonlocality. Natural Computing, 12:9, 2013. [85] L. Masanes, A. Acin, and N. Gisin. General properties of nonsignaling the- ories. Phys. Rev. A, 73:012112, 2006. [86] B. Tsirelson. Quantum analogues of Bell inequalities: The case of two spa- tially separated domains. J. Sov. Math., 36:557, 1987. [87] G. Chiribella, G. M. D'Ariano, and P. Perinotti. Quantum theory, namely the pure and reversible theory of information. Entropy, 14:1877, 2012. [88] P. Skrzypczyk, A. J. Short, and S. Popescu. Work extraction and thermody- namics for individual quantum systems. Nat. Commun., 5:4185, 2014. [89] N. Brunner, M. Kaplan, A. Leverrier, and P. Skrzypczyk. Dimension of physical systems, information processing, and thermodynamics, 2014. arXiv:1401.4488 [quant-ph]. [90] M. Huber, M. Perarnau-Llobet, K. V. Hovhannisyan, P. Skrzypczyk, C. Klöckl, N. Brunner, and A. Acín. Thermodynamic cost of creating corre- lations, 2014. arXiv:1404.2169 [quant-ph]. 247 [91] R. P. Feynman. Negative probability. In F. David Peat and Basil Healy, editors, Essays in Honour of David Bohm. Routledge & Kegan Paul Ltd, London, 1987. 248