Quantum theory from the
perspective of general probabilistic
theories
Sabri Walid Al-Safi
Clare College
University of Cambridge
July 2014
This dissertation is submitted for the degree of
Doctor of Philosophy
2
Acknowledgements
The period during which this work was carried out would have been con-
siderably less enjoyable without the intervention of many people. I am ines-
timably grateful for the unflinching love, support and tolerance of my par-
ents, Linda and Walid, and my brother Khalid. I owe special thanks to Jassy,
whose passion, awareness and integrity I will always aspire to, and whose
counsel has always proved judicious. Thank you all for your patience.
I couldn't have undertaken this research without the energy and genial-
ity of my supervisor, Tony Short, just as I couldn't have made it intelligible
without the enthusiasm of my colleagues at the Centre for Quantum Informa-
tion and Foundations in Cambridge, as well as many like-minded researchers
whom I've met along the way. Thank you for helping me to see the way for-
wards.
I would also like to thank - in no particular order, not even a partial one
- friends who have given me a great deal of pleasure during this time: Jack,
Matt, Danny & Ellie, Zach, Laurence, John, Alex, Ilan, Sam, David, Dan,
Sarah, Pippa, Will, Cat & Tom, Terry F. and Matt L., who brightened Pav. F
basement; and tea, for inducing just the right kind of daze.
3
4
To Jassy
5
6
Abstract
This thesis explores various perspectives on quantum phenomena, and
how our understanding of these phenomena is informed by the study of
general probabilistic theories. Particular attention is given to quantum non-
locality, and its interaction with areas of physical and mathematical interest
such as entropy, reversible dynamics, information-based games and the idea
of negative probability. We begin with a review of non-signaling distribu-
tions and convex operational theories, including “black box” descriptions of
experiments and the mathematics of convex vector spaces.
In Chapter 3 we derive various classical and quantum-like quasiproba-
bilistic representations of arbitrary non-signaling distributions. Previously,
results in which the density operator is allowed to become non-positive [1]
have proved useful in derivations of quantum theory from physical require-
ments [2]; we derive a dual result in which the measurement operators in-
stead are allowed to become non-positive, and show that the generation of
any non-signaling distribution is possible using a fixed separable state with
negligible correlation. We also derive two distinct “quasi-local” models of
non-signaling correlations.
Chapter 4 investigates non-local games, in particular the game known as
Information Causality. By analysing the probability of success in this game,
we prove the conjectured tightness of a bound given in [3] concerning how
well entanglement allows us to perform the task of random access coding,
and introduce a quadratic bias bound which seems to capture a great deal
of information about the set of quantum-achievable correlations. By refor-
mulating Information Causality in terms of entropies, we find that a sensi-
ble measure of entropy precludes many general probabilistic theories whose
non-locality is stronger than that of quantum theory.
Chapter 5 explores the role that reversible transitivity (the principle that
any two pure states are joined by a reversible transformation) plays as a char-
acteristic feature of quantum theory. It has previously been shown that in
Boxworld, the theory allowing for the full set of non-signaling correlations,
any reversible transformation on a restricted class of composite systems is
merely a composition of relabellings of measurement choices and outcomes,
and permutations of subsystems [4]. We develop a tabular description of
Boxworld states and effects first introduced in [5], and use this to extend this
reversibility result to any composite Boxworld system in which none of the
subsystems are classical.
7
8
Preface
This dissertation is the result of my ownwork and includes nothingwhich
is the outcome of work done in collaboration except where specifically indi-
cated in the text.
9
10
Contents
1 Introduction 19
1.1 Overview of the Thesis . . . . . . . . . . . . . . . . . . . . . . . 23
2 General Probabilistic Theories 27
2.1 Bell's Theorem and the CHSH game . . . . . . . . . . . . . . . . 29
2.2 Black boxes and non-signaling distributions . . . . . . . . . . . . 34
2.3 Examples of probabilistic theories . . . . . . . . . . . . . . . . . 42
2.3.1 Classical theory . . . . . . . . . . . . . . . . . . . . . . . 44
2.3.2 Quantum theory . . . . . . . . . . . . . . . . . . . . . . 45
2.3.3 Boxworld . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4 The non-signaling polytope . . . . . . . . . . . . . . . . . . . . . 51
2.5 Convex vector space representations . . . . . . . . . . . . . . . . 54
2.5.1 Convex geometry . . . . . . . . . . . . . . . . . . . . . . 56
2.5.2 Abstract state spaces . . . . . . . . . . . . . . . . . . . . 62
2.5.3 Composite Systems . . . . . . . . . . . . . . . . . . . . . 67
2.5.4 Transformations . . . . . . . . . . . . . . . . . . . . . . 71
2.5.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3 Quasiprobability representations 83
3.1 Quasiprobability distributions . . . . . . . . . . . . . . . . . . . 85
3.2 Quantum theory with quasiprobabilities . . . . . . . . . . . . . . 89
3.2.1 Non-positive density matrices . . . . . . . . . . . . . . . 92
3.2.2 Commuting operators . . . . . . . . . . . . . . . . . . . . 98
3.3 Classical theory with quasiprobabilities . . . . . . . . . . . . . . 100
3.3.1 Non-positive mixtures . . . . . . . . . . . . . . . . . . . 102
3.3.2 Non-positive effects . . . . . . . . . . . . . . . . . . . . 107
3.3.3 Quantum corollaries . . . . . . . . . . . . . . . . . . . . 110
3.4 Further quasiprobabilistic quantum models . . . . . . . . . . . . . 112
11
3.4.1 Non-positive observables acting on ρN,d . . . . . . . . . . 113
3.4.2 Approximate product states . . . . . . . . . . . . . . . . 120
3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4 Information Causality 127
4.1 Non-local games . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.1.1 The CHSH game revisited . . . . . . . . . . . . . . . . . 132
4.1.2 Inner product games . . . . . . . . . . . . . . . . . . . . 134
4.1.3 The information causality game . . . . . . . . . . . . . . 137
4.2 Probability of success in the information causality game . . . . . 145
4.2.1 Random Access Codes . . . . . . . . . . . . . . . . . . . 146
4.2.2 Optimal success probability of EARACs . . . . . . . . . 148
4.3 The role of entropy . . . . . . . . . . . . . . . . . . . . . . . . . 154
4.3.1 An entropic information causality . . . . . . . . . . . . . 157
4.3.2 Entropy in general physical theories . . . . . . . . . . . . 162
4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5 Reversible Boxworld dynamics 172
5.1 Identical subsystems: review . . . . . . . . . . . . . . . . . . . . 177
5.1.1 Fiducial effect vectors . . . . . . . . . . . . . . . . . . . 179
5.1.2 Induced fiducial effect string permutations . . . . . . . . 182
5.1.3 Characterisation of reversible transformations . . . . . . . 185
5.2 Effect tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
5.2.1 Reversible transformations on effect tables . . . . . . . . 200
5.2.2 Diagrammatic proof for general bipartite systems . . . . . 202
5.3 Decompositions of effects . . . . . . . . . . . . . . . . . . . . . 208
5.3.1 Identical subsystems revisited . . . . . . . . . . . . . . . 215
5.4 Main result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
5.5 Polytopic models with non-trivial reversible dynamics . . . . . . 220
5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
6 Conclusion and Outlook 234
12
List of Figures
2.1 Quantum measurement operators for CHSH violation . . . . . . . 33
2.2 “Black box” description of experimental statistics. . . . . . . . . . 37
2.3 Multipartite experimental statistics. . . . . . . . . . . . . . . . . . 38
2.4 The non-signaling polytope . . . . . . . . . . . . . . . . . . . . . 53
2.5 Polygon state spaces . . . . . . . . . . . . . . . . . . . . . . . . 81
3.1 Qn hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.1 Non-local games and distributed computations . . . . . . . . . . . 129
4.2 Violating Information Causality with PR-box states . . . . . . . . 143
5.1 Improper tripartite effect not detected by pure product states . . . 199
13
14
List of Notation
General
H Finite-dimensional, complex Hilbert space.
L (H) Vector space of linear operators onH.
|ψ〉 Unit vector inH representing a pure quantum state.
ρ Hermitian, positive semi-definite, unit-trace (density) operator onH.
ρ˜ Hermitian, unit-trace operator onH.
Ei Hermitian, positive semi-definite (POVM) operator onH, satisfying
Tr (E) ≤ 1
Ma|x POVM operator corresponding to measuring x for
and obtaining outcome a.
M˜a|x Non-positive measurement operator corresponding to measuring x
and obtaining outcome a.
[N ] The set {1, . . . , N}.
Hc({p}) Classical entropy of a probability distribution
{p} = {p1, . . . pn)}.
Ic({p} : {q}) Classical mutual information between distributions
{p} and {q}.
15
Outcome distributions
P (a1, . . . , aN |x1, . . . , xN) Multipartite probability of outcome ai being
obtained on subsystem i, conditional on having
locally measured xi
Ω A subset of the set of subsystems [N ] over which
the outcome distribution is defined.
P (aΩ|xΩ) Marginal outcome distribution over the subsystems
belonging to Ω.
M (i) The number of fiducial measurement choices
at subsystem i.
K(i)xi The number of possible outcomes for of
fiducial measurement xi
on subsystem i.
P (a|x) Abbreviated form of the above, where
a = (a1, . . . , aN) and x = (x1, . . . , xN).
Ex1x2 Bias of a g-bit system, conditional on
measurements x1 and x2 being performed on
subsystems 1 and 2.
C / C1 The CHSH value of an outcome distribution.
PΛ(λ) Joint probability distribution over local variables.
P˜Λ(λ) Joint quasiprobability distribution over local
variables.
P (i)(ai|xi, λi) Local outcome function, determining probability
of obtaining outcome ai on subsystem i, conditional
on having locally measured xi and λi being the value
of the local variable.
P˜ (i)(ai|xi, λi) Quasiprobabilistic local outcome function.
16
General probabilistic theories
V Vector space representing the theory.
S Subset of V representing normalised states.
S+ Cone of V generated by S.
E Subset of V representing effects.
E+ Cone of V generated by E .
U Vector in V representing the unit effect.
UN Unit effect vector for a composite system comprising N
subsystems.
S[N ] The set of sub-unit effects of a composite system.
S{i} The set of i-sub-unit effects, for subsystem i of a composite
system.
Xa|x Fiducial effect corresponding to measuring x and obtaining
outcome a.
X(i)ai|xi Local fiducial effect on subsystem i.
T Transformation of states in V .
T † Transformation of effects in V (adjoint of T ).
M Set of effects corresponding to a measurement.
M∗ The set of fine-grained measurements of a system in the theory.
Hme(s) Measurement entropy of a state s.
Hmi(s) Mixing entropy of a state s.
17
18
Chapter 1
Introduction
Physics is mathematical not because we know so much about the
physical world, but because we know so little; it is only its mathe-
matical properties that we can discover.
“An Outline of Philosophy”
Bertrand Russell
19
The history of quantum theory is replete with results, commonly referred to as
“no-go theorems”, which assert that a certain physical state of affairs is impossi-
ble, as long as one assumes that the mathematical formalism of quantum theory
accurately describes the workings of Nature. No-go theorems often have ramifi-
cations for the practical application of quantum theory: the No-Cloning Theorem
for example implies that it is impossible to construct a machine which can make
copies of arbitrary quantum states [6]. On the other hand, many no-go theorems
also have ramifications for our understanding of Nature: the Kochen-Specker the-
orem for example states that quantum mechanical properties cannot be embedded
into a more fundamental set of properties of reality which, unlike quantum ob-
servables, are mutually compatible [7]. Invariably however, no-go theorems offer
crucial insight into phenomena which are central to a full understanding of quan-
tum theory, such as indeterminacy, entanglement, uncertainty relations, and the
measurement process.
Arguably themost enduring and provocative phenomenon in the study of quan-
tum foundations, with profound implications both practically and philosophically,
is that of non-locality, encapsulated by Bell in 1964 with a no-go theorem known
today as Bell's Theorem [8]. Basing his ideas on a 1930's thought experiment by
Einstein et al. [9] later refined by Bohm [10], Bell proved that it is impossible
for any physical theory obeying a well-defined notion of local causality to accu-
rately reproduce all the experimental predictions of quantum theory. In particular,
correlations in the local measurement statistics of a separated, but entangled, pair
of electrons cannot be accurately reproduced by a physical theory which assumes
certain - apparently reasonable - locality relations between the involved systems.
Such “local” theories obey what are now known as Bell inequalities: restrictions
on experimental outcome statistics which can be significantly violated by quan-
tum measurements. Assuming that quantum theoretic predictions are accurate for
Bell's proposed experiment (as verified by numerous experimental realisations to
date, e.g. [11-19]), then local theories cannot be representative of Nature; the uni-
verse itself is non-local.
20
Bell's arguments opened up more fascinating questions than they resolved, and
paved the way for extensive research into non-locality that continues to this day.
Philosophical debate has raged over how Bell's original assumptions are to be in-
terpreted, and whether there exist hidden assumptions in his argument that one
might prefer to drop rather than locality [20-24]. Repeated realisations of Bell's
experiment have attempted to close down various loopholes proposed over the
years as a way for local theories to “get around” tests done so far, for example due
to instrumental inefficiencies [14, 15, 25, 26] or sub-luminal influences between
sites [11, 12, 19]. The practical utilization of non-locality has also been explored,
with the development of a key distribution protocol that relies on violation of Bell
inequalities as proof against eavesdropping [27].
Much of this thesis focuses on the technical study of non-locality as a quantifi-
able feature of measurement outcomes on quantum systems, and the intuition that
it gives us about the mathematical structure of quantum theory. One of the greatest
insights of Bell's Theorem is a focus on the measurement results, rather than the
underlying physical state of affairs (or the mathematical language used to codify
that state of affairs). Unlike the related phenomenon of entanglement, which ad-
mits only an algebraic characterization in terms of Hilbert spaces, non-locality is
defined in terms of the statistics of experimental outcomes, i.e. the results which
are noted in the observer's logbook, and how they correlate with another logbook
some distance away. The purity of this approach is useful in multiple ways. Firstly,
it opens a direct link to information theory, which ultimately cares about the stor-
age and transmission of information that is accessible to humans - even quantum
information theory in the end is concerned with the eventual classical readout of
results. Secondly, it grants us a language with which to discuss hypothetical mod-
els of nature in terms of the measurement results they generate; these models are
of special interest if quantum theory cannot replicate those results. Lastly, it is
a potential means to putting quantum theory onto a more intuitive footing, and
stripping it of the opaque Hilbert-space axiomatisation which pervades any intro-
ductory course in quantum theory.
21
The study of general probabilistic theories [2, 28-40] has been developed in
recent years in order to capture and explore the idea of adopting a purely opera-
tional description of experiments. By “operational”, we mean that the objects of
the theory are meaningful only insofar as they influence experimental outcomes.
For example, if two states of a system cannot be distinguished via the statistics of
any measurement on that system, then they are to be regarded as the same state.
By making a sparse and reasonable set of operational assumptions, a mathematical
framework may be developed in which any model of nature may be represented,
as long as that model admits a basic notion of systems, whose states inform the
probabilities of measurement outcomes. Thus, quantum theory and classical the-
ory (the meaning of which we will be more precise about in Chapter 2) each have
their place in the set of general probabilistic theories, as do a whole host of theories
which are capable of producing a greater amount of non-locality than is achievable
in quantum theory.
A surprising number of features of quantum theory, which have previously
been regarded as especially quantum, turn out to be quite generic within this larger
set of general probabilistic theories. Features such as mixed states, entanglement,
non-locality, and the use of the tensor product for composite systems in fact apply
to all general probabilistic theories except classical theory, and can be seen as fol-
lowing from natural assumptions about measurement outcomes [29, 32, 41]. The
No-Cloning and No-Broadcasting Theorems have analogues which also hold in all
non-classical general probabilistic theories [31], for which the proofs are as intu-
itive as the quantum theoretic proofs (if not more so) and from which the quantum
theoretic versions easily follow. Even quantum information theoretic protocols are
fair game: the result that two copies of a Werner state cannot be deterministically
purified follows from a more general result that holds in all general probabilistic
theories satisfying a few reasonable conditions. [42]. Clearly, general probabilistic
theories provide a valuable shift in perspective which allows for novel and illumi-
nating proofs. In this thesis we hope to convince the reader that it is also helpful
in shaping our understanding of fundamental concepts in quantum theory.
22
1.1 Overview of the Thesis
This thesis is divided into a further five chapters. In Chapter 2 we introduce and
develop the framework of general probabilistic theories. Chapter 2 can be further
divided into two parts: firstly, the characterization of a general probabilistic theory
as a set of constraints on probability distributions over experimental outcomes, and
secondly, an explanation of how such a theory can be represented using the math-
ematics of convex subsets of vector spaces. In both cases, we provide archetypal
examples in order to clarify the meaning and importance of the concepts intro-
duced. No new results are included in this chapter, however the author is not
aware of a similarly comprehensive presentation of the mathematical framework
of general probabilistic theories in the literature, which has an overview of the
relevant convex and conic geometry, and encompasses individual and composite
systems.
In Chapter 3, we derive a set of results that apply when one relaxes the positiv-
ity of outcome probabilities, and demonstrate that this allows for a “quasi-local”
simulation of arbitrarily strong non-local states. In many situations, a local model
is more convenient, since it bears similarity with the well-explored and intuitive
territory of classical probability theory. We also discuss how quasiprobabilities,
despite their counter-intuitive nature, have been been useful in the history of quan-
tum mechanics, and may be used in future. The content of Chapter 3 is based
mostly on a collaboration with Anthony J. Short [43], with additional discussion
and some extended results.
In 2005,Wim van Dam demonstrated that superstrong non-locality is sufficient
to reduce the communication complexity of any distributed computation to a triv-
iality, revealing a close relationship between non-locality and information theory.
From the perspective of general probabilistic thoeries, this result gives a neat phys-
ical principle which allows one to “rule out” correlations which maximally violate
Bell's inequality, although not those which violate it to a non-maximal, yet post-
quantum - degree. Ruling out this latter class of correlations has been achieved
by a recent principle known as Information Causality; in Chapter 4, we examine
23
Information Causality from two distinct perspectives. Firstly, by analysing the
probability of success in the Information Causality game and a closely related “in-
ner product game”, we prove that a bound on useful entanglement for the task of
random access coding given in [3] is tight, as conjectured, and derive a quadratic
bias bound on measurement probabilities which seems to capture a great deal of
information about the structure of the set of quantum-achievable correlations. Sec-
ondly, we show that Information Causality might be more easily interpreted when
reformulated in terms of entropies rather than mutual information, and show that
the existence of a sensible measure of entropy is just as powerful in helping to
“rule out” very non-local correlations. Chapter 4 features a detailed review of the
main result of [1]; the remaining results are based on a collaboration with Anthony
J. Short [44], with an extended discussion of the results.
Quantum phenomena such as entanglement and the tensor product are now
seen to be generic features of general probabilistic theories. To understand what
makes quantum theory stand out from other general probabilistic theories, it is
beneficial to focus on the those features of quantum theory that appear to be rare
amongst this set. One such feature is the ability to transform any pure state to
any other via a reversible transformation. In Chapter 5, we develop the study of
reversible dynamics in the general probabilistic theory known as Boxworld. In
order to develop the intuition behind our results, we make use of a matrix-like
formulation of Boxworld states and effects which was introduced in [5] to study
measurements in Boxworld. We show that the reversible dynamics of composite
Boxworld systems are composed merely of relabellings of measurement choices
and outcomes, and permutations of subsystems. This extends a remarkable result
by Gross et al. [4], which applies to Boxworld systems made up of identical sub-
systems. We then discuss the potential for, and the potential limits of, applying the
techniques we develop to large classes of general probabilistic theories. This chap-
ter features a review of the original result [4], a development of the above men-
tioned matrix-like formalism for general probabilistic theories introduced in [5],
and the results obtained in [45], in collaboration with Anthony J. Short.
24
In the final chapter, we summarise the results obtained in Chapters 3 - 5, draw
out parallels and contrasts between their approaches, and discuss how future work
might build upon this research.
25
26
Chapter 2
General Probabilistic Theories
In other contexts, physicists have been able to take words from ordi-
nary language and use them as technical terms with no great harm
done. Take for example the “strangeness”, “charm”, and “beauty”
of elementary particle physics. No one is taken in by this “baby
talk”…. Would that it were so with “measurement”. But in fact the
word has had such a damaging effect on the discussion, that I think it
should now be banned altogether in quantum mechanics.
“Against “Measurement””
John S. Bell
27
As a model of nature, quantum theory has met with unrivaled experimental
verification and widespread acceptance. Part of its appeal lies in its unified, coher-
ent account of various phenomena which are inexplicable in the classic Newtonian
world-view. Whether talking about the position of a particle or the polarisation of a
light-wave, experimental observations in quantum theory are predicted according
to a standardised set of operations, involving a standardised set of mathematical
objects. Whatever else it is, quantum theory is unarguably a useful and powerful
tool for calculating the probabilistic outcomes of experiments.
Yet the underlying theory as it is usually presented - the quantum theory of
Hilbert spaces and positive definite operators - is plagued with a multitude of in-
terpretational issues. It is natural to wonder whether many of these these issues
are due simply to the language in which quantum theory is currently written, or
whether they are somehow inherent in the probabilities that quantum theory pre-
dicts, and thereby intrinsic to the universe. Recent years have seen the develop-
ment of general probabilistic theories [2,28-40], providing a framework in which
the outcome statistics of experiments become the main focus of attention rather
than the underlying physical theory which actually generates those statistics, be it
Newtonian physics, quantum theory, or some post-quantum theory that has yet to
be discovered. A major benefit of this abstract perspective is that one can make
statements not just about quantum theory, but about any conceivable physical the-
ory that makes probabilistic predictions about experimental outcomes, including
those that may eventually supersede quantum theory. This has profound impli-
cations for the study of nature: as discussed in Chapter 1, Bell's famous theorem
concerning local theories tells us not only that quantum theory is non-local, but
that any theory which generates the same predictions is non-local, and hence na-
ture itself is non-local.
In this chapter we introduce the framework of general probabilistic theories in
two stages. Sections 2.1-2.4 make up the first stage, in which hypothetical physi-
cal theories are characterised by the outcome distributions which they predict are
achievable in many-party experiments. Section 2.5 makes up the second stage, in
28
which we borrow various results from the mathematics of convex vector spaces
in order to construct the framework of convex vector space representations. This
framework provides a neat geometric picture of outcome distributions, rendering
general probabilistic theories in a language that permits a muchmore general treat-
ment of states, measurements and transformations. This allows for many features
of quantum theory to be generalised to the full set of general probablistic theories,
and consequently allows for the comparison of quantum theory to other theories.
2.1 Bell's Theorem and the CHSH game
In this section we outline a proof of Bell's Theorem and provide a more rigorous
footing to non-locality. Althoughmost readers will be familiar with the arguments,
it is useful to introduce the language and notation that will be employed throughout
this thesis. One of the simplest reformulations of Bell's Theorem, introduced by
Clauser et al. in 1969 [46], introduces a “CHSH value”, a measure of the correla-
tions on the inputs and outputs of a very simple 2-player game, the “CHSH game”.
The basic set-up for this game is as follows: suppose that Alice and Bob choose,
independently and uniformly at random, one of two inputs, and without communi-
cating, generate in some way one of two possible outputs. It is convenient to label
both the inputs and outputs using binary digits 0 and 1. Denoting Alice and Bob's
inputs by x, y ∈ {0, 1} respectively, and their outputs a, b ∈ {0, 1}, the statistics
of the game are characterised by the conditional probability distribution:
P (a, b|x, y) . (2.1)
The CHSH value is a linear function of this binary, bipartite conditional prob-
ability distribution, and plays the role of the figure of merit in the game: the value
which Alice and Bob are both attempting to maximise. In order to define it, it
is first convenient to introduce a measure of the correlation of the outputs for a
29
specific choice of inputs x and y:
Exy = P (0, 0|x, y) + P (1, 1|x, y)− P (0, 1|x, y)− P (1, 0|x, y)
= P (a = b|x, y)− P (a 6= b|x, y) . (2.2)
(Note that if we had labelled the outputs as 1 and−1 instead of 0 and 1 respec-
tively, then Exy would be the expected value of the product a · b, given the choices
x and y). The CHSH value is the following linear function of correlators:
C = E00 + E01 + E10 − E11. (2.3)
Thus, in order to maximise the CHSH value, Alice and Bob should attempt to
produce correlated outputs whenever either x or y is equal to 0, but produce anti-
correlated outputs whenever x = y = 1. Writing Exy = 2P (a = b|x, y) − 1 =
1−2P (a 6= b|x, y), and using the fact that Alice and Bob each choose their inputs
independently and uniformly randomly, the CHSH expression may be simplified:
E00 + E01 + E10 − E11
= 2 (P (a = b|0, 0) + P (a = b|0, 1) + P (a = b|1, 0) + P (a 6= b|1, 1))− 4
= 8P (a⊕ b = x · y)− 4, (2.4)
where “⊕” and “·” denote addition and multiplication modulo 2.
Suppose that in order to process their inputs and generate their outputs, Al-
ice and Bob perform measurements on a bipartite system, such that the particular
measurement performed is determined by their inputs, and the outcome of themea-
surement determines their outputs. Despite having used the word “system” with
all its connotations, we are making no claims about what kind of system is being
used. For example, we are not committed to the measurement of a quantum sys-
tem, although the reader is welcome to imagine that this is the case. We must also
give heed to Bell's quotation at the beginning of this chapter, and be careful with
what we mean by the word “measurement”. It is not assumed that a pre-existing
30
property of the system is being revealed, merely that each subsystem interacts with
some classical system representing the input, and some classical system represent-
ing the output. For the moment, as long as a single, definite value is assigned to
the input and output for each run of the experiment, then what happens inbetween
is none of our concern.
Now, if Alice and Bob's subsystems were to operate completely independently
of one another, then we should expect that outcome probability depends solely on
the local input, and that the joint output probability is given by the product of the
local probabilities,
P (a, b|x, y) = P (a|x) · P (b|y) . (2.5)
If the two subsystems are physically separated at the time of measurement, and we
have no reason to believe they are related by some common cause, then we would
expect the statistics to behave as a product distribution, obeying (2.5).
Now suppose that we want to allow for the fact that a common cause is influ-
encing the statistics of both subsystems, despite them being separate at the time of
measurement. Perhaps, for example, they interacted in the past in such a way that
the outputs are random but perfectly correlated. We can codify this by introducing
a random variable λ, which takes values in some setΛwith probability PΛ(λ), and
such that given any individual value of λ, the subsystems behave independently,
i.e. P (a, b|x, y) = P (a|x, λ) ·P (b|y, λ). The actual experiment statistics are now
obtained by averaging over λ,
P (a, b|x, y) =
∑
λ∈Λ
PΛ(λ) [P (a|x, λ) · P (b|y, λ)] . (2.6)
Naively, we might expect that this state of affairs completely characterises the
outcome statistics that are possible on physically separated subsystems. If neither
can be influenced directly by the other then, up to some common cause, we expect
them to behave independently. In short, we expect the outcome distribution to be
local, one that can be written in the form 2.6 for some set of local variables Λ.
However, it can be shown that any local outcome distribution is itself merely an
31
average over the set of deterministic product distributions (see Section 4.1 for a
rigorous proof of this) - there are only 16 such distributions, and a straightforward
check demonstrates that all of these obeyP (a⊕b = x·y) ∈
{1
4 ,
3
4
}
, therefore have
CHSHvalues C = ±2. Themaximal CHSHvalue obtainable by local distributions
is therefore 2.
We now show that Alice and Bob may achieve a CHSH value greater than 2
by employing quantum measurements on a shared entangled state. Let |ψ〉 be the
maximally entangled state
(
|0〉|0〉+|1〉|1〉√
2
)
on a bipartite system composed of two
qubits, and suppose that Alice and Bob's inputs x and y determine which measure-
ment they make on their individual qubit. We will consider measurements of the
form n·σ for some unit vector n, whereσ = (σx, σy, σz) is a vector whose compo-
nents are the 2-dimensional Pauli operators, and employ the following expectation
values:
〈ψ|σx ⊗ σx|ψ〉 = 1
〈ψ|σx ⊗ σz|ψ〉 = 0
〈ψ|σz ⊗ σx|ψ〉 = 0
〈ψ|σz ⊗ σz|ψ〉 = 1
Note that n ·σ is itself a Hermitian operator with eigenvalues±1: suppose that
Alice chooses from two possible measurement operators {A0, A1}, and B chooses
from {B0, B1}, and each measurement operator is of the form n · σ. Suppose
further that their outputs a and b are 0 or 1 depending on whether the outcome of
the measurement is +1 or −1 respectively on Alice's or Bob's subsystem. Recall
that in this case the correlatorExy is the expectation of the product of the outcomes,
Exy = (−1)a+b, which can also be written as 〈ψ|Ax ⊗By|ψ〉.
32
Suppose that Alice and Bob make the following measurement choices:
Input 0 1
Alice σx σz
Bob
(
σx+σz√
2
) (
σx−σz√
2
)
(2.7)
Observe that the unit vector n is always zero in the y-coordinate, and the operators
Ax and By are represented by unit vectors in the two-dimensional plane spanned
by σx and σz.
Figure 2.1: Unit vectors representing Alice and Bob's measurement choices.
The correlators can be calculated using the above expectation values and lin-
33
earity of the inner product:
E00 = 〈ψ|A0 ⊗B0|ψ〉 =
1√
2
E01 = 〈ψ|A0 ⊗B1|ψ〉 =
1√
2
E10 = 〈ψ|A1 ⊗B0|ψ〉 =
1√
2
E11 = 〈ψ|A1 ⊗B1|ψ〉 = −
1√
2
giving C = E00 + E01 + E10 − E11 = 4√2 = 2
√
2 > 2. This simple quantum me-
chanical operation on entangled qubits generates a correlation between two distant
parties that is impossible under the assumption that measurements made at physi-
cally separate locations are governed by local distributions.
The beauty of the CHSH value is that it is an easily calculated quantity which
provides a succinct demonstration of how quantum theory departs from the clas-
sical regime, as well as a rough measure of the strength of non-locality which is
achievable in any general probabilistic theory. By exploring the maximum obtain-
able value that a theory allows for quantities such as the CHSH value, we can set
up a hierarchy of general probabilistic theories, based on the degree of non-locality
they allow for. Local theories cannot exceed a CHSH value of 2, whereas this may
be surpassed by quantum theory, up to a maximum of 2
√
2 [47]. These values lie
well below the algebraic maximum of 4, which is achieved in the maximally non-
local theory known as “Boxworld” [48].
2.2 Black boxes and non-signaling distributions
In this section we attempt to extract some of the insights of Bell's Theorem, and
begin to explore what can be said about Nature purely in terms of experimental out-
comes. Recall our discussion in the previous Section of the word “measurement”,
in particular that we are not too concerned with what is really going on between
34
the inputs and the outputs. In fact, it is useful at this stage to think of a system as
a “black box” into which the inputs go, and out of which come the outputs. When
we later introduce general probabilistic theories, it may be helpful to think of each
system as just such a black box; it may also be surprising, from this perspective, to
see just how much of the seemingly internal workings of quantum theory admits a
faithful operational analogue, i.e. how much can be described purely in terms of
input/output probabilities.
A basic set of physical notions, or operational primitives, is clearly required
in order to say anything about experimental outcomes: we assume that there exist
various types of system, that each systemmay be prepared in one of various states,
and that one of various measurements may be performed on a system. It goes
without saying that the possible states and measurements depend on the type of
system in play, and that the likelihood of a given outcome occurring is governed
by the state.
In adopting an operational attitude, we are interested not in the intrinsic qual-
ities of states of systems, such as their Hilbert-space dimension, but merely its
observable qualities, i.e. how the state informs the outcomes of measurements.
It is therefore useful to introduce the concept of an effect, which is no more than
the occurrence of a particular outcome. An effect must belong to the set of pos-
sible outcomes for at least one measurement, but in principle multiple measure-
ments may share a single effect. The state of a system is completely and uniquely
characterised by the probabilities it assigns to individual effects; any two states
which assign identical probabilities to the set of effects are considered to be the
same state. This assumption neatly encapsulates the operational stance of general
probabilistic theories: we should not suppose there to exist distinct states which
nevertheless have exactly the same observable qualities.
It is common at this point to make a number of finiteness assumptions; these
bring convenient simplifications, andmay generally be justified on physical grounds.
Firstly, a measurement comprises a finite set of effects, so that states assign dis-
crete (rather than continuous) outcome distributions - probability distributions over
35
the outcome-sets of measurements. This restriction makes sense in the context of
real-world quantum measurements: the result of an experiment, even after a large
number of repeats with extremely refined instruments, can only be recorded to a
certain number of significant figures. Although the underlying theory may pre-
dict a continuum of outcomes, for example in infinite dimensional Hilbert spaces,
the outcome set is in practice always restricted to a finite set. Even if we did have
access to infinitesimally accurate instruments, it would still be of interest to under-
stand measurement statistics resulting from finite dimensional quantum systems,
and this restriction would still be a reasonable one to make.
A second finiteness assumption is that for any system there exists a finite set
of fiducial measurementswhose outcome distributions suffice to uniquely identify
the state of that system. The effects belonging to fiducial measurements are known
as fiducial effects; a state is therefore completely characterised by the probabilities
it assigns to the finite set of fiducial effects. This means that we restrict ourselves
to considering only the statistics generated in finite dimensional quantum theory.
Although this is not fully general, many of the quantum phenomena we wish to
illuminate, such as entanglement, non-locality and non-contextuality, occur just as
prominently in finite dimensions. In this thesis we are interested in seeing what
can be demonstrated in this simpler setting, and leave the infinite-dimensional
considerations to others.
We now make rigorous the above descriptions of measurement and outcome.
An individual system, in which an experiment is modelled, allows for a certain
number of measurement choices, indexed by a variable x which takes values in
the finite set {1, . . . ,M}. For each measurement choice x, a certain number of
outcomes are permitted, indexed by a variable a which takes values in the finite
set {1, . . . , Kx} (note that the number of possible outcomes may depend on which
measurement is performed). For convenience we will make an exception to this
rule in the case thatM = K = 2, and will instead label the measurement choices
and outcomes using the set {0, 1}. An experiment involving this system is char-
acterised by a set of real numbers P (a|x), giving the probability that a is output,
36
conditioned on x having been input (see Figure 2.2).
Figure 2.2: “Black box” description of experimental statistics.
Now consider a multipartite system comprising N subsystems, each of which
can be regarded as an individual system in its own right, with its own set of mea-
surements. Suppose that the measurements at subsystem i are indexed by some
variable xi which takes values in the finite set {1, . . . ,M (i)}. Each measurement
choice allows for one of a finite number of outcomes, indexed by the variable ai
which takes values in the finite set {1, . . . , K(i)xi }. Imagine that a measurement is
performed individually on each subsystem, simultaneously and without any com-
munication between subsystems. This global measurement on the multipartite
system is described by a set of real numbers P (a1, . . . , aN |x1, . . . , xN), giving
the probability that outcomes a1, . . . , aN occur at systems 1, . . . , N respectively,
conditioned on measurement choices x1, . . . , xN having been performed on those
systems (see Figure 2.3).
Just as a specification of the outcome distributions P (a|x) uniquely charac-
terises the experiment on a single system, we assume that a specification of the
values P (a1, . . . , aN |x1, . . . , xN) is sufficient to characterise the multipartite ex-
periment. This assumption, known variously as local tomography [2, 34], local
distinguishability [33], the Local Observability Principle [41] or the Global State
37
Figure 2.3: Multipartite experiment composed of N black boxes.
Assumption [29], asserts that there are no additional degrees of freedom in a com-
posite system which are not captured by the differing measurement choices for the
individual subsystems.
The values P (a1, . . . , aN |x1, . . . , xN) correspond to a conditional probability
distribution, and in order for them to be physically meaningful they must obey the
normal laws of probability. Since each individual value is a probability, it must be
positive:
0 ≤ P (a1, . . . , aN |x1, . . . , xN) . (2.8)
For any fixed choice of measurements x1, . . . , xN on all subsystems, the probabil-
ities for all possible values of outcomes a1, . . . , aN must sum to 1, i.e. the outcome
distribution for that choice of local measurements is normalised:
∑
a1,...,aN
P (a1, . . . , aN |x1, . . . , xN) = 1. (2.9)
(This also implies that each value P (a1, . . . , aN |x1, . . . , xN) is at most 1).
We have stated that the conditional probability P (a1, . . . , aN |x1, . . . , xN) re-
sults from individual measurements made on separate subsystems, although we
have not defined what we mean by “separate”. This operational framework con-
cerns only experimental outcomes, and has no a priori notion of distance or simul-
taneity. Nevertheless, our goal is to discuss the outcomes of experiments made in
the real world, and so we make the assumption that information cannot be trans-
38
mitted from one set of subsystems to another through the process of measurement.
A justification for this is that the subsystems may represent physically separated
real-world structures; since the local measurements in principle may be carried
out simultaneously, there should be no influence from one set of subsystems to
another during the measurement procedure.
To ensure that information cannot be sent between the subsystems, we demand
that a no-signaling condition is satisfied by the outcome distribution P . This con-
dition asserts that the outcome statistics for any subsetΩ of theN subsystemsmust
not be affected by measurement choices made on those subsystems not belonging
to Ω, i.e. there is no way for one subsystem to signal information to any other
subsystems. In mathematical terms, we demand that for all Ω ⊂ [N ], the reduced
value,
∑
ai:i/∈Ω
P (a1, . . . , aN |x1, . . . , xN) , (2.10)
is well-defined and independent of the value of each xi for systems i /∈ Ω. If
the no-signaling condition is obeyed by P , then we say that P is a non-signaling
distribution. Non-signaling distributions have the following property: whatever
the choice of measurement made by systems outside of any subset Ω, the marginal
outcome distribution on the systems inside Ω is invariant. In fact, the no-signaling
condition is equivalent to the existence of a well-defined reduced outcome distri-
bution for any subset of N ; writing Ω = {i1, . . . , ir} ⊂ N , the reduced outcome
distribution is defined:
P (ai1 , . . . , air |xi1 , . . . , xir) =
∑
ai:i/∈Ω
P (a1, . . . , aN |x1, . . . , xN) . (2.11)
It should be stressed that the no-signaling condition is distinct from the impos-
sibility of superluminal propagation of information or matter, which holds in the
special theory of relativity. The no-signaling condition is a reasoned assumption
stated in terms of outcome statistics, and does not rely on any concept of space-
time. On the other hand, the impossibility of superluminal signaling is deduced
39
from the principles of special relativity, and is therefore true in our best model of
spacetime.
It is difficult to imagine reasonable theories of nature which allow for states
that violate the no-signaling condition. The possibility that the reduced state of
a subsystem might immediately change according to choices made in a distant
location violates any intuitive feeling one has about what the reduced state of a
subsystem even means. Conversely, it is not difficult to conceive of theories in
which superluminal propagation is in principle possible: non-relativistic quantum
mechanics provides one such example, in which the no-signaling condition holds
but there is no reason that the movement of a particle should be hampered by the
speed of light.
In order to simplify notation, we write a and x for the strings (a1, . . . , aN)
and (x1, . . . , xN). When discussing non-signaling distributions we use the no-
tation aΩ and xΩ for the “reduced strings” (ai1 , . . . , air) and (xi1 , . . . , xir), and
write P (aΩ|xΩ) for the reduced outcome distribution defined by (2.11). The no-
signaling condition can then be expressed in a compact form: for any Ω ⊂ [N ],
measurement-choice strings x, x′ satisfying xΩ = x′Ω, and fixed choice of outcomes
aΩ,
∑
ai:i/∈Ω
P (a|x) =
∑
a′i:i/∈Ω
P (a′|x′) = P (aΩ|xΩ) , (2.12)
where a′ denotes the string whose components are ai for i ∈ Ω and a′i for i /∈ Ω.
The following Proposition demonstrates that in order to checkwhether a known
outcome distribution is non-signaling, it suffices to verify that the non-signaling
condition holds for all Ω with |Ω| = 1, i.e. that it is not possible for any single
subsystem to signal to the rest via the choice of measurement on that subsystem.
Proposition 1. Suppose that the outcome distribution P (a|x) satisfies the condi-
tion that for all subsystems i and distinct measurement choices xi 6= x′i on subsys-
40
tem i,
Kxi
∑
ai=1
P (a1, . . . , ai, . . . , aN |x1, . . . , xi, . . . xN)
=
Kx′i
∑
a′i=1
P (a1, . . . , a′i, . . . , aN |x1, . . . , x′i, . . . xN) . (2.13)
Then P is a non-signaling outcome distribution.
Proof. Suppose that x and x′ are measurement-choice strings, satisfying xΩ = x′Ω
for a given subset Ω of the N subsystems. Fix a choice of outcomes aΩ for the
subsystems belonging toΩ. Wewish to show that summing over the set of possible
outcomes for all subsystems not inΩ leads to the same value, whether the outcome
distribution is taken with respect to x or with respect to x′, i.e.
∑
ai:i/∈Ω
P (a|x) =
∑
a′i:i/∈Ω
P (a′|x′) . (2.14)
As strings of numbers, x and x′ differ in at most s = N − |Ω| components (those
not belonging to Ω). Writing [N ] \ Ω = {i1, . . . , is}, it is possible to change s
components of x one at a time, terminating at x′:
x = x0 → xi1 → · · · → xis = x′. (2.15)
However, by varying the order of summation on the LHS of (2.14) and using
41
property (2.13), we have that,
∑
ai:i/∈Ω
P (a|x) =
∑
ai:i/∈Ω∪{i1}
∑
ai1
P (a|x)
=
∑
ai:i/∈Ω∪{i1}
∑
ai1
P
(
a|xi1
)
=
∑
ai:i/∈Ω∪{i2}
∑
ai2
P
(
a|xi2
)
... ...
=
∑
ai:i/∈Ω∪{is}
∑
ais
P
(
a|xis
)
=
∑
a′i:i/∈Ω
P (a′|x′) . (2.16)
2.3 Examples of probabilistic theories
Any theory of nature must provide some means for predicting experimental out-
comes. By virtue of this fact, any experiment described by that theory may con-
sequently be described by a “black box” scenario as outlined in the previous Sec-
tion, in which we ignore the internal workings of the box (i.e. the mathematical
operations which generate predictions in that theory). A state in the theory must
have some mechanism for specifying outcome probabilities for any measurement
allowed by the theory; thus it assigns probabilities to effects, or possible measure-
ment outcomes. Furthermore, we assume that any two distinct states must be dis-
tinguishable via their outcome statistics for at least one measurement; thus distinct
states assign distinct probabilities to at least one effect. As we have previously dis-
cussed, we will assume that there exists a finite set of fiducial measurements, each
consisting of a finite set of fiducial effects, such that distinct states assign distinct
probabilities to at least one fiducial effect.
42
Note that the positivity (2.8), normalisation (2.9) and non-signaling (2.10) con-
ditions are all clearly preserved under convex combinations. That is to say, if the
number of subsystems, measurements and outcomes is fixed, the outcome distri-
butions P1, . . . , Pr satisfy all three conditions, and {p1, . . . , pr} is any probability
distribution, then the outcome distribution defined by:
P (a|x) =
r
∑
i=1
piPi (a|x) , (2.17)
also satisfies all three conditions. Hence the set of non-signaling outcome distri-
butions forms a convex set, which we will explore in more detail in Section 2.4.
It is often convenient to specify a physical theory by explicitly delineating the
set of outcome distributions which are obtainable within that theory; generally
speaking, this involves a threefold specification: firstly, which types of individual
system are described by the theory, including their fiducial measurements; sec-
ondly, for each individual system a specification of which outcome distributions
are obtainable on the fiducial measurements; and finally a specification of which
joint distributions are permitted in composite systems. A general probabilistic
theory is just such a specification of systems and outcome distributions. We will
demand that the set of distributions in a general probabilistic theory is always con-
vex: the reason for this is that for any set of states -- with outcome distributions
P1, . . . , Pr -- that can be prepared in a system, it should be possible to prepare
the state whose outcome distribution is Pi with probability pi, so that (2.17) gives
the outcome distribution after such a preparation, and hence also corresponds to a
state in the theory. By analogy to the pure states of quantum theory, the extreme
states of a general physical theory are those that give rise to extremal points of the
convex set of outcome distributions: they are those states which cannot be written
as a convex sum of distinct states of the theory (see Section 2.5.1 for a rigorous
treatment of these notions).
It should finally be noted that there is a distinction between “black boxes” and
full general probabilistic theories. The black box scenario is simply a descrip-
43
tion of inputs, outputs, and the subsequent outcome distributions. This is useful,
for example, for defining the non-signaling principle and the CHSH value purely
in terms of such distributions. General probabilistic theories, on the other hand,
whilst often described in terms of outcome distributions, include a characterisa-
tion (usually based on physical principles) of which local and composite outcome
distributions are possible, as well as concepts like states, effects, fiducial measure-
ments and so on. As in quantum theory, general probabilistic theories often allow
for measurements of effects other than fiducial effects.
2.3.1 Classical theory
In classical theory, individual systems have a single fiducial measurement, and
any outcome distribution on that measurement constitutes a permitted state. The
fiducial measurement captures full information about the system. An example of
a classical system is a die: the fiducial measurement corresponds to observing the
upturned face and a fair die (once rolled) represents the uniformly distributed (or
completely mixed) state. A general state of the system is given by a probability
distribution (p1, . . . , p6). The extreme states are the deterministic states si, which
give outcome iwith certainty, e.g. the state s1 = (1, 0, . . . , 0): it is clear that these
cannot be written as a convex combination of distinct states, and conversely that
any state is a convex combination of the extreme states.
The joint states of composite classical systems are convex combinations of
product states. Product states are given by a product of outcome distributions at
each site:
P (a|1) = P (1) (a1|1) · · ·P (N) (aN |1) =
N
∏
i=1
P (i) (ai|1) , (2.18)
where xi = 1 for all i since we only have one fiducial measurement. Clearly, this
distribution can be achieved simply by preparing the state P (i) at each subsystem
44
i. A convex combination of states of the form (2.18) can be described by a ran-
dom variable λ = (λ1, . . . , λN) and an associated probability distribution PΛ(λ).
Each λi is a local variable taking values from some finite set Λi, and each value
of λi corresponds to some outcome distribution P (i) (ai|1, λi) that is allowed at
subsystem i. The joint states of multipartite classical systems are exactly those of
the form:
P (a|1) =
∑
λ
PΛ(λ)
N
∏
i=1
P (i) (ai|1,λi) . (2.19)
2.3.2 Quantum theory
In quantum theory, outcome distributions are generated bymeasurements on quan-
tum states. A quantum system, withM fiducial measurements and Kx outcomes
for measurement x, consists of a finite-dimensional Hilbert space H and for each
x a set of POVM elements {Ma|x} satisfyingMa|x ≥ 0 and
∑Kx
a=1Ma|x = I. The
outcome distributions P (a|x) allowed in the system are generated from density
operators ρ ∈ D (H) according to the Born trace rule:
P (a|x) = Tr
(
Ma|xρ
)
(2.20)
On any finite-dimensional Hilbert space there always exists a finite set of fidu-
cial POVM measurements, such that distinct density operators will generate dis-
tinct probabilities for at least one fiducial POVM element [49]. In general, at least
D2 independent POVM matrices are required to specify a single density opera-
tor in a D-dimensional Hilbert space - however, the choice of POVM elements
and how they are split up into distinct measurements is not determined by the di-
mension D. In order to define a quantum system in terms of general probabilistic
theories, therefore, one needs to specify both the dimension of the Hilbert space,
and the particular fiducial measurements being employed.
45
The simplest example of a quantum system is a qubit, with one possible set
of fiducial measurements being the three projective measurements onto the eigen-
bases of the Pauli operators σx, σy and σz. Listing these measurements as x = 1, 2
and 3 respectively, and listing their two possible outcomes as a = 1 or 2, the state
of the qubit is completely specified by the vector:
(
P (1|1) , P (2|1) P (1|2) , P (2|2) P (1|3) , P (2|3)
)
. (2.21)
A qubit that is in the “up” state of the x-direction is represented by the vector
(1, 0|12 ,
1
2 |
1
2 ,
1
2), whereas the vector (1, 0|1, 0|1, 0) is not achievable by any qubit
state.
Multipartite quantum systems consist of a collection of N quantum systems
with Hilbert spaces H1, . . . ,HN , and with the fiducial measurements at system i
given by the POVM operators {M (i)ai|xi}. The set of multipartite outcome distri-
butions on the composite system is generated by local POVM measurements on a
joint quantum state ρ ∈ D (H1 ⊗ · · · ⊗ HN):
P (a|x) = Tr
((
M (1)a1|x1 ⊗ · · · ⊗M
(N)
aN |xN
)
ρ
)
. (2.22)
Any outcome distribution resulting from a multipartite quantum measurement
is non-signaling: this is due to the fact that
∑
ai M
(i)
ai|xi = I
(i), regardless of i and
the choice of measurement xi. Indeed, fixing i and xi, we see that:
Kxi
∑
ai=1
Tr
((
M (1)a1|x1 ⊗ · · · ⊗M
(N)
aN |xN
)
ρ
)
= Tr



M (1)a1|x1 ⊗ · · · ⊗


Kxi
∑
ai=1
M (i)ai|xi

⊗ · · · ⊗M (N)aN |xN

 ρ


= Tr
((
M (1)a1|x1 ⊗ · · · ⊗ I
(i) ⊗ · · · ⊗M (N)aN |xN
)
ρ
)
, (2.23)
which is clearly independent of the vaule of xi. It follows from Proposition 1 that
46
P (a|x) is non-signaling.
The set of outcome distributions generated by a fixed composite quantum sys-
tem is subtly different from the set of non-signaling outcome distributions (with
a set number of measurement choices and outcomes) which can be achieved by
measurements of arbitrary quantum systems. In the former case, we have already
explicitly defined each subsystem (i.e. their Hilbert spaces and fiducial measure-
ments); on the other hand, we may say that an outcome distribution P (a|x) for
some pre-defined number of subsystems, measurement choices and outcomes is
quantum achievable if there exist some set of finite-dimensional Hilbert spaces
H1, . . . ,HN , POVM operators {M (i)ai|xi} and some state ρ ∈ D (H1 ⊗ · · · ⊗ HN)
such that (2.22) holds.
2.3.3 Boxworld
In an individual Boxworld system, any set of distributions over the outcomes of
the fiducial measurements is permitted: the state space is constrained only by
the positivity and normalisation conditions: any conditional probability distri-
bution P (a|x) corresponds to an allowed state. Thus the outcome distribution
(1, 0|1, 0|1, 0) which was not allowed in the qubit system, is a perfectly valid
Boxworld distribution. The set of Boxworld distributions for a single system is
a convex polytope, whose extremal vertices are the deterministic states, i.e. those
which, conditional on the measurement choice, produce one output with certainty.
AnyBoxworld systemwith only one fiducial measurement is equivalent to a classi-
cal system. The Boxworld system with two fiducial measurements, each of which
has two possible outcomes is referred to as a g-bit system, which stands for gener-
alised bit. This systemmay loosely be seen as a Boxworld analogue of the classical
bit and quantum qubit systems.
As with classical theory, the joint state space of composite Boxworld systems
should include at least those states which are convex combinations of products
of local states. A multipartite Boxworld state with outcome distribution P (a|x) is
47
local if there exists a random variableλ = (λ1, . . . , λN) taking values in a product
of finite sets Λ1 × · · · ×ΛN , an associated probability distribution PΛ(λ), and for
each value of λi ∈ Λi a local outcome distribution P (i) (ai|x1, λi) such that:
P (a|x) =
∑
λ
PΛ(λ)
N
∏
i=1
P (i)(ai|xi, λi). (2.24)
The set of local Boxworld outcome distributions also forms a convex polytope,
whose extremal vertices are given by products of deterministic states; such states
are known as pure product states. Any outcome distribution which cannot be writ-
ten as a convex combination of pure product states (2.24) is said to be non-local.
A multipartite density operator ρ is said to be a local quantum state if any local
POVMmeasurement on ρ results in a local outcome distribution being generated:
thus, separable states are local quantum states, whereas the Bell states are non-
local quantum states (since there exist local POVM measurements on them which
violate the CHSH inequality).
Not all valid non-signaling distributions in composite Boxworld systems ad-
mit a local description (2.24). For example, the outcome distribution generated
by CHSH-violating measurements on a Bell state is non-signaling, but non-local.
Boxworld is the general probabilistic theorywhose systems are as described above,
and whose multipartite states are all those outcome distributions which obey the
positivity (2.8), normalisation (2.9) and non-signaling (2.10) conditions. Since
these three conditions are represented by sets of linear inequalities, the set of out-
come distributions in a composite Boxworld system forms a convex polytope. Any
composite quantum systemwith has the same number of fiducial measurements on
each local subsystem is capable of generating a subset of the Boxworld outcome
distributions (as we will see, this turns out to be a strict subset).
The first known example of a non-signaling, non-quantum-achievable out-
come distribution is the celebrated PR-box state on a bipartite system comprising
two g-bit subsystems [48]. It is convenient to label the measurement choices and
outcome sets of a g-bit as binary numbers {0, 1} rather than {1, 2}, though this
48
change in labeling does not affect the outcome probabilities, and therefore has no
operational significance. In binary notation, the PR-box is defined by:
P (a1, a2|x1, x2) =
1
2
δ(a1 ⊕ a2 = x1 · x2), (2.25)
where “⊕” and “·” denote addition and multiplication modulo 2. This outcome
distribution is clearly seen to be non-signaling, as the outcome on a single system
is always uniformly random, regardless of the measurement choice on that or the
other system. Note also that a1 and a2 are perfectly correlated for every choice of
fiducial measurement except for x1 = x2 = 1, in which case they are perfectly
anti-correlated. Hence the CHSH value takes its algebraic maximum of 4 on the
PR-box state:
C1 = E00 + E01 + E10 − E11 = 1 + 1 + 1− (−1) = 4. (2.26)
where Exy was defined in Section 2.1 as P (a = b|x, y)− P (a 6= b|x, y)
The PR-box state is in fact one of the extremal vertices of the convex polytope
of states described by Boxworld for two g-bit systems [30]. This is the simplest
example of a composite system in which the sets of locally achievable, quantum-
achievable and non-signaling outcome distributions are distinct; we explore the
geometry of these sets in more detail in Section 2.4. The state of a system of two
g-bits may be depicted by means of a table or matrix, whose rows correspond to
outcomes of fiducial measurements on system 1, andwhose columns correspond to
outcomes of fiducial measurements on system 2. The entry in the row correspond-
ing to (a1, x1) and column corresponding to (a2, x2) is the value P (a1, a2|x1, x2).
The PR-box state is then represented by the following table:
49
x2 = 0 x2 = 1
a2 = 0 a2 = 1 a2 = 0 a2 = 1
x1 = 0
a1 = 0 1/2 0 1/2 0
a1 = 1 0 1/2 0 1/2
x1 = 1
a1 = 0 1/2 0 0 1/2
a1 = 1 0 1/2 1/2 0
(2.27)
Increasing the number of measurements on a subsystem will generate more
blocks of the table, whereas increasing the number of outcomes specific to a mea-
surement will increase the size of the blocks corresponding to that measurement.
An arbitrary bipartite Boxworld system need not have the same number of blocks
across as it has down, and those blocks need not be square. Certain properties
of the table must remain invariant however: all entries must be positive, and the
sum of the entries in an individual block must be equal to 1 (due to the positivity
and normalisation conditions). Each row of the table is subdivided into a set of
smaller “block-rows”; in the above example the third row is subdivided into two
block-rows, (1/2, 0) and (0, 1/2). Due to the non-signaling condition the sum of
entries across each of the block-rows is invariant across a row (and is equal to
1
2 for all rows in the above PR-box example). The same is true for columns and
block-columns.
Whilst 2-dimensional tables are sufficient to describe all bipartite Boxworld
distributions, an n-partite Boxworld system will generally be described by an n-
dimensional array. This representation thus becomes infeasible for obvious rea-
sons when the number of subsystems exceeds 3; however, since many of the inter-
esting Boxworld phenomena occur with 3 or fewer subsystems, this does not pose
too great a problem.
50
2.4 The non-signaling polytope
The full set of outcome distributions P (a|x) for a composite Boxworld system
can be naturally represented by vectors, whose components are indexed by all
possible strings (a|x) of inputs and outputs at each subsystem. The allowed set
of non-signaling vectors then forms a bounded polytope, due to the linear con-
straints imposed by the positivity, normalisation and non-signaling conditions,
which is often known as the Non-signaling Polytope. In this naive representa-
tion of the Non-signaling Polytope, the vectors are of length
∏N
i=1
(
∑M(i)
xi=1Kxi
)
,
but generally belong to a lower-dimensional vector subspace. Even for the bipar-
tite g-bit system in which the PR-box lives, these vectors have 16 components, but
lie in an 8-dimensional vector subspace [30], which is, for example, orthogonal
to the vector e1 + e2 − e3 − e4, since the non-signaling condition demands that
P (0, 0|0, 0) + P (0, 1|0, 0) = P (0, 0|0, 1) + P (0, 1|0, 1).
The extremal local states of the Non-signaling Polytope, i.e. the pure product
states, are represented by vectors whose only non-zero entries are equal to 1. In a
system of two g-bits, exactly four of the entries of a pure product state are equal
to 1, since each of the four possible global measurement choices (x1, x2) results
deterministically in an outcome (a1, a2). The number of extremal local states is
equal to the product of the number of deterministic states on each individual g-bit
system. There are four deterministic states of a g-bit system, corresponding to the
four possible maps from {0, 1} to itself; this gives 16 extremal local states:
P µνστL (a1, a2|x1, x2) = δ
(
a1 = [µx1]⊕ [ν]
)
· δ
(
a2 = [σx2]⊕ [τ ]
)
(2.28)
µ, ν, σ, τ ∈ {0, 1}.
Such states, being {0, 1}-valued, are clearly extremal in the set of non-signaling
states; however, they do not constitute the entire set of extremal non-signaling
states. The PR-box state described in Section 2.3.3 is an example of an extremal
non-local vertex of the Non-signaling Polytope [30]. By performing a simple lo-
cal operation - for example, flipping the bit x1 at subsystem 1 - the PR-box is con-
51
verted into a new outcome distribution - in this case defined by P (a1, a2|x1, x2) =
1
2δ(a1 ⊕ a2 = [x1 · x2]⊕ x2). This is also an extremal state, in which the outputs
are anti-correlated for all choices of measurement except x1 = 0, x2 = 1. Via
this type of local operation on inputs and outputs, 8 distinct extremal non-local
distributions can be obtained:
P µνσNL (a1, a2|x1, x2) =
1
2
δ
(
a1 ⊕ a2 = [x1 · x2]⊕ [µx1]⊕ [νx2]⊕ [σ]
)
(2.29)
µ, ν, σ ∈ {0, 1}.
The CHSH value C1 takes a maximal value of 2 in the set of local outcome
distributions, a maximal value of 2
√
2 in the set of quantum-achievable outcome
distributions, and a maximal value of 4 in the full set of non-signaling outcome
distributions. Note that for each value of (x1, x2) the bias,
Ex1x2 = P (0, 0|x1, x2) + P (1, 1|x1, x2)− P (0, 1|x1, x2)− P (1, 0|x1, x2) ,
(2.30)
is a linear functional on the set of vectors, hence the CHSH value is itself a linear
functional. So too is the similarly defined figure of merit:
C2 = E00 − E01 + E10 + E11, (2.31)
for which the maximal values taken by local, quantum and non-signaling outcome
distributions are the same as for C1 (but are not necessarily achieved on the same
outcome distributions). For example, C1 achieves its algebraic maximum of 4 and
minimum of -4 on the non-local states P 000NL and P 001NL respectively, whereas C1
achieves the same values on the non-local states P 010NL and P 011NL respectively. All
other extremal non-local states achieve 0 for both C1 and C2, whereas all extremal
local states achieve±2 for both C1 and C2. By plotting C1 and C2 on x- and y- axes,
one obtains a 2-dimensional projection, or slice, of the Non-signaling Polytope
(see figure 2.4).
The set of quantum-achievable states in this projection is described by the for-
52
mula C21 + C22 ≤ 8. A derivation of this inequality from Information Causality is
given in Section 4.2.2, although it can also be derived from the fact that the set of
biases Exy is quantum achievable if and only if:
|E00E01 − E10E11| ≤
∑
y=0,1
√
(1− E20y)(1− E21y). (2.32)
Figure 2.4: 2-dimensional projection of the non-signaling polytope for two g-bits.
Many of the beguiling aspects of quantum non-locality are captured by the
simple diagram illustrated in Figure 2.4. The CHSH and Tsirelson inequalities
demonstrate that for some figures of merit, the optimal quantum value lies strictly
between the optimal local value and the maximum algebraic value. Figure 2.4
reveals a more complex story: even in this simple two g-bit case, the quantum
achievability of correlations describes a convex set whose boundary traces a mys-
terious path between the local boundary and the non-signaling boundary; touching
both at various points, but for the most part lying somewhere in the middle.
What governs the shape traced out by the quantum boundary? What aspects
53
of nature can elucidate its form, not just in the simple 2-dimensional slice of two
g-bit systems illustrated above, but for any number of systems and measurement
choices? A definitive answer to this question is not yet known, though various at-
tempts have led to partial answers which shed light on the issue, notably the prin-
ciples known as Macroscopic Locality [50] and Information Causality [51], and
arguments based on the non-local computation of binary-valued functions [52].
The author's own contribution to this research is described in Chapter 4, in which
an investigation of the Information Causality principle yields two distinct perspec-
tives on why the quantum boundary takes the shape that it does.
2.5 Convex vector space representations
In Section 2.4 we described a natural embedding of the states of a general proba-
bilistic theory into a vector space. This method is just one possible (and usually
inefficient) vector space representation of a system, although it does serve the
useful purpose of illustrating that such a representation always exists, and can be
realised in a real, finite-dimensional vector space as long as the set of fiducial
measurements is finite. Note that a very elegant description of this form of finite-
dimensional quantum theory has long been established: the well-known Bloch-
sphere representation of a qubit system [53], in which a 2-dimensional density
operator corresponds to a vector with modulus less than or equal to one. The
Bloch-sphere representation differs from the embedding discussed in Section 2.4,
in which each component is equal to a fiducial outcome probability; for example,
the completely mixed state is represented by a three-dimensional zero vector in
the Bloch-sphere, and by a six-dimensional vector with every entry equal to 12 in
the representation given in Section 2.4. Nevertheless, these representations are
closely related, and describe the same basic set of quantum states.
In order to understand the structure of a given general probabilistic theory, and
to compare it with quantum theory, it is beneficial to study vector space represen-
tations of that theory, thus describing it in a language closely resembling that of
54
quantum theory, about which we already have a good deal of intuition. By ex-
amining theories from this perspective, we can highlight those features which are
central to a good description of real life experiments, and those features which
differ markedly from quantum theory. It is often the case, for example, that quan-
tum phenomena which appear special when contrasted with classical physics are
partly or entirely generic in the set of general probabilistic theories; teleportation,
entanglement-swapping and the impossibility of cloning are but a few such phe-
nomena [31, 54].
On the other hand, many properties are particular to a select group of theories.
One such property is the existence of a rich class of reversible transformations,
such that a pure state of a system may be mapped to any other pure state by a
reversible transformation [2, 55] (in quantum theory, by a unitary operator). An-
other such property is self-duality, in which the set of vectors representing states
and the set of vectors representing effects are essentially the same [56, 57] (in
quantum theory, these are the positive operators of the underlying Hilbert space).
These properties are not at all obvious from the description of systems in terms of
fiducial measurements, but become much more approachable using vector space
representations. By skipping the black box scenario and constructing probabilistic
models directly from their vector space representation, it is straightforward to find
theories which share some but not all of the properties of quantum theory, so that
conjectures concerning these properties can be more easily generated and tested
(the so-called polygon models, which link non-locality and the property of strong
self-duality, are a good example [36]). Exploring which features of general proba-
bilistic theories are possessed by quantum theory helps to build our intuition about
its capabilities, which is an invaluable step towards understanding its information
processing potential.
Various constructions of general probabilistic theories and their vector space
representations exist, each differing slightly from the others due to different op-
erational assumptions, conventions or mathematical convenience. The treatment
given here tries to be as general as possible, whilst retaining the ability to describe
55
classical, quantum and Boxworld theories on an equal footing so that their respec-
tive features can be highlighted and contrasted. In Section 2.5.1 we introduce the
mathematical prerequisites for this framework, most of which belong to the realm
of convex geometry (a similar and very readable introduction can be found in [37]).
In Sections 2.5.2 and 2.5.3 we describe how one or more systemsmay be described
via a real, finite-dimensional vector space. In Section 2.5.4 we then discuss how
the notion of transformations can be introduced in this setting and demonstrate that
unlike quantum theory, Boxworld does not permit entanglement to be reversibly
generated. We then provide some concrete examples of these representations in
Section 2.5.5.
It is worth mentioning that the convex vector space approach to general prob-
abilistic theories is but one node in a network of efforts to put quantum theory on
a more fundamental footing than is provided by the complex Hilbert space for-
malism. Other notable instances include the quantum logic approach pioneered
by Piron, Foulis and Randall [58, 59], and the categorical approach pioneered by
Abramsky and Coecke [60]. The former of these sets a precedent for analysing
Hilbert-space quantum mechanics as a probability calculus with axioms differing
from standard probability but of interest in its own right. The latter formulates
a “categorical semantics” for quantum theory, in which quantum theory is inter-
preted as a dagger compact category, whose morphisms correspond to physical
processes. Both these approaches are novel, and present a fascinating vantage
point, but it can be argued that they lack the conservatism and intuitive physical
appeal of the approach we now outline.
2.5.1 Convex geometry
Much of the discussion of general probabilistic theories relies on fundamental con-
cepts arising in the geometry of convex subsets of real vector spaces. Our interest
in convexity stems from a simple argument that any set of vectors corresponding to
preparable states of a system must be convex: if the vectors s1 and s2 correspond
to two distinct states in which the system may be prepared, then ps1 +(1− p)s2 is
56
the state corresponding to having prepared the system in state s1 with probability
p, and in state s2 with probability 1− p.
Definition. Let C be a subset of Rn.
• The convex hull of C is conv(C) = {
∑m
i=1 λici : λi ≥ 0,
∑
λi = 1, ci ∈
C}.
• The conic hull of C is cone(C) = {
∑m
i=1 λici : λi ≥ 0, ci ∈ C}.
• C is convex if C = conv(C), and C is a cone if C = cone(C)
For example, the positive quadrant of Rn is convex and a cone. All cones are
convex, however bounded convex sets, such as the unit ball, are convex without
being a cone. In the Bloch-sphere, pure quantum states are assigned vectors on
the surface of the sphere: these are the extreme points of the sphere. In order to
generalise this notion to all general probabilistic theories it will be important to
define extreme points of convex sets, and extreme rays of cones.
Definition. Let C ⊂ Rn be convex;
• x ∈ C is an extreme point if, whenever c1, c2 ∈ C and 0 < λ < 1 are such
that x = λc1 + (1− λ)c2, then c1 = c2 = x.
• Denote by ext(C) the set of extreme points of C.
Definition. Let C ⊂ Rn be a cone;
• For x ∈ Rn, the ray generated by x is defined Rx = {λx : λ ≥ 0}. For
x ∈ C, Rx is an extreme ray if, whenever c1, c2 ∈ cone(C) and λ1, λ2 > 0
are such that λ1c1 + λ2c2 = x, then c1 and c2 both belong to Rx. We say
that x generates an extreme ray of C.
• Denote by extray(C) the set of extreme rays of C.
57
The following well-known Theorems concerning extreme points are very use-
ful when discussing convex sets. Recall that in Rn, C is compact iff it is closed
and bounded.
Theorem 1 (originally due to Minkowski, see [61]). Let C ⊂ Rn be compact and
convex. Then,
C = conv(ext(C))).
in particular, ext(C) is non-empty, and its convex hull is the whole of C.
Theorem 2 (Carathéodory, [62]). Let C ⊂ Rn be compact. Then conv(C) is also
compact.
Theorem 3 (Carathéodory, [62]). Let P ⊂ Rn and let x ∈ conv(P ). Then there
exists a subset Px ⊆ P with |Px| ≤ n+ 1 such that x ∈ conv(Px).
Clearly the unit sphere has infinitely many extreme points, whereas a tetrahe-
dron or a square has only finitely many. Theories in which sets of states possess
only finitely many extreme points are of interest since they are easy to define, and
yet their non-locality properties are often non-trivial [36].
Definition. A subsetC ⊂ Rn is a polytope (resp. polytopic cone) if it is the convex
(resp. conic) hull of finitely many points.
Theorems 1 and 2 tell us that equivalently, a compact, convex subset C ⊂ Rn
is a polytope if ext(C) is finite. Note also that if P is a finite set of points, then
ext(conv(P )) ⊂ P . Hence one way of defining a polytope is to give a finite set of
points which makes up its extreme points - this is known as the V-representation
of a polytope. Another standard way to define a polytope is by giving a finite
set of half-spaces of Rn whose intersection is bounded - this is known as the H-
representation of a polytope [63].
Definition. A half-space of Rn is a set of the form H = {v ∈ Rn : 〈x, v〉 ≤ b}
for some x ∈ Rn and b ∈ R. If b = 0 thenH is a linear half-space.
58
Theorem 4 (see [63]). A compact subset C ⊂ Rn is convex if and only if it can
be written as an intersection of half-spaces. It is a polytope if and only if it can be
written as the intersection of a finite set of half-spaces.
Definition. Let C be any subset of Rn. The dual cone C∗ is the cone defined by
C∗ = {x ∈ V : 〈x, y〉 ≥ 0 ∀y ∈ C} (2.33)
For example, the dual cones of rays are half-spaces, and vice versa. The
concept of a dual cone becomes useful when talking about effects, and particu-
larly when constructing states and effects of composite systems. Recall that Rn
comes equiped with the standard, Euclidean inner product, and the correspond-
ing Euclidean norm: this allows us to define a topology on Rn via the metric
d(x, y) = ‖x− y‖2. In this context, C∗ is a closed cone for any C ⊂ V (even if C
itself is neither a cone nor closed), and hence (C∗)∗ is a closed cone. Moreover,
since every vector in C has non-negative inner product with every vector in C∗,
we also have that C ⊆ (C∗)∗ . If C itself is a closed cone then it turns out that
C = (C∗)∗; this will be useful later in establishing duality relations between sets
of states and effects.
Proposition 2. Let C be a closed cone in Rn. Then (C∗)∗ = C.
Proof. As argued above, clearly C ⊆ (C∗)∗. In order to prove that (C∗)∗ ⊆ C it
suffices to show that x /∈ C ⇒ x /∈ (C∗)∗; equivalently, for x /∈ C there exists
z ∈ C∗ such that 〈z, x〉 < 0. Given x /∈ C, the function
f : V → R
f(v) = ‖x− v‖,
is continuous and bounded below by 0. Moreover, picking an arbitrary c ∈ C, the
setD = C ∩ {d ∈ Rn : f(d) ≤ f(c)} is the intersection of two closed sets, and is
bounded. Being closed and bounded in a finite-dimensional space, D is therefore
compact. Hence f attains its minimum value inD at a point y: clearly f(y) is the
59
minimum value of f over C also. Let z = y − x: either y = 0 or it must be that y
is the closest non-zero point to z on the ray Ry, hence 〈z, y〉 = 0. Moreover, since,
0 < ‖y − x‖2 = 〈y − x, y〉 − 〈y − x, x〉 = −〈z, x〉, (2.34)
it must also be that 〈z, x〉 < 0. It remains to show that z is indeed a member of
C∗, i.e. for all v ∈ C, 〈z, v〉 > 0. Suppose for contradiction that y′ ∈ C satisfies
〈z, y′〉 < 0. Define the vector,
yε = (1− ε)y + εy′, (2.35)
noting that for all 0 ≤ ε ≤ 1, yε is a member of C. We will show that for suf-
ficiently small ε, ‖yε − x‖2 < ‖y − x‖2, i.e. f(yε) < f(y), contradicting the
definition of y. More precisely, we demand
ε < 2〈z, y − y
′〉
‖y − y′‖2
, (2.36)
noting that both numerator and denominator are necessarily positive. By the Law
of Cosines,
‖yε − x‖2 = ‖y − x‖2 + ‖y − yε‖2 − 2〈y − x, y − yε〉
= ‖y − x‖2 + ε2‖y − y′‖2 − 2ε〈z, y − y′〉
< ‖y − x‖2, (2.37)
as desired.
Definition. Let C be a cone in Rn. The generalised inequality associated with C
is the partial order ≤C on the vectors in Rn given by x ≤C y if ∃z ∈ C such that
x+ z = y.
For example, ifC is the cone defined by the positive quadrant ofRn, then x ≤c
y iff every component of x has a lesser or equal value than the same component of y.
The partial order induced by any cone C is transitive (in the sense that x ≤C y and
60
y ≤C z implies x ≤C z), reflexive (in the sense that x ≤C x) and antisymmetric
(in the sense that x ≤C y and y ≤C x implies x = y). Conversely, any such partial
ordering inRn induces a coneC = {v ∈ Rn : v ≥ 0}, however this is not relevant
to our discussion. Once a partial order is established, an analogue of an interval
on the real line may be defined, viz.
Definition. Let C ⊂ Rn be a cone. For any x, y ∈ Rn, the order interval [x, y]C
is the set {z ∈ Rn : x ≤C z ≤C y}.
For example, if C is the cone defined by the positive quadrant in Rn, then the
interval [0, 1] between the zero vector and the vector with 1 in every component is
the hypercube of vectors for which every component has a value between 0 and 1.
On the other hand, the interval [0, e1], where e1 is the first standard basis vector,
is simply the line segment between 0 and e1. Clearly, some order intervals have a
richer structure than others - the following definition characterises those vectors u
which generate interesting order intervals [0, u].
Definition. LetC ⊂ Rn be a cone. u ∈ C is an order unit ofC if, for any v ∈ Rn,
there exists a λ > 0 such that v ≤C λu.
In some sense an order unit provides a benchmark by which the “order mag-
nitude” of any vector can be evaluated, in much the same way as the unit element
1 does on the real line. If u is an order unit in C then for all other v ∈ C, there
exists a λ > 0 such that 1λv is contained in the interval [0, u], hence this interval
inherits something of the structure of the cone C. Not all cones possess an order
unit: for example, a ray contains an order unit only in 1-dimensional space.
Proposition 3. Let C ⊂ Rn be a closed cone. If u is an order unit in C∗, then
D = {c ∈ C : 〈u, c〉 = 1} is a compact, convex subset of C.
Proof. SinceD is an intersection of closed and convex sets, it is closed and convex
itself. It remains to show that D is bounded: suppose for contradiction that there
exists a sequence {di} ⊂ D such that ‖di‖ → ∞. For any vector v, there exists a
λ such that v ≤C∗ λu, i.e. 〈v, d〉 ≤ λ for all d ∈ D. The sequence of real numbers
61
〈d1,di〉
‖d1‖‖di‖ therefore tends to zero as i tends to infinity, hence, defining i1 = 1, for
any ε > 0 there exists a positive integer i2 such that
〈di1 ,dj〉
‖di1‖‖dj‖
< ε for all j > i2.
Continuing iteratively, for any  > 0 it is possible to find a set of n+1 vectors
{d′1, . . . , d′n+1} such that
〈d′i,d′j〉
‖d′i‖‖d′j‖
< ε for all 1 ≤ i, j ≤ n+ 1. The angle between
d′i and d′j is given by:
θij = cos−1
( 〈d′i, d′j〉
‖d′i‖‖d′j‖
)
(2.38)
Thus there exist sets of n+1 vectors whose pairwise angles can be taken to be arbi-
trarily close to pi2 , contradicting the fact that we are in an n-dimensional Euclidean
space.
2.5.2 Abstract state spaces
We established in Section 2.4 that the states of any system with finitely many
fiducial measurements may be represented by vectors in a real, finite-dimensional
vector space. Abstract state spaces provide a broad framework for describing sys-
tems in general probabilistic theories in terms of real, finite-dimensional vector
spaces, without first having to introduce canonical sets of fiducial measurements
and the outcome distributions which correspond to allowed states. Whilst any sys-
tem defined in terms of fiducial measurements admits at least one representation
in this manner, the appeal of the abstract state space is that there is no preferred
set of fiducial measurements. Rather, fiducial sets of measurements can be in-
ferred from the description of state and effect vectors within the real vector space.
Before defining an abstract state space, it is helpful to discuss how the mathemat-
ical machinery of Section 2.5.1 corresponds to operational assumptions about the
system.
To begin with, some convex subset S ⊂ V corresponds to the set of preparable
states of the system. By restricting attention to a vector subspace if necessary, the
set of states can be assumed to span the entire vector space. Wewill also assume for
convenience that the set of states is closed: a physical justification for this is that
whenever {si}i≥0 is a sequence of states such that si → s for some vector s, then
62
the physical properties of the states s can be implemented to arbitrary accuracy by
preparing the states si, hence we may as well assume that s is a state.
The state cone S+ = cone(S) is therefore a closed cone in V : this has a
convex subset the sub-normalised states S≤1 = {λs : 0 ≤ λ ≤ 1, s ∈ S} =
conv(S ∪ {0}). Sub-normalised states play an important role in the discussion of
measurement outcomes, and we wish them to be distinct from the set of allowed
states, hence we demand that for all s ∈ S and 0 ≤ λ < 1, λs /∈ S . This prohibits
the zero vector from being a member of S. This in turn implies that λS ∩ S = ∅
for all λ 6= 1, and that S+ 6= V . In fact the condition λs /∈ S does not lead to a
loss of generality: if it is not satisfied, it is possible to add an extra dimension to
the vector space in order to make it so, whilst keeping the property that S spans
the whole of V .
An effect takes the form of a function eˆ from the set of states to the interval
[0, 1], such that the value eˆ(s) is the probability of the outcome corresponding to e
occurring, given that the system is in state s. An effect function must respect the
convexity of the set of states, i.e. for any states si ∈ S and pi ∈ [0, 1] satisfying
∑
i pi = 1, eˆ (
∑
i pisi) =
∑
i pieˆ(si). Such a function is said to be convex-linear
on S. The physical motivation for convex-linearity is as follows: if a system is
prepared in state si with probability pi, then for any measurement involving e, the
probability of the outcome corresponding to eˆmust be the associated average over
the probabilities that would be obtained for each preparation. It turns out that this
condition is enough to extend eˆ to a linear function on the whole of V .
Proposition 4. Let S ⊂ V be a convex subset of a real vector space, such that
span(S) = V and λS ∩ S = ∅ for all λ 6= 1, and let W be a real vector space.
Any convex-linear function eˆ : S → W is uniquely extensible to a linear function
eˆ : V → W .
Proof. Any linear function which is an extension of eˆ must respect linear combi-
63
nations involving members of S:
eˆ
(
∑
i
λisi
)
=
∑
i
λieˆ(si). (2.39)
Since span(S) = V , this defines a linear function on the whole of V , assuming
that the function is in fact well-defined. To show that it is well-defined, we must
show that if a vector v is decomposed in distinct ways as a linear combination of
members of S , then the above definition of eˆ is independent of which combination
is used. Suppose then that v =
∑
i λisi =
∑
j λ′js′j , and firstly assume that the
λi and λ′j are all positive, i.e. v ∈ cone(S). Letting
∑
i λi = Λ, we have 1Λv =
∑
i
λi
Λ si ∈ S, since S is convex. By the same reasoning,
1
Λ′v ∈ S where Λ
′ =
∑
j λ′j , thus Λ = Λ′. Since
eˆ
(
1
Λ
v
)
= eˆ
(
∑
i
λi
Λ
si
)
= eˆ
(
∑
j
λ′j
Λ
s′j
)
, (2.40)
is uniquely defined, so is eˆ(v) = Λeˆ
( 1
Λv
)
, hence eˆ extends in a unique and well-
defined way to the whole of cone(S). Now suppose that some of the λi and λ′j are
negative; we may re-arrange the equality so that all coefficients are positive:
∑
λi>0
λisi −
∑
λ′j<0
λ′jsj =
∑
λ′j>0
λ′jsj −
∑
λi<0
λisi. (2.41)
Since the LHS and RHS are both elements of cone(S), we have from our previous
argument that,
∑
λi>0
λieˆ(si)−
∑
λ′j<0
λ′j eˆ(sj) =
∑
λ′j>0
λ′j eˆ(sj)−
∑
λi<0
λieˆ(si). (2.42)
Re-arranging again gives that
∑
i λieˆ(si) =
∑
j λ′j eˆ(s′j), which verifies that our
linearly extended function is indeed well-defined on the whole of V .
64
Therefore an effect function eˆ : S → [0, 1] is associated with a unique vector
e such that 〈e, s〉 = eˆ(s) for all states s, i.e. the set of effects generates an effect
cone E+ = S∗+ which is dual to the state cone. It follows from Proposition 2 that
also S+ = E∗+. From here on we will use the term “effect” both for a non-negative,
convex-linear functional eˆ on S and for the effect vector e which corresponds to
that functional. We make an assumption at this point that any such effect vec-
tor e corresponds to a physically realisable effect. That is to say, there exists a
measurement outcome whose probability of occurring is equal to 〈e, s〉 whenever
that measurement is physically performed on a system in the state represented by
s. The duality of the cones E+ and S+ is a direct result of this assumption. This
duality property holds in finite-dimensional quantum theory, as we will see, but
is difficult to justify on physical grounds; if the assumption is not made, then S+
and E+ must be defined separately, and many of the results concerning multipartite
systems will not follow, or would require modification.
Ameasurement in an abstract state space consists of a set of effects {e1, . . . , en}
such that
∑n
i=1〈ei, s〉 = 1 for all s ∈ S. This implies that the vector U =
∑
i ei
is the vector uniquely defined by linear extension of the function which maps all
vectors in S to 1. Just as we assume that all valid effect vectors are physically
realisable, we assume that any set of effects which sum to give U corresponds to a
possible measurement that can physically be performed (again, this is in principle
true in finite-dimensional quantum theory). The vector U corresponds to the unit
effect, which itself can be seen as a trivial measurement with only one outcome.
Note that all physically realisable effects must therefore obey e ≤E+ U , where
“≤E+” is the generalised inequality associated with E+.
Definition. An abstract state space is a 3-tuple (V,S+,U), where V is a real,
finite-dimensional vector space, S+ ⊂ V is a closed cone which spans V , and U
is an order unit in S∗+.
Note that the set of normalised states is not an explicit part of the definition,
rather it is defined as the set S = {s ∈ S+ : 〈U , s〉 = 1}, which by Proposition
3 is compact and convex. The effect cone is also not explicitly mentioned, but is
65
implicitly the dual cone to S+. By analogy with normalised states, e ∈ E+ is said
to be a proper effect if 〈e, s〉 ≤ 1 ∀s ∈ S , and we denote by E the set of proper
effects. Note that E can equivalently be defined as the order interval [0,U ]E+ . The
elements of E+ \ E are said to be improper effects.
Definition. Let (V,S+,U) be an abstract state space. Then,
• The pure states are those that belong to ext(S);
• The pure effects are those that belong to ext([0,U ]E+);
• The extreme ray effects are those that generate rays in extray(E+);
• A pure measurement is one for which the outcomes correspond to distinct
extreme ray effects.
The condition λS ∩ S = ∅ for λ 6= 1 ensures that the pure states are exactly
those states which generate the extreme rays of S+, hence there is no need to
distinguish between the extreme vectors of S and extreme rays of S+. As we will
see in Section 2.5.5 however, there often exist pure effects which are not extreme
ray effects. Pure measurements are those that provide maximal information about
the system, in the sense that the outcome statistics of any measurement may be
deduced from the outcome statistics of a related pure measurement. To see this,
note that any set of effects lies in the conic hull of the extreme ray effects, so
it is always possible to “split up” an outcome into a disjoint union of outcomes
corresponding to extreme ray effects. We can define this “splitting up” procedure
more rigorously, and equivalently define the pure measurements as those which
admit no non-trivial refinements.
Definition. Let M1 = {e1, . . . , en} and M2 = {f1, . . . , fm} be two measure-
ments on an abstract state space. M1 is a refinement of M2 if there exists a
surjective mapping φ : [n] → [m] such that,
fi =
∑
j:φ(j)=i
ej ∀i. (2.43)
66
Conversely,M2 is a coarse-graining ofM1. A trivial refinement is one for which,
φ(j) = i ⇐⇒ ei = λfj. (2.44)
An abstract state space may be defined in two equivalent ways: firstly, by
giving the vector space V , the state cone S+ and an order unit U of S∗+, from
which the sets E+ and E can be deduced. Secondly, by giving the vector space
V , the effect cone E+ and order unit U ∈ E+, from which S+ is the dual cone to
E+ and S is the compact, convex subset of S+ having unit inner product with U .
These complementary methods allow us to construct lower and upper bounds on
the set of states of composite systems.
2.5.3 Composite Systems
In order to construct a composite system in terms of its subsystems, wemust define
the vector space, state (or effect) cone and unit effect that characterise its abstract
state space. Recall that the principle of local tomography tells us that any joint
state is characterised by the outcome statistics of joint fiducial measurements, i.e.
the normalised set of conditional probabilities P (a1, . . . , aN |x1, . . . , xN). How-
ever, if the systems are not Boxworld or classical systems, we must be careful
that the reduced states of an individual system are in fact allowed on that sys-
tem, i.e.
∑
aj :j 6=i P (a|x) is an allowed state of system i, whatever the choice of
{xj : j 6= i}. As we will shortly see, it turns out that all possible joint states can
be neatly represented in the tensor product of the vector spaces representing each
individual system.
Definition. LetC(1), . . . , C(N) be cones in the real, finite-dimensional vector spaces
V (1), . . . , V (N) respectively. The tensor coneC(1)⊗˜ · · · ⊗˜C(N) is the cone inV (1)⊗
· · · ⊗ V (N) defined by cone({c(1) ⊗ · · · ⊗ c(N) : c(i) ∈ C(i)}).
Let {(V (i),S(i)+ ,U (i))}Ni=1 be a set of abstract state spaces representing systems
in some general probabilistic theory, and consider theN -fold tensor product space
67
VN =
⊗N
i=1 V (i). Suppose that on each of the systems, the state s(i) is prepared:
then it is natural to define the global state corresponding to this joint preparation
as the product state s(1) ⊗ · · · ⊗ s(N). Moreover, instead of being prepared in-
dependently, the systems may agree to prepare states according to some shared
classical correlation, i.e. there is a shared probability distribution {pj} such that
with probability pj , the state s(i)j is prepared on subsystem i. The global state after
this preparation is then a convex combination of product states:
∑
j
pjs(1) ⊗ · · · ⊗ s(N)j (2.45)
The cone generated by all convex combinations of product states is the tensor cone
Smin+ = S
(1)
+ ⊗˜ · · · ⊗˜S
(N)
+ . Whatever construction we employ for the set of states
of the composite system, it is physically reasonable to demand that the composite
state at least contains Smin+ . Define UN = U (1) ⊗ · · · ⊗ U (N), and note that this
has unit inner product with the convex hull of product states. Note that the set of
states S(i) on each system i span the vector space V (i), therefore the set of product
states spans VN , and UN is the unique vector with unit inner product on all product
states. We therefore take UN to be the N -fold unit effect.
Designating the set of allowed composite states to be Smin = {s ∈ Smin+ :
〈UN , s〉 = 1} defines a perfectly valid composite system. Measurements on the
composite system then correspond to sets of effects belonging to the effect cone
Emin+ =
(
Smin+
)∗, and the proper effects are those belonging to the order interval
[0,UN ]Emin+ .
Definition. Let {(V (i),S(i)+ ,U (i))}Ni=1 be a set of abstract state spaces. The mini-
mal tensor product or (min tensor product) of systems 1, . . . , N is represented by
the abstract state space
V min =
(
N
⊗
i=1
V (i),Smin+ ,UN
)
(2.46)
Notice that convex combinations of product states do not allow for any notion
68
of entanglement or non-locality, both of which are present in quantum theory and
Boxworld, so there must in general be larger consistent sets of states that can be
prepared on composite systems. Clearly, any composite state s should obey the
following properties:
〈e(1) ⊗ · · · ⊗ e(N), s〉 ≥ 0 ∀e(i) ∈ E (i)+ (2.47)
〈UN , s〉 = 1. (2.48)
In quantum and classical theory, composite states have well-defined reduced
states, which determine the local outcome statistics of local measurements, regard-
less of operations performed on the remaining subsystems. Suppose that a joint
measurement is made on a state s, and the outcomes on systems 2, . . . , N corre-
spond to the effects e(2), . . . , e(N). In principle, the choice of measurement at sys-
tem 1 may be made after these measurements occur; the reduced state s˜(1) ∈ S(1)≤1
relative to the fixed outcomes e(2), . . . , e(N) is such that for any e(1) ∈ E (1)+ ,
〈e(1), s˜(1)〉 = 〈e(1) ⊗ e(2) ⊗ · · · ⊗ e(N), s〉. (2.49)
This defines a unique vector s˜(1) ∈ V which, since the RHS is strictly between 0
and 1, must lie in S(1)≤1 . s˜(1) can be interpreted as the state of subsystem 1 post-
measurement, conditioned on the outcomes corresponding to effects e(2), . . . , e(N)
being recorded at subsystems 2, . . . , N .
In order to normalise the state s˜(1), it must be divided by the positive real num-
ber 〈U (1), s˜(1)〉: this value can be interpreted as the marginal probability of ob-
taining the outcomes e(2), . . . , e(N) for any global measurement which includes
the outcomes corresponding to these effects on each subsystem. Suppose that we
complete this global measurement using an additional set of effects on each sub-
system i ≥ 2 which, when summed together with e(i), give the unit effect. Then
the average over the resulting (normalised) reduced states, relative to each pos-
sible outcome on subsystems 2, . . . N , and weighted according to their marginal
69
probabilities, is the unique state s(1) ∈ S(1) such that for any e(1) ∈ E (1)+ ,
〈e(1), s(1)〉 = 〈e(1) ⊗ U (2) ⊗ · · · ⊗ U (N), s〉, (2.50)
known as the reduced state on subsystem 1. This reduced state provides a full
description of the outcome statistics for a local measurement at subsystem 1, given
that the state of the global system is s. As a consequence of our tensor product
construction, any allowed state satisfying (2.47) and (2.48) also leads to reduced
states which are genuine states on each individual subsystem. This set of states
may be explicitly constructed by defining the cones of effects and states in the
following way,
Emax+ = E
(1)
+ ⊗ˆ · · · ⊗ˆE
(N)
+
Smax+ =
(
Emax+
)∗ , (2.51)
thus defining Smax = {s ∈ Smax+ : 〈UN , s〉 = 1} as the largest possible set of
states for the composite system.
Definition. Let {(V (i),S(i)+ ,U (i))}Ni=1 be a set of abstract state spaces. The maxi-
mal tensor product or (max tensor product) of systems 1, . . . , N is represented by
the abstract state space
V max =
(
N
⊗
i=1
V (i),Smax+ ,UN
)
(2.52)
As well as describing an abstract state space for each type of individual sys-
tem, a general probabilistic theory must provide a description of how those sys-
tems may combine into composite systems. For any set {(V (i),S(i)+ ,U (i))}Ni=1
of permitted abstract state spaces, the theory must offer a composite state cone
Smin+ ⊆ (S+)N ⊆ Smax+ , or a composite effect cone Emax+ ⊆ (E+)N ⊆ Emin+ .
70
2.5.4 Transformations
Quantum theory is not just a static description of what kinds of measurements can
be performed, and what kind of outcome statistics we can expect from those mea-
surements. It also provides an explicit model of how one state may transform into
another. These transformations are described by unitary operators on the Hilbert
space of the system, or at least of a larger system containing it. Any transformation,
indeed any evolution of part of the universe as portrayed by quantum mechanics,
is described by a reversible transformation from the perspective of anyone who
regards that part of the universe as a closed system.
Transformations also fit naturally into the abstract state space framework de-
scribed in Sections 2.5.2 and 2.5.3. Suppose that (V,S+,U) is an abstract state
space, and consider a mapping T from S into itself (i.e. a transformation from
allowed states to allowed states). Recall from Section 2.5.2 that effects must be
convex-linear on S in order to respect probabilistic preparations of states. Much
the same reasoning can be applied here: the transformation acting on the state
ps1+(1−p)s2 is expected to have the operational properties of the corresponding
average of transformed states, i.e. T (ps1 + (1− p)s2) = pT (s1) + (1− p)T (s2).
It follows from Proposition 4 that T can be seen as a linear map from V to it-
self, which also maps S into itself. An allowed transformation is any such linear
map: just as we assume that any linear functional which maps states to the inter-
val [0, 1] corresponds to a physically realisable effect, we assume that any allowed
transformation may be realised by some physical operation on the system.
Suppose now that a system is comprised of N subsystems, with vector spaces
V (1), . . . , V (N) admitting state spaces S(1), . . . ,S(N), and that an allowed trans-
formation T (i) is locally performed on each subsystem i. We would expect that
there is an allowed global transformation T of the composite system which occurs
as a result of these local transformations being performed. Consider the effect of
this global transformation on a product state s(1) ⊗ · · · ⊗ s(N), and then perform-
ing a measurement locally on each subsystem. The state which gives the correct
outcome probabilities for any local measurement is T (1)(s(1))⊗ · · · ⊗ T (N)(s(N)).
71
T = T (1) ⊗ · · · ⊗ T (N) is the unique linear extension to
⊗
V (i) of the map which
acts in this manner on the set of product states. “Product transformations” of this
kind must therefore be allowed transformations of the composite system: observe
that this will always hold in min- and max- tensor product spaces, since product
transformationsmap the setsSmin andSmax into themselves. Note, if a local trans-
formation is performed on only a subset of theN subsystems, then we consider the
identity transformation I(i) to have been performed on the remaining subsystems.
In quantum theory, one may view transformations as acting on measurement
operators rather than states, by moving to the Heisenberg picture. An analogous
trick may be performed here. Since, in the operational perspective, a state is de-
fined by its outcome statistics, the action of a transformation is defined by how it
influences the inner product between states and effects, i.e. T is determined by the
set of values 〈e, T (s)〉. Recall that the adjoint T † of a linear map T is the linear
map for which satisfies 〈T †(x), y〉 = 〈x, T (y)〉 for all x, y ∈ V . Thus T is de-
termined equivalently by its adjoint transformation on effects: note that if T is an
allowed transformation then T † maps the effect cone E+ to itself.
Proposition 5. Let (V,S+,U) be an abstract state space, and T an allowed trans-
formation. Then T †(U) = U .
Proof. By definition of the adjoint, we have that for all states s ∈ S ,
〈T †(U), s〉 = 〈U , T (s)〉 = 1. (2.53)
Since S spans V , any two linear functionals which are equivalent on S are equiva-
lent on the whole of V . T †(U) has the same inner product as U does for all s ∈ S ,
thus T †(U) = U .
An allowed transformation is reversible if there exists an allowed transforma-
tion S such that TS(s) = s for all states s in S . This implies that S = T−1
when viewed as linear maps on the whole of V . If a transformation is reversible,
then so is its adjoint, which satisfies
(
T †
)−1 = (T−1)†. The property of being
72
reversible imposes strong restrictions on T . Clearly, for example, it must be a bi-
jective mapping from S to itself, otherwise T−1 will not be a well-defined allowed
transformation. As the following Proposition demonstrates, this means that the set
of pure states must be permuted by T .
Proposition 6. Let (V,S+,U) be an abstract state space, and T an allowed, re-
versible transformation. Then T maps pure states to pure states, T † maps pure
effects to pure effects and T † maps extreme ray effects to extreme ray effects.
Proof. Suppose that T (s) = ps1 + (1− p)s2, where 0 < p < 1 and s1, s2 ∈ S . T
has a linear inverse T−1, so we have that s = pT−1(s1) + (1− p)T−1(s2). Since
s is a pure state, this implies T−1(s1) = T−1(s2) = s, hence s1 = s2 = T (s).
Therefore T (s) is also a pure state. The proof that T † maps pure effects to pure
effects is identical, and the proof that T † maps extreme ray effects to extreme
effects is very similar (it simply involves switching the convex combination for a
conic combination).
2.5.5 Examples
So far we have introduced machinery for dealing with abstract state space repre-
sentations of general probabilistic theories, and derived results about their struc-
ture in this abstract setting; we now supplement this with concrete examples. We
begin by recapping and constructing the abstract state spaces for quantum theory,
classical theory and Boxworld, then describe polygon models introduced in [36],
whose state spaces take the shape of regular polygons.
Quantum Theory
In the standard presentation, quantum theory is not introduced by means of a
real vector space, but instead by means of a complex Hilbert spaceH, in which the
states are given by density operators andmeasurements by sets of POVMelements.
However, the set of density operators form a convex subset of the real vector space
herm (H) of Hermitian operators over H. Defining V = herm (H) to be our
73
vector space, and S+ = P (herm (H)) to be the cone of positive operators on
H (i.e. Hermitian operators P for which 〈x|P |x〉 > 0 for all non-zero x ∈ H),
we almost have a full abstract state space. Before defining U , we need to have
an inner product on V : this is provided by the Hilbert-Schmidt inner product on
linear operators, 〈A,B〉 = Tr
(
A†B
)
. With this inner product, the unit effect is
simply the identity I, and the cone of effects is identical to the cone of states. The
proper effects are positive operators E for which E ≤ I in the operator norm.
The convex set of normalised states S is the set {ρ ∈ P (H) : Tr (ρ) = 1}
(i.e. those that have unit inner product with I). Pure states are given by the set
of projectors onto 1-dimensional subspaces. Pure effects are given by projectors
of any rank, however extreme ray effects are again given by the set of projectors
onto 1-dimensional subspaces (this is in agreement with the fact that E+ = S+,
and our previous comment that the pure states generate the extreme rays of S+).
Observe that whenever H is of dimension larger than 2, there will exist infinitely
many pure effects which are not extreme ray effects. Reversible transformations
are maps ρ → UρU †, where U is a unitary operator. As expected, these map the
cone of positive operators bijectively to itself, and pure states onto pure states. The
adjoint mapping also takes pure effects to pure effects, and extreme ray effects to
extreme ray effects.
The set of states of composite quantum systems, the unit-trace positive oper-
ators in the tensor product of the Hilbert spaces, lies strictly between the sets of
states described by the min and max tensor products. The states of the min tensor
product are the separable states, i.e. those which can be written as a convex com-
bination of pure product states: this also permits effects that are not allowed in
standard quantum theory, for example entanglement witnesses, which give posi-
tive outcomes for any product state, but are negative for some entangled states. On
the other hand, composite quantum systems with the max tensor product include
states not allowed in the standard formulation, for example the partial transpose of
many entangled states generates an operator which is positive for tensor products
of POVM elements, but has global negative eigenvalues.
74
Classical Theory
Recall that classical systems are those for which there is a single fiducial mea-
surement, over which any outcome distribution corresponds to an allowed state.
For a fiducial measurement with K outcomes, this can be represented in RK by
setting the fiducial effects to be the standard basis vectors {e1, . . . , eK}. These
are the extreme ray effects, hence the effect cone is the positive quadrant. This
implies that the ith component of a vector representing a state is the probability of
obtaining outcome iwhen the fiducial measurement is performed on the system in
that state. Thus all state vectors have unit inner product with the unit effect, which
is the vector containing 1 in every component. The set of normalised states forms a
simplex, whose vertices are also the standard basis vectors. The pure state ei with
a 1 in the ith component (and 0 elsewhere) is the state which deterministically as-
signs outcome i when the measurement is performed. Like quantum theory, the
state and effect cones are identical.
Reversible transformations of a classical system must permute the fiducial ef-
fects, which can be interpreted as a “relabelling” of themeasurement outcome once
the measurement has been performed. Note that any such transformation maps the
standard basis to itself, and hence extends to a valid linear map on RK .
The min and max tensor products of composite classical systems coincide, and
thus there is only one reasonable construction of composite states and effects: the
only allowed states are classically correlated, and there is no entanglement. To
see this, it is enough to observe that the only composite product effect with which
the pure product state e(1)i1 ⊗ · · · ⊗ e
(N)
iN has non-zero product, is in fact the effect
given by the same vector. Hence any linear combination of product effects must
have only positive coefficients in order to have positive inner product with all pure
product states. Since the product effects span the tensor product space, the only
possible composite effects are given by Emax+ , which must therefore be equal to
Emin+ .
75
Boxworld
In a Boxworld system it is again simplest to construct the effect cone first,
then obtain the state space via duality. For a system with M measurements, for
which measurement x hasKx outcomes, recall that any set of distributions on the
outcomes for each measurement is an allowed state. This can be encapsulated in
a real vector space V of dimension d = 1 +
∑
x (Kx − 1). Choose a linearly
independent set of vectors {U} ∪ {Xa|x} for 1 ≤ x ≤ M and 1 ≤ a ≤ Kx − 1,
where the vectors of the formXa|x are fiducial effects, corresponding to outcome a
occurring when fiducial measurement x is performed. These vectors form a basis
for V . The remaining fiducial effect vectors are then restricted by the fact that
summing the effects of a fiducial measurement must give the unit effect:
XKx|x = U −
Kx−1
∑
a=1
Xa|x. (2.54)
For example, a g-bit system can be represented in a 3-dimensional real vector
space, where U , X0|0 and X0|1 take the place of the standard basis vectors, so that
e.g. X1|1 = (1, 0,−1). Any vector s which has a positive inner product with all
fiducial effect vectors and unit inner product with the unit effect U corresponds
to the state with outcome distribution P (a|x) = 〈Xa|x, s〉. These inner products
always have a value at most one since 〈U , s〉 = 1. Pure states are those which have
inner product only 0 or 1 with all fiducial effects: these deterministically assign
a single outcome to each choice of measurement (for example, the state which
always produces outcome 1 is a pure state).
Min tensor product Boxworld states are convex combinations of tensor prod-
ucts of these pure states. There exist effects in Emin+ \ Emax+ which are positive for
all such states: for example, the effect,
X1|0 ⊗X1|0 +X0|0 ⊗X1|1 +X1|1 ⊗X0|0 −X1|1 ⊗X1|1, (2.55)
gives inner product 0 or 1 for all pure product states, but is not a positive linear
76
combination of tensor products of fiducial effects. However, for a variant of the
PR-box state, in which the outcomes are exactly correlated (with probability 1/2
each for both being 0 or both being 1) whenever both inputs are equal to 0, but
anti-correlated (again with probability 1/2 for the two possibilities) otherwise, the
probability of obtaining the above effect is 3/2, which is clearly not permitted.
Min tensor product Boxworld outcome distributions clearly cannot violate the
CHSH inequality, and are not often the subject of much interest. When referring
to composite Boxworld systems, it is usually assumed that the max tensor product
is being used; this is the convention that we will follow in this thesis. A Box-
world state on a composite system -whoseN subsystems have abstract state spaces
(V (i),S(i)+ ,U (i)) - is defined by a non-signaling outcome distribution P (a|x), and
corresponds to a unique vector s ∈ Smax, such that 〈U , s〉 = 1 (i.e. the state is
normalized) and P (a1, . . . , aN |x1, . . . , xN) = 〈X(1)a1|x1⊗· · ·⊗X
(N)
aN |xN , s〉 [4]. Pure
product states are those of the form s(1) ⊗ · · · ⊗ s(N) where s(i) is a deterministic
state on subsystem i. For a system of two g-bits, all pure states are either pure
product states or variants of the PR-box; we do not yet have a full understanding
of the pure states of composite Boxworld systems, although computational enu-
merations have been carried out for some cases [64, 65].
In quantum theory, the Pauli measurements are not the only measurements that
can be made on a qubit system, despite constituting a fiducial set of measurements;
any positive semi-definite operatorP ≤ I corresponds to an outcome of some care-
fully constructed measurement. Similarly, in Boxworld the fiducial measurements
are not the only measurements that can be performed, and the fiducial effects are
not the only vectors which correspond to measurement outcomes. However, since
the fiducial measurements fully characterise a state s, the probability of obtaining
any non-fiducial effect must be a function of the values P (a1, . . . , aN |x1, . . . , xN)
associated with s. The set of allowed N -partite effects in Boxworld is the set of
vectors e ∈
⊗
V (i) such that 〈e, s〉 ∈ [0, 1] for all s ∈ S. This includes the N -
partite fiducial effects, as well as all linear combinations of the fiducial effects that
are positive and sub-normalised for all states. These non-fiducial effects can in
77
principle be measured by ignoring particular measurement outcomes, forgetting
which of several outcomes occurred, or performing measurements probabilisti-
cally.
The same “relabelling”' idea introduced for classical systems can be employed
in Boxworld to generate reversible transformations. For example, on a single sys-
tem, relabelling by switching the measurement choices x and x′ corresponds to the
permutation of fiducial effectsXa|x ↔ Xa|x′ for all a (we must haveKx′ = Kx for
this to work). Relabelling by switching the measurement outcomes a and a′ for the
same measurement corresponds to a single switch of the fiducial effect Xa|x with
Xa′|x. Since these transformations can be validly defined on the basis made up ofU
and a subset of the fiducial effects, they extend to linear maps on the whole vector
space. Chapter 5 is concerned with proving that all allowed, reversible Boxworld
transformations are composed of these local relabellings, as well as permutations
of subsystems, so long as no subsystem is classical. This is no easy task; however,
it is possible to prove relatively easily that in Boxworld, reversible transformations
map pure product states to pure product states.
Proposition 7. Let {(V (i),S(i)+ ,U (i))Ni=1} be subsystems of a composite Boxworld
system. Let T be a reversible transformation on the composite system. Then for
any pure product state s, T (s) is also a pure product state.
Proof. This proof closely follows that given in [4]. Recall that the adjoint T † must
permute the extreme rays of the effect cone: for a Boxworld system these are the
product fiducial effects on each system. Hence 〈e, T (s)〉 = 〈T †(e), s〉 ∈ {0, 1}
for any product fiducial effects e.
A joint local measurement involving measurement choices xi on subsystem i
comprises all product fiducial effects of the form Xa1|x1 ⊗ · · · ⊗ XaN |xN , where
each ai ranges from 1 to Kxi . Since the sum of these vectors is UN = U (1) ⊗
· · ·⊗U (N), T (s)must have inner product 1 with exactly one such vector, and zero
with the others. Thus for each string of measurement choices (x1, . . . , xN), T (s)
deterministically assigns some string of outcomes (a1, . . . , aN).
78
Suppose now that for two measurement strings x and x′ with xi = x′i for any
fixed i, T (s) assigns the outcome strings a and a′ for which ai 6= a′i. This implies
that T (s) has inner product 1 with both the effects,
U (1) ⊗ · · · ⊗X(i)ai|xi ⊗ · · · ⊗ U
(N)
U (1) ⊗ · · · ⊗X(i)a′i|xi ⊗ · · · ⊗ U
(N).
However, these distinct effects are both present in the same decomposition of UN :
UN =
∑
a
U (1) ⊗ · · · ⊗X(i)a|xi ⊗ · · · ⊗ U
(N),
thus T (s) has inner product at least 2 with UN , contradicting the assumption that
T takes normalised states to normalised states.
Polygon Models
Polygon models, first introduced in [36], are specifically constructed so that
the normalized state space of a single system takes the shape of a regular n-gon (or
n-sided polygon) in 3-dimensional real Euclidean space. In contrast to the above
theories, whose state spaces result from mathematical restrictions on the outcome
distributions, polygon systems are defined by their state space, and their outcome
distributions follow accordingly. Note that as n becomes large, the state space
begins to resemble a circle, which itself represents a “2-dimensional slice” of the
Bloch sphere of a qubit, in which only the σx and σz measurements are considered.
An analysis of polygon models for varying n potentially gives us insight into the
significance of the shape of the quantum local state space for phenomena such as
non-locality.
For fixed n, the normalized state space S of the n-gon system is the convex
hull of the extremal states ω1, . . . , ωn, defined by:
79
ωn =



rn cos
(2ipi
n
)
rn cos
(2ipi
n
)
1



, (2.56)
where rn =
√
sec
(pi
n
)
. Recall that the unit effect is the unique vector whose inner
product with all states is equal to 1. Clearly, this is the vector:
U =



0
0
1



. (2.57)
The extremal effect vectors must be defined separately in the cases where n is
odd or even. For even n, they are the vectors e1, . . . , en defined by:
ei =
1
2




rn cos
(
(2i−1)pi
n
)
rn cos
(
(2i−1)pi
n
)
1




. (2.58)
For odd n, there are 2n extremal effect vectors. The first n are defined by:
ei =
1
1 + r2n



rn cos
(2ipi
n
)
rn cos
(2ipi
n
)
1



(1 ≤ i ≤ n), (2.59)
and the remaining n are defined by en+i = U − ei for 1 ≤ i ≤ n. Note that in the
case of even n, U − ei is equal to e(i+n/2)modn, whereas in the case of odd n, this
process generates 2n distinct effect vectors. Compare this with the features of odd-
and even- sided polygons: the first n vectors correspond to normal planes which
are parallel to and touch the sides of the polygon. The second set of n vectors
correspond to normal planes which touch the polygon diametrically opposite to its
sides. The sides of even polygons are arranged in diametrically opposing pairs, so
that the first and second sets of effect vectors correspond to the same st of planes.
80
On the other hand, the sides of odd polygons are diametrically opposite to their
vertices, so that the second set of effect vectors corresponds to an entirely new set
of planes.
Figure 2.5 (taken from [36]) depicts the state spaces of the polygon models for
some values of n, with state and effect vectors projected onto the 2-dimensional
polygon. Note that setting n = 3 gives a classical “triangle” system whose single
fiducial measurement has 3 outcomes, and setting n = 4 gives a Boxworld system
with binary inputs and outputs. For n = 5 and beyond, however, this set of systems
do not overlap with classical, quantum or Boxworld systems.
Figure 2.5: Illustration of polygon state spaces and the first n extremal effects
(ΩQM denotes the quantum state space). Taken from [36].
Composite states of bipartite polygon systems may be described by a 3x3 ma-
trix S, such that eTSf ≥ 0 whenever e and f are local extremal effects or unit
vectors of the first and second systems respectively. The value eTSf gives the
81
probability of obtaining the local outcomes corresponding to e and f , whenever a
local measurement is performed which contains these effects on each system. It
can be shown that the states defined by the following matrices are entangled pure
states (except for the case n = 3):
even n : φ =



cos
(pi
n
)
sin
(pi
n
)
0
− sin
(pi
n
)
cos
(pi
n
)
0
0 0 1



(2.60)
odd n : φ =



1 0 0
0 1 0
0 0 1



(2.61)
By maximising over local binary measurements of the form {ei,U − ei}, it
is possible to violate Tsirelson's bound on the CHSH value for arbitrarily large,
even values of n. For odd values of n on the other hand, Tsirelson's bound is
always obeyed by these measurements. Moreover, as n increases, the optimal
values for both odd and even n appear to approach Tsirelson's bound from below
and above respectively. Intriguingly, Tsirelson's bound provides a kind of dividing
line between the optimal CHSH values obtained by odd and even polygons. Yet,
even when n is so large that the polygon approximates a circular “slice” of the
Bloch sphere, it is still possible to violate quantum predictions.
82
Chapter 3
Quasiprobability representations
We must by all means stick to our sensations, that is, simply to the
present impressions whether of the mind or of any criterion whatever,
and similarly to our actual feelings, in order that we may have the
means of determining that which needs confirmation and that which
is obscure.
“Letter to Herodotus”
Epicurus
83
The notion of negative probabilities has a history that is both surprisingly long,
and surprisingly closely intertwined with the history of quantum mechanics. Ideas
about the relation of negative probabilities to quantum theory arose not long after
its own development: in his Bakerian Lecture to the Royal Society in 1942 [66],
Dirac remarked that negative energies and negative probabilities arose naturally
in his calculations regarding relativistic extensions of quantum mechanics. How-
ever, when calculating final measurement probabilities, the negative values were
invariably “averaged out” to produce positive quantities and probabilities of ob-
served events, and were physically significant only in their contribution to these
observed values. He commented, “Negative energies and probabilities should not
be considered as nonsense, … [but] simply as things which do not appear in ex-
perimental results” [66].
Dirac conceived that those particles associated with negative quantities could
be seen as belonging to some hypothetical world differing markedly from our own,
but in which analogous transition laws applied, and in which the “averaged out” fi-
nal probabilities were in complete agreement with the probabilities we observe. In
the paper's introduction, in which he discussed Heisenberg's operational take on
quantum mechanics, Dirac appears to propose the following viewpoint: as long
as we are still ultimately talking about natural observations, it is no great sin if
negative quantities are encountered along the way. Even earlier, in 1932, Wigner
gave a striking precedent to this attitude in his paper introducing the well-known
Wigner phase-space distribution [67]. Whilst a classical joint probability distribu-
tion is generally incapable of describing simultaneously a wave function's position
and momentum, the Wigner distribution nearly accomplishes this by allowing the
values to be negative for some choices of position and momentum. Averaging
over all choices of position gives the correct formula for the momentum, and vice
versa. Wigner noted, “negative values … must not hinder the use of [the Wigner
distribution] in calculations, as an auxiliary function” [67].
In the context of quantum information theory, interest in quasiprobability dis-
tributions has seen a marked revival, including several applications and results in
84
quantum information and quantum foundations (a comprehensive review of de-
velopments in this topic can be found in [68]). These have tended to focus on
quasiprobabilistic representations of quantum states, with a view to progressing
our calculational abilities and intuition regarding them. In this chapter we take a
different perspective, and are concerned with simulating all non-signaling theo-
ries via quasiprobabilistic extensions of both quantum and classical theory. The
work herein is largely based on [1] and [43]; Subsection 3.2.2 and Section 3.4 are
exceptions to this, and constitute original, unpublished work. Section 3.3 is the
result of collaboration with Anthony Short.
In Section 3.1we introduce and discuss quasiprobability distributions in aman-
ner that is directly applicable to the discussion of outcome statistics of multipartite
experiments. In Section 3.2 we give a detailed discussion of a recent result by
Acín et al [1] which demonstrates that a specific quasiprobabilistic extension of
quantum theory allows the generation of arbitrary non-signaling distributions. In
Section 3.3 we show that the same can be achieved for local, classical probability
theory in two distinct ways. This results in four (two quantum-like, two classical-
like) distinct quasiprobabilistic models for non-signaling correlations; we discuss
several interesting features of these models as well as their relevance to the study
of quantum theory. In Section 3.4 we then discuss further quasiprobabilistic ex-
tensions of quantum theory, focusing on one that employs zero entanglement and
an arbitrarily small degree of classical correlation.
3.1 Quasiprobability distributions
Definition. A function P˜ : A1×· · ·×AN → R, for finite setsA1, . . . , AN , is said
to be a (multipartite) quasiprobability distribution if:
∑
a1,...aN
P˜ (a1, . . . , aN) = 1. (3.1)
Since we will generally be studying the multipartite case, the word “multi-
85
partite” preceding the phrase “quasiprobability distribution” will be dropped. For
convenience we will also use bold symbols to denote strings of values, for exam-
ple a denotes (a1, . . . , aN), and consequently P˜ (a) denotes P˜ (a1, . . . , aN). Like
ordinary probability distributions, quasiprobabilities may be conditioned on some
other set of variables:
Definition. A function P˜ : A1 × · · · × AN × X1 × · · ·XN → R is said to be a
conditional quasiprobability distribution if, for all fixed choices of x ∈ X1×· · ·×
XN :
∑
a1,...aN
P˜ (a|x) = 1, (3.2)
where the notation P˜ (a|x) is used in place of P˜ (a1, . . . , aN , x1, . . . , xN).
Conditional quasiprobability distributions form a direct analogue of outcome
distributions. Indeed, we can define a more general form of the non-signaling
condition which applies to quasiprobability distributions. Due to the difficulty in
finding a convincing physical interpretation for negative probabilities, it is difficult
to physically motivate this non-signaling condition in terms of communication
between distant parties. Nevertheless, this form of the non-signaling condition
forms an important mathematical step in the derivation of our results.
Definition. The conditional quasiprobability distribution P˜ (a|x) is non-signaling
if, for all i and choices of x1, . . . , xN , the expression
Kxi
∑
ai=1
P˜ (a1, . . . , aN |x1, . . . , xN) (3.3)
is independent of the particular value of xi.
The only assumption we have made about the outcome sets A1, · · · , AN so
far is that they are finite. Without loss of generality, we may as well assume that
Ai = {1, . . . , K(i)xi }, where K
(i)
xi is the number of possible local outcomes when
measurement xi is performed locally at subsystem i. For a string of outcome val-
ues a and a subset Ω = {i1, . . . , ik} ⊆ [N ] with i1 < · · · < ik, we will use the
86
notation aΩ, xΩ to denote the reduced strings (ai1 , . . . , aik), (xi1 , . . . , xik). Just as
with conventional outcome distributions, a non-signaling conditional quasiproba-
bility distribution haswell-definedmarginal distributions P˜ (aΩ|xΩ) for any choice
of Ω. The following technical definition will be helpful in characterizing condi-
tional quasiprobability distributions according to a restricted subset of its marginal
values.
Definition. The outcome ai on system i is maximal with respect to xi if it takes
on the greatest value permitted by measurement choice xi, i.e. ai = K(i)xi . For a
subset Ω ⊆ [N ], the string (aΩ|xΩ) is nowhere-maximal if, for all i ∈ Ω, ai is not
maximal with respect to xi.
We now state and prove a lemma concerning non-signaling quasiprobability
distributions, of which great use will be made throughout this chapter. The moral
of the lemma is as follows: if one wants to generate a non-signaling quasiproba-
bility distribution P , then essentially one needs only focus on the set of nowhere-
maximal, marginal values of P . As long as one's generated distribution is itself
non-signaling and agrees with P for these inputs, then it is guaranteed to agree
with P for all inputs. Ideas very similar to this have been utilised implicitly in the
literature of non-signaling correlations [1,69], but the author does not know of any
explicit statement of the result.
Lemma 1. A non-signaling, conditional quasiprobability distribution P˜ (a|x) is
uniquely characterized by the set of non-maximal marginal values
{
P˜ (aΩ|xΩ)
}
as (aΩ|xΩ) runs over the set of nowhere-maximal strings and all subsets ∅ 6= Ω ⊆
[N ].
Proof. Suppose that P˜ (a|x) and P˜ ′ (a|x) are non-signaling conditional quasiprob-
ability distributions, whosemarginals P˜ (aΩ|xΩ) and P˜ ′ (aΩ|xΩ) agree for all nowhere-
maximal choices of (aΩ|xΩ). We argue that P˜ (aΩ|xΩ) and P˜ ′ (aΩ|xΩ) actually
agree for all choices of (aΩ|xΩ), by induction on n = #{i ∈ Ω : ai maximal}.
The case n = 0 holds by assumption. Consider a string (aΩ|xΩ) with n > 0
maximal ai; let Ω = {i1, . . . , iM} and without loss of generality suppose that
87
ai1 = K
(i1)
xi1 (the proof is no different otherwise). Since P˜ (a1, . . . , aN |x1, . . . , xN)
is non-signaling, so are all of its marginal distributions, therefore applying the
non-signaling condition to system i1,
K(i1)xi1
∑
ai1=1
P˜ (aΩ|xΩ) = P˜ (ai2 , . . . , aiM |xi2 , . . . , xiM ) . (3.4)
Rewriting this, for our particular choice of (aΩ|xΩ), the marginal P˜ (aΩ|xΩ) can
be expressed in terms of quantities where fewer of the aij are maximal:
P˜
(
K(i1)i1 , . . . , aiM |xi1 , . . . , xiM
)
= P˜ (ai2 , . . . , aiM |xi2 , . . . , xiM )
−
∑
ai1not
maximal
P˜ (ai1 , . . . , aiM |xi1 , . . . , xiM ) . (3.5)
The number of maximal ai is now smaller than n for each term on the RHS, so by
the induction hypothesis we can switch P˜ with P˜ ′ for each term individually:
P˜
(
K(i1)i1 , . . . , aiM |xi1 , . . . , xiM
)
= P˜ ′ (ai2 , . . . , aiM |xi2 , . . . , xiM )
−
∑
ai1not
maximal
P˜ ′ (ai1 , . . . , aiM |xi1 , . . . , xiM ) . (3.6)
P˜ ′ is also non-signaling, hence equation (3.4) is still valid after replacing P˜ with
P˜ ′. Therefore the reverse of the procedure used to obtain equation (3.5) can be
applied to P˜ ′ on the RHS, giving
P˜
(
K(i1)i1 , . . . , aiM |xi1 , . . . , xiM
)
= P˜ ′
(
K(i1)i1 , . . . , aiM |xi1 , . . . , xiM
)
⇒ P˜ (aΩ|xΩ) = P˜ ′ (aΩ|xΩ) . (3.7)
Example. A simple example may help to illustrate the central idea of the proof,
88
which is that we can exploit the non-signaling condition in an iterative fashion in
order to express a particular value P (a|x) in terms of marginals for which the ai
are nowhere maximal. Suppose that P is the outcome distribution for a 2-party,
2-input and 2-output experiment, i.e. one possible state of a g-bit system. Recall
that for a g-bit system, we will prefer to use binary notation {0, 1} to label inputs
and outputs, rather than the set {1, 2}. The probability P (1, 1|0, 0) has a maximal
outcome value for both parties, but admits the following reduction using the non-
signaling and normalisation conditions:
P (1, 1|0, 0) = P (a2 = 1|x2 = 0)− P (0, 1|0, 0)
=
[
1− P (a2 = 0|x2 = 0)
]
−
[
P (a1 = 0|x1 = 0)− P (0, 0|0, 0)
]
= 1 + P (0, 0|0, 0)−
[
P (a1 = 0|x1 = 0) + P (a2 = 0|x2 = 0)
]
.
3.2 Quantum theory with quasiprobabilities
An outcome distribution P is quantum-achievable if there exist Hilbert spaces
(of any dimension) H1, . . . ,HN , positive operators M (i)ai|xi ∈ P (Hi) satisfying
∑
ai M
(i)
ai|xi = I
(i) , and a density matrix ρ ∈ D (H1 ⊗ · · · ⊗ HN) such that, for all
choices of ai and xi,
P (a1, . . . , aN |x1, . . . , xN) = Tr
(
M (1)a1|x1 ⊗ · · · ⊗M
(N)
aN |xN ρ
)
. (3.8)
The set of quantum-achievable outcome distributionsQ inhabits a specific por-
tion of the full set of non-signaling distributions. If quantum theory - or some
(perhaps relativistic) extension of it - provides an accurate description of natu-
ral phenomena, then the correlations belonging to Q are exactly those which, in
principle, it is possible to generate between spatially separated parties in the real
world. It therefore seems a worthwhile endeavour for two reasons to explore what
defining characteristics single out Q as a subset of non-signaling distributions.
Firstly, from a practical vantage, it is beneficial to have a good understanding
89
of exactly which correlations it is possible to generate between distant parties, and
which correlations are unattainable. Any additional intuition gained from devel-
oping a broad treatment of correlations (rather than the formal paradigm of quan-
tum theory) is also a useful aid in developing algorithms that solve information-
theoretic problems with real-world applications [70, 71]. Secondly, on a more
foundational level, analysing the principles or rules that delimitQ from other non-
signaling distributions [50, 52, 72, 73] gives us a greater appreciation of what it is
that makes quantum theory so uniquely successful at describing Nature. Even if
quantum theory is superseded by a more complete or more accurate physical the-
ory, abstract notions concerning which correlations are allowable will persist.
One approach to this issue that has met with some success is to ask the ques-
tion “given a non-signaling distribution P(a|x), are there simple criteria by which
one can determine whether P is quantum-achievable or not?”. Even when the
number of parties is small, this turns out to be a more difficult task than might be
anticipated: in 2007 Navascués et al introduced a hierarchy of conditions which
are necessarily satisfied by bipartite quantum-achievable correlations [74], and
later demonstrated that any non-signaling distribution which lies outside of Q
violates one of these conditions (and therefore all subsequent conditions) [75].
Specifically, they introduce a sequence {Qn}n≥1 of sets of outcome distributions
which satisfies Qn+1 ⊆ Qn and ∩∞n=1Qn = Q (see Figure 3.1). Determining
whether a given outcome distribution belongs to the setQn can be formulated as a
semi-definite program; if the distribution fails this test then it is not quantum-
achievable. However, in general it cannot be confirmed that a distribution is
quantum-achievable after finitely many tests.
Another perspective is to instead ask “in what ways does the setQ stand out?”.
Acín et al [1] have shown that the standard Born trace rule can be extended to
generate any non-signaling correlation, by allowing the quantum state (usually
represented by a positive density operator) to be a non-positive operator. Explic-
itly, given any non-signaling distribution P (a|x) on N parties, there exist Hilbert
spaces Hi, POVM elements M (i)ai|xi ∈ L (Hi) and a Hermitian, unit-trace but not
90
Figure 3.1: Hierarchy of sets which approximates the set of quantum-achievable
outcome distributions. Taken from [75].
necessarily positive operator ρ˜ such that:
P (a|x) = Tr
(
M (1)a1|x1 ⊗ · · · ⊗M
(N)
aN |xN ρ˜
)
. (3.9)
This result provides a universal, quantum-like manner in which all non-signaling
correlations can be written. In other words, the set of non-signaling correlations is
exactly that which can be generated using the formula (3.9). Moreover, the set of
quantum-achievable correlations can be neatly recovered by demanding a single
additional constraint: that the operator ρ˜ be positive. The set of classical or local
correlations can also be neatly recovered, by demanding that ρ˜ be positive and
separable.
As well as being of standalone interest, this formulation has recently been used
to prove the following result: any physical theory for which the local structure is
identical to that of qubits, and which admits at least one continuous, reversible
interaction, must have the global structure specified by quantum theory [2]. The
ability to represent the correlations of a broad class of theories in a similar way to
quantum or classical correlations may provide a powerful tool in analysing such
theories, and identifying which properties of quantum theory are special within
this class of theories.
In Section 3.2.1 we reproduce the proof of the result of Acín et al in detail,
generalising the published version to allow for outcome sets that vary according
91
to the subsystem i and measurement choice xi. We then make the original and
interesting remark that with some simplemodifications, the operator ρ˜ can bemade
so that it and the set of POVM operatorsM (1)a1|x1⊗· · ·⊗M
(N)
aN |xN pairwise commute.
This will lead us naturally into Section 3.3, in which we study quasiprobabilistic
extensions of the local classical formalism, rather than the quantum formalism.
3.2.1 Non-positive density matrices
We now give a detailed exposition of the procedure described by Acín et al [1]
for representing arbitrary non-signaling outcome distributions in terms of local
POVM operators acting on a Hermitian, unit-trace, but not necessarily positive
density operator ρ˜. In fact, the published proof assumes that all measurements
have the same number of outcomes and that all subsystems have the same number
of measurements, and gives the explicit form of the constructed operators only in
the caseN = 2 (but sketches themultipartite generalisation). The proof given here
provides a concise and rigorous construction in the multipartite case and allows for
differing numbers of measurement outcomes.
Lemma 2. Let H be a complex, d-dimensional Hilbert space, and let V be the
real, d2-dimensional vector space of Hermitian matrices acting onH. Then there
exists a basis {Eα} of V satisfying the following properties:
(i) Eα is positive semidefinite for all α;
(ii) E1 = I;
(iii) I−
∑
α>1Eα is a positive semi-definite operator.
Proof. Let {|ei〉} be a basis forH, and define the d · (d− 1)/2 vectors
|fjk〉 = |ej〉+ |ek〉 k < l (3.10)
and the d · (d− 1)/2 vectors
|gmn〉 = |em〉+ i|en〉 m < n. (3.11)
92
The set of d2 rank-1 projectors {|ei〉〈ei|, |fjk〉〈fjk|, |gmn〉〈gmn|} forms a lin-
early independent subset of the real vector space V . To see this, consider the
decompositions of these projectors into matrices of the form |ej〉〈ek|. Note that
|fjk〉〈fjk| is the only projector to produce an |ej〉〈ek| cross-term with a real coef-
ficient, whilst |gmn〉〈gmn| is the only projector to produce an |em〉〈en| cross-term
with a complex coefficient. Since V is of dimension d2, this set of projectors forms
a basis for V .
Suppose that {Eα} denotes a relabelling of the above basis for V such that
E1 = |ed〉〈ed|. Note that the identity can be decomposed as I =
∑d
i=1 |ei〉〈ei|;
therefore E1 = |ed〉〈ed| lies in the span of the set {I, Eα}α>1. Setting E1 equal
to I instead of |ed〉〈ed| forms a new set which therefore also spans V . Hence we
may assume without loss of generality that E1 = I. Since this new set also has d2
elements, it again forms a basis for V , thus conditions (i) and (ii) are satisfied.
To additionally satisfy condition (iii), note that the matrix E =
∑
α>1Eα is
also positive semi-definite, and so by compactness the real number
ω = max
|||ψ〉||=1
〈ψ|E|ψ〉 (3.12)
is both positive and finite. After scaling every Eα by 1ω for α > 1, we have that
I−
∑
α>1Eα is also a positive operator.
Theorem 5. An outcome distribution P is non-signaling if and only if there exist
Hilbert spaces (of any dimension)H1, . . . ,HN , operatorsM (i)ai|xi ∈ P (Hi) satisfy-
ing
∑
ai M
(i)
ai|xi = I
(i) , and aHermitian, unit-trace operator ρ˜ ∈ L (H1 ⊗ · · · ⊗ HN)
such that
P (a1, . . . , aN |x1, . . . , xN) = Tr
(
M (1)a1|x1 ⊗ · · · ⊗M
(N)
aN |xN ρ˜
)
. (3.13)
Proof. The “if” direction of the proof is little more than an algebraic substitution
of the condition
∑
ai M
(i)
ai|xi = I
(i). Indeed, for an arbitrary choice of x1, summing
93
over a1 and using linearity of the trace operation gives:
K(1)x1
∑
a1=1
P (a1, . . . , aN |x1, . . . , xN) =
K(1)x1
∑
a1=1
Tr
(
M (1)a1|x1 ⊗ · · · ⊗M
(N)
aN |xN ρ˜
)
= Tr




K(1)x1
∑
a1=1
M (1)a1|x1

⊗ · · · ⊗M (N)aN |xN ρ˜


= Tr
(
I(1) ⊗ · · · ⊗M (N)aN |xN ρ˜
)
(3.14)
Equation (3.14) clearly has no dependence on the value of x1, hence any outcome
distribution P that can be represented in this way is non-signaling.
To prove the converse, we construct local measurement operatorsM (i)ai|xi and a
Hermitian operator ρ˜ of unit tracewhich generates a non-signaling quasi-distribution
P˜ ′, then show that P˜ ′ agrees with P for all nowhere-maximal marginal strings
(aΩ|xΩ). Using Lemma 1, this is sufficient to conclude that P˜ ′ is identical to the
outcome distribution P , and therefore that P is generated according to (3.13).
Recall that system i has M (i) measurement choices, with measurement x al-
lowing for K(i)x possible outcomes. LetHi ∼= Cdi , where di is large enough that
d2i ≥ 1 +
M(i)
∑
x=1
(
K(i)x − 1
)
. (3.15)
Let {Eα} be the basis of positive semi-definite matrices defined in Lemma 2
for the vector space V of Hermitian matrices acting onHi. Note that V has a real
inner product 〈A,B〉 = Tr (AB). Let {E˜α} be the dual basis to {Eα}, defined
by 〈Eα, E˜β〉 = Tr
(
EαE˜β
)
= δαβ . For 1 ≤ xi ≤ M (i) and 1 ≤ ai ≤ K(i)x − 1,
define {M (i)ai|xi} (the set of non-maximal POVM elements) to be some subset of
{Eα}α>1; note that this is possible as long as di is large enough to satisfy (3.15).
For each matrixM (i)ai|xi , let M˜
(i)
ai|xi denote the corresponding dual matrix, which is
94
Hermitian but not necessarily positive semi-definite. To complete the POVM set
for each measurement xi, define M (i)K(i)xi |xi
= I(i) −
∑K(i)xi −1
ai=1 M
(i)
ai|xi which, due to
condition (iii) of Lemma 2, is also a positive semi-definite operator. Note that the
POVM elements have been defined purely in terms of the number of measurement
choices and outcomes, and do not depend on the specific outcome distribution P .
Our non-positive density operator ρ˜ ∈ H1 ⊗ · · · ⊗ HN , on the other hand,
does depend on P . In giving a concise formula for ρ˜, a slight abuse of notation
is needed: for a subset Ω ⊂ [N ] of the N systems, ˆ
⊗
i∈ΩM˜
(i)
ai|xi will denote an
N -fold tensor product, in which the matrix I˜(i) is inserted for components i not
belonging to Ω. For example,
ˆ⊗
i∈{2,3}
M˜ (i)ai|xi = I˜
(1) ⊗ M˜ (2)a2|x2 ⊗ M˜
(3)
a3|x3 . (3.16)
Note that since P is assumed to be non-signaling, the marginal distributions
P (aΩ|xΩ) are well-defined for all ∅ 6= Ω ⊆ [N ]. For the empty set Ω = ∅, we will
also define P (a∅|x∅) = 1. The operator ρ˜ is then given by the formula
ρ˜ =
∑
Ω⊆[N ]
∑
nowhere-maximal
strings (aΩ|xΩ)
P (aΩ|xΩ)
⊗
i∈Ω
M˜ (i)ai|xi . (3.17)
Since each M˜ (i)ai|xi is not guaranteed to be positive semi-definite, neither is the
operator ρ˜. However, by construction they are all Hermitian, hence so is ρ˜. Sup-
pose now that P˜ ′ is the (non-signaling) quasiprobability distribution generated by
the POVM elementsM (i)ai|xi acting on the state ρ˜, according to the Born trace rule
(3.13). Given a subsetΩ ⊆ [N ] and nowhere-maximal string (aΩ|xΩ), themarginal
value P˜ ′ (aΩ|xΩ) is obtained by taking the trace of the product of ρ˜with the matrix
⊗
i∈ΩM
(i)
ai|xi .
Using the linearity of the trace, the fact that Tr (A⊗B) = Tr (A) ·Tr (B), and
the duality of the constructed basis sets, it is clear that these marginal values are
exactly P (aΩ|xΩ). Therefore by Lemma 1, P˜ ′ is in fact the outcome distribution
P , as desired. By post-multiplying ρ˜ by the N -partite identity matrix
⊗N
i=1 I(i), a
95
similar procedure verifies that ρ˜ is of unit trace:
Tr (ρ˜) = Tr
(
ρ˜
(
N
⊗
i=1
I(i)
))
= 1. (3.18)
Example The construction outlined in the previous proof will now be applied to
the canonical PR-box state. Recall that the PR-box is a special state of the system
comprised of two g-bit subsystems, i.e. each subsystem has two fiducial measure-
ment choices and two possible outcomes for each fiducial measurement. Hence it
is sufficient to set d1 = d2 = 2 in order to satisfy (3.15). By following the proce-
dure in Lemma 2, before performing the rescaling, we obtain the following basis
of positive semi-definite matrices for the vector space of 2x2 complex Hermitian
matrices:
E1 =
(
1 0
0 1
)
, E2 =
(
1 0
0 0
)
,
E3 =
(
1 1
1 1
)
, E4 =
(
1 i
−i 1
)
. (3.19)
In order to satisfy property (iii) of Lemma 2, it is straightforward to check that
the following matrix is positive semi-definite,
E1 −
1
ω
(E2 + E3 + E4) , (3.20)
as long as ω ≥ 52 . For convenience, we will set ω = 4, generating the follow-
ing set of POVM operators (recall that for the PR-box, measurement choices and
96
outcomes belong to the set {0, 1}):
M0|0 =
(
1
4 0
0 0
)
, M1|0 =
(
3
4 0
0 1
)
,
M0|1 =
(
1
4
1
4
1
4
1
4
)
, M1|1 =
(
3
4 −
1
4
−14
3
4
)
. (3.21)
Since the POVM elements are taken to be identical for both systems, the su-
perscript (i) is unnecessary. and has been omitted. The relevant dual elements to
the POVM operators are then,
I˜ =
(
0 −1+i2
−1−i
2 1
)
, M˜0|0 =
(
4 0
0 −4
)
, M˜0|1 =
(
0 2
2 0
)
. (3.22)
Recall that the PR-box outcome statistics are described by the formulaP (a1, a2|x1, x2) =
1
2δ(a1⊕a2 = x1·x2 (mod 2)). In this case, the marginal distributions are uniformly
distributed for both systems, and the formula for ρ˜ reduces to,
ρ˜ =
1
∑
x1,x2=0
1
2
δ
(
x1 · x2 = 0
)
M˜ (1)0|x1 ⊗ M˜
(2)
0|x2
+ 1
2
(
M˜ (1)0|0 ⊗ I˜
(2) + M˜ (1)0|1 ⊗ I˜
(2)
)
+ 1
2
(
I˜(1) ⊗ M˜ (2)0|0 + I˜
(1) ⊗ M˜ (2)0|1
)
+ I˜(1) ⊗ I˜(2). (3.23)
Substituting the matrices obtained in (3.22) gives,
ρ˜ =






8 3 + i 3 + i −1 + i2
3− i −6 −12
−5−i
2
3− i −12 10
−5−i
2
−1− i2
−5+i
2
−5+i
2 −11






(3.24)
97
Note that ρ˜ is Hermitian, of unit trace, and non-positive (for example 〈e4|ρ˜|e4〉 =
−11). It can be checked that the POVM elements (3.21) do indeed generate the
correct probabilities when acting on the matrix ρ˜ even in the case where the string
(a|x) is not nowhere-maximal, for example,
Tr
(
M0|1 ⊗M1|1 ρ˜
)
= 1
2
(3.25)
Tr
(
M1|1 ⊗M1|1 ρ˜
)
= 0. (3.26)
This concludes the example of generating PR-box statistics using a non-positive
density operator.
3.2.2 Commuting operators
It turns out that with a simple modification of the proof of Theorem 5, the same
result can be achieved but with the additional constraint that all operators (both the
POVM elements and ρ˜) commute. The cost of such a modification is an increase
in the dimension of the Hilbert space at each system - in fact we require that di =
1+
∑M(i)
x=1
(
K(i)x − 1
)
, so that dimension of the Hilbert space is roughly the square
of that in the proof of Theorem 5.
The construction of the POVM elements is then particularly simple. Let
{|e0〉} ∪ {|eai|xi〉 |1 ≤ xi ≤M (i), 1 ≤ a ≤ K(i)xi − 1}, (3.27)
be a basis for the Hilbert space Cdi , and for non-maximal pairs (ai, xi), define
M (i)ai|xi to be the projector |eai|xi〉〈eai|xi|. Writing the identity as I
(i) = |e0〉〈e0| +
∑
|eai|xi〉〈eai|xi|, it is clear for each xi that the matrix given by,
M (i)
K(i)xi |xi
= I(i) −
∑
ai<K
(i)
xi
M (i)ai|xi , (3.28)
is itself a diagonal matrix all entries either 1 or 0. The dual basis of matrices (in the
98
same sense as in the proof of Theorem 5) is given by the set of diagonal matrices:
M˜ (i)ai|xi = |eai|xi〉〈eai|xi| − |e0〉〈e0| (non-maximal (ai|xi) (3.29)
I˜(i) = |e0〉〈e0|. (3.30)
Again, ρ˜ is given by the formula:
ρ˜ =
∑
Ω⊆[N ]
∑
nowhere-maximal
strings (aΩ|xΩ)
P (aΩ|xΩ)
⊗
i∈Ω
M˜ (i)ai|xi , (3.31)
which, for exactly the same reasons as before, is of unit trace and reproduces the
probability distribution P for all nowhere-maximal strings (aΩ|xΩ), hence exactly
reproduces P . Notice that, as a linear combination of tensor products of diagonal
matrices, ρ˜ is also diagonal in the standard basis representation. Since diagonal
matrices commute, all the operators thus far defined are pairwise commuting.
Example Let us again turn to the case of the system comprised of two g-bit sub-
systems. In this case we require d1 = d2 = 3, and for all systems the POVM
elements take the form:
M0|0 =



1 0 0
0 0 0
0 0 0



, M1|0 =



0 0 0
0 1 0
0 0 1



,
M0|1 =



0 0 0
0 1 0
0 0 0



, M1|1 =



1 0 0
0 0 0
0 0 1



, (3.32)
and the dual set of matrices takes the form:
I˜ =



0 0 0
0 0 0
0 0 1



, M˜0|0 =



1 0 0
0 0 0
0 0 −1



, M˜0|1 =



0 0 0
0 1 0
0 0 −1



(3.33)
99
This construction allows for slightly easier calculation of the non-positive den-
sity operator; in fact it is not too hard to find the operator for an arbitrary state
P (a1, a2|x1x2) of two g-bit systems. Then the operator ρ˜ which generates this
probability distribution when acted on by the POVM elements (3.32) is the 9x9
diagonal matrix whose entries in order are:
{P (0, 0|0, 0) ,
P (0, 0|0, 1) ,
P (0, 1|0, 1)− P (0, 0|0, 0)
P (0, 0|1, 0) ,
P (0, 0|1, 1) ,
P (0, 1|1, 0)− P (0, 0|1, 1)
P (1, 0|1, 0)− P (0, 0|0, 0) ,
P (1, 0|0, 1)− P (0, 0|1, 1) ,
1−
∑
x1
P (a1 = 0|x1)−
∑
x2
P (a2 = 0|x2) +
∑
x1,x2
P (0, 0|x1, x2)} (3.34)
3.3 Classical theory with quasiprobabilities
As discussed in Chapter 2, an outcome distribution is said to be locally achievable
or classical if it can be written in such a way that, up to some shared statisti-
cal randomness, measurement outcomes on an individual system are determined
solely by the value of some (physical) variable which is unique to that system.
Specifically, P (a|x) is local if there exists for each system i some set of values
Λi which determine the output at that system, with a joint probability distribution
PΛ(λ) = PΛ(λ1, . . . , λN) for the local variable λi ∈ Λi, plus a set of conditional
probability distributions P (i)(ai|xi, λi) on system i, such that for each choice of xi
100
and λi,
∑
ai P
(i)(ai|xi, λi) = 1, and,
P (a|x) =
∑
λ1,...,λN
(
P (1)(a1|x1, λ1) · · ·P (N)(aN |xN , λN)
)
PΛ(λ). (3.35)
There are two distinct positivity conditions which one could relax in (3.35):
either the joint distribution PΛ(λ), or the local distributions P (i)(ai|xi, λi) may be
permitted to take on negative values. In this section, we demonstrate that each of
these modifications expands the set of distributions of the form given in (3.35) so
that it encapsulates all non-signaling distributions.
The simulation of quantum states and measurements with local quasiproba-
bility distributions has an established history, with the Wigner phase-space distri-
bution being a notable early example [67]. Let ψ(x) be the position-space wave
function of a quantum system with one dimension, in some pure state ψ. The
Wigner phase-space distribution is the following quasiprobability distribution:
P˜ (x, p) = 1
pi~
∫ ∞
−∞
ψ∗(x+ y)ψ(x− y)e2ipy/~dy. (3.36)
For some choices of position x and momentum p, P˜ (x, p) may well be negative.
However, the marginal distribution for x when averaging over p gives the exact
probability distribution for x, and vice versa:
∫ ∞
−∞
P˜ (x, p)dp = |ψ(x)|2, (3.37)
∫ ∞
−∞
P˜ (x, p)dx = |ψ(p)|2. (3.38)
Wigner was able to generalise his function to mixed states of several quan-
tum particles, and used it to calculate quantum correction terms to the thermody-
namic equilibrium of many-particle systems. More recently, and more relevant to
quantum information theory, Franco & Penna [76] analysed a finite-dimensional
version of the Wigner function, known as the discrete Wigner function, for the
101
quantum state of two qubit systems. They discovered that this function was in-
timately tied to the entanglement properties of the state: if at least one of values
taken on by the Wigner function falls below a critical negative threshold, then the
state is entangled.
Local quasiprobabilistic representations of quantum states have provided a
useful calculative tool in what may otherwise have been intractable problems.
However, the notion of general non-signaling correlations has not been around
for as long as quantum theory, and not as much attention has been devoted to lo-
cal quasiprobabilistic representations of arbitrary non-signaling outcome distribu-
tions. In Section 3.3.1 we demonstrate how to generate any non-signaling outcome
distribution by relaxing the positivity of PΛ; in Section 3.3.2 we do the same for
P (i)(ai|xi, λi). Then in Section 3.3.3, we show how these results lead to quantum
corollaries which extend the results of Section 3.1.
3.3.1 Non-positive mixtures
The local function P (i)(ai|xi, λi) is conditioned on the parameter λi, which so far
we have treated as simply some random variable taking values from some finite
set Λi. λi can be interpreted as providing a complete specification of all physi-
cal properties pertaining to the system, even if such a full specification is beyond
the full scope of whatever scientific theory is currently available to describe that
system. If λ = (λ1, . . . λN) takes on a single value with probability 1, the mul-
tipartite system behaves as if in a product state, i.e. P (a|x) is the product of the
local distributions P (i)(ai|xi, λi).
However, it is conceivable that λ is not deterministically a single value, but
takes on different values with different probabilities; in this case P becomes a
convex combination of the product states corresponding to each value of λ. The
joint probability distribution PΛ then describes a kind of statistical mixing over the
joint physical states of the multipartite system.
In this section we show that arbitrary non-signaling distributions can be gen-
erated if we relax the positivity requirement on PΛ: this is analogous not to the
102
local measurements generating outcomes in an unphysical manner, but rather to
an unphysical mixing of otherwise physically meaningful objects. Interestingly,
in our construction the product distributions for fixed choices of λi are completely
deterministic: for each measurement choice xi and λi there is a specific outcome
ai which occurs with probability 1. Obviously, not much information about the
distribution P is contained in the local distributions; instead it is PΛ which carries
this information.
Theorem6. AnN-partite conditional probability distributionP (a|x) is non-signaling
if and only if it can be represented in the form
P (a|x) =
∑
λ1,...,λN
(
P (1)(a1|x1, λ1) · · ·P (N)(aN |xN , λN)
)
P˜Λ(λ), (3.39)
where P˜Λ(λ) is a quasiprobability distribution, and P (i)(ai|xi, λi) is a conditional
probability distribution for each i.
Proof. The “if” direction of the proof is not difficult: each P (i)(ai|xi, λi) is a con-
ditional probability distribution for each choice of λi, and hence normalized. Fix-
ing the string (a|x) and some system i, then summing over all possible values of
ai gives:
Kxi
∑
ai=1
P (a|x) =
Kxi
∑
ai=1
∑
λ
(
P (1)(a1|x1, λ1) · · ·P (N)(aN |xN , λN)
)
P˜Λ(λ)
=
∑
λ
((
∑
ai
P (i)(ai|xi, λi)
)
∏
j 6=i
P (j)(aj|xj, λj)
)
P˜Λ(λ). (3.40)
For each λi, by normalization
(
∑
ai P
(i)(ai|xi, λi)
)
= 1, so that the RHS has no
dependence on xi. Hence the distribution P (a|x) is non-signaling, and has the
following well-defined marginals:
P (aΩ|xΩ) =
∑
λ
(
∏
i∈Ω
P (i)(ai|xi, λi)
)
P˜Λ(λ) (3.41)
103
To prove the converse, we must show that given any non-signaling distribution
P (a1, . . . , aN |x1, . . . , xN), there exists a set Λi of local variables at each system
i, a joint distribution P˜Λ on the strings (λ1, . . . , λN), and local outcome functions
P (i)(ai|xi, λi) such that (3.39) holds.
With this in mind, let Λi be the set of all possible ordered measurement choice/
outcome pairs [a′i, x′i], along with one additional value for each system, which
we will refer to as ξi. The size of the set Λi is then
∑
xi Kxi + 1. For a string
(λ1, . . . , λN) ∈ Λ1×· · ·×ΛN , let Ω = {i ∈ [N ] : λi 6= ξi} (i.e. the set of indices
i for which λi 6= ξi) and define
P˜Λ(λ) =
[
∏
i/∈Ω
(1−M (i))
]
P (aΩ|xΩ) (3.42)
so that, for example,
P˜Λ([a′1, x′1], [a′2, x′2], . . . , [a′N , x′N ]) = P (a′1, . . . , a′N |x′1, . . . , x′N)
P˜Λ(ξ1, [a′2, x′2], . . . , [a′N , x′N ]) = (1−M (1))P (a′2, . . . , a′N |x′2, . . . , x′N)
... = ...
P˜Λ(ξ1, ξ2, . . . , ξN) = (1−M (1)) · · · (1−M (N)). (3.43)
To show that P˜Λ is indeed a quasiprobability distribution, it must be checked
that the sum of PΛ(λ) over all possible strings λ is equal to 1. By separating out
this sum according to the subset Ω of components λ which do not equal ξi, we
obtain:
∑
λ1,...,λN
P˜Λ(λ) =
∑
Ω⊆[N ]
∑
aΩ,xΩ
[
∏
i/∈Ω
(1−M (i))
]
P (aΩ|xΩ) . (3.44)
Note that since P (a|x) is a normalized outcome distribution, the sum over aΩ of
104
P (aΩ|aΩ) is equal to 1. Summing also over xΩ gives:
∑
λ1,...,λN
P˜Λ(λ) =
∑
Ω⊆[N ]
[
∏
i/∈Ω
(1−M (i))
][
∏
j∈Ω
M (j)
]
. (3.45)
The RHS is now the expansion of the product of terms of (M (i) + (1−M (i))) as
i ranges from to N , each term equaling 1;
∑
λ1,...,λN
P˜Λ(λ) =
∏
i∈[N ]
(
M (i) + (1−M (i))
)
= 1. (3.46)
The P (i)(ai|xi, λi) are defined to be the following conditional probability dis-
tributions, which treat the maximal outcome values as a special case:
P (i)(ai|xi, λi) =



δλi,[ai,xi] if ai < Kxi
1−
∑
a′i<Kxi
δλi,[a′i,xi] if ai = Kxi
(3.47)
Suppose that P˜ ′ (a|x) is the conditional quasiprobability distribution generated
by the local functions P (i)(ai|xi, λi) according to the quasiprobabilistic mixture
P˜Λ, i.e.
P˜ ′ (a|x) =
∑
λ
(
P (1)(a1|x1, λ1) · · ·P (N)(aN |xN , λN)
)
P˜Λ(λ). (3.48)
By exactly the same argument as in the “if” direction at the beginning of this proof,
P˜ ′ is non-signaling. Let Ω be any subset of [N ] and let (aΩ|xΩ) be any nowhere-
maximal string. The marginal distribution P˜ ′ (aΩ|xΩ) is then given by:
P˜ ′ (aΩ|xΩ) =
∑
λ
(
∏
i∈Ω
P (i)(ai|xi, λi)
)
P˜Λ(λ). (3.49)
Note that for non-maximal ai, P (i)(ai|xi, λi) = δλi,[ai,xi], thus the only strings λ
contributing to the sum are those which, for all i ∈ Ω, have λi = [ai, xi]. For any
105
such λ, let Ω′ ⊆ [N ] be the subset of components not equal to ξi, and note that
Ω ⊆ Ω′. By collecting summands for which Ω′ is the same, we have,
P˜ ′ (aΩ|xΩ) =
∑
Ω′:
Ω⊆Ω′⊆[N ]







[
∏
i/∈Ω′
(
1−M (i)
)
]
∑
aj ,xj :
j∈Ω′\Ω
P (aΩ′|xΩ′)







. (3.50)
For each Ω′, summing P (aΩ′|xΩ′) over aj for j ∈ Ω′ \Ω gives the marginal value
P (aΩ|xΩ). Summing also over xj for j ∈ Ω′ \ Ω gives a factor ofM (i), so that,
P˜ ′ (aΩ|xΩ) =
∑
Ω′:
Ω⊆Ω′⊆[N ]



[
∏
i/∈Ω′
(
1−M (i)
)
]


∏
j∈Ω′\Ω
M (j)

P (aΩ|xΩ)



. (3.51)
By the same reasoning as used for showing that P˜Λ is normalized, the sum over
Ω′ can be rewritten as a product of terms
(
M (i) + (1−M (i))
)
for i ∈ Ω′ \ Ω,
which (after factoring out P (aΩ|xΩ)) is therefore equal to 1. Hence P˜ ′ (aΩ|xΩ) =
P (aΩ|xΩ).
P˜ ′ (a|x) is a non-signaling conditional quasiprobability distributionwhich agrees
with P (a|x) for all non-maximal marginal values. It follows from Lemma 1 that
P˜ is in fact the outcome distribution P .
We mentioned before proving the Theorem that the value of λi can be seen
as a codification of the local physical state, or configuration, which completely
determines the local outcome probabilities via the distribution P (i)(ai|xi, λi). The
size of the set of strings of local variables Λ1×· · ·×ΛN can loosely be seen as an
indicator of how efficient that particular representation is; note that in the above
proof, the size of this set is is
∏N
k=1
(
1 +
∑
xi Kxi
)
.
It is possible to reduce this number by noting that certain values of λi give rise
to exactly the same local functions. In particular, when λk is equal to either ξi or
[Kxi , xi] it will always be the case that δ[ai,xi],λi = 0 for ai < Kxi . This implies that
each local conditional probability distribution deterministically assigns the great-
106
est value outcome Kxi for all measurements xi: P (i)(ai|xi, λi) = δai,Kxi . Com-
bining this set of local states into a single state ηi, say, for which P (i)(ai|xi, ηi) =
δai,Kxi , leaves the generated probability distribution intact, as long as the statisti-
cal weighting of any given product of local distributions
∏N
i=1 P (i)(ai|xi, λi) is the
same.
For clarity, this compression of the local variable space is not used in the main
presentation of the theorem. However, it brings the size of Λ1 × · · · × ΛN down
to
∏N
i=1(1 +
∑
xi(Kxi − 1)). This has a striking comparison with Lemma 1, in
which we deduced that a conditional quasiprobability distribution is characterized
by its set of non-maximal, marginal values. The number of such values is:
∑
∅6=Ω⊆[N ]
(
∏
i∈Ω
[
∑
xi
(Kxi − 1)
])
=
N
∏
i=1
(
1 +
∑
xi
(Kxi − 1)
)
− 1. (3.52)
The extra −1 on the RHS of (3.52) occurs because when Ω = ∅, we demand
that P (∅|∅) = 1 for normalization. This roughly corresponds to the case λ =
(η1, . . . , ηN), whose weighting can be interpreted as fixed by normalization of PΛ.
3.3.2 Non-positive effects
In this section we show that a dual result to Theorem 6 holds in which the lo-
cal functions become conditional quasiprobability distributions, whilst the joint
distribution PΛ remains a genuine probability distribution. Recall that PΛ can be
regarded as a statistical mixture over the different possible products of local states,
where each such product is represented by a string λ specifying the local states on
each system. In this section we explore what can be generated when those local
states are allowed to produce certain outcomeswith negative probabilities. Despite
taking a statistical mixture of completely local objects, one is able to generate the
entire set of non-signaling outcome distributions.
Theorem7. AnN-partite conditional probability distributionP (a|x) is non-signaling
107
if and only if it can be represented in the form
P (a|x) =
∑
λ1,...,λN
(
P˜ (1)(a1|x1, λ1) · · · P˜ (N)(aN |xN , λN)
)
PΛ(λ) (3.53)
wherePΛ(λ) is a probability distribution, and P˜ (i)(ai|xi, λi) is a conditional quasiprob-
ability distribution for each i.
Proof. As in Theorem 6, the “if” direction is not difficult; since the P˜ (i)(ai|xi, λi)
are conditional quasiprobability distributions, they satisfy
∑
ai P˜
(i)(ai|xi, λi) = 1
(independent of the choice of xi). It follows from a simple algebraic substitution
that the distribution given by (3.53) is non-signaling.
Conversely, to prove that all non-signaling distributions P (a|x) can be writ-
ten in the form of (3.53), we again have to make a choice of local conditional
quasiprobability distributions and strings λ with a joint probability distribution
PΛ. Let Λi be the set of ordered pairs [a′i, x′i] consisting of the allowed measure-
ment choices and outputs for system i (note that we do not require the extra variable
ξi this time). The local variables are then given by:
PΛ([a′1, x′1], . . . , [a′N , x′N ]) =
P (a′1, . . . , a′N |x′1, . . . , x′N)
M (1)M (2) · · ·M (N)
(3.54)
Clearly, each value of PΛ(λ) is positive and lies in [0, 1]. Moreover, summing
P (a′|x′) over a′ gives 1 by normalization; the number of values of x′ to sum over
is then the product of the number of measurement choices at each site. Thus PΛ is
a normalized probability distribution.
The local distributions are defined as follows :
P˜ (i)(ai|xi, λi) =
{
M (i)δλi,[ai,xi] if ai < Kxi
1−
∑
a′i<Kxi
M (i)δλi,[a′i,xi] if ai = Kxi
(3.55)
For each choice of xi and λi, P˜ (i)(Kxi|xi, λi) is specifically constructed so
that the sum over ai of P˜ (i)(ai|xi, λi) is equal to 1. Therefore P˜ (i)(ai|xi, λi) is a
conditional quasiprobability distribution.
108
It remains to show that the given values for PΛ(λ) and P˜ (i)(ai|xi, λi) satisfy
(3.53). Consider the quasiprobability distribution P˜ ′ (a|x) given by
P˜ ′ (a|x) =
∑
λ1,...,λN
(
P˜ (i)(ai|xi, λi) · · · P˜ (N)(aN |xN , λN)
)
PΛ(λ). (3.56)
Again, a simple algebraic substitution demonstrates that P˜ ′ (a|x) is non-signaling.
For any Ω ⊆ [N ], and nowhere-maximal string (aΩ|xΩ),
P˜ ′ (aΩ|xΩ) =
∑
λ
(
∏
i∈Ω
P˜ (i)(ai|xi, λi)
)
PΛ(λ)
=
∑
(a′|x′)
(
∏
i∈Ω
M (i)δa′i,aiδx′i,xi
)
P (a′|x′)
∏
j∈[N ]M (j)
=
(
∏
i∈Ω
M (i)
)
(
∏
k∈[N ]\ΩM (k)
)
P (aΩ|xΩ)
∏
j∈[N ]M (j)
= P (aΩ|xΩ) . (3.57)
For example, when Ω = {1, 2},
P ′ (a1, a2|x1, x2) =
∑
λ1,...,λN
M (1)δλ1,[a1,x1]M (2)δλ2,[a2,x2]PΛ(λ1, . . . , λN).
= M (1)M (2)P (a1, a2|x1, x2)
M (1)M (2)
= P (a1, a2|x1, x2) . (3.58)
SinceP and P˜ ′ are quasiprobability distributionswhose non-maximalmarginals
agree, it follows again from Lemma 1 that they are identical and that (3.53) holds.
Again, it is interesting to look at the size of the set of strings of local variables;
in this case,
∏N
i=1(
∑
xi Kxi). A very similar reduction can bemade as for Theorem
6, since whenever λi = [Kxi , xi] for any xi, the local outcome function again takes
109
the form P (i)(ai|xi, λi) = δai,Kxi . Hence it is possible to collect theseM
(i) local
variables at each system i into a single local variable ηi, again adjusting the joint
probability distribution PΛ in order to generate the same outcome statistics and
preserve normalization. Just as before, this brings the total size of the set of strings
of local variables to
∏N
i=1
(
∑
xi(Kxi − 1) + 1
)
.
3.3.3 Quantum corollaries
In this section we show how quasiprobabilistic quantum representations may be
explicitly constructed from quasiprobabilistic local representations, thus recover-
ing the results of Section 3.1 as a corollary of Theorem 6, and obtaining a dual
quantum representation as a corollary of Theorem 7. This involves a careful con-
struction of density matrix ρ and measurements operatorsM (i)ai|xi which, upon ap-
plication of the standard Born trace rule, reduce to the local quasidistributions
constructed in Section 3.3.
Corollary 1. An N-partite conditional probability distribution P (a|x) is non-
signaling if and only if it can be represented in the form
P (a|x) = Tr
((
M (1)a1|x1 ⊗ · · · ⊗M
(N)
aN |xN
)
ρ˜
)
, (3.59)
where M˜ (i)ai|xi are genuine POVMelements (positive operators satisfying
∑
ai M
(i)
ai|xi =
I(i)) for each i, and ρ˜ is a Hermitian operator satisfying Tr (ρ) = 1. Furthermore
this representation can be chosen such that the operatorsM (1)a1|x1 ⊗ · · · ⊗M
(N)
aN |xN
and ρ˜ all commute.
Proof. To prove the “if” direction, sum the right-hand side of (3.59) over ai using
the fact that
∑
ai M
(i)
ai|xi = I
(i). To prove the converse, we assign a Hilbert space
with orthonormal basis {|λi〉|λi ∈ Λi} to each system i, where Λi is the set of all
ordered pairs [ai, xi] and the extra element ξi (as in the proof of Theorem 2). We
110
then take
ρ˜ =
∑
λ1,...,λN
P˜Λ(λ) |λ1〉〈λ1| ⊗ · · · ⊗ |λN〉〈λN |, (3.60)
M (i)ai|xi =
∑
λi
P (i)(ai|xi, λi) |λi〉〈λi|, (3.61)
where P˜Λ(λ) and P (i)(ai|xi, λi) are given by (3.42) and (3.47) respectively. Sub-
stituting these into (3.59) gives:
Tr
((
M (1)a1|x1 ⊗ · · · ⊗M
(N)
aN |xN
)
ρ˜
)
=
∑
λ
[
N
∏
i=1
P (i)(ai|xi, λi)
]
P˜Λ(λ). (3.62)
Thus we recover the local quasiprobability distribution constructed in the proof of
Theorem 6, and proven there to agree exactly with P (a|x).
A similar corollary to Theorem 7 holds, which provides a kind of dual result
to Theorem 5.
Corollary 2. An N-partite conditional probability distribution P (a|x) is non-
signaling if and only if it can be represented in the form
P (a|x) = Tr
((
M˜ (1)a1|x1 ⊗ · · · ⊗ M˜
(N)
aN |xN
)
ρ
)
, (3.63)
where M˜ (i)ai|xi ∈ L (Hi) are Hermitian operators satisfying
∑
ai M˜
(i)
ai|xi = I
(i) for
each i, and ρ is a genuine density operator. Furthermore, this representation can
be chosen such that the operators M˜ (1)a1|x1 ⊗ · · · ⊗ M˜
(N)
aN |xN and ρ all commute.
Proof. Again, the `if” direction follows directly from substitution of the fact that
∑
ai M˜
(i)
ai|xi = I
(i). To prove the converse, we use the results of Theorem 7. To
each system i, we assign a Hilbert space with orthonormal basis {|λi〉|λi ∈ Λi},
111
where Λi is the set of all ordered pairs [ai, xi]. We then take
ρ =
∑
λ1,...,λN
PΛ(λ) |λ1〉〈λ1| ⊗ · · · ⊗ |λN〉〈λN |, (3.64)
M˜ (i)ai|xi =
∑
λi
P˜ (i)(ai|xi, λi) |λi〉〈λi|, (3.65)
where PΛ(λ1, . . . , λN) and P˜ (i)(ai|xi, λi) are given by (3.54) and (3.55) respec-
tively. Substituting this choice of operators into (3.63) leads to exactly the same
equation as (3.53): therefore the result follows directly from Theorem 7.
Note that, from the discussions directly following the proofs of these Theorems
6 and 7, the dimension of the density operator in both Corollaries can be taken to
be d =
∏N
i=1
(
1 +
∑
xi(Kxi − 1)
)
.
3.4 Further quasiprobabilistic quantum models
In Section 3.3 it was shown that there are two distinct modifications of classical
probability theory which allow one to generate arbitrary non-signaling distribu-
tions. These are dual to one another: one way is to allow the local distributions
to take on negative values, the other way is to allow the joint probability distri-
bution to take on negative values. Each of these classical representations admits
a quantum-like corollary, in which the POVM elements and the density opera-
tors are all diagonal with respect to some basis, but one or the other are no longer
positive operators.
In this Section we discuss further quantum-like representations in which the
observables are represented by non-positive operators. In contrast to previous
sections, the density matrix does not depend on the specific probability distribu-
tion P (a1, . . . , aN |x1, . . . , xN) that is being generated. In fact, the density matrix
that we use will be not only separable, but fixed by the number of measurements
and measurement outcomes on each subsystem. This provides a result which is
112
stronger than Corollary 2, in which the density matrix varies according to the out-
come distribution being generated. We then demonstrate the interesting result that
it is possible to reduce the correlation present in our fixed density matrix so that
it is arbitrarily close to the set of product states, whilst still being able to generate
arbitrary non-signaling distributions by acting on it with commuting, non-positive
observables. The cost of this manoeuvre is that the negativity of the measurement
operators becomes much larger.
3.4.1 Non-positive observables acting on ρN,d
From here on we restrict ourselves to the scenario in which each local measure-
ment has the same number of possible outcomes, given by the integerK, and each
party has the same number of measurement choices, an integer M . This is in-
tended to simplify the presentation, and (as will be seen) allows for an intriguing
generalization in the following section.
Once M , K, and the number of parties is fixed, the density matrix we will
use is also fixed, and only the measurement operators vary in order to generate
different distributions. In an N -partite quantum system where each local Hilbert
space has dimension d and a chosen basis {|i〉}di=1, define the separable state:
ρN,d = 1
d
(
|1〉⊗N〈1|⊗N + . . .+ |d〉⊗N〈d|⊗N
)
. (3.66)
Note that this matrix depends only on the two parameters N and d: we will use
different values of N and and d and hence different density matrices at various
stages in the proof, before giving the final density matrix (which is also defined as
ρN,d for specified N and d).
The following Lemma hints at the usefulness of this particular separable state
for our purposes. If, for the moment, we drop not only the positivity of mea-
surement operators, but also the requirement that summing over a measurement
choice xi gives the identity, then it is possible to generate any set of real (or, in-
deed, complex) numbers via the Born trace rule, not only those that are normalized
113
or non-signaling.
Lemma 3. Let Q (a1, . . . , aN |x1, . . . , xN) = Q (a|x) be any set of real numbers,
not necessarily positive, normalized or non-signaling. Then there exists an integer
d and operators M˜ (i)ai|xi on a Hilbert space of dimension d, such that all theN -fold
tensor products of these operators commute with ρN,d, and
Q (a|x) = Tr
((
M˜ (1)a1|x1 ⊗ · · · ⊗ M˜
(N)
aN |xN
)
ρN,d
)
(3.67)
Proof. Let d = (M · K)N−1 and let {|ea|x〉 : 1 ≤ a ≤ K, 1 ≤ x ≤ M} be an
orthonormal basis of CM ·K . In the (N -1)-fold tensor product space CM ·K ⊗ . . .⊗
CM ·K , for 1 ≤ i ≤ N − 1 define the matrices:
M˜ (i)ai|xi = I
(1) ⊗ · · · ⊗ I(i−1) ⊗ |eai|xi〉〈eai|xi| ⊗ I(i+1) ⊗ · · · ⊗ I(N−1), (3.68)
and for i = N define
M˜ (N)aN |xN =
∑
(a′|x′):
a′N=aN ,
x′N=xN
(
d ·Q (a′|x′)
N−1
⊗
i=1
|ea′i|x′i〉〈ea′i|x′i|
)
, (3.69)
Note that all the above operators are defined on a d-dimensional local vector space
Cd. The vectors |ea1,...,aN−1|x1,...,xN−1〉 = |ea1|x1〉 ⊗ · · · ⊗ |eaN−1|xN−1〉 form an
orthonormal basis ofCd, with respect to which all the above operators are diagonal.
This is the basis which we will use to define the density matrix ρN,d as in (3.66);
that is to say,
ρN,d = 1
d
∑
a1,...,aN−1x1,...,xN−1
|ea1,...,aN−1|x1,...,xN−1〉⊗N〈ea1,...,aN−1|x1,...,xN−1 |⊗N (3.70)
Taking the trace of tensor products of the above operators acting on the state ρN,d
114
yields the following simplification:
Tr
((
M˜ (1)a1|x1 ⊗ · · · ⊗ M˜
(N)
aN |xN
)
ρN,d
)
=1
d
∑
a′1,...,a′N−1,
x′1,...,x′N−1
N
∏
i=1
〈ea′1,...,a′N−1|x′1,...,x′N−1 |M˜
(i)
ai|xi|ea′1,...,a′N−1|x′1,...,x′N−1〉. (3.71)
For i < N , by construction each term in the product is equal to zero, except in the
case where a′i = ai, x′i = xi, in which case the term is equal to 1. Thus,
Tr
((
M˜ (1)a1|x1 ⊗ · · · ⊗ M˜
(N)
aN |xN
)
ρN,d
)
= 1
d
〈ea1,...,aN−1|x1,...,xN−1 |M
(N)
aN |xN |ea1,...,aN−1|x1,...,xN−1〉
= Q (a1, . . . , aN |x1, . . . , xN) . (3.72)
This result is used repeatedly in the argument of the main proof of this section.
More precisely, it is applied once for each subset Ω ⊆ [N ] in order to generate
the non-maximal marginals of the distribution P on that subset. Note that these
non-maximal marginal values P (aΩ|xΩ) are simply a set of real numbers between
0 and 1; without the maximal outcome values ai, they do not in general obey the
normalisation or non-signaling conditions.
Theorem 8. Let P (a1, . . . , aN |x1, . . . , xN)) be any outcome distribution on N
parties, such that each party has M measurement choices, and K possible out-
comes for each measurement. The distribution is non-signaling if and only if there
exist (not necessarily positive) local operators M˜ (i)ai|xi on a Hilbert space of dimen-
sion d such that for each i and choice of xi,
∑K
ai=1 M˜
(i)
ai|xi = I
(i) (the identity on
system i) and
P (a1, . . . , aN |x1, . . . , xN) = Tr
((
M˜ (1)a1|x1 ⊗ · · · ⊗ M˜
(N)
aN |xN
)
ρN,d
)
. (3.73)
115
Proof. As with many of the proofs in this chapter, the “if” direction is little more
than an algebraic substitution of the condition
∑K
ai=1 M˜
(i)
ai|xi = I
(i). The converse
direction proceeds by applying the result of Lemma 3 to each subset Ω ⊆ [N ]
in a recursive manner, to obtain a set of matrices {M˜ (i)ai|xi (Ω)}ai<K . The final
measurement matrices will be the direct summands M˜ (i)ai|xi =
⊕
Ω⊆[N ] M˜
(i)
ai|xi (Ω),
and by construction at each stage we will ensure that these final matrices will
generate the correct marginal outcome statistics on Ω, for all nowhere-maximal
marginal strings (aΩ|xΩ).
Let d =
∑N
i=1(M · (K − 1))i−1
(N
i
)
(the reason for this will be clear later).
Applying Lemma 3 with Ω = [N ], there exist operators {M˜ (i)ai|xi ([N ])}ai<K of
dimension d([N ]) = (M · (K − 1))N−1 such that for all nowhere-maximal strings
(aΩ|xΩ),
Tr
((
M˜ (1)a1|x1 ([N ])⊗ · · · ⊗ M˜
(N)
aN |xN ([N ])
)
ρN,d([N ])
)
= d
d([N ])
· P (a1, . . . , aN |x1, . . . , xN) . (3.74)
Let Ω = {i1, . . . , ik} ⊂ [N ]. By Lemma 3 it is possible to find operators
M˜ (i)ai|xi (Ω) such that
Tr
((
ˆ⊗
i∈Ω
M˜ (i)ai|xi (Ω)
)
ρk,d(Ω)
)
= Q (a1, . . . , ak|x1, . . . , xk) (3.75)
for any set of numbersQ (a1, . . . , ak|x1, . . . , xk), as long as d(Ω) equals the num-
ber of possible strings (a1, . . . , ak|x1, . . . , xk).
Assuming that the operators M˜ (i)ai|xi (Ω
′) have been fixed for all Ω ⊂ Ω′ ⊆ [N ],
apply Lemma 3 to obtain operators M˜ (i)ai|xi (Ω) ∈ L
(
Cd(Ω)
)
for all |Ω| = k parties,
where d(Ω) = (M · (K − 1))|Ω|−1 , such that for all nowhere-maximal strings
116
(aΩ|xΩ),
d(Ω) · Tr
((
M˜ (i1)ai1 |xi1 (Ω)⊗ · · · ⊗ M˜
(ik)
aik |xik
(Ω)
)
ρk,d(Ω)
)
= d · P (aΩ|xΩ)
−
∑
Ω⊂Ω′⊆[N ]
d(Ω′) · Tr
((
M˜ (i1)ai1 |xi1 (Ω
′)⊗ · · · ⊗ M˜ (ik)aik |xik (Ω
′)
)
ρk,d(Ω′)
)
. (3.76)
For i /∈ Ω, define M˜ (i)ai|xiΩ to be the zero matrix, and define the overall mea-
surement operators to be the direct sums of all the above operators for that choice
of i, ai and xi:
M˜ (i)ai|xi =
⊕
Ω⊆[N ]
M˜ (i)ai|xi (Ω) ∈ L


⊕
Ω⊆[N ]
Cd(Ω)

 . (3.77)
The dimension of these operators is
∑
Ω⊆[N ] d(Ω), agreeing with the initial defini-
tion of d. Let ρN,d be defined according to the orthonormal basis of
⊕
Ω⊆[N ] Cd(Ω)
which is the union of the orthonormal bases obtained in each application of Lemma
3 on the vector space Cd(Ω). For any Ω ⊆ [N ], let ˆ
⊗
i∈ΩM˜
(i)
ai|xi (Ω) again denote
the N -fold tensor product whose ith components for i /∈ Ω are equal to the iden-
tity I(i). By expanding ρN,d in terms of the chosen basis vectors, we see that for
all nowhere-maximal strings (aΩ|xΩ),
Tr
((
ˆ⊗
i∈Ω
M˜ (i)ai|xi
)
ρN,d
)
= 1
d
d
∑
j=1
N
∏
i=1
〈j|M˜ (i)ai|xi|j〉. (3.78)
Splitting up the sum on the RHS according to d =
∑
Ω⊆[N ] d(Ω), substituting
〈j|I(i)|j〉 = 1, and noting that M˜ (i)ai|xi (Ω) is the zero matrix for i /∈ Ω:
Tr
((
ˆ⊗
i∈Ω
M˜ (i)ai|xi
)
ρN,d
)
= 1
d
∑
Ω′:
Ω⊆Ω′⊆[N ]
d(Ω′)
∑
j=1
|Ω′|
∏
i=1
〈j|M˜ (i)ai|xi (Ω
′) |j〉. (3.79)
117
Note that the summand corresponding to each subset Ω′ is in fact a scalar multiple
of the trace of the product of ˆ
⊗
i∈Ω′M˜
(i)
ai|xi (Ω
′) with the density operator ρN,d(Ω′).
Hence, separating out the term corresponding to Ω in the sum:
Tr
((
ˆ⊗
i∈Ω
M˜ (i)ai|xi
)
ρN,d
)
= d(Ω)
d
Tr
((
ˆ⊗
i∈Ω
M˜ (i)ai|xi (Ω)
)
ρN,d(Ω)
)
+
∑
Ω′:
Ω⊂Ω′⊆[N ]
d(Ω′)
d
Tr
((
ˆ⊗
i∈Ω
M˜ (i)ai|xi (Ω
′)
)
ρN,d(Ω′)
)
.
(3.80)
Substituting equation (3.76) for the first term on the RHS gives
Tr
((
ˆ⊗
i∈Ω
M˜ (i)ai|xi
)
ρN,d
)
= P (aΩ|xΩ) . (3.81)
So far, we have only defined the measurement operators for non-maximal pairs
(ai, xi) on each system i. However, as long as the remaining operators are defined
so that
∑
ai M˜
(i)
ai|xi = I
(i) for each i and choice of xi, then the quasiprobability
distribution generated by these measurement operators acting on the state ρN,d will
be non-signaling and normalized. By (3.81), the non-maximal marginals of this
quasiprobability distribution agree with those of P ; from Lemma 1 it then follows
that this distribution is exactly P (a|x). To complete the proof, we need simply
define M˜ (i)K|xi = I
(i) −
∑R−1
j=1 M˜
(i)
j|xi for each i and xi.
ExampleLetP (a1, a2|x1, x2) be any state of a g-bit system. In this case we require
d = 4, d({1, 2}) = 2 and d({1}) = d({2}) = 1. Starting with Ω = {1, 2},
118
following the technique in the proof of Lemma 3 we obtain
M (1)0|0 ({1, 2}) =
(
1 0
0 0
)
, (3.82)
M (1)0|1 ({1, 2}) =
(
0 0
0 1
)
, (3.83)
M (2)0|0 ({1, 2}) =
(
4P (0, 0|0, 0) 0
0 4P (0, 0|0, 1)
)
, (3.84)
M (2)0|1 ({1, 2}) =
(
4P (0, 0|1, 0) 0
0 4P (0, 0|1, 1)
)
. (3.85)
Similarly, following the technique with Ω = {1} and Ω = {2} generates the
following 1-dimensional matrices:
M (1)0|0 ({1}) = (4P (a1 = 0|x1 = 0)− 1), (3.86)
M (1)0|1 ({1}) = (4P (a1 = 0|x1 = 1)− 1), (3.87)
M (2)0|0 ({2}) = (4[P (a2 = 0|x2 = 0)− P (0, 0|0, 0)− P (0, 0|1, 0)]), (3.88)
M (20|1({2}) = (4[P (a2 = 0|x2 = 0)− P (0, 0|0, 1)− P (0, 0|1, 1)]). (3.89)
The measurement operators for A are therefore:
M (1)0|0 = Diag(1, 0, 4P (a1 = 0|x1 = 0)− 1, 0) (3.90)
M (1)0|1 = Diag(0, 1, 4P (a1 = 0|x1 = 1)− 1, 0) (3.91)
M (1)1|0 = I−M
(1)
1|1 , M
(1)
1|1 = I−M
(2)
1|2 , (3.92)
and the operators for B are:
119
M (2)0|0 = Diag(4P (0, 0|0, 0) ,
4P (0, 0|1, 0) ,
0,
4[P (a2 = 0|x2 = 0)− P (0, 0|0, 0)− P (0, 0|0, 0)]) (3.93)
M (2)0|1 = Diag(4P (0, 0|0, 1) ,
4P (0, 0|1, 1) ,
0,
4[P (a2 = 0|x2 = 1)− P (0, 0|0, 1)− P (0, 0|1, 1)]) (3.94)
M (2)1|0 = I−M
(2)
0|0 , M
(2)
1|1 = I−M
(2)
0|1 . (3.95)
The density operator ρN,d is represented by a 16x16 matrix with a 1 in the
(i, i)th position for i ∈ {1, 5, 9, 13} and 0s elsewhere.
3.4.2 Approximate product states
In Section 3.2.1, the choice of POVM elements depended only on the number of
parties, measurements and measurement outcomes; once these were fixed, it was
by modifying the non-positive density operator that any given non-signaling out-
come distribution could be generated. The procedure in Section 3.4.1 is in some
sense dual to this: the density operator ρN,d depends only on N (the number of
parties) and the local dimension d, which itself depends on the number of mea-
surement choices and outcomes. ρN,d is specially chosen to simplify the proof
of Lemma 3: specifically, multiplying ρN,d by a tensor-product of local matrices
and taking the trace leads to the simplification (3.71), enabling the construction of
local matrices which generate an arbitrary set of real numbers Q (a|x).
120
As we demonstrate in this section, it is possible to choose other initial density
matrices as long as some form of Lemma 3 still applies. Perhaps surprisingly,
it is possible to use any of a class of density operators which are of one greater
dimension than ρN,d, but which come arbitrarily close to the set of product states.
For ε ∈ R, define:
ρN,dε = (1− ε) |0〉⊗N〈0|⊗N + ε
d
∑
i=1
|i〉⊗N〈i|⊗N . (3.96)
This is a valid density matrix for 0 <  < 1/
√
d, and as ε approaches zero,
it approaches the product state |0〉⊗N〈0|⊗N . ρN,dε is defined on the vector space
(
C⊕ Cd
)⊗N , where d takes the same value as in the proof of Theorem 8. Recall
the construction of operators M˜ (i)ai|xi (Ω) for each Ω ⊆ [N ]. For non-maximal pairs
(ai, xi), consider the new operators:
N˜ (i)ai|xi(Ω) =
{
M˜ (i)ai|xi (Ω) if i 6= max (Ω)
1
d·εM˜
(N)
aN |xN (Ω) if i = max (Ω),
(3.97)
and define the non-maximal overall measurement operators as:
N˜ (i)ai|xi = 0⊕
⊕
Ω⊆[N ]
N˜ (i)ai|xi(Ω). (3.98)
For general subsets Ω and nowhere-maximal strings (aΩ|xΩ),
Tr
(
ˆ⊗
i∈Ω
N˜ (i)ai|xiρ
N,d
ε
)
= ε
d
∑
j=1
N
∏
i=1
〈j|N˜ (i)ai|xi(Ω)|j〉
= 1
d
∑
Ω⊆Ω′⊆[N ]
d(Ω′)
∑
j=1
|Ω′|
∏
i=1
〈j|M˜ (i)ai|xi (Ω
′) |j〉 (3.99)
where the same arguments as when obtaining (3.79) are used to deduce the second
line, plus the fact that for each subset Ω, for exactly one value of i does N˜ (i)ai|xi(Ω)
121
have a factor of 1d·ε . Equation (3.99) agrees precisely with equation (3.79), thus
recovering the non-maximal marginals P (aΩ|xΩ). Defining the remaining mea-
surement operators N˜ (i)Kxi|xi = I
(i) −
∑Kxi−1
ai=1 N˜
(i)
ai|xi is again sufficient to generate
the full outcome distribution P (a|x).
3.5 Discussion
Classical theory and quantum theory each admit two distinct quasiprobabilistic
extensions, all of which allow for the generation of arbitrary non-signaling corre-
lations. Moreover, the quantum case can be seen as following immediately from
the classical version. This suggests two things: firstly, it does not appear to be a
special property of quantum theory, or quantum-like formalisms, that allows for
non-signaling distributions to be generated in this manner. Secondly, it seems clear
that any general probabilistic theory which, like quantum theory and Boxworld,
“contains” classical probability theory, can also be modified to generate arbitrary
non-signaling distributions in this manner (note that this does not detract from the
importance of results such as that of Acín et al [1], which is still immensely useful
in providing a quantum-like framework for describing non-quantum correlations).
Here we use “contain” in the sense that local classical systems can be simulated by
some subset of states and measurements on local systems in that theory, and any
convex mixtures of product states are allowed in composite systems (in the lan-
guage of Chapter 2, the composite state space contains the min-tensor product of
the local state spaces). It would be interesting to pursue this line of investigation,
which concerns the relationship between classical theory and other non-signaling
probabilistic theories.
Recall that the joint probability distribution PΛ(λ) of a locally achievable out-
come distribution (3.35) is defined over a set of strings of the formλ = (λ1, · · · , λN),
where each λi corresponds to some local variable at system i. This local variable
is often taken to codify the local, physical state of affairs at system i. Before
Theorem 6 it was remarked that PΛ can be interpreted as a statistical mixture of
122
product states, each corresponding to a possible value of λ. This can be taken a
step further: in “epistemic” views of mechanics, such a non-deterministic proba-
bility distribution over states may well correspond to an observer's ignorance of
an underlying (perhaps deterministic) model, whose values codify the real state
of affairs. For various reasons (for example, instrumental imprecision) the ob-
server cannot access this real state of affairs, and must instead regard the system
as being in a convex mixture of these states. From this perspective, Theorem 7 is
particularly surprising: PΛ is a genuine probability distribution representing one's
ignorance about completely local states of affairs which, albeit non-physical due
to their negative values, cannot generate any non-locality of their own accord. De-
spite this, the mixture of those states generates the strongest non-locality which is
algebraically possible. This is particularly striking in the case of Section 3.4.2, in
which even the classical correlation shared between the parties becomes negligi-
ble. When quasiprobabilities are introduced, it is no longer sensible to think of
the joint probability distribution on local variables as an ignorance or mixture of
allowed local states, but rather as a non-local object in itself.
In the case of Theorem 6, it is the joint probability distribution PΛ that is al-
lowed to take on negative values, rather than the local outcome functions. Instead
of defining a convex combination of product states, PΛ now defines an “affine”
combination of product states. In our proof of Theorem 6, we showed that these
product states could in fact be taken to be products of pure states: states which
are completely deterministic on the local system. It is a reasonably well-known
fact that in Boxworld, the states of a joint system can be written as affine com-
binations of pure product states [29, 77]; similar results have also been obtained
using a measure-theory approach [78, 79]. Theorem 6 is therefore an alternative
and constructive proof of this result. On the other hand, to the best of the author's
knowledge, no result along the lines of Theorems 7 or 8 exists in the published
literature; these are important and intriguing results both because of their potential
novel applications in quantum information and foundations, and because of the
surprising and counterintuitive aspects highlighted in the previous paragraph.
123
It is interesting to examine the extent to which the local quasiprobability distri-
butions constructed in this chapter actually violate the positivity condition obeyed
by conventional probabilities. Since in some sense we are replacing the “weird-
ness” of non-locality with the “weirdness” of negative probabilities, one might
assume that more strongly non-local distributions lead tomore negative quasiprob-
abilities. This does not seem to be the case in our constructions; for example, in
Theorem 6, with an odd number of systems the largest negative value taken on
by PΛ will in fact be
∏N
i=1(1 −M (i)). This value depends purely on the number
of measurement choices at each system; even locally-achievable distributions will
result in highly negative quasiprobability distributions being constructed, so long
as the number of possible measurements is high.
It would nevertheless be interesting to explore whether other constructions ex-
ist which do align negativity with non-locality in some way. From the results of
Navascués et al [74, 75], in which a convergent hierarchy of conditions are given
for characterizing the set of quantum correlations, it seems unlikely that one could
determine that the initial outcome distribution P (a|x) is quantum-achievable sim-
ply by looking at the negativity of the local quasiprobability distributions simu-
lating it. However, a converse result may well be possible, in which quantum-
achievable distributions can always be represented using negative values no less
than a given bound (even if some non-quantum outcome distributions also do not
violate this bound).
Although in this chapter we have tended to refrain from discussing the phys-
ical meaning of negative probabilities, some considerations along these lines do
suggest themselves at this point. Given the accuracy with which local quasiproba-
bilistic representations may simulate any quantum - or indeed non-signaling - out-
come distribution, it seems natural to ask if we should be looking for some special
physical meaning associated to negative probabilities. Have we veered far off the
mark by investing so much research into non-locality, when perhaps negativity is
somehow more real? On the other hand, what if these quasiprobabilistic models
are no more than a purely mathematical exercise in equation-solving, simply made
124
much easier by the removal of linear positivity constraints? A clue to the answer
to these questions may lie in the variety of approaches described in this chapter.
Rather than there being a single, canonical method for deriving local or quan-
tum quasiprobabilistic representations, there appears to be a multitude, of which
we have likely only scratched the surface! The sheer redundancy of quasiproba-
bilistic representations lends credibility to the opinion that although they might be
useful for the purpose of calculation (a laWigner), it might be futile to search for
anything special they have to tell us about the world.
125
126
Chapter 4
Information Causality
All reality is a game. Physics at its most fundamental, the very fabric
of our universe, results directly from the interaction of certain fairly
simple rules, and chance; the same description may be applied to
the best, most elegant and both intellectually and aesthetically sat-
isfying games. By … resulting from events which, at the sub-atomic
level, cannot be fully predicted, the future … retains the possibility of
change, the hope of coming to prevail…. In this, the future is a game;
time is one of its rules.… The very first-rank games acknowledge the
element of chance, even if they rightly restrict raw luck.
“Player of Games”
Iain M. Banks
127
4.1 Non-local games
Non-local games provide a convenient and intuitive setting in which to explore
which correlations are more useful than others for being exploited in information-
based tasks [7,80-83]. In a non-local game, two parties (whom, following conven-
tion, we will refer to as Alice and Bob in this chapter) each receive some separate
input, and, without communicating, return an output. They win the game if, ac-
cording to some predicate, their outputs are “correct” with respect to their inputs.
This predicate generally refers to whether the joint output is correct with respect
to the joint input, rather than Alice or Bob individually being correct or not. Their
goal is to attempt to maximise their average rate of success, given the probabilities
with which their inputs are distributed.
Related to the notion of a non-local game is that of a distributed computation.
The setup is very similar: Alice and Bob are each given an input, andmust produce
outputs which are jointly correct with respect to the joint input. The difference is
that Alice and Bob are allowed to communicate after they know their inputs, but
must keep their communication to a minimum. Further restrictions may be in-
troduced, for example the communication might only be one-way from Alice to
Bob. Note that there are two distinct ways to optimize a distributed computation:
firstly, the probability of success must be maximised for a fixed amount of com-
munication (for example, one bit); and secondly, the amount of communication
must be minimised whilst achieving at least some fixed probability of success (for
example, winning every time).
To put this on a rigorous footing, suppose that Alice's inputs belong to some
finite setX1, and Bob's inputs belong to a finite setX2. A probability distribution
pi(x1, x2) determines the likelihood of Alice and Bob receiving the pair of inputs
x1 ∈ X1 and x2 ∈ X2. Alice and Bob then output a pair of values a1 and a2
from the finite sets A1 and A2 respectively, and their outputs are judged according
to some predicate Va1,a2,x1,x2 taking values in {0, 1} (one could conceive of an
extension in which V instead assigns a weighting in [0, 1], however the case where
answers are either “right” or “wrong” is interesting enough for our purposes). The
128
Figure 4.1: (i) Non-local game; (ii) Distributed computation.
functions pi and V then define a nonlocal game G = G(pi, V ).
Whatever strategy Alice and Bob employ, the probability of their outputs are
given by a conditional probability distribution P (a1, a2|x1, x2). If no communica-
tion is allowed, thenP will be a non-signaling outcome distribution. Their average
success probability is then given by the formula:
PG =
∑
x1,x2
pi(x1, x2)
∑
a1,a2
P (a1, a2|x1, x2)Va1,a2,x1,x2 . (4.1)
If Alice and Bob are restricted to classical strategies, i.e. the set of locally
achievable outcome distributions, thenP is a convex combination of product states:
P (a1, a2|x1, x2) =
∑
λ
P (1) (a1|x1, λ1)P (2) (a2|x2, λ2)PΛ(λ) (4.2)
129
Substituting this into (4.1) gives:
PG =
∑
x1,x2
pi(x1, x2)
∑
a1,a2
(
∑
λ
P (1) (a1|x1, λ1)P (2) (a2|x2, λ2)PΛ(λ)
)
Va1,a2,x1,x2
=
∑
λ
PΛ(λ)
(
∑
x1,x2
pi(x1, x2)
∑
a1,a2
P (1) (a1|x1, λ1)P (2) (a2|x2, λ2)Va1,a2,x1,x2
)
.
(4.3)
Hence the probability of success at a classically achievable strategywill be an aver-
age of the probabilities of success for some set of product strategies. The optimal
classical probability of success is therefore attained via some product strategy.
This argument can be taken further: for any convex set of non-signaling distri-
butions, the optimal success probability will always be attained via a distribution
which cannot be written as a convex combination of other distributions (i.e. one
which is an extreme point of the set).
Any local strategy P (1) (a1|x1) employed at Alice's end is in fact a convex
combination of deterministic strategies a˜1 : X1 → A1 which assign a definite
output for each input. To see this, let each such strategy a˜1 be weighted by the
following product:
qa˜1 =
∏
x′1
P (1) (a˜1(x′1)|x′1) . (4.4)
Let Q be the strategy which is the convex combination of deterministic strategies
aˆ according to the above weighting. Then for any pair (a1, x1), we have:
Q (a1|x1) =
∑
a˜1
δ(a˜1(x1), a1)qa˜1
=
∑
a˜1
δ(a˜1(x1), a1)
∏
x′1
P (1) (a˜1(x′1)|x′1)
= P (1) (a1|x1)
∑
a˜1:a˜1(x1)=a1
∏
x′1 6=x1
P (1) (a˜1(x′1)|x′1)
= P (1) (a1|x1) (4.5)
130
The same is true for any local strategy that Bob employs, hence any product strat-
egy (and any classical strategy) is a convex combination of deterministic product
strategies. The optimal probability of success for classical strategies is therefore
the maximum probability of success over the (finite) set of deterministic product
strategies:
PG(C) = max
a˜1,a˜2
∑
x1,x2
pi(x1, x2)Va˜1(x1),a˜2(x2),x1,x2 . (4.6)
If Alice and Bob have access to a shared, entangled quantum state, then P
takes the form of a quantum-achievable outcome distribution. Alice and Bob per-
form quantummeasurements which depend on their inputs x1 ∈ X1, x2 ∈ X2, and
the outcome of the measurement will determine their individual outputs. Specif-
ically, suppose that they share one part each of a density matrix ρ belonging to a
bipartite quantum system H1 ⊗H2. Each of Alice's possible inputs x1 ∈ X1 will
correspond to a set of positive operators {M (1)a1|x1}which sum to the identity inH1:
∑
a1 M
(1)
a1|x1 = I
(1). Similarly, each each of Bob's inputs x2 ∈ X2 corresponds to
a set of positive operators {M (2)a2|x2} satisfying
∑
a2 M
(2)
a2|x2 = I
(2). The outcome
distribution can then be written:
P (a1, a2|x1, x2) = Tr
(
M (1)a1|x1 ⊗M
(2)
a2|x2ρ
)
(4.7)
By purifying the state ρ in larger Hilbert spacesH1,H2, the outcome distribu-
tion can also be realised by quantum measurements on a pure state |ψ〉, so that,
P (a1, a2|x1, x2) = 〈ψ|M (1)a1|x1 ⊗M
(2)
a2|x2 |ψ〉, (4.8)
and the probability of winning the game is given by,
PG =
∑
x1,x2
pi(x1, x2)
∑
a1,a2
〈ψ|M (1)a1|x1 ⊗M
(2)
a2|x2 |ψ〉Va1,a2,x1,x2 . (4.9)
The optimal quantum strategy PG(Q) is therefore the supremum of (4.9) over
all Hilbert spacesH1,H2, and density matrices and POVMs operators defined on
131
those spaces. Unlike classical strategies, there is no simple reduction to a maxi-
mization problem over a finite set of strategies; indeed, it is not generally known
whether the optimal quantum strategy can always be achieved, or if instead there
exist games where, as the dimensions ofH1 andH2 increase, the optimal value is
approached but is never quite reached [80].
In this section we review some of the literature on non-local games, and discuss
in detail a particular game known as the Information Causality game. In the fol-
lowing sections we then study the Information Causality game from two intriguing
perspectives. Firstly, in Section 4.2, we analyse the players' probability of success
in the Information Causality game and thus derive a bound on a class of non-local
games known as Random Access Codes, proving a conjecture given in [3]. The
results of this section were obtained in collaboration with Anthony Short and pub-
lished in [44]. Secondly, in Section 4.3, we reformulate the standard description
of Information Causality in terms of entropies, arriving at an elegant derivation of
Tsirelson's bound from entropic considerations. The results of Section 4.3, also
published in [44] are substantially the author's own work.
4.1.1 The CHSH game revisited
There are several non-local games for which the optimal quantum strategy is known
to be achieved in low dimensions [47,80,83]. A good example of this is the CHSH
game introduced in Section 2.1, in which quantum strategies clearly outperform
their classical counterparts. Recall that in the CHSH game the input and output sets
are all of size 2, and that for convenience we will take these sets to be {0, 1}. The
input distribution pi is uniform over the 4 possible input pairs, and the predicate V
takes the following form:
Va1,a2,x1,x2 = δ(a1 ⊕ a2 = x1 · x2), (4.10)
132
where “⊕” and “·” denote addition and multiplication modulo 2. Note that a1 ⊕
a2 = 0 if and only if the values of a1 and a2 are correlated, whereas it equals 1 if
and only if they are anti-correlated. The expression x1 ·x2 is equal to 1 if and only
if both x1 and x2 are equal to 1, i.e. for just one of the possible values of the pair
of inputs. Hence the predicate can be summarised in words as: “Alice and Bob's
outputs must be correlated for all but one choice of the pair of inputs, otherwise
they must be anti-correlated”. Since pi(x1, x2) is always equal to 14 , the probability
of success for any strategy with outcome distribution P can be written:
PG =
1
4
∑
x1,x2,a1,a2
P (a1, a2|x1, x2) δ(a1 ⊕ a2 = x1 · x2). (4.11)
As we have argued, the optimal classical success probability for the CHSH
game is found by considering the deterministic product strategies. It is easy to
check that for every choice of local deterministic strategies aˆ1 and aˆ2, the equation
aˆ1(x1)⊕ aˆ2(x2) = x1 · x2 is satisfied either for exactly 3 out of the four possible
input choices (e.g. aˆ1(x1) = 0 = aˆ2(x2) fails only when x1 = x2 = 1), or for
exactly 1 out of the four inputs (e.g. aˆi(xi) = xi succeeds only when x1 = x2 = 0).
Therefore PG(C) = 34 .
The optimal quantum probability of success for the CHSH game is 2+
√
2
4 , com-
monly known as Tsirelson's bound since his proof in [47]. Observe that the prob-
ability of success PG at the CHSH game is distinct from the CHSH value C intro-
duced in Section 2.1. However, if one assumes that Alice and Bob are given their
inputs uniformly at random, then it is easily checked that they are related in the
following manner:
PG =
C + 4
8
. (4.12)
Therefore from Tsirelson's bound it can be deduced that the optimal CHSH value
achievable in quantum theory is 2
√
2, as stated in Section 2.1.
As well as providing a concise indicator of how quantum theory deviates from
any local, classical theory, the CHSH game provides a stepping stone for exploring
non-signaling correlations that cannot be achieved in quantum theory. The PR-
133
box [48] is an example of a non-signaling outcome distribution which performs
perfectly at the CHSH game, and thus outperforms quantum theory. Recall that in
order to succeed, Alice and Bob's outputs must anti-correlate if and only if both
their inputs are equal to 1. The PR-box is defined in such a way that this clearly
occurs:
P (a1, a2|x1, x2) =
1
2
δ(a1 ⊕ a2 = x1 · x2). (4.13)
Note that whatever the input on Alice's side, her outcome will always be uniformly
random; the same is true for Bob, and hence this distribution is non-signaling.
The PR-box's violation of Tsirelson's bound demonstrates that the non-signaling
condition is not by itself sufficient to single out the set of quantum achievable out-
come distributions. This raises several natural questions. Why does Tsirelson's
bound take on this slightly peculiar value, and what is it about nature that forbids
us from going above it? A related question is: what physical principles, when
imposed in addition to the non-signaling condition, allow us to recover Tsirelson's
bound purely through information-theoretic arguments, rather than through the
paradigm of Hilbert spaces and density operators? In the following two sections,
we review some attempts to answer these questions. In Section 4.1.2 we discuss
the well-known result of W. van Dam, that if nature were to allow the existence
of PR-boxes, then all communication complexity tasks would become trivial [84].
In Section 4.1.3 we introduce the Information Causality game, which describes an
information-theoretic principle from which one can rule out not just the PR-box,
but any non-signaling distribution whose probability of success at the CHSH game
is greater than Tsirelson's bound.
4.1.2 Inner product games
In the Non-Local Inner Product game [84], Alice is given an n-bit string x and
Bob is given an n-bit string y. Then, after performing local operations but without
any communication, Alice must output a bit a and Bob must attempt output a bit
b such that a ⊕ b = x · y = ⊕ni=1(xi · yi). Note that if n = 1 and x and y are
134
chosen uniformly at random, then the Non-Local Inner Product game reduces to
the CHSH game.
Closely related to the Non-Local Inner Product game is the Distributed Inner
Product Computation [84]. In this task, Alice and Bob are again given bit strings x
and y; however, instead of both producing individual outputs, Alice is permitted to
transmit any number of bits to Bob, after which Bob attempts to output the value
x · y. As well as maximising the probability of success, the aim is to minimise
the number of bits which are transmitted from Alice to Bob. Note that a strategy
for the Non-Local Inner Product Game can easily be converted into a strategy for
the Distributed Inner Product Computation with one bit of communication and an
identical probability of success: instead of outputting her bit a, Alice transmits it
to Bob, who then calculates and outputs the sum (modulo 2) of a and his output b
of the Non-Local Inner Product game. This conversion of strategies will turn out
to be very important in Section 4.2.
Recall that if Alice and Bob share a single g-bit in the PR-box state, then they
can succeed with probability 1 at the CHSH game. Following [84], it can easily
be seen that with n shared PR-box states, they can also succeed with probability 1
at the Non-Local Inner Product game (and therefore perform all Distributed Inner
Product Computations perfectly with the transmission of a single bit). For each
value of i with 1 ≤ i ≤ n, suppose that Alice and Bob input the measurement
choices corresponding to xi and yi respectively into their half of the same shared
PR-box. The output of that PR-box will be two bits - ai for Alice and bi for Bob -
satisfying ai⊕bi = xi ·yi. After doing this for all n components of x and y, Alice's
135
overall output bit is a = ⊕ni=1ai, and Bob's overall output bit is b = ⊕ni=1bi. Then,
a⊕ b =
(
n
⊕
i=1
ai
)
⊕
(
n
⊕
j=1
bj
)
=
n
⊕
i=1
(ai ⊕ bi)
=
n
⊕
i=1
(xi · yi)
= x · y. (4.14)
As van Dam points out [84], not much work is required to extend this idea,
so that Alice and Bob can compute more sophisticated functions of their input bit
strings. It is a well-known fact that any Boolean function f(x, y) may be written
as a multi-variable polynomial in the components of x and y, and by collecting
terms in y it is possible to decompose f as:
f(x, y) =
2n
⊕
i=1
(Pi(x) ·Qi(y)) . (4.15)
where each Pi is a multi-variable polynomial and each Qi is a monomial, i.e. of
the form yi1 · . . . · yim for some subset {i1, . . . , im} ⊂ [n] (note that there are 2n
such subsets). For example, the function f(x, y) = y1 AND (x1 OR y2) can be
written as:
f(x, y) = y1 · (x1 ⊕ y2 ⊕ x1 · y2)
= [x1 · y1]⊕ [(x1 ⊕ 1) · y1 · y2] (4.16)
The distributed computation of a general function of two n-bit strings thus
reduces to the distributed computation of the inner product of 2n binary variables.
By using the bit Pi(x) as the input for Alice's half of each PR-box, and Qi(y) for
Bob's half, it is thus possible to compute any Boolean function in this way, with at
136
most 2n uses of the PR-box and one bit of communication from Alice to Bob.
If Nature allowed the existence of PR-box correlations, then the above protocol
would permit the distributed computation of arbitrary functions using trivial com-
munication complexity. This does not seem to immediately conflict with physical
intuition - after all, it's not as if the speed of computation is independent of the
spatial separation between Alice and Bob. However, as van Dam points out, it
renders the very notion of communication complexity defunct, since all functions
could be evaluated in this trivial fashion. This consequence of strongly nonlocal
correlations is “squarely against the world-view and experience of probably all re-
searchers in the field of complexity theory” [84]; the existence of non-trivial com-
munication complexity can be regarded as an intuitive physical principle which
rules out the existence of PR-box correlations.
Ruling out just the existence of PR-boxes, however, does not even come close
to ruling out the existence of all non-signaling correlations which are not quantum-
achievable. It would be nice if, for example, a similar protocol could be designed
which rules out all those correlations that violate the CHSH inequality beyond
Tsirelson's bound. Though this has been done for correlations whose maximal
CHSH violation lies in a defined interval above Tsireloson's bound [72], it is an
open question whether Tsirelson's bound provides a critical threshold value for
the non-triviality of communication complexity. Since the publication of van
Dam's protocol, various other natural principles have been shown to exactly re-
cover Tsirelson's bound [50-52], or at least come close to it [72]; in the next section
we describe the Information Causality principle, which is one of the better known
examples of this.
4.1.3 The information causality game
Like van Dam's protocol, Information Causality is formulated in terms of a game.
The Information Causality game [51] again involves two parties; a bit string a of
length n is chosen uniformly at random and given to Alice, whilst Bob is given
a number k, 1 ≤ k ≤ n. Alice may then transmit an m-bit message x to Bob,
137
after which Bob makes a guess at ak, the kth bit of Alice's original message. The
parties may decide on a shared strategy but play separately, and no communication
other than the message x is permitted. Note that the Information Causality game
can be formulated as an inner product game: if Bob's number k is represented as
a bit string y with a 1 in the kth place and zeros elsewhere, then Bob's is required
to make a guess at the output of the inner product function f(a, y) = a · y.
However, there is a crucial difference between the Information Causality game
and other non-local games; rather than attempting to maximise the average prob-
ability that Bob's guess is correct, Alice and Bob must try to maximise a slightly
unusual figure of merit. For any given strategy that Alice and Bob agree upon, sup-
pose that β(k) is the classical random variable encoding Bob's guess at the value
of ak (i.e. assuming he was given k). Then their degree of success is measured by
the function:
I =
N
∑
k=1
Ic(ak : β(k)), (4.17)
where Ic(A : B) denotes the classical mutual information of the joint probability
distribution A,B. Classical intuition suggests that since a was chosen at random
and independently of Bob's system, the total information Bob has about amust be
no greater than whatever can be extracted from m classical bits; accordingly the
Information Causality principle demands that for any strategy that Alice and Bob
employ,
I =
N
∑
k=1
Ic(ak : β(k)) ≤ m. (4.18)
The formula (4.18) can be seen as a generalisation of the non-signaling con-
dition as applied to Bob's system in the game, with the non-signaling condition
being the special casem = 0. To see this, let P (∅, β(k)|a, k) be the outcome dis-
tribution associated with the game (where the outcome at Alice's system plays no
role and so is denoted by ∅), and recall that the non-signaling condition is equiv-
alent to the existence of a well-defined marginal distribution P (β(k)|k) which is
independent of a; since β(k) is in fact Bob's ouput given the number k, P (β(k)|k)
138
is simply the distribution of the variable β(k): this distribution being independent
of a is in turn equivalent to (4.18) holding withm = 0.
Two further points are worth mentioning, concerning (4.18). Firstly, the in-
equality can be very simply saturated with a classical strategy in which Alice sim-
ply sends to Bob the firstm bits of a, so that Ic(ak : β(k)) = 1 if 1 ≤ k ≤ m, and
0 otherwise. Secondly, in (4.18) we have avoided defining measures of informa-
tion for states of systems which may not be classical systems. Although Alice and
Bob's strategy may involve bipartite states from general physical theories, Bob's
guess must ultimately take the form of a classical bit, and the figure of merit I is
a sum of mutual informations between purely classical random variables.
However, classical theory is not the only general physical theory which pos-
sesses a measure of mutual information; namely, in quantum theory the quantum
mutual information Iq(A : B) between systems A and B is defined as Hq(A) +
Hq(B) − Hq(A,B), where Hq is the von Neumann entropy. Iq is positive, sym-
metric, and generally behaves in accordance with our intuition about information
shared between two states. Equation (4.18) holds in any general physical theory
in which, for any two systems X and Y , there exists some measure of mutual
information I˜(X : Y ) which satisfies four basic properties [51]:
(i) Consistency: Whenever X and Y are both classical systems, I˜ reduces to the
classical mutual information, I˜(X : Y ) = Ic(X : Y )
(ii) Data Processing: For any allowed transformation τ : Y → Y ′, I˜(X : Y ) ≥
I˜(X : Y ′)
(iii) Chain Rule: For all tripartite systems X, Y, Z, I˜(X, Y : Z) − I˜(X : Z) =
I˜(Y : X,Z)− I˜(X : Y )
(iv) Symmetry: For all bipartite systems X, Y , I˜(X,Y ) = I˜(Y,X)
It is well known that all four properties hold both for the classical mutual in-
formation Ic and the quantum mutual information Iq in their respective general
139
physical theories [53]; hence, if Alice and Bob are restricted to making measure-
ments on shared classical or quantum bipartite states, they will never be able to
violate the Information Causality bound (4.18). Note however, that whilst prop-
erties (i), (ii) and (iv) seem very reasonable, property (iii) appears to be without
immediate physical significance. This will become important later in Section 4.3,
where it is shown that the Information Causality principle can be framed so that
requirement (iii) essentially disappears.
Given the existence of a mutual information which satisfies properties (i)-
(iv), we will demonstrate that (4.18) holds using the techniques presented in [51].
Firstly, observe that any two systems X and Y can be individually mapped to
independent classical systems (for example the map which simply traces out the
system, represented by τ : X → ∅). By Symmetry and Data Processing, one ob-
tains I˜(X : Y ) ≥ I˜(∅ : ∅). It then follows from Consistency that I˜(X : Y ) ≥ 0,
hence I˜ is always non-negative.
Let X be the classical system comprising the m bits which Alice transmits to
Bob; the equation m = I˜(X : X) follows immediately from Consistency. Con-
sider the tripartite system X,A, B, where A is the classical system containing Al-
ice's input a and B is Bob's system before Alice transmits her message x. The
state of system X is described by a set of probabilities px for each bitstring x, and
corresponds to a particular reduced state ψx of system a, B, such that the joint state
ofX,A, B is the convex combination of the product states (x, ψx) according to the
distribution px. The transformation τ : x → ψx therefore maps systemX to system
A, B, so that by Data Processing,
m = I˜(X : X) ≥ I˜(X : A, B). (4.19)
Applying the Chain Rule to the RHS of (4.19) withX = B, Y = X andZ = A
gives:
I˜(X : A, B) = I˜(X, B : A) + I˜(X : B)− I˜(A : B) (4.20)
Since a is chosen uniformly at random, it is independent of system B. There-
140
fore the state of A, B can be constructed from A, ∅ by simply mapping ∅ to the
reduced state at system B. By Data Processing and Consistency, I˜(A, B) ≤
Ic(A, ∅) = 0. Since I˜ is non-negative, I˜(A : B) = 0. Also, I˜(X : B) ≥ 0,
so that:
I˜(X : A, B) ≥ I˜(X, B : A) = I˜(A : X, B) (4.21)
where the right-hand equality follows from Symmetry. Writing the system A as
A1,A{2,...,n}, and applying the Chain rule with X = A1, Y = A{2,...,n} and Z =
X, B to the RHS of (4.21) gives:
I˜(A1,A{2,...,n} : X, B) =I˜(A{2,...,n} : A1,X, B)
+ I˜(A1 : X, B)− I˜(A1 : A{2,...,n}). (4.22)
Since A1 and A{2,...,n} are independent classical systems, the rightmost term is
equal to 0. Applying the trace-out map to system A1 in the tripartite system
A1,X, B then gives:
I˜(A1,A{2,...,n} : X, B) ≥ I˜(A{2,...,n} : X, B) + I˜(A1 : X, B). (4.23)
This inequality can be iterated n − 1 times, writing A{2,...,n} as A2,A{3,...,n} etc...
, resulting in the inequality:
I˜(A : X, B) ≥
n
∑
k=1
I˜(Ak : X, B). (4.24)
Bob's output β(k) is derived ultimately fromX andB, hence there must exist some
map from the state of X, B to β(k). Hence by Data Processing and Consistency,
n
∑
k=1
I˜(Ak : X, B) ≥
n
∑
k=1
Ic(Ak : β(k)) = I. (4.25)
Putting together the equations (4.19), (4.21), (4.24) and (4.25) gives the desired
resultm ≥ I .
141
So far, we have demonstrated that the Information Causality principle holds
in any general physical theory which, like classical or quantum theory, admits a
sensible measure of mutual information. We now present the derivation of the fact
that, for general physical theories allowing violations of Tsirelson's bound, Infor-
mation Causality does not hold (again, this is based on the derivation presented
in [51]). We first describe the protocol in the case where Alice and Bob have ac-
cess to perfect PR-boxes, and then move onto the case of noisy PR-boxes, which
give correct outputs only with a specified probability.
Recall that n is the number of bits given to Alice, and suppose that n = 2r
for some integer r. We will demonstrate that if Alice and Bob have access to an
initial supply of many shared PR-boxes, it is possible to perfectly evaluate the
function f(a, k) = ak using only r chosen outputs of the PR-boxes and one bit
of communication. This is easy to demonstrate in the case r = 1: Alice inputs
a1 ⊕ a2 and obtains output c, whilst Bob inputs (k − 1) and obtains output b. By
definition of the PR-box, b⊕ c = (k− 1) · (a1⊕ a2), hence a1⊕ b⊕ c = ak. Alice
needs only send Bob the value x = a1 ⊕ c, which he can then sum with his output
b to obtain his desired bit ak.
For r > 1, write a = a(0)a(1) as a concatenation of two bit-strings of length
n/2, and let (k1, . . . , kr) be a bit-string representation of (k− 1), so that the value
of the rth bit kr indicates whether ak belongs to either a(0) or a(1). Assume that
the protocol has been constructed for the case n = 2r−1, and suppose that Alice
and Bob perform the protocol for both of Alice's strings a(0) and a(1) - with Bob
using the input represented by (k1, . . . , kr−1) in both cases - but that Alice does not
transmit any messages to Bob. Suppose that Alice would normally communicate
the single bit x(0) to Bob had they been performing the protocol only for the string
a(0), and similarly the bit x(1) had they been performing the protocol only for a(1).
Note that Bob now wishes to know the value of exactly one of x(0) or x(1), depend-
ing on whether kr is 0 or 1. This reduces the task to the r = 1 case, hence by the
previous paragraph, using one extra PR-box Bob may obtain the value of bit x(kr),
and complete the protocol for the (n/2)-bit string a(kr). Figure 4.2 depicts this
142
protocol in the case where r = 2 and k = 3; note that whilst the general protocol
requires three PR-box states, Bob only makes use of two of the outputs.
Figure 4.2: How to violate the Information Causality protocol using perfect PR-
box states (r = 2, k = 3). Adapted from [51].
Now suppose that Alice and Bob use a noisy PR-box, which on inputs x, y
outputs bits a and b, whose individual values are uniformly random, such that
with probability p, a⊕ b = x · y and with probability 1− p, a⊕ b = (x · y)⊕ 1 (in
fact, any non-signaling outcome distribution can be brought to this form by means
of local operations, without affecting its overall CHSH value [85]). Tsirelson's
bound is violated by such a box if and only if p > 2+
√
2
4 [47]. In the protocol
with perfect PR-boxes, Alice and Bob compute their final answer by a modulo 2
summation of the outputs of r PR-boxes. Therefore in the noisy case, the correct
answer is obtained not just when all the PR-boxes function correctly, but if and
only if an even number of them output the wrong answer. Thus the probability of
143
Bob having the correct answer at the end of the protocol is given by:
b r2 c
∑
j=0
(
r
2j
)
pr−2j(1− p)2j = 1
2
[1 + (2p− 1)r] = 1 + E
r
2
, (4.26)
where E = p − (1 − p) is the bias of the PR-box. The figure of merit I can be
manipulated into a form into which we can subsitute (4.26). Expanding it in terms
of the classical entropy,
I =
n
∑
k=1
Ic(ak : β(k))
=
n
∑
k=1
H(ak)−Hc(ak|β(k)).
(4.27)
Since a is uniformly distributed, Hc(ak) = 1 for all k. If one has access to β(k),
then ak has the same distribution as ak ⊕ β(k), hence Hc(ak|β(k)) = Hc(ak ⊕
β(k)|β(k)), so:
I ≥
n
∑
k=1
1−Hc(ak ⊕ β(k))
=
n
∑
k=1
1− h(Prob(ak = β(k))), (4.28)
(4.29)
where h(p) is the binary entropy of the distribution {p, 1 − p}. Now substituting
144
the success probability (4.26) and using the inequality 1− h(1+y2 ) ≥
y2
2 ln 2 :
I ≥
n
∑
k=1
1− h
(
1 + Er
2
)
(4.30)
≥ 1
2 ln 2
n
∑
k=1
E2r (4.31)
= (2E
2)r
2 ln 2 (4.32)
In order for this last value not to become arbitrarily large with r, and exceed 1
(the value of m in this protocol), we require that E2 < 12 , i.e. p <
2+
√
2
4 . Hence,
Tsirelson's bound gives the precise threshold beyond which the violation of Infor-
mation Causality is possible.
4.2 Probability of success in the information causal-
ity game
As mentioned at the beginning of the previous Section, the Information Causality
game differs from other non-local games in that one attempts tomaximise a strange
function I on the inputs and outputs, rather than the probability of success. On
the other hand, Tsirelson's bound is commonly formulated as an upper limit on
the probability of success at the CHSH game. This contrast raises some natural
questions about Information Causality, and how one uses it to derive Tsirelson's
bound. If quantum theory cannot improve on the classical bound on I , does this
also imply that quantum theory also cannot improve on the optimal probability of
success? If not, how are probability of success and I (defined as in (4.18)) related?
In fact, it can easily be seen that the optimal quantum probability of success
in the Information Causality game is sometimes greater than the optimal classical
probability of success. For example, in a simple version of the game in which
n = 2 and m = 1, the optimal classical probability of success is 34 (e.g. when
145
Alice sends Bob x = a1 and he guesses β(k) = x regardless of k, they always win
when k = 1 and win half the time when k = 2). However, by exploiting quan-
tum measurements which saturate Tsirelson's bound, Alice and Bob can achieve a
success probability of 2+
√
2
4 . To do this, Alice and Bob first generate bits a
′ and b′
satisfying a′ ⊕ b′ = (a1 ⊕ a2) · (k − 1) with probability 2+
√
2
4 . Then Alice sends
Bob x = a′ ⊕ a1 and Bob outputs β(k) = b′ ⊕ x.
It is also possible to obtain very different values of I for strategies with the
same probabilities of success. As above, Alice can send Bob her first bit to obtain
I = 1 and probability of success 34 ; alternatively, Alice and Bob can uniformly
“mix” this strategy with one where Alice sends Bob her second bit and he out-
puts that, so that the overall probability of success is the same but they win with
probability 34 if either k = 1 or k = 2. For this strategy,
I = Ic(a1 : β(1)) + Ic(a2 : β(2))
= 2 (Hc(β(1))−Hc(β(1)|a1))
= 2
(
1− h
(
3
4
))
≈ 0.38. (4.33)
The figure I seems to take heavily into considerationwhether the likelihood of suc-
cess is “shifted” into a few of Alice's components, or is evenly distributed across
them. Consider mixing the first-bit strategy with a small amount of noise: since
I is a continuous function, the value of I will be greater than 0.38. However,
the probability of success will now be slightly weighted toward a half, i.e. strictly
smaller. Thus, I and the probability of success are not even related monotonically.
4.2.1 Random Access Codes
With probability of success as the figure of merit, the Information Causality game
becomes very similar to the task of Random Access Coding [71]. In an (n,m, p)
Random Access Code (RAC), Alice has a random bit string a of length n, which
146
she codes into a bit string of lengthm < n which is transmitted to Bob. Bob must
then be able to recreate any one of Alice's bits ak with probability at least p. Much
of the research into Random Access Coding so far concerns the case wherem = 1
and where the bit string a is uniformly chosen, and attempts to find the optimal
success probability p as a function of n.
When Alice and Bob are restricted to classical strategies, the optimal success
probability in the case m = 1 is known; it is attained by using a “majority-vote”
strategy, in which Alice simply sends Bob the bit that most frequently occurs in
her string [71]. This gives an analytical success probability which yields a simple
approximation for large n using Stirling's formula:
PCsuccess =
1
2
(
1 + 1
2n−1
(
n− 1
bn−12 c
))
≈ 1
2
(
1 +
√
2
pin
)
. (4.34)
If Alice and Bob are given access to measurements on quantum states, there are
two further classes of RAC that can be considered. In an Entanglement-Assisted
RAC (EARAC) [3], Alice and Bob may have an initial supply of entangled states
upon which to perform measurements. In a Quantum RAC (QRAC), Alice and
Bob do not share any prior entanglement, however instead of transmittingm clas-
sical bits to Bob, Alice transmitsm qubits to Bob. Both scenarios have been inves-
tigated in the m = 1 case; for both EARACs and QRACs, the following formula
has been shown to be an upper bound on the probability of success [3, 71].
PQsuccess ≤
1
2
(
1 +
√
1
n
)
. (4.35)
Pawloski et al construct an EARAC protocol in [3] which attains the bound (4.35)
exactly when n is of the form 2j3k, for integers j and k. We show in Section
4.2.2 that this bound is achievable for all n, using strategies converted from a
related inner product game. For QRACs it is still not known for which n (4.35) is
achievable, however the optimal success probability is known to be lower-bounded
147
by the following formula [71]:
PQsuccess ≥
1
2
(
1 +
√
1
6pin
)
. (4.36)
4.2.2 Optimal success probability of EARACs
Since the Information Causality game is not itself judged in terms of probability of
success, it is interesting to explore inmore depth how it is used to derive a bound on
the probability of success at winning the CHSH game. With this in mind, a crucial
step in the argument appears to be the relation of entropy to PR-box probabilities,
in the approximated bound
I ≥
N
∑
k=1
1−H(Pk) ≥
1
2 ln 2
N
∑
k=1
E2k , (4.37)
where Ek is the success bias of the game, conditioned on Bob being given the
number k:
Ek = Prob(ak = β(k))− (1− Prob(ak = β(k)))
= 2Prob(ak = β(k))− 1. (4.38)
(Note that Ek is not the same as the bias E of a noisy PR-box state, as defined
in Section 4.1.3.) The protocol designed in Section 4.1.3 can also be seen as a
derivation of a quadratic bound for quantum strategies in the Information Causality
game:
N
∑
k=1
E2k ≤ 2 ln 2. (4.39)
It turns out to be very useful to look at a more direct way of deriving a similar
bound via the rules of quantum theory, rather than using the properties of mutual
information. Recall the Inner Product game defined in Section 4.1.2, in particular
the fact that any strategy can be converted to a strategy for the Distributed Inner
148
Product Computation with m = 1. Note also that the setup for the Information
Causality game is in fact an example of a Distributed Inner Product Computation.
Bob attempts to determine the value of a ·y, with the following additional promise
on the distribution of Bob's string y: with uniform probability, exactly one of the
bits (i.e. the kth bit) is 1, whilst the rest are non-zero.
In this Section we derive a quantum upper bound on the Inner Product game
which is very similar in form to the quadratic bias bound (4.39), and show that there
exist strategies that achieve this bound. By the above reasoning, such strategies
can be converted to strategies for Information Causality: we then show that one of
these converted strategies attains the optimal probability of success for EARACs,
given by (4.35), and hence gives the optimal probability of success for Information
Causality withm = 1. The following Lemma, proved in [86], is crucial in deriving
our results about the Inner Product Game.
Lemma 4. For any sub-normalized vectors u1, . . . , us and v1, . . . , vt in a real
Euclidean space of dimension min (s, t), there exists a density matrix ρ on a finite
dimensional Hilbert space H = H1 ⊗ H2, with ±1-valued Hermitian operators
A1, . . . , As on H1 and B1, . . . , Bt on H2 such that Tr((Ai ⊗Bj)ρ) = 〈ui, vj〉 for
all 1 ≤ i ≤ s, 1 ≤ j ≤ t.
Theorem 9. Suppose that Alice and Bob play the Non-local Inner Product Game,
where Alice's input x is chosen uniformly from bit strings of length n, and Bob's
input y is chosen according to any random distribution from bit strings of length
n. Let Ey be the bias of the game, conditioned on the value of y. Then the set of
biases {Ey} is achievable if, and only if,
∑
yE2y ≤ 1.
Proof. We first prove the “only if” direction. Since Alice and Bob do not com-
municate in the game, the outcome distribution P (a, b|x, y) of their outputs given
their inputs must result from a local POVMmeasurement on a shared density ma-
trix ρ. That is to say, for each value of x Bob receives, there is a pair of positive
operators {M (1)0|x ,M
(1)
1|x } which sum to the identity I(1), and for each value of y a
149
similar pair {M (2)0|y ,M
(2)
1|y } for Alice, such that:
P (a, b|x, y) = Tr
(
M (1)a|x ⊗M
(2)
b|y ρ
)
(4.40)
The POVM measurement (4.40) can be realised as a standard (projective)
quantum measurement on a density matrix ρ˜ belonging to a Hilbert space of larger
dimension; moreover, the state ρ˜ can be purified into a state |ψ〉 of even larger
dimension. Thus we can assume that Alice and Bob begin with the entangled pure
state |ψ〉, and their outputs are obtained by measurement of Hermitian operators
aˆx and bˆy respectively with eigenvalues in {0, 1}.
For this set of measurements, let Pxy denote the probability that a⊕ b = x · y
given that Alice and Bob are given the particular strings x and y. Suppose that a
+1 value is associated to the game if a ⊕ b = x · y (i.e. they succeed), and a −1
value is associated to it otherwise. Then the bias Exy = Pxy − (1 − Pxy) can be
seen as the expected value of the game, conditioned on the inputs x and y. In terms
of the state |ψ〉 and operators aˆx and bˆy, the bias has a simple form as the expected
value of the operator (−1)aˆx+bˆy+x·y.
Exy = 〈ψ|(−1)aˆx+bˆy+x·y|ψ〉. (4.41)
The bias of the game conditioned only on Bob's input y is given by the average
over Alice's inputs. Since Alice's bit string is chosen uniformly at random, this
is Ey = 12n
∑
xExy. To derive the quadratic bound on Bob's biases, define the
normalized states:
|A〉 = 1√
2n
∑
x
(−1)aˆx|ψ〉 ⊗ |x〉 (4.42)
|By〉 =
1√
2n
∑
x
(−1)bˆy+x·y|ψ〉 ⊗ |x〉 (4.43)
150
so that {|By〉} forms an orthonormal set satisfying 〈By|By′〉 = δyy′ . It follows that
〈A|
(
∑
y
|By〉〈By|
)
|A〉 =
∑
y
〈A|By〉2
=
∑
y
(
1
2n
∑
x
∑
x′
〈x|x′〉〈ψ|(−1)aˆx(−1)bˆy+x′·y|ψ〉
)2
=
∑
y
(
1
2n
∑
x
〈ψ|(−1)aˆx+bˆy+x·y|ψ〉
)2
=
∑
y
E2y . (4.44)
Since
∑
y |By〉〈By| is a projector, and |A〉 is normalised,
∑
y
E2y = 〈A|
(
∑
y
|By〉〈By|
)
|A〉 ≤ 1. (4.45)
We now prove the “if” direction: suppose that the real numbers By satisfy
∑
yB2y ≤ 1, and let the vectors ey be the standard basis vectors of some real
vector space. Define the sub-normalised vectors ux =
∑
y(−1)x·yByey and vy =
ey. By Lemma 4 (after purifying the state ρ), there exists a state |ψ〉 and 0, 1-
valued operators aˆx and bˆy (where e.g. (−1)aˆx = Ax) such that 〈ψ|(−1)aˆx+bˆy|ψ〉 =
〈ux, vy〉 = (−1)x·yBy, and hence
Ey =
1
2n
∑
x
Exy
= 1
2n
∑
x
(−1)x·y〈ψ|(−1)aˆx+bˆy |ψ〉
= 1
2n
∑
x
By
= By
151
Corollary 3. There exists a quantum strategy for the Information Causality game
withm = 1 which succeeds with probability 12
(
1 + 1√n
)
.
Proof. Suppose that Alice and Bob play the Non-local Inner Product game where
Alice's input string is chosen uniformly at random, but Bob's bit string is chosen
uniformly from the set of n-bit strings which have exactly one non-zero bit. By the
“if” direction of Theorem 9, it is possible to find a quantum strategy such thatEy =
1√
n for all such y (since this choice of Ey satisfies
∑
yE2y = 1). Consequently, for
each string y with one non-zero bit their probability of success conditioned on y is
Py =
1 + Ey
2
= 1
2
(
1 + 1√
n
)
. (4.46)
The total probability of success is given by averaging Py over the distribution on
Bob's string y. However, since Py is the same for all y, the probability of success is
simply 12
(
1 + 1√n
)
. Recall that any strategy for this version of the Non-local Inner
Product game can be transferred to a strategy for m = 1 Information Causality
with the same probability of success.
The probability of success in Corollary 3 is equal to the upper bound (4.35) on
probability of success for EARACs, hence this is the optimal probability of success
and (4.35) can be saturated for all integersn; moreover, any optimal strategy for the
modified inner product game converts to an optimal strategy inm = 1 Information
Causality. For larger values ofm, the optimal success probability for EARACs is
not generally known; however, it's worth noting that in deriving Tsirelson's bound
from Information Causality, the protocol given in Section 4.1.3 only involves one
bit being transmitted from Alice to Bob, who then computes the direct product
with his own output. If in fact the Information Causality game specified that at
most one bit is transmitted to Bob, and the corresponding condition I ≤ 1 was
used instead of I ≤ m, then the same results follow. Although it is interesting to
consider the case of generalm, the crux of the argument lies in them = 1 scenario.
152
The quadratic bias bound
∑
yE2y ≤ 1 given in Theorem 9 for the Non-local
Inner Product game appears to capture a great deal about the set of quantum cor-
relations. In the case where Bob's input y is chosen uniformly from all bit strings
of length n, the probability of success conditioned on y obeys the bound:
Py =
1
2
(
1 + 1
2n
∑
y
Ey
)
≤ 1
2

1 +
√
1
2n
∑
y
E2y


≤ 1
2
(
1 + 1√
2n
)
. (4.47)
Again, since this value is constant, the total probability of success also obeys this
bound. In the case n = 1, the Non-local Inner Product game is the CHSH game,
and the formula (4.47) reduces to Tsirelson's bound. In fact, in this case the in-
equality E20 + E21 ≤ 1 describes the exact quantum boundary in a particular 2-D
slice of the non-signaling polytope. However, it is not known whether either the
quadratic bias bound or the Information Causality principle is sufficient to recover
the complete boundary on the set of quantum correlations.
Just as with the Information Causality principle, note that it is possible to sat-
urate the quadratic bias bound using only classical correlations. If Alice and Bob
pick some fixed value y = y′, and Alice outputs x · y′ whilst Bob outputs 0, then
they win every time Bob's input is y′, hence Ey′ = 1; clearly this implies Ey = 0
for y 6= y′. Unfortunately, although the quadratic bias bound is appealing for its
relatively simple proof and obvious connection to success probabilities, it lacks
the direct physical intuition of the Information Causality principle.
153
4.3 The role of entropy
In Section 4.1 we derived the Information Causality principle from the existence of
a measure of mutual information I˜ which satisfies four conditions: Consistency
with classical mutual information, a Data Processing inequality, the Chain Rule
and Symmetry. This in itself suggests an interesting perspective: any physical
theory which admits a “reasonable” measure of mutual information (where “rea-
sonable” here means “satisfying at least the above four conditions”) must obey
the Information Causality principle, and therefore must obey any bounds on non-
signaling correlations which follow from the Information Causality principle (in-
cluding Tsirelson's bound). In Boxworld, where strongly non-local states such as
the PR-box are permitted, clearly no such measure of mutual information can be
attributed to pairs of systems.
The usual classical and quantum mutual informations, denoted Ic and Iq re-
spectively, are known to obey all four of the above conditions [53]. However,
neither of these represent the most basic informational primitive of their respec-
tive theory; rather, they are often both defined as the same function of the classical
or quantum entropy:
I(X : Y ) = H(X) +H(Y )−H(X, Y ). (4.48)
This suggests that it is worth exploring the principle of Information Causality for
measures of mutual information which are defined in a similar way, i.e. I˜(X :
Y ) = H˜(X) + H˜(Y ) − H˜(X,Y ), where H˜ is some measure of entropy in a
general probabilistic theory. Measures of entropy in the framework of general
probabilistic theories have already received attention in the literature, including
their relation to mutual information and Information Causality [38-40]. Unsur-
prisingly, given the above comments, the measures of entropy considered have
some intuitive properties, but tend to disobey one or more natural principles.
Suppose then that a mutual information I˜ between two systems is defined in
terms of an entropy function H˜ on individual systems as in (4.48). What properties
154
must the entropy satisfy in order that I˜ satisfies the four assumptions required in
the derivation of Information Causality? Since (4.48) is symmetric in X and Y ,
I˜ will automatically satisfy Symmetry. If H˜ reduces to the classical entropy Hc
whenX is classical, then I˜(X : Y )will also satisfy Consistency. The Chain Rule,
which we earlier commented was without immediate physical significance, also
follows straight from (4.48):
I˜(X, Y : Z)− I˜(X : Z) = I˜(Y : X,Z)− I˜(X : Y )
⇐⇒
[
H˜(X, Y ) + H˜(Z)− H˜(X,Y, Z)
]
−
[
H˜(X) + H˜(Z)− H˜(X,Z)
]
=
[
H˜(Y ) + H˜(X,Z)− H˜(Y,X,Z)
]
−
[
H˜(X) + H˜(Y )− H˜(X,Y )
]
⇐⇒ H˜(X) + H˜(X, Y ) + H˜(X,Z)− H˜(X,Y, Z)
= H˜(X) + H˜(X, Y ) + H˜(X,Z)− H˜(X,Y, Z) (4.49)
Thus the awkward-looking Chain Rule simply disappears, when everything is
formulated in terms of entropies rather than mutual information. We are left with
the Data Processing inequality; for any two systems X and Y , and an allowed
transformation τ on system Y alone:
I˜(X : Y ) ≥ I˜(X : τ(Y ))
⇐⇒ H˜(X) + H˜(Y )− H˜(X,Y ) ≥ H˜(X) + H˜(τ(Y ))− H˜(X, τ(Y ))
⇐⇒ H˜(X, τ(Y ))− H˜(τ(Y )) ≥ H˜(X,Y )− H˜(Y )
⇐⇒ H˜(X|τ(Y )) ≥ H˜(X|Y ) (4.50)
where we identify the generalised conditional entropy,
H˜(X|Y ) = H˜(X, Y )− H˜(Y ), (4.51)
155
and interpret it as the uncertainty about the state of systemX , given knowledge of
the state of system Y . The inequality (4.50) has an intuitive physical interpretation
in this sense: one cannot learn more about systemX by performing a transforma-
tion on system Y alone, hence one's uncertainty about X , given Y , should not
decrease just because of a local transformation performed at Y .
The constraint (4.50) can also be written in the following way:
H˜(X, τ(Y ))− H˜(X,Y ) ≥ H˜(τ(Y ))− H˜(Y )
⇐⇒ ∆τH˜(X,Y ) ≥ ∆τH˜(Y ), (4.52)
where τ is a local operation at Y , and ∆τH˜ represents the change in entropy due
to τ . In this form, the inequality is a statement about changes in entropy rather
than about the conditional entropy, thus has the advantage that one does not need
to introduce the idea of conditional entropy. The inequality (4.52) also has an
intuitive physical interpretation: local transformations may generate uncertainty
on a bipartite level which is greater than the uncertainty generated only locally
(for example, the transformation may destroy correlations between the systems).
We have thus demonstrated that the Information Causality principle holds in
any theory that permits an entropy H˜ satisfying two very reasonable properties:
(i) Consistency If system X is classical, H˜(X) reduces to the classical entropy
of the probability distribution on X .
(ii) Ancillary Evolution For any local transformation τ on system Y ,
H˜(X|τ(Y )) ≥ H˜(X|Y ). (4.53)
For the Information Causality principle to follow from these principles, it is
in fact also necessary for the theory to contain certain local transformations. In
Section 2.5.4 we assumed that any transformation which maps the set of allowed
states to itself in a convex-linear fashion is part of the theory. However, there is
no overriding reason why the theory should not contain a strict subset of all such
156
transformations. Indeed, one might conceive of a physical theory whose states
are the full set of Boxworld states, yet for which no transformations exist, either
from a system to itself or from one system to another. In such a theory, Ancillary
Evolution holds for everymeasure of entropy. Therefore, even though Information
Causality is violated by this theory, both Consistency and Ancillary Evolution hold
for the measure of entropy which assigns the classical entropy to states of classical
systems, and 0 to states of all other systems.
It is interesting to determine the minimal set of transformations which a theory
must allow, alongside admitting an entropy that obeys Consistency and Ancil-
lary Evolution, in order for the Information Causality principle to hold. This is
achieved by examining where the Data Processing inequality is used in the proof
of the Information Causality principle in terms of the mutual information. There
are three types of transformation to which the Data Processing inequality is ap-
plied: ``tracing out'' of a system; ``constructing'' a system, i.e. a transformation
from a classical system (possibly an ``empty'' system denoted by ∅) to any other
type of system; and ``measuring'' a system, i.e. a transformation from any type of
system to a classical system. So long as all permissible transformations belonging
to these classes are contained in the theory, then Information Causality will indeed
follow from Consistency and Ancillary Evolution.
Nevertheless, this reduction in the number and complexity of properties moti-
vates an attempt to reformulate - and derive - the Information Causality principle
purely in terms of entropies; this is the topic of Section 4.3.1. In Section 4.3.2 we
then give a brief overview of some attempts at introducing measures of entropy
into general physical theories, and discuss how they relate to our own perspective
on the Information Causality principle.
4.3.1 An entropic information causality
Before deriving an entropic version of the Information Causality principle, we
deduce a number of standard inequalities which must hold for any measure of
entropy satisfying the Consistency and Ancillary Evolution properties from the
157
previous section:
Subadditivity For systems X and Y , if τ : Y → ∅ is the “tracing out” map, then
by Ancillary Evolution:
H˜(X|∅) ≥ H˜(X|Y ). (4.54)
The entropy ofX given a traced-out system must be the same as the entropy
of X , hence:
H˜(X) ≥ H˜(X, Y )− H˜(Y )
⇒ H˜(X, Y ) ≤ H˜(X) + H˜(Y ). (4.55)
IfX and Y are independent systems, then there is a local map corresponding
to the preparation of the state of Y independent ofX , ω : ∅ → Y . Ancillary
Evolution then implies thatH(X|Y ) ≥ H(X|∅); combining this with (4.55)
gives H˜(X,Y ) = H˜(X) + H˜(Y ).
Strong Subadditivity For three systems X1, X2 and Y , if τ : (X2, Y ) → Y
is the marginalization map, then it follows from Ancillary Evolution that
H˜(X1|X2, Y ) ≤ H˜(X1|Y ). Hence,
H˜(X1, X2|Y ) = H˜(X1, X2, Y )− H˜(Y )
= H˜(X1, X2, Y )− H˜(X2, Y ) + H˜(X2, Y )− H˜(Y )
= H˜(X1|X2, Y ) + H˜(X2|Y )
≤ H˜(X1|Y ) + H˜(X2|Y ). (4.56)
The process may be iterated for a larger number of systems:
H˜(X1, . . . , Xn|Y ) ≤ H˜(X1|Y ) + . . .+ H˜(Xn|Y ). (4.57)
Classical Positivity One's uncertainty about the state of a classical system X is
158
never negative, even when one conditions on another system Y (which may
or may not be independent of X , or classical itself):
System X is classical⇒ H˜(X|Y ) ≥ 0. (4.58)
To prove this, suppose that ψ is the state of system X,. Note that the state
ofX is described by a probability distribution on a finite set E of outcomes
for a single fiducial measurement. For each outcome e ∈ E, there is an
associated probability pe, a deterministic state xe on systemX which outputs
e with certainty, and a corresponding reduced state ψe of Y . The state ψ
is therefore the convex combination of the product states xe · ψe (i.e. the
product state whereX is in state xe and Y is in state ψe) weighted according
to pe.
Suppose that the systemX ′ is identical toX , and that the joint systemX,X ′
is initially in a convex combination of the product states xe · x′e, weighted
according to pe. On system X ′, let τ : X ′ → Y be the transformation
described by performing the fiducial measurement E on system X ′, and
preparing system Y in state ψe if outcome e is obtained. The state of system
X,Y is now the convex combination of the product states xe · ψe weighted
according to pe, i.e. after the transformation τ , the state of system X, Y is
ψ.
By the Consistency postulate, before the transformation τ is performed, the
conditional entropy H˜(X|X ′) reduces to the classical conditional entropy:
H˜(X|X ′) = Hc(X|X ′) = Hc(X,X ′)−Hc(X ′). (4.59)
Since X ′ is perfectly correlated with X , Hc(X,X ′) = Hc(X ′) therefore
H˜(X|X ′) = 0. Then by the Ancillary Evolution postulate:
H˜(X|Y ) = H˜(X|τ(X ′)) ≥ H˜(X|X ′) = 0. (4.60)
159
The Subadditivity, Strong Subadditivity and Classical Positivity properties
must be obeyed by any measure of entropy which satisfies the Consistency and
Ancillary Evolution postulates. Using these three natural properties, it is quite
easy to derive an Entropic Information Causality principle. Recall that in the In-
formation Causality game, Alice is given a bit-string a of length n and Bob is
given a number 1 ≤ k ≤ n. After Alice transmits an m-bit message x to Bob,
Bob outputs β(k), his best guess at the value of ak, the kth bit of Alice's string.
Unlike in Section 4.1.3, we will no longer assume that Alice's input a is uniformly
chosen.
Suppose that Alice and Bob share a state of some system A,B, in a general
probabilistic theory that admits some measure of entropy H˜ satisfying the Con-
sistency and Ancillary Evolution postulates. Whatever their strategy, Bob's final
guess β(k) is ultimately deduced from his system B and the classical system X
which holds the message x; therefore there exists a transformation from X, B to
β(k), and so by Consistency and Ancillary Evolution:
∑
k
Hc(ak|β(k)) =
∑
k
H˜(ak|β(k)) ≥
∑
k
H˜(ak|x, B). (4.61)
Writing a as a multipartite state a1, . . . , an and using the iterated Strong Subaddi-
tivity property (4.57),
∑
k
H˜(ak|x, B) ≥ H˜(a|x, B)
= H˜(a, x, B)− H˜(x, B). (4.62)
Then by Subadditivity on the bipartite system x, B,
H˜(a, x, B)− H˜(x, B) ≥ H˜(a, x, B)− H˜(B)− H˜(x). (4.63)
160
Introducing a term H˜(a)− H˜(a) and using the fact that H˜(a, B) = H˜(a)+ H˜(B)
since a and B are independent,
H˜(a, x, B)− H˜(B)− H˜(x) = H˜(a, x, B)− H˜(B)− H˜(a) + H˜(a)− H˜(x)
= H˜(x, a, B)− H˜(a, B) + H˜(a)− H˜(x)
= H˜(x|a, B) + H˜(a)− H˜(x). (4.64)
Since a and x are classical, and using the positivity of H˜ on classical systems,
H˜(x|a, B) + H˜(a)− H˜(x) ≥ Hc(a)−Hc(x). (4.65)
Putting this all together gives an entropic formulation of the Information Causality
principle:
∑
k
Hc(ak|β(k)) ≥ Hc(a)−Hc(x). (4.66)
The standard Information Causality principle
∑N
k=1 Ic(ak : β(k)) ≤ m can be
deduced from (4.66) as follows: since x is anm-bit string
m ≥ Hc(x)
≥ Hc(a)−
∑
k
Hc(ak|β(k)). (4.67)
If the string a is uniformly random, then Hc(a) =
∑
kHc(ak):
m ≥
∑
k
[Hc(ak)−Hc(ak|β(k))]
=
∑
k
Ic(ak : β(k)). (4.68)
This entropic formulation of Information Causality has several advantages:
firstly, it deals in entropy, which is a more basic informational quantity thanmutual
information, and has a more immediate interpretation as the uncertainty inherent
in the state of a system. Secondly, it leads to an even more pleasing statement re-
161
garding measures of information in general physical theories. If a general physical
theory admits some measure of entropy satisfying Consistency and Ancillary Evo-
lution (which are both very reasonable demands), then the Information Causality
principle (and subsequent bounds on non-locality) must hold in that theory. To
put it more strongly, Tsirelson's bound cannot be violated in any physical theory
which admits an extension of classical entropy that behaves reasonably under dy-
namics. Though this fact already follows from the proofs contained in the original
Information Causality paper [51], it is striking to see it presented in this way.
Both the entropic Information Causality principle (4.66) and the quadratic bias
bound (4.39) can be seen as fundamental restrictions on information processing
which, when applied to the full set of non-signaling correlations, recovers many of
the bounds on quantum non-locality. However, whereas the quadratic bias bound
suffers from a lack of a simple, physical interpretation, it is relatively easy to mo-
tivate equation (4.66) from physical considerations. After his guess β(k), the re-
maining uncertainty that Bob has about Alice's inputs should be no less than his
original uncertainty Hc(a), minus the information in the message x.
4.3.2 Entropy in general physical theories
We now describe some measures of entropy which have previously been intro-
duced for general probabilistic theories such as Boxworld. These measures are
invariably seen as a generalisation of the Shannon entropy for classical systems,
and the Consistency property will hold. Hence, in strongly non-local theories such
as Boxworld, it is expected that Ancillary Evolution will not hold, and that this
measure of entropy will behave poorly under some dynamical transformation.
It is most convenient to introduce these entropies using the convex set rep-
resentation of general probabilistic theories, described in Chapter 2. Recall that
states and effects are assigned vectors s and e respectively, such that the probabil-
ity of obtaining the outcome corresponding to e, for a system in state s, is given
by the inner product 〈e, s〉. A measurement M is described by a set of effects
{e1, . . . er} which sum to the unit effect:
∑
i ei = u.
162
Measurement entropy
For any state s, a measurement M defines a classical probability distribution
M(s) = {〈ei, s〉}; it is tempting to define a measure of entropy which is the
minimum of the classical Shannon entropy of this distribution over all measure-
ments M. However, this poses a problem: an entropy defined in this way could
always be decreased by simply “merging” effects together and considering the cor-
responding outcomes to be the same - at the extreme, the unit measurement {U}
consisting of just the unit effect, will always give zero entropy.
In order to counter this problem, recall the notion of a refinement of a mea-
surement given in Section 2.5.2, and note that “merging” effects as described
above leads to a coarse-graining of the measurement. Therefore we wish to con-
sider Shannon entropies of outcome distribution only over pure measurements, i.e.
those which are made up of (non-parallel) extreme-ray effects.
We will recap the notion of refinement for convenience: measurementM2 =
{f1, . . . , fs} is a refinement ofM1 if there exists a function φ : M2 → M1 such
that each ei is the sum of those fj which map to it:
ei =
∑
j:φ(fj)=ei
fj. (4.69)
Thus, every measurement is a refinement of the unit measurement, which is com-
pletely non-informative; in general, the refined measurementM1 will be strictly
more informative than M2. However, simply splitting up e1 into two effects
f1 = f2 = 12e1 will result in a refinement which is not more informative (this
could be achieved in practice by a post-processing of results: if outcome 1 is ob-
tained, the actual output is uniformly assigned one of two distinct, new values).
To avoid this issue, the refinement is said to be trivial if φ(fj) = ei ⇒ fj = λjei
for real numbers λj . A measurement is fine-grained if it admits no non-trivial
fine-grainings. LetM∗ denote the set of all fine-grained measurements.
The measurement entropy Hme of a state is the infimum of the Shannon en-
163
tropies of the outcome distributions for all fine-grained measurements [38, 40]:
Hme(s) = inf
M∈M∗
Hc(M(s)). (4.70)
In a classical system, a fine-grained measurement consists only of vectors
which are scalar multiples of the effects belonging to the single fiducial measure-
ment. It is easily checked that such a measurement is a trivial refinement of the
fiducial measurement itself, and that the outcome entropy is therefore minimized
for the fiducial measurement. Thus the measurement entropy reduces to the Shan-
non entropy for classical systems. In a quantum system, a fine-grained measure-
ment consists of measurement operators which are scalar multiples of rank-one
projectors; it turns out also that the measurement entropy reduces to the von Neu-
mann entropy [38, 40], however this is not immediately obvious. In Boxworld, a
fine-grained measurement is one in which all effects are tensor products of fiducial
effect vectors, i.e. ei = X(1)a1|x1 ⊗ · · · ⊗X
(n)
an|xn .
All three of the above theories share the property that whenever e(1) is an
extreme-ray effect on system 1 and e(2) is an extreme-ray effect on system 2, then
e(1) ⊗ e(2) is an extreme-ray effect of the joint system (in Boxworld, as with all
max-tensor product theories, these are the only joint extreme-ray effects). Sup-
pose that M(1) = {e(1)i } and M(2) = {e
(2)
j } are fine-grained measurements
on systems 1 and 2 respectively. For any joint state s, the outcome distribution
M(1) ⊗M(2)(S) = {〈e(1)i ⊗ e
(2)
j , s〉} has as its marginal on system 1 the outcome
distribution M(1)(s) (where s(1) is the reduced state on system 1), and similarly
for system 2. Therefore, by subadditivity of the Shannon entropy,
Hc(M(1) ⊗M(2)(s)) ≤ Hc(M(1)(s(1))) +Hc(M(2)(s(2))). (4.71)
By considering tensor products of fine-grained measurements which approach the
measurement entropy to arbitrary precision on the RHS, it is clear that the mea-
surement entropy is indeed subadditive in theories for which the aforementioned
property holds.
164
On the other hand, the measurement entropy fails to obey strong subadditivity
for some tripartite Boxworld systems [38, 40]. This is demonstrated in [38] using
a similar procedure as employed in Section 4.1.3 in order for Bob to perfectly
guess one of a pair of random bits given to Alice. Suppose that the random bits
given to Alice are represented by two classical systems A1 and A2, whilst Bob is
in possession of a g-bit system B. In order to obtain a random bit string (a1, a2),
Alice simply measures her systems using the single fiducial measurements x1 and
x2. Depending on whether Bob wishes to obtain the value of the first bit or the
second, his measurement choice is given by x3 = 1 or x3 = 2 respectively. For
Bob's output a3, suppose that the tripartite system is in the state represented by:
P (a1, a2, a3|x1, x2, x3) =
1
4
δ(a3 = ax3). (4.72)
The only system with more than one fiducial measurement choice is B, hence to
show that this is non-signaling, it suffices to observe that the reduced systemA1A2
is in a completely uniform state, no matter the value of x3. The reduced state on
B is the completely mixed state of a g-bit, hence Hme(B) = 1. Since Bob can
choose his measurement in order to correlate exactly with either a1 or a2, then
Hme(A1B) = Hme(A2B) = 1. Any measurement on the total system A1A2B
generates 4 possible outcomes with uniform probability, henceHme(A1A2B) = 2.
Therefore,
Hme(A1A2B) +Hme(B) = 3 > 2 = Hme(A1B) +Hme(A2B). (4.73)
The measurement entropy does not obey Classical Positivity in Boxworld, ei-
ther. Consider a bipartite system composed of a classical bit system A, and a sys-
tem B with two measurement choices, each of which admits three outputs, which
is in the state represented by the following matrix:
165
x2 = 1 x2 = 2
a2 = 1 a2 = 2 a2 = 3 a2 = 1 a2 = 2 a2 = 3
a1 = 1 1/2 0 0 0 1/4 1/4
a1 = 2 0 1/4 1/4 1/2 0 0
Now consider a bipartite measurement which takes the following form: first
measure system A to obtain a1, then measure system B using the measurement
choice x1 = a1. This guarantees the output a2 = 1 on the second system, and a1
takes each of its possible valueswith probability 12 . HenceH
me(AB) ≤ Hc({12 ,
1
2}) =
1. However, each of the fine-grained measurements on the reduced state on sys-
tem 2 results in an outcome distribution {12 ,
1
4 ,
1
4}, which has entropy 1.5. Thus
Hme(A|B) = Hme(A|B) − Hme(B) ≤ −0.5, in violation of the Classical Posi-
tivity principle.
Mixing Entropy
A second measure of entropy for general probabilistic theories which reduces to
the Shannon and von Neumann entropies on classical and quantum systems re-
spectively, is the mixing entropy [38, 40]. Every decomposition of a state s into
pure states:
s =
r
∑
i=1
pisi (4.74)
corresponds to a probability distribution p = {pi}. Suppose that P(s) is the set of
probability distributions that result from decompositions of s in this manner. The
mixing entropy Hmi of s is the infimum of the classical entropy over this set:
Hmi(s) = inf
p∈P(s)
Hc(p). (4.75)
166
As shown in [38], the mixing entropy is not subadditive (let alone strongly
subadditive) in Boxworld. An explicit counterexample is the following state s of
a system comprising two g-bits:
x2 = 1 x2 = 2
a2 = 1 a2 = 2 a2 = 1 a2 = 2
x1 = 1
a1 = 1 1/4 3/8 3/8 1/4
a1 = 2 3/8 0 1/4 1/8
x1 = 2
a1 = 1 1/4 3/8 1/4 3/8
a1 = 2 3/8 0 3/8 0
The reduced state on either system is the state which outputs 1 with probability
5
8 and 2 with probability
3
8 , regardless of the measurement choice. This confirms
that the state is non-signaling, and demonstrates that
Hmi(A) = Hmi(B) = 0.95... . (4.76)
Recall that there are 16 local deterministic and 8 non-local PR-box variants
making up the extreme states of the bipartite system AB (the PR-box variants are
obtained from the PR-box by local relabellings of the inputs and outputs). Out
of these 24 extreme states, only 4 of the local deterministic states and 1 PR-box
variant have positive entries only where s also has positive entries. For each of
these extreme states si, it can then be checked that for pisi to have all its positive
entries less than or equal to those of s, we must have pi ≤ 14 for every i. This
implies that any decomposition of s consists of at least 4 extreme states, none of
whose coefficients is greater than 14 . Hence
Hmi(AB) ≥ 2 > Hmi(A) +Hmi(B). (4.77)
167
However, Classical Positivity is satisfied by the mixing entropy [40]. If system
A is classical, then for any system B, the extreme states of system AB are of
the form ωA ⊗ ωB, where ωA are ωB are reduced states on systems A and B
respectively. Then any decomposition of a state s of system AB takes the form
s =
∑
i,j
pijωAi ⊗ ωBj =
∑
j
(
∑
i
pijωAi
)
⊗ ωBj , (4.78)
where ωAi correspond to the outcomes of the single fiducial measurement on sys-
tem A. Writing qj =
∑
i pij , one decomposition of the reduced state on system B
is
sB =
∑
j
qjωBj . (4.79)
Any decomposition of s generates an entropy Hc(p) ≥ Hc(q) ≥ Hmi(B), there-
fore Hmi(AB) ≥ Hmi(B) and so Hmi(A|B) ≥ 0.
Relation to Information Causality
Both the measurement and mixing entropies are extensions of the von Neu-
mann entropy for quantum systems, and therefore obey all three of Subadditivity,
Strong Subadditivity and Classical Positivity when confined to only classical or
quantum theory. In Boxworld however, they both violate two of these natural
properties; since they both obey Consistency, they both must certainly violate An-
cillary Evolution. The entropic formulation of Information Causality gives two
interesting extra perspectives on this: firstly, because Boxworld allows violations
of Tsirelson's bound, there simply can be no measure of entropy that will be “per-
fect” in the sense of obeying both Ancillary Evolution and Consistency. Even if
there exists some measure of entropy in Boxworld which satisfies Subadditivity,
Strong Subadditivity and Classical Positivity, it will inevitably violate some other
natural consequence of Ancillary Evolution and Consistency. Secondly, despite
already possessing many natural properties, both the measurement and mixing en-
tropies will always disobey Ancillary Evolution in any theory allowing a violation
168
of Tsirelson's bound.
On one hand, this is slightly disappointing: without natural extensions of en-
tropies which obey natural laws, we may find it more difficult to explore informa-
tion processing in more general physical paradigms. Ordinarily, counter-intuitive
phenomena provide us with something to learn from rather than to be scared of,
however in the case of Boxworld it seems that even the most fundamental tenets
of information processing run into intractable obstacles. On the other hand, this
provides exciting evidence that entropy plays a deep role in limiting the amount
of non-locality which is achievable in nature. The existence of a reasonable mea-
sure of entropy may even be a more fundamental physical requirement than the
standard Hilbert-space quantum postulates.
4.4 Discussion
We have explored two different perspectives on the Information Causality game
presented in [51]. The first perspective is to consider the probability of success
as the figure of merit: we see that quantum theory gives an advantage which is
not reflected in the original figure of merit I , for which quantum and classical
theory perform equally well. Investigating how these probabilities are involved
in deriving Tsirelson's bound from Information Causality leads us to a quadratic
quantum bound on the biases achieved, given the different inputs for Bob in the
non-local inner product game:
∑
y
E2y ≤ 1. (4.80)
This inequality holds for any probability distribution over Bob's inputs, and hence
applies to the version of Information Causality without any communication from
Alice to Bob. This is another example of a bound which quantum and classi-
cal correlations can both saturate, but stronger non-local correlations can violate,
and from which one can recover the same sections of quantum boundary within
169
two-dimensional slices of the non-signaling polytope. Furthermore, the fact that
quantum correlations allow one to achieve any set of biases satisfying this rule
suggests that it captures a significant amount of information about the set of quan-
tum correlations. The question remains whether quadratic inequalities such as
this are sufficient to completely recover the quantum boundary, and whether other
quadratic inequalities can tell us useful things about quantum capabilities in other
non-local tasks.
The second perspective is to view the Information Causality inequality not
just as a constraint on possible physical theories but also as a consequence of the
existence of a 'good' measure of entropy. In this direction, we have shown that In-
formation Causality can be derived given any extension of the entropy from clas-
sical to more general systems which satisfies H˜(X|τ(Y )) ≥ H˜(X|Y ) under local
transformations on system Y . Conversely, any theory which violates Information
Causality (such as Boxworld) cannot have anymeasure of entropy which obeys the
above evolution law and agrees with the Shannon entropy for classical systems. In
particular, any general probabilistic theory which violates Tsirelson's bound can-
not possess a measure of entropy which obeys these conditions; this strange link
between non-locality and entropy is definitely worth exploring further.
Given the results of Section 4.3, as well as those of [38-40], it seems that the
existence of a “good” entropy for quantum theory, which shares so many of the
properties of the classical entropy, is very special within the class of general prob-
abilistic theories. Are there other theories for which one can define a “reasonable”
measure of entropy satisfying Consistency and Ancillary Evolution, or is this a
defining feature of quantum theory? Of course, we could consider a general prob-
abilistic theory in which states and measurements are represented by the same
Hilbert-space objects as quantum theory, but which allows only a strict subset of
quantum states. Trivially, the restriction of the von Neumann entropy would sat-
isfy the same properties as for full quantum theory.
Amore interesting question is whether a reasonable entropy is definable in the-
ories which cannot be simulated by quantum theory, for example those which have
170
non-local correlations that are unattainable via measurements of quantum states.
Since Information Causality may be deduced from simple entropic properties, the
existence of a reasonable entropy potentially places stronger bounds on quantum
theory than Information Causality does alone. It would be interesting to look for
other games in which possessing a reasonable measure of entropy precludes one
from performing better than is possible classically.
171
Chapter 5
Reversible Boxworld dynamics
To do mathematics is to engage in an act of discovery and conjecture,
intuition and inspiration; to be in a state of confusion— not because
it makes no sense to you, but because you gave it sense and you still
don't understand what your creation is up to; to have a breakthrough
idea; to be frustrated as an artist; to be awed and overwhelmed by
an almost painful beauty; to be alive, damn it.
“A Mathematician's Lament”
Paul Lockhart
172
In this chapter we explore the class of reversible dynamics in the general prob-
abilistic theory known as Boxworld. Recall from Section 2.5.4 that the allowed
transformations of a general probabilistic theory are determined once its state space
has been defined, and that reversible transformations are those for which there is
an allowed inverse transformation. Thus this chapter is concerned with the set of
reversible transformations that are permitted by the state space structure of Box-
world. As this state space is constrained only by the conditions of non-signaling
and local tomography for composite systems (as well as the probabilistic laws of
positivity and normalization), Boxworld represents something of a canonical ex-
ample of a general probabilistic theory, and is a natural benchmark against which
to analyse various features of quantum theory. One such feature is the principle
of reversible transitivity, which states that any pure state of either an individual
or composite system may be reversibly transformed to any other. This holds triv-
ially in quantum theory; indeed, any set of n orthonormal states in n-dimensional
Hilbert space may be linearly mapped to any other such set, and this mapping
extends naturally to a unitary transformation on the Hilbert space.
The importance of reversible transitivity as a defining feature of quantum the-
ory is indicated by its ubiquitous use (or the use of stronger versions of it) in various
so-called “derivations” of quantum theory from reasonable axioms [2, 28, 33-35,
87]. Roughly speaking, in these derivations the authors consider reasonable (or
physically motivated) constraints on the systems of general probabilistic theories,
much the same as the types of systems we described in Chapter 2. It is then shown
that some small set of these constraints is sufficient to single out quantum theory.
By “single out” we mean that in any theory which conforms to these constraints,
the systems represent finite-dimensional quantum systems, and individual systems
combine to form composite systems in exactly the manner proscribed by the tensor
product of their respective Hilbert spaces.
Reversible transitivity is one such constraint that applies to systems of general
probabilistic theories, and tends to play a crucial role in the process of singling
out quantum theory. For example, if reversible transitivity is assumed, then every
173
system has a well-defined maximally mixed state analogous to the quantum max-
imally mixed state. To show this, it is often first argued on physical grounds that
the set of reversible transformations on any system forms a compact, topological
group T , which therefore admits a Haar measure. This allows one to define on
any system a unique state,
ω =
∫
T
T (s) dµ(T ), (5.1)
for an arbitrary choice of pure state s (note that ω is independent from s, again due
to reversible transitivity). The maximally mixed state ω is equivalent in various
ways to the well-known quantum maximally mixed state 1nI in a Hilbert space of
dimension n: ω is invariant under reversible transformations, ω gives a non-zero
outcome probability for every non-zero effect, and no state s is perfectly distin-
guishable from ω in a single experiment.
We have stated that reversible transitivity is a useful constraint, but have not
yet shown why it might be a reasonable one. To motivate reversible transitivity on
physical grounds, one might appeal to the notion of conservation of information
[28, 33], i.e. that information is ultimately neither lost nor gained in the course of
any physical process. If a systemA is in a pure state sA, then it is always possible to
transform A into some other pure state tA: if necessary, one can perform a “throw
away and replace” operation, in which every state inA is irreversiblymapped to tA.
However, conservation of information stipulates that -- in order for information not
to be lost -- the system A must be coupled with the environment E in such a way
that the global transformation on AE is reversible. Before the transformation, the
global systemAE is in some state sAE whose marginal on systemA is sA; after the
transformation, the global systemAE is in state tAE , whose marginal on systemA
is tA. Therefore the state of system A may be transformed from an arbitrary pure
state sA to any other pure state tA in a manner which is ultimately reversible.
It must be conceded that this does not quite grant us a reason to expect re-
versible transitivity in the form we have defined it, since it does not guarantee that
174
a reversible transformation exists on system A alone which maps state s to state
t. In fact, the author is not aware of a conclusive, natural reason to expect that the
principle of reversible transitivity is obeyed by the universe; despite this, the prin-
ciple of reversible transitivity holds a far greater immediate appeal than the stan-
dard Hilbert-space quantum postulates. Moreover, the conservation of information
does at least suggest that it is interesting to study which reversible transformations
are permitted in general probabilistic theories, since it implies that all transforma-
tions must be reversible as long as the system is taken to be “large enough”. A
demand for reversibility on some level would also open up links to the field of
thermodynamics, in which closed systems undergo reversible processes (the links
between quantum information theory, quantum foundations and thermodynamics
is a recent area of investigation, for example see [88-90]).
As demonstrated by Proposition 7 in Section 2.5.5, the property of reversible
transitivity does not hold in Boxworld. Proposition 7 shows that reversible trans-
formations map pure product states to pure product states: therefore, a pure prod-
uct state cannot be reversibly mapped to a pure entangled state (such as the PR-
box state). However, this result does not give us a full characterisation of re-
versible Boxworld transformations: entanglement cannot be reversibly generated,
but do there exist interesting reversible dynamics that begin and end with entan-
gled states? Recently, it has been shown that, for a restricted class of composite
Boxworld systems, the set of reversible transformations does not allow for any
interesting dynamics [4]. Specifically, for composite systems in which all subsys-
tems have the same number of local fiducial measurements, and the same number
of outcomes for each measurement, the only allowed reversible transformations
are compositions of relabellings of measurement choices and outcomes, and per-
mutations of subsystems. These transformations can be described easily via the
outcome distribution representing the state. For example, the transformation of
a system comprising two g-bit subsystems, which permutes the two subsystems
175
then switches the outputs of subsystem 2, is described by:
P (a1, a2|x1, x2) → P (a2, a1 ⊕ 1|x2, x1) . (5.2)
These transformations are trivially convex-linear, i.e. they respect probabilistic
mixtures of states, and they map allowed states to allowed states in a reversible
manner. That they are the only possible transformations in the Boxworld systems
considered is surprising, and suggests a powerful link between a theory's state
space and its set of reversible transformations. What is it about these systems
that reduces the reversible transformations to trivial relabellings? Is it related to
the polytopic nature of the state space, its symmetries, or simply the fact that so
many states are allowed? By developing techniques to extend this result to the
case of non-identical subsystems, we can hope to shed some light on the answers
to these questions, and to learn something about the significance of the reversible
transitivity which quantum theory enjoys.
We begin in Section 5.1 by recapping themethod of proof used in [4] to demon-
strate that all reversible dynamics are trivial in the case that all subsystems have the
same number of measurements, each with the same number of outcomes. In Sec-
tion 5.2 we introduce and develop a “tabular” representation of Boxworld states
and effects, which has previously been used in [5] to explore the structure of Box-
world measurements. By considering the action of reversible transformations in
this tabular regime, it is not difficult to recover the bipartite version of the general-
isation of [4] to non-identical subsystems. By considering these tabular represen-
tations, one builds up an intuition with which the full generalisation to multipartite
systems becomesmuchmore tractable. In Section 5.3we use this intuition to set up
some important results about decompositions of Boxworld effects, then in Section
5.4 we employ these results to obtain Theorem 13, the main result of the chap-
ter. In Section 5.5 we explore the limits of extending this result to other general
probabilistic theories: namely, we answer in the negative the question of whether
a lack of interesting reversible dynamics is due solely to the polytopic structure of
the state space.
176
Much of the work in this chapter was undertaken collaboratively with Anthony
Short and published in [43], although the recognition of the importance of multi-
form effects, and the final proof of the result in Section 5.4, are due to the author.
The probabilistic model constructed in Section 5.5 is a result of the author's origi-
nal, unpublished work.
5.1 Identical subsystems: review
Recall from Chapter 2 that an individual Boxworld system is characterised by a
finite number of fiducial measurements, each of which has a fixed, finite number
of possible outcomes. Boxworld systems combine under the max-tensor product,
so that the joint states of composite Boxworld systems are all those non-signaling
states which are normalised and produce positive probabilities for all local fiducial
effects. In the case where all subsystems are identical, and all fiducial measure-
ments have the same number of outcomes, we can assign constants M and K to
be the number of fiducial measurements per subsystem and the number of out-
comes per measurement respectively. For each subsystem we will assign a set of
M ·K fiducial effect vectorsX(i)a|x = Xa|x of the abstract state space for each sub-
system, which in our construction will not depend on i. This set of vectors will
correspond to a valid representation of Boxworld as long as the set U ∪ {Xa|x}
(where 1 ≤ x ≤ M and 1 ≤ a ≤ K − 1) is linearly independent, and for each x,
∑
aXa|x = U .
In Boxworld, the composite effect cone is generated by tensor products of the
extreme rays of the local effect cones. These extreme rays are the local fiducial
effect vectors, hence the composite fiducial effects e are the tensor products of
local fiducial effects:
e = Xa1|x1 ⊗ · · · ⊗XaN |xN . (5.3)
Recall from Proposition 6 of Section 2.5.4 that reversible adjoint transformations
must map extreme rays of the effect cone to other extreme rays. Each reversible
transformation is therefore a permutation of the set of composite fiducial effects.
177
Since each fiducial effect may be described as a “string” of local fiducial effects,
it is convenient to consider a one-to-one correspondence between composite fidu-
cial effects and strings of fiducial effects, such that the above composite effect
corresponds to the fiducial effect string,
e˜ = (Xa1|x1 , . . . , Xan|xN ). (5.4)
The ith component of e˜ is therefore e˜i = Xai|xi . This notation allows for a conve-
nient notion of distance between fiducial effect strings.
Definition. For fiducial effect strings e˜ and f˜ , the Hamming distance between e˜
and f˜ is defined to be the number of components in which they differ, i.e.
dH(e˜, f˜) = #{i ∈ [N ] : e˜i 6= f˜i}. (5.5)
With these comments in hand, the proof proceeds via the following steps:
• The local fiducial effect vectors are explicitly constructed in a vector space
V so that any reversible adjoint transformation T † corresponds to an orthog-
onal matrix on the composite tensor product space.
• The inner product between composite fiducial effects is then the product
of the component-wise inner products between local fiducial effects, and is
preserved by T †.
• For any pair of composite fiducial effects who differ in exactly one compo-
nent, their images under T † similarly differ in exactly one component.
• The permutation induced by T † on the fiducial effect strings thus preserves
Hamming distance 1. A combinatorial theorem implies that any such per-
mutation must be composed of permutations of components, and local per-
mutations.
178
• T † therefore acts on the composite fiducial effects in an identical manner,
thus it is trivial, in the sense of being composed of permutations of sub-
systems and local permutations. If T † is trivial for this particular choice of
the fiducial effect vectors, then T † is trivial as a mapping of the composite
outcome distributions P (a1, . . . , aN |x1, . . . , xN).
We now proceed through these steps in more detail, closely following the ar-
gument given in [4].
5.1.1 Fiducial effect vectors
We now construct explicit real vectors for the fiducial effects Xai|xi of a Box-
world system that hasM fiducial measurement choices andK outcomes for each
fiducial measurement. We first construct, for each fiducial measurement choice, a
(K−1)-simplex whoseK vertices loosely represent theK outcomes of that mea-
surement, then form a direct sum of the spaces containing these simplices in order
to represent the full outcome set. We add to this direct sum a one-dimensional
space to represent the unit effect, and ensure that each fiducial effect overlaps
symmetrically with this extra dimension so that summing over the effects of any
one measurement gives the unit effect.
A (K − 1)-simplex is formed ofK vertices in RK−1, whose positions may be
defined as a set of vectors {v1, . . . , vK} equidistant from the origin, with constant
pairwise inner products. To construct such a set of vectors, consider the standard
basis vectors ei ofRK , and define c = 1K
∑K
1 ei to be their barycentre. Then using
the fact that 〈ei, c〉 = 〈c, c〉 = 1K for all i it is straightforward to check that the K
179
vectors defined by vi = (ei − c) satisfy the following equalities,
〈vi, vj〉 =
{
K−1
K (i = j)
− 1K (i 6= j)
(5.6)
〈vi, c〉 = 0 (5.7)
K
∑
i=1
vi = 0. (5.8)
The hyperplane orthogonal to c contains the vi and is isometrically isomorphic to
RK−1, hence our desired set of vectors exists also in a space of one less dimension,
and moreover {v1, . . . , vK−1} is a basis for RK−1. Note that for all 0 ≤ j, k ≤
K − 1,
〈vj|
(
K
∑
i=1
|vi〉〈vi|
)
|vk〉 =
{
K−1
K (j = k)
− 1K (j 6= k)
= 〈vj, vk〉. (5.9)
Since the vi span RK−1, it follows that
K
∑
i=1
|vi〉〈vi| = I. (5.10)
Let {ei} be the standard basis of the vector space RM . We associate each
measurement x with the basis vector ex, so that the simplex corresponding to that
measurement is embedded in a separate vector space ex ⊗ RK−1. We wish to
construct the local fiducial effect vectors in the space
(
RM ⊗ RK−1
)
⊕ R, and
associate the rightmost R with the unit effect by setting U = (0 ⊗ 0) ⊕ 1. The
local fiducial effect vectors are:
Xa|x =
(
√
M
K
ex ⊗ va
)
+
(
1
K
U
)
. (5.11)
180
From (5.8) it can be seen that for each x, we have
∑
aXa|x = U as required.
Moreover, since {ei} and {v1, . . . , vK−1} are linearly independent sets, so is the
set U ∪ {Xa|x} for 1 ≤ x ≤ M and 1 ≤ a ≤ K − 1. Therefore this choice of
fiducial effect vectors is a valid representation of a single Boxworld system. From
(5.10) it also holds that,
∑
a,x
|Xa|x〉〈Xa|x| =
M
K
∑
x
(
|ex〉〈ex| ⊗
(
∑
a
|va〉〈va|
))
+ 1
K2
∑
a,x
|U〉〈U|
= M
K
(
∑
a,x
|ex〉〈ex| ⊗ |va〉〈va|+ |U〉〈U|
)
= M
K
I. (5.12)
Similarly, for the composite fiducial effect vectors,
∑
a,x
|Xa1|x1〉〈Xa1|x1 | ⊗ · · · ⊗ |XaN |xN 〉〈XaN |xN | =
(
M
K
)N
I (5.13)
This immediately implies that any reversible adjoint transformation T † is orthog-
onal, since it acts to permute the composite fiducial effects,
T †T =
(
K
M
)N
T †
(
∑
a,x
|Xa1|x1〉〈Xa1|x1 | ⊗ · · · ⊗ |XaN |xN 〉〈XaN |xN |
)
T
=
(
K
M
)N
(
∑
a,x
|Xa1|x1〉〈Xa1|x1 | ⊗ · · · ⊗ |XaN |xN 〉〈XaN |xN |
)
= I. (5.14)
Therefore T † preserves the inner product between any pair of composite fidu-
cial effects, which is determined by the component-wise inner products between
181
local fiducial effects:
〈Xa|x, Xa′|x′〉 =
1
K2





1 (x 6= x′)
1−M (x = x′, a 6= a′)
1 +M · (K − 1) (x = x′, a = a′)
(5.15)
5.1.2 Induced fiducial effect string permutations
Recall that any reversible adjoint transformation T † induces a permutation on the
fiducial effect strings, i.e. strings whose entries are the local fiducial effects. When
M andK are constant across subsystems, and in particular whenM ≥ 2 (ensuring
that the subsystems are non-classical), then we can use the vector representation
of Section 5.1.1 to show that these permutations are highly restricted in form. The
assumption that M ≥ 2 guarantees that the inner product between any two lo-
cal fiducial effect vectors is non-zero, and is strictly negative exactly when they
correspond to different outcomes of the same measurement.
Lemma 5. On a composite Boxworld system withM ≥ 2 local measurements and
K local outcomes, suppose that T is a reversible transformation, with an induced
permutation T˜ † on fiducial effect strings. Then T˜ † preserves a Hamming distance
of 1 between pairs of fiducial effect strings, i.e., for any composite fiducial effects
e and f with corresponding fiducial effect strings e˜, f˜ ,
dH(e˜, f˜) = 1 =⇒ dH(T˜ †(e˜), T˜ †(f˜)) = 1 (5.16)
Proof. Suppose e˜ =
(
Xa1|x1 , . . . , XaN |xN
)
and f˜ =
(
Xa′1|x′1 , . . . , Xa′N |x′N
)
are two
fiducial effect strings that differ only in component j. The inner product between
the composite fiducial effects e and f is the product of the component-wise inner
products between local fiducial effects:
〈e, f〉 =
N
∏
i=1
〈Xai|xi , Xa′i|x′i〉. (5.17)
182
From (5.15), the greatest value that K2n〈e, f〉 can take is (1 + M(K − 1))N ,
which occurs when ai = a′i and xi = x′i for all i. Since the only negative term
in (5.15) is 1 − M (when x = x′, a 6= a′), the least value that can be taken is
(1 − M)(1 + M(K − 1))N−1. As M ≥ 2, this occurs if and only if e and f
differ in one component, on which they represent different outcomes of the same
measurement. In this case, since T is orthogonal,
K2n〈T †(e), T †(f)〉 = (1−M)(1 +M(K − 1))N−1, (5.18)
therefore T˜ †(e˜) and T˜ †(f˜) must also differ in exactly one component.
It remains to show that if e˜ and f˜ differ in one component, on which they
represent outcomes of different measurements, i.e. xj 6= x′j , then again T˜ †(e˜) and
T˜ †(f˜) differ in exactly one component. In this case
K2n〈e, f〉 = K2n〈T †(e), T †(f)〉 = (1 +M · (K − 1))N−1. (5.19)
This alone does not suffice to show that T †(e) differs in only one component from
T †(f), since for example it may be that (1 −M)2 = (1 +M · (K − 1)). How-
ever, consider any local pure state s(j) on subsystem j which gives outcome aj for
measurement xj and outcome a′j for measurement x′j (outcomes for the remaining
measurement choices may be assigned arbitrarily). For the remaining subsystems
with i 6= j, let s(i) be any pure state which gives outcome ai for measurement
xi. Then the pure product state s = s(1) ⊗ · · · ⊗ s(N) is constructed such that
〈s, e〉 = 〈s, f〉 = 1, hence e+ f E+ UN .
Recall from Section 2.5.4 that a reversible adjoint transformation T † maps
Emax+ linearly and bijectively to itself such that T †(UN) = UN ; hence T †(e) +
T †(f) E+ UN . It follows that for at least one component k, the local fiducial ef-
fects T˜ †(e˜)k and T˜ †(f˜)k correspond to outcomes of different measurements. This
fact, combined with (5.19), implies that T˜ †(e˜) differs from T˜ †(f˜) in exactly one
component.
We now apply a general combinatorial theorem concerning permutations of
183
strings. We will consider strings belonging to some Cartesian product of finite
alphabets A1 × · · · × AN ; in this Section each alphabet will comprise theM ·K
distinct local fiducial effect vectors and will therefore be identical, however we
state and prove the theorem without the assumption of identical alphabets, as this
will become useful when we apply the theorem in the case that not all subsystems
are identical. In order to prove the theorem, it is convenient to make rigorous our
notion of local operations and permutations of components, at least in regards to
fiducial effect strings.
A permutation Q of A1 × · · · × AN is a local operation if there is some per-
mutation Qi of the set Ai such that
Q : (a1, . . . , ai, . . . , aN) 7→ (a1, . . . , Qi(ai), . . . , aN). (5.20)
Q is a component permutation if there is some permutation σ of the set [N ] such
that
Q : (a1, . . . , aN) 7→ (aσ(1), . . . , aσ(N)). (5.21)
Note that if i, j ∈ [N ] such that σ : i 7→ j, then Q is a valid permutation only if
Ai = Aj .
We may as well assume for the time being that an alphabet of size n is simply
the set {0, . . . , n− 1}. Denote by 0 the string (0, . . . , 0), and define the Hamming
weight of a string a to be the number of non-zero components, i.e. WH(a) =
dH(a, 0).
Theorem 10. Let A1, . . . ,AN be finite alphabets, and Q be a permutation of the
set A1 × · · · × AN . If Q preserves a Hamming distance of 1 between pairs of
strings, then it is a composition of operations which permute components, and
operations which act independently on each component (local operations).
Proof. Let Ai = {0, . . . , ni − 1}. We may assume (by pre-composing Q with
local operations if necessary) that Q maps the string 0 to itself. By assumption, Q
acts as a permutation on the set of strings with exactly one non-zero component.
Let Li denote the set of strings which are non-zero only in the ith component: the
184
components of each Li are at Hamming distance 1 from each other, so Q maps
each Li to some Lj , and moreover maps Li to Lj only if ni = nj .
Suppose that we pre-composeQ with the component permutation which maps
component j to component i just when Q maps Li to Lj , to obtain a new map
Q1 which is maps each set Li to itself. Suppose further that we pre-compose Q1
with a local operation for every component i, which acts as the inverse of Q1 on
the set Li, to obtain a new map Q2 which therefore fixes each string of Hamming
weight 1. Note that if Q2 can be written as a combination of local operations and
component permutations then so can Q, hence we may assume without loss of
generality that Q fixes strings of Hamming weight 1.
Given that Q fixes strings of Hamming weight 1, we show by induction that
it fixes all strings. Suppose that it fixes strings of weight less than W , and let a
be a string of weightW . However, a is uniquely specified by all those strings of
Hamming distance 1 from it, that have weightW −1. Since those strings are fixed
by Q, so too must a be. Hence, Q fixes strings of weightW and by induction, all
strings.
5.1.3 Characterisation of reversible transformations
It follows from Sections 5.1.1 and 5.1.2 that reversible adjoint transformations
permute the composite fiducial effect vectors in a way which is a composition of
local subsystem operations and permutations of subsystems. Thus the transforma-
tion itself acts in a similar manner on states: it is a composition of local subsystem
operations and permutations of subsystems. Since the systems are identical, we
will see that any permutation of subsystems is a valid reversible transformation.
However, local operations are further restricted by the following Lemma which,
importantly, applies in the general Boxworld scenario, without assuming that all
measurements have the same number of outcomes.
Lemma 6. The only allowed reversible transformations of a single boxworld sys-
tem are relabellings of measurement choices and measurement outcomes.
185
Proof. Let T be a reversible transformation, so that the adjoint T † permutes the
local fiducial effect vectors Xa|x. Suppose that the system hasM measurements,
and that measurement x has Kx outcomes. If x 6= x′, then for any choice of
1 ≤ a ≤ Kx and 1 ≤ a′ ≤ Kx′ there exists a pure state s which gives outcome
a when x is measured and outcome a′ when x′ is measured. This state obeys
〈Xa|x, s〉 = 〈Xa′|x′ , s〉 = 1, hence Xa|x + Xa′|x′ E+ UN . However, for any
1 ≤ a, a′ ≤ Kx we have that Xa|x +Xa′|x ≤E+ UN , since
∑
aXa|x = U , and any
conic combination of the fiducial effect vectors is a member of E+. Now,
Xa|x +Xa′|x ≤E+ UN ⇐⇒ T †(Xa|x) + T †(Xa′|x) ≤E+ UN , (5.22)
hence two fiducial effect vectors correspond to outcomes of the same measure-
ment x only if their images under T † also correspond to outcomes of the same
measurement x′. Therefore the permutation is a composition of permutations of
the form:
Xa|x 7→ Xa′|x (5.23)
corresponding to relabelling of outcomes, and permutations of the form:
Xa|x 7→ Xa|x′ (5.24)
corresponding to relabelling of measurement choices. Note that one measurement
choice may be mapped onto another only if they have the same number of out-
comes.
Combining the results of the previous Sections leads to the main result of [4].
Theorem 11. In a composite Boxworld system, withM ≥ 2 local measurements
for each subsystem and K local outcomes for each measurement, the only re-
versible transformations are permutations of subsystems, and local relabellings
of measurement choices and outcomes. Furthermore, all such transformations
are allowed.
186
Proof. The fact that these are the only possible reversible transformations follows
from Sections 5.1.1 and 5.1.2, and Lemma 6. It remains to show that any such
transformation is allowed. This follows easily from viewing these transformations
acting on the outcome distributions representing states. For example, permuting
subsystems 1 and 2 is given by the mapping:
P (a1, a2, . . . , aN |x1, x2, . . . , xN) 7→ P (a2, a1, . . . , aN |x2, x1, . . . , xN) , (5.25)
and relabelling the outcomes of measurement x1 on subsystem 1 via the permuta-
tion σ of the set [K] is given by the mapping:
P (a1, . . . , aN |x1, . . . , xN) 7→ P (σ(a1), . . . , aN |x1, . . . , xN) . (5.26)
All of these mappings map allowed states to allowed states in a reversible and
convex-linear fashion, hence correspond to allowed reversible transformations.
We conclude this Section with some remarks about attempting a direct general-
ization of this result to composite systemswith differing numbers of measurements
on subsystems, and differing numbers of outcomes per measurement. A natural
first step would be to relax the demand that M and K are constant across sub-
systems, but retain the demand that each local fiducial measurement has the same
number of outcomes for a given subsystem. The local fiducial effect vectors for
subsystem i can still be constructed as in Section 5.1.1, given that there areM (i)
fiducial measurement choices, each of which hasK(i) possible outcomes. The in-
ner product between any pair of vectors is still given by (5.15) -- but withM and
K replaced by M (i) and K(i) -- and composite fiducial effects are still given by
tensor products of the local fiducial effects. The proof that reversible transforma-
tions are orthogonal carries through, and so it seems worth examining whether a
consideration of inner products will again demonstrate that a Hamming distance
of 1 is preserved. If so, due to the fact that Theorem 10 applies to arbitrarily-sized
alphabets, it follows that all reversible transformations are compositions of local
187
operations and subsystem permutations.
Unfortunately however, the proof of Lemma 5 does not carry through so nicely
ifM and K are allowed to vary across subsystems. A simple counterexample is
provided by a composite system comprising 3 subsystems where the number of
measurement outcomes K(1) = K(2) = K(3) = 2 is constant but M (1) = 2,
M (2) = 2 and M (3) = 10. In this case, for any two effects whose components
agree except for the third subsystem, where they correspond to outcomes of dif-
ferent measurements, the inner product will be:
(1 +M (1) · (K(1) − 1))(1 +M (2) · (K(2) − 1))
(K(1) ·K(2) ·K(3))2
= 3× 3
12
= 3
4
. (5.27)
On the other hand, a pair of effects with different outcomes of the same mea-
surement on components 1 and 3, and outcomes of different measurements on
component 2, will generate the same inner product:
(1−M (1))(1−M (3))
(K(1) ·K(2) ·K(3))2
= (−1)× (−9)
12
= 3
4
. (5.28)
Any reversible transformation whichmaps a pair of effects of the first kind to a pair
of the second kind would necessarily be non-trivial, since it would not preserve
the Hamming distance. However, as we prove later in this chapter, it is impossible
to extend this transformation in a consistent, linear manner to a full permutation
of the composite fiducial effect vectors; this system does not in fact allow for non-
trivial reversible transformations. It may be observed that there is a lot of play in
the variablesM (i) andK(i) in attempting to generate these “pathological” systems,
and that as the number of subsystems increases there will certainly be infinitely
many pathological systems (for example, combining any number of copies of the
above system will result in a system which will have many more overlaps in the
inner product). It does not appear to be a promising task, therefore, to simply go
through and check each case where Lemma 5 might go wrong.
A natural second attempt to extend the proof is to embed an arbitrary Box-
188
world system into one which is of the identical-subsystem form considered in this
Section, in the hope that the smaller system will inherit its reversible transforma-
tions from the larger. Suppose, for example, that it was possible to “add in” a new
measurement with just two outcomes on any one subsystem, in such a way that the
reversible transformations of the old system are induced by those of the new. By
repeatedly performing this procedure, one can arrive at a system for which subsys-
tems have the same number of measurements. Suppose also that it was possible to
also “add in” outcomes to measurements in a similar way, so that one may arrive
at a system for whichM andK are both constant across subsystems. Having char-
acterised the reversible transformations of this larger system, we would perhaps be
able to infer from this procedure that the reversible transformations of the smaller
system are trivial, since they are inherited from the larger system.
However, this embedding procedure does not proceed as straightforwardly as
hoped. Consider for example a system with any number of subsystems, measure-
ments and outcomes, and suppose we added in an extra outcome to the first fidu-
cial measurement of the first subsystem (which previously hadK outcomes, say).
This generates a set of new composite fiducial effects, whose first component is
X(1)K+1|1. For any reversible transformation on the larger system including this set
of effects, we might attempt to induce a transformation on the smaller system by
simply restricting the permutation of fiducial effect strings to those which do not
have X(1)K+1|1 in the first component. However, two problems are encountered:
firstly, the transformation may map fiducial effects not in the above set to those
which are; secondly, the induced permutation is unlikely to respect the linearities
of the smaller set of fiducial effects, such as
K(1)1
∑
a=1
X(1)a|1 ⊗ · · · ⊗X
(N)
aN |xN =
K(1)2
∑
a=1
X(1)a|2 ⊗ · · · ⊗X
(N)
aN |xN (5.29)
Given these difficulties, it seems that a fresh approach is needed to demonstrate
that reversible transformations preserve Hamming distance 1 between pairs of ef-
fects. Note that proving this in the general case is sufficient to generalise the fact
189
that all reversible transformations are trivial, since the remaining steps of the proof
do not rely on the number of measurements and outcomes being constant across
subsystems. In the rest of this Chapter we develop a proof of this property of re-
versible transformations, which is not based on considering specific orthogonal
representations of Boxworld systems as before, but on the basic convex structure
of the state and effect spaces.
It should be noted that our demands concerning classical systems must be
adapted somewhat: in the case where M is constant, simply demanding M ≥ 2
ensures that no subsystem is classical. However, ifM is allowed to vary then we
also need to rule out the possibility that some subsystems are classical and others
are not. In the remainder of this Chapter we will demand thatM (i) ≥ 2 for all i,
i.e. that all subsystems are non-classical. If this were not the case, then non-trivial
reversible transformations do in general exist. As a simple example, consider a
bipartite system with M (1) = 1, K(1)1 = 2 so that subsystem 1 is classical, and
M (2) = 2, K(2)1 = K
(2)
2 = 2 so that subsystem 2 is a g-bit system. The reversible
“C-NOT” operation on this system is defined by:
X(1)1|1 ⊗X
(2)
a|x → X
(1)
1|1 ⊗X
(2)
a|x (5.30)
X(1)2|1 ⊗X
(2)
a|x → X
(1)
2|1 ⊗X
(2)
a⊕1|x. (5.31)
To check that this transformation is indeed linear, one need only verify that the
unit effect U (1) ⊗U (2) is mapped to itself, and that effects of the form U (1) ⊗X(2)a|x
and X(1)a|x ⊗ U (2) have a well-defined image, independent of how the local unit
effect is decomposed on each subsystem. This turns out to be true because there is
only a single decomposition of the unit effect of the classical subsystem into local
fiducial effects. Since the operation is a linear permutation of composite fiducial
effects, it is an allowed transformation. A similar example can be constructed for
any system with at least one classical subsystem, in which a local operation is
performed, conditioned on the outcome obtained on the classical subsystem.
190
5.2 Effect tables
Recall from Section 2.3.3, in which we first introduced Boxworld, that under the
assumption of local tomography, a composite Boxworld state can be represented
as a multi-dimensional tabular array. This idea was first introduced and discussed
in [5] in the context of Boxworld measurements; we will develop our own inves-
tigation of this idea, as it provides a neat diagrammatic portrayal of many of the
results we will prove before achieving the main result. Our development of these
tabular representations is original work that has not been published. The number
of dimensions or axes is the number of subsystems, the rows in a given axis corre-
spond to outcomes of measurements on that subsystem, and each entry gives the
probability of obtaining the outcomes corresponding to the row containing that en-
try for each respective subsystem. For example, the PR-box state of two g-bits, in
which outcomes are randomly correlated as long as the local fiducial measurement
choices are not both equal to 1, and randomly anti-correlated otherwise, has the
following tabular representation:
x2 = 1 x2 = 2
a2 = 1 a2 = 2 a2 = 1 a2 = 2
x1 = 1
a1 = 1 1/2 0 1/2 0
a1 = 2 0 1/2 0 1/2
x1 = 2
a1 = 1 1/2 0 0 1/2
a1 = 2 0 1/2 1/2 0
(5.32)
We will refer to the subset of entries corresponding to the outcomes of only a
single measurement choice from each subsystem as a block of the tabular array.
Diagrammatically, a block is a set of entries enclosed by lines that separate local
measurement choices. For example, by specifying the second measurement on
both subsystems in (5.32), we get the bottom-right block:
0 1/2
1/2 0
(5.33)
191
From hereon in we will omit for convenience the first rows and columns which
label the local measurements and outcomes, since these can be deduced from the
number of blocks and rows per block, and include only the outcome probabilities,
viz.
1/2 0 1/2 0
0 1/2 0 1/2
1/2 0 0 1/2
0 1/2 1/2 0
(5.34)
A pure product state is represented by a table whose only non-zero entries
are 1. For each measurement choice on each axis, a particular row is selected (the
outcome determined by themeasurement choice) -- the 1s in the table occur exactly
where these selected rows overlap. For example, the following table represents a
state of two subsystems with three fiducial measurements each, where in the first
subsystem the state assigns outcome 1 for the first measurement and 2 for the
second and third measurements, and in the second subsystem the state assigns the
outcome equal in value to the measurement choice.
1 0 0 0 1 0 0 1 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 1 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 0 1 0
(5.35)
Composite effects can also be represented using tables, which we will refer
to as effect tables, so that the probability of obtaining that effect is given by the
entry-wise dot product between its table and the table representing the state of
192
the system. The composite fiducial effects, which are tensor products of local
fiducial effects, have only a single 1 in the entry corresponding to the measurement
outcome specified on each subsystem. General composite effects, which are conic
combinations of the composite fiducial effects, have effect tables that are non-
negative in every entry (a table can correspond to a valid effect even if some entries
are negative, but in this case there always exists another table with all positive
entries, which corresponds to the same effect).
The tabular representation of effects is over-complete, in the sense that some
effects may correspond to more than one possible table. This over-completeness
is codified by a set of linearities which are imposed on the set of effect tables, due
to the set of non-signaling conditions on the system. For example, in a two g-bit
system, the condition that the reduced outcome probability of obtaining outcome
0 for measurement 0 on subsystem 2 is independent of the measurement choice on
subsystem 1 is given by the equation:
P (0, 0|0, 0) + P (0, 1|0, 0) = P (0, 0|0, 1) + P (0, 1|0, 1) . (5.36)
This equation is reflected by demanding the following equivalence between effect
tables:
1 1 0 0
0 0 0 0
0 0 0 0
0 0 0 0
≡
0 0 1 1
0 0 0 0
0 0 0 0
0 0 0 0
(5.37)
Both of these tables are representations of the effect U (1) ⊗ X(2)0|0 ; demanding
that they are equivalent is no more than demanding consistency with the equation,
X(1)0|0 ⊗X
(2)
0|0 +X
(1)
1|0 ⊗X
(2)
0|0 = X
(1)
0|01 ⊗X
(2)
0|0 +X
(1)
1|1 ⊗X
(2)
0|0 , (5.38)
193
where each side corresponds to a distinct decomposition of the local unit effectU (1)
into local fiducial effects. In this manner, the linearities inherent in the local effect
cone of the abstract state space automatically engender the single-system non-
signaling conditions which are respected by the set of composite states Smax. By
Proposition 1, we know that the full set of non-signaling conditions is generated by
those non-signaling conditions involving single subsystems. Hence, the linearities
of the local effect cone give rise to the full set of non-signaling conditions on the
composite state space.
The effect tables which interest us the most are the binary effect tables, those
consisting of 0s and 1s, which represent effects that are sums of composite fidu-
cial effects (with all coefficients equal to 1). A binary effect table is said to be
multiform if it is equivalent to some other binary effect table, as in (5.37). We
will use the term sub-row to refer to binary effect tables which are similar to the
effect tables in (5.37) in that their non-zero entries lie only along one row of their
constituent blocks. As long as every subsystem is non-classical, then every axis
splits up into at least 2 blocks, so that sub-rows (and any tables covering a sub-
row) are always multiform. Moreover, each equivalence between sub-rows is due
to a decomposition of the local unit effect on the corresponding subsystem, as in
(5.38). We will say that a binary effect table covers another binary effect table if
the non-zero entries of the latter are a subset of the non-zero entries of the former:
clearly, any effect table which covers a sub-row is again multiform, for example:
1 1 0 0
1 0 0 0
0 0 0 0
0 0 0 0
≡
0 1 0 0
0 0 0 0
1 0 0 0
1 0 0 0
(5.39)
In order to check whether two effect tables are equivalent, it suffices to show
that they have the same dot product with all pure product states. This is because
the pure product states span the vector space of the composite abstract state space
[29], hence any two effects which the same dot product with all of them must
194
be identical. In cases where both tables cover sub-rows, it may be possible to
determine an equivalence between binary effect tables by applying various sub-
row equivalences. This is achieved by “sliding” a sub-row across to another sub-
row contained in the same row. For example, in (5.40) below, the two horizontal
sub-rows of the top left block can be slid across to the right, and then the two
vertical sub-rows slid downwards to form the effect table on the RHS.
1 1 0 0
1 1 0 0
0 0 0 0
0 0 0 0
≡
0 0 0 0
0 0 0 0
0 0 1 1
0 0 1 1
(5.40)
However this sliding process will not always be a useful indicator of equiva-
lence, as demonstrated in the following example which does not cover any sub-
rows:
1 0 0 0
0 0 0 1
0 0 1 0
0 1 0 0
≡
0 0 1 0
0 1 0 0
1 0 0 0
0 0 0 1
(5.41)
The equivalence (5.41) can be verified by dot product with pure product states.
However, a similar consideration demonstrates that this is not in fact an equiva-
lence between valid effects. The binary table on the LHS of (5.41) has dot product
2 with the pure product state s that deterministically outputs 0 for every fiducial
measurement choice on both subsystems, hence this effect table represents an im-
proper effect. This is illustrated in (5.42), where arrows are used to highlight which
rows and columns correspond to the outcomes determined by s; the entries of the
effect table which are picked out by s are those lying in the overlap of these rows
and columns, and their sum is the dot product of the effect with s, which is 2 in
this case. The results described in this chapter will rely on the multiform tables
which do correspond to proper effects and give genuine probabilities for all states,
and thus examples like (5.41) will not generally present a major obstacle.
195
↓ ↓
→ 1 0 0 0
0 0 0 1
→ 0 0 1 0
0 1 0 0
; 1 + 0 + 0 + 1 = 2 (5.42)
Some further observations will help to familiarise the reader with effect tables
and their properties. As mentioned above, sub-rows are always multiform. In
the bipartite g-bit examples given, all sub-rows have the same total number of 1s,
however this will not be true if any two local fiducial measurements have differing
numbers of outcomes. If in fact there are two fiducial measurements with differing
numbers of outcomes K1 and K2 on the same subsystem, then there will exist a
sub-row containingK1 1swhich is equivalent to a sub-row containingK2 1s, hence
equivalent binary effect tables need not have the same number of non-zero entries.
Calculating dot products between binary effect tables and pure product states
is an invaluable tool; as we have mentioned, it determines whether two distinct
tables correspond to the same effect or not. In bipartite systems, it also allows
for a useful characterisation of exactly when a binary effect table corresponds to
a proper effect. In (5.42) we see that the presence of non-zero entries in diagonal
blocks allows for the construction of a pure product state, which selects the rows
and columns necessary to “hit” those two entries, and, if necessary, completes the
lattice by selecting arbitrary rows and columns from remaining measurements on
each subsystem. Note that the blocks do not have to be on the same diagonal line,
merely unaligned (i.e. not in the same block-row or block-column). Below is
an example of this kind of improper binary effect table, and one of the six pure
product states for which it gives dot product greater than 1.
196
↓ ↓ ↓
→ 1 1 0 0 0 0
0 0 0 1 0 0
0 0 0 0 0 0
→ 0 0 0 0 1 0
(5.43)
A binary effect table also corresponds to an improper effect if it has distinct
non-zero entries in the same row, but not within the same block of that row. If this
is the case, then a similar pure product state construction can be used to hit both
non-zero entries, again with an arbitrary completion of the lattice for that state if
necessary. Below is an example of this kind of improper binary effect table, and
one of the eight pure product states for which it gives dot product greater than 1.
↓ ↓ ↓
→ 1 1 0 0 1 0
0 0 0 1 0 0
0 0 0 0 0 0
→ 0 0 0 0 0 0
(5.44)
These two constraints are the only obstacles to a binary, bipartite effect table
being a proper effect. Indeed, as long as all non-zero entries are contained in the
same row of blocks (or block-row), but never in distinct blocks of the same row,
then the effect is proper. To see this, suppose that an effect table corresponding to
an effectE ∈ E+ satisfies these conditions, and suppose without loss of generality
that it has non-zero entries only in the block row corresponding to the first fiducial
measurement on subsystem 1 (i.e. the leftmost column of blocks). In each of
these blocks, fill in extra 1s if necessary to get an effect F ≥E+ E such that every
non-zero entry lies in a vertical sub-row. Then slide all sub-rows up to the topmost
block (the subrowswill not clash since no two non-zero entries lie in distinct blocks
of the same row) to obtain an effect which is either the unit effect, or a strict subset
of it. Now E ≤E+ F ≤E+ U , hence E must be a proper effect. This process is
197
illustrated by the following operations on a binary, bipartite effect table satisfying
these conditions:
1 0 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 1 0 0 0
→
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
0 1 0 0 0
0 1 0 0 0
→
1 1 0 0 0
1 1 0 0 0
1 1 0 0 0
0 0 0 0 0
0 0 0 0 0
Thus, a binary, bipartite effect table is proper if and only if all its non-zero
entries lie in a single block-row, without any two entries lying in distinct blocks of
a single row. It is tempting to try to extend these conditions in order to characterise
proper binary effect tables of multipartite systems.
Unfortunately however, it is not the case in multipartite Boxworld systems that
an improper effect can be “detected” using pure product states in the same way;
there exist improper effectsE for which 〈E, s〉 ∈ {0, 1} for all pure product states
s. In fact, this occurs even in tripartite systems, as demonstrated by the tripartite
effect E depicted in Figure 5.1. In this representation of E, each of the smallest
cubes represents an entry of the effect table; shaded cubes represent entries with a
1 in them, whereas unshaded cubes represent zero entries. Assigning subsystems
1, 2 and 3 to the x-, y- and z- axes respectively, it can be seen that the algebraic
form of E is,
E =X(1)1|1 ⊗X
(2)
1|1 ⊗X
(3)
1|1 +X
(1)
2|1 ⊗X
(2)
1|2 ⊗X
(3)
2|2
+X(1)1|2 ⊗X
(2)
2|1 ⊗X
(3)
1|2 +X
(1)
2|2 ⊗X
(2)
2|2 ⊗X
(3)
2|1 , (5.45)
where the ordering of the summands agrees with the alphabetical ordering of the
cubes' labels in Figure 5.1.
Similarly to bipartite systems, specifying a tripartite pure product state corre-
sponds to picking out a (three-dimensional) grid of entries in the effect table. This
grid is obtained by choosing one particular outcome for each local fiducial mea-
198
Figure 5.1: Improper tripartite binary effect table which is not detected by pure
product states.
surement on each subsystem, and selecting the entries that lie in the intersection of
the subsequent rows. For example, the pure product state which gives outcome 1
for every local fiducial measurement will result in a grid that contains the shaded
cube A in Figure 5.1, but none of the other shaded cubes. By inspection it can be
verified that any pure product state will pick out at most one shaded cube, hence
〈E, s〉 ∈ {0, 1} for all pure product states s: in short, we would not be able to de-
duce that E is improper by looking at its inner products with pure product states.
We must resort to other methods to prove that E is actually improper. We first
argue that if any further cube were to be shaded in Figure 5.1, there would be some
pure product state that picks out two of the shaded cubes. To see this, note that
whenever two cubes are completely unaligned (i.e. correspond to different mea-
surement choices in all three subsystems), or are in the same plane but diagonally
opposing blocks (i.e. correspond to the same local effect on one subsystem, but
different measurement choices on the remaining subsystems), then it is possible
to construct a pure product state which hits both of them.
In fact, a similar observation shows that if any entry corresponding to an un-
shaded cube is assigned a positive, non-zero entry, then there will be a pure product
state whose inner product with the resulting effect is larger than 1. Moreover, for
any shaded cube, setting the corresponding entry of the effect table to be larger
199
than 1 will generate in an improper effect (one need only consider a pure product
state which hits the corresponding fiducial effect). It follows that for any fiducial
effect F and positive number ε, the effect E + εF is an improper effect. There-
fore, either E is actually the unit effect, or else it does not lie in the order interval
[0,U ]E+ , as no conic combination of fiducial effects can be added to it to obtain U .
It remains to show simply that E is not the unit effect. This follows from the
existence of a pure product state s for which 〈E, s〉 = 0. Let s(1) and s(2) both be
equal to the g-bit state which deterministically assigns outcome 2 to both fiducial
measurements, and let s(3) be the g-bit state which assigns outcome 1 for the first
fiducial measurement and outcome 2 for the second fiducial measurement. It can
then be straightforwardly verified that 〈E, s(1)⊗s(2)⊗s(3)〉 = 0, either by directly
using the decomposition (5.45) or by observing that the grid of entries specified
by s(1) ⊗ s(2) ⊗ s(3) does not hit any shaded cubes in Figure 5.1. Hence E is not
the unit effect and is therefore improper, despite the fact that 〈E, s〉 ∈ {0, 1} for
all pure product states s.
5.2.1 Reversible transformations on effect tables
Since reversible adjoint transformations permute the set of composite fiducial ef-
fects, they induce a permutation on the entries of binary effect tables, such that
entry a is mapped to entry b if and only if the fiducial effect which is represented
by a single 1 in entry a is mapped to the fiducial effect which is represented by a
single 1 in entry b. Effect tables provide a compact visualisation of the reversible
transformations which correspond to relabellings of measurements and outcomes,
and permutations of subsystems. These are given by block permutations (along an
axis), row permutations and transpose-like operations on tables respectively. We
give some examples to illustrate this procedure.
A relabelling of the measurement inputs on subsystem 1 will permute the
blocks in line with the first axis, but leave the ordering of the rows within each
block invariant. For effects in a bipartite system of two g-bits, switching the mea-
surement inputs on subsystem 1 is pictured below.
200
1 0 0 0
0 0 0 1
0 0 1 0
0 0 0 0
→
0 0 1 0
0 1 0 0
1 0 0 0
0 0 0 0
(5.46)
A relabelling of the outcomes of a measurement in a subsystem are given by
permuting the rows along the axis corresponding to the subsystem, but doing so
only within the block that corresponds to the measurement in question. For exam-
ple, switching the measurement outcomes for the first measurement on the second
subsystem of the table on the RHS of (5.46) is given below.
0 0 1 0
0 1 0 0
1 0 0 0
0 0 0 0
→
0 1 0 0
0 0 1 0
1 0 0 0
0 0 0 0
(5.47)
Permuting the subsystems corresponds to permuting the axes corresponding
to those subsystems (equivalently, permuting the coordinates of each entry). In
the case of bipartite systems, this is the same as taking the transpose of the table
(imagining it were a matrix). Note that the table need not be symmetric, and no
two blocks need be the same size, so in general one must be careful to transpose
the lines delineating the blocks as well as the entries themselves. If we apply a
permutation of the two subsystems to the table on the RHS of (5.47), we obtain
the mapping below.
0 1 0 0
0 0 1 0
1 0 0 0
0 0 0 0
→
0 0 1 0
1 0 0 0
0 1 0 0
0 0 0 0
(5.48)
201
5.2.2 Diagrammatic proof for general bipartite systems
We now give a proof that the only reversible transformations allowed in a bipar-
tite Boxworld system are composed of local operations and permutations of the
subsystems, so long as none of the subsystems are classical. This proof applies
even to the case where the subsystems are non-identical, and is therefore already a
novel result, not implied by the proof outlined in Section 5.1. Bipartite Boxworld
effects are represented by 2-dimensional effect tables, for which local operations
correspond to permutations of rows along a single axis, and a permutation of the
subsystems corresponds to transposing the table (note that this transpose is only
valid if the two subsystems are identical). Assuming that every measurement is
non-trivial and hasmore than one outcome, each block in the effect table has height
and width both at least 2 cells. As long as none of the subsystems are classical,
then along both the horizontal and vertical axes there are at least 2 blocks.
Recall from Section 5.1.3 that we need only demonstrate that reversible trans-
formations preserve a Hamming distance of 1 between pairs of fiducial effect
strings. We will obtain this result for binary, bipartite Boxworld systems in a
stepwise manner, supplementing many of the steps with an effect table illustration
which provides a visualisation of the argument. In the bipartite setting a satisfac-
tory proof of each step can be given by reference to effect table diagrams, since
they take a much simpler form. In the general multipartite setting the same cannot
be said, however the stepwise process used will be very similar, hence this proof
of the bipartite case can be seen as setting up the necessary ideas and intuition that
will then be applied to the multipartite case.
Central to the following argument are the notions of sub-rows and multiform
effect tables. Recall that a sub-row is a binary effect table where the entries along a
single block of a single row are exactly the non-zero entries. A binary effect table
is multiform if there exists a distinct binary effect table which gives the same inner
product for all states. Binary effect tableE (strictly) covers binary effect table F if
the non-zero components of F are a strict subset of the non-zero components ofE.
202
Step 1. The only proper binary effect tables equivalent to a sub-row are other sub-
rows in the same row.
Suppose that E is a sub-row, and that F is an effect table which is not a sub-
row in the same row. Without loss of generality, suppose that E is a sub-row for
some column of the table, i.e. E represents an effect of the form X(1)a|x ⊗ U (2).
To demonstrate that E and F cannot represent equivalent effects, it suffices to
construct a pure product state which has differing inner products with the effects
represented by E and F . Recall that a pure, bipartite product state is uniquely
specified by selecting one column from each block along the horizontal axis, and
one row from each block along the vertical axis.
If F has a non-zero entry which is in a distinct row, then it is possible to select
outcomes for each measurement of subsystem 1 so that the corresponding set of
columns contains non-zero entries ofF but not ofE. To complete the specification
of the pure product state, select a row from each block along the second axis,
such that the previous non-zero entry of F is contained in at least one of them.
The resulting state has inner product 1 with F and 0 with E. In the below table,
the relevant non-zero entry of F is underlined; by selecting the second and third
columns, and e.g. the second and third rows, we get a pure product state which
distinguishes E from F .
↓ ↓
1 0 0 0
→ 1 0 1 0
→ 0 0 0 0
0 0 0 0
(5.49)
If instead F has non-zero entries only in the same column that E does, then
either F strictly covers a sub-row, in which case it is improper, or F has non-zero
entries only for a strict subset of each block of that column. In the latter case,
it is possible to select a set of rows of the table, one for each block along the
203
vertical axis, such that the set of rows contains non-zero entries of E but not of
F . Selecting a set of columns which includes the one in which both E and F have
non-zero entries will generate a pure product state which has inner product 1 with
E and 0 with F . This process is again demonstrated in the below diagram, where
the underlined entries are the non-zero entries of F .
↓ ↓
→ 0 1 0 0
0 1 0 0
→ 0 0 0 0
0 1 0 0
0 1 0 0
→ 0 0 0 0
(5.50)
This concludes the proof of Step 1.
Step 2. A proper, binary effect table which does not cover any sub-rows is not
multiform.
LetE be a binary, bipartite effect table which does not cover any sub-rows, and
suppose for contradiction that it is equivalent to a distinct, binary, bipartite effect
table F . Then there must exist some component (i, j) of the table for which E is
zero and F is non-zero (here i represents the column and j the row, unlike conven-
tional matrix notation). We aim to construct a pure product state which “hits” this
(i, j), but none of the entries ofE. Recall that the effectE can be proper only if all
its non-zero entries lie in some block-row - without loss of generality suppose this
is the first column of blocks, i.e. every non-zero entry of E is associated with the
first measurement choice on subsystem 1. If the non-zero entry of F does not lie
in this column of blocks, then by moving leftwards from this entry we must arrive
at a cell (i′, j) in the first column of blocks for whichE is zero (otherwiseE would
cover a horizontal sub-row). Select a set of columns which includes i and i′, and a
204
set of rows which includes j and a row from each remaining block such that E is
zero on the i′th component of that row (this is possible since otherwise E would
cover a vertical sub-row). The resulting pure product state has inner product 0
with E and 1 with F . In the below diagram, the underlined entry is the non-zero
entry of F , so that i = 3 and j = 5; thus we obtain i′ = 2 and also select row 3.
↓ ↓
0 1 0 0
0 1 0 0
→ 0 0 0 0
0 0 0 0
→ 1 0 1 0
(5.51)
Suppose instead that entry (i, j) belongs to the first column of blocks. In this
case it is similarly possible to select a set of rows which includes j and a row from
each remaining block such that E is zero on the ith component of that row. Any
set of columns which includes i will now result in a pure product state which has
inner product 0 with E and 1 with F .
Step 3. Reversible transformations map sub-rows to sub-rows.
Let E be a sub-row; E is multiform, being equivalent to all other sub-rows
contained in the same row as E (there is at least one other if all systems are non-
classical). Let F be another sub-row contained in the same row: thus E and F are
binary effect tables representing the same proper effect, but whose non-zero com-
ponents are disjoint. Let T be a permutation of the effect table components, which
represents an allowed reversible transformation on the Boxworld system. The im-
age T (E) of E under this transformation is a proper binary effect table which is
equivalent to T (F ), hence is multiform. It follows from Step 2 that T (E) covers
a sub-row. The diagram below is an example of sub-rows E and F (underlined)
205
transforming under an arbitrary permutation of components - in this diagram the
permutation does not in fact correspond to an allowed transformation, since it is
ill-defined on effects which have multiple effect table representations.
1 0 0 0
1 0 0 0
1 0 0 0
1 0 0 0
→
0 1 0 0
0 0 1 0
0 0 0 1
0 1 0 0
(5.52)
Suppose that T (E) is not itself a sub-row, i.e. that it strictly covers a sub-row.
LetG denote this sub-row, and letG′ be another sub-row in the same row asG. The
image T−1(G) of G under the inverse permutation T−1 is a binary effect which
is strictly covered by E, and which is equivalent to the effect T−1(G′). Hence
T−1(G) is multiform. However, since T−1(G) is a strict subset of a sub-row, this
contradicts Step 2. Therefore T (E) is itself a sub-row, and T maps sub-rows to
sub-rows.
Step 4. Reversible transformations preserve Hamming distance 1 between pairs
of fiducial effect strings.
Suppose that e and f are composite fiducial effects, which have Hamming
distance 1 between their respective fiducial effect strings. e and f are uniquely
represented by binary effect tables E and F which have only a single non-zero
entry, such that these two non-zero entries lie in some row or column of the table.
E and F may be extended uniquely to sub-rows E ′ and F ′ which lie in this row
and cover E and F respectively.
If the E ′ and F ′ are in fact the same sub-row, then by Step 3 their image under
an allowed reversible transformation T is some sub-row T (E ′). Thus the images
T (E) and T (F ) of E and F under T also belong to the same sub-row, therefore
their corresponding fiducial effect strings also have a Hamming distance of 1. If
E ′ and F ′ are not the same sub-row, they are nevertheless equivalent sub-rows,
206
and by Step 3 must be transformed to equivalent sub-rows under T . However,
according to Step 1, sub-rows can only be equivalent if they belong to the same
row, hence again T (E) and T (F ) correspond to fiducial effect strings which have
a Hamming distance of 1.
Step 5. The only reversible transformations allowed in bipartite boxworld sys-
tems, where none of the subsystems are classical, are compositions of relabellings
of measurement choices and outcomes, and permutations of subsystems (assum-
ing they are identical).
FromStep 4 and Theorem 10 (which applies to arbitrary alphabets in each com-
ponent), we have that any reversible transformation must be a composition of local
operations and permutation of the subsystems. By Lemma 6, which also applies in
the case where different measurements may have different numbers of outcomes,
we have that the only local operations allowed are relabellings of measurement
choices and outcomes. Note that permuting the states of the two subsystems cor-
responds to a valid operation only if the measurement choices on each subsystem
can be matched up, so that the paired measurements have the same number of out-
comes. It is not possible to permute the state of the subsystems if, for example,
they have a different number of fiducial measurements or if the first subsystem
has a measurement withM outcomes, but none of the measurements on the sec-
ond subsystem haveM outcomes. However, if the first subsystem has two fiducial
measurements with 2 and 3 outcomes, and the second subsystem has two fiducial
measurements measurements with 3 and 2 outcomes, then the states of these sub-
systems may be permuted as long as the first measurement of the first subsystem
is matched with the second measurement of the second subsystem.
207
5.3 Decompositions of effects
In Section 5.2.2 we demonstrated, by means of diagrammatic representations, that
the allowed reversible transformations of a bipartite Boxworld system are triv-
ial, so long as none of the subsystems are classical. In generalising this result to
general multipartite systems, it will be helpful to shift our primary focus to the
algebraic, rather than diagrammatic, representations of effects, but to keep these
diagrammatic notions at the back of one's mind as an aid to understanding the
method of proof. The proof proceeds in a stepwise fashion largely similar to the
bipartite case, though some care has to be taken to make rigorous some facts which
are less evidently true in tabular arrays of larger dimension.
We begin by recapping the necessary features of Boxworld. AnN -partite sys-
tem is made up of N subsystems, for which M (i) is the number of local fiducial
measurement choices on subsystem i, andK(i)xi is the number of outcomes for mea-
surement xi on subsystem i. We will always assume that these values are at least
equal to 2, so that all subsystems are non-classical and all measurements are non-
trivial. To each subsystem i belongs a set of effects E (i), whose extreme rays are
the fiducial effects X(i)ai|xi which correspond to outcome ai being obtained when
measurement xi is performed. The composite fiducial effects of an N -partite sys-
tem are those of the form e = X(1)a1|x1 ⊗ · · · ⊗ X
(N)
aN |xN : a general N -partite effect
is a conic combination of the composite fiducial effect vectors, and a general N -
partite state is a vector which has non-negative inner product with all the composite
fiducial effect vectors.
The pure product states of anN -partite system are those of the form s = s(1)⊗
· · · ⊗ s(N), where s(i) is a pure state of subsystem i. Pure states of an individual
subsystem assign specific outcomes for each fiducial measurement choice; we will
use the notation s(i)xi to index this outcome, with 1 ≤ s
(i)
xi ≤ K
(i)
xi . If the state s and
effect e satisfy 〈e, s〉 = 1, we will say that s hits e. Note that if s = s(1)⊗· · ·⊗s(N)
is a pure product state and e = X(1)a1|x1⊗· · ·⊗X
(N)
aN |xN is a composite fiducial effect,
then s hits e if, and only if, s(i)xi = ai for all i.
Recall that reversible transformations permute the set of composite fiducial
208
effects, hence induce permutations on the components of the effect table repre-
senting the system. In particular, binary effect tables, whose non-zero entries are
all equal to 1, are mapped to other binary effect tables. Binary effect tables rep-
resent effects which are sums of composite fiducial effects, i.e. which are of the
formE =
∑
α eα, where each eα is a composite fiducial effect. This type of effect
merits special attention in our treatment.
Definition. Suppose that for a set of composite fiducial effects {eα} and an effect
E we have that E =
∑
α eα. Then {eα} is a decomposition of E.
We may also use the terminology: E admits the decomposition {eα}. We will
tend to use lowercase letters for composite fiducial effects, and uppercase for sums
of extreme ray effects.
Definition. An effect E is multiform if it can be written E =
∑
α eα =
∑
β fβ
where {eα} and {fβ} are distinct sets of composite fiducial effects.
In other words, the distinct binary effect tables which have 1s in the entries
corresponding to effects eα and fβ respectively, are equivalent, both representing
the effect E.
Definition. A sub-unit effect is an effect of the form X(1)a1|x1 ⊗ · · · ⊗ U
(i) ⊗ · · · ⊗
X(N)aN |xN , where exactly one component of the tensor product is the unit effect, and
the remainder are local fiducial effects. A sub-unit effect is an i-sub-unit effect if
its ith component is the unit effect.
Sub-unit effects are represented by the binary effect tables previously referred
to as sub-rows; for bipartite systems, 1-sub-unit effects are represented by hori-
zontal sub-rows and 2-sub-unit effects are represented by vertical sub-rows. Just
as sub-rows are multiform, so are sub-unit effects, with the number of decompo-
sitions of an i-sub-unit effect being at leastM (i), the number of fiducial measure-
ment choices on subsystem i.
Fiducial and sub-unit effects can uniquely be written as a tensor products of
vectors, such that each component i of the tensor product is equal either to some
209
X(i)ai|xi or to U
(i). Therefore we may refer to their ith component in a well-defined
sense; for example, if
E = X(1)a1|x1 ⊗ U
(2) ⊗X(3)a3|x3 , (5.53)
then E(1) = X(1)a1|x1 and E
(2) = U (2). For a subset Ω ⊆ [N ] we will write EΩ =
⊗
i∈ΩE(i), so that in the above exampleE{2,3} = U (2)⊗X
(3)
a3|x3 . Finally, in analogy
to one binary effect table covering another if the non-zero components of the latter
are a subset of the non-zero components of the former, we have the following
definition.
Definition. A set of fiducial effects {eα}α∈A (strictly) covers the effect E if there
is some (strict) subset B ⊆ A such that
∑
α∈B eα = E.
The following two lemmas (and their corollaries) establish useful facts about
effects and the ways in which they may be decomposed into sums of composite
fiducial effects. We are primarily concerned with effects that are sums of compos-
ite fiducial effects with coefficient 1, and hence which may in principle be repre-
sented by binary effect tables (although these may well exist in many-dimensional
spaces). In terms of binary effect tables, Lemma 7 is the algebraic version of Step
1 of Section 5.2.2, demonstrating that the only binary effect tables equivalent to a
sub-row are other sub-rows belonging to the same row.
Being rather technical in nature, the proof of Lemma 7 merits some discus-
sion (the following proofs are no less technical, but the discussion here will serve
equally well to illuminate them also). In essence, the linear relations (or lack of)
between the fiducial effect vectors are exploited in order to derive a categorisa-
tion of the ways in which sub-unit effects may be decomposed as sums of fiducial
effects. However, instead of working directly with the notion of linear indepen-
dence, we employ the trick of considering the inner products between fiducial
effects and pure product states. This strategy is hinted at in [4] and, after some
thought, is unsurprising: we know that the pure states on a single subsystem are
the deterministic states, and we know also that distinct effects must have distinct
210
inner products for at least one pure product state. Indeed, one useful perspective to
bear in mind is the following: demanding that every valid conditional distribution
over the measurements corresponds to an allowed state is equivalent to demanding
the precise linear dependencies that exist between the fiducial effect vectors.
Consider a pair of distinct fiducial effects in an individual non-classical Box-
world system. If this pair of effects belongs to the same measurement, for example
the pair X1|1 and X2|1, then the state which deterministically outputs 1 for every
fiducial measurement will hit the first effect, but not the second. If the pair belongs
to distinct measurements, for example the pairX1|1 andX1|2, then any state which
outputs 1 for the first measurement, 2 for the second measurement, and anything
for the remaining measurements will again hit the first effect but not the second.
It will be noticed that the ability to distinguish distinct effects with pure states is
not specific to the examples chosen; this is central to proving the first part of the
Lemma.
Suppose again that a pair of fiducial effects belongs to different measurements,
for exampleX1|1 andX1|2. Then it is just as easy to find a pure product state which
hits both effects: the state s which always outputs 1 will again suffice. Therefore
〈X1|1 +X1|2, s〉 = 2, implying that X1|1 +X1|2 is an improper effect. Thus, X1|1
and X1|2 cannot belong to the same decomposition of the unit effect. Again, this
is not specific to the example chosen: it follows almost immediately that the only
fiducial effect decompositions of the unit effect are given by the set of outcomes
belonging to a single measurement.
Lemma 7. Let E =
∑
α eα be an i-sub-unit effect. Then each composite fiducial
effect eα satisfies e(j)α = E(j) for all components j 6= i. Moreover, the set of ith
components {e(i)α } forms a local fiducial measurement on subsystem i.
Proof. We will prove the lemma by contradiction. Suppose first that e(j)α′ 6= E(j)
for some α′ and some j 6= i. Let E(j) = X(j)aj |xj and e
(j)
α′ = X
(j)
a′j |x′j
. Either xj 6= x′j ,
or xj = x′j but aj 6= a′j , so we can construct a pure product state s(j) on system
j such that s(j)xj 6= aj and s
(j)
x′j
= a′j , so that s(j) hits e
(j)
α′ but not E(j). Then there
211
exists a pure product state s whose jth component is s(j), so that 〈E, s〉 = 0, but
for which 〈eα′ , s〉 = 1, contradicting the fact that 〈eα′ , s〉 ≤ 〈E, s〉.
{e(i)α } is a set of fiducial effects satisfying
∑
α e
(i)
α = U (i), hence any pure state
s(i) on system i must hit exactly one of the e(i)α . If any two of the e(i)α are effects
corresponding to different measurements, then there is a pure state s(i) which hits
both of them. Hence the effects all belong to the same fiducial measurement x; if
{e(i)α } is not the full set of outcomes of measurement x, then there is a pure state
s(i) which hits none of them. It follows that {e(i)α } forms a fiducial measurement
on subsystem i.
Corollary 4. Let E =
∑r
α=1 eα be a sub-unit effect. Then {eα} does not strictly
cover a multiform effect.
Proof. From Lemma 1 it follows that e(j)α = E(j) for all components j 6= i, and
that the set {e(i)α } corresponds to the full set of outcomes for a local fiducial mea-
surement on subsystem i. In other words, the decompositions ofE are in a one-to-
one correspondence with the local fiducial effect decompositions of U (i), each of
which is obtained by fixing a measurement choice xi, and summing all the fiducial
effect vectors which correspond to an outcome for that measurement:
U (i) =
∑
xi
X(i)ai|xi . (5.54)
In particular, there are only a finite number of such local decompositions of
U (i), and the sets of local fiducial effects making up these decompositions are
pairwise disjoint. Likewise, there are only a finite number of decompositions of
E, and the sets of composite fiducial effects making up these decompositions are
pairwise disjoint.
Suppose that {eα} covers a multiform effect, i.e. (after relabelling the eα
if necessary) for some integer s < r and some integer t, there exist distinct
sets {eα}sα=1, {fβ}tβ=1 of fiducial effects such that
∑s
α=1 eα =
∑t
β=1 fβ . Then
{f1, . . . , ft, es+1, . . . er} is a decomposition ofE distinct from {e1, . . . , er}. How-
212
ever, both these sets contain er, contradicting the fact that the decompositions of
E are pairwise disjoint. Therefore {eα} does not cover a multiform effect.
Just as Lemma 7 was the algebraic version of Step 1 of 5.2.2, Lemma 8 is the
algebraic version of Step 2; however, since it is difficult to categorise which effects
are proper when the number of subsystems is large, a slightly different approach
must be used. Specifically, let r be the least number of non-zero entries over all
possible sub-rows of the system. Then Lemma 8 states that any binary effect table
which has at most r non-zero entries, and which does not cover any sub-rows, is
not multiform. It turns out that, along with Lemma 7, this is enough to prove the
desired result in the case that all subsystems have at least one measurement with
r outcomes - for the general case, however, we will require a more sophisticated
approach.
For convenience we will assume from here on that the subsystems are arranged
in order of increasing numbers of measurement outcomes, i.e. K(i)j ≤ K
(i)
j+1 and
K(i)1 ≤ K
(i+1)
1 ; this amounts to a relabelling of subsystems and measurement
choices. K(1)1 is therefore the smallest number of outcomes possible for any fidu-
cial measurement.
Lemma 8. For r ≤ K(1)1 suppose that {eα}rα=1 does not cover any sub-unit effects.
Then for any fiducial effect f /∈ {eα}, there is a pure product state which hits f
but none of the eα.
Proof. Let f = X(1)a1|x1 ⊗ · · · ⊗X
(N)
aN |xN . We proceed by induction on the number
of subsystems N . When N = 1 set sx1 = a1 to ensure that s hits X
(1)
a1|x1 . The
conditions imply that no partial sum of {eα} equals the unit effect, hence for each
other choice of measurement x′ 6= x1, it must be possible to choose sx′ such that
Xsx′ |x′ /∈ {eα}. By construction s hits X
(1)
a1|x1 but none of the eα.
WhenN > 1, note that for any fiducial effect g on system 1, the set {e{2,...,N}α :
e(1)α = g} is a decomposition of some effect on the remaining N − 1 subsystems.
This decomposition has size most K(1)1 ≤ K
(2)
1 , and also cannot cover any sub-
unit effects, hence, by the induction hypothesis applied to the case N − 1, there
213
exists a pure product state s(2) ⊗ · · · ⊗ s(N) which hits f {2,...N} but none of the set
{e{2,...,N}α : e(1)α = g}.
Again, it is necessary to set s(1)x1 = a1. Consider the set {e
(1)
α }, and the outcomes
for fiducial measurement choices other than x1 on system 1. One of two cases must
occur:
(a) The set {e(1)α } fills none of the other measurements, i.e. for every x′ 6= x1,
there is an ax′ such that X(1)ax′ |x′ /∈ {e
(1)
α }. For each such x′ set s(1)x′ = ax′ so
that s can hit eα only if e(1)α = f (1). However, using the inductive hypothesis,
there exists a pure product state s(2)⊗· · ·⊗s(N) which hits f {2,...,N} but none
of the set {e{2,...,N}α : e(1)α = f (1)}.
(b) There exists a measurement x′ 6= x1 on system 1 with K(1)x′ = r which is
filled by the set {e(1)α }, i.e. (after reordering) e(1)α = X(1)α|x′ for 1 ≤ α ≤ r.
{eα} covers no sub-unit effects, so there must be some α′ and some system
i 6= 1 such that e(i)α′ 6= f (i). Set s
(1)
x′ = α′ so that s does not hit any eα with
α 6= α′; the remaining components of s(1) may be chosen arbitrarily. By
the inductive hypothesis, there exists a pure product state s(2) ⊗ · · · ⊗ s(N)
which hits f {2,...,N} but not the single effect e{2,...,N}α′ .
In both cases, by construction s = s(1) ⊗ · · · ⊗ s(N) hits f but none of the
eα.
Corollary 5. The only multiform effects which have a decomposition with exactly
K(1)1 elements are sub-unit effects.
Proof. Suppose E =
∑r
α=1 eα =
∑s
β=1 fβ are distinct decompositions, with
r = K(1)1 , and suppose without loss of generality that f1 /∈ {eα}rα=1. Every pure
product state which hits f1 must also hit one of the eα, so it follows from Lemma 8
that {eα} covers a sub-unit effect. By Lemma 7, every decomposition of a sub-unit
effect has at leastK(1)1 elements, hence E is itself a sub-unit effect.
214
5.3.1 Identical subsystems revisited
We now demonstrate that Lemmas 7 and 8 are sufficient to prove that all re-
versible Boxworld transformations are trivial, in the case that all subsystems are
non-classical, and that there is a fixed positive integer r such that the least number
of outcomes for any local fiducial measurement on each subsystem i is r. This
is a weaker condition than demanding that all subsystems are identical, which is
weaker still than demanding in addition that all fiducial measurements on all sub-
systems have the same number of outcomes. This last condition is exactly that
assumed in [4], hence the result we recover in this section, whilst not fully gen-
eral, is already stronger than has previously been demonstrated.
Assuming that the least number of fiducial measurement outcomes is fixed
across subsystems, consider an i-sub-unit effect E for arbitrary i. Note that E is a
multiform effect with at least one decomposition with exactly r = K(1)1 elements
(corresponding to the measurement on subsystem i which has r outcomes). It
follows that T †(E) is also a multiform effect with at least one decomposition with
exactlyK(1)1 elements, hence by Corollary 5, T †(E) is a sub-unit effect. Thus, T †
maps sub-unit effects to sub-unit effects.
Now let e = X(1)a1|x1⊗· · ·⊗X
(N)
aN |xN and e
′ = X(1)a′1|x′1⊗· · ·⊗X
(N)
a′N |x
′
N
be composite
fiducial effects whose fiducial effect strings differ only in component i. Observe
that both effects belong to a decomposition of the i-sub-unit effect
E = X(1)a1|x1 ⊗ · · · ⊗ U
(i) ⊗ · · · ⊗X(N)aN |xN . (5.55)
Suppose that E is mapped to a j-sub-unit effect, for some j (not necessarily equal
to i). It follows that T †(e) and T †(e′) are distinct composite fiducial effects which
belong to some decomposition of T †(E). Lemma 7 then implies that T †(e) and
T †(e′) differ only in component j. Therefore T † preserves a Hamming distance of
1 between fiducial effect strings, and we may apply Theorem 10 and Lemma 6 to
deduce that T † must be some relabelling of subsystems, measurement choices and
measurement outcomes.
215
Theorem 12. The allowed reversible transformations of a Boxworld system with
non-classical subsystems, such that the least possible number of measurement out-
comes is constant across subsystems, are relabellings of subsystems, and local
relabellings of measurements and measurement outcomes.
Proof. We have already argued that these are the only possible reversible trans-
formations. It remains to show that any such transformation is allowed, as long
as the number of outcomes of each local fiducial measurement is respected. In
particular, a relabelling of measurements on a single subsystem is valid only if
whenever measurement x is mapped to measurement x′, then both x and x′ have
the same number of outcomes. A relabelling of subsystems is similarly valid
only if whenever subsystem i is mapped to subsystem j, then the measurement
choices in i are individually mapped to measurement choices in j which have
the same number of outcomes. Modulo these considerations, it is clear that any
such relabelling is a well-defined mapping on outcome distributions of the form
P (a1, . . . , aN |x1, . . . , xN). Thus valid relabellings are reversible and convex-
linear mappings of allowed states to allowed states.
5.4 Main result
We now relax the condition that K(i)1 = K
(1)
1 for all subsystems i, and demand
merely that each subsystem is non-classical (has more than one fiducial measure-
ment), and each local fiducial measurement is non-trivial (has more than one out-
come). This makes the task more complicated; not all sub-unit effects have de-
compositions withK(1)1 elements, and so Corollary 5 cannot universally be applied
in the same way.
However, Corollary 5 can still be applied for each subsystem i that does obey
K(i)1 = K
(1)
1 , so that T † permutes the set of sub-unit effects within this set of
subsystems. Now consider a system j for which K(j)1 is the next possible greater
value than K(1)1 . A j-sub-unit effect E must be transformed to something which
is multiform with some decomposition {eα} involving K(j)1 elements. We will
216
demonstrate that if (a)
∑
α eα is not a sub-unit effect, and (b) T † permutes sub-unit
effects on subsystems with K(i)1 = K
(1)
1 , then for any other decomposition {fβ}
of T †(E) there exists a pure product state s which hits f1 but none of the effects
{eα}. This sets up an iterative process, at each stage assuming that T † permutes the
sub-unit effects on subsystems with smaller numbers of outcomes. The iteration
terminates when K(j)1 takes on its maximal value, and T † thus permutes the full
set of sub-unit effects in the system.
The following definition is convenient for discussing the set of sub-units which
belong to one of several subsystems:
Definition. In a multipartite Boxworld system comprising N subsystems, S{i} is
the set of sub-unit effects at system i ∈ [N ]. If Ω ⊆ [N ] is a subset of the N
subsystems, then SΩ = ∪i∈ΩS{i}.
In order to prove that sub-unit effects are mapped to sub-unit effects, the fol-
lowing lemmas will prove useful.
Lemma 9. Let Ω ⊆ [N ] and suppose that T † is an allowed reversible transforma-
tion which permutes the set SΩ. Then the images of two composite fiducial effects
will be identical for all components outside Ω if and only if the original effects
were. i.e.:
eΩ¯1 = eΩ¯2 ⇐⇒
(
T †(e1)
)Ω¯ =
(
T †(e2)
)Ω¯
where Ω¯ = [N ] \ Ω.
Proof. Suppose firstly that the composite fiducial effects e1 and e2 differ only
in one component i ∈ Ω. Observe that e1 and e2 belong to (possibly different)
decompositions of a unique sub-unit effect E ∈ S{i}. By assumption T †(E) is a
j-sub-unit effect for some j ∈ Ω; T †(e1) and T †(e2) belong to decompositions of
T †(E), hence by Lemma 7 can only differ in component j.
Suppose now that e1, e2 satisfy e(k)1 = e
(k)
2 for all k /∈ Ω, but that they differ in
any number of components belonging to Ω. Then it is possible to move from e1
217
to e2 by changing one component at a time (each component belonging to Ω). At
each step, T † maps the corresponding pair of effects to a pair which differ only in
components belonging to Ω. Hence T †(e1)(k) = T †(e2)(k) for all k /∈ Ω.
To prove the converse direction, note that if T † is an allowed reversible trans-
formation which permutes the set SΩ, then so is (T †)−1.
Lemma 10. Suppose that E =
∑r
α=1 eα is a sub-unit effect, and that {T †(eα)}
covers a sub-unit effect F . Then T †(E) = F .
Proof. Without loss of generality let
∑s
α=1 T †(eα) = F for s ≤ r, and let
∑
β fβ
be a distinct decomposition ofF . ThenE covers themultiform effect (T †)−1(F ) =
∑s
α=1 eα =
∑
β=1(T †)−1(fβ). It follows from Corollary 4 that s = r and that
T †(E) = F .
Lemma 11. Suppose that {eα} does not cover any sub-unit effects, but that there
exists some subsystem i for which
∑
α e
(i)
α = U (i). Then for any fiducial effect
f /∈ {eα}, there exists a pure product state which hits f but none of the eα.
Proof. Let f (i) = X(i)a|x and Ωi = [N ] \ {i}. Note that {e
(i)
α } is the complete set
of outcomes for some fiducial measurement x′ on subsystem i: without loss of
generality, e(i)α = X(i)α|x′ .
If x′ = x, then f (i) = e(i)a . Set s(i)x = a and choose the remaining components
of s(i) arbitrarily, so that s(i) hits e(i)a but none of the other e(i)α . Note that fΩi and
eΩia must be distinct fiducial effects, so by Lemma 8 there exists a pure product
state sΩi which hits fΩi but not eΩia .
If x′ 6= x, then since
∑
eα is not a sub-unit effect, there existsα′ and i′ 6= i such
that e(i
′)
α′ 6= f (i
′). Set s(i)x = a and s(i)x′ = α′, and choose the remaining components
of s(i) arbitrarily. Again, by Lemma 8 there is a pure product state sΩi which hits
fΩi but not the single fiducial effect eΩiα′ .
In both cases, combining s(i) with sΩi gives a pure product state s which hits
f but none of the eα.
Lemma 12. Reversible Boxworld transformations map sub-unit effects to sub-unit
effects, so long as none of the subsystems are classical.
218
Proof. We begin by considering the action of T † on a 1-sub-unit effect E. T †(E)
is a multiform effect with a decomposition containing K(1)1 elements, hence by
Corollary 5 it is a j-sub-unit effect for some subsystem j with K(j)1 = K
(1)
1 . By
the same reasoning T † permutes the set SΩ, where Ω = {j : K(j)1 = K
(1)
1 }.
We now show iteratively for each positive integer r > K(1)1 that T † permutes
the set SΩr , where Ωr = {i : K
(i)
1 = r}. Let i ∈ Ωr, let
∑r
α=1 eα =
∑s
β=1 e′β
be distinct decompositions of an i-sub-unit effect E and assume that T † permutes
the set SΩ, where Ω = {j : K(j)1 < r}. Note that T †(E) is also multiform, since
T †(E) =
∑r
α=1 T †(eα) =
∑
β T †(e′β). Write fα = T †(eα) and g = T †(e′1), noting
that g /∈ {fα}.
Assuming that {fα} does not cover a sub-unit effect, our aim is to construct a
pure product state s that hits g but none of the fα, giving a contradiction. Hence
{T †(eα)}must cover a sub-unit effect. It then follows from Lemma 10 that T †(E)
is itself an i′-sub-unit effect for some i′ ∈ Ωr. By continuing the iteration, we
complete the proof of the lemma.
To obtain the desired contradiction, assume that {fα} does not cover a sub-unit
effect. Let Ω¯ = [N ] \ Ω and consider the set {f Ω¯α }rα=1 (recall that for a fiducial
effect f , f Ω¯ is the tensor product of all those components of f belonging to Ω¯).
Since e′(i)1 is distinct from e
(i)
α for all α, and i ∈ Ω¯, we have that e′Ω¯1 /∈ {eΩ¯α}. It
follows from Lemma 9 that gΩ¯ /∈ {f Ω¯α }.
Suppose that there exists a subsystem i′ ∈ Ω¯ such that {f (i
′)
α }rα=1 covers the
local unit effect U (i′), i.e.
∑
α f
(i′)
α ≥E(i′)+ U
(i′). Recall that the fiducial effect de-
compositions of U (i′) are obtained by fixing a fiducial measurement on subsystem
i′ and taking the set of local fiducial effect vectors which correspond to an out-
come of that measurement. Since all fiducial measurements on subsystem i′ have
at least r outcomes, it follows that
∑
α f
(i)
α = U (i), and by Lemma 11 there exists
a state which hits g but none of the fα.
Suppose instead that there is no subsystem i′ ∈ Ω¯ for which {f (i
′)
α }rα=1 covers
U (i′). Then the set {f Ω¯α } -- considered as a collection of fiducial effects over the
maximal tensor product of all subsystems belonging to Ω¯ -- does not cover any
219
sub-unit effects. By Lemma 8 applied to the subsystems belonging to Ω¯, there
exists a pure product state sΩ¯ which hits gΩ¯ but none of the f Ω¯α . Combining sΩ¯
with any pure state sΩ which hits gΩ gives a pure product state s which hits g but
none of the fα.
Having proved that reversible transformations of Boxworld systems map sub-
unit effects to sub-unit effects, it is straightforward once again to show that they
must be trivial.
Theorem 13. The only reversible transformations of non-classical systems al-
lowed in Boxworld are relabellings of subsystems, and local relabellings of mea-
surement choices and measurement outcomes.
Proof. To complete the proof, we again need only check that all relabellings of this
form are allowed transformations, as long as subsystems are only permuted only
if they are identical, and measurement choices are permuted only if they have the
same number of outcomes. Again, this is obvious from considering the action of
such transformations on distributions in the form P (a1, . . . , aN |x1, . . . , xN).
5.5 Polytopic models with non-trivial reversible dy-
namics
In this Section we introduce and discuss a new probabilistic model which does
not belong to Boxworld, but shares some similarities with it. At several points
in the proof of Theorem 13 above, we have relied on two essential features of
Boxworld which do not hold in quantum theory; that the number of extreme-ray
vectors of the local state and effect cones is finite (that is to say, the model is
polytopic), and that local systems combine under the maximal tensor product. The
combination of these features implies that the extreme-ray vectors of the composite
effect cone are exactly the tensor products of the local extreme-ray effect vectors.
Consequently, the adjoint of any reversible transformation of the composite space
must map composite fiducial effects to composite fiducial effects.
220
The obvious question is then: are these features sufficient to derive the con-
clusion of Theorem 13? Does there exist any probabilistic model whose local
state-cones are polytopic cones, and whose systems combine under the maximal
tensor product, yet which has reversible transformations on its composite systems
which are not made up of local operations and relabelling of subsystems? In this
Section we demonstrate that such models do exist, by way of an explicit example.
Before introducing this example, it is worth discussing polytopic models in
more detail. The local systems of a polytopic model will still be characterised by
a finite set of fiducial measurements, however it will not generally be the case
that any outcome distribution on those measurements is permitted. Instead, some
further set of restrictions are imposed on the space of possible outcome distribu-
tions, so that they form a strict subset of the full set of outcome distributions that
would be present in a Boxworld system which has the same number of fiducial
measurements and outcomes. To see this in a concrete example, recall the “poly-
gon models” of Chapter 2, in which the local state space takes the form of a regular
n-gon, for some integer n. In the case n = 5, the set of pairs of fiducial effects
{ei, en+i}5i=1 form a set of 5 measurements that is fiducial for the state space. Thus
it is possible to characterise a state by assigning outcome probabilities for each of
these measurements, rather than specifying its position on the pentagon: in this
way the set of outcome distributions is naturally a subset of the set of outcome
distributions of a Boxworld systems with 5 measurements and 2 outcomes. This
subset is strict, since unlike a Boxworld system it is not possible for a state in a
pentagon system to assign a definite outcome to every fiducial measurement (such
a vector would have to have inner product 1 with 5 fiducial effect vectors, as well
as the unit effect, which it can easily be checked is impossible).
Recall that an individual Boxworld systemmay be defined by specifying a unit
effectU and a set of fiducial effect vectors {Xa|x}Kx−1a=1 , all of which are linearly in-
dependent. The remaining fiducial effect vectors are defined via the normalization
conditions XKx|x = U −
∑Kx−1
a=1 Xa|x. Thus a g-bit (binary input/output) system
is represented by a 3-dimensional state space, wherein the set of normalised states
221
lie in the subspace {v : 〈U , v〉 = 1}. Suppose now that we “squash” the g-bit state
space, by introducing one extra restriction on the outcome statistics: namely, that
the probability of obtaining outcome a = 1 is invariant of whether measurement
choice x = 1 or x = 2 was performed. The vector of probabilities representing
the outcome distribution must then take the form:
(p, 1− p | p, 1− p). (5.56)
In fact, this is a representation of a classical bit, since only one fiducial measure-
ment (x = 1 for example) is sufficient to characterise any such state, and any out-
come distribution on measurement 1 is possible. In order to make this “squashing”
procedure more interesting we can apply it to a g-trit, i.e. a Boxworld system with
two fiducial measurements, each of which has three outcomes. In this case, the
vector of outcome probabilities takes the following form:
(p, q, 1− p− q | p, q′, 1− p− q′). (5.57)
The effect cone which is dual to this set of states may be constructed in the
following way: in a real vector space of dimension 4, choose a set of linearly
independent vectors {U , X1, X2|1, X2|2}. Unlike Boxworld, the vector X1 corre-
sponds to the obtaining the first outcome for both measurements x = 1 and x = 2.
The remaining fiducial effect vectors X3|2 and X3|3 are defined according to the
normalisation relations:
U = X1 +X2|1 +X3|1 = X1 +X2|2 +X3|2. (5.58)
In this Section we will refer to such a system as a squashed g-trit.
Proposition 8. Suppose that a composite system is the max-tensor product of two
subsystems, the first subsystem being a squashed g-trit and the second subsystem
being a standard g-bit. Then there is an allowed adjoint transformation which
switches the composite fiducial effectsX(1)1 ⊗X
(2)
1|1 andX
(1)
1 ⊗X
(2)
2|1 , whilst leaving
222
all other composite fiducial effects invariant. Moreover, this transformation is
reversible and non-trivial (not a composition of local operations and permutations
of subsytems).
Proof. Recall that a g-bit system has fiducial effect vectors {X1|1, X2|1, X1|2, X2|2},
such thatX1|1 +X2|1 = X1|2 +X2|2 = U . The given transformation is essentially
a kind of C-NOT operation on the systems involved: by mapping X(1)1 ⊗X
(2)
1|1 ↔
X(1)1 ⊗ X
(2)
2|1 we are imposing that the outcomes of measurement x = 1 on sub-
system 2 are switched, conditional on outcome 1 occurring on subsystem 1. In
order to prove the proposition, we must verify that the given transformation obeys
three properties: that it is allowed (i.e. maps allowed states to allowed states in a
convex-linear fashion), that it is reversible, and that it is non-trivial.
1. Allowed.
Suppose that vector spaces V1 and V2 represent the squashed g-trit and g-bit
subsystems respectively. It suffices to show that there exists a linear operator T † on
V1⊗V2 which permutes the set of composite fiducial effect vectors in the required
manner. Take bases B(1) = {U , X1, X2|1, X2|2} and B(2) = {U , X1|1, X1|2} for
V1 and V2 respectively. By expanding tensor products of these basis elements in
terms of the composite fiducial effect vectors, and applying the given permutation
of fiducial effect vectors (the “C-NOT” operation), we may derive the action that
T † must take on the tensor product basis elements. For example, we may begin by
demanding that,
T †
(
X(1)1 ⊗X
(2)
1|1
)
= X(1)1 ⊗X
(2)
2|1
= X(1)1 ⊗ U (2) −X
(1)
1 ⊗X
(2)
1|1 . (5.59)
223
As required, T † leaves the composite unit effect invariant:
T †
(
U (1) ⊗ U (2)
)
= T †
([
X(1)1 +X
(1)
2|1 +X
(1)
3|1
]
⊗
[
X(2)1|1 +X
(2)
2|1
])
= T †
(
X(1)1 ⊗
[
X(2)1|1 +X
(2)
2|1
]
+
[
X(1)2|1 +X
(1)
3|1
]
⊗
[
X(2)1|1 +X
(2)
2|1
])
= X(1)1 ⊗
[
X(2)2|1 +X
(2)
1|1
]
+
[
X(1)2|1 +X
(1)
3|1
]
⊗
[
X(2)1|1 +X
(2)
2|1
]
= U (1) ⊗ U (2). (5.60)
Note that alternatively decomposing U (1) asX(1)1 +X
(1)
2|2 +X
(1)
3|2 , or decomposing
U (2) as X(2)1|2 + X
(2)
2|2 (or both), will not affect this calculation, which reassures us
that the action of T † on the composite unit effect is well-defined. We find that the
vector U (1) ⊗X(2)1|1 is transformed in the following manner:
T †
(
U (1) ⊗X(2)1|1
)
= T †
(
X(1)1 ⊗X
(2)
1|1 +X
(1)
2|1 ⊗X
(2)
1|1 +X
(1)
3|1 ⊗X
(2)
1|1
)
= X(1)1 ⊗X
(2)
2|1 +X
(1)
2|1 ⊗X
(2)
1|1 +X
(1)
3|1 ⊗X
(2)
1|1
= X(1)1 ⊗
[
U (2) −X(2)1|1
]
+
[
U (1) −X(1)1
]
⊗X(2)1|1
= X(1)1 ⊗ U (2) + U (1) ⊗X
(2)
1|1 − 2X
(1)
1 ⊗X
(2)
1|1 . (5.61)
Again, this is well-defined in the sense that we get the same result if the alternative
fiducial effect decomposition of U (1) is used. It turns out that X(1)1 ⊗ X
(2)
1|1 and
U (1)⊗X(2)1|1 are the only two members of the tensor product basis not kept invariant
by T †, as can be checked by similar calculations. For example,
T †
(
X(1)1 ⊗ U (2)
)
= T †
(
X(1)1 ⊗
[
X(2)1|1 +X
(2)
2|1
])
= X(1)1 ⊗
[
X(2)2|1 +X
(2)
1|1
]
= X(1)1 ⊗ U (2). (5.62)
At this point we have defined a linear map T †, defined on V1⊗V2 by its action
on the tensor product basis resulting from the bases B(1) and B(2). Specifically, T †
224
maps:
X(1)1 ⊗X
(2)
1|1 → X
(1)
1 ⊗ U (2) −X
(1)
1 ⊗X
(2)
1|1
U (1) ⊗X(2)1|1 → X
(1)
1 ⊗ U (2) + U (1) ⊗X
(2)
1|1 − 2X
(1)
1 ⊗X
(2)
1|1 , (5.63)
and leaves all other basis vectors invariant. We must now check that T † correctly
permutes the composite fiducial effects in the manner originally specified, i.e. that
it switchesX(1)1 ⊗X
(2)
1|1 andX
(1)
1 ⊗X
(2)
2|1 , whilst leaving all other composite fiducial
effects invariant. This will in turn demonstrate that the specified permutation is
allowed, since it necessarily corresponds to a linear map that maps the joint effect
cone Emax+ into itself.
Note that many of the composite fiducial effects are also members of the tensor
product basis, but not those with any component belonging to the set
{X(1)3|1 , X
(1)
3|2 , X
(2)
2|1 , X
(2)
2|2}. (5.64)
However, we may expand these non-basis composite fiducial effects in the tensor
product basis, and hence check that T † acts correctly on them. The effectsX(1)1 ⊗
X(2)1|1 and X
(1)
1 ⊗X
(2)
2|1 are indeed switched:
T †
(
X(1)1 ⊗X
(2)
1|1
)
= X(1)1 ⊗ U (2) −X
(1)
1 ⊗X
(2)
1|1 = X
(1)
1 ⊗X
(2)
2|1 (5.65)
T †
(
X(1)1 ⊗X
(2)
2|1
)
= T †
(
X(1)1 ⊗
[
U (2) −X(2)1|1
])
= X(1)1 ⊗ U (2) −
(
X(1)1 ⊗ U (2) −X
(1)
1 ⊗X
(2)
1|1
)
= X(1)1 ⊗X
(2)
1|1 (5.66)
Moreover, the remaining composite fiducial effect vectors are invariant under T †.
225
For example,
T †
(
X(1)3|1 ⊗X
(2)
2|1
)
= T †
([
U (1) −X(1)1 −X
(1)
2|1
]
⊗
[
U (2) −X(2)1|1
])
=
[
U (1) −X(1)1 −X
(1)
2|1
]
⊗ U (2) +X(1)2|1 ⊗X
(2)
1|1
+ T †
(
X(1)1 ⊗X
(2)
1|1 − U
(1) ⊗X(2)1|1
)
=
[
U (1) −X(1)1 −X
(1)
2|1
]
⊗ U (2) +X(1)2|1 ⊗X
(2)
1|1
+X(1)1 ⊗X
(2)
1|1 − U
(1) ⊗X(2)1|1
= X(1)3|1 ⊗
[
X(2)1|1 +X
(2)
2|1
]
−X(1)3|1 ⊗X
(2)
2|1
= X(1)3|1 ⊗X
(2)
1|1 . (5.67)
It is worth stressing the order of steps taken in this part of the proof. We were
given a permutation of the joint fiducial effects, but with no guarantee that it cor-
responds to an allowed transformation. We then derived the form of the linear
operation which - if the given permutation of fiducial effect vectors corresponds
to an allowed transformation - must be the unique linear operator that performs
that permutation. It was then necessary to verify that the linear operator does in-
deed permute the fiducial effect vectors in the correct manner.
2. Reversible.
Given that T † as defined above is allowed, and permutes composite fiducial
effects as in the statement of the Proposition, it is trivial to see that is is reversible.
Composing T † with itself gives an operator that leaves all composite fiducial ef-
fect vectors invariant. Since this set of vectors spans the space V1 ⊗ V2, it must be
that T † is its own inverse.
3. Non-trivial.
We now demonstrate that T † is not a composition of local operations and swap-
ping of subsystems (although, since the subsystems are non-identical, the latter is
226
impossible anyway). Recall our use of the Hamming distance between pairs of
fiducial effect strings in Section 5.1, which is defined as the number of compo-
nents in which those strings differ. Clearly, local operations and permutations of
subsystems must preserve the Hamming distance between pairs of effects. How-
ever, consider the composite fiducial effects e = X(1)1 ⊗X
(2)
1|1 and f = X
(1)
2|1⊗X
(2)
1|1 ,
which are mapped to the composite fiducial effects T †(e) = X(1)1 ⊗ X
(2)
2|1 and
T †(f) = f . The Hamming distance between e and f is 1, whereas the Hamming
distance between T †(e) and T †(f) is 2. Hence T † is non-trivial.
Further insight into why T † is an allowed transformation can be gained by
considering the tabular representation of the max-tensor product of a squashed g-
trit and a g-bit system. Suppose that, instead of a squashed g-trit, a similar C-NOT
transformation is applied to a max-tensor product of a g-trit and a g-bit system. A
slight change of notation is needed, so that the composite fiducial effects X(1)1|1 ⊗
X(2)1|1 andX
(1)
1|1 ⊗X
(2)
2|1 are switched; optionally, the effectsX
(1)
1|2 ⊗X
(2)
1|1 andX
(1)
1|2 ⊗
X(2)2|1 may also be switched. Determining the linear form of such a transformation
becomes problematic when considering the tensor product basis vectorU (1)⊗X(2)1|1 ,
due to the fact that a multiform effect is transformed into a non-multiform effect
(recall that a bipartite binary effect table in Boxworld is multiform if and only if it
contains a sub-row):
1 0 0 0
1 0 0 0
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
→
0 1 0 0
1 0 0 0
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
. (5.68)
However, the effect table on the RHS of (5.68) ismultiform if it represents the
max-tensor product of a squashed g-trit and g-bit system. It is equivalent to the
227
following effect table:
0 1 0 0
0 0 0 0
0 0 0 0
0 0 0 0
1 0 0 0
1 0 0 0
, (5.69)
which can be seen from the fact that they are identical on pure product states (for
which the first and fourth lines of the corresponding state table are identical).
It is not difficult to give an example in which T † generates classical correla-
tions, giving a second proof that T † is non-trivial. The product state
(1/2, 1/2, 0 | 1/2, 1/2, 0)⊗ (1, 0|1, 0), (5.70)
is transformed by T † in the following manner:
1/2 0 1/2 0
1/2 0 1/2 0
0 0 0 0
1/2 0 1/2 0
1/2 0 1/2 0
0 0 0 0
→
0 1/2 1/2 0
1/2 0 1/2 0
0 0 0 0
0 1/2 1/2 0
1/2 0 1/2 0
0 0 0 0
. (5.71)
The state on the RHS is not a product state, but is rather the following convex
combination of pure product states:
1
2
(1, 0, 0|1, 0, 0)⊗ (0, 1|0, 1) + 1
2
(0, 1, 0|0, 1, 0)⊗ (1, 0|1, 0). (5.72)
On the other hand, it can be shown that T † does not generate any entanglement
in this model. Recall that any reversible transformation must map pure states to
pure states: for entanglement to be generated reversibly, at least one pure product
state must be transformed to a pure entangled state, otherwise the set of convex
228
combinations of pure product states will be mapped into itself. We already know
that the pure states of a g-bit system are the 4 states that assign deterministic out-
comes for both fiducial measurement choices; a similar argument applies to the
squashed g-trit, whose pure states are the 5 states that assign deterministic out-
comes and give equal probabilities for the first outcome of both fiducial measure-
ments. Therefore the state table representations of the pure product states can be
enumerated, and it can be verified that T † permutes the set of pure product states.
Indeed, we can follow this line of reasoning to conclude that no reversible
transformation of the bipartite system we have described is capable of generating
entanglement. Note that the state tables representing pure product states are made
up of only 1s and 0s, and conversely that any state table consisting only of 1s
and 0s represents a pure product state. In a similar vein to Proposition 7, we may
therefore deduce that s is a pure product state if and only if 〈e, s〉 ∈ {0, 1} for
all composite fiducial effects e. The adjoint of any reversible transformation T
permutes the set of composite fiducial effects, hence for any pure product state s
we have that 〈e, T (s)〉 = 〈T †(e), s〉 ∈ {0, 1} for all composite fiducial effects e.
Thus every reversible transformation T maps pure product states to pure product
states, and does not generate entanglement.
5.6 Discussion
We have refined and extended the result of [4], and demonstrated that - as long
as no subsystem is classical - reversible dynamics in general composite Boxworld
systems always take the form of permutations of subsystems and local relabellings
of measurement choices and outcomes on individual subsystems. This character-
isation, and in particular the method of proof we develop, may be useful as tools
for exploring which theories are reversible transitive. Note that we were already
able to show in Proposition 7 of Section 2.5.5 that Boxworld is not a reversibly
transitive theory, by proving that reversible transformations permute the set of
pure product states. It is therefore worth highlighting in what ways it is useful to
229
provide an explicit characterisation of reversible Boxworld transformations.
One very useful outcome of this work is in developing a broader understanding
of reversible dynamics; in particular, developing the study of reversible transfor-
mations in general probabilistic theories, and a framework for exploring the ap-
plication of these techniques to theories other than Boxworld. In Section 5.5 we
gave a result which limits the extent to which Theorem 13 may be generalised --
such a result would have been significantly harder to come by without the insights
and perspective gained from our construction of the proof of Theorem 13 and the
tabular formalism developed along the way. We have consequently shown that
the analogous result to Theorem 13 does not generally hold in theories for which
the number of local fiducial effects is finite, and for which composite systems
combine under the max-tensor product. However, the local squashed g-trit system
is not reversibly transitive, as the state which assigns outcome 1 to both fiducial
measurements cannot be reversibly transformed to the state which assigns 2 to
both fiducial measurements (it can be checked that the adjoint transformation will
not be well-defined on the fiducial effect X(1)1 , using the notation of Section 5.5).
A natural next step in this direction would be to investigate the set of allowed
reversible transformations of the polygonal models given in Section 2.5.5. Such
models also admit a finite number of fiducial effects, and in Section 5.5 we saw
that an n-gon system may be viewed as a subset of the outcome distributions of a
Boxworld system with n fiducial measurements, each of which has two outcomes.
The case of even n presents an even greater similarity to Boxworld in that the
extreme-ray effects are identical with the fiducial effects, thus improving the ap-
plicability of the techniques we have developed. Unfortunately, this equivalence
between fiducial and extreme-ray effects is not true for odd n, where only half of
the fiducial effects are extreme-ray effects.
At the beginning of the present chapter we discussed the centrality of reversible
transitivity in derivations of quantum theory; an interesting open conjecture is that
this feature alone is sufficient as a “physical axiom” fromwhich to derive quantum
theory. In order to prove such a conjecture it would be necessary to demonstrate
230
that any general probabilistic theory which is not in a well-defined sense “embed-
ded in” quantum theory, is not reversibly transitive either. We may say that a the-
ory is “embedded in” quantum theory if the systems of that theory can be mapped
to quantum systems, and the effects and states mapped to positive operators and
density operators respectively on those systems, in such a manner that composite
systems are mapped to the corresponding compositions of quantum systems, and
that the predictions (both local and composite) of the theory may equivalently be
calculated by taking the trace of the product of the relevant quantum operators.
For example, Boxworld is not contained within quantum theory, since it allows
for a strictly greater violation of the CHSH inequality than quantum theory does.
From this perspective, the squashed g-trit example given in Section 5.5 may be
seen as providing a counter-example to the strongest of three possible versions of
this conjecture. In order of strongest to weakest, these versions of the conjecture
may be explicitly stated as:
Strong Any theory with non-trivial reversible joint dynamics is embedded in quan-
tum theory.
Medium Any theory which is reversibly transitive on local systems and has non-
trivial, reversible, joint dynamics is embedded in quantum theory.
Weak Any theory which is reversibly transitive on local and composite systems is
embedded in quantum theory.
Note that the bipartite system consisting of a max-tensor product of a squashed
g-trit and a g-bit is not embedded in quantum theory, since it allows for maximal
CHSH violations in much the same way that a max-tensor product of two g-bits
does, via fiducial measurements on the following allowed state:
231
0 0 0 0
1/2 0 1/2 0
0 1/2 0 1/2
0 0 0 0
1/2 0 0 1/2
0 1/2 1/2 0
(5.73)
Thus the squashed g-trit provides a counter-example to the strong version of the
conjecture, but not to the medium or weak versions. As already mentioned, it
would be interesting to look at whether polygon models (which are locally re-
versibly transitive) give any insight into whether the medium conjecture holds in
general. One result by Masanes et al [2] is in a similar vein to the Medium conjec-
ture, and states that any general probabilistic theory whose local systems are qubit
systems, and which admits at least one non-trivial, continuous, reversible interac-
tion between systems, must also be identical to quantum theory on its composite
systems. This suggests that structure of the local state space, in conjunction with
reversible transitivity, may have a lot to tell us about the nature of quantum theory.
If it is true that all reversibly transitive theories are embedded in quantum the-
ory, then quantum theory is the maximal reversibly transitive theory that could
possibly describe Nature; this would go some way towards settling the matter of
understanding why quantum theory is the most accurate description of Nature. On
the other hand, if a “foil” theory is discovered, which is reversibly transitive but
which makes predictions that quantum theory cannot, then this is of great interest
in itself. Perhaps such a counterexample will eventually supersede quantum the-
ory as a theory of Nature; at the very least it will give us some indication of what
further axioms are necessary (rather than simply sufficient) to distinguish quan-
tum theory from the full set of general probabilistic theories. Whichever outcome
turns out to be true, the question of reversible transitivity is undoubtedly worthy
of further investigation.
232
233
Chapter 6
Conclusion and Outlook
The miracle of the appropriateness of the language of mathematics
for the formulation of the laws of physics is a wonderful gift which
we neither understand nor deserve. We should be grateful for it and
hope that it will remain valid in future research and that it will extend,
for better or for worse, to our pleasure, even though perhaps also to
our bafflement, to wide branches of learning.
"The Unreasonable Effectiveness of Mathematics"
Eugene Wigner
234
In this thesis we have derived various results which attempt to provide a more
intuitive and comprehensive understanding of some of the stranger features of
quantum theory. In Chapter 3 we investigated the remarkable fact that if one re-
laxes either one of two assumptions in quantum theory's Hilbert space formalism -
that states be represented by positive operators, or that observables be represented
by positive operators - then the predictions of quantum theory extend to the full
class of non-signaling distributions, and hence encompass the full class of gen-
eral probabilistic theories. We have also shown that the same feat is possible by
introducing quasiprobabilies into local, classical outcome distributions, even in
the case where almost no correlation exists between the involved systems. These
results are useful in that they can be applied to the study of general probabilistic
theories (for example, in deriving quantum theory from physical principles [2])
and that they provide a neat categorisation of quantum-achievable correlations as
those which are generated so long as we do enforce the positivity of all operators.
Chapter 4 explores the uniqueness of quantum theory from the perspective of
information theory. We prove the conjectured tightness of a bound on how well
entanglement allows us to perform random access codes, and introduce a quadratic
bias bound
∑
yE2y ≤ 1 which seems to capture a great deal of information about
the set of quantum-achievable correlations. We also argue that the existence of a
sensible measure of entropy precludes many general probabilistic theories whose
non-locality is stronger than that of quantum theory, and have discussed how this
relates to recent research on entropies in general probabilistic theories. Both the
quadratic bias bound and the suggested link between entropy and non-locality
throw up many intriguing open questions which may in future lead to fruitful lines
of research.
Chapter 5 explores the role that reversible transitivity plays as a characteristic
and fundamental feature of quantum theory. It is demonstrated that the triviality
of reversible Boxworld transformations, previously only known in the case that all
subsystems are identical [4], extends to the most general case, so long as none of
the subsystems are classical. A key insight in this proof was to consider a special
235
class of effects known as sub-unit effects, and to show that the set of sub-unit
effects is mapped to itself by the adjoint of any reversible transformation. As a
visual aid to motivate the central idea of the proof, and to provide a simpler proof
for the bipartite case, we introduced and further developed a tabular formalism of
Boxworld states and effects first introduced in [5]. The techniques developed in
this chapter may provide useful tools in exploring the reversible dynamics of other
general probabilistic theories. We give one example of a model which is more
non-local than quantum theory, but does allow for non-trivial reversible dynamics,
although this model is not reversibly transitive. There exists a fascinating interplay
between how non-local a theory is and the richness of its reversible dynamics.
Further research might illuminate how the dynamical properties of the universe
inform the strength of correlations allowed between distant parties.
The reader may make the justified observation that the content of each chap-
ter is somewhat disparate, in that there is little interplay between their respective
results, and in that they do not appear to constitute a linear progression towards
some unified goal. Nevertheless, in the interests of tying together several years
spent thinking about the same physical theory, it is worth analysing what similar-
ities can be found amongst these chapters. There is one quite conspicuous feature
that they share in common: each chapter constitutes an investigation of quantum
theory “from the outside”; that is to say, as a special member of the class of general
probabilistic theories. We may go one step further and say that the main results of
each chapter are an attempt to mark out quantum theory from all other theories; to
discern what makes quantum theory special in the first place.
Chapters 4 and 5 are each concerned with a fundamental property of quantum
theory, the imposition of which renders one or more alternative theories as unnat-
ural, or at least inconvenient. One moral that can be drawn equally well from both
chapters is thus: if the world were not as described by quantum theory, then one
or another fundamental physical aspect of the universe would be violated. These
results strive towards a characterisation of quantum theory as the only reasonable
theory that could possibly describe nature. Of course, we should always keep at
236
the back of our minds the possibility that quantum theory is not the most accurate
description of nature, or indeed that the attempt to model nature in a way which is
somehow agreeable to our common sense is futile, since nature might simply be
too weird for us to grasp in a way that we find intuitive. Despite this reservation,
we can take the optimistic view from these chapters (and other recent results in
quantum foundations) that there is still more ground to be gained in obtaining an
intuitive and reasonable explanation of quantum phenomena.
Chapter 3 stands apart from Chapters 4 and 5 in that it makes no real judge-
ment about non-quantum theories. Rather, the results in this chapter explore the
relation between the outcome distributions achievable in classical theory, quan-
tum theory and Boxworld, by means of extending the normal rules of probability.
The overarching moral of the chapter takes on a more negative tone concerning
the uniqueness of quantum theory: it is not likely a special feature of quantum
theory that all non-signaling distributions admit quasiprobabilistic quantum repre-
sentations, and negative probabilities should probably be regarded as a useful cal-
culational tool more than as a profound insight into the workings of the universe.
However, it is very possible that future work on quasiprobabilistic representations
will, for example, draw a stronger connection between non-locality and negative
probabilities, in such a way as to provide a reasonable principle by which strongly
non-local outcome distributions are ruled out. Even if this does not turn out to
be the case, it is still useful to have a unified, quantum-like framework for non-
signaling correlations, and to have a variety of local quasiprobabilistic models for
these correlations.
Quantum non-locality has played almost as prominent a role in our discus-
sions as have general probabilistic theories. Each chapter draws out interactions
between non-locality and some other object of physical or mathematical interest,
be it negative probability, information-based games, entropy, or reversible dynam-
ics. An interesting take on the local quasiprobabilistic distributions of Chapter 3
is the idea that one is replacing the weirdness of non-locality with the weirdness
of negative probabilities. Assigning positive probabilities to all possible outcomes
237
is clearly an assumption of Bell's Theorem; we have shown that Bell's Theorem
does not hold if one just slightly relaxes to the extent that negative probabilities
are assigned to outcomes which never actually occur. Despite its treatment by
various notable physicists including Dirac [66] and Feynman [91], the idea of
negative probabilities has never gained much traction. Is it simply a lesser de-
gree of discomfort that seems to incline us towards accepting non-locality over
negative probabilities? Perhaps our comfort with locality is the result of cultural
and historical familiarity; if Newtonian physics had been written in a different
conceptual language, would the language of quantum theory - being heavily in-
fluenced by Newtonian mechanics - have developed differently too, perhaps even
having a general probabilistic theory slant from the beginning? These questions
are difficult to answer, but as long as the presiding theories of nature admit distinct
interpretations and distinct mathematical representations, it is in our own nature
to differentiate these interpretations and representations according to our sense of
reason, usefulness and beauty.
238
Bibliography
[1] A. Acín, R. Augusiak, D. Cavalcanti, C. Hadley, J. K. Korbicz, M. Lewen-
stein, Ll. Masanes, andM. Piani. Unified framework for correlations in terms
of local quantum observables. Phys. Rev. Lett., 104:140404, 2010.
[2] G. de la Torre, L. Masanes, A. J. Short, and M. P. Müller. Deriving quantum
theory from its local structure and reversibility. Phys. Rev. Lett., 109:090403,
2012.
[3] M. Pawlowski and M. Zukowski. Entanglement-assisted random access
codes. Phys. Rev. A, 81:042326, 2010.
[4] D. Gross, M. P. Mueller, R. Colbeck, and O. C. O. Dahlsten. All re-
versible dynamics in maximally non-local theories are trivial. Phys. Rev.
Lett., 104:080402, 2010.
[5] A.J. Short and J. Barrett. Strong nonlocality: a trade-off between states and
measurements. New J. Phys., 12:033034, 2010.
[6] W.K.Wootters andW.H. Zurek. A single quantum cannot be cloned. Nature,
299:802, 1982.
[7] S. Kochen and E. P. Specker. The problem of hidden variables in quantum
mechanics. Indiana Univ. Math. J., 17:59, 1967.
[8] J. S. Bell. On the Einstein-Podolsky-Rosen paradox. Physics, 1:195, 1964.
239
[9] A. Einstein, B. Podolsky, and N. Rosen. Can quantum-mechanical descrip-
tion of physical reality be considered complete? Phys. Rev., 47:777, 1935.
[10] D. Bohm and Y. Aharonov. Discussion of experimental proof for the paradox
of Einstein, Rosen, and Podolsky. Phys. Rev., 108:1070, 1957.
[11] A. Aspect, P. Grangier, and G. Roger. Experimental realization of Einstein-
Podolsky-Rosen-Bohm gedankenexperiment: A new violation of Bell's in-
equalities. Phys. Rev. Lett., 49:91, 1982.
[12] G.Weihs, T. Jennewein, C. Simon, H.Weinfurter, and A. Zeilinger. Violation
of Bell's inequality under strict Einstein locality conditions. Phys. Rev. Lett.,
81:5039, 1998.
[13] M. A. Rowe, D. Kielpinski, V. Meyer, C. A. Sackett, W. M. Itano, C. Mon-
roe, and D. J. Wineland. Experimental violation of a Bell's inequality with
efficient detection. Nature, 409:791, 2001.
[14] D. N. Matsukevich, P. Maunz, D. L. Moehring, S. Olmschenk, and C. Mon-
roe. Bell inequality violation with two remote atomic qubits. Phys. Rev.
Lett., 100:150404, 2008.
[15] M. Giustina, A. Mech, S. Ramelow, B. Wittman, J. Kofler, J. Beyer, A. Lita,
B. Calkins, T. Gerrits, S. W. Nam, R. Ursin, and A. Zeilinger. Bell viola-
tion using entangled photons without the fair-sampling assumtption. Nature,
497:227, 2013.
[16] S. Groblacher, T. Paterek, R. Kaltenbaek, C. Brukner, M. Zukowski, M. As-
pelmeyer, and A. Zeilinger. An experimental test of non-local realism. Na-
ture, 446:871, 2007.
[17] N. Gisin and H. Zbinden. Bell inequality and the locality loophole: Active
versus passive switches. Phys. Lett. A, 264(2–3):103, 1999.
240
[18] D. Salart, A. Baas, J. A. W. van Houwelingen, N. Gisin, and H. Zbinden.
Spacelike separation in a Bell test assuming gravitationally induced col-
lapses. Phys. Rev. Lett., 100:220404, 2008.
[19] T. Scheidl, R. Ursin, J. Kofler, S. Ramelow, X.-S. Ma, T. Herbst,
L. Ratschbacher, A. Fedrizzi, N. K. Langford, T. Jennewein, and
A. Zeilinger. Violation of local realism with freedom of choice. Proc. Natl.
Acad. Sci. U.S.A., 107(46):19708, 2010.
[20] G. Blaylock. The E. P. R. paradox, Bell's inequality, and the question of
locality. Am. J. Phys., 78:111, 2009.
[21] T. Maudlin. What Bell proved: A reply to Blaylock. Am. J. Phys., 78:121,
2010.
[22] P. H. Eberhard. Bell's theorem without hidden variables. Nuovo Cimento B,
38:75, 1977.
[23] T. Norsen. Against ‘realism’. Found. Phys., 37(3):311--340, 2007.
[24] T. Maudlin. Space-time in the quantum world. In Bohmian mechanics and
quantum theory: an appraisal. Kluwer Academic Publishers, 1996.
[25] P. M. Pearle. Hidden-variable example based upon data rejection. Phys. Rev.
D, 2:1418, 1970.
[26] A. Garg and N. D. Mermin. Detector inefficiencies in the Einstein-Podolsky-
Rosen experiment. Phys. Rev. D, 35:3831, 1987.
[27] A. K. Ekert. Quantum cryptography based on Bell's theorem. Phys. Rev.
Lett., 67:661, 1991.
[28] L. Hardy. Quantum theory from five reasonable axioms, 2001. arXiv:quant-
ph/0101012.
241
[29] J. Barrett. Information processing in generalized probabilistic theories. Phys.
Rev. A, 75:032304, 2007.
[30] J. Barrett, N. Linden, S. Massar, S. Pironio, S. Popescu, and D. Roberts.
Nonlocal correlations as an information-theoretic resource. Phys. Rev. A,
71:022101, 2005.
[31] H. Barnum, J. Barrett, M. Leifer, and A. Wilce. Cloning and broadcasting in
generic probabilistic theories, 2006. arXiv:quant-ph/0611295.
[32] G. Chiribella, G. M. D'Ariano, and P. Perinotti. Probabilistic theories with
purification. Phys. Rev. A, 81:062348, 2010.
[33] G. Chiribella, G. M. D'Ariano, and P. Perinotti. Informational derivation of
quantum theory. Phys. Rev. A, 84:012311, 2011.
[34] L. Masanes and M. P. Müller. A derivation of quantum theory from physical
requirements. New J. Phys., 13:063001, 2011.
[35] H. Barnum, M. P. Mueller, and C. Ududec. Higher-order interfer-
ence and single-system postulates characterizing quantum theory, 2014.
arXiv:1403.4147 [quant-ph].
[36] P. Janotta, C. Gogolin, J. Barrett, and N. Brunner. Limits on nonlocal corre-
lations from the structure of the local state space. New J. Phys., 13:063024,
2011.
[37] C. Pfister. One simple postulate implies that every polytopic state space
is classical, 2012. (Master's Thesis, Institute for Theoretical Physics, ETH
Zurich) arXiv:1203.5622 [quant-ph].
[38] A. J Short and S.Wehner. Entropy in general physical theories. New J. Phys.,
12:033023, 2010.
[39] G. Kimura, K. Nuida, and H. Imai. Distinguishability measures and entropies
for general probabilistic theories. Rep. M. Phys., 66:175, 2010.
242
[40] H. Barnum, J. Barrett, L. O. Clark, M. Leifer, R. Spekkens, N. Stepanik,
A. Wilce, and R. Wilke. Entropy and information causality in general prob-
abilistic theories. New J. Phys., 12:033024, 2010.
[41] G. M. D'Ariano. No-signaling, dynamical independence, and the local ob-
servability principle. J. Phys. A, 40:8137, 2007.
[42] A. J. Short. No purification for two copies of a noisy entangled state, 2008.
arXiv:0809.2622 [quant-ph].
[43] S. W. Al-Safi and A. J. Short. Simulating all nonsignaling correlations via
classical or quantum theory with negative probabilities. Phys. Rev. Lett.,
111:170403, 2013.
[44] S. W. Al-Safi and A. J. Short. Information causality from an entropic and a
probabilistic perspective. Phys. Rev. A, 84:042323, 2011.
[45] S. W. Al-Safi and A. J. Short. Reversible dynamics in strongly non-local
boxworld systems. J. Phys. A: Math. Theor., 47:325303, 2014.
[46] J. Clauser, M. Horne, A. Shimony, and R. Holt. Proposed experiment to test
local hidden-variable theories. Phys. Rev. Lett., 23:880, 1969.
[47] B. Tsirelson. Quantum generalizations of Bell's inequality. Lett. Math. Phys.,
4:93, 1980.
[48] S. Popescu and D. Rohrlich. Quantum nonlocality as an axiom. Found.
Phys., 24:379, 1994.
[49] C. A. Fuchs. Quantum mechanics as quantum information (and only a little
more), 2002. arXiv:quant-ph/0205039.
[50] T. H. Yang, M. Navascués, L. Sheridan, and V. Scarani. Quantum bell in-
equalities from macroscopic locality. Phys. Rev. A, 83:022105, 2011.
243
[51] M. Pawlowski, T. Paterek, D. Kaszlikowski, V. Scarani, A. Winter, and
M. Zukowski. Information causality as a physical principle. Nature,
461:1101, 2009.
[52] N. Linden, S. Popescu, A. J. Short, and A. Winter. Quantum nonlocality and
beyond: Limits from nonlocal computation. Phys. Rev. Lett., 99:180502,
2007.
[53] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Infor-
mation. Cambridge University Press, Cambridge, 2000.
[54] H. Barnum, J. Barrett, M. Leifer, and A. Wilce. Teleportation in general
probabilistic theories. In Mathematical Foundations of Information Flow
(Proceedings of the Clifford Lectures 2008), page 25. American Mathemati-
cal Society, 2012.
[55] M. P. Müller and C. Ududec. The structure of reversible computation de-
termines the self-duality of quantum theory. Phys. Rev. Lett., 108:130401,
2012.
[56] H. Barnum, C. P. Gaebler, and A. Wilce. Ensemble steering, weak self-
duality, and the structure of probabilistic theories, 2009. arXiv:0912.5532
[quant-ph].
[57] A. Wilce. Symmetry, self-duality and the jordan structure of quantum me-
chanics, 2011. arXiv:1110.6607 [quant-ph].
[58] C. Piron. Axiomatique quantique. Helv. Phys. Acta, 37:439, 1964.
[59] D.J. Foulis and C.H. Randall. Empirical logic and quantum mechanics. Syn-
these, 29(1-4):81, 1974.
[60] B. Coecke S. Abramsky. A categorical semantics of quantum protocols. In
Proceedings of the 19th IEEE conference on Logic in Computer Science,
page 415. IEEE Computer Science Press, 2004.
244
[61] R. Webster. Convexity. Oxford University Press, Oxford, 1994.
[62] C. Carathéodory. Über den variabilitätsbereich der fourier’schen konstan-
ten von positiven harmonischen funktionen. Rend. Circ. Mat. Palermo,
32(1):193, 1911.
[63] B. Grünbaum. Convex Polytopes. Interscience Publishers, 1967.
[64] N. S. Jones and L. Masanes. Interconversion of nonlocal correlations. Phys.
Rev. A, 72:052312, 2005.
[65] S. Pironio, J.-D. Bancal, andV. Scarani. Extremal correlations of the tripartite
no-signaling polytope. J. Phys. A: Math Theor., 44:065303, 2011.
[66] P. A. M. Dirac. Bakerian lecture. the physical interpretation of quantum me-
chanics. Proc. R. Soc. A, 180(980):1, 1942.
[67] E.Wigner. On the quantum correction for thermodynamic equilibrium. Phys.
Rev., 40:749, 1932.
[68] C. Ferrie. Quasi-probability representations of quantum theory with appli-
cations to quantum information science. Rep. Prog. Phys., 74(11):116001,
2011.
[69] T. J. Barnea, J.-D. Bancal, Y.-C. Liang, and N. Gisin. Tripartite quantum state
violating the hidden-influence constraints. Phys. Rev. A, 88:022123, 2013.
[70] H. Buhrman,M. Christandl, F. Unger, S.Wehner, andA.Winter. Implications
of superstrong nonlocality for cryptography. Proc. Roy. Soc. A, 462:1919,
2006.
[71] A. Ambainis, D. Leung, L. Mancinska, and M. Ozols. Quantum random
access codes with shared randomness, 2008. (Master's thesis, University of
Waterloo) arXiv:0810.2937 [quant-ph].
245
[72] G. Brassard, H. Buhrman, N. Linden, A. A. Méthot, A. Tapp, and F. Unger
̇Limit on nonlocality in any world in which communication complexity is not
trivial. Phys. Rev. Lett., 96:250401, 2006.
[73] J. Allcock, N. Brunner, M. Pawlowski, and V. Scarani. Recovering part of the
boundary between quantum and nonquantum correlations from information
causality. Phys. Rev. A, 80:040103, 2009.
[74] M. Navascués, S. Pironio, and A. Acín. Bounding the set of quantum corre-
lations. Phys. Rev. Lett., 98:010401, 2007.
[75] M. Navascues, S. Pironio, and A. Acín. A convergent hierarchy of semidef-
inite programs characterizing the set of quantum correlations. New J. Phys.,
10:073013, 2008.
[76] R. Franco and V. Penna. Discrete wigner distribution for two qubits: a char-
acterization of entanglement properties. J. Phys. A, 39(20):5907, 2006.
[77] J. Degorre, M. Kaplan, S. Laplante, and J. Roland. The communication com-
plexity of non-signaling distributions. Quantum Info. Comput., 11:649, 2011.
[78] E. R. Loubenets. Local quasi hidden variable modelling and violations
of bell-type inequalities by a multipartite quantum state. J. Math. Phys.,
53:022201, 2012.
[79] E. R. Loubenets. Nonsignaling as the consistency condition for local quasi
classical probability modelling of a general multipartite correlation scenario.
J. Phys. A: Math. Theor., 45:185306, 2012.
[80] R. Cleve, P. Høyer, B. Toner, and J. Watrous. Consequences and limits
of nonlocal strategies. In 19th Annual IEEE Conference on Computational
Complexity, page 236. IEEE, 2004.
246
[81] P. K. Aravind. A simple demonstration of bell's theorem involving
two observers and no probabilities or inequalities, 2002. arXiv:quant-
ph/0206070v2.
[82] J. Briet, H. Buhrman, T. Lee, and T. Vidick. Multiplayer xor games and
quantum communication complexity with clique-wise entanglement, 2009.
arXiv:0911.4007 [quant-ph].
[83] M. L. Almeida, J.-D. Bancal, N. Brunner, A. Acín, N. Gisin, and S. Pironio.
Guess your neighbor's input: A multipartite nonlocal game with no quantum
advantage. Phys. Rev. Lett., 104:230404, 2010.
[84] W. van Dam. Implausible consequences of superstrong nonlocality. Natural
Computing, 12:9, 2013.
[85] L. Masanes, A. Acin, and N. Gisin. General properties of nonsignaling the-
ories. Phys. Rev. A, 73:012112, 2006.
[86] B. Tsirelson. Quantum analogues of Bell inequalities: The case of two spa-
tially separated domains. J. Sov. Math., 36:557, 1987.
[87] G. Chiribella, G. M. D'Ariano, and P. Perinotti. Quantum theory, namely the
pure and reversible theory of information. Entropy, 14:1877, 2012.
[88] P. Skrzypczyk, A. J. Short, and S. Popescu. Work extraction and thermody-
namics for individual quantum systems. Nat. Commun., 5:4185, 2014.
[89] N. Brunner, M. Kaplan, A. Leverrier, and P. Skrzypczyk. Dimension
of physical systems, information processing, and thermodynamics, 2014.
arXiv:1401.4488 [quant-ph].
[90] M. Huber, M. Perarnau-Llobet, K. V. Hovhannisyan, P. Skrzypczyk,
C. Klöckl, N. Brunner, and A. Acín. Thermodynamic cost of creating corre-
lations, 2014. arXiv:1404.2169 [quant-ph].
247
[91] R. P. Feynman. Negative probability. In F. David Peat and Basil Healy,
editors, Essays in Honour of David Bohm. Routledge & Kegan Paul Ltd,
London, 1987.
248