Problems of optimal choice on
posets and generalizations of
acyclic colourings
Bryn Garrod
Department of Pure Mathematics and Mathematical Statistics
Trinity College
University of Cambridge
This dissertation is submitted for the degree of
Doctor of Philosophy
3rd May 2011

For Raphae¨le

Acknowledgements
I am very grateful to my supervisor, Be´la Bolloba´s, for everything that he has done for
me over the last three and a half years, from offering me professional assistance, support
and advice to organizing research visits to the Re´nyi Institute in Budapest and to the
University of Memphis. He has kept me going in the right direction when I have lost
my way or my motivation, asked crucial questions when I have been stuck, and of course
taught me and introduced me to a wealth of mathematical knowledge and understanding.
At the same time, he has become an integral part of my social life, whether sharing coffee
and biscuits in the Pavilion E common room, or inviting me to parties or a Super Bowl
in his home. I have enjoyed the many non-mathematical conversations that we have had,
even if my lack of cultural knowledge is a constant disappointment!
I should like to thank my collaborators Grzegorz Kubicki, Micha l Morayne and Rob
Morris. I am particularly grateful to Micha l and to Jan Tkocz for their warm hospitality
when I visited them in Wroc law and to Rob for the help and advice that he has given me
with my work far beyond his role as my departmental mentor. Thank you also to Graham
Brightwell, whose discussions with Be´la and Rob led to an earlier proof of Theorem 3.2,
for his helpful comments on the paper that I wrote with Rob, which became Chapter 3.
Thank you to Victor Falgas-Ravry for a useful conversation about Lemma 2.19. I hugely
appreciate the constructive comments and corrections provided by my examiners, Andrew
Thomason and Mark Walters.
I am grateful to the Engineering and Physical Sciences Research Council for funding
me for the duration of my Ph.D. and to the Department of Pure Mathematics and Mathe-
matical Statistics at the University of Cambridge for providing me with a very pleasant
and productive working environment. Thank you to the organizers of the following confe-
rences, all of whom supported my attendance financially, and also to the department,
Trinity College and the London Mathematical Society for contributing to these costs:
v
vi ACKNOWLEDGEMENTS
the Bristol Summer School on Probabilistic Techniques in Computer Science, Building
Bridges, the Fete of Combinatorics and Computer Science, the LMS-EPSRC Short Course
on Probabilistic Combinatorics, the 22nd British Combinatorial Conference, the 14th In-
ternational Conference on Random Structures and Algorithms, a conference in honour of
the 70th birthday of Endre Szemere´di, and the International Congress of Mathematicians
2010.
I have been very lucky with the people whom I have had to guide me through my
mathematical studies. I am extremely grateful to my secondary school maths teacher,
Peter Jack, for the encouragement and enthusiasm that he provided for more than seven
years, and to Tim Gowers and Imre Leader for inspiring me in supervisions and lectures
for the next seven.
Thank you to the rest of Team Bolloba´s for the coffee times, mini-seminars and confe-
rence social life, particularly my good friends Tom Coker, Micha l Przykucki and Misha
Tyomkyn. Thank you to all my other friends for their emotional support, particularly
Jen Gold, Alex Holyoake, Phil Horler, Ce´line Miani, Paul Smith, David Taylor, Charles
Vial and Ioanna Vlahou, with whom I have spent most of my spare time over the last
three and a half years.
I am hugely indebted to my parents, Martin and Dilys, and my sister and brother,
Tania and Ross, for the happy, supportive family background in which I have grown up,
and the encouragement that they have always given me.
My greatest thanks go to my wife, Raphae¨le, whom I met at the start of my Ph.D.
and who is as much a part of it as the combinatorics. Thank you to her for marrying me
and for looking after me so well. I am thankful for everything that she has done for me,
which is more than I can write here.
Declaration of joint work
This dissertation is the result of my own work and includes nothing that is the outcome
of work done in collaboration except where specifically indicated in the text. The first five
sections of Chapter 2 are based on joint work with Grzegorz Kubicki1 of the University
of Louisville, USA, and Micha l Morayne2 of Politechnika Wroc lawska, Poland, and about
50% of this is attributable to me. The work that forms the basis of Chapter 3 was joint
with Robert Morris,3 then of the University of Cambridge, UK, but now of IMPA, Brazil,
and my contribution was about 75%.
1gkubicki@louisville.edu
2michal.morayne@pwr.wroc.pl
3rob@impa.br
vii

Abstract
This dissertation is in two parts, each of three chapters. In Part 1, I shall prove some
results concerning variants of the ‘secretary problem’. In Part 2, I shall bound several
generalizations of the acyclic chromatic number of a graph as functions of its maximum
degree.
I shall begin Chapter 1 by describing the classical secretary problem, in which the aim
is to select the best candidate for the post of a secretary, and its solution. I shall then
summarize some of its many generalizations that have been studied up to now, provide
some basic theory, and briefly outline the results that I shall prove.
In Chapter 2, I shall suppose that the candidates come as m pairs of equally qualified
identical twins. I shall describe an optimal strategy, a formula for its probability of success
and the asymptotic behaviour of this strategy and its probability of success as m→∞. I
shall also find an optimal strategy and its probability of success for the analagous version
with c-tuplets.
I shall move away from known posets in Chapter 3, assuming instead that the candi-
dates come from a poset about which the only information known is its size and number
of maximal elements. I shall show that, given this information, there is an algorithm that
is successful with probability at least 1
e
. For posets with k ≥ 2 maximal elements, I shall
prove that if their width is also k then this can be improved to k−1
√
1
k
, and show that no
better bound of this type is possible.
In Chapter 4, I shall describe the history of acyclic colourings, in which a graph must
be properly coloured with no two-coloured cycle, and state some results known about
them and their variants. In particular, I shall highlight a result of Alon, McDiarmid
and Reed, which bounds the acyclic chromatic number of a graph by a function of its
maximum degree. My results in the next two chapters are of this form.
ix
x ABSTRACT
I shall consider two natural generalizations in Chapter 5. In the first, only cycles of
length at least l must receive at least three colours. In the second, every cycle must receive
at least c colours, except those of length less than c, which must be multicoloured.
My results in Chapter 6 generalize the concept of a cycle; it is now subgraphs with
minimum degree r that must receive at least three colours, rather than subgraphs with
minimum degree two (which contain cycles). I shall also consider a natural version of this
problem for hypergraphs.
Contents
Acknowledgements v
Declaration of joint work vii
Abstract ix
Introduction 1
Part 1: Problems of optimal choice on posets 1
Part 2: Generalizations of acyclic colourings 2
Part 1. Problems of optimal choice on posets 5
Chapter 1. The classical secretary problem 7
1.1. The problem 7
1.2. Outline solution 7
1.3. Variants 9
1.4. Formal model and notation 18
1.5. Useful theorems 21
Chapter 2. How to choose the best twins 27
2.1. Introduction 27
2.2. Notation and basic definitions 29
2.3. The optimal stopping time 30
2.4. The probability of success 36
2.5. Asymptotics 40
2.6. How to choose the best c-tuplets 44
2.7. Open problems 52
Chapter 3. The secretary problem on an unknown poset 55
3.1. Introduction 55
xi
xii CONTENTS
3.2. Lower bounds 56
3.3. Upper bound 66
3.4. Open problems 80
Part 2. Generalizations of acyclic colourings 83
Chapter 4. Acyclic colourings of graphs 85
4.1. Colouring problems 85
4.2. From Nash-Williams to acyclic colourings 87
4.3. Variants 88
4.4. Generalized acyclic colourings 92
4.5. Definitions and a useful tool 93
Chapter 5. Long acyclic colourings and acyclic colourings with many colours 97
5.1. Introduction 97
5.2. The length-l-acyclic chromatic number 97
5.3. The c-colour acyclic chromatic number 103
5.4. Open problems 113
Chapter 6. Small minimum degree colourings of graphs and hypergraphs 115
6.1. Introduction 115
6.2. The degree-r chromatic number of a graph 115
6.3. The degree-r chromatic number of a hypergraph 121
6.4. Open problems 126
Conclusion 129
Part 1: Problems of optimal choice on posets 129
Part 2: Generalizations of acyclic colourings 130
Bibliography 133
Introduction
This dissertation is split into two unrelated parts. In Part 1, I shall consider several
problems of optimal choice on posets, which are generalizations of a problem popularly
known as the ‘secretary problem’. In Part 2, I shall consider generalizations of acyclic
colourings of graphs, a concept whose origins can be traced back to Nash-Williams’s
theorem concerning the decomposition of the edge set of a graph into forests.
Part 1: Problems of optimal choice on posets
The classical secretary problem is as follows. There are n candidates to be interviewed
for a position as a secretary. They are interviewed one by one and, after each interview,
the interviewer must decide whether or not to accept that candidate. If the candidate is
accepted then the process stops, and if the candidate is rejected then the interviewer moves
on to the next candidate. The interviewer may only accept the most recently interviewed
candidate. At each stage, the interviewer knows the complete ranking of the candidates
interviewed so far, all of whom are comparable, but has no other measure of their ability.
The interviewer is only interested in finding the very best candidate; selecting any other
for the job is considered a failure. The aim is to find a strategy that maximizes the
probability that the interviewer chooses the best candidate, under the assumption that
the candidates are seen in a uniformly random ordering. In Chapter 1, I shall describe
the solution to this problem, that there is a strategy that is successful with probability
at least 1
e
and that this is asymptotically best possible. I shall also provide more of the
historical background of this problem and some of its generalizations up to now.
In the rest of Part 1, I shall consider two generalizations of this problem. In Chapter 2,
I shall first assume that there are 2m candidates who are in fact m pairs of identical twins,
each pair of twins being equally well-qualified for the job. As in the classical problem, the
interviewer knows at each stage how the candidates interviewed so far compare with each
other, but has no other measure of their ability. I shall describe an optimal strategy, a
1
2 INTRODUCTION
formula for its probability of success and the asymptotic behaviour of this strategy and its
probability of success as m→∞. I shall also find an optimal strategy and its probability
of success for the analagous version on m sets of c-tuplets, and provide bounds that give
some indication of their asymptotic behaviour.
I shall move from these known posets to unknown posets in Chapter 3. Specifically, I
shall assume that the candidates come from a poset whose size n and number of maximal
elements k is known, but whose structure is unknown. Here, the interviewer knows the
poset induced by the candidates interviewed so far. I shall describe a strategy that is
successful on all posets with given n and k with probability at least 1
e
, which is the best
possible bound when k = 1, but probably not for k > 1. I shall also find a strategy that
is successful with probability at least k−1
√
1
k
when the width of the poset is known to be
the same as the number of maximal elements k and k ≥ 2. By considering the poset
consisting of k disjoint chains, I shall show that no greater probability of success can be
guaranteed.
Part 2: Generalizations of acyclic colourings
An acyclic colouring of a graph is a proper vertex-colouring such that every cycle
contains vertices of at least three colours. To put it another way, it is an assignment of
colours to the vertices such that the graph induced by the vertices in any colour class must
be an independent set, and the graph induced by the vertices in any two colour classes
must be a forest. The acyclic chromatic number of a graph is the minimum number of
colours needed to colour it acyclically. In Chapter 4, I shall describe how this concept
came into being as a generalization of the arboricity of a graph, which is the minimum
number of forests into which its edge set can be decomposed, and I shall state some of
the many results proved about acyclic colourings up to this point. In particular, I shall
highlight a result of Alon, McDiarmid and Reed, which bounds the acyclic chromatic
number of a graph by a function of its maximum degree. My results in Part 2 will be of
this form.
In Chapter 5, I shall consider two generalizations in which the subgraphs under scru-
tiny are still cycles. In the first, the extra condition that a proper colouring must satisfy
is relaxed so that only cycles of length at least l must receive three colours, that is, the
PART 2: GENERALIZATIONS OF ACYCLIC COLOURINGS 3
graph induced by the vertices in any two colour classes does not contain a cycle of length
at least l. In the second, I shall strengthen the condition so that every cycle must receive
at least c colours, with the obvious exception of cycles of length less than c, which must
be multicoloured. In this case, the graph induced by the vertices in any x colour classes
with x < c does not contain any cycles of length greater than x.
The definitions given so far would work just as well if ‘cycle’ were replaced by ‘2-regular
subgraph’ or even ‘subgraph with minimum degree at least 2,’ since cycles fall into both
of these categories and any subgraph of either type must contain a cycle. I shall focus my
attention in Chapter 6 on subgraphs with minimum degree r; a graph contains at least
as many of these as r-regular subgraphs, and indeed is unlikely to have any r-regular
subgraphs for large r, so this is more restrictive. In a proper colouring, it is possible for
any bipartite subgraph with minimum degree r to receive only two colours; I shall insist
that it receive three. I shall also consider the same problem for u-uniform hypergraphs,
under the assumption that a proper colouring is one in which every edge is multicoloured.

Part 1
Problems of optimal choice on posets

CHAPTER 1
The classical secretary problem
1.1. The problem
The exact origins of the classical secretary problem are complicated (and the subject
of Ferguson’s history of the problem [26]), but the problem was popularized by Martin
Gardner [32, 33] in his Scientific American column in February 1960, as the game goo-
gol. The problem itself is simple to state, and its ‘secretary problem’ formulation is as
follows. There are n candidates to be interviewed for a position as a secretary. They are
interviewed one by one and, after each interview, the interviewer must decide whether or
not to accept that candidate. If the candidate is accepted then the process stops, and
if the candidate is rejected then the interviewer moves on to the next candidate. The
interviewer may only accept the most recently interviewed candidate. At each stage, the
interviewer knows the complete ranking of the candidates interviewed so far, all of whom
are comparable, but has no other measure of their ability. The interviewer is only inter-
ested in finding the very best candidate; selecting any other for the job is considered a
failure. The aim is to find a strategy that maximizes the probability that the intervie-
wer chooses the best candidate, under the assumption that the candidates are seen in a
uniformly random ordering.
1.2. Outline solution
The solution to the classical secretary problem is now folklore but was first published
by Lindley [57], and I shall give an outline of it here.
It is obvious that the interviewer should only consider accepting a candidate who is the
best seen so far. It is intuitively clear that a candidate should not be accepted very early
on even if he or she is the best seen so far, since there is a reasonable probability that a
small number of candidates all come from near the bottom of the ranking. Conversely, the
interviewer should not wait too long, or the best candidate will probably be missed and
7
8 1. THE CLASSICAL SECRETARY PROBLEM
the interviewer will not have the opportunity to select anyone. Furthermore, if a strategy
dictates that the ith candidate should be accepted if he or she is the best candidate seen
so far, it seems reasonable that the (i+1)th should be accepted in the same circumstances,
since more candidates have been seen and that candidate’s credentials are stronger. From
these observations, it might be expected that some sort of threshold should be passed
before the interviewer considers choosing a candidate.
Using backward induction, one can prove exactly that. This will be described in more
detail in Section 1.5; for now, I shall assume the following consequence of it without proof.
For some k, the strategy “reject the first k candidates, and accept the next who is the
best seen so far” is optimal. As an aside, it is worth noting that there might be more
than one optimal strategy: for example, when there are exactly two candidates, the two
possible deterministic strategies are equivalent, and both are equivalent to tossing a coin
to choose between the two candidates.
Let W be the event that, using this strategy, the interviewer chooses the best candidate
and let Bi be the event that the i
th candidate interviewed is the best candidate. Let Ai be
the event that the interviewer is still interviewing by the time the ith candidate arrives,
that is, that the best of the first i− 1 candidates interviewed is in the first k interviewed.
Then Bi and Ai are independent, and the probability of winning is given by
P(W ) =
n∑
i=k+1
P(Bi)P(W |Bi)
=
n∑
i=k+1
P(Bi)P(Ai)
=
n∑
i=k+1
1
n
· k
i− 1
=
k
n
n−1∑
j=k
1
j
.
A value of k that maximizes this satisfies
k
n
n−1∑
j=k
1
j
≥ k − 1
n
n−1∑
j=k−1
1
j
and
k
n
n−1∑
j=k
1
j
≥ k + 1
n
n−1∑
j=k+1
1
j
,
1.3. VARIANTS 9
that is,
n−1∑
j=k
1
j
≥ k − 1
k − 1 = 1 and
n−1∑
j=k+1
1
j
≤ k
k
= 1,
from which simple integration arguments give
n
e
− 1 ≤ k ≤ n
e
+
e− 1
e
.
From this, it is clear that for such k
lim
n→∞
k
n
=
1
e
and that the probability of winning tends to the same limit. In fact, in Chapter 3, it will
become evident that this is a lower bound.
1.3. Variants
A problem posed by Cayley [18] may have inspired the classical secretary problem.
This is what is now known as the ‘full information’ case, where the candidates’ abilities
are represented by real random variables from a known distribution, and the aim is to
maximize the expected ability of the chosen candidate. The uniform distribution U [0, 1]
was considered by Moser [62]; Guttman [46] found an optimal strategy for a general
distribution and also gave an explicit optimal strategy for the normal distribution N(0, 1).
His general optimal strategy is to accept a candidate if there are at least m candidates
remaining after that one and his or her ability is at least Em, for some (Em)m∈N. For
U(0, 1), the first few values of Em are 0.5, 0.625, 0.6953, 0.7417 and 0.775, and for N(0, 1)
they are 0, 0.3992, 0.6298, 0.7904 and 0.9127. The expected ability is the first threshold
for acceptance, E1.
Since 1960, many generalizations of the classical secretary problem have been conside-
red. Freeman [28] wrote an extensive review of the area in 1983, which shows how many
versions had already been considered by then. I shall describe only some of them, and
some more recent results.
Besides the full information case, the most obvious generalization might be to be more
flexible over what constitutes success, and to try to minimize the expected rank (viewing
the best candidate as being from rank 1 and the worst from rank n) rather than insisting
10 1. THE CLASSICAL SECRETARY PROBLEM
on choosing the best candidate. This version was solved by Chow, Moriguti, Robbins and
Samuels [21]. Perhaps surprisingly, in the limit as n → ∞ the optimal expected rank
tends to a constant rather than a multiple of n, namely,
∞∏
j=1
(
j + 2
j
) 1
j+1
≈ 3.8695.
They showed that an optimal strategy is to accept a candidate who is in the top k seen so
far as long as at least ik candidates have been seen in total, for some thresholds ik. They
also showed that the ik satisfy
lim
n→∞
ik
n
=
∞∏
j=k
(
j
j + 2
) 1
j+1
.
This means that, for large n, once we have seen approximately 26% of candidates we
should be prepared to accept the next one who is the best so far, after 45% we should
accept anyone who is one of the top two seen so far, after 56% one of the top three, after
64% one of the top four, after 69% one of the top five and so on. At the other end of the
scale, once we have seen 99% of candidates we should accept anyone who is in the top
200 and after 99.9% anyone in the top 2000.
Yang [80] considered the situation where, as well as being allowed to offer the job to
the most recently interviewed candidate, who would accept it, the interviewer can offer the
job to any of the other candidates seen so far, who is still available with probability q(r),
where q is a known non-increasing function of the number r of candidates interviewed since
that one. If a candidate is unavailable, he or she never becomes available again. The aim
is to choose the best possible secretary, as in the classical secretary problem. Smith [74]
studied a version where the job can only be offered to the currently interviewed candidate,
but there is some fixed probability that the candidate will refuse the offer. Petruccelli [67]
worked on these two problems simultaneously, that is, Yang’s problem with q(r) still non-
increasing but with q(0) no longer forced to be 1.
In particular, Petruccelli considered the case where the probabilities form a geometric
progression, that is, where q(r) = qpr for some p and q. He proved that there are two
cases, depending on p, q and the number n of candidates. If
∑∞
r=0 q(r) =
q
1−p > n − 1
then an optimal strategy is to observe all n candidates and then to offer the job to the
1.3. VARIANTS 11
best one. If q
1−p ≤ n − 1, then an optimal strategy is to wait until sn candidates have
been seen, for some sn, and then to offer the job to the best one seen so far. If this one
is unavailable, then the interviewer should continue interviewing and offer the job to the
next candidate who is the best seen so far. If this one is unavailable, then the interviewer
should continue as before, and so on. He gave an explicit formula for sn, namely, the
smallest value of s for which
n−1∏
k=s+1
(
1 +
1− q
k
)
≤
[
q
(
1 +
1− q
s(1− p)
)]−1
.
He also showed that both sn
n
and the probability of success tend to q
1
1−q as n→∞. This
is independent of p, which means that for large n there is effectively no benefit to being
allowed to recall a previous candidate, since p can be arbitrarily small. However, this is
not surprising, since if p is fixed then only a constant number of candidates are likely to
be available at any one point, even as n→∞.
Gusein-Zade [45] allowed selection of any of the top r candidates to count as success,
and showed that an optimal strategy is of the same form as in the expected rank case,
that is, an optimal strategy is to accept a candidate who is in the top k seen so far as
long as at least ik candidates have been seen in total, for some thresholds ik. He showed
that for r = 2 the limiting probability of success as n → ∞ is about 0.5736. Frank and
Samuels [27] proved that the optimal probability of success p(n, r) satisfies
lim
r→∞
lim
n→∞
(
1− p(n, k)) 1r = 1− t∗, where t∗ = lim
n→∞
i1
n
≈ 0.2834.
Gilbert and Mosteller [40] considered what could be called the inverse of this problem:
the interviewer is allowed to pick up to r candidates and wins if any of them is the best.
(Many other variations of the secretary problem are included in the same paper.) They
showed that an optimal strategy is to wait until t(n, r) candidates have been seen, for
some function t(n, r), then to pick the next who is the best seen so far, and then to play an
optimal strategy for the remaining candidates and r− 1 choices. They found an iterative
method to calculate the limits
ur = lim
n→∞
t(n, r)
n
,
12 1. THE CLASSICAL SECRETARY PROBLEM
showed that the first few values of ur are e
−1, e−
3
2 , e−
47
24 and e−
2761
1152 , and showed that as
n→∞ the optimal probability of success tends to
r∑
i=1
ui.
Some authors wondered what would happen if the number of candidates were unk-
nown, but the distribution of that number, the random variable N , were known. Presman
and Sonin [69] found an explicit optimal strategy for a general distribution, where the
aim is to choose the best candidate. They also showed that if the number of candidates is
uniform in [n], then an optimal strategy is of the same form as in the classical secretary
problem, but where the threshold is asymptotically equivalent to n
e2
and the probability
of success tends to 2
e2
≈ 0.2707 as n→∞.
Gianini-Pettitt [39] considered the minimal expected rank version of this problem,
and restricted her attention to distributions of the form
P
(
N = x
∣∣N ≥ x) = (n− x+ 1)−α,
for some α. She proved that, as when the number of candidates is known, an optimal
strategy is of the form ‘accept the ith candidate if it is one of the best k(i) seen so far,’
but that k(i) is not necessarily an increasing function. She also proved the surprising
fact that if N1 and N2 are possible distributions of the number of candidates and N1
is stochastically smaller than N2, that is, P(N1 ≤ x) ≥ P(N2 ≤ x) for all x, this does
not imply that the minimum expected rank decreases. One example of this is that if N1
is uniformly distributed on [n], that is, if α = 1, then the optimal expected rank tends
to infinity as n → ∞, whereas if N2 = n with probability 1 then, as shown by Chow,
Moriguti, Robbins and Samuels [21], the optimal expected rank tends to about 3.8695.
In fact, she showed that the optimal expected rank tends to infinity if α < 2 and to the
Chow, Moriguti, Robbins and Samuels limit of approximately 3.8695 if α > 2, and if
α = 2 then the lim inf of the optimal expected rank is greater than 3.8695 and the lim
sup is finite.
More generally, one could consider problems of optimal choice on more complicated
systems; up to this point the assumption has always been that there are n rankable
1.3. VARIANTS 13
candidates. Kuchta and Morayne [56] considered a version of the classical secretary
problem where the interviewer has k ‘lives’: if all n candidates are interviewed without
any of them being chosen, then a new set of n candidates is interviewed, and the aim
is to chose the best of these, and so on. At most k sets of candidates are allowed to be
interviewed in total. They showed that, for some function t(n, k), an optimal strategy
is to ignore the first t(n, k) candidates from the first set, and accept the next candidate
who is the best seen so far; if no such candidate appears, then ignore the first t(n, k − 1)
candidates from the next set and accept the next candidate who is the best seen so far,
and so on. They showed that
lim
n→∞
t(n, k)
n
exists, and denoting it by ak, that ak+1 = e
ak−1, where of course a1 = 1e .
Stadje [75] introduced the idea that the candidates could be ranked separately in
each of k > 1 different criteria, with the interviewer wishing to select a candidate who is
maximal in at least one of them, and Gnedin [41] solved the version where these rankings
are random and independent of each other. He proved that an optimal stratgy is to wait
until a certain number of candidates have been seen and then to select the next who is best
according to at least one criterion, and that the limiting values of this threshold and the
probability of success are both k−1
√
1
k
. Gnedin has also produced a more general survey
of multicriteria problems [42]. In fact, these last two problems could both be viewed as
versions of the secretary problem on k disjoint chains, which I shall solve in Chapter 3,
with an extra restriction, about which I shall say more at the time.
Moving slightly further away from the classical secretary problem, Kubicki and Mo-
rayne [55] considered the problem on a directed path, where at each stage the selector
knows the directed graph induced by the vertices seen so far and wishes to choose the
end-vertex with no edge going out of it. This is similar to the classical secretary problem,
but each candidate can only be compared with the one immediately above it or below it
in the ranking. They showed that an optimal strategy is to wait until the first time t when
the induced graph has n − t + 1 connected components and to pick the tth vertex. Note
that this is the first time when the selector can be sure that the sought after end-vertex
has been seen: if at time t the induced graph has n− t + 1 connected components, then
14 1. THE CLASSICAL SECRETARY PROBLEM
the remaining n − t vertices must be used to join components together, and so none of
them is the end-vertex. Note also that this strategy is independent of whether or not
the tth vertex is an end-vertex of its component; if it is not, then it does not make any
difference which vertex is chosen. They showed that the probability of success pn satisfies
lim
n→∞
pn
√
n =
√
pi
2
.
Przykucki and Sulkowska [71] adapted this problem in a similar way to Gusein-Zade’s
version of the classical secretary problem, so that choosing the end-vertex or its neighbour
counts as success. In this case, the optimal stopping time and its analysis are more
complicated, but their numerical analysis shows that the probability of success behaves
approximately like 1.26√
n
. For comparison with the previous result,
√
pi
2
≈ 0.8862.
s
s s s
s s s s s s s s s
s s s s s s s s s s s s s s s s s s s s s s s s s s s

1 6 PP
PPPi


PP
PPP
        6 6 6@@I @@I @@I
        @@ @@ @@
        6 6 6 6 6 6 6 6 6BBM BBM BBM BBM BBM BBM BBM BBM BBM
        BB BB BB BB BB BB BB BB BB
Figure 1.1. Directed ternary tree of depth 3.
Morayne and Sulkowska [61] studied a variation of the directed path version, in this
case working with the complete k-ary directed rooted tree of depth n (see Figure 1.1) and
again assuming that the selector knows the induced directed graph at each stage of the
process. They found a lower bound for the probability of success of an optimal strategy by
considering a natural (but not necessarily optimal) strategy: select the currently examined
element if there is a directed path of length n terminating at it. If this ever happens, then
the strategy must be successful. In this way, they showed that on a binary tree the limit
of the optimal probability of success is at least 2 log 2− 1 ≈ 0.3863 and that on a ternary
tree it is at least 3
2
log 3− 2 + pi
2
√
3
≈ 0.5548, and that it tends to 1 as k →∞.
Przykucki [70] posed a problem concerning the random graph Gn,p with n vertices
and any two connected by an edge with probability p independently of the others. Again,
the selector knows the graphs induced by the vertices seen so far, and wishes to find a
vertex of full degree, that is, degree n− 1. He showed that an optimal strategy is to wait
1.3. VARIANTS 15
until k(n, p) vertices have been seen and then to pick the next vertex connected to every
vertex seen so far, for some k(n, p). He showed that, for fixed p ∈ (0, 1),
k(n, p) = log 1
p
n+O(1)
as n→∞, and found a formula for the probability of its success, which of course tends to
zero more quickly than npn−1, which is an upper bound for the probability that a vertex
of full degree exists.
In this dissertation, the type of generalization that I shall consider is to put partial
orders other than a total order on the candidates. Here, the selector knows the poset from
which the elements are taken and the poset induced by the elements observed so far, and
wishes to choose an element that is maximal in the ground poset.
s
s s
s s s s
s s s s s s s s
s s s s s s s s s s s s s s s s

HHHHHHHH
 
 
 
 
 
 
 
 
@
@
@
@
@
@
@
@


A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A


C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
Figure 1.2. Binary tree of depth 4.
A generalization due to Morayne [60] is to consider the case of a binary tree of depth
n (see Figure 1.2). Intuitively, it seems unlikely that a random selection of nodes would
come from a subtree with a maximum other than the global maximum unless they are
linearly ordered, and he showed that this is indeed the case. An optimal strategy here is
to select the maximum out of the elements seen so far when the poset induced by these
elements is either linear of length greater than n
2
or non-linear with a unique maximum.
He showed further that as n→∞, the probability of success tends to 1.
Kaz´mierczak [49] added a ‘witness’ to the classical secretary problem, an extra element
w in the poset that lies immediately below the maximal element but cannot be compared
with any other element. Tkocz [77] extended this concept to put the witness below the
16 1. THE CLASSICAL SECRETARY PROBLEM
s
s
s
s
s
s
b
b
b
b
b
x1
xk
xn
w
Figure 1.3. Tkocz’s poset.
kth highest element (see Figure 1.3). He gave an explicit optimal strategy, which uses
three different thresholds, essentially depending on the size of k relative to n. If the poset
induced by the elements seen so far is linear and of length greater than the threshold
then a maximal element should be accepted; if it is not linear then an optimal strategy
for the classical secretary problem on k elements should be followed. He also calculated
the asymptotic probabilities of success for k = 2 and 3, approximately 0.415 and 0.384
respectively, compared with the figure of approximately 0.573 for k = 1 obtained by
Kaz´mierczak.
s
s
s
s
s
s
s
s
s
s
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
Figure 1.4. Five pairs of identical twins.
Micha l Morayne, Grzegorz Kubicki and I [34] considered the case of m pairs of ‘twins’,
where there are m levels with two incomparable elements on each level (see Figure 1.4); I
shall present these results in Chapter 2. I shall show that an optimal strategy is to wait
until elements from a certain threshold number of levels have been seen and then to select
the next element that is maximal and whose twin has already been seen. I shall further
show that as m → ∞, this threshold behaves roughly like 0.4709m and the probability
of success tends to approximately 0.7680. Calculating these asymptotic values for the
1.3. VARIANTS 17
natural extension to ‘c-tuplets’ for c > 2, is a harder problem, and I shall provide some
bounds.
A further interesting generalization, due to Preater [68], was an attempt to find an
algorithm that was successful on all posets of a given size with positive probability. Sur-
prisingly, he proved that there is such a ‘universal’ algorithm (depending only on the
size of the poset), which is successful on every poset with probability at least 1
8
. In this
algorithm, an initial random number of elements are rejected and a subsequent element is
accepted according to randomized criteria. A slightly modified version of the algorithm,
also suggested by Preater, was analysed by Georgiou, Kuchta, Morayne and Niemiec [36],
and gave an improved lower bound of 1
4
for the probability of success. More recently,
Kozik [54] introduced a ‘dynamic threshold strategy’ and showed that it was successful
with probability at least 1
4
+ ε, for some ε > 0 and for all sufficiently large posets. When
I was about to submit this dissertation, Micha l Morayne drew my attention to a very
recent paper of Freij and Wa¨stlund [29]. In it, they describe a strategy that is successful
with probability at least 1
e
. This cannot be improved, since the best possible probability
of success in the classical secretary problem, on a totally ordered set, is 1
e
. I shall say
more about this in Section 3.4.
Before Kozik, Freij and Wa¨stlund had published their results, Robert Morris and I [35]
showed that, given any poset, there is an algorithm that is successful with probability at
least 1
e
, so, in this sense, the total order is the hardest possible partial order. I shall
present these results in Chapter 3. In fact, this algorithm depends only on the size of
the poset and its number of maximal elements, so it is universal for any family where
these are given. It is therefore natural to ask which is the hardest partial order with a
given number of maximal elements. The most obvious choice is the poset consisting of
k disjoint chains. I shall give an asymptotically sharp lower bound on the probability of
success in the problem of optimal choice on k disjoint chains, and show that it is at least
as hard as on any poset with k maximal elements and of width k, that is, whose largest
antichain has size k.
18 1. THE CLASSICAL SECRETARY PROBLEM
1.4. Formal model and notation
In this section, I shall define formally the probability space in which I shall work in
Chapters 2 and 3.
This probability space will depend on a poset (P,≺) with P = {x1, . . . , xn}. In
Chapter 2, this will be a known poset, the poset of m pairs of identical twins or, later,
m sets of identical c-tuplets. In Chapter 3, this will be a fixed but unknown poset. Let
max≺(P ) denote the set of its maximal elements, that is,
max≺(P ) = {x ∈ P : 6 ∃y such that x ≺ y}.
The subscript in max≺ will be suppressed when it is clear from the context.
Given (P,≺), I shall work with a probability space (ΩP ,FP ,PP ), with EP defined in
the obvious way. The subscripts will be suppressed when they are clear from the context,
as they will be for the rest of this section. The probability space (Ω,F ,P) is defined as
follows. Set Ω = Sn×[0, 1], where Sn is the permutation group on [n], and F = P(Sn)×B,
where B is the Borel σ-algebra. Let P = µ×λ, where µ is the uniform probability measure,
that is,
µ({ρ}) = 1
n!
for all ρ ∈ Sn, and λ is the Lebesgue measure. In other words, (ρ, δ) ∈ Ω is picked
uniformly at random. Given (ρ, δ) ∈ Ω, the ρ-co-ordinate will determine the order in which
elements of P appear and the δ-co-ordinate will allow the introduction of randomness
independent of this order into our algorithms. This will not be needed in Chapter 2; in
Chapter 3, the δ-co-ordinate will determine an initial number of elements to reject without
considering. The reason why continuous space and Lebesgue measure are used, despite
the fact that all of the randomized strategies considered pick one of a finite number of
options, is that this allows them all to lie in the same probability space.
Write P [n] for the set of permutations of P , and let pi : Ω → P [n] be the random
variable defined by
pi(ρ, δ)(i) = xρ(i).
1.4. FORMAL MODEL AND NOTATION 19
LetPt denote the set of all posets with vertex set [t] = {1, . . . , t}. Let (Pt)t∈[n] be a family
of random variables with Pt representing the poset seen at time t. Formally, Pt : Ω→Pt
and each Pt(ρ, δ) = ([t],≺t) is defined by
∀i, j ∈ [t], i ≺t j ⇐⇒ pi(i) ≺ pi(j).
The poset Pt is the natural description of what is seen at time t as the elements of P
appear one by one.
Let (Ft)t∈[n] be the sequence of σ-algebras with each Ft generated by the random
variables P1, . . . , Pt, that is,
Ft = σ(P1, . . . , Pt) = σ(Pt),
the second equality holding since Pt is a labelled poset and thus its value determines the
values of P1, . . . , Pt−1. The σ-algebra Ft can be thought of as the information known at
time t about where we are in the universe Ω. Since Pt takes only finitely-many values,
Ft has a simple structure; it is the pre-images in Ω of the possible values of Pt and the
unions of these pre-images. These pre-images are called the atoms of Ft.
Let F ′t be the projection of Ft onto P(Sn). Since definitions have so far depended only
on the ρ-co-ordinate of (ρ, δ) ∈ Ω, it is clear that, for each t,
Ft = {A× [0, 1] : A ∈ F ′t}.
In other words, (ρ1, δ1) and (ρ2, δ2) are in the same atom of Ft if and only if ρ1 and ρ2
are in the same atom of F ′t, which happens if and only if the labelled posets induced by
the first t elements pi(1), . . . , pi(t) are identical.
A stopping time is a random variable τ taking values in [n] and satisfying the property
{τ = t} ∈ Ft,
that is, the decision to stop at time t is based only on the values of P1, . . . , Pt.
I shall give a brief reminder of the formal definitions of conditional expectation and
probability, which in the finite world are intuitive concepts. For more details, see page 304
of Galambos [30] or page 313 of Chung [23], for example. Let X be a random variable.
20 1. THE CLASSICAL SECRETARY PROBLEM
Then the conditional expectation of X given Ft, denoted by E(X|Ft) is any Ft-measurable
random variable satisfying∫
A
E
(
X
∣∣Ft) = ∫
A
X for all A ∈ Ft.
Any two random variables satisfying these conditions are equal with probability 1. As Ft
is finite, this random variable is constant on each atom of Ft and takes the average value
of X on that atom, that is,
E
(
X
∣∣Ft) (ω) = ∫AXP(A) = E(X|A),
where A is the atom of Ft containing ω. In the same way that the probability of an
event E is the expectation of the indicator function 1(E) of this event, the conditional
probability of E given Ft is defined by
P
(
E
∣∣Ft) = E (1(E)∣∣Ft) ,
that is, since Ft is finite,
P
(
E
∣∣Ft) (ω) = E (1(E)∣∣Ft) (ω) = ∫A 1(E)P(A) = P(A ∩ E)P(A) = P(E|A),
where A is the atom of Ft containing ω.
Define the family of random variables (Zt)t∈[n] by
Zt = P
(
pi(t) ∈ max(P )∣∣Ft) ,
that is, the random variable Zt is the probability that the t
th element observed is maxi-
mal given P1, . . . , Pt. The general aim will be to choose stopping times τ to maximize
P
(
pi(τ) ∈ max(P )). This quantity is equal to E(Zτ ) (see page 45 of Chow, Robbins and
Siegmund [22], for example):
E(Zτ ) =
∫
Ω
Zτ =
n∑
t=1
∫
{τ=t}
Zt =
n∑
t=1
∫
{τ=t}
P
(
pi(t) ∈ max(P )∣∣Ft)
=
n∑
t=1
∫
{τ=t}
1
(
pi(t) ∈ max(P )) = ∫
Ω
1
(
pi(τ) ∈ max(P )) = P(pi(τ) ∈ max(P )),
1.5. USEFUL THEOREMS 21
the fourth equality holding by the definition of conditional expectation since {τ = t} ∈ Ft.
These equivalent formulations will be useful later; this conditional probability can be
treated as a pay-off offered at each step of the process.
Recall that F ′t is the projection of Ft onto P(Sn). A randomized stopping time is a
random variable τ taking values in [n] and satisfying the property
{τ = t} ∈ F ′t × B,
that is, the decision to stop at time t is based on the values of P1, . . . , Pt and on some B-
measurable random variable. The randomized stopping times considered in Chapter 3 will
be convex combinations of a finite number of true stopping times, so if such a randomized
stopping time gives a certain probability of success, then there is a true stopping time
with at least that probability of success. In fact, this is true in general, as proved by
Ghoussoub [38].
1.5. Useful theorems
In this section, I shall define backward induction formally and use it to solve the
classical secretary problem. I shall also state a theorem that gives an optimal stopping
time for monotone processes, to be defined later.
The next theorem, which is proved as Theorem 3.2 on page 50 of Chow, Robbins and
Siegmund [22], gives in principle an optimal stopping time for any finite process, although
in practice it might be hard to define such a stopping time more explicitly. It formalizes
the concept of backward induction. Informally, it can be described is as follows.
Define a new random variable for each t, the value of the process at time t. This is the
expected pay-off ultimately accepted given what has happened so far. These values are
calculated inductively, starting at the end. The value of the process at the final step is
just the final pay-off offered. The value of the process at each earlier step is the maximum
of the currently offered pay-off and the expected value of the process at the next step.
An optimal strategy is to stop when the currently offered pay-off is at least the expected
value at the next step.
In the theorem below, the pay-offs offered are the Wt and the values at each step
are the γt. The σ-algebras At represent what is known at time t. As a reminder, being
22 1. THE CLASSICAL SECRETARY PROBLEM
At-measurable means that σ(Wt) ⊂ At, that is, the value of Wt is determined by what is
known at time t or, in the finite world, Wt is constant on each atom of At. In fact, the
nested condition means that At ⊃ σ(W1, . . . ,Wt). The conclusion of the theorem is that
the strategy that stops at the first t when Wt = γt (or, equivalently, when Wt is at least
as large as the expected value of γt+1 given At) is indeed a stopping time and achieves
the optimal value.
Theorem 1.1. Let A1 ⊂ . . . ⊂ An be a nested sequence of σ-algebras and let W1, . . . ,Wn
be a sequence of random variables with each Wt being At-measurable. Let C(At) be the class
of stopping times relative to (At)t∈[n] and let v∗ be given by
v∗ = sup
τ∈C(At)
E(Wτ ).
Define successively γn, γn−1, . . . , γ1 by setting
γn = Wn,
γt = max
{
Wt,E
(
γt+1
∣∣At)}, t = n− 1, . . . , 1.
Let
τ ∗ = min{t : Wt = γt}.
Then τ ∗ ∈ C(At) and
E(Wτ∗) = E(γ1) = v∗ ≥ E(Wτ ) for all τ ∈ C(At).

This theorem can now be used to justify the assertion in Section 1.2 that an optimal
strategy is of the form “reject the first k candidates, and accept the next who is the best
seen so far.” Recall from Section 1.4 that
Zt = P
(
pi(t) ∈ max(P )∣∣Ft) ,
that is, the probability that the tth candidate is the best one given what is known at this
point. Backward induction will be applied with the Zt corresponding to the Wt, with Ft
to Gt and with δt to γt.
1.5. USEFUL THEOREMS 23
Since the random variables Zt are independent, the values of Z1, . . . , Zt give no infor-
mation about the values of Zt+1, . . . , Zn, and therefore E(δt+1|Ft) is constant on all atoms
of Ft and equal to E(δt+1). Therefore, define the function v : [n]→ R by
v(t) = E(δt)
and note that backward induction tells us that the stopping time that stops at the first t
such that Zt ≥ E(δt+1|Ft) = v(t+ 1) is optimal. By definition,
v(t) = E
(
max{Zt, v(t+ 1)}
) ≥ v(t+ 1),
whereas t
n
, the potential non-zero value of Zt, is a non-decreasing function of t. Therefore,
there exists k such that
t
n
< v(t+ 1) if t ≤ k,
t
n
≥ v(t+ 1) if t > k,
and an optimal strategy is “reject the first k candidates, and accept the next who is the
best seen so far,” as claimed.
In fact, this argument is not even necessary, as the classical secretary problem is a
sufficiently straightfoward process that backward induction can be used to give an explicit
optimal strategy. This is because in this case the δt can be calculated explicitly, as in the
next lemma, and these define an optimal strategy.
Lemma 1.2. For all t ≤ n, if
n∑
i=t
1
i
≤ 1,
then
E
(
δt
∣∣Ft−1) = t− 1
n
n−1∑
i=t−1
1
i
,
that is, it is the constant random variable taking that value; otherwise,
E
(
δt
∣∣Ft−1) = E (δt+1∣∣Ft) .
24 1. THE CLASSICAL SECRETARY PROBLEM
Proof. By definition, δn = Zn and so
E
(
δn
∣∣Fn−1) = 1
n
=
n− 1
n
· 1
n− 1 .
For t ≤ n− 1, if
t
n
≥ t
n
n∑
i=t
1
i
= E
(
δt+1
∣∣Ft) ,
then
E
(
δt
∣∣Ft−1) = P(Zt ≥ E (δt+1∣∣Ft) ) · E(Zt∣∣∣(Zt ≥ E(δt+1|Ft)))
+ P
(
Zt < E
(
δt+1
∣∣Ft) ) · E(E(δt+1|Ft)∣∣∣(Zt < E(δt+1|Ft))) (1.1)
=
1
t
· t
n
+
t− 1
t
· t
n
n−1∑
i=t
1
i
=
t− 1
n
n−1∑
i=t−1
1
i
.
If
t
n
<
t
n
n∑
i=t
1
i
= E
(
δt+1
∣∣Ft) ,
then, from the formula in (1.1), it is the case that
E
(
δt
∣∣Ft−1) = 0 · t
n
+ 1 · E (δt+1∣∣Ft)
= E
(
δt+1
∣∣Ft) .

This lemma and the backward induction theorem clearly give the same optimal stop-
ping time as before: accept the tth candidate if he or she is the best so far and
∑n
i=t
1
i
> 1.
The other main tool used in Part 1 of this dissertation is the most basic result for
infinite processes. It is used in Chapter 2 for the reason that the twins case will be reduced
to a process that, although finite, does not have a fixed number of steps, and so backward
induction is inappropriate. The following theorem is slightly adapted from that on page
55 of Chow, Robbins and Siegmund [22], since the random variables of interest are all
positive.
1.5. USEFUL THEOREMS 25
This theorem applies only to the monotone case, and monotonicity is a strong pro-
perty: it is the property that, after the first time in the process where the currently
offered pay-off is at least the expected pay-off at the next step, the same is true at all
future times. The conclusion of the theorem is that the first time when this is true is an
optimal stopping time.
Theorem 1.3. Let A1 ⊂ A2 ⊂ . . . be a nested sequence of σ-algebras and let W1,W2, . . .
be a sequence of random variables with each Wt being At-measurable. Let C(At) be the class
of stopping times relative to (At)t∈[n]. For t ∈ N, let
At =
{
E
(
Wt+1
∣∣At) ≤ Wt}.
and suppose that
A1 ⊂ A2 ⊂ . . . and
∞⋃
t=1
At = Ω. (1.2)
Let
τ ∗ = min
{
t : Wt ≥ E
(
Wt+1
∣∣At)}.
Suppose that P(τ ∗ <∞) = 1 and E(Wτ∗) exists and that
lim inf
t
∫
{τ∗>t}
Wt = 0.
Then
E(Wτ∗) ≥ E(Wτ ) for all τ ∈ C(At).

If equation (1.2) holds then the process (Wt,At)t∈[n] is said to be monotone.

CHAPTER 2
How to choose the best twins
2.1. Introduction
Sections 2.1 to 2.5 of this chapter are based on joint work with Micha l Morayne
and Grzegorz Kubicki [34], but with some significant reorganization of its presentation,
particularly in Section 2.3. Sections 2.6 and 2.7 are my own work.
The poset initially considered in this chapter is the set of m pairs of identical twins.
This can also be viewed as a version of the classical secretary problem where each element
is seen exactly twice. The Hasse diagram of this poset is Figure 2.1.
s
s
s
s
s
s
s
s
s
s
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
"
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
b
u1
u2
u3
u4
u5
v1
v2
v3
v4
v5
Figure 2.1. The poset (U ∪ V,≺) for m = 5.
In Section 2.6, the poset considered is the natural extension to m sets of identical
c-tuplets. The main results of this chapter are summarized in the following theorems.
Theorem 2.1. For m ∈ N, let
km = min
{
k :
2m
k
+
m−1∑
j=k
1
j
≤ 5
}
.
An optimal strategy for the secretary problem on m pairs of identical twins is to wait until
candidates have been seen from at least km of the pairs and then to pick the next candidate
who is the best so far and whose twin has already been seen. Asymptotically,
lim
m→∞
km
m
=
1
x0
≈ 0.4709,
27
28 2. HOW TO CHOOSE THE BEST TWINS
where x0 is the unique solution to 2x+ log x = 5, and the probability of success tends to
1
x0
+
4(x0 − 1)2
3x0
((
x0
x0 − 1
) 1
2
− 1
)
≈ 0.7680.
Theorem 2.2. For c,m ∈ N with c ≥ 2, let
k(c)m = min
{
k :
m−k∑
j=1
[(
m−j−1
k−1
)(
m−1
k−1
) j∏
i=2
(
1− 1(ci
c
))] ≤ 1} .
An optimal strategy for the secretary problem on m sets of identical c-tuplets is to wait
until candidates have been seen from at least k
(c)
m of the c-tuples and then to pick the next
candidate who is the best so far and all of whose c-tuplets have already been seen. For all
m, (
1
2
− c+ 1
2(2c+1(c− 1)− c− 1)
)
m < k(c)m ≤
⌈m
2
⌉
,
and the probability of success is at least
1− c+ 1
2c(c− 1) .
At first, it might seem surprising that the secretary problem is easier in the twins
case than on the total order, since there are fewer comparisons that can be made and so
apparently less information. However, this is reconciled by the fact that if we attempt to
compare two candidates then there are three possible outcomes rather than two, which
gives us more information. Similarly, viewing the twins case as having two chances with
each candidate makes it intuitive that it should be easier, which is correct.
Most of the notation used is as described in Section 1.4, and in Section 2.2 I shall
describe the notation used specifically for the twins case. In Section 2.3, I shall use the
monotone case theorem (Theorem 1.3) to find an optimal strategy for this problem and
in Section 2.4 I shall give a formula for its probability of success. In Section 2.5, I shall
describe the asymptotic behaviour of the threshold in the definition of the optimal strategy
given and of the probability of success. I shall adapt the earlier method in Section 2.6 to
find an optimal strategy and its probability of success for the natural generalization to
c-tuplets, and give some bounds that illustrate their behaviour. Finally, in Section 2.7, I
2.2. NOTATION AND BASIC DEFINITIONS 29
shall state what work there is still to be done in this case and suggest some other posets
that one might wish to consider.
2.2. Notation and basic definitions
In this section, I shall define the poset consisting of m pairs of identical twins explicitly,
as I shall refer to it in this way in the rest of the chapter.
Consider a set consisting of two chains U = {u1, . . . , um} and V = {v1, . . . , vm} with
u1 and v1 being maximal elements. For every i with 1 ≤ i ≤ m, the elements ui and vi
are incomparable. An element with a smaller subscript is higher then an element with a
larger subscript, that is,
ui ≺ vj if i > j and vj ≺ ui if i < j.
The elements ui and vi are referred to as the twins at level i. The Hasse diagram of this
partial order (U ∪ V,≺) for m = 5 is given in Figure 2.1.
Suppose that the elements of U ∪ V are observed one by one in the order given by a
random permutation pi, with all (2m)! permutations of U∪V equally likely. For every time
t with 1 ≤ t ≤ 2m, we observe the partial order induced by the set {pi(1), pi(2), . . . , pi(t)}.
As an example, suppose that we have a permutation
pi = (u3, v2, u1, v3, u5, v1, u4, v5, v4, u2).
The induced orders observed for pi up to t = 6 are given in Figure 2.2.
The goal is to choose the presently observed element to maximize the probability
that this element is either u1 or v1, that is, one of the two maximal elements of the
poset (U ∪ V,≺). In other words, we are looking for the stopping time τ such that
P
(
pi(τ) ∈ {u1, v1}
)
is maximal. The function τ , the optimal stopping time, depends itself
on pi. The decision at time t is based exclusively on the induced partial orders observed
until time t and the order of appearance of their elements. For example, if the stopping
time tells us to stop on the third element of the permutation given in Figure 2.2, then
using this function we have to stop at time t = 3 for all permutations of U ∪ V whose
first three elements have strictly decreasing subscripts.
30 2. HOW TO CHOOSE THE BEST TWINS
s ss ss
s
ss
s
s
ss
ss s ss
ss s
s
@
@
 
 
@
@
 
 
@
@
 
 
t = 1 t = 2 t = 3 t = 4
t = 5 t = 6
pi(1) pi(2)
pi(1)
pi(3)
pi(2)
pi(1)
pi(3)
pi(2)
pi(1) pi(4)
pi(3)
pi(2)
pi(1)
pi(5)
pi(4)
pi(3)
pi(2)
pi(1)
pi(5)
pi(4)
pi(6)
Figure 2.2. Induced orders when pi = (u3, v2, u1, v3, u5, v1, u4, v5, v4, u2)
for 1 ≤ t ≤ 6.
2.3. The optimal stopping time
In this section, I shall use the monotone case theorem to find an optimal stopping
time for the process.
Define recursively the following stopping times:
τi = min
{
t > τi−1 : pi(t) ∈ max{pi(1), pi(2), . . . , pi(t)} and
{pi(1), pi(2), . . . , pi(t− 1)} contains the twin of pi(t)
}
,
with the assumptions that τ0 = 0 and if the set under the minimum is empty, then its
minimum is 2m. Recall that the twins come from m levels, with the elements ui and vi
referred to as the twins at level i. Let
km = min
{
k :
2m
k
+
m−1∑
j=k
1
j
≤ 5
}
and let
τ = min{τi : the number of levels occupied by pi(1), pi(2), . . . , pi(τi) is at least km},
with the assumption that if the set under minimum is empty, then τ = 2m.
Theorem 2.3. The stopping time τ is optimal.
2.3. THE OPTIMAL STOPPING TIME 31
Clearly, we should not consider stopping unless the currently observed element is
maximal. Furthermore, if we are yet to see its twin then we should continue: if it is from
level 1, then we shall have seen elements from at least as many levels by the time we
see its twin and so shall be at least as confident that it is from level 1; if it is not from
level 1, then we have a chance of finding this out by observing an element from a higher
level. Thus, in order to maximize P (pi(τ) ∈ {u1, v1}), it is necessary to consider only such
stopping times τ whose values coincide with some τi’s.
Define the random variables
Lt = level of the element pi(t).
Recall from Section 1.4 that
Zt = P
(
pi(t) ∈ {u1, v1}
∣∣Ft) = P (Lt = 1∣∣Ft) .
The main aim is to show that the process (Zτi ,Fτi)i∈N is monotone (see equation (1.2)),
so that Theorem 1.3 can be applied. First note that if τi = 2m then τi+1 = 2m and
E
(
Zτi+1
∣∣Fτi) = Zτi , so it is only necessary to consider those i for which τi < 2m. For the
rest of this section it will therefore be assumed that τi is a maximal element whose twin
has already been seen, and that there are still some unseen elements.
A result is needed about the probability of success of the simple stopping time τ1.
When using τ1, we stop at the first opportunity when we see a second twin who is maximal
at that moment. Denote this probability by P (m). Interestingly, the next lemma says
that even the most naive strategy is successful with probability greater than 2
3
, which is
much better than the corresponding 1
n
for the total order on n elements.
Lemma 2.4. For the stopping time τ1 the probability of success is equal to
P (m) =
2m+ 1
3m
.
Proof. Observe that
P
(
Lτ1 = 1
∣∣L1 = 1) = 1,
32 2. HOW TO CHOOSE THE BEST TWINS
since pi(τ1) can only be maximal at time τ1 if it is from a level at most L1. I shall prove
by induction on l that
P
(
Lτ1 = 1
∣∣L1 = l) = 2
3
for all l ≥ 2, from which the result follows, since then
P (m) = P
(
Lτ1 = 1
)
=
m∑
l=1
P
(
Lτ1 = 1
∣∣L1 = l) · P(L1 = l)
=
1
m
(
1 +
2
3
(m− 1)
)
=
2m+ 1
3m
.
If l = 2, then the claim is true, since the second twin from level 2 and the two twins
from level 1 are identically distributed, and we win if and only if the first of these three to
appear is from level 1. Now suppose that l > 2 and consider the level of the next element
to appear from a level at most L1, at time τ
′, say. If 1 < Lτ ′ < l, then the situation is
the same as if pi(τ ′) were the first element chosen, and by the induction hypothesis
P
(
Lτ1 = 1
∣∣{L1 = l} ∩ {1 < Lτ ′ < l}) = 2
3
.
Otherwise, we win if and only if one of the two twins from level 1 appears before the
second twin from level l. Since these three are identically ditstributed, it is the case that
P
(
Lτ1 = 1
∣∣{L1 = l} ∩ {1 < Lτ ′ < l}c) = 2
3
.
Thus, in either case, the probability is 2
3
, and the claim is proved. 
I shall use the following combinatorial identity, which arises from consideration of
the classical secretary problem, to simplify the formulae in the definition of the optimal
stopping time and, later, its probability of success.
Lemma 2.5. For all 1 ≤ k ≤ m− 1,
m−k∑
j=1
1
j
·
(
m−j−1
k−1
)(
m
k
) = k
m
m−1∑
j=k
1
j
.
Proof. The proof proceeds by showing that both sides are equal to the probability of
success in the classical secretary problem using the strategy: “reject the first k elements,
2.3. THE OPTIMAL STOPPING TIME 33
and accept the next maximal element observed.” Let W be the event that the best
candidate is chosen.
LHS. For 0 ≤ j ≤ m− k, let Bj be the event that the highest of the first k elements
observed is at level j + 1. This means that there are m − j − 1 levels available for the
remaining k − 1 elements. Let Cj be the event that the maximum element is the first
element seen out of the highest j elements. Then Bj and Cj are independent, and
P(W ) =
m−k∑
j=1
P(Bj)P(W |Bj)
=
m−k∑
j=1
P(Bj)P(Cj)
=
m−k∑
j=1
(
m−j−1
k−1
)(
m
k
) · 1
j
.
RHS. For k ≤ j ≤ m − 1, let Dj be the event that the (j + 1)th element observed is
the maximal element of the chain, and let Ej be the event that the highest of the first j
elements observed is in the first k observed. Then Dj and Ej are independent, and
P(W ) =
m−1∑
j=k
P(Dj)P(W |Dj)
=
m−1∑
j=k
P(Dj)P(Ej)
=
m−1∑
j=k
1
m
· k
j
.
Thus the lemma is proved. 
Let Nk(i) denote the event that at time τi the number of levels occupied by the
elements pi(1), pi(2), · · · , pi(τi) is equal to k. Note that Nk(i) ∈ Fτi . The next two lemmas
find the values that, when compared, determine whether or not a process is monotone.
Lemma 2.6. For all ω ∈ Nk(i),
Zτi(ω) =
k
m
.
34 2. HOW TO CHOOSE THE BEST TWINS
Proof. Elements from different levels are equally likely to appear in any position in
the random order. Therefore, when k levels have been observed, the probability that one
of them is the highest level is k
m
, independent of the most recently observed element being
from the highest level seen so far. 
Lemma 2.7. For all ω ∈ Nk(i),
E
(
Zτi+1
∣∣Fτi) (ω) = 23 + k3m
(
m−1∑
j=k
1
j
− 2
)
.
Proof. Let
E =
∫
A
E
(
Zτi+1
∣∣Fτi) ,
where A ∈ Fτi is the atom containing ω. By the definition of conditional expectation, it
is sufficient to show that
E =
(
2
3
+
k
3m
(
m−1∑
j=k
1
j
− 2
))
P(A).
It is the case that
E =
∫
A
Zτi+1 =
∫
A
1
(
Lτi+1 = 1
)
= P
({Lτi+1 = 1} ∩ A)
=
m−k∑
j=1
P
({Lτi+1 = 1} ∩ {Lτi = j + 1} ∩ A)
=
m−k∑
j=1
P
(
Lτi+1 = 1
∣∣{Lτi = j + 1} ∩ A) · P (Lτi = j + 1∣∣A) · P(A).
As the probability of getting an element from the highest level at time τi+1 does not
depend on the pattern of the levels up to the time τi, and as A provides only a certain
pattern how to put elements on any chosen k levels, it follows that, by Lemma 2.4,
E =
m−k∑
j=1
2j + 1
3j
·
(
m−j−1
k−1
)(
m
k
) · P(A)
=
(
m−k∑
j=1
2
3
·
(
m−j−1
k−1
)(
m
k
) + m−k∑
j=1
1
3j
(
m−j−1
k−1
)(
m
k
) )P(A)
2.3. THE OPTIMAL STOPPING TIME 35
and, by Lemma 2.5,
E =
(
2
3
·
(
m−1
k
)(
m
k
) + 1
3
m−1∑
j=k
1
j
(
m−1
k−1
)(
m
k
) )P(A)
=
(
2
3
· m− k
m
+
1
3
· k
m
m−1∑
j=k
1
j
)
P(A),
which proves the lemma. 
Let Ai =
{
E
(
Zτi+1
∣∣Fτi) ≤ Zτi} be the event that the pay-off offered at time τi is at
least the expected value of the pay-off at time τi+1.
Lemma 2.8. For all ω ∈ Nk(i), it is the case that ω ∈ Ai if and only if
2m
k
+
m−1∑
j=k
1
j
≤ 5.
Proof. Let ω ∈ Nk(i). Then, by Lemmas 2.6 and 2.7, it is the case that ω ∈ Ai if
and only if
2
3
+
k
3m
(
m−1∑
j=k
1
j
− 2
)
= E
(
Zτi+1
∣∣Fτi) (ω) ≤ Zτi(ω) = km,
which gives the result. 
The proof that τ is an optimal stopping time can now be completed.
Proof of Theorem 2.3. It is sufficient to show that the process (Zτi ,Fτi)i∈N is
in the monotone case, that is, that Ai ⊂ Ai+1 for each i; the monotone case theorem
(Theorem 1.3) then says that τ is optimal.
Let ω ∈ Ai and let k be the number of levels observed at time τi, so that ω ∈ Nk(i).
Then, by Lemma 2.8, the inequality
2m
k
+
m−1∑
j=k
1
j
≤ 5
holds. At time τi+1 we must have seen more than k levels, and the left-hand side of the
inequality decreases as k increases, and so ω ∈ Ai+1. 
36 2. HOW TO CHOOSE THE BEST TWINS
2.4. The probability of success
In this section, I shall work out a formula for the probability of success when using an
optimal stopping time.
Theorem 2.9. When using the optimal stopping time τ , the probability of success
equals
P
(
Lpi(τ) = 1
)
=
1
3m
[
2m+ km −
(
km −
km−1∑
s=0
s∏
r=1
2(m− km + r)
2(m− km + r) + 1
)(
3−
m−1∑
j=km
1
j
)]
.
(2.1)
It is not surprising that evaluating the threshold produces a formula involving the
harmonic series that cannot be expressed in closed form; a similar phenomenon occurs in
the solution of the classical secretary problem.
For a permutation pi ∈ S(U ∪ V ), let
τpi = min
{
t : pi(1), pi(2), . . . , pi(t) occupy exactly km levels
}
.
Set
M1 =
{
poset
〈
pi(1), pi(2), . . . , pi(τpi)
〉
has exactly one maximal element
}
and
M2 =
{
poset
〈
pi(1), pi(2), . . . , pi(τpi)
〉
has exactly two maximal elements
}
.
I shall begin by proving the following lemma.
Lemma 2.10.
P(M1) =
1
km
km−1∑
s=0
s∏
r=1
2(m− km + r)
2(m− km + r) + 1 . (2.2)
Proof. In the proof of this lemma, the levels will be referred to in the order in which
they are observed (1, 2, . . . , km), not their positions in the poset. Some of the elements
will come from the same level as earlier elements, so one or two elements from each of
these km levels will have been observed.
Let τpi(j) be the smallest t such that pi(1), pi(2), . . . , pi(t) occupy exactly j levels (so,
in particular, τpi = τpi(km)). For i ≤ j, let Bi(j) be the event that exactly one of
2.4. THE PROBABILITY OF SUCCESS 37
pi(1), pi(2), . . . , pi
(
τpi(j)
)
occupies the ith level, that is, at the first moment when j le-
vels have been seen, exactly one element has been seen from the ith level. Note that
P
(
Bi(i)
)
= 1 for all i, and also that, for i ≤ km, the events are nested in the sequence
Bi(i) ⊃ Bi(i+ 1) ⊃ . . . ⊃ Bi(km). Thus
P
(
Bi(km)
)
= P
(
Bi(i) ∩Bi(i+ 1) ∩ . . . ∩Bi(km)
)
= P
(
Bi(i)
)
P
(
Bi(i+ 1)
∣∣Bi(i)) . . .P (Bi(km)∣∣Bi(km − 1)) .
The probability P
(
Bi(j + 1)
∣∣Bi(j)) is easy to calculate. Indeed, suppose we are at
time τpi(j), having observed elements from j levels including exactly one element from
the ith level. There are 2(m − j) elements occupying levels not yet observed, and for
Bi(j + 1) we require that one of these is observed before the second element from the i
th
level. These 2(m− j) + 1 elements are identically distributed, so by symmetry
P
(
Bi(j + 1)
∣∣Bi(j)) = 2(m− j)
2(m− j) + 1 .
Thus
P
(
Bi(km)
)
= P
(
Bi(i)
)
P
(
Bi(i+ 1)
∣∣Bi(i)) . . .P (Bi(km)∣∣Bi(km − 1))
=
km−1∏
r=i
2(m− r)
2(m− r) + 1
=
km−i∏
r=1
2(m− km + r)
2(m− km + r) + 1 ,
following the convention that the empty product has value 1.
Let Ci be the event that the i
th level observed is the highest level observed by time
τpi, and let Di be the event that both Bi(km) and Ci hold, that is, that by time τpi exactly
one element has been observed from the ith level and it is the highest observed so far.
Note that the levels are equally likely to be observed in any order, so P(Ci) = 1km , and
38 2. HOW TO CHOOSE THE BEST TWINS
that Bi(km) and Ci are independent, so
P(Di) = P
(
Bi(km) ∩ Ci
)
= P
(
Bi(km)
)
P(Ci)
=
1
km
km−i∏
r=1
2(m− km + r)
2(m− km + r) + 1 .
Note finally that M1 is the disjoint union of the Dis, so
P(M1) = P(D1) + P(D2) + . . .+ P(Dkm)
=
km∑
i=1
1
km
km−i∏
r=1
2(m− km + r)
2(m− km + r) + 1
=
1
km
km−1∑
s=0
s∏
r=1
2(m− km + r)
2(m− km + r) + 1 ,
and the lemma is proved. 
The proof of the formula for the probability of success can now be completed.
Proof of Theorem 2.9. For each integer j with 2 ≤ j ≤ m− km + 2, define M j1 to
be the subevent of M1 where a second best element of 〈pi(1), pi(2), . . . , pi(τpi)〉 is on level
j. These subevents form a partition of M1 and
P(M j1 ) = P
(
M j1
∣∣M1)P(M1) = (m−jkm−2)(j − 1)(m
km
) P(M1).
Similarly, for each integer j with 1 ≤ j ≤ m − km + 1, define M j2 to be the subevent of
M2 where the maximal elements of 〈pi(1), pi(2), . . . , pi(τpi)〉 are on level j. These subevents
form a partition of M2 and
P(M j2 ) = P
(
M j2
∣∣M2)P(M2) = (m−jkm−1)(m
km
) (1− P(M1)).
Given M j2 , the probability of success when using τ , which tells us to stop at the next
τi, is equal to
2(j−1)+1
3(j−1) , the probability of success when we use the simple stopping time τ1
on the poset consisting of j − 1 pairs of twins above the level j. Similarly, given M j1 , the
probability of success when using τ is equal to 2(j−1)+1
3(j−1) , the probability of success when
we use the simple stopping time τ1 on the poset consisting of j − 1 pairs of twins above
2.4. THE PROBABILITY OF SUCCESS 39
the level j of a second best element obtained so far, and with the maximal element being
treated as the first random element observed in this reduced poset. The term with j = 1
in the second sum below disappears since there is no chance of stopping on a maximal
element in that case. Therefore,
P(Lτ = 1) =
m−km+2∑
j=2
P
(
Lpi(τ) = 1
∣∣M j1)P(M j1 ) + m−km+1∑
j=1
P
(
Lpi(τ) = 1
∣∣M j2)P(M j2 )
=
m−km+2∑
j=2
2(j − 1) + 1
3(j − 1) ·
(
m−j
km−2
)
(j − 1)(
m
km
) P(M1)
+
m−km+1∑
j=2
2(j − 1) + 1
3(j − 1) ·
(
m−j
km−1
)(
m
km
) (1− P(M1)),
which, after changing the index of summation in both sums, becomes
P(Lτ = 1) = P(M1)
m−km+1∑
j=1
2j + 1
3
·
(
m−j−1
km−2
)(
m
km
) + (1− P(M1))m−km∑
j=1
2j + 1
3j
·
(
m−j−1
km−1
)(
m
km
)
= P(M1)
∑m−km+1
j=1
2j
3
(
m−j−1
km−2
)
+ 1
3
∑m−km+1
j=1
(
m−j−1
km−2
)(
m
km
)
+
(
1− P(M1)
)∑m−kmj=1 23(m−j−1km−1 )+∑m−kmj=1 13j (m−j−1km−1 )(
m
km
) .
Since, by standard combinatorial methods,
m−km+1∑
j=1
j
(
m− j − 1
km − 2
)
=
(
m
km
)
,
m−km+1∑
j=1
(
m− j − 1
km − 2
)
=
(
m− 1
km − 1
)
and
m−km∑
j=1
(
m− j − 1
km − 2
)
=
(
m− 1
km
)
,
and, by Lemma 2.5,
m−km∑
j=1
1
j
(
m− j − 1
km − 1
)
=
(
m− 1
km − 1
) m−1∑
j=km
1
j
,
40 2. HOW TO CHOOSE THE BEST TWINS
it is the case that
P(Lτ = 1) = P(M1)
(
2
3
+
1
3
·
(
m−1
km−1
)(
m
km
) )+ (1− P(M1)) 23(m−1km )+ 13(m−1km−1)∑m−1j=km 1j(m
km
)
= P(M1)
(
2
3
+
km
3m
)
+
(
1− P(M1)
)(2(m− km)
3m
+
km
3m
m−1∑
j=km
1
j
)
=
2m+ km
3m
− (1− P(M1))(km
m
− km
3m
m−1∑
j=km
1
j
)
,
and substituting in the formula for P(M1) from Lemma 2.10 gives the result. 
2.5. Asymptotics
I shall begin this section with exact evaluations of, and a crude bound for, the threshold
needed for the optimal stopping time τ and the probability of success for m = 7, and then
determine the asymptotic behaviour of those quantities as m→∞.
Example. For m = 7, the threshold km = k7 = 4, since for k = 4 it is the case that
2 · 7
4
+
1
4
+
1
5
+
1
6
=
247
60
≤ 5
but for k = 3 the reverse inequality holds, namely,
2 · 7
3
+
1
3
+
1
4
+
1
5
+
1
6
=
337
60
> 5.
The probability of success is equal to
P(Lτ = 1) =
1
21
[
18−
(
4−
(
1 +
8
9
+
80
99
+
960
1287
))(
3− 37
60
)]
=
3001
3780
≈ 0.7939.
Observation. By the definition of km, one can easily observe that for every m it
must be the case that
km ≤
⌈m
2
⌉
.
2.5. ASYMPTOTICS 41
Indeed, if k ≥ m
2
then
2m
k
+
m−1∑
j=k
1
j
≤ 2m
k
+
m− k
k
≤ 5.
The following theorems establish the asymptotic behavior of the threshold km and
the limit of the probability of success as m → ∞. Since m changes here, the optimal
stopping time for the partial order consisting of m twins will be denoted by τ (m). Denote
the unique solution of the equation 2x+ log x = 5 by x0, so that x0 ≈ 2.12347.
Theorem 2.11. The threshold km satisfies
m
x0
< km <
m
x0
+ 2
for all m ∈ N. In particular,
lim
m→∞
km
m
=
1
x0
≈ 0.4709.
Proof. By the definition of km from Theorem 2.9, it is the case that km satisfies the
inequality
5 ≥ 2m
km
+
m−1∑
j=km
1
j
>
2m
km
+ log
(
m
km
)
,
the latter inequality by a simple integration argument. Since 2x + log x is an increasing
function, it must be the case that
m
km
< x0.
On the other hand, km − 1 does not satisfy the condition, so a similar argument yields
5 <
2m
km − 1 +
m−1∑
j=km−1
1
j
<
2m
km − 1 + log
(
m− 1
km − 2
)
<
2m
km − 2 + log
(
m
kn − 2
)
and hence
m
km − 2 > x0.
Thus
m
x0
< km <
m
x0
+ 2,
as required. 
I shall now turn to the asymptotic probability of success.
42 2. HOW TO CHOOSE THE BEST TWINS
Theorem 2.12. The probability of success when using optimal stopping time τ (m)
satisfies
lim
m→∞
P(Lpi(τ (m)) = 1) =
1
x0
+
4(x0 − 1)2
3x0
((
x0
x0 − 1
) 1
2
− 1
)
≈ 0.7680.
The proof uses the following lemma, which gives the asymptotic value of the expression
in (2.2). As with τ (m), the notation P
(
M
(m)
1
)
is used to emphasize the dependence on m.
Lemma 2.13. The probability of M
(m)
1 occurring satisfies
lim
m→∞
P
(
M
(m)
1
)
= 2(x0 − 1)
((
1 +
1
x0 − 1
) 1
2
− 1
)
.
Proof. Recall from Lemma 2.10 that P(M (m)1 ) is given by
P
(
M
(m)
1
)
=
1
km
km−1∑
s=0
s∏
r=1
2(m− km + r)
2(m− km + r) + 1 .
Write f(s) for the summand, that is,
f(s) =
s∏
r=1
(
1− 1
2(m− km + r) + 1
)
,
and consider log(f(s)):
log(f(s)) =
s∑
r=1
log
(
1− 1
2(m− kn + r) + 1
)
= −
s∑
r=1
(
1
2(m− km + r) + 1 +
1
2
(
1
2(m− km + r) + 1
)2
+ . . .
)
.
For sufficiently largem, Theorem 2.11 shows that km
m
< 1
2
, and hence that 1
2(m−km+r)+1 <
1
m
for all r. Thus, since s < m, the expression log(f(s)) is bounded by
−
s∑
r=1
1
2(m− km + r) + 1 −
(
1
2m
+
1
3m2
+ . . .
)
< log(f(s)) < −
s∑
r=1
1
2(m− km + r) + 1
and hence by
−
s∑
r=1
1
2(m− km + r) + 1 −
(
1
m
+
1
2m2
+ . . .
)
< log(f(s)) < −
s∑
r=1
1
2(m− km + r) + 1 .
(2.3)
2.5. ASYMPTOTICS 43
The sum can be bounded further by integrals. Note that 1
2x
is a decreasing function
and obtain∫ s+1
m
1
m
1
2
(
1− km
m
+ 1
2m
+ x
)dx ≤ s∑
r=1
1
2(m− km + r) + 1 ≤
∫ s
m
0
1
2
(
1− km
m
+ 1
2m
+ x
)dx
and hence
1
2
log
(
1 +
s
m− km + 32
)
≤
s∑
r=1
1
2(m− km + r) + 1 ≤
1
2
log
(
1 +
s
m− km + 12
)
. (2.4)
Exponentiating (2.3) and substituting in the bounds in (2.4) gives(
1− 1
m
)(
1 +
s
m− km + 12
)− 1
2
< f(s) <
(
1 +
s
m− km + 32
)− 1
2
.
Noting that f(s) is a decreasing function, the same technique is used to bound
P
(
M
(m)
1
)
= 1
km
∑km−1
s=0 f(s), which gives
∫ 1
0
(
1− 1
m
)(
1 +
x
m
km
− 1 + 1
2km
)− 1
2
dx < P
(
M
(m)
1
)
<
∫ 1− 1
km
− 1
km
(
1 +
x
m
km
− 1 + 3
2km
)− 1
2
dx
and hence
2
(
1− 1
m
)(
m
km
− 1 + 1
2km
)((
m+ 1
2
m− km + 12
) 1
2
− 1
)
< P
(
M
(m)
1
)
<
2
(
m
km
− 1 + 3
2km
)((
m+ 1
2
m− km + 32
) 1
2
−
(
1− 1
m− km + 32
) 1
2
)
.
Recalling from Theorem 2.11 that limm→∞ mkm = x0, take limits as m→∞ and notice
that both sides of this inequality tend to the same limit. Hence,
lim
m→∞
P
(
M
(m)
1
)
= 2(x0 − 1)
((
x0
x0 − 1
) 1
2
− 1
)
,
and the lemma is proved. 
The proof of the asymptotic optimal probability of success can now be completed.
44 2. HOW TO CHOOSE THE BEST TWINS
Proof of Theorem 2.12. A similar integration argument to that in the proof of
Lemma 2.13 gives
lim
m→∞
m−1∑
j=km
1
j
= lim
m→∞
log
(
m
km
)
= log x0 = 5− 2x0.
Substituting these limiting values into the equation for success (2.1),
P(Lτ = 1) =
1
3m
[
2m+ km −
(
km −
km−1∑
s=0
s∏
r=1
2(m− km + r)
2(m− km + r) + 1
)(
3−
m−1∑
j=km
1
j
)]
,
and writing L = limm→∞ P
(
Lτ (m) = 1
)
gives
L =
1
3
[
2 +
1
x0
−
(
1
x0
− 1
x0
· 2(x0 − 1)
[(
x0
x0 − 1
) 1
2
− 1
])(
3− (5− 2x0)
)]
=
1
x0
+
4(x0 − 1)2
3x0
((
x0
x0 − 1
) 1
2
− 1
)
,
as required. 
2.6. How to choose the best c-tuplets
The posets most closely related to the one considered in this chapter are those that
consist of m sets of c-tuplets, that is, with vertex set U1∪. . .∪Uc where Ui = {u1i , . . . , umi },
for fixed j the uji s are incomparable and for all i1, j1, i2, j2 it is the case that
uj1i1 ≺ uj2i2 if and only if j1 > j2.
In this section, I shall give an explicit optimal strategy and a formula for the probability
of its success. Finding their asymptotic behaviour is an open problem, although I shall
give some bounds that represent progress in that direction.
As with the twins case, the monotone case theorem can be used to find an explicit
optimal strategy. Where it is unambiguous, I shall use notation from earlier in this chapter
in this slightly different context without comment. Similarly to before, let
τ
(c)
i = min
{
t > τ
(c)
i−1 : pi(t) ∈ max{pi(1), pi(2), . . . , pi(t)} and
{pi(1), pi(2), . . . , pi(t− 1)} contains all the c-tuplets of pi(t)
}
,
2.6. HOW TO CHOOSE THE BEST c-TUPLETS 45
with the assumptions that τ
(c)
0 = 0 and if the set under the minimum is empty, then its
minimum is cm.
To begin with, Lemma 2.4 can be generalized in the following way, where τ
(c)
1 is the
stopping time that stops at the first element that is maximal so far and all of whose
c-tuplets have been seen.
Lemma 2.14. For the stopping time τ
(c)
1 the probability of success is equal to
Pc(m) =
m∏
i=2
(
1− 1(ci
c
)) .
Proof. For 2 ≤ i ≤ m, let Ni be the event that an element from level i is not chosen.
Then
Pc(m) = P
(
N2 ∩ . . . ∩Nm
)
= P
(
Nm
)
P
(
Nm−1
∣∣Nm) . . .P (N2∣∣N3 ∩ . . . ∩Nm) .
Conditioned on Ni+1 ∩ . . . ∩ Nm, an element from level i will be selected if and only if
the c elements from this level all come before the c(i − 1) elements from the i − 1 levels
above. Hence,
P
(
Ni
∣∣Ni+1 ∩ . . . ∩Nm) = 1− 1(ci
c
) ,
and the result follows. 
This gives rise to an optimal stopping time. Similarly to before, let
k(c)m = min
{
k :
m−k∑
j=1
[(
m−j−1
k−1
)(
m−1
k−1
) j∏
i=2
(
1− 1(ci
c
))] ≤ 1}
and let
τ (c) = min
{
τ
(c)
i : the number of levels occupied by pi(1), pi(2), . . . , pi(τi) is at least k
(c)
m
}
,
with the assumption that if the set under the minimum is empty then its minimum is cm.
Theorem 2.15. The stopping time τ (c) is optimal for the secretary problem on m sets
of identical c-tuplets.
46 2. HOW TO CHOOSE THE BEST TWINS
Proof. The proof of this using the monotone case theorem is essentially the same as
in Section 2.3. 
Calculating the probability of success is slightly hard work algebraically, but follows
broadly similar lines to the twins case. As then, let
Lt = level of the element pi(t).
I shall use the multinomial coefficient notation(
m
r1, . . . , rc,m− r1 − . . .− rc
)
to denote the number of ways of choosing c distinguishable sets of sizes r1, . . . , rc from a
ground set of size m.
Theorem 2.16. The probability that the stopping time τ (c) is successful is
P(Lτ (c) = 1) =
c−1∑
i=1

 ∑
r1+...+rc=k
(c)
m
r1,ri≥1
(
m
r1,...,rc,m−k(c)m
)(
c
1
)r1 . . . (c
c
)rc(
cm
r1+2r2+...+crc
) · r1
r1 + 2r2 + . . .+ crc
· ri
k
(c)
m

k(c)m
m
+
m−k(c)m +1∑
j=2
( m−j
k
(c)
m −1
)(
m
k
(c)
m
) (1− 1(cj−i
c−i
)) j−1∏
k=2
(
1− 1(
ck
c
))

+
 ∑
r1+...+rc=k
(c)
m
r1,rc≥1
(
m
r1,...,rc,m−k(c)m
)(
c
1
)r1 . . . (c
c
)rc(
cm
r1+2r2+...+crc
) · r1
r1 + 2r2 + . . .+ crc
· rc
k
(c)
m

m−k(c)m +1∑
j=2
( m−j
k
(c)
m −1
)(
m
k
(c)
m
) j−1∏
k=2
(
1− 1(
ck
c
))
 .
As with the twins case, I shall need a lemma that gives the probability that, at the
first moment when k
(c)
m levels have been seen, exactly i elements have been observed from
the highest level seen. Let Mi be this event.
Lemma 2.17. Let i be an integer with 1 ≤ i ≤ c. Then
P(Mi) =
∑
r1+...+rc=k
(c)
m
r1,ri≥1
(
m
r1,...,rc,m−k(c)m
)(
c
1
)r1 . . . (c
c
)rc(
cm
r1+2r2+...+crc
) · r1
r1 + 2r2 + . . .+ crc
· ri
k
(c)
m
.
2.6. HOW TO CHOOSE THE BEST c-TUPLETS 47
Proof. For r1 + . . .+ rc = k
(c)
m , let Ar1,...,rc be the event that when r1 + 2r2 + . . .+ crc
elements have been seen, the number of levels from which exacly j elements have been seen
is rj, for all j. Out of the
(
cm
r1+2r2+...+crc
)
possible choices for the first r1 + 2r2 + . . . + crc
elements, the number of choices such that Ar1,...,rc holds is as follows. First, there are(
m
r1,...,rc,m−k(c)m
)
choices for the levels with each number of elements seen. Then, for each j
and for each of the rj levels with j elements, there are
(
c
j
)
choices for which of its elements
have been seen. Therefore,
P(Ar1,...,rc) =
(
m
r1,...,rc,m−k(c)m
)(
c
1
)r1 . . . (c
c
)rc(
cm
r1+2r2+...+crc
) .
Now let Br1,...,rc be the event that the number of levels from which exacly j elements
have been seen is rj, for all j, at the first moment when k
(c)
m levels have been observed.
Then Br1,...,rc is only a non-empty event if r1 ≥ 1 and in that case Br1,...,rc occurs if and
only if Ar1,...,rc occurs and the last element observed is one of the singletons. By symmetry,
the singletons are equally likely to be in any position in the first r1+2r2+. . .+crc elements,
and therefore
P
(
Br1,...,rc
∣∣Ar1,...,rc) = r1r1 + 2r2 + . . .+ crc .
Finally, let Cr1,...,rc be the event that Br1,...,rc holds and that exactly i elements have
been seen from the highest level. The highest level is equally likely to be any of the
r1 + . . .+ rc = k
(c)
m levels chosen, and so
P
(
Cr1,...,rc
∣∣Br1,...,rc) = ri
k
(c)
m
.
For Mi to occur, there must be at least one level from which only one element has
been seen (the last one observed), and at least one level from which exactly i elements
have been seen (the highest one). Subject to these restrictions, if Mi occurs then Cr1,...,rc
occurs for some choice of r1, . . . , rc with r1 + . . . + rc = k
(c)
m , and the Cr1,...,rc are disjoint
48 2. HOW TO CHOOSE THE BEST TWINS
and all lie within Mi. Therefore,
P(Mi) =
∑
r1+...+rc=k
(c)
m
r1,ri≥1
P(Cr1,...,rc)
=
∑
r1+...+rc=k
(c)
m
r1,ri≥1
(
m
r1,...,rc,m−k(c)m
)(
c
1
)r1 . . . (c
c
)rc(
cm
r1+2r2+...+crc
) · r1
r1 + 2r2 + . . .+ crc
· ri
k
(c)
m
,
as required. 
Let M ji be the event that at the first moment when k
(c)
m levels have been seen, there
have been i elements from the highest level seen and that level is level j. Lemma 2.14
can easily be modified to give the following lemma, which gives the probability of success
conditioned on M ji for all possible i and j.
Lemma 2.18. (1) For i with 1 ≤ i ≤ c− 1,
P
(
Lτ (c) = 1
∣∣M1i ) = 1.
(This is the case where 1 ≤ i ≤ c− 1 and j = 1.)
(2) For i, j with 1 ≤ i ≤ c− 1 and 2 ≤ j ≤ m− k(c)m + 1,
P
(
Lτ (c) = 1
∣∣M ji ) =
(
1− 1(cj−i
c−i
)) j−1∏
k=2
(
1− 1(
ck
c
)) .
(This is the case where 1 ≤ i ≤ c− 1 and 2 ≤ j ≤ m− k(c)m + 1.)
(3)
P
(
Lτ (c) = 1
∣∣M1c ) = 0.
(This is the case where i = c and j = 1.)
(4) For j with 2 ≤ j ≤ m− k(c)m + 1,
P
(
Lτ (c) = 1
∣∣M jc ) = j−1∏
k=2
(
1− 1(
ck
c
)) .
(This is the case where i = c and 2 ≤ j ≤ m− k(c)m + 1.)

The proof of the formula for the probability of success can now be completed.
2.6. HOW TO CHOOSE THE BEST c-TUPLETS 49
Proof of Theorem 2.16. By the same argument as in the proof of Theorem 2.9,
it is the case that
P(M ji ) =
( m−j
k
(c)
m −1
)(
m
k
(c)
m
) P(Mi)
=
( m−j
k
(c)
m −1
)(
m
k
(c)
m
) ∑
r1+...+rc=k
(c)
m
r1,ri≥1
(
m
r1,...,rc,m−k(c)m
)(
c
1
)r1 . . . (c
c
)rc(
cm
r1+2r2+...+crc
) · r1
r1 + 2r2 + . . .+ crc
· ri
k
(c)
m
.
The events
{
M ji : 1 ≤ i ≤ c, 1 ≤ j ≤ m− k(c)m + 1
}
partition the space and thus
P (Lτ (c) = 1) =
c∑
i=1
m−k(c)m +1∑
j=1
P
(
Lτ (c) = 1
∣∣M ji )P (M ji ) ,
and the proof is completed using Lemma 2.18. 
Recall that
k(c)m = min
{
k :
m−k∑
j=1
[(
m−j−1
k−1
)(
m−1
k−1
) j∏
i=2
(
1− 1(ci
c
))] ≤ 1} .
The following lemma will be helpful for finding bounds for k
(c)
m .
Lemma 2.19. For all integers c ≥ 2, and for all positive integers k and m with k ≤ m,
it is the case that(
1− c+ 1
2c(c− 1)
)(m
k
− 1
)
<
m−k∑
j=1
[(
m−j−1
k−1
)(
m−1
k−1
) j∏
i=2
(
1− 1(ci
c
))] ≤ m
k
− 1.
Proof. For the upper bound, first observe that the value of the product is at most
one. By standard combinatorial arguments,
m−k∑
j=1
(
m− j − 1
k − 1
)
=
(
m− 1
k
)
and so
m−k∑
j=1
(
m−j−1
k−1
)(
m−1
k−1
) = m− k
k
=
m
k
− 1,
which gives the upper bound.
50 2. HOW TO CHOOSE THE BEST TWINS
For the lower bound, first observe that(
ci
c
)
=
ci
c
· ci− 1
c− 1 · . . . ·
ci− c+ 1
1
≥ ic.
Therefore,
j∏
i=2
(
1− 1(ci
c
)) > 1− ∞∑
i=2
i−c
> 1− 1
2c
−
∫ ∞
i=2
x−cdx (2.5)
= 1− c+ 1
2c(c− 1) .
Since this bound is independent of j, it can be treated as a constant factor and the sum
evaluated as before, and the lower bound follows. 
The reason that in (2.5) I bounded the sum only from i = 3, rather than the whole
sum, was just to make the upper and lower bounds exponentially close. If I had not done
so, then the first factor in the lower bound in Lemma 2.19 would have been 1− 1
c−1 , and
the first factor in the lower bound in the next theorem would have been 1
2
− 1
2(2c−3) .
Theorem 2.20. For all integers c ≥ 2 and m ≥ 1, it is the case that(
1
2
− c+ 1
2(2c+1(c− 1)− c− 1)
)
m < k(c)m ≤
⌈m
2
⌉
.
In particular, for all m ≥ 1,
lim
c→∞
k(c)m =
⌈m
2
⌉
.
Proof. If
k ≤
(
1
2
− c+ 1
2(2c+1(c− 1)− c− 1)
)
m =
1− c+1
2c(c−1)
2− c+1
2c(c−1)
m,
then, by Lemma 2.19,
m−k∑
j=1
[(
m−j−1
k−1
)(
m−1
k−1
) j∏
i=2
(
1− 1(ci
c
))] > (1− c+ 1
2c(c− 1)
)(m
k
− 1
)
≥ 1,
and the lower bound follows.
2.6. HOW TO CHOOSE THE BEST c-TUPLETS 51
If
k ≥ m
2
,
then, by Lemma 2.19,
m−k∑
j=1
[(
m−j−1
k−1
)(
m−1
k−1
) j∏
i=2
(
1− 1(ci
c
))] ≤ m
k
− 1 ≤ 1,
and the upper bound follows. 
A lower bound for the probability of success comes from observing that the proof of
Lemma 2.19 bounds the probability of success of the simple stopping time τ1 given in
Lemma 2.14 by
Pc(m) ≥ 1− c+ 1
2c(c− 1) ,
and an optimal stopping time must do at least as well. This argument is equivalent to
bounding the products in the formula for the probability of success in Theorem 2.16.
c 1
2
− c+1
2(2c+1(c−1)−c−1) k
(c)
100 k
(c)
1000 limm→∞
k
(c)
m
m
1− c+1
2c(c−1) w
(c)
100 limm→∞w
(c)
m
1 - 38 369 0.3679 - 0.3708 0.3679
2 0.2000 48 471 0.4709 0.2500 0.7697 0.7680
3 0.4286 50 493 ? 0.7500 0.9354 ?
4 0.4725 50 499 ? 0.8958 ? ?
5 0.4880 50 500 ? 0.9531 ? ?
6 0.4945 50 500 ? 0.9781 ? ?
7 0.4974 50 500 ? 0.9896 ? ?
8 0.4987 50 500 ? 0.9950 ? ?
9 0.4994 50 500 ? 0.9976 ? ?
10 0.4997 50 500 ? 0.9988 ? ?
Figure 2.3. A table showing bounds for and exact and asymptotic values
of k
(c)
m and w
(c)
m = P(Lτ (c) = 1) for small values of c.
For convenience and to emphasise the dependence on m, for given c and m let
w(c)m = P(Lτ (c) = 1)
be the probability of success of τ (c) on m sets of c-tuplets. Figure 2.3 summarizes the
bounds from this section, and gives exact values of k
(c)
100, k
(c)
1000 and w
(c)
100, calculated using
52 2. HOW TO CHOOSE THE BEST TWINS
Maple™ 13. Unfortunately, the formula in Theorem 2.16 is too complicated to be calcu-
lated efficiently for anything but small c; similarly, these values of m are near the limit of
what I could calculate.
2.7. Open problems
I shall first summarize the main goals for the c-tuplets case.
(1) Simplify the formula for the probability of success in Theorem 2.16.
(2) For fixed c, prove that limm→∞ k
(c)
m
m
exists.
(3) If limm→∞ k
(c)
m
m
exists, find its value as the root of an equation or as a function of
limm→∞ k
(c−1)
m
m
.
(4) For fixed c, prove that limm→∞w
(c)
m exists.
(5) If limm→∞w
(c)
m exists, find its value as the root of an equation or as a function of
the value for limm→∞w
(c−1)
m .
I have already shown that for any fixed m,
lim
c→∞
k(c)m =
⌈m
2
⌉
and
lim
c→∞
w(c)m = 1,
which gives the values of
lim
c→∞
lim
m→∞
k
(c)
m
m
and
lim
c→∞
lim
m→∞
w(c)m
if they exist.
In this chapter, I have assumed that the aim is to pick the best candidate. It is almost
certainly a harder problem, as with the original secretary problem, to try to minimize
the expected rank instead. Alternatively, the full information case would be interesting
to study; here, m random samples would be taken from a known distribution and each
replicated c− 1 times, and all cm of these values revealed to the selector one by one.
2.7. OPEN PROBLEMS 53
Inspired further by the results in Chapter 1, one could allow the interviewer to go
back to an earlier candidate, who is still available with a certain probability, or assume
that the number of sets of c-tuplets is unknown but comes from a known distribution.
A modification of the c-tuplets poset that might be worth thinking about is a ‘pyramid’
with i candidates at level i. It seems likely that this would lie between the classical
secretary problem and the twins version in terms of difficulty. It is also natural to try to
solve the problem of optimal choice, or any of these variations, on other common posets,
for example
(1) Qk = {0, 1}k, where (x1, . . . , xk) ≺ (y1, . . . , yk) if (x1, . . . , xk) 6= (y1, . . . , yk) and
xi ≤ yi for 1 ≤ i ≤ k,
(2) [m]2, where (x1, x2) ≺ (y1, y2) if (x1, x2) 6= (y1, y2) and xi ≤ yi for i = 1, 2, or
even
(3) [m]k, where (x1, . . . , xk) ≺ (y1, . . . , yk) if (x1, . . . , xk) 6= (y1, . . . , yk) and xi ≤ yi
for 1 ≤ i ≤ k, which includes both of the previous examples.

CHAPTER 3
The secretary problem on an unknown poset
3.1. Introduction
This chapter is based on joint work with Robert Morris [35], and is substantially the
same as the paper.
In this chapter, I shall show that for the problem of optimal choice on any poset there
is an algorithm that is successful with probability at least 1
e
, so, in this sense, the total
order is the hardest possible partial order. In fact, this algorithm depends only on the size
of the poset and its number of maximal elements, so it is universal for any family where
these are given. It is therefore natural to ask which is the hardest partial order with a
given number of maximal elements. The most obvious choice is the poset consisting of
k disjoint chains. I shall give an asymptotically sharp lower bound on the probability of
success in the problem of optimal choice on k disjoint chains, and show that it is at least
as hard as on any poset with k maximal elements and of width k, that is, whose largest
antichain has size k.
More precisely, the main aim is to prove the following two theorems.
Theorem 3.1. Let (P,≺) be a poset with k maximal elements and of width k. Then
there is an algorithm for the secretary problem on (P,≺) depending only on |P | and k
that is successful with probability at least pk, where
pk =

1
e
if k = 1,
k−1
√
1
k
if k > 1,
(3.1)
and these are the best possible such bounds.
Theorem 3.2. Let (P,≺) be a poset with k maximal elements. Then there is an
algorithm for the secretary problem on (P,≺) depending only on |P | and k that is successful
with probability at least 1
e
, and this is the best possible such bound.
55
56 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
Robert Morris and I conjectured that both of these theorems can be improved, by
removing the width condition in Theorem 3.1 and the dependence on k in Theorem 3.2;
see Conjectures 3.20 and 3.21 in Section 3.4.
The poset consisting of k disjoint chains was considered by Kuchta and Morayne [56],
but with a restriction on the order in which the elements are observed: those from the
first chain all appear in a random order, then those from the second chain, and so on.
This poset is also related to multicriteria extensions of the secretary problem. In the
multicriteria version due to Gnedin [41], each element is ranked independently in k > 1
different criteria, and the selector wishes to select an element that is maximal in at least
one of them. This is equivalent to the problem on k equally-sized disjoint chains with the
elements appearing one at a time from each chain in the same cyclic order. Interestingly,
the asymptotic value of the probability of success in Theorem 3.1, k−1
√
1
k
, is the same as
in the multicriteria version.
This chapter is organized as follows. In Section 3.2, I shall describe a (randomized)
algorithm for choosing an element of a given poset, and prove lower bounds for its pro-
bability of success on various families of posets. In Section 3.3, I shall show that these
bounds are best possible, by proving that, for the poset that consists of k disjoint chains
of length x (which lies in each of these families), there is no strategy that is successful with
probability greater than pk + o(1) (as x→∞). Finally, I shall suggest some conjectures
and open problems in Section 3.4.
3.2. Lower bounds
Throughout this section, p is a real number satisfying 0 < p < 1. Recall that pi(t) is
the tth element of the poset P that we see, and that Pt is a poset with vertex set [t] that
is isomorphic to the poset seen at time t. I shall prove lower bounds for the probability
of success of the following randomized algorithm on different families of posets.
Algorithm. Given a poset with n elements, of which k are maximal, let X(p) ∼
Bin(n, p). Reject the first X(p) elements and accept the first subsequent element where
the following condition holds: the poset induced by the elements seen so far (including the
3.2. LOWER BOUNDS 57
currently observed element) has at most k maximal elements and the currently observed
element is one of them.
This algorithm gives rise to the following randomized stopping time, τk(p).
Recall that Ω = Sn × [0, 1] and that P is the uniform probability measure on Ω. Let
X(p) : Ω→ {0, . . . , n} be the random variable defined by
X(p)(ρ, δ) = min
{
x ≥ 0 :
x∑
i=0
(
n
i
)
pi(1− p)n−i ≥ δ
}
,
so that
P
(
X(p) = x
)
=
(
n
x
)
px(1− p)n−x
and X(p) = X(p)(ρ, δ) is independent of ρ. Then τk(p) is defined by
τk(p) =

min
{
t > X(p) : |max(Pt)| ≤ k and t ∈ max(Pt)
}
if this exists,
n otherwise.
Given the definition of τk(p), it makes sense to consider another random variable, the
set of X(p) elements that we reject without considering. Denote this random variable by
S(p), where
S(p) = {pi(t) : t ≤ X(p)}.
The following simple property of S(p) is important. It means that one can think of
the algorithm as rejecting each element of P with probability p, independently of the
others and of the random ordering pi, which makes it easier to work out a formula for the
probability that it is successful.
Lemma 3.3. The events {x ∈ S(p)}x∈P are independent and P
(
x ∈ S(p)) = p for all
x ∈ P .
Proof. The random variables pi and X(p) can be generated with the required dis-
tributions in the following way. Put each element of P in S(p) with probability p in-
dependently of all other elements. Let pi consist of a uniformly random ordering of the
elements of S(p) followed by a uniformly random ordering of P \S(p). By symmetry, pi is
58 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
a uniformly random ordering of P , and X(p) = |S(p)| is a binomial random variable inde-
pendent of pi. The events {x ∈ S(p)}x∈P depend only on pi and X(p), and by construction
the properties in the statement of the lemma hold. 
The following identity will be useful for simplifying the formula for the probability of
success.
Lemma 3.4. For all integers k ≥ 1, the following holds:
∞∑
s=0
(
k + s− 1
k − 1
)
(1− p)s = 1
pk
.
Proof. Suppose that we have a coin that comes up heads with probability p and tails
with probability 1− p, and that we toss it infinitely many times. Then, with probability
1, we shall see at least k heads, and the kth head comes up in position k + s for some
s ≥ 0. In this case, we know that k − 1 of the first k + s − 1 tosses are heads and the
remaining s are tails, and so, summing over the probabilities that the kth head comes in
each position, we have
∞∑
s=0
(
k + s− 1
k − 1
)
pk(1− p)s = 1.

In order to prove Theorem 3.1, I shall first calculate a lower bound for the probability
that τk(p) is successful on the poset consisting of k disjoint chains. Recall that p is a real
number satisfying 0 < p < 1 and that pi
(
τk(p)
)
is the element that the algorithm τk(p)
selects.
Theorem 3.5. Let (P,≺) be a poset consisting of k disjoint chains. Then
P
(
pi
(
τk(p)
) ∈ max(P )) >

p log
(
1
p
)
if k = 1,
k
k−1p(1− pk−1) if k > 1.
Proof. First note that pi
(
τk(p)
) ∈ max(P ) in the exceptional case where S(p) = P
and pi
(
τk(p)
)
= pi(n) ∈ max(P ), an event with probability k
n
pn. This tends to 0 as n→∞,
and the bounds in the theorem are obtained by considering only the cases where X(p) < n
3.2. LOWER BOUNDS 59
and hence pi
(
τk(p)
) 6∈ S(p). However, later on, in the proof of Lemma 3.11, the fact that
these bounds are for a slightly smaller event will be important.
s
s
s
s
s
pppi
i
i
s
s
s
s
s
s
pppi
i
p p p
s
s
s
s
ppp
i
6
?
6
?
m1
jk
C1 C2 Ck
Figure 3.1. An example of k disjoint chains with the elements of S(p) cir-
cled. This illustrates an instance of the event A0,3,...,1. The region enclosed
by the solid curve marks the j1 + . . .+ jk elements that might be selected.
Let the k chains be denoted by C1, . . . , Ck and have lengths m1, . . . ,mk. Let Aj1,...,jk
be the event that for each i there are ji elements from Ci not in S(p) above the highest
element from Ci in S(p) (see Figure 3.1), that is,
Aj1,...,jk =
k⋂
i=1
{ ∣∣∣{x ∈ Ci\S(p) :6 ∃y ∈ Ci ∩ S(p) such that x ≺ y}∣∣∣ = ji}.
For ji < mi, this means that the top ji elements are not in S(p) but the (ji + 1)
th is.
For ji = mi, this means that there are no elements from the i
th chain in S(p). Note that
if Aj1,...,jk occurs then pi
(
τk(p)
)
will be the first element observed from the j1 + . . . + jk
elements not in S(p) that are at the tops of their chains.
The events
{
Aj1,...,jk : 0 ≤ j1 ≤ m1, . . . , 0 ≤ jk ≤ mk
}
partition the whole space.
Thus, writing Qk(p) for P
(
pi
(
τk(p)
) ∈ max(P )),
Qk(p) =
∑
0≤j1≤m1,...,0≤jk≤mk
P
(
pi
(
τk(p)
) ∈ max(P )∣∣∣Aj1,...,jk)P (Aj1,...,jk)
>
∑
0≤j1≤m1,...,0≤jk≤mk
(j1,...,jk)6=(0,...,0)
|{i : ji > 0}|
j1 + . . .+ jk
(1− p)j1+...+jkp|{i:ji<mi}|. (3.2)
60 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
Since 1 + (1− p) + (1− p)2 + . . . = 1
p
, this can be written as
Qk(p) >
∑
0≤j1≤m1,...,0≤jk≤mk
(j1,...,jk) 6=(0,...,0)
|{i : ji > 0}|
j1 + . . .+ jk
(1− p)j1+...+jkpk(1 + (1− p) + (1− p)2 + . . .)|{i:ji=mi}|
=
∑
j1,...,jk≥0
(j1,...,jk) 6=(0,...,0)
|{i : ji > 0}|
min{j1,m1}+ . . .+ min{jk,mk}(1− p)
j1+...+jkpk
>
∑
j1,...,jk≥0
(j1,...,jk) 6=(0,...,0)
|{i : ji > 0}|
j1 + . . .+ jk
(1− p)j1+...+jkpk.
Rewriting the last line as a sum over r = |{i : ji > 0}| and s = j1 + . . .+ jk gives
Qk(p) >
k∑
r=1
∞∑
s=r
∣∣∣{(j1, . . . , jk) : ∣∣{i : ji > 0}∣∣ = r and j1 + . . .+ jk = s}∣∣∣ · r
s
(1− p)spk.
The rest of the proof is a straightforward calculation. To calculate
∣∣∣{(j1, . . . , jk) : ∣∣{i : ji > 0}∣∣ = r and j1 + . . .+ jk = s}∣∣∣ ,
we note that there are
(
k
r
)
ways of choosing the indices i with ji > 0 and there are then(
s−1
r−1
)
ways for r non-zero numbers to add up to s. Thus
Qk(p) >
k∑
r=1
∞∑
s=r
(
k
r
)(
s− 1
r − 1
)
r
s
(1− p)spk = kpk
k∑
r=1
∞∑
s=r
(
k − 1
r − 1
)(
s− 1
r − 1
)
1
s
(1− p)s.
Reversing the order of summation,
Qk(p) > kp
k
∞∑
s=1
1
s
(1− p)s
min{k,s}∑
r=1
(
s− 1
r − 1
)(
k − 1
k − r
)
.
The second sum is easily evaluated (a result known as Vandermonde’s identity) to give
Qk(p) > kp
k
∞∑
s=1
1
s
(1− p)s
(
k + s− 2
k − 1
)
. (3.3)
It remains to evaluate the sum in the above inequality. Write
Vk(p) =
∞∑
s=1
1
s
(1− p)s
(
k + s− 2
k − 1
)
.
3.2. LOWER BOUNDS 61
Differentiating, and then applying Lemma 3.4, gives
dVk(p)
dp
= −
∞∑
s=1
(1− p)s−1
(
k + s− 2
k − 1
)
= − 1
pk
.
Integrating gives
Vk(p) =

− log p+ c1 if k = 1,
1
(k−1)pk−1 + ck if k > 1,
where the ck are constants. Since the expressions above are continuous in p in the interval
(0, 1], limits as p→ 1 exist and may be used to find ck and deduce that
Vk(p) =
∞∑
s=1
1
s
(1− p)s
(
k + s− 2
s− 1
)
=

log
(
1
p
)
if k = 1,
1
k−1
(
1
pk−1 − 1
)
if k > 1.
Substituting the value of Vk(p) into (3.3) gives the result. 
In order to extend the result above to posets whose width is the same as their number of
maximal elements, I shall use Dilworth’s theorem [24] (see also page 81 of Bolloba´s [11]):
Theorem 3.6. A poset with largest antichain of size k can be covered by k chains. 
In the next theorem, I shall show that the secretary problem is no harder on a poset
with k maximal elements and width k than on a poset consisting of k disjoint chains.
s
s
s
s
s
pppi
i
i
s
s
s
s
s
s
pppi
i
p p p
s
s
s
s
ppp
iAA
A
A
Appppppp
p
ppppppp
p
pp p p p p p p p pp
6
?
6
?
m1
jk
C1 C2 Ck
Figure 3.2. An example of k disjoint chains with one extra comparison,
and with elements of S(p) circled. The region enclosed by the solid curve
marks the elements that might be selected. The element in the dotted
region could have been selected if the extra comparison were not there (cf.
Figure 3.1).
62 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
Theorem 3.7. Let (P,≺) be a poset with n elements. Suppose that (P,≺) has k
maximal elements and that none of its antichains has size greater than k. Then
P
(
pi
(
τk(p)
) ∈ max(P )) >

p log
(
1
p
)
if k = 1,
k
k−1p(1− pk−1) if k > 1.
Proof. Dilworth’s theorem (Theorem 3.6) implies that P takes the form of k chains
with some comparisons in between them. Clearly, the k elements of max(P ) lie at the
top of the k chains. The proof therefore proceeds in an almost identical manner to that
of Theorem 3.5. The only difference is that the denominator in each term of (3.2) is now
at most, rather than equal to, j1 + . . . + jk (see Figure 3.2), so the expression in this
line is still a lower bound. The calculations that make up the remainder of the proof of
Theorem 3.5 therefore follow in the same way. 
The values that maximize the function in Theorem 3.7 are
pk =

1
e
if k = 1,
k−1
√
1
k
if k > 1.
(3.4)
This has the following consequence, which gives the lower bounds in Theorem 3.1.
Corollary 3.8. Let (P,≺) be a poset with n elements. Suppose that (P,≺) has k
maximal elements and that none of its antichains has size greater than k. Then
P
(
pi
(
τk(pk)
) ∈ max(P )) > pk.

It is curious to note that the expected proportion of elements that we reject without
considering is the same as the probability of success; it is not clear to me why this should
be the case.
The next objective is to prove the following theorem, which, with the right choice of
p, will give a lower bound of 1
e
for all posets, as in Theorem 3.2.
3.2. LOWER BOUNDS 63
Theorem 3.9. Let (P,≺) be a poset with n elements. Suppose that (P,≺) has k
maximal elements. Then
P
(
pi
(
τk(p)
) ∈ max(P )) > kpk log(1
p
)
.
The proof will use two simple lemmas. The first states that the linear order is the
hardest of all posets with a unique maximal element.
Lemma 3.10. Let (P,≺) be a poset with n elements. Suppose that (P,≺) has exactly
one maximal element. Then the probability that τ1(p) is successful on (P,≺) is at least
the probability that it is successful on a linear extension of P , and hence
P
(
pi
(
τk(p)
) ∈ max(P )) > p log(1
p
)
.
Proof. Begin by taking an arbitrary linear extension of ≺, that is, a partial order
≺′ such that any two elements are comparable and such that x ≺ y ⇒ x ≺′ y. (It is clear
that such a partial order exists.) Denote the unique element in max≺(P ) = max≺′(P ) by
xmax.
Given this new poset, (P,≺′), define random variables pi′, X ′(p), S ′(p) and τ ′1(p) in
the same way as pi, X(p), S(p) and τ1(p) were defined given (P,≺). Couple the random
variables
(
pi,X(p), S(p), τ1(p)
)
and
(
pi′, X ′(p), S ′(p), τ ′1(p)
)
in the obvious way; set pi′ = pi
and X ′(p) = X(p), and hence S ′(p) = S(p). This means that the elements appear in
the same order in both instances, and the same set S(p) is rejected in both cases. The
induced posets observed in the process on (P,≺′) are linear extensions of those observed
in the process on (P,≺). The proof proceeds by showing that if pi(τ1(p)) 6= xmax then
pi′
(
τ ′1(p)
) 6= xmax, that is, if τ1(p) fails in the process on (P,≺) then τ ′1(p) fails on (P,≺′).
From this, the result follows, since the probability of success is therefore at least as large
on (P,≺) as on (P,≺′), and Theorem 3.5 applied to (P,≺′) gives the lower bound.
If we reach xmax then it will be accepted, since it must be the unique maximal element
in the poset induced by the elements observed so far. Thus pi
(
τ1(p)
) 6= xmax if either
(i) xmax ∈ S(p) or
(ii) after rejecting S(p), we accept an element that appears earlier than xmax.
64 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
In case (i), τ ′1(p) must fail on (P,≺′) for the same reason, since S ′(p) = S(p). In case (ii),
such an element must be the unique maximal element of the poset induced by what we
have seen so far, and this is still the case in any linear extension. Therefore, with τ ′1(p), if
this element is observed then it must be accepted, and so we still accept an element that
appears earlier than xmax. It follows that in either case pi
′(τ ′1(p)) 6= xmax. 
The next lemma gives a lower bound for the probability of success restricting our
attention to the case when all but one of the maximal elements of our poset are in S(p).
This turns out to be enough to prove Theorem 3.9.
Lemma 3.11. Let (P,≺) be a poset with n elements. Suppose that (P,≺) has k maximal
elements. Then
P
(
pi
(
τk(p)
) ∈ max(P )∣∣∣|max(P ) ∩ S(p)| = k − 1) > p
1− p log
(
1
p
)
.
Proof. First observe that we may assume that k = 1, for the following reason. The
condition that |max(P ) ∩ S(p)| = k − 1 means that the k − 1 maximal elements in
max(P ) ∩ S(p) will be maximal for the remainder of the process, so when using τk(p)
these and all elements dominated by at least one of these may be ignored, and a unique
maximal element from the remaining elements will be selected. Those elements form
a poset (P ′,≺′) with a unique maximal element xmax, which is not in S(p). Since all
elements are in S(p) with probability p independently of the others, the situation is the
same as if we were working with (P ′,≺′) and conditioning on xmax 6∈ S(p). Looking for
one of at most k maximal elements in P using τk(p) is the same as looking for a unique
maximal element in P ′ using τ1(p).
Assume from now on that k = 1; Lemma 3.10 will be used to prove the result in
this case. Lemma 3.10 used the bound from Theorem 3.5, and recall from the proof of
that theorem that the lower bound for P
(
pi
(
τ1(p)
)
= xmax
)
is in fact a lower bound for
P
(
{S(p) 6= P} ∩ {pi(τ1(p)) = xmax}).
Set
M = {xmax ∈ S(p)}
3.2. LOWER BOUNDS 65
and
W = {S(p) 6= P} ∩ {pi(τ1(p)) = xmax} .
Note that if S(p) 6= P and xmax ∈ S(p) then pi
(
τ1(p)
) 6= xmax and hence W ⊂ M c.
Therefore,
P(W ) = P(W ∩M c) = P(M c)P(W |M c) = (1− p)P(W |M c).
Since, by the bound from Lemma 3.10,
P(W ) > p log
(
1
p
)
,
and the quantity that we are interested in is P(W |M c), the result follows. 
The theorem can now be proved.
Proof of Theorem 3.9. It is clear that
P
(
|max(P ) ∩ S(p)| = k − 1
)
= kpk−1(1− p).
Thus, for general k,
P
(
pi
(
τk(p)
) ∈ max(P )) > P(pi(τk(p)) ∈ max(P )∣∣∣|max(P ) ∩ S(p)| = k − 1)
· P
(
|max(P ) ∩ S(p)| = k − 1
)
>
p
1− p log
(
1
p
)
· kpk−1(1− p)
= kpk log
(
1
p
)
,
as required. 
This theorem has the following consequence. The probability e−
1
k is chosen to maxi-
mize the function in Theorem 3.9 and gives the lower bound in Theorem 3.2. As described
in Chapter 1, it is well-known that 1
e
is the best possible lower bound for the probability of
success in the classical secretary problem, and so this completes the proof of Theorem 3.2.
66 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
Corollary 3.12. Let (P,≺) be a poset with n elements of which k are maximal.
Then
P
(
pi
(
τk
(
e−
1
k
))
∈ max(P )
)
>
1
e
.

3.3. Upper bound
In this section, I shall show that the bound in Corollary 3.8 is best possible. The
proof of Theorem 3.5 shows that the probability of success of the stopping time τk(pk) on
k disjoint chains decreases towards the given lower bound as the chains increase in length.
This might suggest that the probability of success of an optimal strategy is reduced as
the chains increase in length and thus, to prove that the bounds are best possible, one
should consider chains of length tending to infinity. The main theorem in this section,
Theorem 3.13, does just that; for sufficiently long chains, the probability of success of
an optimal stopping time can be made arbitrarily close to that in Corollary 3.8, and so
τk(pk) is asymptotically optimal. Since the poset consisting of k disjoint chains satisfies
the conditions of Corollary 3.8, the bounds given are the best possible such bounds.
Define Dk(x) to be the poset consisting of k disjoint chains, each of size x. It might
be useful at this point to recall some definitions from Section 1.4. The probability space
associated with the poset Dk(x) is denoted by (ΩDk(x),FDk(x),PDk(x)), but the subscripts
are suppressed when they are clear from the context. The poset induced by the first t
elements observed is isomorphic to the random variable Pt, which is a poset on vertex
set [t], and Ft is the σ-algebra generated by P1, . . . , Pt, which represents what is known
at time t. A stopping time is a random variable τ taking values in [n] and satisfying the
property
{τ = t} ∈ Ft.
Let C(Ft) denote the class of all such stopping times, and extend this notation to any
sequence of σ-algebras in the analogous way.
The aim is to find an upper bound for E(Zτ ) that holds for all stopping times τ , where
Zt = P
(
pi(t) ∈ max (Dk(x))∣∣∣Ft) .
3.3. UPPER BOUND 67
Theorem 3.13 states that, as x → ∞, the limit of the probability of success of an
optimal stopping time on Dk(x) is no greater than pk. Since Corollary 3.8 showed the
existence of a stopping time with probability of success at least pk, Theorem 3.13 shows
that this is the best possible such bound and so gives Theorem 3.1.
Theorem 3.13. Let pk be as defined in (3.4). Then
lim
x→∞
sup
τ∈C(Ft)
EDk(x)(Zτ ) ≤ pk.
Note that the supremum is over stopping times in C(Ft), which means that we are
allowed to use the extra information from the structure of the posets, not just the pay-
offs that we are offered.
The following observation is important and so is recorded as a lemma.
Lemma 3.14. When (P,≺) = Dk(x), it is the case that
Zt =

y
x
if t ∈ max(Pt), pi(t) ∈ C and |C ∩ {pi(1), . . . , pi(t)}| = y,
0 if t 6∈ max(Pt),
where C is one of the k chains in Dk(x).
Proof. The maximal element of a chain C is equally likely to be at any position in
the order in which its x elements are observed. Therefore, when y elements have been
observed from this chain, the probability that one of them is the maximal element is y
x
,
independent of the most recently observed element being maximal. 
The proof of Theorem 3.13 will proceed roughly as follows. At time t we expect to have
seen approximately t
k
elements from each chain. Therefore, since all orders are equally
likely, the tth element observed is maximal in what has been seen so far with probability
approximately 1
t/k
= k
t
. By Lemma 3.14, if this happens then it is a maximal element of
Dk(x) with probability approximately
t
kx
. Therefore, Zt is approximately distributed as
Zt =

t
kx
with probability k
t
,
0 with probability 1− k
t
.
(3.5)
68 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
If Zt were distributed exactly like this with the Zt all independent of each other, then
the proof would not be difficult to complete. Since the potential non-zero value of Zt
increases with t, it is straightforward to show (as in the classical secretary problem; more
details will be given later in this section) that an optimal strategy is, for some I, to ignore
the first I elements and accept the next non-zero Zt. I will denote the associated stopping
time by τI and make some rough calculations. This is only an outline of the more precise
argument that will be given later; in particular, ≈ is only intended to have an intuitive
meaning and does not stand for any well-defined relation. The approximate distribution
of Zt gives
E(ZτI ) =
kx∑
t=I+1
P(τI = t)E
(
Zt
∣∣τI = t)
≈
kx∑
t=I+1
P
({
Zi = 0 for all i ∈ {I + 1, . . . , t− 1}
}
∩ {Zt > 0}) · t
kx
≈
kx∑
t=I+1
(
t−1∏
i=I+1
(
1− k
i
))
· k
t
· t
kx
=
1
x
kx∑
t=I+1
t−1∏
i=I+1
(
1− k
i
)
.
I shall apply this formula in the case where k is much smaller than I, so 1 − k
i
can be
approximated by e−
k
i , and sums are twice approximated by integrals to give
E(ZτI ) ≈
1
x
kx∑
t=I+1
e−
∑t−1
i=I+1
k
i ≈ 1
x
kx∑
t=I+1
e−k log(
t
I )
=
Ik
x
kx∑
t=I+1
t−k ≈

I
x
· log (x
I
)
if k = 1,
k
k−1 · Ikx ·
(
1− ( I
kx
)k−1)
if k > 1.
This is the formula in Theorem 3.7 with p = I
kx
and is thus maximized, as in Corol-
lary 3.8, when I
kx
= pk, in which case E(ZτI ) ≈ pk. Therefore the bounds in Corollary 3.8
are best possible, and if these calculations had been exact then the proof of Theorem 3.1
would be complete.
Unfortunately, Zt is not distributed exactly as in (3.5). In order to conclude that an
optimal stopping time is of the simple form above, it would be nice to use the principle of
3.3. UPPER BOUND 69
backward induction described in Section 1.5. This formalizes the intuitive principle that,
in a finite game, an optimal strategy is simply to analyse at each step whether or not we
expect our situation to improve by continuing, and to do so if and only if this is the case.
The reason why the sequence of random variables (Zt)t∈[n] is difficult to analyse is that
the values they can take vary depending on how the process unfolds. However, it is very
likely that at any time we shall have seen approximately the same number of elements from
each chain. The proof will therefore proceed by defining a sequence of random variables
(Yt)t∈[n], which act as asymptotically almost sure upper bounds for Zt and are easier to
analyse. To obtain these bounds, each chain will be split into m segments, each of length
l, and the process split into m sets of kl observations. These lengths l are margins of error
beyond which the number of elements observed from a chain is not expected to deviate.
Initially, I shall fix m and let l →∞ to find an upper bound for E(Yτ ) and hence E(Zτ )
in terms of m. Letting m→∞ will then give a best possible result.
This means that the precise statement proved for Theorem 3.13 is in fact
lim
m→∞
lim
l→∞
sup
τ∈C(Ft)
EDk(lm)(Zτ ) ≤ pk.
However, this is purely a matter of convenience; it is clear that the proof can be extended
to posets Dk(x) where x is not a multiple of m by dividing each chain into m almost equal
rather than exactly equal segments.
p p p
6
?
6
?
lm
l
l(s− 1)
ls
l(s+ 1)
C1 C2 Ck
Figure 3.3. This figure shows the number of elements observed from each
chain. In this example, after a total of kls elements have been observed we
see that UC1,s and UCk,s hold but UC2,s does not.
It is necessary to show that the process behaves in this approximately uniform manner
with high probability as l → ∞. I shall first define what it means to be approximately
70 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
uniform in one particular chain C at time kls, an event called UC,s (see Figure 3.3), and
then what it means to be approximately uniform everywhere at all times, an event called
U .
Given one of the chains, C, and for all s ∈ {0, . . . ,m}, let UC,s be the event that when
we have observed kls elements in total we have observed between l(s − 1) and l(s + 1)
elements from C, that is,
UC,s =
{
l(s− 1) ≤
∣∣∣C ∩ {pi(1), . . . , pi(kls)}∣∣∣ ≤ l(s+ 1)}.
For all t ∈ {0, . . . , klm}, let s(t) be the unique integer s such that kl(s − 1) < t ≤ kls,
that is,
s(t) =
⌈
t
kl
⌉
. (3.6)
Let U be the event that for all t when we have observed t elements in total we have
observed between l
(
s(t)− 2) and l(s(t) + 1) elements from each chain, that is,
U =
⋂
i,t
{
l
(
s(t)− 2) ≤ ∣∣∣Ci ∩ {pi(1), . . . , pi(t)}∣∣∣ ≤ l(s(t) + 1)}.
I shall use Lemma 3.16, which states that the process is approximately uniform with
high probability and follows easily from Lemma 3.15.
Lemma 3.15. Let m ≥ 1 be an integer, let C be one of the chains in Dk(lm) and let
s ∈ {0, . . . ,m}. Then
lim
l→∞
PDk(lm)(UC,s) = 1.
Proof. I shall show that the probability that we have observed more than l(s + 1)
or fewer than l(s− 1) elements tends to zero as l→∞. (If s = 0 or s = m then only one
of these tails needs to be considered.)
Assume C and s are given. Let N be the number of elements observed from chain C
when kls elements have been observed in total, and write f(x) = P(N = x). Then
f(x) =
(
lm
x
)(
(k−1)lm
kls−x
)(
klm
kls
)
and hence
f(x+ 1)
f(x)
=
(lm− x)(kls− x)
(x+ 1)((k − 1)lm− kls+ x+ 1) .
3.3. UPPER BOUND 71
Note that this is a decreasing function in x. Thus, if x ≥ ls, then
f(x+ 1)
f(x)
<
l(m− s)(k − 1)ls
ls(k − 1)l(m− s) = 1 (3.7)
and, if x+ 1 ≤ ls, then
f(x+ 1)
f(x)
>
l(m− s)(k − 1)ls
ls(k − 1)l(m− s) = 1. (3.8)
Observe also that, if x ≥ l(s+ 1), then
f(x+ 1)
f(x)
<
(m− s− 1)((k − 1)s− 1)
(s+ 1)((k − 1)(m− s) + 1) = cs+1 < 1 (3.9)
and, if x+ 1 ≤ l(s− 1), then
f(x+ 1)
f(x)
>
(m− s+ 1)((k − 1)s+ 1)
(s− 1)((k − 1)(m− s)− 1) =
1
cs−1
> 1, (3.10)
where cs+1 and cs−1 are constants depending on k, m and s but not l.
Since (3.7) and (3.8) imply that f
(
l(s+1)
)
< . . . < f(ls) and f
(
l(s−1)) < . . . < f(ls),
it follows that f
(
l(s+ 1)
)
< 1
l
and f
(
l(s− 1)) < 1
l
. Finally, (3.9) and (3.10) give
P
(
N > l(s+ 1)
)
<
∞∑
i=0
cis+1
l
=
1
l (1− cs+1) → 0 as l→∞
and
P
(
N < l(s− 1)) < ∞∑
i=0
cis−1
l
=
1
l (1− cs−1) → 0 as l→∞,
which proves the claim. 
Lemma 3.16. Let m ≥ 1 be an integer. Then
lim
l→∞
PDk(lm)(U) = 1.
Proof. This lemma follows simply from the previous lemma: choose l sufficiently
large that each of the k(m+ 1) events UCi,s occurs with probability at least 1− δ. Then,
trivially, all k(m+ 1) events hold with probability at least 1− k(m+ 1)δ. It is easy to see
that ⋂
Ci,s
UCi,s ⊂ U,
72 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
since the the events UCi,s(t)−1 and UCi,s(t) imply that
l
(
s(t)− 2) ≤ ∣∣∣Ci ∩ {pi(1), . . . , pi(t)}∣∣∣ ≤ l(s(t) + 1),
and so this holds for every t and i, as required. 
The next lemma states that in order to prove Theorem 3.13, it suffices to show that its
statement is true conditioned on U occurring. This formalizes the intuition that, since the
process is asymptotically almost surely uniform (that is, since liml→∞ PDk(lm)(U) → 1),
we may assume that it is uniform.
Recall that C(Ft) is the class of all stopping times relative to the σ-algebras Ft =
σ(P1, . . . , Pt), that is, the decision to stop at time t depends only on P1, . . . , Pt.
Lemma 3.17. For all m,
lim
l→∞
sup
τ∈C(Ft)
EDk(lm)(Zτ ) ≤ lim
l→∞
sup
τ∈C(Ft)
EDk(lm)(Zτ |U).
Proof. By Lemma 3.16, for all ε > 0 the value of l may be chosen sufficiently large
that PDk(lm)(U c) ≤ ε. By definition, it is also the case that Zt ≤ 1 for all t. Therefore,
for all τ ,
EDk(lm)(Zτ ) = EDk(lm)(Zτ |U)PDk(lm)(U) + EDk(lm)(Zτ |U c)PDk(lm)(U c)
≤ EDk(lm)(Zτ |U) + ε.
Taking suprema gives
sup
τ∈C(Ft)
EDk(lm)(Zτ ) ≤ sup
τ∈C(Ft)
EDk(lm)(Zτ |U) + ε.
Since ε is arbitrary, the result follows. 
Now that I have shown that the process is asymptotically almost surely uniform, and
in light of the previous lemma, I shall assume that U occurs. The next aim is to find a
process that offers at least the pay-offs that Zt offers but is easier to analyse. This means
that I shall be able to find an upper bound for the probability of success of an optimal
stopping time on this process and hence on Zt.
3.3. UPPER BOUND 73
I shall now define the random variables Yt that act as upper bounds for the (Zt|U)
and are easier to analyse. These random variables are not strict upper bounds, in the
sense that the random variables are not coupled in any way. However, conditioned on U
occurring, Zt is less than the potential non-zero value of Yt and the probability that Zt
is non-zero is less than the probability that Yt is non-zero, and I shall be able to show
that an optimal strategy for the game on Zt has a lower expected pay-off than an optimal
strategy for the game on Yt.
I shall often need to refer to the potential non-zero value of Yt and the probability
that it takes this value; set
yt =
s(t) + 1
m
and pt =

1
l(s(t)−2) if s(t) ≥ 3,
1 if s(t) ≤ 2,
where s(t) is as defined in (3.6). Now define a sequence of independent random variables
(Yt)t∈[n] by
Yt =

yt with probability pt,
0 with probability 1− pt.
These will not be explicitly defined on Ω as there is no need, although it is of course
straightforward to do so.
The next lemma states the intuitive principle that we expect to do at least as well
in the game with the random variables Yt as in the game with the random variables
Zt conditioned on U occurring. Analagously to Ft for Zt, let (Gt)t∈[n] be defined by
Gt = σ(Y1, . . . , Yt).
Lemma 3.18. For all l,m ∈ N with m ≥ 3,
sup
τ∈C(Ft)
EDk(lm)(Zτ |U) < sup
τ∈C(Gt)
EDk(lm)(Yτ ).
This will be proved shortly. Putting Lemmas 3.17 and 3.18 together tells us that
lim
l→∞
sup
τ∈C(Ft)
EDk(lm)(Zτ ) ≤ lim
l→∞
sup
τ∈C(Gt)
EDk(lm)(Yτ ).
74 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
The proof of the lemma relies on Theorem 1.1, which was the theorem that formalized
backward induction.
Proof of Lemma 3.18. For convenience, I shall continue to use n to denote the
number of elements in Dk(lm), that is, n = klm. I shall apply Theorem 1.1 to the
sequences (Yt)t∈[n] and (Zt)t∈[n], conditioned on U occurring, to show that the optimal
expected pay-off for (Yt)t∈[n] is at least as large for (Zt)t∈[n].
First consider what happens with the sequence (Yt)t∈[n]. Recall that (Gt)t∈[n] are
defined by Gt = σ(Y1, . . . , Yt) and let (αt)t∈[n] be defined for (Yt)t∈[n] as (γt)t∈[n] were for
(Wt)t∈[n] in the backward induction theorem, that is,
αn = Yn,
αt = max
{
Yt,E
(
αt+1 | Gt
)}
, t = n− 1, . . . , 1.
Since the random variables Yt are independent, the values of Y1, . . . , Yt give no information
about the values of Yt+1, . . . , Yn, and therefore E(αt+1|Gt) is constant on all atoms of Gt
and equal to E(αt+1). Therefore, define the function v : [n]→ R by
v(t) = E(αt)
and note that backward induction tells us that the stopping time that stops at the first t
such that Yt ≥ E(αt+1|Gt) = v(t+ 1) is optimal. By definition,
v(t) = E
(
max{Yt, v(t+ 1)}
) ≥ v(t+ 1),
whereas yt, the potential non-zero value of Yt, is a non-decreasing function of t. Therefore,
there exists I such that
yt < v(t+ 1) if t ≤ I and
yt ≥ v(t+ 1) if t > I,
and an optimal strategy for the game on Yt is ‘reject the first I elements, and accept the
next with a non-zero pay-off.’
3.3. UPPER BOUND 75
Recall that the distribution of Yt is given by
Yt =

yt with probability pt,
0 with probability 1− pt.
From this, it follows that
v(n) = E(αn) = E(Yn) = pnyn =
m+ 1
m
· 1
l(m− 2) >
1
lm
=
k
n
(3.11)
and that, for 1 ≤ t ≤ n− 1,
v(t) = E(αt) = E
(
max{Yt,E(αt+1)}
)
=

ptyt + (1− pt)v(t+ 1) if t > I,
v(t+ 1) if t ≤ I.
(3.12)
Now consider the sequence of random variables (Zt)t∈[n]. Since the statement of the
lemma is conditioned on U occurring, define a new sequence of σ-algebras (Ht)t∈[n] by
Ht = σ
(Ft ∪ {U})
and consider only ω ∈ U . Analogously to γt and αt, let
βn = Zn
and, for 1 ≤ t ≤ n− 1, let
βt = max
{
Zt,E
(
βt+1 |Ht
)}
.
Recalling that C(Ft) denotes the class of stopping times relative to Ft, observe that Ft ⊂
Ht, and hence C(Ft) ⊂ C(Ht) and
sup
τ∈C(Ft)
EDk(lm)(Zτ |U) ≤ sup
τ∈C(Ht)
EDk(lm)(Zτ |U).
Note that intuitively this is obvious: it just says that having extra information (that the
event U occurs) can only help in choosing our stopping time τ .
76 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
Recall that E(βt|Ht−1)(ω) denotes the expected value of the game at time t, if we have
so far seen the first t − 1 elements of P and are told whether or not U holds (that is,
whether or not ω ∈ U). I shall prove the following claim by induction on n− t.
Claim. For all ω ∈ U and for all t ∈ [n],
E
(
βt
∣∣Ht−1) (ω) < v(t),
where H0 = {∅, U, U c,Ω} and so E(β1|H0)(ω) = E(β1|U).
Proof of claim. First, observe that if ω ∈ U , then
Zt(ω) ≤ yt and P
(
Zt > 0 |Ht−1
)
(ω) ≤ pt,
by Lemma 3.14 and the definition of U . Moreover, by Lemma 3.14 and (3.11),
E
(
βn
∣∣Hn−1) = 1
n/k
· 1 = k
n
< v(n),
which proves the case n− t = 0.
Now let 1 ≤ t ≤ n − 1 and assume that the result holds for t + 1. Since Ht−1 is
finite, the random variable E(βt|Ht−1) is constant on each of its atoms. Given ω ∈ U , let
A ∈ Ht−1 be the atom containing ω, so that
E
(
βt
∣∣Ht−1) (ω) = E (βt∣∣A)
= P
(
Zt ≥ E
(
βt+1
∣∣Ht) ∣∣∣A) · E(Zt∣∣∣(Zt ≥ E(βt+1|Ht)) ∩ A)
+ P
(
Zt < E
(
βt+1
∣∣Ht) ∣∣∣A) · E(E(βt+1|Ht)∣∣∣(Zt < E(βt+1|Ht)) ∩ A) .
Now,
(
Zt < E(βt+1|Ht)
) ∈ Ht and Ht−1 ⊂ Ht, and so (Zt < E(βt+1|Ht)) ∩ A ∈ Ht. Note
first that, since ω ∈ U and U ∈ Ht−1, it must be the case that A ⊂ U , and second that(
Zt < E(βt+1|Ht)
) ∩ A is non-empty since P (Zt < E(βt+1|Ht)∣∣A) ≥ 1− pt > 0.
For all ω′ ∈ (Zt < E(βt+1|Ht)) ∩ A, by the induction hypothesis,
E
(
βt+1
∣∣Ht) (ω′) < v(t+ 1)
3.3. UPPER BOUND 77
and therefore
E
(
βt+1
∣∣∣(Zt < E(βt+1|Ht)) ∩ A) < v(t+ 1).
Moreover,
E
(
Zt
∣∣∣(Zt ≥ E(βt+1|Ht)) ∩ A) ≤ yt,
since Zt ≤ yt whenever U holds and A ⊂ U . Since yt ≤ v(t+ 1) if and only if t ≤ I, and
P
(
Zt ≥ E(βt+1|Ht)
∣∣A) ≤ P(Zt > 0|A) ≤ pt < 1, it follows that
E
(
βt
∣∣Ht−1) (ω) <

ptyt + (1− pt)v(t+ 1) if t > I,
v(t+ 1) if t ≤ I,
and hence that E
(
βt
∣∣Ht−1) (ω) < v(t), by (3.12). This completes the induction step, and
hence proves the claim. 
The lemma follows from the claim, since, by the backward induction theorem (Theo-
rem 1.1), it follows that
sup
τ∈C(Ht)
EDk(lm)(Zτ |U) = E(β1|U) < v(1) = E(α1) = sup
τ∈C(Gt)
EDk(lm)(Yτ ),
as required. 
In the final lemma before the proof of Theorem 3.13, backward induction is used to
show that an optimal stopping time for the process with the Yt takes the simple form
of rejecting the first klu∗ elements, for some integer u∗, and accepting the next non-zero
pay-off.
Lemma 3.19. For u∗ ∈ {0, . . . ,m− 1}, let
τu∗ =

min {t > klu∗ : Yt > 0} if this exists,
n otherwise.
Then
sup
τ∈C(Gt)
EDk(lm)(Yτ ) = sup
u∗∈{0,...,m−1}
EDk(lm)(Yτu∗ )
Proof. It has already been shown in the proof of Lemma 3.18 that an optimal
strategy takes the form ‘ignore the first I, and accept the next non-zero pay-off.’ In
78 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
fact, I must be a multiple of kl: suppose, for contradiction, that I = klu∗ + r, where
r ∈ {1, . . . , kl − 1}. Then it must be the case that
v(I + 1) ≤ kl
(
s(I + 1) + 1
)
m
=
kl(u∗ + 2)
m
,
since we would be willing to stop at time I + 1, but then
v(I) ≤ max
{
kl(u∗ + 2)
m
, v(I + 1)
}
≤ kl(u
∗ + 2)
m
=
kl(s(I) + 1)
m
,
which is a contradiction since we would not be willing to stop at time I. 
The proof of the main theorem in this section can now be completed.
Proof of Theorem 3.13. All that remains is to calculate and maximize E(Yτu∗ )
over u∗, where τu∗ is the smallest t > klu∗ such that Yt > 0. These calculations are very
similar to those on page 68, but also include error terms which tend to zero as l,m→∞.
Assume first that u∗ → ∞ as m → ∞, as the calculation will show that this is a
valid assumption. Indeed, if t = o(n) then yt = o(1), whereas a probability of success will
be obtained that is separated from zero. We should therefore never accept a pay-off for
t = o(n). In particular, for sufficiently large m it is the case that u∗ ≥ 3, and so
E (Yτu∗ ) =
m−1∑
u=u∗
kl∑
r=1
P
(
Yklu∗+1 = 0, . . . , Yklu+r−1 = 0, Yklu+r > 0
)
· yklu+r.
Recall that the Yt are independent, and that s(klu+h) = u+ 1 when 1 ≤ h ≤ kl. Hence,
using the convention that the empty product takes the value 1, it follows that
E (Yτu∗ ) =
m−1∑
u=u∗
kl∑
r=1
(
u∏
q=u∗+1
(
1− 1
l(q − 2)
)kl)(
1− 1
l(u− 1)
)r−1
· 1
l(u− 1) ·
u+ 2
m
.
The formula for the sum of the geometric progression
(
1− 1
l(u−1)
)r−1
gives
E (Yτu∗ ) =
m−1∑
u=u∗
(
u∏
q=u∗+1
(
1− 1
l(q − 2)
)kl)(
1−
(
1− 1
l(u− 1)
)kl)
· u+ 2
m
,
3.3. UPPER BOUND 79
and multiplying out the term
(
1−
(
1− 1
l(u−1)
)kl)
gives
E (Yτu∗ ) =
m−1∑
u=u∗
(
u∏
q=u∗+1
(
1− 1
l(q − 2)
)kl)
· u+ 2
m
−
m∑
u=u∗+1
(
u∏
q=u∗+1
(
1− 1
l(q − 2)
)kl)
· u+ 1
m
.
Parts of these two sums cancel each other out, leaving
E (Yτu∗ ) =
u∗ + 2
m
−
(
m∏
q=u∗+1
(
1− 1
l(q − 2)
)kl)
· m+ 1
m
+
m−1∑
u=u∗+1
(
u∏
q=u∗+1
(
1− 1
l(q − 2)
)kl)
· 1
m
.
Now, let ε > 0 be arbitrary, choose m = m(ε) and l = l(m, ε) sufficiently large,
and recall that therefore u∗ = u∗(ε) may be chosen to be sufficiently large also. Since(
1− 1
n
)n
< 1
e
<
(
1− 1
n
)n−1
for all n ≥ 2, it is the case that
m∏
q=u∗+1
(
1− 1
l(q − 2)
)kl
≥ exp
(
−kl − 1
l
m∑
q=u∗+1
1
q − 2
)
≥
(
u∗
m
)k
− ε
2
,
and similarly
u∏
q=u∗+1
(
1− 1
l(q − 2)
)kl
≤ exp
(
−k
u∑
q=u∗+1
1
q − 2
)
≤
(
u∗
u
)k
.
Setting p = u
∗
m
, it follows that
E (Yτu∗ ) ≤

p− p+ p log
(
1
p
)
+ ε if k = 1,
p− pk + p
k − 1
(
1− pk−1)+ ε if k > 1.
As before, these expressions are maximized when p = pk, and thus
lim
m→∞
lim
l→∞
E
(
Yτu∗
) ≤ pk + ε.
Since ε > 0 was arbitrary, this completes the proof. 
Putting Corollary 3.8 and Theorem 3.13 together gives Theorem 3.1.
80 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
3.4. Open problems
It seems likely that these theorems can be improved. Firstly, Robert Morris and I [35]
conjectured that Theorem 3.1 can be extended to all posets with k maximal elements.
It is not inconceivable that the algorithm in Corollary 3.8 works on all posets with k
maximal elements rather than just those whose width is also k; if not, it would be good
to find some other algorithm dependent only on k that does so.
Conjecture 3.20. Let (P,≺) be a poset with k maximal elements. Then there is an
algorithm for the secretary problem on (P,≺) that is successful with probability at least
pk, where pk is as defined in (3.1).
Our algorithm, which gives the bound in Theorem 3.2, depends only on the size of the
poset and the number of maximal elements. Our second conjecture was that the latter
piece of information is not needed.
Conjecture 3.21. Let (P,≺) be a poset. Then there is an algorithm for the secretary
problem on (P,≺) that depends only on |P | and is successful with probability at least 1
e
.
In light of the threshold probabilities e−
1
k in Corollary 3.12, we considered the stopping
time
τ =

min
{
t : t > e−
1
mn, where m = |max(Pt)| and t ∈ max(Pt)
}
if this exists,
n otherwise,
and tried to show that this algorithm might be used to prove Conjecture 3.21, but un-
fortunately we did not make any progress with its analysis. Kozik [54] considered this
stopping time independently at the same time, and was able to show that it was successful
with probability strictly greater than 1
4
for sufficiently large posets. Kozik’s proof is based
on the analysis of many cases by computer, and he does not specify how much greater.
However, whereas the bound of 1
4
is best possible for Preater’s algorithm, Kozik’s analysis
is not at all tight and, like Robert Morris and me, he is hopeful that this algorithm might
be successful with the best possible probability of 1
e
.
After I had finished writing this dissertation, Micha l Morayne alerted me to the fact
that, very recently, Freij and Wa¨stlund [29] had proved Conjecture 3.21. Their strategy
3.4. OPEN PROBLEMS 81
and proof of its probability of success are beautifully simple. Firstly, they assume that
the elements are observed at n times uniformly distributed in [0, 1]. Secondly, in addition
to this, they assign a weight uniformly distributed in [0, 1] to each element as it appears,
independently of the others, and define the greedy maximum of a poset (P,≺) as follows.
Let z0 ∈ P be the element of minimal weight. If z0 is not a maximal element, then let
z1 be the element of minimal weight in {x : z0 ≺ x}. Repeat this process, producing a
chain z0 ≺ z1 ≺ z2 ≺ . . ., until a maximal element is reached. This element is the greedy
maximum.
Their strategy is to wait until time 1
e
and then to select the next element that is the
greedy maximum out of what has been seen so far; their analysis of it is surprisingly
brief. Writing Ak for the event that the k
th element observed is the greedy maximum
out of what has been seen so far, they first show that the joint distribution of (Ak)k∈[n] is
independent of the ground poset, and in particular the same as for the single chain. They
then show that the probability that a maximal element is the greedy maximum at time t,
conditioned on it appearing at time t, is equal to the conditional probability that it is the
greedy maximum of the ground poset given that its weight is at most t; this is at least the
probability that it is the greedy maximum of the ground poset. Finally, they carry out
a simple calculation for the conditional probability that a maximal element is accepted
given that it appears at time t, integrate over t and sum over the maximal elements, and
find that with probability at least 1
e
one of them is accepted.
In this chapter, the aim was to select a maximal element; the same problems can be
posed for the minimum rank version instead. In a general poset, the rank of an element x
is most naturally defined as the maximum number of elements in a chain whose minimal
element is x. In light of the results in this chapter, the following questions are appealing:
(1) Is it the case that the single chain is the hardest poset for the minimal expected
rank problem, that is, is it the case that for every poset there is a strategy that
selects an element with expected rank at most about 3.8695?
(2) Is there a strategy that depends only on the size of the poset and its number of
maximal elements, or even just its size, that selects an element of finite expected
rank for every poset where these are given?
82 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET
As with the problems that I suggested at the end of Chapter 2, one could also allow
the interviewer to go back to a previous candidate, who is still available with a certain
probability, and see how this affects the problem.
This chapter has been about unknown posets with a known number of elements.
In Chapter 1, I described a version of the problem where the poset is a chain whose
length comes from a known distribution. Distributions that did not seem too unpleasant
nevertheless forced the expected rank to infinity. What distributions on the number of
elements in an unknown poset admit a strategy that chooses a maximal element with
positive probability, or a strategy that selects an element with finite expected rank?
Part 2
Generalizations of acyclic colourings

CHAPTER 4
Acyclic colourings of graphs
4.1. Colouring problems
Throughout the history of graph theory, colouring problems have been widely studied.
I shall summarize some of the highlights in this chapter, and show how this part of this
dissertation fits into that journey. For a more detailed history and for proofs of these
results, see Chapter V of Bolloba´s [11], whose notation I shall use.
A proper vertex-colouring of a graph G = (V,E) is an assignment of colours to its
vertices such that the end-points of any edge are differently coloured. The smallest number
of colours needed to do this is called the chromatic number χ(G) of G. It is obvious that
any graph with maximum degree d = ∆(G) can be properly coloured with at most d+ 1
colours: if the vertices are coloured with a palette of d+1 colours one by one in any order,
then for each vertex v there is always at least one colour available that has not been used
to colour its neighbours, and so v can be given this colour.
The following theorem, due to Brooks [16], says that one can usually do slightly better.
Theorem 4.1. Let G be a connected graph. Then χ(G) ≤ ∆(G) unless G is a complete
graph or a cycle of odd length, in which case χ(G) = ∆(G) + 1. 
The chromatic number of a graph can also be bounded if it can be drawn on a particular
surface without its edges crossing. The Euler characteristic of a surface S, unfortunately
usually denoted by χ(S), is the invariant
χ(S) = V − E + F,
where V , E and F are the numbers of vertices, edges and faces in a triangulation of S,
that is, a graph drawn on S all of whose faces are triangles. In the following theorem,
due to Heawood [47], a bound is given for surfaces other than the plane or, equivalently,
the sphere, which has Euler characteristic 2.
85
86 4. ACYCLIC COLOURINGS OF GRAPHS
Theorem 4.2. The chromatic number of a graph drawn on a closed surface of Euler
characteristic χ ≤ 1 is at most
h(χ) =
⌊
7 +
√
49− 24χ
2
⌋
.

The function h(χ) is known as the Heawood bound, which was proved to be best
possible for every surface except the Klein bottle by Ringel and Youngs [72]. The Klein
bottle has Euler characteristic 0, but every graph that can be drawn on it has chromatic
number at most 6, rather than 7. The famous four-colour theorem, proved by Appel and
Haken with computational assistance from Koch [7, 8, 9], states that every graph drawn
on the plane can be coloured using at most four colours.
It is just as natural to colour a graph’s edges as its vertices; in a proper edge-colouring
the edges meeting at any one vertex are all coloured differently. This gives rise to the
chromatic index χ′(G) of G, the smallest number of colours needed to do this. In this
case, it is obvious that ∆(G) is a lower bound for χ′(G), since the edges incident to a
vertex of maximum degree must all be differently coloured. What is surprising is that
the best possible upper bound is only one larger, which was proved by Vizing [78] in the
following theorem.
Theorem 4.3. Let G be a graph. Then ∆(G) ≤ χ′(G) ≤ ∆(G) + 1. 
There are of course many more results concerning colourings, and I shall briefly men-
tion some in only one more direction. An L-colouring is a proper vertex-colouring where
the colour of each vertex v is chosen from a given list L(v) of possible colours for that
vertex. The list-chromatic number χl(G) is the minimum k such that G can be L-coloured
for any L where |L(v)| ≥ k for all v. It is clearly the case that χl(G) ≥ χ(G) and one
might expect always to have equality, since it seems most restrictive when the L(v) are
all the same. This turns out not to be true; Voigt [79] proved that there are some planar
graphs with list-chromatic number greater than 4. However, Thomassen [76] proved that
the list chromatic number of planar graphs is at most 5.
4.2. FROM NASH-WILLIAMS TO ACYCLIC COLOURINGS 87
The list-chromatic index χ′l(G) is the corresponding concept for edge colourings, and
again it must be the case that χ′l(G) ≥ χ′(G). It is conjectured (by whom first is not
known) that χ′l(G) = χ
′(G) for all graphs G. Galvin [31] proved the conjecture for
bipartite graphs and Kahn [48] proved that χ′l(G) =
(
1 + o(1)
)
χ′(G) as χ′(G)→∞. The
full conjecture is still open.
4.2. From Nash-Williams to acyclic colourings
Perhaps surprisingly, given their name, acyclic colourings did not evolve as generali-
zations of proper colourings; their history, which I shall outline in this section, is more
closely related to the ‘acyclic’ part of the term. An acyclic graph is often called a forest,
and I shall use the terms interchangeably.
Suppose that a graph G is the union of k forests F1 ∪ . . . ∪ Fk. For all subgraphs
H ⊂ G and for all i, it must be the case that
∣∣E(Fi ∩H)∣∣ ≤ |V (H)| − 1,
as otherwise Fi would contain a cycle somewhere in H. Summing over the forests, it is
therefore the case that
|E(H)| ≤ k(|V (H)| − 1).
A fundamental theorem of Nash-Williams [65] (whose proof relies on his work in an earlier
paper [64]) asserts that this necessary condition is also sufficient.
Theorem 4.4. Let G be a graph. Suppose that every subgraph H ⊂ G satisfies
|E(H)| ≤ k(|V (H)| − 1), (4.1)
for some integer k. Then G can be decomposed into at most k forests. 
The minimum value of k such that G can be decomposed into at most k forests, or
equivalently such that G satisfies (4.1), is called the arboricity of G.
This concept was generalized to point-arboricity by Chartrand, Geller and Hedet-
niemi [19], and formally defined and further explored by Chartrand, Kronk and Wall [20].
The point-arboricity of a graph is the minimum number of colours needed to colour the
88 4. ACYCLIC COLOURINGS OF GRAPHS
vertices (not necessarily properly) so that the graph induced by each colour class is a
forest, that is, is acyclic.






















@
@
@ 
 
 
PPPPPPPPP
B
B
B
B
B
B
B
B
B
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T 
 
 
 
 
  @
@
@
@
@
@s
2
s1
s
3
s3 or 4
s2
s
1
Figure 4.1. This graph can be 4-coloured in two different ways, shown on
this figure up to relabelling of the colours. Neither of them is acyclic, since
the vertices coloured 1 and 2 form a cycle.
Acyclic colourings were introduced as a natural next step by Gru¨nbaum [44]. A proper
vertex-colouring of a graph is acyclic if every graph induced by the union of two colour
classes is acyclic, that is, there is no two-coloured cycle. The minimum number of colours
needed to do this is the graph’s acyclic chromatic number. Whereas the point-arboricity
of a graph is clearly at most the chromatic number, since an independent set is certainly
acyclic, the acyclic chromatic number must be at least as large. Gru¨nbaum exhibited
simple planar graphs with acyclic colouring number 5, for example the octahedron (see
Figure 4.1), and conjectured that the acyclic chromatic number of all planar graphs is
at most 5. Gru¨nbaum himself proved that it was at most 9; this was improved to 8 by
Mitchem [58], to 7 by Albertson and Berman [2] and to 6 by Kostochka [50]. Finally,
the full conjecture was proved by Borodin [13].
4.3. Variants
In this section, I shall demonstrate how many traditional colouring problems, as des-
cribed in Section 4.1, have been transferred to the acyclic arena. I shall also describe the
natural extensions of these problems that I shall consider in this part of this dissertation.
Kostochka [51] proved that it is an NP-complete problem to decide for given G and k
4.3. VARIANTS 89
if the acyclic chromatic number of G is at most k; thus, it makes sense to try to find
sufficient conditions for it to be small.
Given their introduction, it is unsurprising that acyclic chromatic numbers were next
bounded for graphs drawn on surfaces other than the plane. The genus g(S) of a surface
S is the maximum number of closed curves along which S can be cut without being
disconnected or, informally, the number of handles that need to be added to a sphere to
obtain S. The plane has genus 0 and, for any surface S, it is the case that χ(S) = 2−2g(S).
A graph is said to be of genus g if the smallest genus of a surface on which it can be
drawn without its edges crossing is g. Albertson and Berman [1] instigated this study
and proved the following theorem [3].
Theorem 4.5. Any graph of genus g > 0 can be acyclically coloured with 4g + 4
colours.
When bounding the acyclic chromatic number, the shortest cycles are critical, since
these are less likely to receive three colours in a proper colouring. The girth of a graph is
the length of its shortest cycle. Borodin, Kostochka and Woodall [15] proved that planar
graphs of girth at least 5 have acyclic chromatic number at most 4 and that those of girth
at least 7 have acyclic chromatic number at most 3.
A bound for the acyclic chromatic number of a graph as a function of its maximum
degree was found by Alon, McDiarmid and Reed [4]. In the following theorem, A(d) is the
maximum possible value of the acyclic chromatic number of a graph of maximum degree
d.
Theorem 4.6.
Ω
(
d
4
3
(log d)
1
3
)
≤ A(d) ≤ O
(
d
4
3
)
.
My results are generalizations of this one, so it is worth outlining its proof. The upper
bound is proved by colouring a graph of maximum degree d randomly and then using
the Erdo˝s-Lova´sz local lemma. This essentially says that if we have a collection of bad
events, each of which happens with small probability and is independent of most other bad
events, then with positive probability none of them happens. I shall state this precisely
in Section 4.5. For acyclic colourings, the most obvious bad events are,
90 4. ACYCLIC COLOURINGS OF GRAPHS
(1) for each edge, that the vertices that it connects are the same colour, and,
(2) for each even cycle, that its vertices are coloured with alternating colours.
There is no need to consider odd cycles explicitly, since in a proper colouring they will
receive at least three colours anyway.
These are not quite the events considered, for two reasons. Firstly, considering cycles
of all lengths makes it harder to find a useful bound for the number of other bad events that
each bad event might depend on. Secondly, if two vertices u and v are opposite vertices
in many 4-cycles, then by avoiding bad events of the second type we effectively end up
insisting that u and v are differently coloured too many times to get best possible bounds.
The first problem is easily dealt with; 4-cycles and paths of length 4 are considered, since
longer cycles contain a path of length 4. The second problem is overcome by calling u and
v ‘special’ if they have many common neighbours and are therefore in many 4-cycles and,
for each special pair u, v, introducing the bad event that u and v are the same colour. If
these bad events do not occur then properly coloured 4-cycles containing a special pair
receive at least three colours, and there remain only 4-cycles not containing a special pair.
The final type of bad event is that such 4-cycles are coloured with alternating colours,
and each vertex cannot be in too many of them without being in a special pair.
They proved that the upper bound is best possible up to a logarithmic factor by
considering a random graph Gn,p on n vertices with each edge present with probability
p independently of the others. They showed that if p ≥ 3 ( logn
n
) 1
4 then, however the
vertices are coloured with at most n
2
colours, the probability that every 4-cycle receives
at least three colours is so small that the probability that any colouring has every 4-cycle
receiving at least three colours is less than one. At the same time, for p = 3
(
logn
n
) 1
4 , with
high probability the maximum degree of the graph d satisfies d < 2pn = 6n
3
4 (log n)
1
4 , and
therefore a graph exists with maximum degree d that cannot be acyclically coloured with
fewer than n
2
= Ω
(
d
4
3
(log d)
1
3
)
colours.
In terms of precise evaluation of A(d), it is obvious that A(1) = 2 and A(2) = 3. It is
known that A(3) = 4 (mentioned without proof by Gru¨nbaum [44]; see also Skulrattana-
kulchai [73]), A(4) = 5 (Burnstein [17]), A(5) ≤ 8 and A(6) ≤ 12 (Kothapalli, Varagani,
4.3. VARIANTS 91
Venkaiah and Yadav [52, 53]). No lower bounds better than d+ 1 are known for d = 5, 6,
for which the example is the complete graph Kd+1.
In the same paper as earlier, Alon, McDiarmid and Reed [4] defined the edge acyclic
chromatic number A′(G) and its maximum A′(d) for graphs of maximum degree d in the
obvious way. They proved that A′(d) ≤ 64d, without trying to optimize the constant.
When quoting this result in a later paper, Molloy and Reed [59] showed that the same
proof can easily be modified to give a bound of 16d. Alon, Sudakov and Zaks [6] conjec-
tured that in fact A′(d) ≤ d+2. They proved that A′(Gn,d) ≤ d+2 for almost all random
d-regular graphs Gn,d and showed that there exists c > 0 such that A
′(G) ≤ d + 2 for
all graphs G with ∆(G) = d and girth at least cd log d. They also commented that this
conjecture would be best possible, by considering acyclic edge colourings of the complete
graph K2n. The largest that any one colour class can be is n, when it is a perfect mat-
ching, and no two colours classes can have more than 2n − 1 edges without containing
a cycle. This means that the largest colour class has size at most n and all the others
at most n − 1, so there are at least 2n + 1 colour classes. Nesˇetrˇil and Wormald [66]
improved the bound for almost all random d-regular graphs to d+ 1.
As with the vertex version, further improvements have been made for graphs of large
girth. Muthu, Narayanan and Subramanian [63] improved the bound to 6d for graphs
with girth at least 9 and to 4.52d for graphs of girth at least 220.
Gerke, Greenhill and Wormald [37] defined the r-acyclic edge chromatic number to be
the minimum number of colours in a proper edge-colouring in which each cycle receives
at least r colours. They showed that (r− 2)d is asymptotically almost surely (as d→∞)
a bound for the r-acyclic edge chromatic number of a random d-regular graph.
List colourings have natural acyclic analogues, which I shall not define formally as
they are obvious. Borodin, Fon-Der Flaass, Kostochka, Raspaud and Sopena [14] proved
that the list-acyclic chromatic index of a planar graph is at most 7.
When acyclic colourings are considered in isolation, rather than as a generalization
of arboricity, there is no particular reason why the subgraphs chosen to be subject to
extra conditions should be cycles. Again in the same paper as before, Alon, McDiarmid
and Reed [4] discussed Pk-free colourings; these are proper vertex colourings such that
92 4. ACYCLIC COLOURINGS OF GRAPHS
no path with k vertices is 2-coloured. They pointed out that, since there are at most
d+ d(d− 1) = d2 vertices within distance 2 of any vertex, a greedy algorithm can be used
to colour a graph in a P3-free fashion with at most d
2 + 1 colours. They also commented
that a modification of their work in that paper proves that there is always a Pk-free
colouring for k ≥ 4 with at most O
(
d
k−1
k−2
)
colours.
4.4. Generalized acyclic colourings
The problems that I shall consider in this part of this dissertation are extensions
of Theorem 4.6, which gave a bound for A(d), the maximum possible acyclic chromatic
number of a graph of maximum degree d. There are many possibilities for generalizations,
and these are just some of them. I shall introduce some more general versions of acyclic
colourings, which will be formally defined in Section 4.5.
As described in Section 4.3, since it is the shortest cycles that are most likely to receive
only two colours in a proper colouring, several authors have imposed a minimum girth
condition on the underlying graph to be coloured. My approach is slightly different; I
shall continue to allow all graphs of maximum degree d, but only require long cycles to
receive at least three colours. Specifically, I shall introduce the length-l-acyclic chromatic
number, where cycles of length at least l must be 3-coloured. Since an odd cycle is
already 3-coloured in a proper colouring, it is only necessary to consider length-2m-acyclic
colourings for integers m.
Recall from Section 4.3 that the r-acyclic edge chromatic number was defined to be
the minimum number of colours in a proper edge-colouring in which each cycle receives at
least r colours. I shall do the same for the vertex-colouring version, but to avoid ambiguity
with the length-l-acyclic chromatic number, I shall call it the c-colour acyclic chromatic
number. This is a natural concept to consider; for point-arboricity the defining condition
is that every cycle receives at least two colours, for an acyclic colouring it is that every
cycle receives at least three colours. What happens if we demand that every cycle receives
at least c colours? Point-arboricity corresponds to the 2-colour acyclic chromatic number
and the usual acyclic chromatic number to the 3-colour acyclic chromatic number.
4.5. DEFINITIONS AND A USEFUL TOOL 93
During my oral examination, I discovered that my bounds on the c-colour acyclic
chromatic number (Section 5.3) had already been proved by Greenhill and Pikhurko [43],
along with corresponding results for edge-colourings.
I also mentioned Pk-free colourings in Section 4.3, which are an example of a variation
on the theme of acyclic colourings in which the subgraphs that must receive extra colours
are not cycles. One way to think of the acyclic chromatic number as a generalization
of the chromatic number is as follows. In a proper colouring, every 1-regular connected
subgraph (edge) receives at least two colours. In an acyclic colouring, every 2-regular
connected subgraph (cycle) receives at least three colours. The next step is to ask what
happens if every r-regular connected subgraph must receive at least three colours.
However, this seems quite restrictive, as there are not likely to be many r-regular
connected subgraphs. An alternative, related approach is to require subgraphs with mi-
nimum degree at least r to receive at least three colours. This is still a natural thing
to do: a proper colouring is one in which every subgraph with minimum degree at least
1 receives at least two colours, and an acyclic colouring is one in which every subgraph
with minimum degree at least 2 receives at least three colours. I shall define the degree-r
chromatic number of a graph to be the minimum number of colours needed to colour the
graph properly in such a way that this happens. The methods that I shall use work just
as well for r-regular subgraphs; I have chosen to proceed in this way only because it seems
more interesting.
This definition can be easily extended to hypergraphs; here, a degree-r colouring of
a u-uniform hypergraph is a colouring such that every edge is multicoloured and every
subhypergraph with minimum degree at least r receives at least u+ 1 colours. This is the
first case to consider since every subhypergraph with at least one edge already receives at
least u colours in a multicolouring.
4.5. Definitions and a useful tool
In the next two chapters, I shall bound the following quantities for (hyper)graphs with
maximum degree d. These bounds will be asymptotically best possible up to constant or
logarithmic factors. In Chapter 5, I shall bound the two that concern cycles in graphs:
the length-l-acyclic chromatic number and the c-colour chromatic number. In Chapter 6,
94 4. ACYCLIC COLOURINGS OF GRAPHS
I shall bound the degree-r chromatic number of a u-uniform hypergraph. This is best
separated into two cases: where u = 2, that is, for simple graphs, and where u ≥ 3.
Note that the maxima in the following definitions all exist; trivial upper bounds are
d2 + 1, dc−1 + 1 and
(
(u− 1)d)2 + 1 colours, sufficient to colour vertices differently from
those at distance at most 2, c− 1 and 2 away.
The length-l-acyclic chromatic number of a graph. A colouring f : V (G)→ [x]
is a length-l-acyclic colouring if it is a proper colouring and every cycle of length at least
l receives at least three colours. For a graph G and an integer l, let
A(l)(G) = min{x : G can be length-l-acyclically coloured using x colours}.
For integers l, d, let
A(l)(d) = max{A(l)(G) : ∆(G) = d}.
Note that a length-l-acyclic colouring is also a length-(l+1)-acyclic colouring, soA(l+1)(G) ≤
A(l)(G) and A(l+1)(d) ≤ A(l)(d). Also, since a cycle of odd length must receive three
colours in a proper colouring, it must be the case that A(2m−1)(G) = A(2m)(G) and
A(2m−1)(d) = A(2m)(d) for any integer m.
The c-colour acyclic chromatic number of a graph. A colouring f : V (G)→ [x]
is a c-colour acyclic colouring if it is a proper colouring, every cycle of length less than
c is multicoloured and every cycle of length at least c receives at least c colours. For a
graph G and an integer c, let
Ac(G) = min{x : G can be c-acyclically coloured using x colours}.
For integers c, d, let
Ac(d) = max{Ac(G) : ∆(G) = d}.
Note that a (c+1)-colour acyclic colouring is also a c-colour acyclic colouring, so Ac(G) ≤
Ac+1(G) and Ac(d) ≤ Ac+1(d).
The degree-r chromatic number of a u-uniform hypergraph. Let H be a u-
uniform hypergraph. A colouring f : V (H) → [x] is a degree-r colouring if every edge is
multicoloured and every subhypergraph K with δ(K) ≥ r receives at least u+ 1 colours.
4.5. DEFINITIONS AND A USEFUL TOOL 95
For a u-uniform hypergraph H and an integer r, let
D(u)r (H) = min{x : H can be degree-r coloured using x colours}.
For integers r, d, let
D(u)r (d) = max{D(u)r (H) : ∆(H) = d and H is a u-uniform hypergraph}.
I shall bound D
(u)
r separately when u = 2 and when u ≥ 3.
It is not easy to guess what the upper bounds should be, even knowing that they should
include Theorem 4.6. In fact, in the case of c-colour acyclic colourings, this theorem is in
fact an exceptional case and so unhelpful in providing intuition.
As with most results in this area, the proofs of these bounds will all use random
colourings and the Erdo˝s-Lova´sz local lemma [25], in its nonsymmetric form (see page 64
of Alon and Spencer [5]).
Lemma 4.7. Let A1, . . . , An be events in an arbitrary probability space. Let the graph
G = (V,E) with V = [n] be a dependency graph for the events Ai, that is, assume that
for each i, Ai is independent of the family of events {Aj : ij /∈ E}. If there are reals
0 < yi < 1 such that for all i
P(Ai) ≤ yi
∏
ij∈E
(1− yj),
then
P
(⋂
i
Aci
)
≥
n∏
i=1
(1− yi) > 0,
so that with positive probability no event Ai occurs.
The events Ai are commonly called bad events.
To prove that the upper bounds that I shall give are asymptotically best possible up
to constant or logarithmic factors, I shall give explicit or probabilistic constructions. In
the probabilistic constructions, the number of colours needed will be proportional to the
number of vertices, so I shall need to bound this in terms of the maximum degree of a
random graph. Bolloba´s [12] (page 65, Corollary 3.4) proved the following precise result
96 4. ACYCLIC COLOURINGS OF GRAPHS
about the maximum degree ∆ of a random graph Gn,p with n vertices and with each edge
present with probability p = p(n) independent of the others, and with q = 1− p.
Theorem 4.8. Suppose pqn
(logn)3
→∞ and y is a fixed real number. Then the maximal
degree ∆ of Gn,p satisfies
lim
n→∞
P
[
∆ < pn+ (2pqn log n)
1
2
(
1− log log n
4 log n
− log(2pi
1
2 )
2 log n
+
y
2 log n
)]
= e−e
−y
.
I shall use only the trivial consequence that, under these conditions, asymptotically
almost surely the maximum degree satisfies ∆ < 2pn. In fact, I shall only ever use
probabilities p of a particular form, and the lemma that I shall apply is as follows.
Lemma 4.9. Let α ≥ 0, 0 < β < 1 and C ≥ 1
2
be real numbers, and let
p = C
(log n)α
nβ
.
Then asymptotically almost surely the maximal degree ∆ of Gn,p satisfies
n >
(
∆
2C
) 1
1−β(
1
1−β log ∆
) α
1−β
. (4.2)
Proof. The bounds on β mean that Theorem 4.8 can be applied, and so asymptoti-
cally almost surely
∆ < 2Cn1−β(log n)α,
from which it follows that
n >
(
∆
2C
) 1
1−β
(log n)
α
1−β
. (4.3)
Either the bound in (4.2) holds, in which case there is nothing to prove, or it does not
hold, in which case
log n ≤ 1
1− β log ∆−
α
1− β log log ∆−
1
1− β log
(
2C
(1− β)α
)
≤ 1
1− β log ∆,
and (4.2) follows from (4.3). 
CHAPTER 5
Long acyclic colourings and acyclic colourings with many
colours
5.1. Introduction
I have grouped the two results in this chapter together as they both involve genera-
lizations of the acyclic chromatic number in a way that still relates to cycles in graphs,
which is not the case with the results in the next chapter. Alon, McDiarmid and Reed [4]
showed that a graph with maximum degree d can be acyclically coloured with O(d
4
3 )
colours. In this chapter, I shall answer the following questions:
(1) How many colours do we need if only cycles of length at least l must receive at
least three colours (Section 5.2)?
(2) How many colours do we need if cycles must receive at least c colours (Sec-
tion 5.3)?
In Section 5.4, I shall suggest some natural open problems following on from these.
5.2. The length-l-acyclic chromatic number
Recall that the length-l-acyclic chromatic number is the minimum number of colours
needed to colour a graph properly so that each cycle of length at least l receives at least
three colours. Recall also that, in a proper colouring, cycles of odd length will automati-
cally receive at least three colours. In this section, I shall bound the maximum possible
length-l-acyclic chromatic number of graphs of a given maximum degree d, denoted by
A(l)(d).
Theorem 5.1. For all integers d,m ≥ 2, it is the case that(
d
6
) 2m
2m−1
m
(
2m
2m−1 log d
) 1
2m−1
< A(2m)(d) = A(2m−1)(d) < 14(2m+ 1)d
2m
2m−1 .
This has the following immediate corollary.
97
98 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS
Corollary 5.2. For a fixed integer m ≥ 2, it is the case that
Ω
(
d
2m
2m−1
(log d)
1
2m−1
)
≤ A(2m)(d) = A(2m−1)(d) ≤ O
(
d
2m
2m−1
)
as d→∞. 
I shall begin by proving the lower bound.
Theorem 5.3. For n sufficiently large there is a graph G on n vertices with maximum
degree ∆(G) = d satisfying
n
m
≥
(
d
6
) 2m
2m−1
m
(
2m
2m−1 log d
) 1
2m−1
= Ω
(
d
2m
2m−1
(log d)
1
2m−1
)
such that there is no length-2m-acyclic colouring with at most n
m
colours.
Proof. This proof is a generalization of one by Alon, McDiarmid and Reed [4].
Let
p = 3
(
log n
n
) 1
2m
,
let n be large and divisible by m2, and let G ∈ Gn,p.
Suppose that f : V (G)→ [ n
m
]
is a colouring that uses at most n
m
colours. From each
class, discard as few vertices as possible to make its size divisible by m. Note that at most
m− 1 vertices are discarded from each class, so at most (m−1)n
m
vertices are discarded in
total and there remain at least n
m
vertices. Since the size of each colour class is now
divisible by m, at least n
m2
distinct monochromatic m-tuples can be taken from these
vertices. Choose n
m2
of these and call them T1, . . . , T n
m2
.











Z
Z
Z
Z
Z
Z
Z
Z
Z
Z
Z
Zs
s
s
s
s
s
s
s
t
(m)
i
t
(2)
i
t
(1)
i
t
(m)
j
t
(2)
j
t
(1)
j
Figure 5.1. A 2m-cycle of the form t
(1)
i t
(1)
j t
(2)
i t
(2)
j . . . t
(k)
i t
(k)
j .
5.2. THE LENGTH-l-ACYCLIC CHROMATIC NUMBER 99
For each m-tuple Ti, fix an ordering of the vertices t
(1)
i , . . . , t
(m)
i . Let Ci,j be the event
that G contains the cycle t
(1)
i t
(1)
j t
(2)
i t
(2)
j . . . t
(m)
i t
(m)
j (see Figure 5.1). If any event Ci,j occurs
then this colouring cannot be a length-2m-acyclic colouring of G. Let Af be the event
that f is a length-2m-acyclic colouring. Then
P(Af ) ≤ P
(⋂
i<j
Cci,j
)
=
(
1− p2m)(n/m22 ) ,
since the events Ci,j are independent. Writing E for the event that G can be length-2m-
acyclically coloured with at most n
m
colours,
P(E) ≤
∑
f :V (G)→[ nm ]
P(Af )
≤ nn (1− p2m)(n/m22 )
≤ exp
(
n log n−
(
n/m2
2
)
p2m
)
= o(1),
which means that there exists a graph G that cannot be length-2m-acyclically coloured
with at most n
m
colours, as required. By Lemma 4.9, the graph G can be chosen so that,
in addition, the bound in the theorem is satisfied. 
I shall now prove the upper bound.
Theorem 5.4. Let G be a graph with ∆(G) = d and let m ≥ 2 be an integer. Then
it is possible to colour G length-2m-acyclically with at most 14(2m+ 1)d
2m
2m−1 = O(d
2m
2m−1 )
colours.
Proof. In this and the next chapter, there will be several proofs that use the Erdo˝s-
Lova´sz local lemma. I shall apply the method in more detail in this proof than in later
ones.
Call a pair of vertices {u, v} special if u and v have at least d 2m−22m−1 common neighbours.
Let f : V (G)→ [x] be a random function where each vertex receives a colour from [x]
uniformly at random.
100 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS
The proof uses the Erdo˝s-Lova´sz local lemma. The bad events to be considered are
as follows:
(1) For each edge uv, let Wuv = {f(u) = f(v)}.
(2) For each special pair {u, v}, let Xu,v = {f(u) = f(v)}.
(3) For each 2m-cycle C = v0v1 . . . v2m−1 containing no special pairs of vertices dis-
tance 2 apart, where v0, v1, . . . , v2m−1 are distinct vertices, let
YC =
{
f(v0) = f(v2) = . . . = f(v2m−2) and f(v1) = f(v3) = . . . = f(v2m−1)
}
.
(4) For each path P = v0v1 . . . v2m of length 2m, where v0, v1, . . . , v2m are distinct
vertices, let
ZP =
{
f(v0) = f(v2) = . . . = f(v2m) and f(v1) = f(v3) = . . . = f(v2m−1)
}
.
If none of the events Wuv occurs then f is a proper colouring. The only way that a
cycle of even length can be 2-coloured in a proper colouring is if its colours alternate, and
so if in addition none of the events Xu,v and YC occurs then every cycle of length 2m
receives at least three colours whether or not it contains special pairs distance 2 apart.
If it is also the case that none of the events ZP occurs then every cycle of length at least
2m+ 2 receives at least three colours.
Claim 1. Each vertex v is in at most
(1) d edges,
(2) d
2m
2m−1 special pairs,
(3) d
2m(2m−2)
2m−1 cycles of length 2m not containing any special pairs of vertices distance
2 apart, and
(4) (m+ 1)d2m paths of length 2m.
Proof of claim. (1) By supposition, the maximum degree ∆(G) = d.
(2) Since ∆(G) = d, the vertex v has at most d neighbours, each of which has at
most d neighbours, and so there are at most d2 paths of length 2 starting at v.
By definition, for each special pair that v is in there are at least d
2m−2
2m−1 paths of
5.2. THE LENGTH-l-ACYCLIC CHROMATIC NUMBER 101
length 2 starting at v and so v is in at most
d2
d
2m−2
2m−1
= d
4m−2−(2m−2)
2m−1 = d
2m
2m−1
special pairs.
(3) Without loss of generality, assume that v plays the role of v0 in the cycle C.
Since ∆(G) = d, there are at most d choices for v1, for each of these there are
most d choices for v2 and so on, and so there at most d
2m−2 choices for the
path v0 . . . v2m−2. Since {v0, v2m−2} is not a special pair, there are at most d
2m−2
2m−1
choices for their common neighbour v2m−1 and so in total v is in at most
d2m−2+
2m−2
2m−1 = d
2m(2m−2)
2m−1
cycles of length 2m not containing any special pairs of vertices distance 2 apart.
(4) Without loss of generality, assume that v plays the role of one of v0, v1, . . . , vm in
P = v0v1 . . . v2m, which gives m + 1 possibilities. For each of these there are at
most d2m paths P , which means that v is in at most (m+ 1)d2m paths of length
2m.

By multiplying the number of vertices in a pair, cycle or path by the number of edges,
special pairs, cycles without special pairs and paths that each vertex can be in, the number
of edges in the dependency graph from an event of each type to events of each type is at
most that shown in the table below.
to
Wu,v Xu,v YC ZP
Wu,v 2d d
2m
2m−1 2d
2m(2m−2)
2m−1 2(m+ 1)d2m
from Xu,v 2d 2d
2m
2m−1 2d
2m(2m−2)
2m−1 2(m+ 1)d2m
YC 2md 2md
2m
2m−1 2md
2m(2m−2)
2m−1 2m(m+ 1)d2m
ZP (2m+ 1)d (2m+ 1)d
2m
2m−1 (2m+ 1)d
2m(2m−2)
2m−1 (2m+ 1)(m+ 1)d2m
Claim 2. The probabilities of the events of each type are given by
(1) P(Wu,v) = 1x ,
(2) P(Xu,v) = 1x ,
102 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS
(3) P(YC) = 1x2m−2 , and
(4) P(ZP ) = 1x2m−1 .
Proof of claim. All four statements follow from the fact that f colours vertices
uniformly at random with colours taken from [x]. 
I shall now apply the Erdo˝s-Lova´sz local lemma (Lemma 4.7). The weightings used
are double the probabilities, that is, let
y1 = 2P(Wu,v) =
2
x
,
y2 = 2P(Xu,v) =
2
x
,
y3 = 2P(YC) =
2
x2m−2
and
y4 = 2P(ZP ) =
2
x2m−1
,
and the four statements that need to be proved are
P(Wu,v) ≤ 2P(Wu,v)
(
1− 2
x
)2d(
1− 2
x
)2d 2m2m−1
(
1− 2
x2m−2
)2d 2m(2m−2)2m−1 (
1− 2
x2m−1
)2(m+1)d2m
,
P(Xu,v) ≤ 2P(Xu,v)
(
1− 2
x
)2d(
1− 2
x
)2d 2m2m−1
(
1− 2
x2m−2
)2d 2m(2m−2)2m−1 (
1− 2
x2m−1
)2(m+1)d2m
,
P(YC) ≤ 2P(YC)
(
1− 2
x
)2md(
1− 2
x
)2md 2m2m−1
(
1− 2
x2m−2
)2md 2m(2m−2)2m−1 (
1− 2
x2m−1
)2m(m+1)d2m
, and
P(ZP ) ≤ 2P(ZP )
(
1− 2
x
)(2m+1)d(
1− 2
x
)(2m+1)d 2m2m−1
(
1− 2
x2m−2
)(2m+1)d 2m(2m−2)2m−1 (
1− 2
x2m−1
)(2m+1)(m+1)d2m
.
5.3. THE c-COLOUR ACYCLIC CHROMATIC NUMBER 103
Since m ≥ 2, it suffices to show the fourth statement, that is,
(
1− 2
x
)(2m+1)d(
1− 2
x
)(2m+1)d 2m2m−1 (
1− 2
x2m−2
)(2m+1)d 2m(2m−2)2m−1
(
1− 2
x2m−1
)(2m+1)(m+1)d2m
≥ 1
2
.
It suffices to choose x such that each factor is at least 6
7
, since
(
6
7
)4
= 1296
2401
> 1
2
, and it
therefore suffices to choose x such that
x ≥ max
{
14(2m+ 1)d, 14(2m+ 1)d
2m
2m−1 , 2m−2
√
14(2m+ 1)d
2m
2m−1 ,
2m−1
√
14(2m+ 1)(m+ 1)d
2m
2m−1
}
,
for which it suffices to take x = 14(2m+ 1)d
2m
2m−1 . 
5.3. The c-colour acyclic chromatic number
The results in this section had already been proved by Greenhill and Pikhurko [43],
which I did not know until my oral examination.
Recall that the c-colour acyclic chromatic number is the minimum number of colours
needed to colour a graph properly so that each cycle receives at least c colours. In this
section, I shall bound the maximum possible c-colour acyclic chromatic number of graphs
of a given maximum degree d, denoted by Ac(d).
Theorem 5.5. For all integers d, k ≥ 2, it is the case that
max
{(
d
k
)k
,
(
d
4
)k
2k log d
}
< A2k(d) ≤ A2k+1(d) < 6(k + 2)(2k)k+1dk.
This has the following immediate corollary.
Corollary 5.6. For a fixed integer k ≥ 2, it is the case that
A2k(d), A2k+1(d) = Θ(d
k)
as d→∞. 
104 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS
It is interesting that this bound does not include the bound on the 3-colour acyclic
chromatic number of Alon, McDiarmid and Reed [4]. It will become clearer why this is
the case once Theorem 5.11 has been proved; I shall say more at the time.
For c = 4, a probabilistic lower bound can be found in a similar way to Theorem 5.3.
Theorem 5.7. For n sufficiently large there is a graph G on n vertices with maximum
degree ∆(G) = d satisfying
n
2
≥ d
2
400 log d
such that there is no 4-colour acyclic colouring with at most n
2
colours.
Proof. Let
p = 5
√
log n
n
,
let n be large and divisible by 4, and let G ∈ Gn,p.
Suppose that f : V (G)→ [n
2
]
is a colouring that uses at most n
2
colours. By discarding
at most one vertex from each colour class, at least n
4
monochromatic pairs can be found.
Let P be a set of n
4
of these pairs and split the remaining singletons into two sets S1 and
S2 each of size
n
4
.
s
s
s
s
s
s
s
s
s
s
s
s
S1 P S2
Figure 5.2. The type of 4-cycle that is avoided.
If any pair in P has a common neighbour in S1 and a common neighbour in S2 then
f cannot be 4-colour acyclic (see Figure 5.2). For each pair pi ∈ P let Npi,j be the event
that the pair pi has a common neighbour in Sj and let Af be the event that f is a 4-colour
5.3. THE c-COLOUR ACYCLIC CHROMATIC NUMBER 105
acyclic colouring. Then
P(Af ) ≤ P
(⋂
pi∈P
(
N cpi,1 ∪N cpi,2
))
≤
(
2
(
1− p2)n4 )n4 ,
since the events Npi,j are independent. Writing E for the event that G can be 4-colour
acyclically coloured with at most n
2
colours,
P(E) ≤
∑
f :V (G)→[n2 ]
P(Af )
≤ nn
(
2
(
1− p2)n4 )n4
≤ exp
(
n log n+
n
4
log 2− p
2n2
16
)
= o(1),
which means that there exists a graph G that cannot be 4-colour acyclically coloured with
at most n
2
colours, as required. By Lemma 4.9, the graph G can be chosen so that, in
addition, the bound in the theorem is satisfied. 
For c > 4, this does not seem easy to generalize. A sensible alternative approach to
finding a lower bound is to find a graph of small maximum degree such that any two
vertices are in a cycle of length at most c, so that all vertices must be coloured differently.
One way of doing this is to join two graphs of small diameter together. The most obvious
graph with small diameter for a given maximum degree is a tree.
s
s s s
s s s s s s
s s s s s s s s s s s s




Z
Z
Z
Z


B
B
B
B
B
B
B
B
B


















E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
Figure 5.3. The tree Td,h for d = h = 3.
Let Td,h be the tree where all vertices except the leaves have degree d and the leaves
are all distance h from a fixed root vertex (see Figure 5.3). Let T1 and T2 be copies of
106 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS
Td,b k−12 c and form G by taking T1 and T2 and adding edges between vertices v1 ∈ T1 and
v2 ∈ T2 if they are copies of the same vertex v ∈ Td,b k−12 c. Then the number of vertices
in G is
2
(
1 + d+ d(d− 1) + . . .+ d(d− 1)b k−12 c−1
)
= 2 + 2d
(
(d− 1)b k−12 c − 1
d− 2
)
>
(
d
2
)b k−12 c
= Ω
(
db k−12 c
)
,
and any two vertices of G are in a cycle of length at most 2k. Therefore, G cannot
be 2k-acyclically coloured (or (2k + 1)-acyclically coloured) with less than Ω
(
db k−12 c
)
colours.
This is a long way from the lower bound of Ω
(
dk
log d
)
. Using probabilistic techniques,
Bolloba´s [10] (see also page 259 of Bolloba´s [12], Theorem 10.10) found graphs of small
diameter with much smaller maximum degree:
Theorem 5.8. Let h be a positive constant, k = k(n) ≥ 2 a natural number, and
define p = p(n, h, k), 0 < p < 1, by
pknk−1 = log
(
n2
h
)
.
Suppose that pn
(logn)3
→∞. Then for G ∈ Gn,p we have
lim
n→∞
P
(
diam(G) = k
)
= e−
h
2 and lim
n→∞
P
(
diam(G) = k + 1
)
= 1− e−h2 .

Using this, I can obtain a lower bound that is best possible up to logarithmic factors.
Theorem 5.9. For n sufficiently large there is a graph G on n vertices with maximum
degree ∆(G) = d satisfying
n ≥ d
k
22k+1k log d
= Ω
(
dk
log d
)
such that any two vertices are in a cycle of length at most 2k. In particular, a 2k- or
(2k + 1)-colour acyclic colouring of such a graph must use at least n colours.
5.3. THE c-COLOUR ACYCLIC CHROMATIC NUMBER 107
Proof. Set
p =
(
2 log n
nk−1
) 1
k
,
which corresponds to h = 1 in Theorem 5.8, and q = 2p. Let Gn,q ∈ Gn,q and obtain G1
and G2 by putting each edge of Gn,q in G1 or G2 uniformly at random independently of
all other edges. Since q = 2p, this coupling means that G1 ∈ Gn,p and G2 ∈ Gn,p, but G1
and G2 are disjoint. By Theorem 5.8,
lim
n→∞
P
(
diam(G1) = k
)
= lim
n→∞
P
(
diam(G2) = k
)
= e−
1
2 ≈ 0.6065.
Take n sufficiently large that P
(
diam(G1) = k
)
= P
(
diam(G2) = k
)
> 3
5
and so
P
((
diam(G1) = k
) ∧ (diam(G2) = k)) > 1
5
.
Since this probability is greater than zero, this means that there is a graph G with any
two vertices in a cycle of length at most 2k. By Lemma 4.9, the graph G can be chosen
so that, in addition, the bound in the theorem is satisfied. 
In fact, an explicit construction does even better and gives the best possible lower
bound up to a constant factor.
Theorem 5.10. For n sufficiently large there is a graph G on n vertices with maximum
degree ∆(G) = d satisfying
n >
(
d
k
)k
= Ω(dk),
such that any two non-adjacent vertices are in a cycle of length at most 2k (and so a 2k-
or (2k + 1)-colour acyclic colouring must use at least n colours).
Proof. Let G = (V,E) be defined as follows. For some large positive integer b, let
V = [b]k and let uv ∈ E if u and v differ in exactly one co-ordinate. Then G is a k(b− 1)-
regular graph. For all non-adjacent u, v, there is a cycle of length at most 2k containing u
and v as follows. Start with u and then, working from left to right, for each digit ui where
ui 6= vi, change it to vi, and then do the same to get back to u from v (see Figure 5.4). All
vertices are therefore adjacent or in a cycle of length at most 2k, and must be differently
coloured in a 2k- or (2k + 1)-colour acyclic colouring.
108 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS
(1, 2, 3,2, 1)
(3, 2, 3, 2, 1)
(3, 2,2, 2, 1)
(3, 2, 2,3, 1)
(1, 2, 2, 3, 1)
(1, 2,3, 3, 1)
Figure 5.4. A 6-cycle for b = 3, k = 5, u = (1, 2, 3, 2, 1) and v =
(3, 2, 2, 3, 1), with the new digit in each k-tuple in bold type.
Setting d = k(b− 1), this is a graph G with n = |V (G)| and ∆(G) = d satisfying
n =
(
d
k
+ 1
)k
,
and the result follows. 
For fixed k, the probabilistic bound is worse as d → ∞, but it is better for small d,
that is, for d up to about
exp
((
k
4
)k−1
8
)
.
To prove an upper bound of the same order of magnitude as the explicit construction,
I shall again use the Erdo˝s-Lova´sz local lemma.
Theorem 5.11. Let G be a graph with ∆(G) = d and let k ≥ 2 be an integer. Then
it is possible to colour G (2k + 1)-colour acyclically (and therefore 2k-colour acyclically)
with at most 6(k + 2)(2k)k+1dk = O(dk) colours.
Proof. If d = 1, then G contains only isolated edges and only two colours are needed,
so assume that d ≥ 2.
Call a pair of vertices {u, v} special if there are at least d paths of length k + 1 from
u to v.
Let f : V (G)→ [x] be a random function where each vertex receives a colour from [x]
uniformly at random.
The proof uses the Erdo˝s-Lova´sz local lemma. The bad events to be considered are
as follows:
(1) For each u, v with 1 ≤ d(u, v) ≤ k, let Wu,v = {f(u) = f(v)}.
5.3. THE c-COLOUR ACYCLIC CHROMATIC NUMBER 109
(2) For each special pair u, v, let Xu,v = {f(u) = f(v)}.
(3) For each (2k+2)-cycle C = v0v1 . . . v2k+1 containing a non-special pair of opposite
vertices, where v0, v1, . . . , v2k+1 are distinct vertices, let YC be the event that C
receives at most 2k colours.
(4) For each path P = v0v1 . . . v2k+2, where v0, v1, . . . , v2k+2 are distinct vertices, let
ZP be the event that P receives at most 2k colours.
If none of the events of type Wu,v occurs then f is a proper colouring and every cycle
of length at most 2k+1 is multicoloured. In this case, the only vertices in a (2k+2)-cycle
that can be the same colour are pairs of opposite vertices, so if in addition none of the
events Xu,v and YC occurs then every cycle of length 2k+2 receives at least 2k+1 colours,
whether or not it contains a non-special pair of opposite vertices. If none of the events
ZP occurs then every cycle of length at least 2k + 3 receives at least 2k + 1 colours.
Claim 1. Each vertex v is in at most
(1) 2dk pairs of vertices distance at most k apart,
(2) dk special pairs,
(3) (k + 1)dk+2 cycles containing a non-special pair of opposite vertices, and
(4) (k + 2)d2k+2 paths of length 2k + 2.
Proof of claim. (1) By the same argument as in the proof of Theorem 5.4,
there are at most d vertices distance 1 from v, at most d2 vertices distance 2 from
v and so on, and so at most
d+ d2 + . . .+ dk =
d
d− 1(d
k − 1) < 2dk
vertices distance at most k from v, the inequality holding since d ≥ 2.
(2) There are at most dk+1 paths of length k + 1 starting at v, and each special pair
that v is in contributes d such paths, so v is in at most
dk+1
d
= dk
special pairs.
110 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS
s
s
s
s
s
s
s
s
?
R
6
R
 
 
 
 
 
 
 
 	
v
a1
w1
a2
w2
non-special pair
Figure 5.5. A cycle with a non-special opposite pair.
(3) The aim is to count the (2k + 2)-cycles containing v and a non-special pair of
opposite vertices in groups depending on where the closest non-special pair to v
is. Moving clockwise from v, let w1 be the first vertex in a non-special pair and
let a1 be its distance from v, and let w2 be the vertex opposite w1 at distance a2
anticlockwise from v, so that a1 + a2 = k + 1 (see Figure 5.5). It is possible that
w1 = v and a1 = 0. There are k+ 1 choices for a1 and a2. Given a1 and a2, there
are at most da1 paths of length a1 and d
a2 paths of length a2 starting at v, and
so at most
da1+a2 = dk+1
choices for the arc from w1 to w2 containing v. Since {w1, w2} is a non-special
pair, there are at most d choices for the arc from w1 to w2 not containing v. In
total, there are therefore at most
(k + 1)dk+1d = (k + 1)dk+2
(2k + 2)-cycles containing v and a non-special pair of opposite vertices.
(4) Without loss of generality, assume that v plays the role of one of v0, v1, . . . , vk+1
in P = v0v1 . . . v2k+2, which gives k + 2 possibilities. For each of these there are
at most d2k+2 paths P , which means that v is in at most (k + 2)d2k+2 paths of
length 2k + 2.

5.3. THE c-COLOUR ACYCLIC CHROMATIC NUMBER 111
By multiplying the number of vertices in a pair, cycle or path by the number of paths
of length at most k, special pairs, cycles with a non-special pair and paths that each
vertex can be in, the number of edges in the dependency graph from an event of each
type to events of each type is at most that shown in the table below.
to
Wu,v Xu,v YC ZP
Wu,v 4d
k 2dk 2(k + 1)dk+2 2(k + 2)d2k+2
from Xu,v 4d
k 2dk 2(k + 1)dk+2 2(k + 2)d2k+2
YC 4(k + 1)d
k 2(k + 1)dk 2(k + 1)2dk+2 2(k + 1)(k + 2)d2k+2
ZP 2(2k + 3)d
k (2k + 3)dk (2k + 3)(k + 1)dk+2 (2k + 3)(k + 2)d2k+2
Claim 2. The probabilities of the events of each type satisfy the following equalities
and inequalities:
(1) P(Wu,v) = 1x ,
(2) P(Xu,v) = 1x ,
(3) P(YC) ≤ (2k)2k+2x2 , and
(4) P(ZP ) ≤ (2k)2k+3x3 .
Proof of claim. (1) This is true since f is a uniformly random colouring with
colours taken from [x].
(2) See (1).
(3) For each S ⊂ [x] with |S| = 2k, let Y SC be the event that C is coloured with
colours taken from S, so that
YC =
⋃
S⊂[x]
|S|=2k
Y SC .
(However, this is not a partition, since if a colouring uses strictly fewer than 2k
colours, then more than one event Y SC holds.) Given a choice of S, each of the
2k + 2 vertices of C must receive one of the colours in S, which means that
P
(
Y SC
)
=
(
2k
x
)2k+2
,
112 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS
and hence
P(YC) ≤
∑
S⊂[x]
|S|=2k
P
(
Y SC
) ≤ ( x
2k
)(
2k
x
)2k+2
≤ x2k
(
2k
x
)2k+2
=
(2k)2k+2
x2
.
(4) As in (3),
P(ZP ) ≤
(
x
2k
)(
2k
x
)2k+3
≤ (2k)
2k+3
x3
.

As in the proof of Theorem 5.4, the weightings used are double the probabilities or
bounds for the probabilities given in Claim 2; it is enough to prove the condition in the
Erdo˝s-Lova´sz local lemma for the events ZP , that is,
(
1− 2
x
)2(2k+3)dk (
1− 2
x
)(2k+3)dk (
1− 2(2k)
2k+2
x2
)(2k+3)(k+1)dk+2
(
1− 2(2k)
2k+3
x3
)(2k+3)(k+2)d2k+2
≥ 1
2
.
It suffices to find x such that all four factors are at least 6
7
, for which any
x ≥ max
{
28(2k + 3)dk, 14(2k + 3)dk,
√
14(2k + 3)(k + 1)(2k)k+1d
k
2
+1,
3
√
14(2k)2k+3(2k + 3)(k + 2)d
2k+2
3
}
will do. Since k ≥ 2, it is enough to take x = 6(k+ 2)(2k)k+1dk (or even x = 28(2k+ 3)dk
once d is reasonably large). 
When choosing the order of magnitude of x at the end of the proof, the critical
condition was that the colouring was a distance-k colouring, not that special pairs and
(2k + 2)-cycles were coloured in a particular way. This was because there was plenty
of room to manoeuvre when choosing the threshold function for a pair to be considered
special.
Indeed, suppose that a pair of vertices were called special if there are least ds paths of
length k+1 between them. Then each vertex would be in at most O(dk+1−s) special pairs
and at most O(dk+1+s) (2k + 2)-cycles containing a non-special pair of opposite vertices.
Since the probability that a special pair is monochromatic is 1
x
and the probability that a
5.4. OPEN PROBLEMS 113
(2k+ 2)-cycle is coloured with at most 2k colours is O( 1
x2
), the number of colours needed
to avoid these events is O(dk+1−s) and O(d
k+1+s
2 ). Since k ≥ 2, these are both O(dk) for
any choice of s provided that 1 ≤ s ≤ k − 1.
In the case of 3-colour acyclic colouring studied by Alon, McDiarmid and Reed [4],
k = 1 and so this is not possible. Instead, s is chosen so that k + 1− s = k+1+s
2
, that is,
s = k+1
3
= 2
3
, and the number of colours needed is O(d
4
3 ) rather than O(d).
5.4. Open problems
In the results in this chapter, I have found the orders of magnitude of both the length-
l-acyclic chromatic number (up to a logarithmic factor) and the c-colour acyclic chromatic
number of a graph as functions of its maximum degree, assuming that l or c is fixed. As
a result, the bounds are nowhere near best possible as functions of l or c. It would be
worth trying to improve these and to remove the logarithmic factor in the lower bound
for the length-l-acyclic chromatic number.
The two concepts in this chapter can be combined to give a c-colour length-l-acyclic
colouring. If l ≥ c then this is a proper colouring such that every cycle of length at least
l receives at least c colours. If l < c then it is a proper colouring such that every cycle
of length at least l but less than c is multicoloured and every cycle of length at least c
receives at least c colours.
In this chapter I have proved results for c = 3 and general l, and for l = 3 and general
c. What happens for general c and l?

CHAPTER 6
Small minimum degree colourings of graphs and hypergraphs
6.1. Introduction
Recall that a degree-r colouring of a u-uniform hypergraph is a colouring such that
every edge is multicoloured and every subhypergraph with minimum degree at least r
receives at least u + 1 colours. In this chapter, I shall bound the maximum possible
degree-r chromatic number for u-uniform hypergraphs of a given maximum degree d,
denoted by D
(u)
r (d). The bounds are of a different nature when u = 2 and when u ≥ 3, so
I shall treat them separately. In Section 6.2, I shall bound D
(2)
r (d), which for convenience
I shall denote simply by Dr(d), and in Section 6.3 I shall bound D
(u)
r (d) for u ≥ 3. In
Section 6.4, I shall suggest some natural open problems following on from these.
6.2. The degree-r chromatic number of a graph
The main theorem that I shall prove in this section is the following.
Theorem 6.1. For integers r, d ≥ 2, it is the case that
(
d
6
) r2
r2−1
r
(
r2
r2−1 log d
) 1
r2−1
< Dr(d) < 2
r3+r+1r4d
r2
r2−1 .
This has the following immediate corollary.
Corollary 6.2. For a fixed integer r ≥ 2, it is the case that
Ω
 d r2r2−1
(log d)
1
r2−1
 ≤ Dr(d) ≤ O(d r2r2−1)
as d→∞. 
The probabilistic method for finding a lower bound is by now familiar from Theo-
rems 5.3, 5.7 and 5.9.
115
116 6. SMALL MINIMUM DEGREE COLOURINGS OF GRAPHS AND HYPERGRAPHS
Theorem 6.3. For n sufficiently large there is a graph G on n vertices with maximum
degree ∆(G) = d satisfying
n
r
≥
(
d
6
) r2
r2−1
r
(
r2
r2−1 log d
) 1
r2−1
= Ω
 d r2r2−1
(log d)
1
r2−1

such that there is no degree-r colouring with at most n
r
colours.
Proof. Let
p = 3
(
log n
n
) 1
r2
,
let n be large and divisible by r2 and let G ∈ Gn,p.
Suppose that f : V (G)→ [n
r
]
is a colouring that uses at most n
r
colours. By discarding
at most r − 1 vertices from each colour class, at least n
r2
monochromatic r-tuples can be
found. Choose n
r2
of these and call them T1, . . . , T n
r2
.
Let Ki,j be the event that the graph induced by the vertices in Ti ∪ Tj contains the
complete bipartite graph with vertex classes Ti and Tj. If any event Ki,j occurs then this
colouring cannot be a degree-r colouring of G. Let Af be the event that f is a degree-r
colouring. Then
P(Af ) ≤ P
(⋂
i<j
Kci,j
)
≤
(
1− pr2
)(n/r22 )
,
since the events Ki,j are independent. Writing E for the event that G can be degree-r
coloured with at most n
r
colours,
P(E) ≤
∑
f :V (G)→[nr ]
P(Af )
≤ nn
(
1− pr2
)(n/r22 )
≤ exp
(
n log n−
(
n/r2
2
)
pr
2
)
= o(1),
6.2. THE DEGREE-r CHROMATIC NUMBER OF A GRAPH 117
which means that there exists a graph G that cannot be degree-r coloured with at most
n
r
colours, as required. By Lemma 4.9, the graph G can be chosen so that, in addition,
the bound in the theorem is satisfied. 
Theorem 6.4. Let G be a graph with ∆(G) = d and let r ≥ 2 be an integer. Then it
is possible to degree-r colour G with at most 2r
3+r+1r4d
r2
r2−1 = O
(
d
r2
r2−1
)
colours.
Proof. Call the tree on r+1 vertices, r of which are leaves, the r-claw (see Figure 6.1);
call the r-tuple {v1, . . . , vr} special if it is the set of leaves of at least d rr+1 r-claws, that
is, if there are at least d
r
r+1 vertices adjacent to all of v1, . . . , vr.
s
s s s s s s
 
 
 
 
 







D
D
D
D
D
T
T
T
T
T
@
@
@
@
@
Figure 6.1. The 6-claw.
Let f : V (G)→ [x] be a random function where each vertex receives a colour from [x]
uniformly at random.
The proof uses the Erdo˝s-Lova´sz local lemma. The bad events to be considered are
as follows:
(1) For each edge uv, let Wuv = {f(u) = f(v)}.
(2) For each special r-tuple v1, . . . , vr, let Xv1,...,vr be the event that f(v1) = . . . =
f(vr).
(3) For each connected bipartite subgraph B ⊂ G with δ(B) ≥ r and |V (B)| = t ≤ r2
not containing a special r-tuple, let Y tB be the event that B receives at most two
colours.
(4) For each connected bipartite subgraph B ⊂ G with |V (B)| = r2 + 1, let ZB be
the event that B receives at most two colours.
If none of the events of type Wuv occurs then f is a proper colouring and every non-
bipartite subgraph receives at least three colours. If in addition none of the events Xv1,...,vr
and Y tB occurs then every bipartite subgraph with minimum degree at least r on at most
r2 vertices receives at least three colours. If none of the events ZB occurs then every
118 6. SMALL MINIMUM DEGREE COLOURINGS OF GRAPHS AND HYPERGRAPHS
bipartite subgraph with minimum degree at least r on at least r2 + 1 vertices receives at
least three colours.
Claim 1. Each vertex is in at most
(1) d edges,
(2) d
r2
r+1 special r-tuples,
(3) t2
t2
2 dt−
2r
r+1 connected bipartite subgraphs B with δ(B) ≥ r and |V (B)| = t ≤ r2
not containing a special r-tuple, and
(4) (r2 + 1)2r
4
dr
2
bipartite subgraphs B with |V (B)| = r2 + 1.
Proof of claim. (1) The maximum degree ∆(G) = d.
(2) The vertex v is a leaf in a most dr r-claws: there are at most d choices for the
root and then at most d choices for each of the other r − 1 leaves. Each special
r-tuple that v is in provides at least d
r
r+1 r-claws, and so v is in at most
dr
d
r
r+1
= d
r2+r−r
r+1 = d
r2
r+1
special r-tuples.
(3) There are at most
2(
t
2) ≤ 2 t
2
2
graphs on t vertices, so certainly at most that many connected bipartite graphs
B on t vertices with δ(B) ≥ r. For each vertex w ∈ V (B), I shall show that v is
in at most dt−
2r
r+1 copies of B in the position of w, which is enough since there
are at most t choices for w.
In the bipartite graph B, let V1 be the smaller vertex class and let V2 be the
larger. (If they are equally sized then either labelling will do.) Let s1 = |V1| and
s2 = |V2|. I shall first construct a set U ⊂ V2 with |U | ≤ s2 − r + 1 such that
w ∈ V1 ∪ U and the graph induced by the vertices in V1 ∪ U is connected (see
Figure 6.2).
If w ∈ V2 then let u1 = w; otherwise, let u1 be any neighbour of w, so that in
either case u1 ∈ V2. Let X1 = Γ(u1), so that w ∈ {u1} ∪X1 and |X1| ≥ r. Since
B is a connected graph, there exists some u2 ∈ V2 with neighbours in both X1
6.2. THE DEGREE-r CHROMATIC NUMBER OF A GRAPH 119
ss
ss
ss
ss
ss
ss((((((((((((
























((
((
((
ppp ppp ppp ppp ppp ppp
6
?
6
?
≥ r
≤ s1 − r
U
V2 \ U
V1 V2
Figure 6.2. A set U ⊂ V2 such that the graph induced by the vertices in
V1 ∪ U is connected, and |U | ≤ s2 − r + 1 and |V2 \ U | ≥ r − 1.
and V1 \X1. Let X2 = X1 ∪ Γ(u2). Since u2 has neighbours in V1 \X1, it is the
case that |X2| ≥ r + 1, and since u2 has neighbours in X1, the graph induced by
the vertices in {u1, u2} ∪X2 is connected.
Repeat this process, at each step picking a vertex ui+1 with neighbours in
both Xi and V1 \ Xi and then defining Xi+1 = Xi ∪ Γ(ui+1). By the choice of
ui+1, it is always the case that |Xi+1| ≥ r + i and that the graph induced by
{u1, . . . , ui+1} ∪Xi+1 is connected.
The process terminates after j steps when Xj = V1. Let U = {u1, . . . , uj}.
Since s2 ≥ s1 = |V1| = |Xj| ≥ r+ j− 1, the inequality |U | = j ≤ s2− r+ 1 holds,
as required.
Given w, there are at most ds1−1+j ways of choosing the s1− 1 + j vertices in(
V1 ∪ U
) \ {w}. Since all the vertices in V1 have now been chosen and there are
no special r-tuples, there are then at most
(
d
r
r+1
)s2−j
ways of picking the s2 − j
vertices in V2 \ U , which gives a total of at most
ds1−1+j+
(s2−j)r
r+1 = d
r(s1+s2)+s1+j−r−1
r+1
ways. Since j ≤ s2 − r + 1, this gives at most
d
r(s1+s2)+s1+s2−2r
r+1 = d
(r+1)t−2r
r+1 = dt−
2r
r+1
ways, as required.
(4) There are at most
2(
r2+1
2 ) ≤ 2r4
120 6. SMALL MINIMUM DEGREE COLOURINGS OF GRAPHS AND HYPERGRAPHS
graphs on r2 + 1 vertices, so certainly at most that many connected bipartite
graphs B on r2 + 1 vertices. For each such B, there are at most r2 + 1 choices
for where to put v, up to isomorphism. Working through the remaining vertices
one by one, each time picking a vertex that is adjacent to one of those already
considered, there are then at most dr
2
choices for the other r2 vertices, which
means that v is in at most (r2 + 1)2r
4
dr
2
such graphs.

By multiplying the number of vertices in an edge, r-tuple or bipartite graph by the
number of edges, special r-tuples, and bipartite graphs that each vertex can be in, the
number of edges in the dependency graph from an event of each type to events of each
type is at most that shown in the table below.
to
Wuv Xv1,...,vr Y
t
B ZB
Wuv 2d 2d
r2
r+1 t2
t2
2
+1dt−
2r
r+1 (r2 + 1)2r
4+1dr
2
from Xv1,...,vr rd rd
r2
r+1 rt2
t2
2 dt−
2r
r+1 r(r2 + 1)2r
4
dr
2
Y tB r
2d r2d
r2
r+1 r2t2
t2
2 dt−
2r
r+1 r2(r2 + 1)2r
4
dr
2
ZB (r
2 + 1)d (r2 + 1)d
r2
r+1 (r2 + 1)t2
t2
2 dt−
2r
r+1 (r2 + 1)22r
4
dr
2
Claim 2. The probabilities of the events of each type satisfy the following equalities
and inequalities:
(1) P(Wuv) = 1x ,
(2) P(Xv1,...,vr) = 1xr−1 ,
(3) P(Y tB) ≤ 2
t
xt−2 , and
(4) P(ZB) ≤ 2r
2
xr2−1
.
Proof of claim. This is straightforward, and proved in the same way as Claim 2
in the proof of Theorem 5.11. 
As in the proofs of Theorems 5.4 and 5.11, the weightings used are double the pro-
babilities or bounds for the probabilities given in Claim 2, and it is enough to prove the
6.3. THE DEGREE-r CHROMATIC NUMBER OF A HYPERGRAPH 121
condition in the Erdo˝s-Lova´sz local lemma for the events ZB, that is,
(
1− 2
x
)(r2+1)d(
1− 2
xr−1
)(r2+1)d r2r+1 r2∏
t=2r
(
1− 2
t+1
xt−2
)(r2+1)t2 t22 dt− 2rr+1
(
1− 2
r2+1
xr2−1
)(r2+1)22r4dr2
≥ 1
2
.
Since there are at most r2 factors in this product, it suffices to find x such that each factor
is at least 1 − 1
2r2
. The power of d in the order of magnitude of x necessitated by each
term (
1− 2
t+1
xt−2
)(r2+1)t2 t22 dt− 2rr+1
is
t− 2r
r+1
t−2 . This is maximized in the range 2r ≤ t ≤ r2 when t = 2r, when it is equal to
2r − 2r
r+1
2r − 2 =
2r2 + 2r − 2r
2(r − 1)(r + 1) =
r2
r2 − 1 .
It therefore suffices to choose any
x ≥ max
{
4r2(r2 + 1)d, r−1
√
4r2(r2 + 1)d
r2
r2−1 ,
2r−2
√
2
r4
2
+r2+2r4(r2 + 1)r2r3+2d
r2
r2−1 ,
r2−1
√
2r4+r2+2r2(r2 + 1)2d
r2
r2−1
}
.
Since r ≥ 2, it is enough to take x = 2r3+r+1r4d r
2
r2−1 . 
6.3. The degree-r chromatic number of a hypergraph
There are (at least) two possible definitions of what constitutes a proper vertex-
colouring of a uniform hypergraph: one where each edge must be multicoloured and
one where each edge cannot be monochromatic. I shall consider the first case, and prove
that degree-r colouring a u-uniform hypergraph for u ≥ 3 is essentially no harder than en-
suring that all of its edges are multicoloured. Colouring the vertices greedily in any order
shows that the vertices of a u-uniform hypergraph H with maximum degree ∆(H) = d
can be coloured with at most (u − 1)d + 1 colours, and I shall show that a number of
colours linear in d is sufficient to degree-2 colour and therefore to degree-r colour H for
all r.
122 6. SMALL MINIMUM DEGREE COLOURINGS OF GRAPHS AND HYPERGRAPHS
Recall that a degree-r colouring of a u-uniform hypergraph is a colouring such that
every edge is multicoloured and every subhypergraph with minimum degree at least r
receives at least u + 1 colours. In this section, I shall bound the maximum possible
degree-r chromatic number of hypergraphs of a given maximum degree d, denoted by
D
(u)
r (d).
Theorem 6.5. For integers u ≥ 3, r ≥ 2 and d ≥ 2, it is the case that(
1− 1
u
)
d < D(u)r (d) ≤ D(u)2 (d) < 3duu.
This has the following immediate corollary.
Corollary 6.6. For fixed integers u ≥ 3, r ≥ 2, it is the case that
D(u)r (d) = Θ(d)
as d→∞. 
s
s
s
s
s s
s s
Figure 6.3. The desired hypergraph is formed by superimposing one copy
of this hypergraph for each vertex.
For every edge to be multicoloured, it might be necessary to use at least
(
1− 1
u
)
d
colours. Indeed, for any m ≥ 2, let H be a u-uniform hypergraph with V (H) = [m(u−1)],
for 0 ≤ j ≤ m− 1 let
Ij =
{
j(u− 1) + 1, j(u− 1) + 2, . . . , (j + 1)(u− 1)}
6.3. THE DEGREE-r CHROMATIC NUMBER OF A HYPERGRAPH 123
and let
E(H) =
{{i} ∪ Ij : i /∈ Ij}
(see Figure 6.3). Then each vertex i is in an edge for each of the m − 1 intervals Ij not
containing i and in an edge for each of the (m − 1)(u − 1) vertices not in the interval
containing i. Therefore,
d = m− 1 + (m− 1)(u− 1) = (m− 1)u
and every pair of vertices is in at least one common edge, so the number of colours needed
to ensure that every edge is multicoloured is
(u− 1)m = (m− 1)(u− 1) + u− 1 =
(
1− 1
u
)
d+ u− 1.
The proof of the upper bound using the Erdo˝s-Lova´sz local lemma is easier than those
up to this point, in the sense that there are only two types of bad event.
Theorem 6.7. Let H be a u-uniform hypergraph with ∆(G) = d. Then it is possible
to degree-2 colour H with at most 3duu colours.
Proof. Let f : V (G)→ [x] be a random function where each vertex receives a colour
from [x] uniformly at random.
The proof uses the Erdo˝s-Lova´sz local lemma. The bad events to be considered are
as follows:
(1) For all v1, v2 with v1 and v2 in some common edge, let Wv1,v2 = {f(v1) = f(v2)}.
(2) For every pair of edges e1, e2 with 1 ≤ |e1 ∩ e2| ≤ u − 2, let Xe1,e2 be the event
that the two edges receive the same multiset of colours.
If none of the events Wv1,v2 occurs then f is a multicolouring and every non-u-partite
subhypergraph receives at least u+ 1 colours. In the claim below, I shall show that every
connected u-partite u-uniform hypergraph with minimum degree at least 2 contains two
edges e1, e2 with 1 ≤ |e1 ∩ e2| ≤ u− 2. Therefore, if in addition none of the events Xe1,e2
occurs then every subhypergraph with minimum degree at least 2 receives at least u + 1
colours.
124 6. SMALL MINIMUM DEGREE COLOURINGS OF GRAPHS AND HYPERGRAPHS
Claim 1. Let K be a connected u-partite u-uniform hypergraph with δ(K) ≥ 2, where
u ≥ 3. Then there exist e1, e2 ∈ E(K) such that 1 ≤ |e1 ∩ e2| ≤ u− 2.
Proof of claim. Suppose that there are no two edges e1, e2 ∈ E(K) such that
1 ≤ |e1 ∩ e2| ≤ u − 2. Then for any two distinct edges e1 6= e2, either e1 ∩ e2 = ∅ or
|e1 ∩ e2| = u− 1.
Pick any edge e1 ∈ E(K). Since K is connected, there is an edge e2 ∈ E(K) with
|e1 ∩ e2| = u − 1. Let v1 be the vertex in e1 \ e2, let v2 be the vertex in e2 \ e1 and let
S = e1 ∩ e2. Since K is u-partite, v1 and v2 cannot be in any common edge.
s s s
s s
v1 v2 v3
S
Figure 6.4. The condition on the sizes of the intersections forces K to
have this structure, which contradicts δ(K) ≥ 2.
There must be at least three edges, as otherwise d(v1) = d(v2) = 1, which contradicts
δ(K) ≥ 2. Thus there is a third edge e3 that intersects at least one of e1 and e2, so
without loss of generality |e1 ∩ e3| = u− 1. Since u < 2(u− 1) for u ≥ 3, this means that
e2 ∩ e3 6= ∅ and so |e2 ∩ e3| = u− 1. Since |e1 ∩ e3| = |e2 ∩ e3|, the edge e3 must contain
both v1 and v2 or neither. The former is not possible as K is u-partite, so it must be that
e1 ∩ e2 = e1 ∩ e3 = e2 ∩ e3 = S, and e3 = S ∪ {v3} for some v3 (see Figure 6.4).
Since K is connected, the argument can be repeated and must eventually include all
edges, so that each edge ei = S ∪ {vi} for some vertex vi. But then the degree d(vi) = 1
for all i, which contradicts δ(K) ≥ 2. 
Claim 2. Each vertex v is in at most
(1) (u− 1)d pairs of vertices in a common edge, and
(2) ud2 pairs of edges e1, e2 with 1 ≤ |e1 ∩ e2| ≤ u− 2.
6.3. THE DEGREE-r CHROMATIC NUMBER OF A HYPERGRAPH 125
Proof of claim. (1) Since ∆(H) = d, the vertex v is in at most d edges, each
of which contains u−1 more vertices (not necessarily different from those in other
edges).
(2) Since ∆(H) = d, the vertex v is in at most
(
d
2
)
< d2 pairs of edges whose
intersection contains v. By choosing one of the edges containing v, one of the
other vertices in that edge and then one of the other edges containing that vertex,
the vertex v is in at most (u− 1)d2 pairs of intersecting edges whose intersection
does not contain v.

Note that the bounds on |e1 ∩ e2| were not needed to obtain the bound in (2); this
condition is in the statement of the claim only as a reminder of the type of event being
considered.
By multiplying the number of vertices in a pair of vertices or edges by the number
of pairs of vertices and edges that each vertex can be in, the number of edges in the
dependency graph from an event of each type to events of each type is at most that
shown in the table below.
to
Wv1,v2 Xe1,e2
from Wv1,v2 2(u− 1)d 2ud2
Xe1,e2 (2u− 1)(u− 1)d (2u− 1)ud2
Claim 3. The probabilities of the events of each type satisfy the following equality and
inequality:
(1) P(Wv1,v2) = 1x and
(2) P(Xe1,e2) ≤ (u−1)!x2 .
Proof of claim. (1) This is true since f is a uniformly random colouring with
colours taken from [x].
(2) Let q = |e1 ∩ e2|, so that 1 ≤ q ≤ u− 2. For each bijection pi : e1 \ e2 → e2 \ e1,
let
Api =
{
f(v) = f
(
pi(v)
)
for all v ∈ e1 \ e2
}
126 6. SMALL MINIMUM DEGREE COLOURINGS OF GRAPHS AND HYPERGRAPHS
be the event that the colourings of the edges e1 and e2 match up in this order,
and let E be the event that the colourings of the edges e1 and e2 are the same in
any order, so that
E =
⋃
pi:e1\e2→e2\e1
Api.
(However, this is not a partition, since if f is not a multicolouring, then more
than one event Api might hold.) Then
P(E) ≤
∑
pi:e1\e2→e2\e1
P
(
Api
)
=
(u− q)!
x(u−q)
≤ (u− 1)!
x2
.

As in the proofs of Theorems 5.4, 5.11 and 6.4, the weightings used are double the
probabilities or bounds for the probabilities given in Claim 2, and it is enough to prove
the condition in the Erdo˝s-Lova´sz local lemma for the events Xe1,e2 , that is,(
1− 2
x
)(2u−1)(u−1)d(
1− 2(u− 1)!
x2
)(2u−1)ud2
≥ 1
2
.
It suffices to find x for which both factors are at least 3
4
, for which any
x ≥ max
{
8(2u− 1)(u− 1)d,
√
8(u− 1)!(2u− 1)ud
}
will do. Since u ≥ 3, it is enough to take x = 3duu. 
6.4. Open problems
As in Chapter 5, I have only tried to find the order of magnitude of the degree-r
chromatic numbers of graphs and u-uniform hypergraphs as functions of their maximum
degree d, and it would be nice to have tighter bounds as functions of r and u as well.
An alternative way of describing a degree-r colouring is that any 2-coloured subgraph
has minimum degree at most r. What if the graph induced by any two classes has
maximum degree at most r?
Over the course of the last two chapters, I have introduced an analogue of the acyclic
chromatic number by varying the following one at a time:
6.4. OPEN PROBLEMS 127
(1) the minimum length l of the cycles that we care about (or more generally the
minimum order l of the subgraphs we care about),
(2) the minimum number of colours c that these subgraphs should receive,
(3) the minimum degree r of these subgraphs, and
(4) the number of vertices u in an edge.
In addition to these, for u ≥ 3 one could relax the condition from every edge being
multicoloured to no edge being monochromatic.
The ultimate goal is a result including all five of these factors, but for now even a
result taking more than one into account would constitute progress. As well as uniting
some or all of these, one could merge them with many of the ideas from Chapter 4. For
example, there are obvious list and edge analogues. Finding exact best possible bounds
for the various quantities as functions of the graph’s maximum degree d for small d would
also be good.
Beyond that, there is still plenty of scope for varying the problems posed; in this part
of this dissertation, I have only attempted to bound them as functions of the graph’s
maximum degree. As with the original work on acyclic colourings, it would be interesting
to find bounds for graphs drawn on different surfaces, for graphs with large girth, and for
random graphs, for example.

Conclusion
In this dissertation, I proved eight main results, four in each part.
Part 1: Problems of optimal choice on posets
In Chapter 2, I found an optimal strategy for the secretary problem when the candi-
dates arem pairs of identical twins, its probability of success and the asymptotic behaviour
of both of these.
Result 1 (Theorem 2.1). For m ∈ N, let
km = min
{
k :
2m
k
+
m−1∑
j=k
1
j
≤ 5
}
.
An optimal strategy for the secretary problem on m pairs of identical twins is to wait until
candidates have been seen from at least km of the pairs and then to pick the next candidate
who is the best so far and whose twin has already been seen. Asymptotically,
lim
m→∞
km
m
=
1
x0
≈ 0.4709,
where x0 is the unique solution to 2x+ log x = 5, and the probability of success tends to
1
x0
+
4(x0 − 1)2
3x0
((
x0
x0 − 1
) 1
2
− 1
)
≈ 0.7680.
For its extension to c-tuplets, I found an optimal strategy and its probability of success,
and bounded the behaviour of these.
Result 2 (Theorem 2.2). For c,m ∈ N with c ≥ 2, let
k(c)m = min
{
k :
m−k∑
j=1
[(
m−j−1
k−1
)(
m−1
k−1
) j∏
i=2
(
1− 1(ci
c
))] ≤ 1} .
An optimal strategy for the secretary problem on m sets of identical c-tuplets is to wait
until candidates have been seen from at least k
(c)
m of the c-tuples and then to pick the next
129
130 CONCLUSION
candidate who is the best so far and all of whose c-tuplets have already been seen. For all
m, (
1
2
− c+ 1
2(2c+1(c− 1)− c− 1)
)
m < k(c)m ≤
⌈m
2
⌉
,
and the probability of success is at least
1− c+ 1
2c(c− 1) .
In Chapter 3, I found a strategy for the secretary problem on a poset where the number
of elements n and the number of which are maximal k are both known and the width
is equal to k, and showed that it gave the best possible probability of success for this
problem.
Result 3 (Theorem 3.1). Let (P,≺) be a poset with k maximal elements and of width
k. Then there is an algorithm for the secretary problem on (P,≺) depending only on |P |
and k that is successful with probability at least pk, where
pk =

1
e
if k = 1,
k−1
√
1
k
if k > 1,
and these are the best possible such bounds.
For a poset whose width is not necessarily equal to its number of maximal elements
k, I found a strategy that is successful with probability 1
e
, which is the best possible
probability of success for the classical secretary problem and therefore best possible in
this situation.
Result 4 (Theorem 3.2). Let (P,≺) be a poset with k maximal elements. Then there
is an algorithm for the secretary problem on (P,≺) depending only on |P | and k that is
successful with probability at least 1
e
, and this is the best possible such bound.
Part 2: Generalizations of acyclic colourings
In Chapter 5, I found an upper bound for the number of colours needed to colour a
graph G properly in such a way that every cycle of length at least l receives at least three
PART 2: GENERALIZATIONS OF ACYCLIC COLOURINGS 131
colours, as a function of its maximum degree d, and showed that it was best possible up
to a logarithmic factor.
Result 5 (Corollary 5.2). For a fixed integer m ≥ 2, it is the case that
Ω
(
d
2m
2m−1
(log d)
1
2m−1
)
≤ A(2m)(d) = A(2m−1)(d) ≤ O
(
d
2m
2m−1
)
as d→∞.
I found an upper bound for the number of colours needed to colour a graph G properly
in such a way that every cycle receives at least c colours, as a function of its maximum
degree d, and showed that it was best possible up to a constant factor.
Result 6 (Corollary 5.6). For a fixed integer k ≥ 2, it is the case that
A2k(d), A2k+1(d) = Θ(d
k)
as d→∞.
In Chapter 6, I found an upper bound for the number of colours needed to colour
a graph G properly in such a way that every subgraph with minimum degree at least r
receives at least three colours, as a function of its maximum degree d, and showed that it
was best possible up to a logarithmic factor.
Result 7 (Corollary 6.2). For a fixed integer r ≥ 2, it is the case that
Ω
 d r2r2−1
(log d)
1
r2−1
 ≤ Dr(d) ≤ O(d r2r2−1)
as d→∞.
I found an upper bound for the number of colours needed to colour a u-uniform
hypergraph H in such a way that every edge is multicoloured and every subhypergraph
of minimum degree at least r receives at least three colours, as a function of its maximum
degree d, and showed that it was best possible up to a constant factor.
Result 8 (Corollary 6.6). For fixed integers u ≥ 3, r ≥ 2, it is the case that
D(u)r (d) = Θ(d)
132 CONCLUSION
as d→∞.
These results are part of a natural progression from earlier work, and suggest possible
directions of future research.
Bibliography
1. M. O. Albertson and D. M. Berman, The acyclic chromatic number, Proceedings of the seventh
southeastern conference on combinatorics, graph theory, and computing, 1976, pp. 51–69.
2. , Every planar graph has an acyclic 7-coloring, Israel J. Math. 28 (1977), no. 1, 169–174.
3. , An acyclic analogue to Heawood’s theorem, Glasg. Math. J. 19 (1978), no. 2, 163–166.
4. N. Alon, C. McDiarmid, and B. Reed, Acyclic coloring of graphs, Random Structures & Algorithms
2 (1991), no. 3, 277–288.
5. N. Alon and J. H. Spencer, The probabilistic method, Wiley-Interscience, 2000.
6. N. Alon, B. Sudakov, and A. Zaks, Acyclic edge colorings of graphs, J. Graph Theory 37 (2001),
no. 3, 157–167.
7. K. Appel and W. Haken, Every planar map is four colorable, Bull. Amer. Math. Soc. 82 (1976),
no. 5, 711–712.
8. , Every planar map is four colorable. Part I: discharging, Illinois J. Math. 21 (1977), no. 3,
429–490.
9. K. Appel, W. Haken, and J. Koch, Every planar map is four colorable. Part II: reducibility, Illinois
J. Math. 21 (1977), no. 3, 491–567.
10. B. Bolloba´s, The diameter of random graphs, Trans. Amer. Math. Soc. 267 (1981), no. 1, 41–52.
11. , Modern graph theory, Springer Verlag, 1998.
12. , Random graphs, Cambridge University Press, 2001.
13. O. V. Borodin, On acyclic colorings of planar graphs, Discrete Math. 25 (1979), no. 3, 211–236.
14. O. V. Borodin, D. G. Fon-Der Flaass, A. V. Kostochka, A. Raspaud, and E. Sopena, Acyclic list
7-coloring of planar graphs, J. Graph Theory 40 (2002), no. 2, 83–90.
15. O. V. Borodin, A. V. Kostochka, and D. R. Woodall, Acyclic colourings of planar graphs with large
girth, J. Lond. Math. Soc. 60 (1999), no. 2, 344–352.
16. R. L. Brooks, On colouring the nodes of a network, Math. Proc. Cambridge Philos. Soc. 37 (1941),
no. 2, 194–197.
17. M. I. Burnstein, Every 4-valent graph has an acyclic 5-coloring, Soobsˇcˇ Akad. Nauk Gruzin. SSR 93
(1979), no. 1, 21–24 (Russian).
18. A. Cayley, Mathematical questions with their solutions, The Educational Times 23 (1875), 18–19.
19. G. Chartrand, D. Geller, and S. Hedetniemi, Graphs with forbidden subgraphs, J. Combin. Theory
Ser. B 10 (1971), no. 1, 12–41.
133
134 BIBLIOGRAPHY
20. G. Chartrand, H. Kronk, and C. Wall, The point-arboricity of a graph, Israel J. Math. 6 (1968), no. 2,
169–175.
21. Y. S. Chow, S. Moriguti, H. Robbins, and S. M. Samuels, Optimal selection based on relative rank
(the “secretary problem”), Israel J. Math. 2 (1964), no. 2, 81–90.
22. Y. S. Chow, H. Robbins, and D. Siegmund, The theory of optimal stopping, Dover, 1991.
23. K. L. Chung, A course in probability theory, Academic Press, 2001.
24. R. P. Dilworth, A decomposition theorem for partially ordered sets, Ann. of Math. 51 (1950), no. 1,
161–166.
25. P. Erdo˝s and L. Lova´sz, Problems and results on 3-chromatic hypergraphs and some related questions,
Infinite and finite sets, 1975, pp. 609–627.
26. T. S. Ferguson, Who solved the secretary problem?, Statist. Sci. 4 (1989), no. 3, 282–289.
27. A. Q. Frank and S. M. Samuels, On an optimal stopping problem of Gusein-Zade, Stochastic Process.
Appl. 10 (1980), no. 3, 299–311.
28. P. R. Freeman, The secretary problem and its extensions: a review, Int. Stat. Rev. 51 (1983), no. 2,
189–206.
29. R. Freij and J. Wa¨stlund, Partially ordered secretaries, Electron. Commun. Probab. 15 (2010), 504–
507.
30. J. Galambos, Advanced probability theory, CRC, 1995.
31. F. Galvin, The list chromatic index of a bipartite multigraph, J. Combin. Theory Ser. B 63 (1995),
no. 1, 153–158.
32. M. Gardner, Mathematical games, Sci. Amer. 202 (1960), no. 2, 152.
33. , Mathematical games, Sci. Amer. 202 (1960), no. 3, 178–179.
34. B. J. Garrod, G. Kubicki, and M. Morayne, How to choose the best twins, SIAM J. Discrete Math.,
to appear.
35. B. J. Garrod and R. D. Morris, The secretary problem on an unknown poset, submitted.
36. N. Georgiou, M. Kuchta, M. Morayne, and J. Niemiec, On a universal best choice algorithm for
partially ordered sets, Random Structures & Algorithms 32 (2008), no. 3, 263–273.
37. S. Gerke, C. Greenhill, and N. Wormald, The generalized acyclic edge chromatic number of random
regular graphs, J. Graph Theory 53 (2006), no. 2, 101–125.
38. N. Ghoussoub, An integral representation of randomized probabilities and its applications, Se´minaire
de Probabilite´s XVI 1980/81, 1982, pp. 519–543.
39. J. Gianini-Pettitt, Optimal selection based on relative ranks with a random number of individuals,
Adv. in Appl. Probab. 11 (1979), no. 4, 720–736.
40. J. P. Gilbert and F. Mosteller, Recognizing the maximum of a sequence, J. Amer. Statist. Assoc. 61
(1966), no. 313, 35–73.
BIBLIOGRAPHY 135
41. A. V. Gnedin, A multicriteria problem of optimal stopping of a selection process, Autom. Remote
Control 42 (1981), 981–986.
42. , Multicriteria extensions of the best choice problem: sequential selection without linear order,
Strategies for sequential search and selection in real time, 1992, pp. 153–172.
43. C. Greenhill and O. Pikhurko, Bounds on the generalised acyclic chromatic numbers of bounded degree
graphs, Graphs Combin. 21 (2005), no. 4, 407–419.
44. B. Gru¨nbaum, Acyclic colorings of planar graphs, Israel J. Math. 14 (1973), no. 4, 390–408.
45. S. M. Gusein-Zade, The problem of choice and the optimal stopping rule for a sequence of independent
trials, Theory Probab. Appl. 11 (1966), no. 3, 472–476.
46. I. Guttman, On a problem of L. Moser, Canad. Math. Bull. 3 (1960), 35–39.
47. P. J. Heawood, Map colour theorem, Q. J. Math. 24 (1890), 332–338.
48. J. Kahn, Asymptotics of the list-chromatic index for multigraphs, Random Structures & Algorithms
17 (2000), no. 2, 117–156.
49. W. Kaz´mierczak, An algorithm that chooses the maximum element of a partially ordered set, Master’s
thesis, Politechnika Wroc lawska, 2006 (Polish).
50. A. V. Kostochka, Acyclic 6-coloring of planar graphs, Diskret. Analiz. 28 (1976), 40 (Russian).
51. , Upper bounds of chromatic functions of graphs, 1978.
52. K. Kothapalli, S. Varagani, V. C. Venkaiah, and K. Yadav, Acyclic vertex coloring of graphs of
maximum degree 5, International conference on graph theory and its applications, 2008.
53. , Acyclic vertex coloring of graphs of maximum degree 6, LAGOS: Latin-American algorithms,
graphs and optimization symposium, 2009.
54. J. Kozik, Dynamic threshold strategy for universal best choice problem, 21st international meeting on
probabilistic, combinatorial, and asymptotic methods in the analysis of algorithms, 2010, pp. 439–452.
55. G. Kubicki and M. Morayne, Graph-theoretic generalization of the secretary problem: the directed
path case, SIAM J. Discrete Math. 19 (2005), no. 3, 622–632.
56. M. Kuchta and M. Morayne, A secretary problem with many lives, submitted.
57. D. V. Lindley, Dynamic programming and decision theory, J. R. Stat. Soc. Ser. C. Appl. Stat. 10
(1961), no. 1, 39–51.
58. J. Mitchem, Every planar graph has an acyclic 8-coloring, Duke Math. J. 41 (1974), no. 1, 177–181.
59. M. Molloy and B. Reed, Further algorithmic aspects of the local lemma, Proceedings of the thirtieth
annual ACM symposium on theory of computing, 1998, pp. 524–529.
60. M. Morayne, Partial-order analogue of the secretary problem: the binary tree case, Discrete Math.
184 (1998), no. 1–3, 165–181.
61. M. Morayne and M. Sulkowska, Graph-theoretic generalization of the best choice problem; randomized
analysis of a simple effective algorithm for k-ary trees, submitted.
136 BIBLIOGRAPHY
62. L. Moser, On a problem of Cayley, Scripta Mathematica 22 (1956), no. 5, 289–292.
63. R. Muthu, N. Narayanan, and C. R. Subramanian, Improved bounds on acyclic edge colouring, Dis-
crete Math. 307 (2007), no. 23, 3063–3069.
64. C. St.J. A. Nash-Williams, Edge-disjoint spanning trees of finite graphs, J. Lond. Math. Soc. 36
(1961), no. 1, 445–450.
65. , Decomposition of finite graphs into forests, J. Lond. Math. Soc. 39 (1964), no. 1, 12.
66. J. Nesˇetrˇil and N. C. Wormald, The acyclic edge chromatic number of a random d-regular graph is
d + 1, J. Graph Theory 49 (2005), no. 1, 69–74.
67. J. D. Petruccelli, Best-choice problems involving uncertainty of selection and recall of observations,
J. Appl. Probab. 18 (1981), no. 2, 415–425.
68. J. Preater, The best-choice problem for partially ordered objects, Oper. Res. Lett. 25 (1999), no. 4,
187–190.
69. E. L. Presman and I. M. Sonin, The best choice problem for a random number of objects, Theory
Probab. Appl. 17 (1973), no. 4, 657–668.
70. M. Przykucki, Optimal stopping in a search for a vertex with full degree in a random graph, submitted.
71. M. Przykucki and M. Sulkowska, Gusein-Zade problem for directed path, Discrete Optim. 7 (2010),
no. 1–2, 13–20.
72. G. Ringel and J. W. T. Youngs, Solution of the Heawood map-coloring problem, Proc. Natl. Acad.
Sci. USA 60 (1968), no. 2, 438–445.
73. S. Skulrattanakulchai, Acyclic colorings of subcubic graphs, Inform. Process. Lett. 92 (2004), no. 4,
161–167.
74. M. H. Smith, A secretary problem with uncertain employment, J. Appl. Probab. 12 (1975), no. 3,
620–624.
75. W. Stadje, Efficient stopping of a random series of partially ordered points, Multiple criteria decision
making theory and applications, 1980, pp. 430–447.
76. C. Thomassen, Every planar graph is 5-choosable, J. Combin. Theory Ser. B 62 (1994), no. 1, 180–181.
77. J. Tkocz, The best choice problem for almost linear orders, submitted.
78. V. G. Vizing, On an estimate of the chromatic class of a p-graph, Diskret. Analiz. 3 (1964), 25–30
(Russian).
79. M. Voigt, List colourings of planar graphs, Discrete Math. 120 (1993), no. 1–3, 215–219.
80. M. C. K. Yang, Recognizing the maximum of a random sequence based on relative rank with backward
solicitation, J. Appl. Probab. 11 (1974), no. 3, 504–512.