Problems of optimal choice on posets and generalizations of acyclic colourings Bryn Garrod Department of Pure Mathematics and Mathematical Statistics Trinity College University of Cambridge This dissertation is submitted for the degree of Doctor of Philosophy 3rd May 2011 For Raphae¨le Acknowledgements I am very grateful to my supervisor, Be´la Bolloba´s, for everything that he has done for me over the last three and a half years, from offering me professional assistance, support and advice to organizing research visits to the Re´nyi Institute in Budapest and to the University of Memphis. He has kept me going in the right direction when I have lost my way or my motivation, asked crucial questions when I have been stuck, and of course taught me and introduced me to a wealth of mathematical knowledge and understanding. At the same time, he has become an integral part of my social life, whether sharing coffee and biscuits in the Pavilion E common room, or inviting me to parties or a Super Bowl in his home. I have enjoyed the many non-mathematical conversations that we have had, even if my lack of cultural knowledge is a constant disappointment! I should like to thank my collaborators Grzegorz Kubicki, Micha l Morayne and Rob Morris. I am particularly grateful to Micha l and to Jan Tkocz for their warm hospitality when I visited them in Wroc law and to Rob for the help and advice that he has given me with my work far beyond his role as my departmental mentor. Thank you also to Graham Brightwell, whose discussions with Be´la and Rob led to an earlier proof of Theorem 3.2, for his helpful comments on the paper that I wrote with Rob, which became Chapter 3. Thank you to Victor Falgas-Ravry for a useful conversation about Lemma 2.19. I hugely appreciate the constructive comments and corrections provided by my examiners, Andrew Thomason and Mark Walters. I am grateful to the Engineering and Physical Sciences Research Council for funding me for the duration of my Ph.D. and to the Department of Pure Mathematics and Mathe- matical Statistics at the University of Cambridge for providing me with a very pleasant and productive working environment. Thank you to the organizers of the following confe- rences, all of whom supported my attendance financially, and also to the department, Trinity College and the London Mathematical Society for contributing to these costs: v vi ACKNOWLEDGEMENTS the Bristol Summer School on Probabilistic Techniques in Computer Science, Building Bridges, the Fete of Combinatorics and Computer Science, the LMS-EPSRC Short Course on Probabilistic Combinatorics, the 22nd British Combinatorial Conference, the 14th In- ternational Conference on Random Structures and Algorithms, a conference in honour of the 70th birthday of Endre Szemere´di, and the International Congress of Mathematicians 2010. I have been very lucky with the people whom I have had to guide me through my mathematical studies. I am extremely grateful to my secondary school maths teacher, Peter Jack, for the encouragement and enthusiasm that he provided for more than seven years, and to Tim Gowers and Imre Leader for inspiring me in supervisions and lectures for the next seven. Thank you to the rest of Team Bolloba´s for the coffee times, mini-seminars and confe- rence social life, particularly my good friends Tom Coker, Micha l Przykucki and Misha Tyomkyn. Thank you to all my other friends for their emotional support, particularly Jen Gold, Alex Holyoake, Phil Horler, Ce´line Miani, Paul Smith, David Taylor, Charles Vial and Ioanna Vlahou, with whom I have spent most of my spare time over the last three and a half years. I am hugely indebted to my parents, Martin and Dilys, and my sister and brother, Tania and Ross, for the happy, supportive family background in which I have grown up, and the encouragement that they have always given me. My greatest thanks go to my wife, Raphae¨le, whom I met at the start of my Ph.D. and who is as much a part of it as the combinatorics. Thank you to her for marrying me and for looking after me so well. I am thankful for everything that she has done for me, which is more than I can write here. Declaration of joint work This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration except where specifically indicated in the text. The first five sections of Chapter 2 are based on joint work with Grzegorz Kubicki1 of the University of Louisville, USA, and Micha l Morayne2 of Politechnika Wroc lawska, Poland, and about 50% of this is attributable to me. The work that forms the basis of Chapter 3 was joint with Robert Morris,3 then of the University of Cambridge, UK, but now of IMPA, Brazil, and my contribution was about 75%. 1gkubicki@louisville.edu 2michal.morayne@pwr.wroc.pl 3rob@impa.br vii Abstract This dissertation is in two parts, each of three chapters. In Part 1, I shall prove some results concerning variants of the ‘secretary problem’. In Part 2, I shall bound several generalizations of the acyclic chromatic number of a graph as functions of its maximum degree. I shall begin Chapter 1 by describing the classical secretary problem, in which the aim is to select the best candidate for the post of a secretary, and its solution. I shall then summarize some of its many generalizations that have been studied up to now, provide some basic theory, and briefly outline the results that I shall prove. In Chapter 2, I shall suppose that the candidates come as m pairs of equally qualified identical twins. I shall describe an optimal strategy, a formula for its probability of success and the asymptotic behaviour of this strategy and its probability of success as m→∞. I shall also find an optimal strategy and its probability of success for the analagous version with c-tuplets. I shall move away from known posets in Chapter 3, assuming instead that the candi- dates come from a poset about which the only information known is its size and number of maximal elements. I shall show that, given this information, there is an algorithm that is successful with probability at least 1 e . For posets with k ≥ 2 maximal elements, I shall prove that if their width is also k then this can be improved to k−1 √ 1 k , and show that no better bound of this type is possible. In Chapter 4, I shall describe the history of acyclic colourings, in which a graph must be properly coloured with no two-coloured cycle, and state some results known about them and their variants. In particular, I shall highlight a result of Alon, McDiarmid and Reed, which bounds the acyclic chromatic number of a graph by a function of its maximum degree. My results in the next two chapters are of this form. ix x ABSTRACT I shall consider two natural generalizations in Chapter 5. In the first, only cycles of length at least l must receive at least three colours. In the second, every cycle must receive at least c colours, except those of length less than c, which must be multicoloured. My results in Chapter 6 generalize the concept of a cycle; it is now subgraphs with minimum degree r that must receive at least three colours, rather than subgraphs with minimum degree two (which contain cycles). I shall also consider a natural version of this problem for hypergraphs. Contents Acknowledgements v Declaration of joint work vii Abstract ix Introduction 1 Part 1: Problems of optimal choice on posets 1 Part 2: Generalizations of acyclic colourings 2 Part 1. Problems of optimal choice on posets 5 Chapter 1. The classical secretary problem 7 1.1. The problem 7 1.2. Outline solution 7 1.3. Variants 9 1.4. Formal model and notation 18 1.5. Useful theorems 21 Chapter 2. How to choose the best twins 27 2.1. Introduction 27 2.2. Notation and basic definitions 29 2.3. The optimal stopping time 30 2.4. The probability of success 36 2.5. Asymptotics 40 2.6. How to choose the best c-tuplets 44 2.7. Open problems 52 Chapter 3. The secretary problem on an unknown poset 55 3.1. Introduction 55 xi xii CONTENTS 3.2. Lower bounds 56 3.3. Upper bound 66 3.4. Open problems 80 Part 2. Generalizations of acyclic colourings 83 Chapter 4. Acyclic colourings of graphs 85 4.1. Colouring problems 85 4.2. From Nash-Williams to acyclic colourings 87 4.3. Variants 88 4.4. Generalized acyclic colourings 92 4.5. Definitions and a useful tool 93 Chapter 5. Long acyclic colourings and acyclic colourings with many colours 97 5.1. Introduction 97 5.2. The length-l-acyclic chromatic number 97 5.3. The c-colour acyclic chromatic number 103 5.4. Open problems 113 Chapter 6. Small minimum degree colourings of graphs and hypergraphs 115 6.1. Introduction 115 6.2. The degree-r chromatic number of a graph 115 6.3. The degree-r chromatic number of a hypergraph 121 6.4. Open problems 126 Conclusion 129 Part 1: Problems of optimal choice on posets 129 Part 2: Generalizations of acyclic colourings 130 Bibliography 133 Introduction This dissertation is split into two unrelated parts. In Part 1, I shall consider several problems of optimal choice on posets, which are generalizations of a problem popularly known as the ‘secretary problem’. In Part 2, I shall consider generalizations of acyclic colourings of graphs, a concept whose origins can be traced back to Nash-Williams’s theorem concerning the decomposition of the edge set of a graph into forests. Part 1: Problems of optimal choice on posets The classical secretary problem is as follows. There are n candidates to be interviewed for a position as a secretary. They are interviewed one by one and, after each interview, the interviewer must decide whether or not to accept that candidate. If the candidate is accepted then the process stops, and if the candidate is rejected then the interviewer moves on to the next candidate. The interviewer may only accept the most recently interviewed candidate. At each stage, the interviewer knows the complete ranking of the candidates interviewed so far, all of whom are comparable, but has no other measure of their ability. The interviewer is only interested in finding the very best candidate; selecting any other for the job is considered a failure. The aim is to find a strategy that maximizes the probability that the interviewer chooses the best candidate, under the assumption that the candidates are seen in a uniformly random ordering. In Chapter 1, I shall describe the solution to this problem, that there is a strategy that is successful with probability at least 1 e and that this is asymptotically best possible. I shall also provide more of the historical background of this problem and some of its generalizations up to now. In the rest of Part 1, I shall consider two generalizations of this problem. In Chapter 2, I shall first assume that there are 2m candidates who are in fact m pairs of identical twins, each pair of twins being equally well-qualified for the job. As in the classical problem, the interviewer knows at each stage how the candidates interviewed so far compare with each other, but has no other measure of their ability. I shall describe an optimal strategy, a 1 2 INTRODUCTION formula for its probability of success and the asymptotic behaviour of this strategy and its probability of success as m→∞. I shall also find an optimal strategy and its probability of success for the analagous version on m sets of c-tuplets, and provide bounds that give some indication of their asymptotic behaviour. I shall move from these known posets to unknown posets in Chapter 3. Specifically, I shall assume that the candidates come from a poset whose size n and number of maximal elements k is known, but whose structure is unknown. Here, the interviewer knows the poset induced by the candidates interviewed so far. I shall describe a strategy that is successful on all posets with given n and k with probability at least 1 e , which is the best possible bound when k = 1, but probably not for k > 1. I shall also find a strategy that is successful with probability at least k−1 √ 1 k when the width of the poset is known to be the same as the number of maximal elements k and k ≥ 2. By considering the poset consisting of k disjoint chains, I shall show that no greater probability of success can be guaranteed. Part 2: Generalizations of acyclic colourings An acyclic colouring of a graph is a proper vertex-colouring such that every cycle contains vertices of at least three colours. To put it another way, it is an assignment of colours to the vertices such that the graph induced by the vertices in any colour class must be an independent set, and the graph induced by the vertices in any two colour classes must be a forest. The acyclic chromatic number of a graph is the minimum number of colours needed to colour it acyclically. In Chapter 4, I shall describe how this concept came into being as a generalization of the arboricity of a graph, which is the minimum number of forests into which its edge set can be decomposed, and I shall state some of the many results proved about acyclic colourings up to this point. In particular, I shall highlight a result of Alon, McDiarmid and Reed, which bounds the acyclic chromatic number of a graph by a function of its maximum degree. My results in Part 2 will be of this form. In Chapter 5, I shall consider two generalizations in which the subgraphs under scru- tiny are still cycles. In the first, the extra condition that a proper colouring must satisfy is relaxed so that only cycles of length at least l must receive three colours, that is, the PART 2: GENERALIZATIONS OF ACYCLIC COLOURINGS 3 graph induced by the vertices in any two colour classes does not contain a cycle of length at least l. In the second, I shall strengthen the condition so that every cycle must receive at least c colours, with the obvious exception of cycles of length less than c, which must be multicoloured. In this case, the graph induced by the vertices in any x colour classes with x < c does not contain any cycles of length greater than x. The definitions given so far would work just as well if ‘cycle’ were replaced by ‘2-regular subgraph’ or even ‘subgraph with minimum degree at least 2,’ since cycles fall into both of these categories and any subgraph of either type must contain a cycle. I shall focus my attention in Chapter 6 on subgraphs with minimum degree r; a graph contains at least as many of these as r-regular subgraphs, and indeed is unlikely to have any r-regular subgraphs for large r, so this is more restrictive. In a proper colouring, it is possible for any bipartite subgraph with minimum degree r to receive only two colours; I shall insist that it receive three. I shall also consider the same problem for u-uniform hypergraphs, under the assumption that a proper colouring is one in which every edge is multicoloured. Part 1 Problems of optimal choice on posets CHAPTER 1 The classical secretary problem 1.1. The problem The exact origins of the classical secretary problem are complicated (and the subject of Ferguson’s history of the problem [26]), but the problem was popularized by Martin Gardner [32, 33] in his Scientific American column in February 1960, as the game goo- gol. The problem itself is simple to state, and its ‘secretary problem’ formulation is as follows. There are n candidates to be interviewed for a position as a secretary. They are interviewed one by one and, after each interview, the interviewer must decide whether or not to accept that candidate. If the candidate is accepted then the process stops, and if the candidate is rejected then the interviewer moves on to the next candidate. The interviewer may only accept the most recently interviewed candidate. At each stage, the interviewer knows the complete ranking of the candidates interviewed so far, all of whom are comparable, but has no other measure of their ability. The interviewer is only inter- ested in finding the very best candidate; selecting any other for the job is considered a failure. The aim is to find a strategy that maximizes the probability that the intervie- wer chooses the best candidate, under the assumption that the candidates are seen in a uniformly random ordering. 1.2. Outline solution The solution to the classical secretary problem is now folklore but was first published by Lindley [57], and I shall give an outline of it here. It is obvious that the interviewer should only consider accepting a candidate who is the best seen so far. It is intuitively clear that a candidate should not be accepted very early on even if he or she is the best seen so far, since there is a reasonable probability that a small number of candidates all come from near the bottom of the ranking. Conversely, the interviewer should not wait too long, or the best candidate will probably be missed and 7 8 1. THE CLASSICAL SECRETARY PROBLEM the interviewer will not have the opportunity to select anyone. Furthermore, if a strategy dictates that the ith candidate should be accepted if he or she is the best candidate seen so far, it seems reasonable that the (i+1)th should be accepted in the same circumstances, since more candidates have been seen and that candidate’s credentials are stronger. From these observations, it might be expected that some sort of threshold should be passed before the interviewer considers choosing a candidate. Using backward induction, one can prove exactly that. This will be described in more detail in Section 1.5; for now, I shall assume the following consequence of it without proof. For some k, the strategy “reject the first k candidates, and accept the next who is the best seen so far” is optimal. As an aside, it is worth noting that there might be more than one optimal strategy: for example, when there are exactly two candidates, the two possible deterministic strategies are equivalent, and both are equivalent to tossing a coin to choose between the two candidates. Let W be the event that, using this strategy, the interviewer chooses the best candidate and let Bi be the event that the i th candidate interviewed is the best candidate. Let Ai be the event that the interviewer is still interviewing by the time the ith candidate arrives, that is, that the best of the first i− 1 candidates interviewed is in the first k interviewed. Then Bi and Ai are independent, and the probability of winning is given by P(W ) = n∑ i=k+1 P(Bi)P(W |Bi) = n∑ i=k+1 P(Bi)P(Ai) = n∑ i=k+1 1 n · k i− 1 = k n n−1∑ j=k 1 j . A value of k that maximizes this satisfies k n n−1∑ j=k 1 j ≥ k − 1 n n−1∑ j=k−1 1 j and k n n−1∑ j=k 1 j ≥ k + 1 n n−1∑ j=k+1 1 j , 1.3. VARIANTS 9 that is, n−1∑ j=k 1 j ≥ k − 1 k − 1 = 1 and n−1∑ j=k+1 1 j ≤ k k = 1, from which simple integration arguments give n e − 1 ≤ k ≤ n e + e− 1 e . From this, it is clear that for such k lim n→∞ k n = 1 e and that the probability of winning tends to the same limit. In fact, in Chapter 3, it will become evident that this is a lower bound. 1.3. Variants A problem posed by Cayley [18] may have inspired the classical secretary problem. This is what is now known as the ‘full information’ case, where the candidates’ abilities are represented by real random variables from a known distribution, and the aim is to maximize the expected ability of the chosen candidate. The uniform distribution U [0, 1] was considered by Moser [62]; Guttman [46] found an optimal strategy for a general distribution and also gave an explicit optimal strategy for the normal distribution N(0, 1). His general optimal strategy is to accept a candidate if there are at least m candidates remaining after that one and his or her ability is at least Em, for some (Em)m∈N. For U(0, 1), the first few values of Em are 0.5, 0.625, 0.6953, 0.7417 and 0.775, and for N(0, 1) they are 0, 0.3992, 0.6298, 0.7904 and 0.9127. The expected ability is the first threshold for acceptance, E1. Since 1960, many generalizations of the classical secretary problem have been conside- red. Freeman [28] wrote an extensive review of the area in 1983, which shows how many versions had already been considered by then. I shall describe only some of them, and some more recent results. Besides the full information case, the most obvious generalization might be to be more flexible over what constitutes success, and to try to minimize the expected rank (viewing the best candidate as being from rank 1 and the worst from rank n) rather than insisting 10 1. THE CLASSICAL SECRETARY PROBLEM on choosing the best candidate. This version was solved by Chow, Moriguti, Robbins and Samuels [21]. Perhaps surprisingly, in the limit as n → ∞ the optimal expected rank tends to a constant rather than a multiple of n, namely, ∞∏ j=1 ( j + 2 j ) 1 j+1 ≈ 3.8695. They showed that an optimal strategy is to accept a candidate who is in the top k seen so far as long as at least ik candidates have been seen in total, for some thresholds ik. They also showed that the ik satisfy lim n→∞ ik n = ∞∏ j=k ( j j + 2 ) 1 j+1 . This means that, for large n, once we have seen approximately 26% of candidates we should be prepared to accept the next one who is the best so far, after 45% we should accept anyone who is one of the top two seen so far, after 56% one of the top three, after 64% one of the top four, after 69% one of the top five and so on. At the other end of the scale, once we have seen 99% of candidates we should accept anyone who is in the top 200 and after 99.9% anyone in the top 2000. Yang [80] considered the situation where, as well as being allowed to offer the job to the most recently interviewed candidate, who would accept it, the interviewer can offer the job to any of the other candidates seen so far, who is still available with probability q(r), where q is a known non-increasing function of the number r of candidates interviewed since that one. If a candidate is unavailable, he or she never becomes available again. The aim is to choose the best possible secretary, as in the classical secretary problem. Smith [74] studied a version where the job can only be offered to the currently interviewed candidate, but there is some fixed probability that the candidate will refuse the offer. Petruccelli [67] worked on these two problems simultaneously, that is, Yang’s problem with q(r) still non- increasing but with q(0) no longer forced to be 1. In particular, Petruccelli considered the case where the probabilities form a geometric progression, that is, where q(r) = qpr for some p and q. He proved that there are two cases, depending on p, q and the number n of candidates. If ∑∞ r=0 q(r) = q 1−p > n − 1 then an optimal strategy is to observe all n candidates and then to offer the job to the 1.3. VARIANTS 11 best one. If q 1−p ≤ n − 1, then an optimal strategy is to wait until sn candidates have been seen, for some sn, and then to offer the job to the best one seen so far. If this one is unavailable, then the interviewer should continue interviewing and offer the job to the next candidate who is the best seen so far. If this one is unavailable, then the interviewer should continue as before, and so on. He gave an explicit formula for sn, namely, the smallest value of s for which n−1∏ k=s+1 ( 1 + 1− q k ) ≤ [ q ( 1 + 1− q s(1− p) )]−1 . He also showed that both sn n and the probability of success tend to q 1 1−q as n→∞. This is independent of p, which means that for large n there is effectively no benefit to being allowed to recall a previous candidate, since p can be arbitrarily small. However, this is not surprising, since if p is fixed then only a constant number of candidates are likely to be available at any one point, even as n→∞. Gusein-Zade [45] allowed selection of any of the top r candidates to count as success, and showed that an optimal strategy is of the same form as in the expected rank case, that is, an optimal strategy is to accept a candidate who is in the top k seen so far as long as at least ik candidates have been seen in total, for some thresholds ik. He showed that for r = 2 the limiting probability of success as n → ∞ is about 0.5736. Frank and Samuels [27] proved that the optimal probability of success p(n, r) satisfies lim r→∞ lim n→∞ ( 1− p(n, k)) 1r = 1− t∗, where t∗ = lim n→∞ i1 n ≈ 0.2834. Gilbert and Mosteller [40] considered what could be called the inverse of this problem: the interviewer is allowed to pick up to r candidates and wins if any of them is the best. (Many other variations of the secretary problem are included in the same paper.) They showed that an optimal strategy is to wait until t(n, r) candidates have been seen, for some function t(n, r), then to pick the next who is the best seen so far, and then to play an optimal strategy for the remaining candidates and r− 1 choices. They found an iterative method to calculate the limits ur = lim n→∞ t(n, r) n , 12 1. THE CLASSICAL SECRETARY PROBLEM showed that the first few values of ur are e −1, e− 3 2 , e− 47 24 and e− 2761 1152 , and showed that as n→∞ the optimal probability of success tends to r∑ i=1 ui. Some authors wondered what would happen if the number of candidates were unk- nown, but the distribution of that number, the random variable N , were known. Presman and Sonin [69] found an explicit optimal strategy for a general distribution, where the aim is to choose the best candidate. They also showed that if the number of candidates is uniform in [n], then an optimal strategy is of the same form as in the classical secretary problem, but where the threshold is asymptotically equivalent to n e2 and the probability of success tends to 2 e2 ≈ 0.2707 as n→∞. Gianini-Pettitt [39] considered the minimal expected rank version of this problem, and restricted her attention to distributions of the form P ( N = x ∣∣N ≥ x) = (n− x+ 1)−α, for some α. She proved that, as when the number of candidates is known, an optimal strategy is of the form ‘accept the ith candidate if it is one of the best k(i) seen so far,’ but that k(i) is not necessarily an increasing function. She also proved the surprising fact that if N1 and N2 are possible distributions of the number of candidates and N1 is stochastically smaller than N2, that is, P(N1 ≤ x) ≥ P(N2 ≤ x) for all x, this does not imply that the minimum expected rank decreases. One example of this is that if N1 is uniformly distributed on [n], that is, if α = 1, then the optimal expected rank tends to infinity as n → ∞, whereas if N2 = n with probability 1 then, as shown by Chow, Moriguti, Robbins and Samuels [21], the optimal expected rank tends to about 3.8695. In fact, she showed that the optimal expected rank tends to infinity if α < 2 and to the Chow, Moriguti, Robbins and Samuels limit of approximately 3.8695 if α > 2, and if α = 2 then the lim inf of the optimal expected rank is greater than 3.8695 and the lim sup is finite. More generally, one could consider problems of optimal choice on more complicated systems; up to this point the assumption has always been that there are n rankable 1.3. VARIANTS 13 candidates. Kuchta and Morayne [56] considered a version of the classical secretary problem where the interviewer has k ‘lives’: if all n candidates are interviewed without any of them being chosen, then a new set of n candidates is interviewed, and the aim is to chose the best of these, and so on. At most k sets of candidates are allowed to be interviewed in total. They showed that, for some function t(n, k), an optimal strategy is to ignore the first t(n, k) candidates from the first set, and accept the next candidate who is the best seen so far; if no such candidate appears, then ignore the first t(n, k − 1) candidates from the next set and accept the next candidate who is the best seen so far, and so on. They showed that lim n→∞ t(n, k) n exists, and denoting it by ak, that ak+1 = e ak−1, where of course a1 = 1e . Stadje [75] introduced the idea that the candidates could be ranked separately in each of k > 1 different criteria, with the interviewer wishing to select a candidate who is maximal in at least one of them, and Gnedin [41] solved the version where these rankings are random and independent of each other. He proved that an optimal stratgy is to wait until a certain number of candidates have been seen and then to select the next who is best according to at least one criterion, and that the limiting values of this threshold and the probability of success are both k−1 √ 1 k . Gnedin has also produced a more general survey of multicriteria problems [42]. In fact, these last two problems could both be viewed as versions of the secretary problem on k disjoint chains, which I shall solve in Chapter 3, with an extra restriction, about which I shall say more at the time. Moving slightly further away from the classical secretary problem, Kubicki and Mo- rayne [55] considered the problem on a directed path, where at each stage the selector knows the directed graph induced by the vertices seen so far and wishes to choose the end-vertex with no edge going out of it. This is similar to the classical secretary problem, but each candidate can only be compared with the one immediately above it or below it in the ranking. They showed that an optimal strategy is to wait until the first time t when the induced graph has n − t + 1 connected components and to pick the tth vertex. Note that this is the first time when the selector can be sure that the sought after end-vertex has been seen: if at time t the induced graph has n− t + 1 connected components, then 14 1. THE CLASSICAL SECRETARY PROBLEM the remaining n − t vertices must be used to join components together, and so none of them is the end-vertex. Note also that this strategy is independent of whether or not the tth vertex is an end-vertex of its component; if it is not, then it does not make any difference which vertex is chosen. They showed that the probability of success pn satisfies lim n→∞ pn √ n = √ pi 2 . Przykucki and Sulkowska [71] adapted this problem in a similar way to Gusein-Zade’s version of the classical secretary problem, so that choosing the end-vertex or its neighbour counts as success. In this case, the optimal stopping time and its analysis are more complicated, but their numerical analysis shows that the probability of success behaves approximately like 1.26√ n . For comparison with the previous result, √ pi 2 ≈ 0.8862. s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s  1 6 PP PPPi   PP PPP   6 6 6@@I @@I @@I @@ @@ @@         6 6 6 6 6 6 6 6 6BBM BBM BBM BBM BBM BBM BBM BBM BBM         BB BB BB BB BB BB BB BB BB Figure 1.1. Directed ternary tree of depth 3. Morayne and Sulkowska [61] studied a variation of the directed path version, in this case working with the complete k-ary directed rooted tree of depth n (see Figure 1.1) and again assuming that the selector knows the induced directed graph at each stage of the process. They found a lower bound for the probability of success of an optimal strategy by considering a natural (but not necessarily optimal) strategy: select the currently examined element if there is a directed path of length n terminating at it. If this ever happens, then the strategy must be successful. In this way, they showed that on a binary tree the limit of the optimal probability of success is at least 2 log 2− 1 ≈ 0.3863 and that on a ternary tree it is at least 3 2 log 3− 2 + pi 2 √ 3 ≈ 0.5548, and that it tends to 1 as k →∞. Przykucki [70] posed a problem concerning the random graph Gn,p with n vertices and any two connected by an edge with probability p independently of the others. Again, the selector knows the graphs induced by the vertices seen so far, and wishes to find a vertex of full degree, that is, degree n− 1. He showed that an optimal strategy is to wait 1.3. VARIANTS 15 until k(n, p) vertices have been seen and then to pick the next vertex connected to every vertex seen so far, for some k(n, p). He showed that, for fixed p ∈ (0, 1), k(n, p) = log 1 p n+O(1) as n→∞, and found a formula for the probability of its success, which of course tends to zero more quickly than npn−1, which is an upper bound for the probability that a vertex of full degree exists. In this dissertation, the type of generalization that I shall consider is to put partial orders other than a total order on the candidates. Here, the selector knows the poset from which the elements are taken and the poset induced by the elements observed so far, and wishes to choose an element that is maximal in the ground poset. s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s  HHHHHHHH @ @ @ @ @ @ @ @                 A A A A A A A A A A A A A A A A                                 C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C Figure 1.2. Binary tree of depth 4. A generalization due to Morayne [60] is to consider the case of a binary tree of depth n (see Figure 1.2). Intuitively, it seems unlikely that a random selection of nodes would come from a subtree with a maximum other than the global maximum unless they are linearly ordered, and he showed that this is indeed the case. An optimal strategy here is to select the maximum out of the elements seen so far when the poset induced by these elements is either linear of length greater than n 2 or non-linear with a unique maximum. He showed further that as n→∞, the probability of success tends to 1. Kaz´mierczak [49] added a ‘witness’ to the classical secretary problem, an extra element w in the poset that lies immediately below the maximal element but cannot be compared with any other element. Tkocz [77] extended this concept to put the witness below the 16 1. THE CLASSICAL SECRETARY PROBLEM s s s s s s b b b b b x1 xk xn w Figure 1.3. Tkocz’s poset. kth highest element (see Figure 1.3). He gave an explicit optimal strategy, which uses three different thresholds, essentially depending on the size of k relative to n. If the poset induced by the elements seen so far is linear and of length greater than the threshold then a maximal element should be accepted; if it is not linear then an optimal strategy for the classical secretary problem on k elements should be followed. He also calculated the asymptotic probabilities of success for k = 2 and 3, approximately 0.415 and 0.384 respectively, compared with the figure of approximately 0.573 for k = 1 obtained by Kaz´mierczak. s s s s s s s s s s " " " " " " " " " " " " " " " " " " " " b b b b b b b b b b b b b b b b b b b b Figure 1.4. Five pairs of identical twins. Micha l Morayne, Grzegorz Kubicki and I [34] considered the case of m pairs of ‘twins’, where there are m levels with two incomparable elements on each level (see Figure 1.4); I shall present these results in Chapter 2. I shall show that an optimal strategy is to wait until elements from a certain threshold number of levels have been seen and then to select the next element that is maximal and whose twin has already been seen. I shall further show that as m → ∞, this threshold behaves roughly like 0.4709m and the probability of success tends to approximately 0.7680. Calculating these asymptotic values for the 1.3. VARIANTS 17 natural extension to ‘c-tuplets’ for c > 2, is a harder problem, and I shall provide some bounds. A further interesting generalization, due to Preater [68], was an attempt to find an algorithm that was successful on all posets of a given size with positive probability. Sur- prisingly, he proved that there is such a ‘universal’ algorithm (depending only on the size of the poset), which is successful on every poset with probability at least 1 8 . In this algorithm, an initial random number of elements are rejected and a subsequent element is accepted according to randomized criteria. A slightly modified version of the algorithm, also suggested by Preater, was analysed by Georgiou, Kuchta, Morayne and Niemiec [36], and gave an improved lower bound of 1 4 for the probability of success. More recently, Kozik [54] introduced a ‘dynamic threshold strategy’ and showed that it was successful with probability at least 1 4 + ε, for some ε > 0 and for all sufficiently large posets. When I was about to submit this dissertation, Micha l Morayne drew my attention to a very recent paper of Freij and Wa¨stlund [29]. In it, they describe a strategy that is successful with probability at least 1 e . This cannot be improved, since the best possible probability of success in the classical secretary problem, on a totally ordered set, is 1 e . I shall say more about this in Section 3.4. Before Kozik, Freij and Wa¨stlund had published their results, Robert Morris and I [35] showed that, given any poset, there is an algorithm that is successful with probability at least 1 e , so, in this sense, the total order is the hardest possible partial order. I shall present these results in Chapter 3. In fact, this algorithm depends only on the size of the poset and its number of maximal elements, so it is universal for any family where these are given. It is therefore natural to ask which is the hardest partial order with a given number of maximal elements. The most obvious choice is the poset consisting of k disjoint chains. I shall give an asymptotically sharp lower bound on the probability of success in the problem of optimal choice on k disjoint chains, and show that it is at least as hard as on any poset with k maximal elements and of width k, that is, whose largest antichain has size k. 18 1. THE CLASSICAL SECRETARY PROBLEM 1.4. Formal model and notation In this section, I shall define formally the probability space in which I shall work in Chapters 2 and 3. This probability space will depend on a poset (P,≺) with P = {x1, . . . , xn}. In Chapter 2, this will be a known poset, the poset of m pairs of identical twins or, later, m sets of identical c-tuplets. In Chapter 3, this will be a fixed but unknown poset. Let max≺(P ) denote the set of its maximal elements, that is, max≺(P ) = {x ∈ P : 6 ∃y such that x ≺ y}. The subscript in max≺ will be suppressed when it is clear from the context. Given (P,≺), I shall work with a probability space (ΩP ,FP ,PP ), with EP defined in the obvious way. The subscripts will be suppressed when they are clear from the context, as they will be for the rest of this section. The probability space (Ω,F ,P) is defined as follows. Set Ω = Sn×[0, 1], where Sn is the permutation group on [n], and F = P(Sn)×B, where B is the Borel σ-algebra. Let P = µ×λ, where µ is the uniform probability measure, that is, µ({ρ}) = 1 n! for all ρ ∈ Sn, and λ is the Lebesgue measure. In other words, (ρ, δ) ∈ Ω is picked uniformly at random. Given (ρ, δ) ∈ Ω, the ρ-co-ordinate will determine the order in which elements of P appear and the δ-co-ordinate will allow the introduction of randomness independent of this order into our algorithms. This will not be needed in Chapter 2; in Chapter 3, the δ-co-ordinate will determine an initial number of elements to reject without considering. The reason why continuous space and Lebesgue measure are used, despite the fact that all of the randomized strategies considered pick one of a finite number of options, is that this allows them all to lie in the same probability space. Write P [n] for the set of permutations of P , and let pi : Ω → P [n] be the random variable defined by pi(ρ, δ)(i) = xρ(i). 1.4. FORMAL MODEL AND NOTATION 19 LetPt denote the set of all posets with vertex set [t] = {1, . . . , t}. Let (Pt)t∈[n] be a family of random variables with Pt representing the poset seen at time t. Formally, Pt : Ω→Pt and each Pt(ρ, δ) = ([t],≺t) is defined by ∀i, j ∈ [t], i ≺t j ⇐⇒ pi(i) ≺ pi(j). The poset Pt is the natural description of what is seen at time t as the elements of P appear one by one. Let (Ft)t∈[n] be the sequence of σ-algebras with each Ft generated by the random variables P1, . . . , Pt, that is, Ft = σ(P1, . . . , Pt) = σ(Pt), the second equality holding since Pt is a labelled poset and thus its value determines the values of P1, . . . , Pt−1. The σ-algebra Ft can be thought of as the information known at time t about where we are in the universe Ω. Since Pt takes only finitely-many values, Ft has a simple structure; it is the pre-images in Ω of the possible values of Pt and the unions of these pre-images. These pre-images are called the atoms of Ft. Let F ′t be the projection of Ft onto P(Sn). Since definitions have so far depended only on the ρ-co-ordinate of (ρ, δ) ∈ Ω, it is clear that, for each t, Ft = {A× [0, 1] : A ∈ F ′t}. In other words, (ρ1, δ1) and (ρ2, δ2) are in the same atom of Ft if and only if ρ1 and ρ2 are in the same atom of F ′t, which happens if and only if the labelled posets induced by the first t elements pi(1), . . . , pi(t) are identical. A stopping time is a random variable τ taking values in [n] and satisfying the property {τ = t} ∈ Ft, that is, the decision to stop at time t is based only on the values of P1, . . . , Pt. I shall give a brief reminder of the formal definitions of conditional expectation and probability, which in the finite world are intuitive concepts. For more details, see page 304 of Galambos [30] or page 313 of Chung [23], for example. Let X be a random variable. 20 1. THE CLASSICAL SECRETARY PROBLEM Then the conditional expectation of X given Ft, denoted by E(X|Ft) is any Ft-measurable random variable satisfying∫ A E ( X ∣∣Ft) = ∫ A X for all A ∈ Ft. Any two random variables satisfying these conditions are equal with probability 1. As Ft is finite, this random variable is constant on each atom of Ft and takes the average value of X on that atom, that is, E ( X ∣∣Ft) (ω) = ∫AXP(A) = E(X|A), where A is the atom of Ft containing ω. In the same way that the probability of an event E is the expectation of the indicator function 1(E) of this event, the conditional probability of E given Ft is defined by P ( E ∣∣Ft) = E (1(E)∣∣Ft) , that is, since Ft is finite, P ( E ∣∣Ft) (ω) = E (1(E)∣∣Ft) (ω) = ∫A 1(E)P(A) = P(A ∩ E)P(A) = P(E|A), where A is the atom of Ft containing ω. Define the family of random variables (Zt)t∈[n] by Zt = P ( pi(t) ∈ max(P )∣∣Ft) , that is, the random variable Zt is the probability that the t th element observed is maxi- mal given P1, . . . , Pt. The general aim will be to choose stopping times τ to maximize P ( pi(τ) ∈ max(P )). This quantity is equal to E(Zτ ) (see page 45 of Chow, Robbins and Siegmund [22], for example): E(Zτ ) = ∫ Ω Zτ = n∑ t=1 ∫ {τ=t} Zt = n∑ t=1 ∫ {τ=t} P ( pi(t) ∈ max(P )∣∣Ft) = n∑ t=1 ∫ {τ=t} 1 ( pi(t) ∈ max(P )) = ∫ Ω 1 ( pi(τ) ∈ max(P )) = P(pi(τ) ∈ max(P )), 1.5. USEFUL THEOREMS 21 the fourth equality holding by the definition of conditional expectation since {τ = t} ∈ Ft. These equivalent formulations will be useful later; this conditional probability can be treated as a pay-off offered at each step of the process. Recall that F ′t is the projection of Ft onto P(Sn). A randomized stopping time is a random variable τ taking values in [n] and satisfying the property {τ = t} ∈ F ′t × B, that is, the decision to stop at time t is based on the values of P1, . . . , Pt and on some B- measurable random variable. The randomized stopping times considered in Chapter 3 will be convex combinations of a finite number of true stopping times, so if such a randomized stopping time gives a certain probability of success, then there is a true stopping time with at least that probability of success. In fact, this is true in general, as proved by Ghoussoub [38]. 1.5. Useful theorems In this section, I shall define backward induction formally and use it to solve the classical secretary problem. I shall also state a theorem that gives an optimal stopping time for monotone processes, to be defined later. The next theorem, which is proved as Theorem 3.2 on page 50 of Chow, Robbins and Siegmund [22], gives in principle an optimal stopping time for any finite process, although in practice it might be hard to define such a stopping time more explicitly. It formalizes the concept of backward induction. Informally, it can be described is as follows. Define a new random variable for each t, the value of the process at time t. This is the expected pay-off ultimately accepted given what has happened so far. These values are calculated inductively, starting at the end. The value of the process at the final step is just the final pay-off offered. The value of the process at each earlier step is the maximum of the currently offered pay-off and the expected value of the process at the next step. An optimal strategy is to stop when the currently offered pay-off is at least the expected value at the next step. In the theorem below, the pay-offs offered are the Wt and the values at each step are the γt. The σ-algebras At represent what is known at time t. As a reminder, being 22 1. THE CLASSICAL SECRETARY PROBLEM At-measurable means that σ(Wt) ⊂ At, that is, the value of Wt is determined by what is known at time t or, in the finite world, Wt is constant on each atom of At. In fact, the nested condition means that At ⊃ σ(W1, . . . ,Wt). The conclusion of the theorem is that the strategy that stops at the first t when Wt = γt (or, equivalently, when Wt is at least as large as the expected value of γt+1 given At) is indeed a stopping time and achieves the optimal value. Theorem 1.1. Let A1 ⊂ . . . ⊂ An be a nested sequence of σ-algebras and let W1, . . . ,Wn be a sequence of random variables with each Wt being At-measurable. Let C(At) be the class of stopping times relative to (At)t∈[n] and let v∗ be given by v∗ = sup τ∈C(At) E(Wτ ). Define successively γn, γn−1, . . . , γ1 by setting γn = Wn, γt = max { Wt,E ( γt+1 ∣∣At)}, t = n− 1, . . . , 1. Let τ ∗ = min{t : Wt = γt}. Then τ ∗ ∈ C(At) and E(Wτ∗) = E(γ1) = v∗ ≥ E(Wτ ) for all τ ∈ C(At).  This theorem can now be used to justify the assertion in Section 1.2 that an optimal strategy is of the form “reject the first k candidates, and accept the next who is the best seen so far.” Recall from Section 1.4 that Zt = P ( pi(t) ∈ max(P )∣∣Ft) , that is, the probability that the tth candidate is the best one given what is known at this point. Backward induction will be applied with the Zt corresponding to the Wt, with Ft to Gt and with δt to γt. 1.5. USEFUL THEOREMS 23 Since the random variables Zt are independent, the values of Z1, . . . , Zt give no infor- mation about the values of Zt+1, . . . , Zn, and therefore E(δt+1|Ft) is constant on all atoms of Ft and equal to E(δt+1). Therefore, define the function v : [n]→ R by v(t) = E(δt) and note that backward induction tells us that the stopping time that stops at the first t such that Zt ≥ E(δt+1|Ft) = v(t+ 1) is optimal. By definition, v(t) = E ( max{Zt, v(t+ 1)} ) ≥ v(t+ 1), whereas t n , the potential non-zero value of Zt, is a non-decreasing function of t. Therefore, there exists k such that t n < v(t+ 1) if t ≤ k, t n ≥ v(t+ 1) if t > k, and an optimal strategy is “reject the first k candidates, and accept the next who is the best seen so far,” as claimed. In fact, this argument is not even necessary, as the classical secretary problem is a sufficiently straightfoward process that backward induction can be used to give an explicit optimal strategy. This is because in this case the δt can be calculated explicitly, as in the next lemma, and these define an optimal strategy. Lemma 1.2. For all t ≤ n, if n∑ i=t 1 i ≤ 1, then E ( δt ∣∣Ft−1) = t− 1 n n−1∑ i=t−1 1 i , that is, it is the constant random variable taking that value; otherwise, E ( δt ∣∣Ft−1) = E (δt+1∣∣Ft) . 24 1. THE CLASSICAL SECRETARY PROBLEM Proof. By definition, δn = Zn and so E ( δn ∣∣Fn−1) = 1 n = n− 1 n · 1 n− 1 . For t ≤ n− 1, if t n ≥ t n n∑ i=t 1 i = E ( δt+1 ∣∣Ft) , then E ( δt ∣∣Ft−1) = P(Zt ≥ E (δt+1∣∣Ft) ) · E(Zt∣∣∣(Zt ≥ E(δt+1|Ft))) + P ( Zt < E ( δt+1 ∣∣Ft) ) · E(E(δt+1|Ft)∣∣∣(Zt < E(δt+1|Ft))) (1.1) = 1 t · t n + t− 1 t · t n n−1∑ i=t 1 i = t− 1 n n−1∑ i=t−1 1 i . If t n < t n n∑ i=t 1 i = E ( δt+1 ∣∣Ft) , then, from the formula in (1.1), it is the case that E ( δt ∣∣Ft−1) = 0 · t n + 1 · E (δt+1∣∣Ft) = E ( δt+1 ∣∣Ft) .  This lemma and the backward induction theorem clearly give the same optimal stop- ping time as before: accept the tth candidate if he or she is the best so far and ∑n i=t 1 i > 1. The other main tool used in Part 1 of this dissertation is the most basic result for infinite processes. It is used in Chapter 2 for the reason that the twins case will be reduced to a process that, although finite, does not have a fixed number of steps, and so backward induction is inappropriate. The following theorem is slightly adapted from that on page 55 of Chow, Robbins and Siegmund [22], since the random variables of interest are all positive. 1.5. USEFUL THEOREMS 25 This theorem applies only to the monotone case, and monotonicity is a strong pro- perty: it is the property that, after the first time in the process where the currently offered pay-off is at least the expected pay-off at the next step, the same is true at all future times. The conclusion of the theorem is that the first time when this is true is an optimal stopping time. Theorem 1.3. Let A1 ⊂ A2 ⊂ . . . be a nested sequence of σ-algebras and let W1,W2, . . . be a sequence of random variables with each Wt being At-measurable. Let C(At) be the class of stopping times relative to (At)t∈[n]. For t ∈ N, let At = { E ( Wt+1 ∣∣At) ≤ Wt}. and suppose that A1 ⊂ A2 ⊂ . . . and ∞⋃ t=1 At = Ω. (1.2) Let τ ∗ = min { t : Wt ≥ E ( Wt+1 ∣∣At)}. Suppose that P(τ ∗ <∞) = 1 and E(Wτ∗) exists and that lim inf t ∫ {τ∗>t} Wt = 0. Then E(Wτ∗) ≥ E(Wτ ) for all τ ∈ C(At).  If equation (1.2) holds then the process (Wt,At)t∈[n] is said to be monotone. CHAPTER 2 How to choose the best twins 2.1. Introduction Sections 2.1 to 2.5 of this chapter are based on joint work with Micha l Morayne and Grzegorz Kubicki [34], but with some significant reorganization of its presentation, particularly in Section 2.3. Sections 2.6 and 2.7 are my own work. The poset initially considered in this chapter is the set of m pairs of identical twins. This can also be viewed as a version of the classical secretary problem where each element is seen exactly twice. The Hasse diagram of this poset is Figure 2.1. s s s s s s s s s s " " " " " " " " " " " " " " " " " " " " b b b b b b b b b b b b b b b b b b b b u1 u2 u3 u4 u5 v1 v2 v3 v4 v5 Figure 2.1. The poset (U ∪ V,≺) for m = 5. In Section 2.6, the poset considered is the natural extension to m sets of identical c-tuplets. The main results of this chapter are summarized in the following theorems. Theorem 2.1. For m ∈ N, let km = min { k : 2m k + m−1∑ j=k 1 j ≤ 5 } . An optimal strategy for the secretary problem on m pairs of identical twins is to wait until candidates have been seen from at least km of the pairs and then to pick the next candidate who is the best so far and whose twin has already been seen. Asymptotically, lim m→∞ km m = 1 x0 ≈ 0.4709, 27 28 2. HOW TO CHOOSE THE BEST TWINS where x0 is the unique solution to 2x+ log x = 5, and the probability of success tends to 1 x0 + 4(x0 − 1)2 3x0 (( x0 x0 − 1 ) 1 2 − 1 ) ≈ 0.7680. Theorem 2.2. For c,m ∈ N with c ≥ 2, let k(c)m = min { k : m−k∑ j=1 [( m−j−1 k−1 )( m−1 k−1 ) j∏ i=2 ( 1− 1(ci c ))] ≤ 1} . An optimal strategy for the secretary problem on m sets of identical c-tuplets is to wait until candidates have been seen from at least k (c) m of the c-tuples and then to pick the next candidate who is the best so far and all of whose c-tuplets have already been seen. For all m, ( 1 2 − c+ 1 2(2c+1(c− 1)− c− 1) ) m < k(c)m ≤ ⌈m 2 ⌉ , and the probability of success is at least 1− c+ 1 2c(c− 1) . At first, it might seem surprising that the secretary problem is easier in the twins case than on the total order, since there are fewer comparisons that can be made and so apparently less information. However, this is reconciled by the fact that if we attempt to compare two candidates then there are three possible outcomes rather than two, which gives us more information. Similarly, viewing the twins case as having two chances with each candidate makes it intuitive that it should be easier, which is correct. Most of the notation used is as described in Section 1.4, and in Section 2.2 I shall describe the notation used specifically for the twins case. In Section 2.3, I shall use the monotone case theorem (Theorem 1.3) to find an optimal strategy for this problem and in Section 2.4 I shall give a formula for its probability of success. In Section 2.5, I shall describe the asymptotic behaviour of the threshold in the definition of the optimal strategy given and of the probability of success. I shall adapt the earlier method in Section 2.6 to find an optimal strategy and its probability of success for the natural generalization to c-tuplets, and give some bounds that illustrate their behaviour. Finally, in Section 2.7, I 2.2. NOTATION AND BASIC DEFINITIONS 29 shall state what work there is still to be done in this case and suggest some other posets that one might wish to consider. 2.2. Notation and basic definitions In this section, I shall define the poset consisting of m pairs of identical twins explicitly, as I shall refer to it in this way in the rest of the chapter. Consider a set consisting of two chains U = {u1, . . . , um} and V = {v1, . . . , vm} with u1 and v1 being maximal elements. For every i with 1 ≤ i ≤ m, the elements ui and vi are incomparable. An element with a smaller subscript is higher then an element with a larger subscript, that is, ui ≺ vj if i > j and vj ≺ ui if i < j. The elements ui and vi are referred to as the twins at level i. The Hasse diagram of this partial order (U ∪ V,≺) for m = 5 is given in Figure 2.1. Suppose that the elements of U ∪ V are observed one by one in the order given by a random permutation pi, with all (2m)! permutations of U∪V equally likely. For every time t with 1 ≤ t ≤ 2m, we observe the partial order induced by the set {pi(1), pi(2), . . . , pi(t)}. As an example, suppose that we have a permutation pi = (u3, v2, u1, v3, u5, v1, u4, v5, v4, u2). The induced orders observed for pi up to t = 6 are given in Figure 2.2. The goal is to choose the presently observed element to maximize the probability that this element is either u1 or v1, that is, one of the two maximal elements of the poset (U ∪ V,≺). In other words, we are looking for the stopping time τ such that P ( pi(τ) ∈ {u1, v1} ) is maximal. The function τ , the optimal stopping time, depends itself on pi. The decision at time t is based exclusively on the induced partial orders observed until time t and the order of appearance of their elements. For example, if the stopping time tells us to stop on the third element of the permutation given in Figure 2.2, then using this function we have to stop at time t = 3 for all permutations of U ∪ V whose first three elements have strictly decreasing subscripts. 30 2. HOW TO CHOOSE THE BEST TWINS s ss ss s ss s s ss ss s ss ss s s @ @ @ @ @ @ t = 1 t = 2 t = 3 t = 4 t = 5 t = 6 pi(1) pi(2) pi(1) pi(3) pi(2) pi(1) pi(3) pi(2) pi(1) pi(4) pi(3) pi(2) pi(1) pi(5) pi(4) pi(3) pi(2) pi(1) pi(5) pi(4) pi(6) Figure 2.2. Induced orders when pi = (u3, v2, u1, v3, u5, v1, u4, v5, v4, u2) for 1 ≤ t ≤ 6. 2.3. The optimal stopping time In this section, I shall use the monotone case theorem to find an optimal stopping time for the process. Define recursively the following stopping times: τi = min { t > τi−1 : pi(t) ∈ max{pi(1), pi(2), . . . , pi(t)} and {pi(1), pi(2), . . . , pi(t− 1)} contains the twin of pi(t) } , with the assumptions that τ0 = 0 and if the set under the minimum is empty, then its minimum is 2m. Recall that the twins come from m levels, with the elements ui and vi referred to as the twins at level i. Let km = min { k : 2m k + m−1∑ j=k 1 j ≤ 5 } and let τ = min{τi : the number of levels occupied by pi(1), pi(2), . . . , pi(τi) is at least km}, with the assumption that if the set under minimum is empty, then τ = 2m. Theorem 2.3. The stopping time τ is optimal. 2.3. THE OPTIMAL STOPPING TIME 31 Clearly, we should not consider stopping unless the currently observed element is maximal. Furthermore, if we are yet to see its twin then we should continue: if it is from level 1, then we shall have seen elements from at least as many levels by the time we see its twin and so shall be at least as confident that it is from level 1; if it is not from level 1, then we have a chance of finding this out by observing an element from a higher level. Thus, in order to maximize P (pi(τ) ∈ {u1, v1}), it is necessary to consider only such stopping times τ whose values coincide with some τi’s. Define the random variables Lt = level of the element pi(t). Recall from Section 1.4 that Zt = P ( pi(t) ∈ {u1, v1} ∣∣Ft) = P (Lt = 1∣∣Ft) . The main aim is to show that the process (Zτi ,Fτi)i∈N is monotone (see equation (1.2)), so that Theorem 1.3 can be applied. First note that if τi = 2m then τi+1 = 2m and E ( Zτi+1 ∣∣Fτi) = Zτi , so it is only necessary to consider those i for which τi < 2m. For the rest of this section it will therefore be assumed that τi is a maximal element whose twin has already been seen, and that there are still some unseen elements. A result is needed about the probability of success of the simple stopping time τ1. When using τ1, we stop at the first opportunity when we see a second twin who is maximal at that moment. Denote this probability by P (m). Interestingly, the next lemma says that even the most naive strategy is successful with probability greater than 2 3 , which is much better than the corresponding 1 n for the total order on n elements. Lemma 2.4. For the stopping time τ1 the probability of success is equal to P (m) = 2m+ 1 3m . Proof. Observe that P ( Lτ1 = 1 ∣∣L1 = 1) = 1, 32 2. HOW TO CHOOSE THE BEST TWINS since pi(τ1) can only be maximal at time τ1 if it is from a level at most L1. I shall prove by induction on l that P ( Lτ1 = 1 ∣∣L1 = l) = 2 3 for all l ≥ 2, from which the result follows, since then P (m) = P ( Lτ1 = 1 ) = m∑ l=1 P ( Lτ1 = 1 ∣∣L1 = l) · P(L1 = l) = 1 m ( 1 + 2 3 (m− 1) ) = 2m+ 1 3m . If l = 2, then the claim is true, since the second twin from level 2 and the two twins from level 1 are identically distributed, and we win if and only if the first of these three to appear is from level 1. Now suppose that l > 2 and consider the level of the next element to appear from a level at most L1, at time τ ′, say. If 1 < Lτ ′ < l, then the situation is the same as if pi(τ ′) were the first element chosen, and by the induction hypothesis P ( Lτ1 = 1 ∣∣{L1 = l} ∩ {1 < Lτ ′ < l}) = 2 3 . Otherwise, we win if and only if one of the two twins from level 1 appears before the second twin from level l. Since these three are identically ditstributed, it is the case that P ( Lτ1 = 1 ∣∣{L1 = l} ∩ {1 < Lτ ′ < l}c) = 2 3 . Thus, in either case, the probability is 2 3 , and the claim is proved.  I shall use the following combinatorial identity, which arises from consideration of the classical secretary problem, to simplify the formulae in the definition of the optimal stopping time and, later, its probability of success. Lemma 2.5. For all 1 ≤ k ≤ m− 1, m−k∑ j=1 1 j · ( m−j−1 k−1 )( m k ) = k m m−1∑ j=k 1 j . Proof. The proof proceeds by showing that both sides are equal to the probability of success in the classical secretary problem using the strategy: “reject the first k elements, 2.3. THE OPTIMAL STOPPING TIME 33 and accept the next maximal element observed.” Let W be the event that the best candidate is chosen. LHS. For 0 ≤ j ≤ m− k, let Bj be the event that the highest of the first k elements observed is at level j + 1. This means that there are m − j − 1 levels available for the remaining k − 1 elements. Let Cj be the event that the maximum element is the first element seen out of the highest j elements. Then Bj and Cj are independent, and P(W ) = m−k∑ j=1 P(Bj)P(W |Bj) = m−k∑ j=1 P(Bj)P(Cj) = m−k∑ j=1 ( m−j−1 k−1 )( m k ) · 1 j . RHS. For k ≤ j ≤ m − 1, let Dj be the event that the (j + 1)th element observed is the maximal element of the chain, and let Ej be the event that the highest of the first j elements observed is in the first k observed. Then Dj and Ej are independent, and P(W ) = m−1∑ j=k P(Dj)P(W |Dj) = m−1∑ j=k P(Dj)P(Ej) = m−1∑ j=k 1 m · k j . Thus the lemma is proved.  Let Nk(i) denote the event that at time τi the number of levels occupied by the elements pi(1), pi(2), · · · , pi(τi) is equal to k. Note that Nk(i) ∈ Fτi . The next two lemmas find the values that, when compared, determine whether or not a process is monotone. Lemma 2.6. For all ω ∈ Nk(i), Zτi(ω) = k m . 34 2. HOW TO CHOOSE THE BEST TWINS Proof. Elements from different levels are equally likely to appear in any position in the random order. Therefore, when k levels have been observed, the probability that one of them is the highest level is k m , independent of the most recently observed element being from the highest level seen so far.  Lemma 2.7. For all ω ∈ Nk(i), E ( Zτi+1 ∣∣Fτi) (ω) = 23 + k3m ( m−1∑ j=k 1 j − 2 ) . Proof. Let E = ∫ A E ( Zτi+1 ∣∣Fτi) , where A ∈ Fτi is the atom containing ω. By the definition of conditional expectation, it is sufficient to show that E = ( 2 3 + k 3m ( m−1∑ j=k 1 j − 2 )) P(A). It is the case that E = ∫ A Zτi+1 = ∫ A 1 ( Lτi+1 = 1 ) = P ({Lτi+1 = 1} ∩ A) = m−k∑ j=1 P ({Lτi+1 = 1} ∩ {Lτi = j + 1} ∩ A) = m−k∑ j=1 P ( Lτi+1 = 1 ∣∣{Lτi = j + 1} ∩ A) · P (Lτi = j + 1∣∣A) · P(A). As the probability of getting an element from the highest level at time τi+1 does not depend on the pattern of the levels up to the time τi, and as A provides only a certain pattern how to put elements on any chosen k levels, it follows that, by Lemma 2.4, E = m−k∑ j=1 2j + 1 3j · ( m−j−1 k−1 )( m k ) · P(A) = ( m−k∑ j=1 2 3 · ( m−j−1 k−1 )( m k ) + m−k∑ j=1 1 3j ( m−j−1 k−1 )( m k ) )P(A) 2.3. THE OPTIMAL STOPPING TIME 35 and, by Lemma 2.5, E = ( 2 3 · ( m−1 k )( m k ) + 1 3 m−1∑ j=k 1 j ( m−1 k−1 )( m k ) )P(A) = ( 2 3 · m− k m + 1 3 · k m m−1∑ j=k 1 j ) P(A), which proves the lemma.  Let Ai = { E ( Zτi+1 ∣∣Fτi) ≤ Zτi} be the event that the pay-off offered at time τi is at least the expected value of the pay-off at time τi+1. Lemma 2.8. For all ω ∈ Nk(i), it is the case that ω ∈ Ai if and only if 2m k + m−1∑ j=k 1 j ≤ 5. Proof. Let ω ∈ Nk(i). Then, by Lemmas 2.6 and 2.7, it is the case that ω ∈ Ai if and only if 2 3 + k 3m ( m−1∑ j=k 1 j − 2 ) = E ( Zτi+1 ∣∣Fτi) (ω) ≤ Zτi(ω) = km, which gives the result.  The proof that τ is an optimal stopping time can now be completed. Proof of Theorem 2.3. It is sufficient to show that the process (Zτi ,Fτi)i∈N is in the monotone case, that is, that Ai ⊂ Ai+1 for each i; the monotone case theorem (Theorem 1.3) then says that τ is optimal. Let ω ∈ Ai and let k be the number of levels observed at time τi, so that ω ∈ Nk(i). Then, by Lemma 2.8, the inequality 2m k + m−1∑ j=k 1 j ≤ 5 holds. At time τi+1 we must have seen more than k levels, and the left-hand side of the inequality decreases as k increases, and so ω ∈ Ai+1.  36 2. HOW TO CHOOSE THE BEST TWINS 2.4. The probability of success In this section, I shall work out a formula for the probability of success when using an optimal stopping time. Theorem 2.9. When using the optimal stopping time τ , the probability of success equals P ( Lpi(τ) = 1 ) = 1 3m [ 2m+ km − ( km − km−1∑ s=0 s∏ r=1 2(m− km + r) 2(m− km + r) + 1 )( 3− m−1∑ j=km 1 j )] . (2.1) It is not surprising that evaluating the threshold produces a formula involving the harmonic series that cannot be expressed in closed form; a similar phenomenon occurs in the solution of the classical secretary problem. For a permutation pi ∈ S(U ∪ V ), let τpi = min { t : pi(1), pi(2), . . . , pi(t) occupy exactly km levels } . Set M1 = { poset 〈 pi(1), pi(2), . . . , pi(τpi) 〉 has exactly one maximal element } and M2 = { poset 〈 pi(1), pi(2), . . . , pi(τpi) 〉 has exactly two maximal elements } . I shall begin by proving the following lemma. Lemma 2.10. P(M1) = 1 km km−1∑ s=0 s∏ r=1 2(m− km + r) 2(m− km + r) + 1 . (2.2) Proof. In the proof of this lemma, the levels will be referred to in the order in which they are observed (1, 2, . . . , km), not their positions in the poset. Some of the elements will come from the same level as earlier elements, so one or two elements from each of these km levels will have been observed. Let τpi(j) be the smallest t such that pi(1), pi(2), . . . , pi(t) occupy exactly j levels (so, in particular, τpi = τpi(km)). For i ≤ j, let Bi(j) be the event that exactly one of 2.4. THE PROBABILITY OF SUCCESS 37 pi(1), pi(2), . . . , pi ( τpi(j) ) occupies the ith level, that is, at the first moment when j le- vels have been seen, exactly one element has been seen from the ith level. Note that P ( Bi(i) ) = 1 for all i, and also that, for i ≤ km, the events are nested in the sequence Bi(i) ⊃ Bi(i+ 1) ⊃ . . . ⊃ Bi(km). Thus P ( Bi(km) ) = P ( Bi(i) ∩Bi(i+ 1) ∩ . . . ∩Bi(km) ) = P ( Bi(i) ) P ( Bi(i+ 1) ∣∣Bi(i)) . . .P (Bi(km)∣∣Bi(km − 1)) . The probability P ( Bi(j + 1) ∣∣Bi(j)) is easy to calculate. Indeed, suppose we are at time τpi(j), having observed elements from j levels including exactly one element from the ith level. There are 2(m − j) elements occupying levels not yet observed, and for Bi(j + 1) we require that one of these is observed before the second element from the i th level. These 2(m− j) + 1 elements are identically distributed, so by symmetry P ( Bi(j + 1) ∣∣Bi(j)) = 2(m− j) 2(m− j) + 1 . Thus P ( Bi(km) ) = P ( Bi(i) ) P ( Bi(i+ 1) ∣∣Bi(i)) . . .P (Bi(km)∣∣Bi(km − 1)) = km−1∏ r=i 2(m− r) 2(m− r) + 1 = km−i∏ r=1 2(m− km + r) 2(m− km + r) + 1 , following the convention that the empty product has value 1. Let Ci be the event that the i th level observed is the highest level observed by time τpi, and let Di be the event that both Bi(km) and Ci hold, that is, that by time τpi exactly one element has been observed from the ith level and it is the highest observed so far. Note that the levels are equally likely to be observed in any order, so P(Ci) = 1km , and 38 2. HOW TO CHOOSE THE BEST TWINS that Bi(km) and Ci are independent, so P(Di) = P ( Bi(km) ∩ Ci ) = P ( Bi(km) ) P(Ci) = 1 km km−i∏ r=1 2(m− km + r) 2(m− km + r) + 1 . Note finally that M1 is the disjoint union of the Dis, so P(M1) = P(D1) + P(D2) + . . .+ P(Dkm) = km∑ i=1 1 km km−i∏ r=1 2(m− km + r) 2(m− km + r) + 1 = 1 km km−1∑ s=0 s∏ r=1 2(m− km + r) 2(m− km + r) + 1 , and the lemma is proved.  The proof of the formula for the probability of success can now be completed. Proof of Theorem 2.9. For each integer j with 2 ≤ j ≤ m− km + 2, define M j1 to be the subevent of M1 where a second best element of 〈pi(1), pi(2), . . . , pi(τpi)〉 is on level j. These subevents form a partition of M1 and P(M j1 ) = P ( M j1 ∣∣M1)P(M1) = (m−jkm−2)(j − 1)(m km ) P(M1). Similarly, for each integer j with 1 ≤ j ≤ m − km + 1, define M j2 to be the subevent of M2 where the maximal elements of 〈pi(1), pi(2), . . . , pi(τpi)〉 are on level j. These subevents form a partition of M2 and P(M j2 ) = P ( M j2 ∣∣M2)P(M2) = (m−jkm−1)(m km ) (1− P(M1)). Given M j2 , the probability of success when using τ , which tells us to stop at the next τi, is equal to 2(j−1)+1 3(j−1) , the probability of success when we use the simple stopping time τ1 on the poset consisting of j − 1 pairs of twins above the level j. Similarly, given M j1 , the probability of success when using τ is equal to 2(j−1)+1 3(j−1) , the probability of success when we use the simple stopping time τ1 on the poset consisting of j − 1 pairs of twins above 2.4. THE PROBABILITY OF SUCCESS 39 the level j of a second best element obtained so far, and with the maximal element being treated as the first random element observed in this reduced poset. The term with j = 1 in the second sum below disappears since there is no chance of stopping on a maximal element in that case. Therefore, P(Lτ = 1) = m−km+2∑ j=2 P ( Lpi(τ) = 1 ∣∣M j1)P(M j1 ) + m−km+1∑ j=1 P ( Lpi(τ) = 1 ∣∣M j2)P(M j2 ) = m−km+2∑ j=2 2(j − 1) + 1 3(j − 1) · ( m−j km−2 ) (j − 1)( m km ) P(M1) + m−km+1∑ j=2 2(j − 1) + 1 3(j − 1) · ( m−j km−1 )( m km ) (1− P(M1)), which, after changing the index of summation in both sums, becomes P(Lτ = 1) = P(M1) m−km+1∑ j=1 2j + 1 3 · ( m−j−1 km−2 )( m km ) + (1− P(M1))m−km∑ j=1 2j + 1 3j · ( m−j−1 km−1 )( m km ) = P(M1) ∑m−km+1 j=1 2j 3 ( m−j−1 km−2 ) + 1 3 ∑m−km+1 j=1 ( m−j−1 km−2 )( m km ) + ( 1− P(M1) )∑m−kmj=1 23(m−j−1km−1 )+∑m−kmj=1 13j (m−j−1km−1 )( m km ) . Since, by standard combinatorial methods, m−km+1∑ j=1 j ( m− j − 1 km − 2 ) = ( m km ) , m−km+1∑ j=1 ( m− j − 1 km − 2 ) = ( m− 1 km − 1 ) and m−km∑ j=1 ( m− j − 1 km − 2 ) = ( m− 1 km ) , and, by Lemma 2.5, m−km∑ j=1 1 j ( m− j − 1 km − 1 ) = ( m− 1 km − 1 ) m−1∑ j=km 1 j , 40 2. HOW TO CHOOSE THE BEST TWINS it is the case that P(Lτ = 1) = P(M1) ( 2 3 + 1 3 · ( m−1 km−1 )( m km ) )+ (1− P(M1)) 23(m−1km )+ 13(m−1km−1)∑m−1j=km 1j(m km ) = P(M1) ( 2 3 + km 3m ) + ( 1− P(M1) )(2(m− km) 3m + km 3m m−1∑ j=km 1 j ) = 2m+ km 3m − (1− P(M1))(km m − km 3m m−1∑ j=km 1 j ) , and substituting in the formula for P(M1) from Lemma 2.10 gives the result.  2.5. Asymptotics I shall begin this section with exact evaluations of, and a crude bound for, the threshold needed for the optimal stopping time τ and the probability of success for m = 7, and then determine the asymptotic behaviour of those quantities as m→∞. Example. For m = 7, the threshold km = k7 = 4, since for k = 4 it is the case that 2 · 7 4 + 1 4 + 1 5 + 1 6 = 247 60 ≤ 5 but for k = 3 the reverse inequality holds, namely, 2 · 7 3 + 1 3 + 1 4 + 1 5 + 1 6 = 337 60 > 5. The probability of success is equal to P(Lτ = 1) = 1 21 [ 18− ( 4− ( 1 + 8 9 + 80 99 + 960 1287 ))( 3− 37 60 )] = 3001 3780 ≈ 0.7939. Observation. By the definition of km, one can easily observe that for every m it must be the case that km ≤ ⌈m 2 ⌉ . 2.5. ASYMPTOTICS 41 Indeed, if k ≥ m 2 then 2m k + m−1∑ j=k 1 j ≤ 2m k + m− k k ≤ 5. The following theorems establish the asymptotic behavior of the threshold km and the limit of the probability of success as m → ∞. Since m changes here, the optimal stopping time for the partial order consisting of m twins will be denoted by τ (m). Denote the unique solution of the equation 2x+ log x = 5 by x0, so that x0 ≈ 2.12347. Theorem 2.11. The threshold km satisfies m x0 < km < m x0 + 2 for all m ∈ N. In particular, lim m→∞ km m = 1 x0 ≈ 0.4709. Proof. By the definition of km from Theorem 2.9, it is the case that km satisfies the inequality 5 ≥ 2m km + m−1∑ j=km 1 j > 2m km + log ( m km ) , the latter inequality by a simple integration argument. Since 2x + log x is an increasing function, it must be the case that m km < x0. On the other hand, km − 1 does not satisfy the condition, so a similar argument yields 5 < 2m km − 1 + m−1∑ j=km−1 1 j < 2m km − 1 + log ( m− 1 km − 2 ) < 2m km − 2 + log ( m kn − 2 ) and hence m km − 2 > x0. Thus m x0 < km < m x0 + 2, as required.  I shall now turn to the asymptotic probability of success. 42 2. HOW TO CHOOSE THE BEST TWINS Theorem 2.12. The probability of success when using optimal stopping time τ (m) satisfies lim m→∞ P(Lpi(τ (m)) = 1) = 1 x0 + 4(x0 − 1)2 3x0 (( x0 x0 − 1 ) 1 2 − 1 ) ≈ 0.7680. The proof uses the following lemma, which gives the asymptotic value of the expression in (2.2). As with τ (m), the notation P ( M (m) 1 ) is used to emphasize the dependence on m. Lemma 2.13. The probability of M (m) 1 occurring satisfies lim m→∞ P ( M (m) 1 ) = 2(x0 − 1) (( 1 + 1 x0 − 1 ) 1 2 − 1 ) . Proof. Recall from Lemma 2.10 that P(M (m)1 ) is given by P ( M (m) 1 ) = 1 km km−1∑ s=0 s∏ r=1 2(m− km + r) 2(m− km + r) + 1 . Write f(s) for the summand, that is, f(s) = s∏ r=1 ( 1− 1 2(m− km + r) + 1 ) , and consider log(f(s)): log(f(s)) = s∑ r=1 log ( 1− 1 2(m− kn + r) + 1 ) = − s∑ r=1 ( 1 2(m− km + r) + 1 + 1 2 ( 1 2(m− km + r) + 1 )2 + . . . ) . For sufficiently largem, Theorem 2.11 shows that km m < 1 2 , and hence that 1 2(m−km+r)+1 < 1 m for all r. Thus, since s < m, the expression log(f(s)) is bounded by − s∑ r=1 1 2(m− km + r) + 1 − ( 1 2m + 1 3m2 + . . . ) < log(f(s)) < − s∑ r=1 1 2(m− km + r) + 1 and hence by − s∑ r=1 1 2(m− km + r) + 1 − ( 1 m + 1 2m2 + . . . ) < log(f(s)) < − s∑ r=1 1 2(m− km + r) + 1 . (2.3) 2.5. ASYMPTOTICS 43 The sum can be bounded further by integrals. Note that 1 2x is a decreasing function and obtain∫ s+1 m 1 m 1 2 ( 1− km m + 1 2m + x )dx ≤ s∑ r=1 1 2(m− km + r) + 1 ≤ ∫ s m 0 1 2 ( 1− km m + 1 2m + x )dx and hence 1 2 log ( 1 + s m− km + 32 ) ≤ s∑ r=1 1 2(m− km + r) + 1 ≤ 1 2 log ( 1 + s m− km + 12 ) . (2.4) Exponentiating (2.3) and substituting in the bounds in (2.4) gives( 1− 1 m )( 1 + s m− km + 12 )− 1 2 < f(s) < ( 1 + s m− km + 32 )− 1 2 . Noting that f(s) is a decreasing function, the same technique is used to bound P ( M (m) 1 ) = 1 km ∑km−1 s=0 f(s), which gives ∫ 1 0 ( 1− 1 m )( 1 + x m km − 1 + 1 2km )− 1 2 dx < P ( M (m) 1 ) < ∫ 1− 1 km − 1 km ( 1 + x m km − 1 + 3 2km )− 1 2 dx and hence 2 ( 1− 1 m )( m km − 1 + 1 2km )(( m+ 1 2 m− km + 12 ) 1 2 − 1 ) < P ( M (m) 1 ) < 2 ( m km − 1 + 3 2km )(( m+ 1 2 m− km + 32 ) 1 2 − ( 1− 1 m− km + 32 ) 1 2 ) . Recalling from Theorem 2.11 that limm→∞ mkm = x0, take limits as m→∞ and notice that both sides of this inequality tend to the same limit. Hence, lim m→∞ P ( M (m) 1 ) = 2(x0 − 1) (( x0 x0 − 1 ) 1 2 − 1 ) , and the lemma is proved.  The proof of the asymptotic optimal probability of success can now be completed. 44 2. HOW TO CHOOSE THE BEST TWINS Proof of Theorem 2.12. A similar integration argument to that in the proof of Lemma 2.13 gives lim m→∞ m−1∑ j=km 1 j = lim m→∞ log ( m km ) = log x0 = 5− 2x0. Substituting these limiting values into the equation for success (2.1), P(Lτ = 1) = 1 3m [ 2m+ km − ( km − km−1∑ s=0 s∏ r=1 2(m− km + r) 2(m− km + r) + 1 )( 3− m−1∑ j=km 1 j )] , and writing L = limm→∞ P ( Lτ (m) = 1 ) gives L = 1 3 [ 2 + 1 x0 − ( 1 x0 − 1 x0 · 2(x0 − 1) [( x0 x0 − 1 ) 1 2 − 1 ])( 3− (5− 2x0) )] = 1 x0 + 4(x0 − 1)2 3x0 (( x0 x0 − 1 ) 1 2 − 1 ) , as required.  2.6. How to choose the best c-tuplets The posets most closely related to the one considered in this chapter are those that consist of m sets of c-tuplets, that is, with vertex set U1∪. . .∪Uc where Ui = {u1i , . . . , umi }, for fixed j the uji s are incomparable and for all i1, j1, i2, j2 it is the case that uj1i1 ≺ uj2i2 if and only if j1 > j2. In this section, I shall give an explicit optimal strategy and a formula for the probability of its success. Finding their asymptotic behaviour is an open problem, although I shall give some bounds that represent progress in that direction. As with the twins case, the monotone case theorem can be used to find an explicit optimal strategy. Where it is unambiguous, I shall use notation from earlier in this chapter in this slightly different context without comment. Similarly to before, let τ (c) i = min { t > τ (c) i−1 : pi(t) ∈ max{pi(1), pi(2), . . . , pi(t)} and {pi(1), pi(2), . . . , pi(t− 1)} contains all the c-tuplets of pi(t) } , 2.6. HOW TO CHOOSE THE BEST c-TUPLETS 45 with the assumptions that τ (c) 0 = 0 and if the set under the minimum is empty, then its minimum is cm. To begin with, Lemma 2.4 can be generalized in the following way, where τ (c) 1 is the stopping time that stops at the first element that is maximal so far and all of whose c-tuplets have been seen. Lemma 2.14. For the stopping time τ (c) 1 the probability of success is equal to Pc(m) = m∏ i=2 ( 1− 1(ci c )) . Proof. For 2 ≤ i ≤ m, let Ni be the event that an element from level i is not chosen. Then Pc(m) = P ( N2 ∩ . . . ∩Nm ) = P ( Nm ) P ( Nm−1 ∣∣Nm) . . .P (N2∣∣N3 ∩ . . . ∩Nm) . Conditioned on Ni+1 ∩ . . . ∩ Nm, an element from level i will be selected if and only if the c elements from this level all come before the c(i − 1) elements from the i − 1 levels above. Hence, P ( Ni ∣∣Ni+1 ∩ . . . ∩Nm) = 1− 1(ci c ) , and the result follows.  This gives rise to an optimal stopping time. Similarly to before, let k(c)m = min { k : m−k∑ j=1 [( m−j−1 k−1 )( m−1 k−1 ) j∏ i=2 ( 1− 1(ci c ))] ≤ 1} and let τ (c) = min { τ (c) i : the number of levels occupied by pi(1), pi(2), . . . , pi(τi) is at least k (c) m } , with the assumption that if the set under the minimum is empty then its minimum is cm. Theorem 2.15. The stopping time τ (c) is optimal for the secretary problem on m sets of identical c-tuplets. 46 2. HOW TO CHOOSE THE BEST TWINS Proof. The proof of this using the monotone case theorem is essentially the same as in Section 2.3.  Calculating the probability of success is slightly hard work algebraically, but follows broadly similar lines to the twins case. As then, let Lt = level of the element pi(t). I shall use the multinomial coefficient notation( m r1, . . . , rc,m− r1 − . . .− rc ) to denote the number of ways of choosing c distinguishable sets of sizes r1, . . . , rc from a ground set of size m. Theorem 2.16. The probability that the stopping time τ (c) is successful is P(Lτ (c) = 1) = c−1∑ i=1   ∑ r1+...+rc=k (c) m r1,ri≥1 ( m r1,...,rc,m−k(c)m )( c 1 )r1 . . . (c c )rc( cm r1+2r2+...+crc ) · r1 r1 + 2r2 + . . .+ crc · ri k (c) m  k(c)m m + m−k(c)m +1∑ j=2 ( m−j k (c) m −1 )( m k (c) m ) (1− 1(cj−i c−i )) j−1∏ k=2 ( 1− 1( ck c ))  +  ∑ r1+...+rc=k (c) m r1,rc≥1 ( m r1,...,rc,m−k(c)m )( c 1 )r1 . . . (c c )rc( cm r1+2r2+...+crc ) · r1 r1 + 2r2 + . . .+ crc · rc k (c) m  m−k(c)m +1∑ j=2 ( m−j k (c) m −1 )( m k (c) m ) j−1∏ k=2 ( 1− 1( ck c ))  . As with the twins case, I shall need a lemma that gives the probability that, at the first moment when k (c) m levels have been seen, exactly i elements have been observed from the highest level seen. Let Mi be this event. Lemma 2.17. Let i be an integer with 1 ≤ i ≤ c. Then P(Mi) = ∑ r1+...+rc=k (c) m r1,ri≥1 ( m r1,...,rc,m−k(c)m )( c 1 )r1 . . . (c c )rc( cm r1+2r2+...+crc ) · r1 r1 + 2r2 + . . .+ crc · ri k (c) m . 2.6. HOW TO CHOOSE THE BEST c-TUPLETS 47 Proof. For r1 + . . .+ rc = k (c) m , let Ar1,...,rc be the event that when r1 + 2r2 + . . .+ crc elements have been seen, the number of levels from which exacly j elements have been seen is rj, for all j. Out of the ( cm r1+2r2+...+crc ) possible choices for the first r1 + 2r2 + . . . + crc elements, the number of choices such that Ar1,...,rc holds is as follows. First, there are( m r1,...,rc,m−k(c)m ) choices for the levels with each number of elements seen. Then, for each j and for each of the rj levels with j elements, there are ( c j ) choices for which of its elements have been seen. Therefore, P(Ar1,...,rc) = ( m r1,...,rc,m−k(c)m )( c 1 )r1 . . . (c c )rc( cm r1+2r2+...+crc ) . Now let Br1,...,rc be the event that the number of levels from which exacly j elements have been seen is rj, for all j, at the first moment when k (c) m levels have been observed. Then Br1,...,rc is only a non-empty event if r1 ≥ 1 and in that case Br1,...,rc occurs if and only if Ar1,...,rc occurs and the last element observed is one of the singletons. By symmetry, the singletons are equally likely to be in any position in the first r1+2r2+. . .+crc elements, and therefore P ( Br1,...,rc ∣∣Ar1,...,rc) = r1r1 + 2r2 + . . .+ crc . Finally, let Cr1,...,rc be the event that Br1,...,rc holds and that exactly i elements have been seen from the highest level. The highest level is equally likely to be any of the r1 + . . .+ rc = k (c) m levels chosen, and so P ( Cr1,...,rc ∣∣Br1,...,rc) = ri k (c) m . For Mi to occur, there must be at least one level from which only one element has been seen (the last one observed), and at least one level from which exactly i elements have been seen (the highest one). Subject to these restrictions, if Mi occurs then Cr1,...,rc occurs for some choice of r1, . . . , rc with r1 + . . . + rc = k (c) m , and the Cr1,...,rc are disjoint 48 2. HOW TO CHOOSE THE BEST TWINS and all lie within Mi. Therefore, P(Mi) = ∑ r1+...+rc=k (c) m r1,ri≥1 P(Cr1,...,rc) = ∑ r1+...+rc=k (c) m r1,ri≥1 ( m r1,...,rc,m−k(c)m )( c 1 )r1 . . . (c c )rc( cm r1+2r2+...+crc ) · r1 r1 + 2r2 + . . .+ crc · ri k (c) m , as required.  Let M ji be the event that at the first moment when k (c) m levels have been seen, there have been i elements from the highest level seen and that level is level j. Lemma 2.14 can easily be modified to give the following lemma, which gives the probability of success conditioned on M ji for all possible i and j. Lemma 2.18. (1) For i with 1 ≤ i ≤ c− 1, P ( Lτ (c) = 1 ∣∣M1i ) = 1. (This is the case where 1 ≤ i ≤ c− 1 and j = 1.) (2) For i, j with 1 ≤ i ≤ c− 1 and 2 ≤ j ≤ m− k(c)m + 1, P ( Lτ (c) = 1 ∣∣M ji ) = ( 1− 1(cj−i c−i )) j−1∏ k=2 ( 1− 1( ck c )) . (This is the case where 1 ≤ i ≤ c− 1 and 2 ≤ j ≤ m− k(c)m + 1.) (3) P ( Lτ (c) = 1 ∣∣M1c ) = 0. (This is the case where i = c and j = 1.) (4) For j with 2 ≤ j ≤ m− k(c)m + 1, P ( Lτ (c) = 1 ∣∣M jc ) = j−1∏ k=2 ( 1− 1( ck c )) . (This is the case where i = c and 2 ≤ j ≤ m− k(c)m + 1.)  The proof of the formula for the probability of success can now be completed. 2.6. HOW TO CHOOSE THE BEST c-TUPLETS 49 Proof of Theorem 2.16. By the same argument as in the proof of Theorem 2.9, it is the case that P(M ji ) = ( m−j k (c) m −1 )( m k (c) m ) P(Mi) = ( m−j k (c) m −1 )( m k (c) m ) ∑ r1+...+rc=k (c) m r1,ri≥1 ( m r1,...,rc,m−k(c)m )( c 1 )r1 . . . (c c )rc( cm r1+2r2+...+crc ) · r1 r1 + 2r2 + . . .+ crc · ri k (c) m . The events { M ji : 1 ≤ i ≤ c, 1 ≤ j ≤ m− k(c)m + 1 } partition the space and thus P (Lτ (c) = 1) = c∑ i=1 m−k(c)m +1∑ j=1 P ( Lτ (c) = 1 ∣∣M ji )P (M ji ) , and the proof is completed using Lemma 2.18.  Recall that k(c)m = min { k : m−k∑ j=1 [( m−j−1 k−1 )( m−1 k−1 ) j∏ i=2 ( 1− 1(ci c ))] ≤ 1} . The following lemma will be helpful for finding bounds for k (c) m . Lemma 2.19. For all integers c ≥ 2, and for all positive integers k and m with k ≤ m, it is the case that( 1− c+ 1 2c(c− 1) )(m k − 1 ) < m−k∑ j=1 [( m−j−1 k−1 )( m−1 k−1 ) j∏ i=2 ( 1− 1(ci c ))] ≤ m k − 1. Proof. For the upper bound, first observe that the value of the product is at most one. By standard combinatorial arguments, m−k∑ j=1 ( m− j − 1 k − 1 ) = ( m− 1 k ) and so m−k∑ j=1 ( m−j−1 k−1 )( m−1 k−1 ) = m− k k = m k − 1, which gives the upper bound. 50 2. HOW TO CHOOSE THE BEST TWINS For the lower bound, first observe that( ci c ) = ci c · ci− 1 c− 1 · . . . · ci− c+ 1 1 ≥ ic. Therefore, j∏ i=2 ( 1− 1(ci c )) > 1− ∞∑ i=2 i−c > 1− 1 2c − ∫ ∞ i=2 x−cdx (2.5) = 1− c+ 1 2c(c− 1) . Since this bound is independent of j, it can be treated as a constant factor and the sum evaluated as before, and the lower bound follows.  The reason that in (2.5) I bounded the sum only from i = 3, rather than the whole sum, was just to make the upper and lower bounds exponentially close. If I had not done so, then the first factor in the lower bound in Lemma 2.19 would have been 1− 1 c−1 , and the first factor in the lower bound in the next theorem would have been 1 2 − 1 2(2c−3) . Theorem 2.20. For all integers c ≥ 2 and m ≥ 1, it is the case that( 1 2 − c+ 1 2(2c+1(c− 1)− c− 1) ) m < k(c)m ≤ ⌈m 2 ⌉ . In particular, for all m ≥ 1, lim c→∞ k(c)m = ⌈m 2 ⌉ . Proof. If k ≤ ( 1 2 − c+ 1 2(2c+1(c− 1)− c− 1) ) m = 1− c+1 2c(c−1) 2− c+1 2c(c−1) m, then, by Lemma 2.19, m−k∑ j=1 [( m−j−1 k−1 )( m−1 k−1 ) j∏ i=2 ( 1− 1(ci c ))] > (1− c+ 1 2c(c− 1) )(m k − 1 ) ≥ 1, and the lower bound follows. 2.6. HOW TO CHOOSE THE BEST c-TUPLETS 51 If k ≥ m 2 , then, by Lemma 2.19, m−k∑ j=1 [( m−j−1 k−1 )( m−1 k−1 ) j∏ i=2 ( 1− 1(ci c ))] ≤ m k − 1 ≤ 1, and the upper bound follows.  A lower bound for the probability of success comes from observing that the proof of Lemma 2.19 bounds the probability of success of the simple stopping time τ1 given in Lemma 2.14 by Pc(m) ≥ 1− c+ 1 2c(c− 1) , and an optimal stopping time must do at least as well. This argument is equivalent to bounding the products in the formula for the probability of success in Theorem 2.16. c 1 2 − c+1 2(2c+1(c−1)−c−1) k (c) 100 k (c) 1000 limm→∞ k (c) m m 1− c+1 2c(c−1) w (c) 100 limm→∞w (c) m 1 - 38 369 0.3679 - 0.3708 0.3679 2 0.2000 48 471 0.4709 0.2500 0.7697 0.7680 3 0.4286 50 493 ? 0.7500 0.9354 ? 4 0.4725 50 499 ? 0.8958 ? ? 5 0.4880 50 500 ? 0.9531 ? ? 6 0.4945 50 500 ? 0.9781 ? ? 7 0.4974 50 500 ? 0.9896 ? ? 8 0.4987 50 500 ? 0.9950 ? ? 9 0.4994 50 500 ? 0.9976 ? ? 10 0.4997 50 500 ? 0.9988 ? ? Figure 2.3. A table showing bounds for and exact and asymptotic values of k (c) m and w (c) m = P(Lτ (c) = 1) for small values of c. For convenience and to emphasise the dependence on m, for given c and m let w(c)m = P(Lτ (c) = 1) be the probability of success of τ (c) on m sets of c-tuplets. Figure 2.3 summarizes the bounds from this section, and gives exact values of k (c) 100, k (c) 1000 and w (c) 100, calculated using 52 2. HOW TO CHOOSE THE BEST TWINS Maple™ 13. Unfortunately, the formula in Theorem 2.16 is too complicated to be calcu- lated efficiently for anything but small c; similarly, these values of m are near the limit of what I could calculate. 2.7. Open problems I shall first summarize the main goals for the c-tuplets case. (1) Simplify the formula for the probability of success in Theorem 2.16. (2) For fixed c, prove that limm→∞ k (c) m m exists. (3) If limm→∞ k (c) m m exists, find its value as the root of an equation or as a function of limm→∞ k (c−1) m m . (4) For fixed c, prove that limm→∞w (c) m exists. (5) If limm→∞w (c) m exists, find its value as the root of an equation or as a function of the value for limm→∞w (c−1) m . I have already shown that for any fixed m, lim c→∞ k(c)m = ⌈m 2 ⌉ and lim c→∞ w(c)m = 1, which gives the values of lim c→∞ lim m→∞ k (c) m m and lim c→∞ lim m→∞ w(c)m if they exist. In this chapter, I have assumed that the aim is to pick the best candidate. It is almost certainly a harder problem, as with the original secretary problem, to try to minimize the expected rank instead. Alternatively, the full information case would be interesting to study; here, m random samples would be taken from a known distribution and each replicated c− 1 times, and all cm of these values revealed to the selector one by one. 2.7. OPEN PROBLEMS 53 Inspired further by the results in Chapter 1, one could allow the interviewer to go back to an earlier candidate, who is still available with a certain probability, or assume that the number of sets of c-tuplets is unknown but comes from a known distribution. A modification of the c-tuplets poset that might be worth thinking about is a ‘pyramid’ with i candidates at level i. It seems likely that this would lie between the classical secretary problem and the twins version in terms of difficulty. It is also natural to try to solve the problem of optimal choice, or any of these variations, on other common posets, for example (1) Qk = {0, 1}k, where (x1, . . . , xk) ≺ (y1, . . . , yk) if (x1, . . . , xk) 6= (y1, . . . , yk) and xi ≤ yi for 1 ≤ i ≤ k, (2) [m]2, where (x1, x2) ≺ (y1, y2) if (x1, x2) 6= (y1, y2) and xi ≤ yi for i = 1, 2, or even (3) [m]k, where (x1, . . . , xk) ≺ (y1, . . . , yk) if (x1, . . . , xk) 6= (y1, . . . , yk) and xi ≤ yi for 1 ≤ i ≤ k, which includes both of the previous examples. CHAPTER 3 The secretary problem on an unknown poset 3.1. Introduction This chapter is based on joint work with Robert Morris [35], and is substantially the same as the paper. In this chapter, I shall show that for the problem of optimal choice on any poset there is an algorithm that is successful with probability at least 1 e , so, in this sense, the total order is the hardest possible partial order. In fact, this algorithm depends only on the size of the poset and its number of maximal elements, so it is universal for any family where these are given. It is therefore natural to ask which is the hardest partial order with a given number of maximal elements. The most obvious choice is the poset consisting of k disjoint chains. I shall give an asymptotically sharp lower bound on the probability of success in the problem of optimal choice on k disjoint chains, and show that it is at least as hard as on any poset with k maximal elements and of width k, that is, whose largest antichain has size k. More precisely, the main aim is to prove the following two theorems. Theorem 3.1. Let (P,≺) be a poset with k maximal elements and of width k. Then there is an algorithm for the secretary problem on (P,≺) depending only on |P | and k that is successful with probability at least pk, where pk =  1 e if k = 1, k−1 √ 1 k if k > 1, (3.1) and these are the best possible such bounds. Theorem 3.2. Let (P,≺) be a poset with k maximal elements. Then there is an algorithm for the secretary problem on (P,≺) depending only on |P | and k that is successful with probability at least 1 e , and this is the best possible such bound. 55 56 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET Robert Morris and I conjectured that both of these theorems can be improved, by removing the width condition in Theorem 3.1 and the dependence on k in Theorem 3.2; see Conjectures 3.20 and 3.21 in Section 3.4. The poset consisting of k disjoint chains was considered by Kuchta and Morayne [56], but with a restriction on the order in which the elements are observed: those from the first chain all appear in a random order, then those from the second chain, and so on. This poset is also related to multicriteria extensions of the secretary problem. In the multicriteria version due to Gnedin [41], each element is ranked independently in k > 1 different criteria, and the selector wishes to select an element that is maximal in at least one of them. This is equivalent to the problem on k equally-sized disjoint chains with the elements appearing one at a time from each chain in the same cyclic order. Interestingly, the asymptotic value of the probability of success in Theorem 3.1, k−1 √ 1 k , is the same as in the multicriteria version. This chapter is organized as follows. In Section 3.2, I shall describe a (randomized) algorithm for choosing an element of a given poset, and prove lower bounds for its pro- bability of success on various families of posets. In Section 3.3, I shall show that these bounds are best possible, by proving that, for the poset that consists of k disjoint chains of length x (which lies in each of these families), there is no strategy that is successful with probability greater than pk + o(1) (as x→∞). Finally, I shall suggest some conjectures and open problems in Section 3.4. 3.2. Lower bounds Throughout this section, p is a real number satisfying 0 < p < 1. Recall that pi(t) is the tth element of the poset P that we see, and that Pt is a poset with vertex set [t] that is isomorphic to the poset seen at time t. I shall prove lower bounds for the probability of success of the following randomized algorithm on different families of posets. Algorithm. Given a poset with n elements, of which k are maximal, let X(p) ∼ Bin(n, p). Reject the first X(p) elements and accept the first subsequent element where the following condition holds: the poset induced by the elements seen so far (including the 3.2. LOWER BOUNDS 57 currently observed element) has at most k maximal elements and the currently observed element is one of them. This algorithm gives rise to the following randomized stopping time, τk(p). Recall that Ω = Sn × [0, 1] and that P is the uniform probability measure on Ω. Let X(p) : Ω→ {0, . . . , n} be the random variable defined by X(p)(ρ, δ) = min { x ≥ 0 : x∑ i=0 ( n i ) pi(1− p)n−i ≥ δ } , so that P ( X(p) = x ) = ( n x ) px(1− p)n−x and X(p) = X(p)(ρ, δ) is independent of ρ. Then τk(p) is defined by τk(p) =  min { t > X(p) : |max(Pt)| ≤ k and t ∈ max(Pt) } if this exists, n otherwise. Given the definition of τk(p), it makes sense to consider another random variable, the set of X(p) elements that we reject without considering. Denote this random variable by S(p), where S(p) = {pi(t) : t ≤ X(p)}. The following simple property of S(p) is important. It means that one can think of the algorithm as rejecting each element of P with probability p, independently of the others and of the random ordering pi, which makes it easier to work out a formula for the probability that it is successful. Lemma 3.3. The events {x ∈ S(p)}x∈P are independent and P ( x ∈ S(p)) = p for all x ∈ P . Proof. The random variables pi and X(p) can be generated with the required dis- tributions in the following way. Put each element of P in S(p) with probability p in- dependently of all other elements. Let pi consist of a uniformly random ordering of the elements of S(p) followed by a uniformly random ordering of P \S(p). By symmetry, pi is 58 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET a uniformly random ordering of P , and X(p) = |S(p)| is a binomial random variable inde- pendent of pi. The events {x ∈ S(p)}x∈P depend only on pi and X(p), and by construction the properties in the statement of the lemma hold.  The following identity will be useful for simplifying the formula for the probability of success. Lemma 3.4. For all integers k ≥ 1, the following holds: ∞∑ s=0 ( k + s− 1 k − 1 ) (1− p)s = 1 pk . Proof. Suppose that we have a coin that comes up heads with probability p and tails with probability 1− p, and that we toss it infinitely many times. Then, with probability 1, we shall see at least k heads, and the kth head comes up in position k + s for some s ≥ 0. In this case, we know that k − 1 of the first k + s − 1 tosses are heads and the remaining s are tails, and so, summing over the probabilities that the kth head comes in each position, we have ∞∑ s=0 ( k + s− 1 k − 1 ) pk(1− p)s = 1.  In order to prove Theorem 3.1, I shall first calculate a lower bound for the probability that τk(p) is successful on the poset consisting of k disjoint chains. Recall that p is a real number satisfying 0 < p < 1 and that pi ( τk(p) ) is the element that the algorithm τk(p) selects. Theorem 3.5. Let (P,≺) be a poset consisting of k disjoint chains. Then P ( pi ( τk(p) ) ∈ max(P )) >  p log ( 1 p ) if k = 1, k k−1p(1− pk−1) if k > 1. Proof. First note that pi ( τk(p) ) ∈ max(P ) in the exceptional case where S(p) = P and pi ( τk(p) ) = pi(n) ∈ max(P ), an event with probability k n pn. This tends to 0 as n→∞, and the bounds in the theorem are obtained by considering only the cases where X(p) < n 3.2. LOWER BOUNDS 59 and hence pi ( τk(p) ) 6∈ S(p). However, later on, in the proof of Lemma 3.11, the fact that these bounds are for a slightly smaller event will be important. s s s s s pppi i i s s s s s s pppi i p p p s s s s ppp i 6 ? 6 ? m1 jk C1 C2 Ck Figure 3.1. An example of k disjoint chains with the elements of S(p) cir- cled. This illustrates an instance of the event A0,3,...,1. The region enclosed by the solid curve marks the j1 + . . .+ jk elements that might be selected. Let the k chains be denoted by C1, . . . , Ck and have lengths m1, . . . ,mk. Let Aj1,...,jk be the event that for each i there are ji elements from Ci not in S(p) above the highest element from Ci in S(p) (see Figure 3.1), that is, Aj1,...,jk = k⋂ i=1 { ∣∣∣{x ∈ Ci\S(p) :6 ∃y ∈ Ci ∩ S(p) such that x ≺ y}∣∣∣ = ji}. For ji < mi, this means that the top ji elements are not in S(p) but the (ji + 1) th is. For ji = mi, this means that there are no elements from the i th chain in S(p). Note that if Aj1,...,jk occurs then pi ( τk(p) ) will be the first element observed from the j1 + . . . + jk elements not in S(p) that are at the tops of their chains. The events { Aj1,...,jk : 0 ≤ j1 ≤ m1, . . . , 0 ≤ jk ≤ mk } partition the whole space. Thus, writing Qk(p) for P ( pi ( τk(p) ) ∈ max(P )), Qk(p) = ∑ 0≤j1≤m1,...,0≤jk≤mk P ( pi ( τk(p) ) ∈ max(P )∣∣∣Aj1,...,jk)P (Aj1,...,jk) > ∑ 0≤j1≤m1,...,0≤jk≤mk (j1,...,jk)6=(0,...,0) |{i : ji > 0}| j1 + . . .+ jk (1− p)j1+...+jkp|{i:ji ∑ 0≤j1≤m1,...,0≤jk≤mk (j1,...,jk) 6=(0,...,0) |{i : ji > 0}| j1 + . . .+ jk (1− p)j1+...+jkpk(1 + (1− p) + (1− p)2 + . . .)|{i:ji=mi}| = ∑ j1,...,jk≥0 (j1,...,jk) 6=(0,...,0) |{i : ji > 0}| min{j1,m1}+ . . .+ min{jk,mk}(1− p) j1+...+jkpk > ∑ j1,...,jk≥0 (j1,...,jk) 6=(0,...,0) |{i : ji > 0}| j1 + . . .+ jk (1− p)j1+...+jkpk. Rewriting the last line as a sum over r = |{i : ji > 0}| and s = j1 + . . .+ jk gives Qk(p) > k∑ r=1 ∞∑ s=r ∣∣∣{(j1, . . . , jk) : ∣∣{i : ji > 0}∣∣ = r and j1 + . . .+ jk = s}∣∣∣ · r s (1− p)spk. The rest of the proof is a straightforward calculation. To calculate ∣∣∣{(j1, . . . , jk) : ∣∣{i : ji > 0}∣∣ = r and j1 + . . .+ jk = s}∣∣∣ , we note that there are ( k r ) ways of choosing the indices i with ji > 0 and there are then( s−1 r−1 ) ways for r non-zero numbers to add up to s. Thus Qk(p) > k∑ r=1 ∞∑ s=r ( k r )( s− 1 r − 1 ) r s (1− p)spk = kpk k∑ r=1 ∞∑ s=r ( k − 1 r − 1 )( s− 1 r − 1 ) 1 s (1− p)s. Reversing the order of summation, Qk(p) > kp k ∞∑ s=1 1 s (1− p)s min{k,s}∑ r=1 ( s− 1 r − 1 )( k − 1 k − r ) . The second sum is easily evaluated (a result known as Vandermonde’s identity) to give Qk(p) > kp k ∞∑ s=1 1 s (1− p)s ( k + s− 2 k − 1 ) . (3.3) It remains to evaluate the sum in the above inequality. Write Vk(p) = ∞∑ s=1 1 s (1− p)s ( k + s− 2 k − 1 ) . 3.2. LOWER BOUNDS 61 Differentiating, and then applying Lemma 3.4, gives dVk(p) dp = − ∞∑ s=1 (1− p)s−1 ( k + s− 2 k − 1 ) = − 1 pk . Integrating gives Vk(p) =  − log p+ c1 if k = 1, 1 (k−1)pk−1 + ck if k > 1, where the ck are constants. Since the expressions above are continuous in p in the interval (0, 1], limits as p→ 1 exist and may be used to find ck and deduce that Vk(p) = ∞∑ s=1 1 s (1− p)s ( k + s− 2 s− 1 ) =  log ( 1 p ) if k = 1, 1 k−1 ( 1 pk−1 − 1 ) if k > 1. Substituting the value of Vk(p) into (3.3) gives the result.  In order to extend the result above to posets whose width is the same as their number of maximal elements, I shall use Dilworth’s theorem [24] (see also page 81 of Bolloba´s [11]): Theorem 3.6. A poset with largest antichain of size k can be covered by k chains.  In the next theorem, I shall show that the secretary problem is no harder on a poset with k maximal elements and width k than on a poset consisting of k disjoint chains. s s s s s pppi i i s s s s s s pppi i p p p s s s s ppp iAA A A Appppppp p ppppppp p pp p p p p p p p pp 6 ? 6 ? m1 jk C1 C2 Ck Figure 3.2. An example of k disjoint chains with one extra comparison, and with elements of S(p) circled. The region enclosed by the solid curve marks the elements that might be selected. The element in the dotted region could have been selected if the extra comparison were not there (cf. Figure 3.1). 62 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET Theorem 3.7. Let (P,≺) be a poset with n elements. Suppose that (P,≺) has k maximal elements and that none of its antichains has size greater than k. Then P ( pi ( τk(p) ) ∈ max(P )) >  p log ( 1 p ) if k = 1, k k−1p(1− pk−1) if k > 1. Proof. Dilworth’s theorem (Theorem 3.6) implies that P takes the form of k chains with some comparisons in between them. Clearly, the k elements of max(P ) lie at the top of the k chains. The proof therefore proceeds in an almost identical manner to that of Theorem 3.5. The only difference is that the denominator in each term of (3.2) is now at most, rather than equal to, j1 + . . . + jk (see Figure 3.2), so the expression in this line is still a lower bound. The calculations that make up the remainder of the proof of Theorem 3.5 therefore follow in the same way.  The values that maximize the function in Theorem 3.7 are pk =  1 e if k = 1, k−1 √ 1 k if k > 1. (3.4) This has the following consequence, which gives the lower bounds in Theorem 3.1. Corollary 3.8. Let (P,≺) be a poset with n elements. Suppose that (P,≺) has k maximal elements and that none of its antichains has size greater than k. Then P ( pi ( τk(pk) ) ∈ max(P )) > pk.  It is curious to note that the expected proportion of elements that we reject without considering is the same as the probability of success; it is not clear to me why this should be the case. The next objective is to prove the following theorem, which, with the right choice of p, will give a lower bound of 1 e for all posets, as in Theorem 3.2. 3.2. LOWER BOUNDS 63 Theorem 3.9. Let (P,≺) be a poset with n elements. Suppose that (P,≺) has k maximal elements. Then P ( pi ( τk(p) ) ∈ max(P )) > kpk log(1 p ) . The proof will use two simple lemmas. The first states that the linear order is the hardest of all posets with a unique maximal element. Lemma 3.10. Let (P,≺) be a poset with n elements. Suppose that (P,≺) has exactly one maximal element. Then the probability that τ1(p) is successful on (P,≺) is at least the probability that it is successful on a linear extension of P , and hence P ( pi ( τk(p) ) ∈ max(P )) > p log(1 p ) . Proof. Begin by taking an arbitrary linear extension of ≺, that is, a partial order ≺′ such that any two elements are comparable and such that x ≺ y ⇒ x ≺′ y. (It is clear that such a partial order exists.) Denote the unique element in max≺(P ) = max≺′(P ) by xmax. Given this new poset, (P,≺′), define random variables pi′, X ′(p), S ′(p) and τ ′1(p) in the same way as pi, X(p), S(p) and τ1(p) were defined given (P,≺). Couple the random variables ( pi,X(p), S(p), τ1(p) ) and ( pi′, X ′(p), S ′(p), τ ′1(p) ) in the obvious way; set pi′ = pi and X ′(p) = X(p), and hence S ′(p) = S(p). This means that the elements appear in the same order in both instances, and the same set S(p) is rejected in both cases. The induced posets observed in the process on (P,≺′) are linear extensions of those observed in the process on (P,≺). The proof proceeds by showing that if pi(τ1(p)) 6= xmax then pi′ ( τ ′1(p) ) 6= xmax, that is, if τ1(p) fails in the process on (P,≺) then τ ′1(p) fails on (P,≺′). From this, the result follows, since the probability of success is therefore at least as large on (P,≺) as on (P,≺′), and Theorem 3.5 applied to (P,≺′) gives the lower bound. If we reach xmax then it will be accepted, since it must be the unique maximal element in the poset induced by the elements observed so far. Thus pi ( τ1(p) ) 6= xmax if either (i) xmax ∈ S(p) or (ii) after rejecting S(p), we accept an element that appears earlier than xmax. 64 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET In case (i), τ ′1(p) must fail on (P,≺′) for the same reason, since S ′(p) = S(p). In case (ii), such an element must be the unique maximal element of the poset induced by what we have seen so far, and this is still the case in any linear extension. Therefore, with τ ′1(p), if this element is observed then it must be accepted, and so we still accept an element that appears earlier than xmax. It follows that in either case pi ′(τ ′1(p)) 6= xmax.  The next lemma gives a lower bound for the probability of success restricting our attention to the case when all but one of the maximal elements of our poset are in S(p). This turns out to be enough to prove Theorem 3.9. Lemma 3.11. Let (P,≺) be a poset with n elements. Suppose that (P,≺) has k maximal elements. Then P ( pi ( τk(p) ) ∈ max(P )∣∣∣|max(P ) ∩ S(p)| = k − 1) > p 1− p log ( 1 p ) . Proof. First observe that we may assume that k = 1, for the following reason. The condition that |max(P ) ∩ S(p)| = k − 1 means that the k − 1 maximal elements in max(P ) ∩ S(p) will be maximal for the remainder of the process, so when using τk(p) these and all elements dominated by at least one of these may be ignored, and a unique maximal element from the remaining elements will be selected. Those elements form a poset (P ′,≺′) with a unique maximal element xmax, which is not in S(p). Since all elements are in S(p) with probability p independently of the others, the situation is the same as if we were working with (P ′,≺′) and conditioning on xmax 6∈ S(p). Looking for one of at most k maximal elements in P using τk(p) is the same as looking for a unique maximal element in P ′ using τ1(p). Assume from now on that k = 1; Lemma 3.10 will be used to prove the result in this case. Lemma 3.10 used the bound from Theorem 3.5, and recall from the proof of that theorem that the lower bound for P ( pi ( τ1(p) ) = xmax ) is in fact a lower bound for P ( {S(p) 6= P} ∩ {pi(τ1(p)) = xmax}). Set M = {xmax ∈ S(p)} 3.2. LOWER BOUNDS 65 and W = {S(p) 6= P} ∩ {pi(τ1(p)) = xmax} . Note that if S(p) 6= P and xmax ∈ S(p) then pi ( τ1(p) ) 6= xmax and hence W ⊂ M c. Therefore, P(W ) = P(W ∩M c) = P(M c)P(W |M c) = (1− p)P(W |M c). Since, by the bound from Lemma 3.10, P(W ) > p log ( 1 p ) , and the quantity that we are interested in is P(W |M c), the result follows.  The theorem can now be proved. Proof of Theorem 3.9. It is clear that P ( |max(P ) ∩ S(p)| = k − 1 ) = kpk−1(1− p). Thus, for general k, P ( pi ( τk(p) ) ∈ max(P )) > P(pi(τk(p)) ∈ max(P )∣∣∣|max(P ) ∩ S(p)| = k − 1) · P ( |max(P ) ∩ S(p)| = k − 1 ) > p 1− p log ( 1 p ) · kpk−1(1− p) = kpk log ( 1 p ) , as required.  This theorem has the following consequence. The probability e− 1 k is chosen to maxi- mize the function in Theorem 3.9 and gives the lower bound in Theorem 3.2. As described in Chapter 1, it is well-known that 1 e is the best possible lower bound for the probability of success in the classical secretary problem, and so this completes the proof of Theorem 3.2. 66 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET Corollary 3.12. Let (P,≺) be a poset with n elements of which k are maximal. Then P ( pi ( τk ( e− 1 k )) ∈ max(P ) ) > 1 e .  3.3. Upper bound In this section, I shall show that the bound in Corollary 3.8 is best possible. The proof of Theorem 3.5 shows that the probability of success of the stopping time τk(pk) on k disjoint chains decreases towards the given lower bound as the chains increase in length. This might suggest that the probability of success of an optimal strategy is reduced as the chains increase in length and thus, to prove that the bounds are best possible, one should consider chains of length tending to infinity. The main theorem in this section, Theorem 3.13, does just that; for sufficiently long chains, the probability of success of an optimal stopping time can be made arbitrarily close to that in Corollary 3.8, and so τk(pk) is asymptotically optimal. Since the poset consisting of k disjoint chains satisfies the conditions of Corollary 3.8, the bounds given are the best possible such bounds. Define Dk(x) to be the poset consisting of k disjoint chains, each of size x. It might be useful at this point to recall some definitions from Section 1.4. The probability space associated with the poset Dk(x) is denoted by (ΩDk(x),FDk(x),PDk(x)), but the subscripts are suppressed when they are clear from the context. The poset induced by the first t elements observed is isomorphic to the random variable Pt, which is a poset on vertex set [t], and Ft is the σ-algebra generated by P1, . . . , Pt, which represents what is known at time t. A stopping time is a random variable τ taking values in [n] and satisfying the property {τ = t} ∈ Ft. Let C(Ft) denote the class of all such stopping times, and extend this notation to any sequence of σ-algebras in the analogous way. The aim is to find an upper bound for E(Zτ ) that holds for all stopping times τ , where Zt = P ( pi(t) ∈ max (Dk(x))∣∣∣Ft) . 3.3. UPPER BOUND 67 Theorem 3.13 states that, as x → ∞, the limit of the probability of success of an optimal stopping time on Dk(x) is no greater than pk. Since Corollary 3.8 showed the existence of a stopping time with probability of success at least pk, Theorem 3.13 shows that this is the best possible such bound and so gives Theorem 3.1. Theorem 3.13. Let pk be as defined in (3.4). Then lim x→∞ sup τ∈C(Ft) EDk(x)(Zτ ) ≤ pk. Note that the supremum is over stopping times in C(Ft), which means that we are allowed to use the extra information from the structure of the posets, not just the pay- offs that we are offered. The following observation is important and so is recorded as a lemma. Lemma 3.14. When (P,≺) = Dk(x), it is the case that Zt =  y x if t ∈ max(Pt), pi(t) ∈ C and |C ∩ {pi(1), . . . , pi(t)}| = y, 0 if t 6∈ max(Pt), where C is one of the k chains in Dk(x). Proof. The maximal element of a chain C is equally likely to be at any position in the order in which its x elements are observed. Therefore, when y elements have been observed from this chain, the probability that one of them is the maximal element is y x , independent of the most recently observed element being maximal.  The proof of Theorem 3.13 will proceed roughly as follows. At time t we expect to have seen approximately t k elements from each chain. Therefore, since all orders are equally likely, the tth element observed is maximal in what has been seen so far with probability approximately 1 t/k = k t . By Lemma 3.14, if this happens then it is a maximal element of Dk(x) with probability approximately t kx . Therefore, Zt is approximately distributed as Zt =  t kx with probability k t , 0 with probability 1− k t . (3.5) 68 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET If Zt were distributed exactly like this with the Zt all independent of each other, then the proof would not be difficult to complete. Since the potential non-zero value of Zt increases with t, it is straightforward to show (as in the classical secretary problem; more details will be given later in this section) that an optimal strategy is, for some I, to ignore the first I elements and accept the next non-zero Zt. I will denote the associated stopping time by τI and make some rough calculations. This is only an outline of the more precise argument that will be given later; in particular, ≈ is only intended to have an intuitive meaning and does not stand for any well-defined relation. The approximate distribution of Zt gives E(ZτI ) = kx∑ t=I+1 P(τI = t)E ( Zt ∣∣τI = t) ≈ kx∑ t=I+1 P ({ Zi = 0 for all i ∈ {I + 1, . . . , t− 1} } ∩ {Zt > 0}) · t kx ≈ kx∑ t=I+1 ( t−1∏ i=I+1 ( 1− k i )) · k t · t kx = 1 x kx∑ t=I+1 t−1∏ i=I+1 ( 1− k i ) . I shall apply this formula in the case where k is much smaller than I, so 1 − k i can be approximated by e− k i , and sums are twice approximated by integrals to give E(ZτI ) ≈ 1 x kx∑ t=I+1 e− ∑t−1 i=I+1 k i ≈ 1 x kx∑ t=I+1 e−k log( t I ) = Ik x kx∑ t=I+1 t−k ≈  I x · log (x I ) if k = 1, k k−1 · Ikx · ( 1− ( I kx )k−1) if k > 1. This is the formula in Theorem 3.7 with p = I kx and is thus maximized, as in Corol- lary 3.8, when I kx = pk, in which case E(ZτI ) ≈ pk. Therefore the bounds in Corollary 3.8 are best possible, and if these calculations had been exact then the proof of Theorem 3.1 would be complete. Unfortunately, Zt is not distributed exactly as in (3.5). In order to conclude that an optimal stopping time is of the simple form above, it would be nice to use the principle of 3.3. UPPER BOUND 69 backward induction described in Section 1.5. This formalizes the intuitive principle that, in a finite game, an optimal strategy is simply to analyse at each step whether or not we expect our situation to improve by continuing, and to do so if and only if this is the case. The reason why the sequence of random variables (Zt)t∈[n] is difficult to analyse is that the values they can take vary depending on how the process unfolds. However, it is very likely that at any time we shall have seen approximately the same number of elements from each chain. The proof will therefore proceed by defining a sequence of random variables (Yt)t∈[n], which act as asymptotically almost sure upper bounds for Zt and are easier to analyse. To obtain these bounds, each chain will be split into m segments, each of length l, and the process split into m sets of kl observations. These lengths l are margins of error beyond which the number of elements observed from a chain is not expected to deviate. Initially, I shall fix m and let l →∞ to find an upper bound for E(Yτ ) and hence E(Zτ ) in terms of m. Letting m→∞ will then give a best possible result. This means that the precise statement proved for Theorem 3.13 is in fact lim m→∞ lim l→∞ sup τ∈C(Ft) EDk(lm)(Zτ ) ≤ pk. However, this is purely a matter of convenience; it is clear that the proof can be extended to posets Dk(x) where x is not a multiple of m by dividing each chain into m almost equal rather than exactly equal segments. p p p 6 ? 6 ? lm l l(s− 1) ls l(s+ 1) C1 C2 Ck Figure 3.3. This figure shows the number of elements observed from each chain. In this example, after a total of kls elements have been observed we see that UC1,s and UCk,s hold but UC2,s does not. It is necessary to show that the process behaves in this approximately uniform manner with high probability as l → ∞. I shall first define what it means to be approximately 70 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET uniform in one particular chain C at time kls, an event called UC,s (see Figure 3.3), and then what it means to be approximately uniform everywhere at all times, an event called U . Given one of the chains, C, and for all s ∈ {0, . . . ,m}, let UC,s be the event that when we have observed kls elements in total we have observed between l(s − 1) and l(s + 1) elements from C, that is, UC,s = { l(s− 1) ≤ ∣∣∣C ∩ {pi(1), . . . , pi(kls)}∣∣∣ ≤ l(s+ 1)}. For all t ∈ {0, . . . , klm}, let s(t) be the unique integer s such that kl(s − 1) < t ≤ kls, that is, s(t) = ⌈ t kl ⌉ . (3.6) Let U be the event that for all t when we have observed t elements in total we have observed between l ( s(t)− 2) and l(s(t) + 1) elements from each chain, that is, U = ⋂ i,t { l ( s(t)− 2) ≤ ∣∣∣Ci ∩ {pi(1), . . . , pi(t)}∣∣∣ ≤ l(s(t) + 1)}. I shall use Lemma 3.16, which states that the process is approximately uniform with high probability and follows easily from Lemma 3.15. Lemma 3.15. Let m ≥ 1 be an integer, let C be one of the chains in Dk(lm) and let s ∈ {0, . . . ,m}. Then lim l→∞ PDk(lm)(UC,s) = 1. Proof. I shall show that the probability that we have observed more than l(s + 1) or fewer than l(s− 1) elements tends to zero as l→∞. (If s = 0 or s = m then only one of these tails needs to be considered.) Assume C and s are given. Let N be the number of elements observed from chain C when kls elements have been observed in total, and write f(x) = P(N = x). Then f(x) = ( lm x )( (k−1)lm kls−x )( klm kls ) and hence f(x+ 1) f(x) = (lm− x)(kls− x) (x+ 1)((k − 1)lm− kls+ x+ 1) . 3.3. UPPER BOUND 71 Note that this is a decreasing function in x. Thus, if x ≥ ls, then f(x+ 1) f(x) < l(m− s)(k − 1)ls ls(k − 1)l(m− s) = 1 (3.7) and, if x+ 1 ≤ ls, then f(x+ 1) f(x) > l(m− s)(k − 1)ls ls(k − 1)l(m− s) = 1. (3.8) Observe also that, if x ≥ l(s+ 1), then f(x+ 1) f(x) < (m− s− 1)((k − 1)s− 1) (s+ 1)((k − 1)(m− s) + 1) = cs+1 < 1 (3.9) and, if x+ 1 ≤ l(s− 1), then f(x+ 1) f(x) > (m− s+ 1)((k − 1)s+ 1) (s− 1)((k − 1)(m− s)− 1) = 1 cs−1 > 1, (3.10) where cs+1 and cs−1 are constants depending on k, m and s but not l. Since (3.7) and (3.8) imply that f ( l(s+1) ) < . . . < f(ls) and f ( l(s−1)) < . . . < f(ls), it follows that f ( l(s+ 1) ) < 1 l and f ( l(s− 1)) < 1 l . Finally, (3.9) and (3.10) give P ( N > l(s+ 1) ) < ∞∑ i=0 cis+1 l = 1 l (1− cs+1) → 0 as l→∞ and P ( N < l(s− 1)) < ∞∑ i=0 cis−1 l = 1 l (1− cs−1) → 0 as l→∞, which proves the claim.  Lemma 3.16. Let m ≥ 1 be an integer. Then lim l→∞ PDk(lm)(U) = 1. Proof. This lemma follows simply from the previous lemma: choose l sufficiently large that each of the k(m+ 1) events UCi,s occurs with probability at least 1− δ. Then, trivially, all k(m+ 1) events hold with probability at least 1− k(m+ 1)δ. It is easy to see that ⋂ Ci,s UCi,s ⊂ U, 72 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET since the the events UCi,s(t)−1 and UCi,s(t) imply that l ( s(t)− 2) ≤ ∣∣∣Ci ∩ {pi(1), . . . , pi(t)}∣∣∣ ≤ l(s(t) + 1), and so this holds for every t and i, as required.  The next lemma states that in order to prove Theorem 3.13, it suffices to show that its statement is true conditioned on U occurring. This formalizes the intuition that, since the process is asymptotically almost surely uniform (that is, since liml→∞ PDk(lm)(U) → 1), we may assume that it is uniform. Recall that C(Ft) is the class of all stopping times relative to the σ-algebras Ft = σ(P1, . . . , Pt), that is, the decision to stop at time t depends only on P1, . . . , Pt. Lemma 3.17. For all m, lim l→∞ sup τ∈C(Ft) EDk(lm)(Zτ ) ≤ lim l→∞ sup τ∈C(Ft) EDk(lm)(Zτ |U). Proof. By Lemma 3.16, for all ε > 0 the value of l may be chosen sufficiently large that PDk(lm)(U c) ≤ ε. By definition, it is also the case that Zt ≤ 1 for all t. Therefore, for all τ , EDk(lm)(Zτ ) = EDk(lm)(Zτ |U)PDk(lm)(U) + EDk(lm)(Zτ |U c)PDk(lm)(U c) ≤ EDk(lm)(Zτ |U) + ε. Taking suprema gives sup τ∈C(Ft) EDk(lm)(Zτ ) ≤ sup τ∈C(Ft) EDk(lm)(Zτ |U) + ε. Since ε is arbitrary, the result follows.  Now that I have shown that the process is asymptotically almost surely uniform, and in light of the previous lemma, I shall assume that U occurs. The next aim is to find a process that offers at least the pay-offs that Zt offers but is easier to analyse. This means that I shall be able to find an upper bound for the probability of success of an optimal stopping time on this process and hence on Zt. 3.3. UPPER BOUND 73 I shall now define the random variables Yt that act as upper bounds for the (Zt|U) and are easier to analyse. These random variables are not strict upper bounds, in the sense that the random variables are not coupled in any way. However, conditioned on U occurring, Zt is less than the potential non-zero value of Yt and the probability that Zt is non-zero is less than the probability that Yt is non-zero, and I shall be able to show that an optimal strategy for the game on Zt has a lower expected pay-off than an optimal strategy for the game on Yt. I shall often need to refer to the potential non-zero value of Yt and the probability that it takes this value; set yt = s(t) + 1 m and pt =  1 l(s(t)−2) if s(t) ≥ 3, 1 if s(t) ≤ 2, where s(t) is as defined in (3.6). Now define a sequence of independent random variables (Yt)t∈[n] by Yt =  yt with probability pt, 0 with probability 1− pt. These will not be explicitly defined on Ω as there is no need, although it is of course straightforward to do so. The next lemma states the intuitive principle that we expect to do at least as well in the game with the random variables Yt as in the game with the random variables Zt conditioned on U occurring. Analagously to Ft for Zt, let (Gt)t∈[n] be defined by Gt = σ(Y1, . . . , Yt). Lemma 3.18. For all l,m ∈ N with m ≥ 3, sup τ∈C(Ft) EDk(lm)(Zτ |U) < sup τ∈C(Gt) EDk(lm)(Yτ ). This will be proved shortly. Putting Lemmas 3.17 and 3.18 together tells us that lim l→∞ sup τ∈C(Ft) EDk(lm)(Zτ ) ≤ lim l→∞ sup τ∈C(Gt) EDk(lm)(Yτ ). 74 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET The proof of the lemma relies on Theorem 1.1, which was the theorem that formalized backward induction. Proof of Lemma 3.18. For convenience, I shall continue to use n to denote the number of elements in Dk(lm), that is, n = klm. I shall apply Theorem 1.1 to the sequences (Yt)t∈[n] and (Zt)t∈[n], conditioned on U occurring, to show that the optimal expected pay-off for (Yt)t∈[n] is at least as large for (Zt)t∈[n]. First consider what happens with the sequence (Yt)t∈[n]. Recall that (Gt)t∈[n] are defined by Gt = σ(Y1, . . . , Yt) and let (αt)t∈[n] be defined for (Yt)t∈[n] as (γt)t∈[n] were for (Wt)t∈[n] in the backward induction theorem, that is, αn = Yn, αt = max { Yt,E ( αt+1 | Gt )} , t = n− 1, . . . , 1. Since the random variables Yt are independent, the values of Y1, . . . , Yt give no information about the values of Yt+1, . . . , Yn, and therefore E(αt+1|Gt) is constant on all atoms of Gt and equal to E(αt+1). Therefore, define the function v : [n]→ R by v(t) = E(αt) and note that backward induction tells us that the stopping time that stops at the first t such that Yt ≥ E(αt+1|Gt) = v(t+ 1) is optimal. By definition, v(t) = E ( max{Yt, v(t+ 1)} ) ≥ v(t+ 1), whereas yt, the potential non-zero value of Yt, is a non-decreasing function of t. Therefore, there exists I such that yt < v(t+ 1) if t ≤ I and yt ≥ v(t+ 1) if t > I, and an optimal strategy for the game on Yt is ‘reject the first I elements, and accept the next with a non-zero pay-off.’ 3.3. UPPER BOUND 75 Recall that the distribution of Yt is given by Yt =  yt with probability pt, 0 with probability 1− pt. From this, it follows that v(n) = E(αn) = E(Yn) = pnyn = m+ 1 m · 1 l(m− 2) > 1 lm = k n (3.11) and that, for 1 ≤ t ≤ n− 1, v(t) = E(αt) = E ( max{Yt,E(αt+1)} ) =  ptyt + (1− pt)v(t+ 1) if t > I, v(t+ 1) if t ≤ I. (3.12) Now consider the sequence of random variables (Zt)t∈[n]. Since the statement of the lemma is conditioned on U occurring, define a new sequence of σ-algebras (Ht)t∈[n] by Ht = σ (Ft ∪ {U}) and consider only ω ∈ U . Analogously to γt and αt, let βn = Zn and, for 1 ≤ t ≤ n− 1, let βt = max { Zt,E ( βt+1 |Ht )} . Recalling that C(Ft) denotes the class of stopping times relative to Ft, observe that Ft ⊂ Ht, and hence C(Ft) ⊂ C(Ht) and sup τ∈C(Ft) EDk(lm)(Zτ |U) ≤ sup τ∈C(Ht) EDk(lm)(Zτ |U). Note that intuitively this is obvious: it just says that having extra information (that the event U occurs) can only help in choosing our stopping time τ . 76 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET Recall that E(βt|Ht−1)(ω) denotes the expected value of the game at time t, if we have so far seen the first t − 1 elements of P and are told whether or not U holds (that is, whether or not ω ∈ U). I shall prove the following claim by induction on n− t. Claim. For all ω ∈ U and for all t ∈ [n], E ( βt ∣∣Ht−1) (ω) < v(t), where H0 = {∅, U, U c,Ω} and so E(β1|H0)(ω) = E(β1|U). Proof of claim. First, observe that if ω ∈ U , then Zt(ω) ≤ yt and P ( Zt > 0 |Ht−1 ) (ω) ≤ pt, by Lemma 3.14 and the definition of U . Moreover, by Lemma 3.14 and (3.11), E ( βn ∣∣Hn−1) = 1 n/k · 1 = k n < v(n), which proves the case n− t = 0. Now let 1 ≤ t ≤ n − 1 and assume that the result holds for t + 1. Since Ht−1 is finite, the random variable E(βt|Ht−1) is constant on each of its atoms. Given ω ∈ U , let A ∈ Ht−1 be the atom containing ω, so that E ( βt ∣∣Ht−1) (ω) = E (βt∣∣A) = P ( Zt ≥ E ( βt+1 ∣∣Ht) ∣∣∣A) · E(Zt∣∣∣(Zt ≥ E(βt+1|Ht)) ∩ A) + P ( Zt < E ( βt+1 ∣∣Ht) ∣∣∣A) · E(E(βt+1|Ht)∣∣∣(Zt < E(βt+1|Ht)) ∩ A) . Now, ( Zt < E(βt+1|Ht) ) ∈ Ht and Ht−1 ⊂ Ht, and so (Zt < E(βt+1|Ht)) ∩ A ∈ Ht. Note first that, since ω ∈ U and U ∈ Ht−1, it must be the case that A ⊂ U , and second that( Zt < E(βt+1|Ht) ) ∩ A is non-empty since P (Zt < E(βt+1|Ht)∣∣A) ≥ 1− pt > 0. For all ω′ ∈ (Zt < E(βt+1|Ht)) ∩ A, by the induction hypothesis, E ( βt+1 ∣∣Ht) (ω′) < v(t+ 1) 3.3. UPPER BOUND 77 and therefore E ( βt+1 ∣∣∣(Zt < E(βt+1|Ht)) ∩ A) < v(t+ 1). Moreover, E ( Zt ∣∣∣(Zt ≥ E(βt+1|Ht)) ∩ A) ≤ yt, since Zt ≤ yt whenever U holds and A ⊂ U . Since yt ≤ v(t+ 1) if and only if t ≤ I, and P ( Zt ≥ E(βt+1|Ht) ∣∣A) ≤ P(Zt > 0|A) ≤ pt < 1, it follows that E ( βt ∣∣Ht−1) (ω) <  ptyt + (1− pt)v(t+ 1) if t > I, v(t+ 1) if t ≤ I, and hence that E ( βt ∣∣Ht−1) (ω) < v(t), by (3.12). This completes the induction step, and hence proves the claim.  The lemma follows from the claim, since, by the backward induction theorem (Theo- rem 1.1), it follows that sup τ∈C(Ht) EDk(lm)(Zτ |U) = E(β1|U) < v(1) = E(α1) = sup τ∈C(Gt) EDk(lm)(Yτ ), as required.  In the final lemma before the proof of Theorem 3.13, backward induction is used to show that an optimal stopping time for the process with the Yt takes the simple form of rejecting the first klu∗ elements, for some integer u∗, and accepting the next non-zero pay-off. Lemma 3.19. For u∗ ∈ {0, . . . ,m− 1}, let τu∗ =  min {t > klu∗ : Yt > 0} if this exists, n otherwise. Then sup τ∈C(Gt) EDk(lm)(Yτ ) = sup u∗∈{0,...,m−1} EDk(lm)(Yτu∗ ) Proof. It has already been shown in the proof of Lemma 3.18 that an optimal strategy takes the form ‘ignore the first I, and accept the next non-zero pay-off.’ In 78 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET fact, I must be a multiple of kl: suppose, for contradiction, that I = klu∗ + r, where r ∈ {1, . . . , kl − 1}. Then it must be the case that v(I + 1) ≤ kl ( s(I + 1) + 1 ) m = kl(u∗ + 2) m , since we would be willing to stop at time I + 1, but then v(I) ≤ max { kl(u∗ + 2) m , v(I + 1) } ≤ kl(u ∗ + 2) m = kl(s(I) + 1) m , which is a contradiction since we would not be willing to stop at time I.  The proof of the main theorem in this section can now be completed. Proof of Theorem 3.13. All that remains is to calculate and maximize E(Yτu∗ ) over u∗, where τu∗ is the smallest t > klu∗ such that Yt > 0. These calculations are very similar to those on page 68, but also include error terms which tend to zero as l,m→∞. Assume first that u∗ → ∞ as m → ∞, as the calculation will show that this is a valid assumption. Indeed, if t = o(n) then yt = o(1), whereas a probability of success will be obtained that is separated from zero. We should therefore never accept a pay-off for t = o(n). In particular, for sufficiently large m it is the case that u∗ ≥ 3, and so E (Yτu∗ ) = m−1∑ u=u∗ kl∑ r=1 P ( Yklu∗+1 = 0, . . . , Yklu+r−1 = 0, Yklu+r > 0 ) · yklu+r. Recall that the Yt are independent, and that s(klu+h) = u+ 1 when 1 ≤ h ≤ kl. Hence, using the convention that the empty product takes the value 1, it follows that E (Yτu∗ ) = m−1∑ u=u∗ kl∑ r=1 ( u∏ q=u∗+1 ( 1− 1 l(q − 2) )kl)( 1− 1 l(u− 1) )r−1 · 1 l(u− 1) · u+ 2 m . The formula for the sum of the geometric progression ( 1− 1 l(u−1) )r−1 gives E (Yτu∗ ) = m−1∑ u=u∗ ( u∏ q=u∗+1 ( 1− 1 l(q − 2) )kl)( 1− ( 1− 1 l(u− 1) )kl) · u+ 2 m , 3.3. UPPER BOUND 79 and multiplying out the term ( 1− ( 1− 1 l(u−1) )kl) gives E (Yτu∗ ) = m−1∑ u=u∗ ( u∏ q=u∗+1 ( 1− 1 l(q − 2) )kl) · u+ 2 m − m∑ u=u∗+1 ( u∏ q=u∗+1 ( 1− 1 l(q − 2) )kl) · u+ 1 m . Parts of these two sums cancel each other out, leaving E (Yτu∗ ) = u∗ + 2 m − ( m∏ q=u∗+1 ( 1− 1 l(q − 2) )kl) · m+ 1 m + m−1∑ u=u∗+1 ( u∏ q=u∗+1 ( 1− 1 l(q − 2) )kl) · 1 m . Now, let ε > 0 be arbitrary, choose m = m(ε) and l = l(m, ε) sufficiently large, and recall that therefore u∗ = u∗(ε) may be chosen to be sufficiently large also. Since( 1− 1 n )n < 1 e < ( 1− 1 n )n−1 for all n ≥ 2, it is the case that m∏ q=u∗+1 ( 1− 1 l(q − 2) )kl ≥ exp ( −kl − 1 l m∑ q=u∗+1 1 q − 2 ) ≥ ( u∗ m )k − ε 2 , and similarly u∏ q=u∗+1 ( 1− 1 l(q − 2) )kl ≤ exp ( −k u∑ q=u∗+1 1 q − 2 ) ≤ ( u∗ u )k . Setting p = u ∗ m , it follows that E (Yτu∗ ) ≤  p− p+ p log ( 1 p ) + ε if k = 1, p− pk + p k − 1 ( 1− pk−1)+ ε if k > 1. As before, these expressions are maximized when p = pk, and thus lim m→∞ lim l→∞ E ( Yτu∗ ) ≤ pk + ε. Since ε > 0 was arbitrary, this completes the proof.  Putting Corollary 3.8 and Theorem 3.13 together gives Theorem 3.1. 80 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET 3.4. Open problems It seems likely that these theorems can be improved. Firstly, Robert Morris and I [35] conjectured that Theorem 3.1 can be extended to all posets with k maximal elements. It is not inconceivable that the algorithm in Corollary 3.8 works on all posets with k maximal elements rather than just those whose width is also k; if not, it would be good to find some other algorithm dependent only on k that does so. Conjecture 3.20. Let (P,≺) be a poset with k maximal elements. Then there is an algorithm for the secretary problem on (P,≺) that is successful with probability at least pk, where pk is as defined in (3.1). Our algorithm, which gives the bound in Theorem 3.2, depends only on the size of the poset and the number of maximal elements. Our second conjecture was that the latter piece of information is not needed. Conjecture 3.21. Let (P,≺) be a poset. Then there is an algorithm for the secretary problem on (P,≺) that depends only on |P | and is successful with probability at least 1 e . In light of the threshold probabilities e− 1 k in Corollary 3.12, we considered the stopping time τ =  min { t : t > e− 1 mn, where m = |max(Pt)| and t ∈ max(Pt) } if this exists, n otherwise, and tried to show that this algorithm might be used to prove Conjecture 3.21, but un- fortunately we did not make any progress with its analysis. Kozik [54] considered this stopping time independently at the same time, and was able to show that it was successful with probability strictly greater than 1 4 for sufficiently large posets. Kozik’s proof is based on the analysis of many cases by computer, and he does not specify how much greater. However, whereas the bound of 1 4 is best possible for Preater’s algorithm, Kozik’s analysis is not at all tight and, like Robert Morris and me, he is hopeful that this algorithm might be successful with the best possible probability of 1 e . After I had finished writing this dissertation, Micha l Morayne alerted me to the fact that, very recently, Freij and Wa¨stlund [29] had proved Conjecture 3.21. Their strategy 3.4. OPEN PROBLEMS 81 and proof of its probability of success are beautifully simple. Firstly, they assume that the elements are observed at n times uniformly distributed in [0, 1]. Secondly, in addition to this, they assign a weight uniformly distributed in [0, 1] to each element as it appears, independently of the others, and define the greedy maximum of a poset (P,≺) as follows. Let z0 ∈ P be the element of minimal weight. If z0 is not a maximal element, then let z1 be the element of minimal weight in {x : z0 ≺ x}. Repeat this process, producing a chain z0 ≺ z1 ≺ z2 ≺ . . ., until a maximal element is reached. This element is the greedy maximum. Their strategy is to wait until time 1 e and then to select the next element that is the greedy maximum out of what has been seen so far; their analysis of it is surprisingly brief. Writing Ak for the event that the k th element observed is the greedy maximum out of what has been seen so far, they first show that the joint distribution of (Ak)k∈[n] is independent of the ground poset, and in particular the same as for the single chain. They then show that the probability that a maximal element is the greedy maximum at time t, conditioned on it appearing at time t, is equal to the conditional probability that it is the greedy maximum of the ground poset given that its weight is at most t; this is at least the probability that it is the greedy maximum of the ground poset. Finally, they carry out a simple calculation for the conditional probability that a maximal element is accepted given that it appears at time t, integrate over t and sum over the maximal elements, and find that with probability at least 1 e one of them is accepted. In this chapter, the aim was to select a maximal element; the same problems can be posed for the minimum rank version instead. In a general poset, the rank of an element x is most naturally defined as the maximum number of elements in a chain whose minimal element is x. In light of the results in this chapter, the following questions are appealing: (1) Is it the case that the single chain is the hardest poset for the minimal expected rank problem, that is, is it the case that for every poset there is a strategy that selects an element with expected rank at most about 3.8695? (2) Is there a strategy that depends only on the size of the poset and its number of maximal elements, or even just its size, that selects an element of finite expected rank for every poset where these are given? 82 3. THE SECRETARY PROBLEM ON AN UNKNOWN POSET As with the problems that I suggested at the end of Chapter 2, one could also allow the interviewer to go back to a previous candidate, who is still available with a certain probability, and see how this affects the problem. This chapter has been about unknown posets with a known number of elements. In Chapter 1, I described a version of the problem where the poset is a chain whose length comes from a known distribution. Distributions that did not seem too unpleasant nevertheless forced the expected rank to infinity. What distributions on the number of elements in an unknown poset admit a strategy that chooses a maximal element with positive probability, or a strategy that selects an element with finite expected rank? Part 2 Generalizations of acyclic colourings CHAPTER 4 Acyclic colourings of graphs 4.1. Colouring problems Throughout the history of graph theory, colouring problems have been widely studied. I shall summarize some of the highlights in this chapter, and show how this part of this dissertation fits into that journey. For a more detailed history and for proofs of these results, see Chapter V of Bolloba´s [11], whose notation I shall use. A proper vertex-colouring of a graph G = (V,E) is an assignment of colours to its vertices such that the end-points of any edge are differently coloured. The smallest number of colours needed to do this is called the chromatic number χ(G) of G. It is obvious that any graph with maximum degree d = ∆(G) can be properly coloured with at most d+ 1 colours: if the vertices are coloured with a palette of d+1 colours one by one in any order, then for each vertex v there is always at least one colour available that has not been used to colour its neighbours, and so v can be given this colour. The following theorem, due to Brooks [16], says that one can usually do slightly better. Theorem 4.1. Let G be a connected graph. Then χ(G) ≤ ∆(G) unless G is a complete graph or a cycle of odd length, in which case χ(G) = ∆(G) + 1.  The chromatic number of a graph can also be bounded if it can be drawn on a particular surface without its edges crossing. The Euler characteristic of a surface S, unfortunately usually denoted by χ(S), is the invariant χ(S) = V − E + F, where V , E and F are the numbers of vertices, edges and faces in a triangulation of S, that is, a graph drawn on S all of whose faces are triangles. In the following theorem, due to Heawood [47], a bound is given for surfaces other than the plane or, equivalently, the sphere, which has Euler characteristic 2. 85 86 4. ACYCLIC COLOURINGS OF GRAPHS Theorem 4.2. The chromatic number of a graph drawn on a closed surface of Euler characteristic χ ≤ 1 is at most h(χ) = ⌊ 7 + √ 49− 24χ 2 ⌋ .  The function h(χ) is known as the Heawood bound, which was proved to be best possible for every surface except the Klein bottle by Ringel and Youngs [72]. The Klein bottle has Euler characteristic 0, but every graph that can be drawn on it has chromatic number at most 6, rather than 7. The famous four-colour theorem, proved by Appel and Haken with computational assistance from Koch [7, 8, 9], states that every graph drawn on the plane can be coloured using at most four colours. It is just as natural to colour a graph’s edges as its vertices; in a proper edge-colouring the edges meeting at any one vertex are all coloured differently. This gives rise to the chromatic index χ′(G) of G, the smallest number of colours needed to do this. In this case, it is obvious that ∆(G) is a lower bound for χ′(G), since the edges incident to a vertex of maximum degree must all be differently coloured. What is surprising is that the best possible upper bound is only one larger, which was proved by Vizing [78] in the following theorem. Theorem 4.3. Let G be a graph. Then ∆(G) ≤ χ′(G) ≤ ∆(G) + 1.  There are of course many more results concerning colourings, and I shall briefly men- tion some in only one more direction. An L-colouring is a proper vertex-colouring where the colour of each vertex v is chosen from a given list L(v) of possible colours for that vertex. The list-chromatic number χl(G) is the minimum k such that G can be L-coloured for any L where |L(v)| ≥ k for all v. It is clearly the case that χl(G) ≥ χ(G) and one might expect always to have equality, since it seems most restrictive when the L(v) are all the same. This turns out not to be true; Voigt [79] proved that there are some planar graphs with list-chromatic number greater than 4. However, Thomassen [76] proved that the list chromatic number of planar graphs is at most 5. 4.2. FROM NASH-WILLIAMS TO ACYCLIC COLOURINGS 87 The list-chromatic index χ′l(G) is the corresponding concept for edge colourings, and again it must be the case that χ′l(G) ≥ χ′(G). It is conjectured (by whom first is not known) that χ′l(G) = χ ′(G) for all graphs G. Galvin [31] proved the conjecture for bipartite graphs and Kahn [48] proved that χ′l(G) = ( 1 + o(1) ) χ′(G) as χ′(G)→∞. The full conjecture is still open. 4.2. From Nash-Williams to acyclic colourings Perhaps surprisingly, given their name, acyclic colourings did not evolve as generali- zations of proper colourings; their history, which I shall outline in this section, is more closely related to the ‘acyclic’ part of the term. An acyclic graph is often called a forest, and I shall use the terms interchangeably. Suppose that a graph G is the union of k forests F1 ∪ . . . ∪ Fk. For all subgraphs H ⊂ G and for all i, it must be the case that ∣∣E(Fi ∩H)∣∣ ≤ |V (H)| − 1, as otherwise Fi would contain a cycle somewhere in H. Summing over the forests, it is therefore the case that |E(H)| ≤ k(|V (H)| − 1). A fundamental theorem of Nash-Williams [65] (whose proof relies on his work in an earlier paper [64]) asserts that this necessary condition is also sufficient. Theorem 4.4. Let G be a graph. Suppose that every subgraph H ⊂ G satisfies |E(H)| ≤ k(|V (H)| − 1), (4.1) for some integer k. Then G can be decomposed into at most k forests.  The minimum value of k such that G can be decomposed into at most k forests, or equivalently such that G satisfies (4.1), is called the arboricity of G. This concept was generalized to point-arboricity by Chartrand, Geller and Hedet- niemi [19], and formally defined and further explored by Chartrand, Kronk and Wall [20]. The point-arboricity of a graph is the minimum number of colours needed to colour the 88 4. ACYCLIC COLOURINGS OF GRAPHS vertices (not necessarily properly) so that the graph induced by each colour class is a forest, that is, is acyclic.                              @ @ @ PPPPPPPPP B B B B B B B B B T T T T T T T T T T T T T T T @ @ @ @ @ @s 2 s1 s 3 s3 or 4 s2 s 1 Figure 4.1. This graph can be 4-coloured in two different ways, shown on this figure up to relabelling of the colours. Neither of them is acyclic, since the vertices coloured 1 and 2 form a cycle. Acyclic colourings were introduced as a natural next step by Gru¨nbaum [44]. A proper vertex-colouring of a graph is acyclic if every graph induced by the union of two colour classes is acyclic, that is, there is no two-coloured cycle. The minimum number of colours needed to do this is the graph’s acyclic chromatic number. Whereas the point-arboricity of a graph is clearly at most the chromatic number, since an independent set is certainly acyclic, the acyclic chromatic number must be at least as large. Gru¨nbaum exhibited simple planar graphs with acyclic colouring number 5, for example the octahedron (see Figure 4.1), and conjectured that the acyclic chromatic number of all planar graphs is at most 5. Gru¨nbaum himself proved that it was at most 9; this was improved to 8 by Mitchem [58], to 7 by Albertson and Berman [2] and to 6 by Kostochka [50]. Finally, the full conjecture was proved by Borodin [13]. 4.3. Variants In this section, I shall demonstrate how many traditional colouring problems, as des- cribed in Section 4.1, have been transferred to the acyclic arena. I shall also describe the natural extensions of these problems that I shall consider in this part of this dissertation. Kostochka [51] proved that it is an NP-complete problem to decide for given G and k 4.3. VARIANTS 89 if the acyclic chromatic number of G is at most k; thus, it makes sense to try to find sufficient conditions for it to be small. Given their introduction, it is unsurprising that acyclic chromatic numbers were next bounded for graphs drawn on surfaces other than the plane. The genus g(S) of a surface S is the maximum number of closed curves along which S can be cut without being disconnected or, informally, the number of handles that need to be added to a sphere to obtain S. The plane has genus 0 and, for any surface S, it is the case that χ(S) = 2−2g(S). A graph is said to be of genus g if the smallest genus of a surface on which it can be drawn without its edges crossing is g. Albertson and Berman [1] instigated this study and proved the following theorem [3]. Theorem 4.5. Any graph of genus g > 0 can be acyclically coloured with 4g + 4 colours. When bounding the acyclic chromatic number, the shortest cycles are critical, since these are less likely to receive three colours in a proper colouring. The girth of a graph is the length of its shortest cycle. Borodin, Kostochka and Woodall [15] proved that planar graphs of girth at least 5 have acyclic chromatic number at most 4 and that those of girth at least 7 have acyclic chromatic number at most 3. A bound for the acyclic chromatic number of a graph as a function of its maximum degree was found by Alon, McDiarmid and Reed [4]. In the following theorem, A(d) is the maximum possible value of the acyclic chromatic number of a graph of maximum degree d. Theorem 4.6. Ω ( d 4 3 (log d) 1 3 ) ≤ A(d) ≤ O ( d 4 3 ) . My results are generalizations of this one, so it is worth outlining its proof. The upper bound is proved by colouring a graph of maximum degree d randomly and then using the Erdo˝s-Lova´sz local lemma. This essentially says that if we have a collection of bad events, each of which happens with small probability and is independent of most other bad events, then with positive probability none of them happens. I shall state this precisely in Section 4.5. For acyclic colourings, the most obvious bad events are, 90 4. ACYCLIC COLOURINGS OF GRAPHS (1) for each edge, that the vertices that it connects are the same colour, and, (2) for each even cycle, that its vertices are coloured with alternating colours. There is no need to consider odd cycles explicitly, since in a proper colouring they will receive at least three colours anyway. These are not quite the events considered, for two reasons. Firstly, considering cycles of all lengths makes it harder to find a useful bound for the number of other bad events that each bad event might depend on. Secondly, if two vertices u and v are opposite vertices in many 4-cycles, then by avoiding bad events of the second type we effectively end up insisting that u and v are differently coloured too many times to get best possible bounds. The first problem is easily dealt with; 4-cycles and paths of length 4 are considered, since longer cycles contain a path of length 4. The second problem is overcome by calling u and v ‘special’ if they have many common neighbours and are therefore in many 4-cycles and, for each special pair u, v, introducing the bad event that u and v are the same colour. If these bad events do not occur then properly coloured 4-cycles containing a special pair receive at least three colours, and there remain only 4-cycles not containing a special pair. The final type of bad event is that such 4-cycles are coloured with alternating colours, and each vertex cannot be in too many of them without being in a special pair. They proved that the upper bound is best possible up to a logarithmic factor by considering a random graph Gn,p on n vertices with each edge present with probability p independently of the others. They showed that if p ≥ 3 ( logn n ) 1 4 then, however the vertices are coloured with at most n 2 colours, the probability that every 4-cycle receives at least three colours is so small that the probability that any colouring has every 4-cycle receiving at least three colours is less than one. At the same time, for p = 3 ( logn n ) 1 4 , with high probability the maximum degree of the graph d satisfies d < 2pn = 6n 3 4 (log n) 1 4 , and therefore a graph exists with maximum degree d that cannot be acyclically coloured with fewer than n 2 = Ω ( d 4 3 (log d) 1 3 ) colours. In terms of precise evaluation of A(d), it is obvious that A(1) = 2 and A(2) = 3. It is known that A(3) = 4 (mentioned without proof by Gru¨nbaum [44]; see also Skulrattana- kulchai [73]), A(4) = 5 (Burnstein [17]), A(5) ≤ 8 and A(6) ≤ 12 (Kothapalli, Varagani, 4.3. VARIANTS 91 Venkaiah and Yadav [52, 53]). No lower bounds better than d+ 1 are known for d = 5, 6, for which the example is the complete graph Kd+1. In the same paper as earlier, Alon, McDiarmid and Reed [4] defined the edge acyclic chromatic number A′(G) and its maximum A′(d) for graphs of maximum degree d in the obvious way. They proved that A′(d) ≤ 64d, without trying to optimize the constant. When quoting this result in a later paper, Molloy and Reed [59] showed that the same proof can easily be modified to give a bound of 16d. Alon, Sudakov and Zaks [6] conjec- tured that in fact A′(d) ≤ d+2. They proved that A′(Gn,d) ≤ d+2 for almost all random d-regular graphs Gn,d and showed that there exists c > 0 such that A ′(G) ≤ d + 2 for all graphs G with ∆(G) = d and girth at least cd log d. They also commented that this conjecture would be best possible, by considering acyclic edge colourings of the complete graph K2n. The largest that any one colour class can be is n, when it is a perfect mat- ching, and no two colours classes can have more than 2n − 1 edges without containing a cycle. This means that the largest colour class has size at most n and all the others at most n − 1, so there are at least 2n + 1 colour classes. Nesˇetrˇil and Wormald [66] improved the bound for almost all random d-regular graphs to d+ 1. As with the vertex version, further improvements have been made for graphs of large girth. Muthu, Narayanan and Subramanian [63] improved the bound to 6d for graphs with girth at least 9 and to 4.52d for graphs of girth at least 220. Gerke, Greenhill and Wormald [37] defined the r-acyclic edge chromatic number to be the minimum number of colours in a proper edge-colouring in which each cycle receives at least r colours. They showed that (r− 2)d is asymptotically almost surely (as d→∞) a bound for the r-acyclic edge chromatic number of a random d-regular graph. List colourings have natural acyclic analogues, which I shall not define formally as they are obvious. Borodin, Fon-Der Flaass, Kostochka, Raspaud and Sopena [14] proved that the list-acyclic chromatic index of a planar graph is at most 7. When acyclic colourings are considered in isolation, rather than as a generalization of arboricity, there is no particular reason why the subgraphs chosen to be subject to extra conditions should be cycles. Again in the same paper as before, Alon, McDiarmid and Reed [4] discussed Pk-free colourings; these are proper vertex colourings such that 92 4. ACYCLIC COLOURINGS OF GRAPHS no path with k vertices is 2-coloured. They pointed out that, since there are at most d+ d(d− 1) = d2 vertices within distance 2 of any vertex, a greedy algorithm can be used to colour a graph in a P3-free fashion with at most d 2 + 1 colours. They also commented that a modification of their work in that paper proves that there is always a Pk-free colouring for k ≥ 4 with at most O ( d k−1 k−2 ) colours. 4.4. Generalized acyclic colourings The problems that I shall consider in this part of this dissertation are extensions of Theorem 4.6, which gave a bound for A(d), the maximum possible acyclic chromatic number of a graph of maximum degree d. There are many possibilities for generalizations, and these are just some of them. I shall introduce some more general versions of acyclic colourings, which will be formally defined in Section 4.5. As described in Section 4.3, since it is the shortest cycles that are most likely to receive only two colours in a proper colouring, several authors have imposed a minimum girth condition on the underlying graph to be coloured. My approach is slightly different; I shall continue to allow all graphs of maximum degree d, but only require long cycles to receive at least three colours. Specifically, I shall introduce the length-l-acyclic chromatic number, where cycles of length at least l must be 3-coloured. Since an odd cycle is already 3-coloured in a proper colouring, it is only necessary to consider length-2m-acyclic colourings for integers m. Recall from Section 4.3 that the r-acyclic edge chromatic number was defined to be the minimum number of colours in a proper edge-colouring in which each cycle receives at least r colours. I shall do the same for the vertex-colouring version, but to avoid ambiguity with the length-l-acyclic chromatic number, I shall call it the c-colour acyclic chromatic number. This is a natural concept to consider; for point-arboricity the defining condition is that every cycle receives at least two colours, for an acyclic colouring it is that every cycle receives at least three colours. What happens if we demand that every cycle receives at least c colours? Point-arboricity corresponds to the 2-colour acyclic chromatic number and the usual acyclic chromatic number to the 3-colour acyclic chromatic number. 4.5. DEFINITIONS AND A USEFUL TOOL 93 During my oral examination, I discovered that my bounds on the c-colour acyclic chromatic number (Section 5.3) had already been proved by Greenhill and Pikhurko [43], along with corresponding results for edge-colourings. I also mentioned Pk-free colourings in Section 4.3, which are an example of a variation on the theme of acyclic colourings in which the subgraphs that must receive extra colours are not cycles. One way to think of the acyclic chromatic number as a generalization of the chromatic number is as follows. In a proper colouring, every 1-regular connected subgraph (edge) receives at least two colours. In an acyclic colouring, every 2-regular connected subgraph (cycle) receives at least three colours. The next step is to ask what happens if every r-regular connected subgraph must receive at least three colours. However, this seems quite restrictive, as there are not likely to be many r-regular connected subgraphs. An alternative, related approach is to require subgraphs with mi- nimum degree at least r to receive at least three colours. This is still a natural thing to do: a proper colouring is one in which every subgraph with minimum degree at least 1 receives at least two colours, and an acyclic colouring is one in which every subgraph with minimum degree at least 2 receives at least three colours. I shall define the degree-r chromatic number of a graph to be the minimum number of colours needed to colour the graph properly in such a way that this happens. The methods that I shall use work just as well for r-regular subgraphs; I have chosen to proceed in this way only because it seems more interesting. This definition can be easily extended to hypergraphs; here, a degree-r colouring of a u-uniform hypergraph is a colouring such that every edge is multicoloured and every subhypergraph with minimum degree at least r receives at least u+ 1 colours. This is the first case to consider since every subhypergraph with at least one edge already receives at least u colours in a multicolouring. 4.5. Definitions and a useful tool In the next two chapters, I shall bound the following quantities for (hyper)graphs with maximum degree d. These bounds will be asymptotically best possible up to constant or logarithmic factors. In Chapter 5, I shall bound the two that concern cycles in graphs: the length-l-acyclic chromatic number and the c-colour chromatic number. In Chapter 6, 94 4. ACYCLIC COLOURINGS OF GRAPHS I shall bound the degree-r chromatic number of a u-uniform hypergraph. This is best separated into two cases: where u = 2, that is, for simple graphs, and where u ≥ 3. Note that the maxima in the following definitions all exist; trivial upper bounds are d2 + 1, dc−1 + 1 and ( (u− 1)d)2 + 1 colours, sufficient to colour vertices differently from those at distance at most 2, c− 1 and 2 away. The length-l-acyclic chromatic number of a graph. A colouring f : V (G)→ [x] is a length-l-acyclic colouring if it is a proper colouring and every cycle of length at least l receives at least three colours. For a graph G and an integer l, let A(l)(G) = min{x : G can be length-l-acyclically coloured using x colours}. For integers l, d, let A(l)(d) = max{A(l)(G) : ∆(G) = d}. Note that a length-l-acyclic colouring is also a length-(l+1)-acyclic colouring, soA(l+1)(G) ≤ A(l)(G) and A(l+1)(d) ≤ A(l)(d). Also, since a cycle of odd length must receive three colours in a proper colouring, it must be the case that A(2m−1)(G) = A(2m)(G) and A(2m−1)(d) = A(2m)(d) for any integer m. The c-colour acyclic chromatic number of a graph. A colouring f : V (G)→ [x] is a c-colour acyclic colouring if it is a proper colouring, every cycle of length less than c is multicoloured and every cycle of length at least c receives at least c colours. For a graph G and an integer c, let Ac(G) = min{x : G can be c-acyclically coloured using x colours}. For integers c, d, let Ac(d) = max{Ac(G) : ∆(G) = d}. Note that a (c+1)-colour acyclic colouring is also a c-colour acyclic colouring, so Ac(G) ≤ Ac+1(G) and Ac(d) ≤ Ac+1(d). The degree-r chromatic number of a u-uniform hypergraph. Let H be a u- uniform hypergraph. A colouring f : V (H) → [x] is a degree-r colouring if every edge is multicoloured and every subhypergraph K with δ(K) ≥ r receives at least u+ 1 colours. 4.5. DEFINITIONS AND A USEFUL TOOL 95 For a u-uniform hypergraph H and an integer r, let D(u)r (H) = min{x : H can be degree-r coloured using x colours}. For integers r, d, let D(u)r (d) = max{D(u)r (H) : ∆(H) = d and H is a u-uniform hypergraph}. I shall bound D (u) r separately when u = 2 and when u ≥ 3. It is not easy to guess what the upper bounds should be, even knowing that they should include Theorem 4.6. In fact, in the case of c-colour acyclic colourings, this theorem is in fact an exceptional case and so unhelpful in providing intuition. As with most results in this area, the proofs of these bounds will all use random colourings and the Erdo˝s-Lova´sz local lemma [25], in its nonsymmetric form (see page 64 of Alon and Spencer [5]). Lemma 4.7. Let A1, . . . , An be events in an arbitrary probability space. Let the graph G = (V,E) with V = [n] be a dependency graph for the events Ai, that is, assume that for each i, Ai is independent of the family of events {Aj : ij /∈ E}. If there are reals 0 < yi < 1 such that for all i P(Ai) ≤ yi ∏ ij∈E (1− yj), then P (⋂ i Aci ) ≥ n∏ i=1 (1− yi) > 0, so that with positive probability no event Ai occurs. The events Ai are commonly called bad events. To prove that the upper bounds that I shall give are asymptotically best possible up to constant or logarithmic factors, I shall give explicit or probabilistic constructions. In the probabilistic constructions, the number of colours needed will be proportional to the number of vertices, so I shall need to bound this in terms of the maximum degree of a random graph. Bolloba´s [12] (page 65, Corollary 3.4) proved the following precise result 96 4. ACYCLIC COLOURINGS OF GRAPHS about the maximum degree ∆ of a random graph Gn,p with n vertices and with each edge present with probability p = p(n) independent of the others, and with q = 1− p. Theorem 4.8. Suppose pqn (logn)3 →∞ and y is a fixed real number. Then the maximal degree ∆ of Gn,p satisfies lim n→∞ P [ ∆ < pn+ (2pqn log n) 1 2 ( 1− log log n 4 log n − log(2pi 1 2 ) 2 log n + y 2 log n )] = e−e −y . I shall use only the trivial consequence that, under these conditions, asymptotically almost surely the maximum degree satisfies ∆ < 2pn. In fact, I shall only ever use probabilities p of a particular form, and the lemma that I shall apply is as follows. Lemma 4.9. Let α ≥ 0, 0 < β < 1 and C ≥ 1 2 be real numbers, and let p = C (log n)α nβ . Then asymptotically almost surely the maximal degree ∆ of Gn,p satisfies n > ( ∆ 2C ) 1 1−β( 1 1−β log ∆ ) α 1−β . (4.2) Proof. The bounds on β mean that Theorem 4.8 can be applied, and so asymptoti- cally almost surely ∆ < 2Cn1−β(log n)α, from which it follows that n > ( ∆ 2C ) 1 1−β (log n) α 1−β . (4.3) Either the bound in (4.2) holds, in which case there is nothing to prove, or it does not hold, in which case log n ≤ 1 1− β log ∆− α 1− β log log ∆− 1 1− β log ( 2C (1− β)α ) ≤ 1 1− β log ∆, and (4.2) follows from (4.3).  CHAPTER 5 Long acyclic colourings and acyclic colourings with many colours 5.1. Introduction I have grouped the two results in this chapter together as they both involve genera- lizations of the acyclic chromatic number in a way that still relates to cycles in graphs, which is not the case with the results in the next chapter. Alon, McDiarmid and Reed [4] showed that a graph with maximum degree d can be acyclically coloured with O(d 4 3 ) colours. In this chapter, I shall answer the following questions: (1) How many colours do we need if only cycles of length at least l must receive at least three colours (Section 5.2)? (2) How many colours do we need if cycles must receive at least c colours (Sec- tion 5.3)? In Section 5.4, I shall suggest some natural open problems following on from these. 5.2. The length-l-acyclic chromatic number Recall that the length-l-acyclic chromatic number is the minimum number of colours needed to colour a graph properly so that each cycle of length at least l receives at least three colours. Recall also that, in a proper colouring, cycles of odd length will automati- cally receive at least three colours. In this section, I shall bound the maximum possible length-l-acyclic chromatic number of graphs of a given maximum degree d, denoted by A(l)(d). Theorem 5.1. For all integers d,m ≥ 2, it is the case that( d 6 ) 2m 2m−1 m ( 2m 2m−1 log d ) 1 2m−1 < A(2m)(d) = A(2m−1)(d) < 14(2m+ 1)d 2m 2m−1 . This has the following immediate corollary. 97 98 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS Corollary 5.2. For a fixed integer m ≥ 2, it is the case that Ω ( d 2m 2m−1 (log d) 1 2m−1 ) ≤ A(2m)(d) = A(2m−1)(d) ≤ O ( d 2m 2m−1 ) as d→∞.  I shall begin by proving the lower bound. Theorem 5.3. For n sufficiently large there is a graph G on n vertices with maximum degree ∆(G) = d satisfying n m ≥ ( d 6 ) 2m 2m−1 m ( 2m 2m−1 log d ) 1 2m−1 = Ω ( d 2m 2m−1 (log d) 1 2m−1 ) such that there is no length-2m-acyclic colouring with at most n m colours. Proof. This proof is a generalization of one by Alon, McDiarmid and Reed [4]. Let p = 3 ( log n n ) 1 2m , let n be large and divisible by m2, and let G ∈ Gn,p. Suppose that f : V (G)→ [ n m ] is a colouring that uses at most n m colours. From each class, discard as few vertices as possible to make its size divisible by m. Note that at most m− 1 vertices are discarded from each class, so at most (m−1)n m vertices are discarded in total and there remain at least n m vertices. Since the size of each colour class is now divisible by m, at least n m2 distinct monochromatic m-tuples can be taken from these vertices. Choose n m2 of these and call them T1, . . . , T n m2 .            Z Z Z Z Z Z Z Z Z Z Z Zs s s s s s s s t (m) i t (2) i t (1) i t (m) j t (2) j t (1) j Figure 5.1. A 2m-cycle of the form t (1) i t (1) j t (2) i t (2) j . . . t (k) i t (k) j . 5.2. THE LENGTH-l-ACYCLIC CHROMATIC NUMBER 99 For each m-tuple Ti, fix an ordering of the vertices t (1) i , . . . , t (m) i . Let Ci,j be the event that G contains the cycle t (1) i t (1) j t (2) i t (2) j . . . t (m) i t (m) j (see Figure 5.1). If any event Ci,j occurs then this colouring cannot be a length-2m-acyclic colouring of G. Let Af be the event that f is a length-2m-acyclic colouring. Then P(Af ) ≤ P (⋂ i 1 2 , and it therefore suffices to choose x such that x ≥ max { 14(2m+ 1)d, 14(2m+ 1)d 2m 2m−1 , 2m−2 √ 14(2m+ 1)d 2m 2m−1 , 2m−1 √ 14(2m+ 1)(m+ 1)d 2m 2m−1 } , for which it suffices to take x = 14(2m+ 1)d 2m 2m−1 .  5.3. The c-colour acyclic chromatic number The results in this section had already been proved by Greenhill and Pikhurko [43], which I did not know until my oral examination. Recall that the c-colour acyclic chromatic number is the minimum number of colours needed to colour a graph properly so that each cycle receives at least c colours. In this section, I shall bound the maximum possible c-colour acyclic chromatic number of graphs of a given maximum degree d, denoted by Ac(d). Theorem 5.5. For all integers d, k ≥ 2, it is the case that max {( d k )k , ( d 4 )k 2k log d } < A2k(d) ≤ A2k+1(d) < 6(k + 2)(2k)k+1dk. This has the following immediate corollary. Corollary 5.6. For a fixed integer k ≥ 2, it is the case that A2k(d), A2k+1(d) = Θ(d k) as d→∞.  104 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS It is interesting that this bound does not include the bound on the 3-colour acyclic chromatic number of Alon, McDiarmid and Reed [4]. It will become clearer why this is the case once Theorem 5.11 has been proved; I shall say more at the time. For c = 4, a probabilistic lower bound can be found in a similar way to Theorem 5.3. Theorem 5.7. For n sufficiently large there is a graph G on n vertices with maximum degree ∆(G) = d satisfying n 2 ≥ d 2 400 log d such that there is no 4-colour acyclic colouring with at most n 2 colours. Proof. Let p = 5 √ log n n , let n be large and divisible by 4, and let G ∈ Gn,p. Suppose that f : V (G)→ [n 2 ] is a colouring that uses at most n 2 colours. By discarding at most one vertex from each colour class, at least n 4 monochromatic pairs can be found. Let P be a set of n 4 of these pairs and split the remaining singletons into two sets S1 and S2 each of size n 4 . s s s s s s s s s s s s S1 P S2 Figure 5.2. The type of 4-cycle that is avoided. If any pair in P has a common neighbour in S1 and a common neighbour in S2 then f cannot be 4-colour acyclic (see Figure 5.2). For each pair pi ∈ P let Npi,j be the event that the pair pi has a common neighbour in Sj and let Af be the event that f is a 4-colour 5.3. THE c-COLOUR ACYCLIC CHROMATIC NUMBER 105 acyclic colouring. Then P(Af ) ≤ P (⋂ pi∈P ( N cpi,1 ∪N cpi,2 )) ≤ ( 2 ( 1− p2)n4 )n4 , since the events Npi,j are independent. Writing E for the event that G can be 4-colour acyclically coloured with at most n 2 colours, P(E) ≤ ∑ f :V (G)→[n2 ] P(Af ) ≤ nn ( 2 ( 1− p2)n4 )n4 ≤ exp ( n log n+ n 4 log 2− p 2n2 16 ) = o(1), which means that there exists a graph G that cannot be 4-colour acyclically coloured with at most n 2 colours, as required. By Lemma 4.9, the graph G can be chosen so that, in addition, the bound in the theorem is satisfied.  For c > 4, this does not seem easy to generalize. A sensible alternative approach to finding a lower bound is to find a graph of small maximum degree such that any two vertices are in a cycle of length at most c, so that all vertices must be coloured differently. One way of doing this is to join two graphs of small diameter together. The most obvious graph with small diameter for a given maximum degree is a tree. s s s s s s s s s s s s s s s s s s s s s s     Z Z Z Z          B B B B B B B B B                   E E E E E E E E E E E E E E E E E E Figure 5.3. The tree Td,h for d = h = 3. Let Td,h be the tree where all vertices except the leaves have degree d and the leaves are all distance h from a fixed root vertex (see Figure 5.3). Let T1 and T2 be copies of 106 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS Td,b k−12 c and form G by taking T1 and T2 and adding edges between vertices v1 ∈ T1 and v2 ∈ T2 if they are copies of the same vertex v ∈ Td,b k−12 c. Then the number of vertices in G is 2 ( 1 + d+ d(d− 1) + . . .+ d(d− 1)b k−12 c−1 ) = 2 + 2d ( (d− 1)b k−12 c − 1 d− 2 ) > ( d 2 )b k−12 c = Ω ( db k−12 c ) , and any two vertices of G are in a cycle of length at most 2k. Therefore, G cannot be 2k-acyclically coloured (or (2k + 1)-acyclically coloured) with less than Ω ( db k−12 c ) colours. This is a long way from the lower bound of Ω ( dk log d ) . Using probabilistic techniques, Bolloba´s [10] (see also page 259 of Bolloba´s [12], Theorem 10.10) found graphs of small diameter with much smaller maximum degree: Theorem 5.8. Let h be a positive constant, k = k(n) ≥ 2 a natural number, and define p = p(n, h, k), 0 < p < 1, by pknk−1 = log ( n2 h ) . Suppose that pn (logn)3 →∞. Then for G ∈ Gn,p we have lim n→∞ P ( diam(G) = k ) = e− h 2 and lim n→∞ P ( diam(G) = k + 1 ) = 1− e−h2 .  Using this, I can obtain a lower bound that is best possible up to logarithmic factors. Theorem 5.9. For n sufficiently large there is a graph G on n vertices with maximum degree ∆(G) = d satisfying n ≥ d k 22k+1k log d = Ω ( dk log d ) such that any two vertices are in a cycle of length at most 2k. In particular, a 2k- or (2k + 1)-colour acyclic colouring of such a graph must use at least n colours. 5.3. THE c-COLOUR ACYCLIC CHROMATIC NUMBER 107 Proof. Set p = ( 2 log n nk−1 ) 1 k , which corresponds to h = 1 in Theorem 5.8, and q = 2p. Let Gn,q ∈ Gn,q and obtain G1 and G2 by putting each edge of Gn,q in G1 or G2 uniformly at random independently of all other edges. Since q = 2p, this coupling means that G1 ∈ Gn,p and G2 ∈ Gn,p, but G1 and G2 are disjoint. By Theorem 5.8, lim n→∞ P ( diam(G1) = k ) = lim n→∞ P ( diam(G2) = k ) = e− 1 2 ≈ 0.6065. Take n sufficiently large that P ( diam(G1) = k ) = P ( diam(G2) = k ) > 3 5 and so P (( diam(G1) = k ) ∧ (diam(G2) = k)) > 1 5 . Since this probability is greater than zero, this means that there is a graph G with any two vertices in a cycle of length at most 2k. By Lemma 4.9, the graph G can be chosen so that, in addition, the bound in the theorem is satisfied.  In fact, an explicit construction does even better and gives the best possible lower bound up to a constant factor. Theorem 5.10. For n sufficiently large there is a graph G on n vertices with maximum degree ∆(G) = d satisfying n > ( d k )k = Ω(dk), such that any two non-adjacent vertices are in a cycle of length at most 2k (and so a 2k- or (2k + 1)-colour acyclic colouring must use at least n colours). Proof. Let G = (V,E) be defined as follows. For some large positive integer b, let V = [b]k and let uv ∈ E if u and v differ in exactly one co-ordinate. Then G is a k(b− 1)- regular graph. For all non-adjacent u, v, there is a cycle of length at most 2k containing u and v as follows. Start with u and then, working from left to right, for each digit ui where ui 6= vi, change it to vi, and then do the same to get back to u from v (see Figure 5.4). All vertices are therefore adjacent or in a cycle of length at most 2k, and must be differently coloured in a 2k- or (2k + 1)-colour acyclic colouring. 108 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS (1, 2, 3,2, 1) (3, 2, 3, 2, 1) (3, 2,2, 2, 1) (3, 2, 2,3, 1) (1, 2, 2, 3, 1) (1, 2,3, 3, 1) Figure 5.4. A 6-cycle for b = 3, k = 5, u = (1, 2, 3, 2, 1) and v = (3, 2, 2, 3, 1), with the new digit in each k-tuple in bold type. Setting d = k(b− 1), this is a graph G with n = |V (G)| and ∆(G) = d satisfying n = ( d k + 1 )k , and the result follows.  For fixed k, the probabilistic bound is worse as d → ∞, but it is better for small d, that is, for d up to about exp (( k 4 )k−1 8 ) . To prove an upper bound of the same order of magnitude as the explicit construction, I shall again use the Erdo˝s-Lova´sz local lemma. Theorem 5.11. Let G be a graph with ∆(G) = d and let k ≥ 2 be an integer. Then it is possible to colour G (2k + 1)-colour acyclically (and therefore 2k-colour acyclically) with at most 6(k + 2)(2k)k+1dk = O(dk) colours. Proof. If d = 1, then G contains only isolated edges and only two colours are needed, so assume that d ≥ 2. Call a pair of vertices {u, v} special if there are at least d paths of length k + 1 from u to v. Let f : V (G)→ [x] be a random function where each vertex receives a colour from [x] uniformly at random. The proof uses the Erdo˝s-Lova´sz local lemma. The bad events to be considered are as follows: (1) For each u, v with 1 ≤ d(u, v) ≤ k, let Wu,v = {f(u) = f(v)}. 5.3. THE c-COLOUR ACYCLIC CHROMATIC NUMBER 109 (2) For each special pair u, v, let Xu,v = {f(u) = f(v)}. (3) For each (2k+2)-cycle C = v0v1 . . . v2k+1 containing a non-special pair of opposite vertices, where v0, v1, . . . , v2k+1 are distinct vertices, let YC be the event that C receives at most 2k colours. (4) For each path P = v0v1 . . . v2k+2, where v0, v1, . . . , v2k+2 are distinct vertices, let ZP be the event that P receives at most 2k colours. If none of the events of type Wu,v occurs then f is a proper colouring and every cycle of length at most 2k+1 is multicoloured. In this case, the only vertices in a (2k+2)-cycle that can be the same colour are pairs of opposite vertices, so if in addition none of the events Xu,v and YC occurs then every cycle of length 2k+2 receives at least 2k+1 colours, whether or not it contains a non-special pair of opposite vertices. If none of the events ZP occurs then every cycle of length at least 2k + 3 receives at least 2k + 1 colours. Claim 1. Each vertex v is in at most (1) 2dk pairs of vertices distance at most k apart, (2) dk special pairs, (3) (k + 1)dk+2 cycles containing a non-special pair of opposite vertices, and (4) (k + 2)d2k+2 paths of length 2k + 2. Proof of claim. (1) By the same argument as in the proof of Theorem 5.4, there are at most d vertices distance 1 from v, at most d2 vertices distance 2 from v and so on, and so at most d+ d2 + . . .+ dk = d d− 1(d k − 1) < 2dk vertices distance at most k from v, the inequality holding since d ≥ 2. (2) There are at most dk+1 paths of length k + 1 starting at v, and each special pair that v is in contributes d such paths, so v is in at most dk+1 d = dk special pairs. 110 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS s s s s s s s s ? R 6 R  v a1 w1 a2 w2 non-special pair Figure 5.5. A cycle with a non-special opposite pair. (3) The aim is to count the (2k + 2)-cycles containing v and a non-special pair of opposite vertices in groups depending on where the closest non-special pair to v is. Moving clockwise from v, let w1 be the first vertex in a non-special pair and let a1 be its distance from v, and let w2 be the vertex opposite w1 at distance a2 anticlockwise from v, so that a1 + a2 = k + 1 (see Figure 5.5). It is possible that w1 = v and a1 = 0. There are k+ 1 choices for a1 and a2. Given a1 and a2, there are at most da1 paths of length a1 and d a2 paths of length a2 starting at v, and so at most da1+a2 = dk+1 choices for the arc from w1 to w2 containing v. Since {w1, w2} is a non-special pair, there are at most d choices for the arc from w1 to w2 not containing v. In total, there are therefore at most (k + 1)dk+1d = (k + 1)dk+2 (2k + 2)-cycles containing v and a non-special pair of opposite vertices. (4) Without loss of generality, assume that v plays the role of one of v0, v1, . . . , vk+1 in P = v0v1 . . . v2k+2, which gives k + 2 possibilities. For each of these there are at most d2k+2 paths P , which means that v is in at most (k + 2)d2k+2 paths of length 2k + 2.  5.3. THE c-COLOUR ACYCLIC CHROMATIC NUMBER 111 By multiplying the number of vertices in a pair, cycle or path by the number of paths of length at most k, special pairs, cycles with a non-special pair and paths that each vertex can be in, the number of edges in the dependency graph from an event of each type to events of each type is at most that shown in the table below. to Wu,v Xu,v YC ZP Wu,v 4d k 2dk 2(k + 1)dk+2 2(k + 2)d2k+2 from Xu,v 4d k 2dk 2(k + 1)dk+2 2(k + 2)d2k+2 YC 4(k + 1)d k 2(k + 1)dk 2(k + 1)2dk+2 2(k + 1)(k + 2)d2k+2 ZP 2(2k + 3)d k (2k + 3)dk (2k + 3)(k + 1)dk+2 (2k + 3)(k + 2)d2k+2 Claim 2. The probabilities of the events of each type satisfy the following equalities and inequalities: (1) P(Wu,v) = 1x , (2) P(Xu,v) = 1x , (3) P(YC) ≤ (2k)2k+2x2 , and (4) P(ZP ) ≤ (2k)2k+3x3 . Proof of claim. (1) This is true since f is a uniformly random colouring with colours taken from [x]. (2) See (1). (3) For each S ⊂ [x] with |S| = 2k, let Y SC be the event that C is coloured with colours taken from S, so that YC = ⋃ S⊂[x] |S|=2k Y SC . (However, this is not a partition, since if a colouring uses strictly fewer than 2k colours, then more than one event Y SC holds.) Given a choice of S, each of the 2k + 2 vertices of C must receive one of the colours in S, which means that P ( Y SC ) = ( 2k x )2k+2 , 112 5. LONG ACYCLIC COLOURINGS AND ACYCLIC COLOURINGS WITH MANY COLOURS and hence P(YC) ≤ ∑ S⊂[x] |S|=2k P ( Y SC ) ≤ ( x 2k )( 2k x )2k+2 ≤ x2k ( 2k x )2k+2 = (2k)2k+2 x2 . (4) As in (3), P(ZP ) ≤ ( x 2k )( 2k x )2k+3 ≤ (2k) 2k+3 x3 .  As in the proof of Theorem 5.4, the weightings used are double the probabilities or bounds for the probabilities given in Claim 2; it is enough to prove the condition in the Erdo˝s-Lova´sz local lemma for the events ZP , that is, ( 1− 2 x )2(2k+3)dk ( 1− 2 x )(2k+3)dk ( 1− 2(2k) 2k+2 x2 )(2k+3)(k+1)dk+2 ( 1− 2(2k) 2k+3 x3 )(2k+3)(k+2)d2k+2 ≥ 1 2 . It suffices to find x such that all four factors are at least 6 7 , for which any x ≥ max { 28(2k + 3)dk, 14(2k + 3)dk, √ 14(2k + 3)(k + 1)(2k)k+1d k 2 +1, 3 √ 14(2k)2k+3(2k + 3)(k + 2)d 2k+2 3 } will do. Since k ≥ 2, it is enough to take x = 6(k+ 2)(2k)k+1dk (or even x = 28(2k+ 3)dk once d is reasonably large).  When choosing the order of magnitude of x at the end of the proof, the critical condition was that the colouring was a distance-k colouring, not that special pairs and (2k + 2)-cycles were coloured in a particular way. This was because there was plenty of room to manoeuvre when choosing the threshold function for a pair to be considered special. Indeed, suppose that a pair of vertices were called special if there are least ds paths of length k+1 between them. Then each vertex would be in at most O(dk+1−s) special pairs and at most O(dk+1+s) (2k + 2)-cycles containing a non-special pair of opposite vertices. Since the probability that a special pair is monochromatic is 1 x and the probability that a 5.4. OPEN PROBLEMS 113 (2k+ 2)-cycle is coloured with at most 2k colours is O( 1 x2 ), the number of colours needed to avoid these events is O(dk+1−s) and O(d k+1+s 2 ). Since k ≥ 2, these are both O(dk) for any choice of s provided that 1 ≤ s ≤ k − 1. In the case of 3-colour acyclic colouring studied by Alon, McDiarmid and Reed [4], k = 1 and so this is not possible. Instead, s is chosen so that k + 1− s = k+1+s 2 , that is, s = k+1 3 = 2 3 , and the number of colours needed is O(d 4 3 ) rather than O(d). 5.4. Open problems In the results in this chapter, I have found the orders of magnitude of both the length- l-acyclic chromatic number (up to a logarithmic factor) and the c-colour acyclic chromatic number of a graph as functions of its maximum degree, assuming that l or c is fixed. As a result, the bounds are nowhere near best possible as functions of l or c. It would be worth trying to improve these and to remove the logarithmic factor in the lower bound for the length-l-acyclic chromatic number. The two concepts in this chapter can be combined to give a c-colour length-l-acyclic colouring. If l ≥ c then this is a proper colouring such that every cycle of length at least l receives at least c colours. If l < c then it is a proper colouring such that every cycle of length at least l but less than c is multicoloured and every cycle of length at least c receives at least c colours. In this chapter I have proved results for c = 3 and general l, and for l = 3 and general c. What happens for general c and l? CHAPTER 6 Small minimum degree colourings of graphs and hypergraphs 6.1. Introduction Recall that a degree-r colouring of a u-uniform hypergraph is a colouring such that every edge is multicoloured and every subhypergraph with minimum degree at least r receives at least u + 1 colours. In this chapter, I shall bound the maximum possible degree-r chromatic number for u-uniform hypergraphs of a given maximum degree d, denoted by D (u) r (d). The bounds are of a different nature when u = 2 and when u ≥ 3, so I shall treat them separately. In Section 6.2, I shall bound D (2) r (d), which for convenience I shall denote simply by Dr(d), and in Section 6.3 I shall bound D (u) r (d) for u ≥ 3. In Section 6.4, I shall suggest some natural open problems following on from these. 6.2. The degree-r chromatic number of a graph The main theorem that I shall prove in this section is the following. Theorem 6.1. For integers r, d ≥ 2, it is the case that ( d 6 ) r2 r2−1 r ( r2 r2−1 log d ) 1 r2−1 < Dr(d) < 2 r3+r+1r4d r2 r2−1 . This has the following immediate corollary. Corollary 6.2. For a fixed integer r ≥ 2, it is the case that Ω  d r2r2−1 (log d) 1 r2−1  ≤ Dr(d) ≤ O(d r2r2−1) as d→∞.  The probabilistic method for finding a lower bound is by now familiar from Theo- rems 5.3, 5.7 and 5.9. 115 116 6. SMALL MINIMUM DEGREE COLOURINGS OF GRAPHS AND HYPERGRAPHS Theorem 6.3. For n sufficiently large there is a graph G on n vertices with maximum degree ∆(G) = d satisfying n r ≥ ( d 6 ) r2 r2−1 r ( r2 r2−1 log d ) 1 r2−1 = Ω  d r2r2−1 (log d) 1 r2−1  such that there is no degree-r colouring with at most n r colours. Proof. Let p = 3 ( log n n ) 1 r2 , let n be large and divisible by r2 and let G ∈ Gn,p. Suppose that f : V (G)→ [n r ] is a colouring that uses at most n r colours. By discarding at most r − 1 vertices from each colour class, at least n r2 monochromatic r-tuples can be found. Choose n r2 of these and call them T1, . . . , T n r2 . Let Ki,j be the event that the graph induced by the vertices in Ti ∪ Tj contains the complete bipartite graph with vertex classes Ti and Tj. If any event Ki,j occurs then this colouring cannot be a degree-r colouring of G. Let Af be the event that f is a degree-r colouring. Then P(Af ) ≤ P (⋂ i 1, and these are the best possible such bounds. For a poset whose width is not necessarily equal to its number of maximal elements k, I found a strategy that is successful with probability 1 e , which is the best possible probability of success for the classical secretary problem and therefore best possible in this situation. Result 4 (Theorem 3.2). Let (P,≺) be a poset with k maximal elements. Then there is an algorithm for the secretary problem on (P,≺) depending only on |P | and k that is successful with probability at least 1 e , and this is the best possible such bound. Part 2: Generalizations of acyclic colourings In Chapter 5, I found an upper bound for the number of colours needed to colour a graph G properly in such a way that every cycle of length at least l receives at least three PART 2: GENERALIZATIONS OF ACYCLIC COLOURINGS 131 colours, as a function of its maximum degree d, and showed that it was best possible up to a logarithmic factor. Result 5 (Corollary 5.2). For a fixed integer m ≥ 2, it is the case that Ω ( d 2m 2m−1 (log d) 1 2m−1 ) ≤ A(2m)(d) = A(2m−1)(d) ≤ O ( d 2m 2m−1 ) as d→∞. I found an upper bound for the number of colours needed to colour a graph G properly in such a way that every cycle receives at least c colours, as a function of its maximum degree d, and showed that it was best possible up to a constant factor. Result 6 (Corollary 5.6). For a fixed integer k ≥ 2, it is the case that A2k(d), A2k+1(d) = Θ(d k) as d→∞. In Chapter 6, I found an upper bound for the number of colours needed to colour a graph G properly in such a way that every subgraph with minimum degree at least r receives at least three colours, as a function of its maximum degree d, and showed that it was best possible up to a logarithmic factor. Result 7 (Corollary 6.2). For a fixed integer r ≥ 2, it is the case that Ω  d r2r2−1 (log d) 1 r2−1  ≤ Dr(d) ≤ O(d r2r2−1) as d→∞. I found an upper bound for the number of colours needed to colour a u-uniform hypergraph H in such a way that every edge is multicoloured and every subhypergraph of minimum degree at least r receives at least three colours, as a function of its maximum degree d, and showed that it was best possible up to a constant factor. Result 8 (Corollary 6.6). For fixed integers u ≥ 3, r ≥ 2, it is the case that D(u)r (d) = Θ(d) 132 CONCLUSION as d→∞. These results are part of a natural progression from earlier work, and suggest possible directions of future research. Bibliography 1. M. O. Albertson and D. M. Berman, The acyclic chromatic number, Proceedings of the seventh southeastern conference on combinatorics, graph theory, and computing, 1976, pp. 51–69. 2. , Every planar graph has an acyclic 7-coloring, Israel J. Math. 28 (1977), no. 1, 169–174. 3. , An acyclic analogue to Heawood’s theorem, Glasg. Math. J. 19 (1978), no. 2, 163–166. 4. N. Alon, C. McDiarmid, and B. Reed, Acyclic coloring of graphs, Random Structures & Algorithms 2 (1991), no. 3, 277–288. 5. N. Alon and J. H. Spencer, The probabilistic method, Wiley-Interscience, 2000. 6. N. Alon, B. Sudakov, and A. Zaks, Acyclic edge colorings of graphs, J. Graph Theory 37 (2001), no. 3, 157–167. 7. K. Appel and W. Haken, Every planar map is four colorable, Bull. Amer. Math. Soc. 82 (1976), no. 5, 711–712. 8. , Every planar map is four colorable. Part I: discharging, Illinois J. Math. 21 (1977), no. 3, 429–490. 9. K. Appel, W. Haken, and J. Koch, Every planar map is four colorable. Part II: reducibility, Illinois J. Math. 21 (1977), no. 3, 491–567. 10. B. Bolloba´s, The diameter of random graphs, Trans. Amer. Math. Soc. 267 (1981), no. 1, 41–52. 11. , Modern graph theory, Springer Verlag, 1998. 12. , Random graphs, Cambridge University Press, 2001. 13. O. V. Borodin, On acyclic colorings of planar graphs, Discrete Math. 25 (1979), no. 3, 211–236. 14. O. V. Borodin, D. G. Fon-Der Flaass, A. V. Kostochka, A. Raspaud, and E. Sopena, Acyclic list 7-coloring of planar graphs, J. Graph Theory 40 (2002), no. 2, 83–90. 15. O. V. Borodin, A. V. Kostochka, and D. R. Woodall, Acyclic colourings of planar graphs with large girth, J. Lond. Math. Soc. 60 (1999), no. 2, 344–352. 16. R. L. Brooks, On colouring the nodes of a network, Math. Proc. Cambridge Philos. Soc. 37 (1941), no. 2, 194–197. 17. M. I. Burnstein, Every 4-valent graph has an acyclic 5-coloring, Soobsˇcˇ Akad. Nauk Gruzin. SSR 93 (1979), no. 1, 21–24 (Russian). 18. A. Cayley, Mathematical questions with their solutions, The Educational Times 23 (1875), 18–19. 19. G. Chartrand, D. Geller, and S. Hedetniemi, Graphs with forbidden subgraphs, J. Combin. Theory Ser. B 10 (1971), no. 1, 12–41. 133 134 BIBLIOGRAPHY 20. G. Chartrand, H. Kronk, and C. Wall, The point-arboricity of a graph, Israel J. Math. 6 (1968), no. 2, 169–175. 21. Y. S. Chow, S. Moriguti, H. Robbins, and S. M. Samuels, Optimal selection based on relative rank (the “secretary problem”), Israel J. Math. 2 (1964), no. 2, 81–90. 22. Y. S. Chow, H. Robbins, and D. Siegmund, The theory of optimal stopping, Dover, 1991. 23. K. L. Chung, A course in probability theory, Academic Press, 2001. 24. R. P. Dilworth, A decomposition theorem for partially ordered sets, Ann. of Math. 51 (1950), no. 1, 161–166. 25. P. Erdo˝s and L. Lova´sz, Problems and results on 3-chromatic hypergraphs and some related questions, Infinite and finite sets, 1975, pp. 609–627. 26. T. S. Ferguson, Who solved the secretary problem?, Statist. Sci. 4 (1989), no. 3, 282–289. 27. A. Q. Frank and S. M. Samuels, On an optimal stopping problem of Gusein-Zade, Stochastic Process. Appl. 10 (1980), no. 3, 299–311. 28. P. R. Freeman, The secretary problem and its extensions: a review, Int. Stat. Rev. 51 (1983), no. 2, 189–206. 29. R. Freij and J. Wa¨stlund, Partially ordered secretaries, Electron. Commun. Probab. 15 (2010), 504– 507. 30. J. Galambos, Advanced probability theory, CRC, 1995. 31. F. Galvin, The list chromatic index of a bipartite multigraph, J. Combin. Theory Ser. B 63 (1995), no. 1, 153–158. 32. M. Gardner, Mathematical games, Sci. Amer. 202 (1960), no. 2, 152. 33. , Mathematical games, Sci. Amer. 202 (1960), no. 3, 178–179. 34. B. J. Garrod, G. Kubicki, and M. Morayne, How to choose the best twins, SIAM J. Discrete Math., to appear. 35. B. J. Garrod and R. D. Morris, The secretary problem on an unknown poset, submitted. 36. N. Georgiou, M. Kuchta, M. Morayne, and J. Niemiec, On a universal best choice algorithm for partially ordered sets, Random Structures & Algorithms 32 (2008), no. 3, 263–273. 37. S. Gerke, C. Greenhill, and N. Wormald, The generalized acyclic edge chromatic number of random regular graphs, J. Graph Theory 53 (2006), no. 2, 101–125. 38. N. Ghoussoub, An integral representation of randomized probabilities and its applications, Se´minaire de Probabilite´s XVI 1980/81, 1982, pp. 519–543. 39. J. Gianini-Pettitt, Optimal selection based on relative ranks with a random number of individuals, Adv. in Appl. Probab. 11 (1979), no. 4, 720–736. 40. J. P. Gilbert and F. Mosteller, Recognizing the maximum of a sequence, J. Amer. Statist. Assoc. 61 (1966), no. 313, 35–73. BIBLIOGRAPHY 135 41. A. V. Gnedin, A multicriteria problem of optimal stopping of a selection process, Autom. Remote Control 42 (1981), 981–986. 42. , Multicriteria extensions of the best choice problem: sequential selection without linear order, Strategies for sequential search and selection in real time, 1992, pp. 153–172. 43. C. Greenhill and O. Pikhurko, Bounds on the generalised acyclic chromatic numbers of bounded degree graphs, Graphs Combin. 21 (2005), no. 4, 407–419. 44. B. Gru¨nbaum, Acyclic colorings of planar graphs, Israel J. Math. 14 (1973), no. 4, 390–408. 45. S. M. Gusein-Zade, The problem of choice and the optimal stopping rule for a sequence of independent trials, Theory Probab. Appl. 11 (1966), no. 3, 472–476. 46. I. Guttman, On a problem of L. Moser, Canad. Math. Bull. 3 (1960), 35–39. 47. P. J. Heawood, Map colour theorem, Q. J. Math. 24 (1890), 332–338. 48. J. Kahn, Asymptotics of the list-chromatic index for multigraphs, Random Structures & Algorithms 17 (2000), no. 2, 117–156. 49. W. Kaz´mierczak, An algorithm that chooses the maximum element of a partially ordered set, Master’s thesis, Politechnika Wroc lawska, 2006 (Polish). 50. A. V. Kostochka, Acyclic 6-coloring of planar graphs, Diskret. Analiz. 28 (1976), 40 (Russian). 51. , Upper bounds of chromatic functions of graphs, 1978. 52. K. Kothapalli, S. Varagani, V. C. Venkaiah, and K. Yadav, Acyclic vertex coloring of graphs of maximum degree 5, International conference on graph theory and its applications, 2008. 53. , Acyclic vertex coloring of graphs of maximum degree 6, LAGOS: Latin-American algorithms, graphs and optimization symposium, 2009. 54. J. Kozik, Dynamic threshold strategy for universal best choice problem, 21st international meeting on probabilistic, combinatorial, and asymptotic methods in the analysis of algorithms, 2010, pp. 439–452. 55. G. Kubicki and M. Morayne, Graph-theoretic generalization of the secretary problem: the directed path case, SIAM J. Discrete Math. 19 (2005), no. 3, 622–632. 56. M. Kuchta and M. Morayne, A secretary problem with many lives, submitted. 57. D. V. Lindley, Dynamic programming and decision theory, J. R. Stat. Soc. Ser. C. Appl. Stat. 10 (1961), no. 1, 39–51. 58. J. Mitchem, Every planar graph has an acyclic 8-coloring, Duke Math. J. 41 (1974), no. 1, 177–181. 59. M. Molloy and B. Reed, Further algorithmic aspects of the local lemma, Proceedings of the thirtieth annual ACM symposium on theory of computing, 1998, pp. 524–529. 60. M. Morayne, Partial-order analogue of the secretary problem: the binary tree case, Discrete Math. 184 (1998), no. 1–3, 165–181. 61. M. Morayne and M. Sulkowska, Graph-theoretic generalization of the best choice problem; randomized analysis of a simple effective algorithm for k-ary trees, submitted. 136 BIBLIOGRAPHY 62. L. Moser, On a problem of Cayley, Scripta Mathematica 22 (1956), no. 5, 289–292. 63. R. Muthu, N. Narayanan, and C. R. Subramanian, Improved bounds on acyclic edge colouring, Dis- crete Math. 307 (2007), no. 23, 3063–3069. 64. C. St.J. A. Nash-Williams, Edge-disjoint spanning trees of finite graphs, J. Lond. Math. Soc. 36 (1961), no. 1, 445–450. 65. , Decomposition of finite graphs into forests, J. Lond. Math. Soc. 39 (1964), no. 1, 12. 66. J. Nesˇetrˇil and N. C. Wormald, The acyclic edge chromatic number of a random d-regular graph is d + 1, J. Graph Theory 49 (2005), no. 1, 69–74. 67. J. D. Petruccelli, Best-choice problems involving uncertainty of selection and recall of observations, J. Appl. Probab. 18 (1981), no. 2, 415–425. 68. J. Preater, The best-choice problem for partially ordered objects, Oper. Res. Lett. 25 (1999), no. 4, 187–190. 69. E. L. Presman and I. M. Sonin, The best choice problem for a random number of objects, Theory Probab. Appl. 17 (1973), no. 4, 657–668. 70. M. Przykucki, Optimal stopping in a search for a vertex with full degree in a random graph, submitted. 71. M. Przykucki and M. Sulkowska, Gusein-Zade problem for directed path, Discrete Optim. 7 (2010), no. 1–2, 13–20. 72. G. Ringel and J. W. T. Youngs, Solution of the Heawood map-coloring problem, Proc. Natl. Acad. Sci. USA 60 (1968), no. 2, 438–445. 73. S. Skulrattanakulchai, Acyclic colorings of subcubic graphs, Inform. Process. Lett. 92 (2004), no. 4, 161–167. 74. M. H. Smith, A secretary problem with uncertain employment, J. Appl. Probab. 12 (1975), no. 3, 620–624. 75. W. Stadje, Efficient stopping of a random series of partially ordered points, Multiple criteria decision making theory and applications, 1980, pp. 430–447. 76. C. Thomassen, Every planar graph is 5-choosable, J. Combin. Theory Ser. B 62 (1994), no. 1, 180–181. 77. J. Tkocz, The best choice problem for almost linear orders, submitted. 78. V. G. Vizing, On an estimate of the chromatic class of a p-graph, Diskret. Analiz. 3 (1964), 25–30 (Russian). 79. M. Voigt, List colourings of planar graphs, Discrete Math. 120 (1993), no. 1–3, 215–219. 80. M. C. K. Yang, Recognizing the maximum of a random sequence based on relative rank with backward solicitation, J. Appl. Probab. 11 (1974), no. 3, 504–512.