Balanced allocations under incomplete information: New settings and techniques
Repository URI
Repository DOI
Change log
Authors
Abstract
In the balanced allocations framework, there are 𝑚 balls to be allocated into 𝑛 bins with the aim of minimising the maximum load of any of the bins, or equivalently minimising the 𝑔𝑎𝑝, i.e., the difference between the maximum load and the average load. In this dissertation, we focus on the ℎ𝑒𝑎𝑣𝑖𝑙𝑦𝑙𝑜𝑎𝑑𝑒𝑑 𝑐𝑎𝑠𝑒 where 𝑚 ≫ 𝑛, which tends to be more challenging to analyse.
In a decentralised setting, the simplest process is OneChoice, which allocates each ball to a bin sampled uniformly at random. It is wellknown that w.h.p. Gap(𝑚) = Θ( sqrt( 𝑚/𝑛 · log 𝑛 ) ) for any 𝑚 ≫ 𝑛. A great improvement over this is the TwoChoice process [ABKU99, KLM96], which allocates each ball to the least loaded of 𝑡𝑤𝑜 bins sampled uniformly at random. Berenbrink, Czumaj, Steger, and Vöcking (2006) showed that w.h.p. Gap(𝑚) = log₂ log 𝑛 + Θ(1) for any 𝑚 ⩾ 𝑛. This improvement is known as the "power of two choices". It has found several applications in hashing, load balancing and routing; and its importance was recently recognised in the 2020 ACM Theory and Practice Award.
In this dissertation, we introduce a set of techniques based on 𝑝𝑜𝑡𝑒𝑛𝑡𝑖𝑎𝑙 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛𝑠. These enable us to analyse (both in terms of gap and load distribution) a wide range of processes and settings in the heavilyloaded case and to establish interesting insights in the balanced allocations framework:

We analyse variants of the TwoChoice process which trade sample efficiency, completeness of information and gap guarantees. For the (1+β)process which mixes OneChoice and TwoChoice with probability β in (0, 1], we prove tight bounds for small and large β, extending the results of Peres, Talwar and Wieder (2015). Another sample efficient family is that of TwoThinning processes, which allocate to the two sampled bins in an online manner. For TwoThinning processes that use as a decision function thresholds relative to the average load or thresholds in the rank domain, we establish tight bounds and also resolve a conjecture by Feldheim and GurelGurevich (2021). We also quantify tradeoffs for twosample processes between the number of queries and the gap bound, establishing a "power of two queries" phenomenon.

We analyse the TwoChoice process with random, adversarial and delay noise, proving tight bounds for various settings. In the adversarial setting, the adversary can decide in which of the two sampled bins the ball is allocated to, only when the two loads differ by at most 𝑔. The analysis of this setting implies bounds for settings with random noise and delay.
For the setting where load information is updated periodically every 𝑏 steps, for 𝑏 = 𝑛 we tighten the bound of [BCEFN12] to Θ( log 𝑛 / log log 𝑛 ) and prove that TwoChoice is optimal in this setting for any in [𝑛 · exp(logᶜ 𝑛), 𝑛 log 𝑛] for any constant 𝑐 > 0. For 𝑏 in [𝑛 log 𝑛, 𝑛³], we show that TwoChoice achieves w.h.p. a Θ(𝑏/𝑛) gap, while surprisingly the (1+β)process with appropriately chosen β achieves w.h.p. a Θ( sqrt( 𝑏/𝑛 · log 𝑛) ) gap, which is optimal over a large family of processes. This proves that in the presence of outdated information, less aggressive strategies can outperform the greedy processes (such as TwoChoice), which has been empirically observed in the queuing setting [D00, M00] for centralised processes since 2000, but to the best of our knowledge has not been formally proven.

Next we analyse TwoChoice in the graphical setting, where bins are vertices of a graph and each ball is allocated to the lesser loaded of the vertices adjacent to a randomly sampled edge. We extend the results of Kenthapadi and Panigrahy (2006) proving that for dense expanders in the heavilyloaded case the gap is w.h.p. O(log log 𝑛). In the presence of weights, we make progress towards [Open Problem 1, PTW15] by proving that for graphs with conductance φ, the gap is w.h.p. Ο(log 𝑛 / φ).

Further, we introduce and analyse processes which can allocate more than one balls to a sampled bin. We prove that these processes achieve w.h.p. an O(log 𝑛) gap (which also applies for any 𝑑regular graph), while still being more sampleefficient than OneChoice ("power of filling").

For the Memory process that can store bins in a cache, we generalise the O(log log 𝑛) gap bound by Mitzenmacher, Prabhakar and Shah (2002) to the heavilyloaded case and prove a matching lower bound. Further, in the presence of heterogeneous sampling distributions, we establish a striking difference between TwoChoice (or even 𝑑Choice with 𝑑 = O(1)) and Memory, showing that for the later the gap is bounded, while for the former it is known to diverge [W07] ("power of memory").