Scheduling for Reduced Tail Task Latencies in Highly Utilised Datacenters
Repository URI
Repository DOI
Change log
Authors
Abstract
Modern datacenters have become the backbone for running diverse workloads that increasingly comprise data-parallel computational jobs. Owing to their ease of use and the diversity of resources they host, demand for datacenters has risen, leading to high volumes of traffic. Datacenters execute thousands of jobs by scheduling billions of tasks every day. To meet these demands, datacenter providers operate their clusters at high levels of utilisation. We show that under such conditions existing scheduling designs impose 3x larger wait times on tail tasks, that is, the tasks that finish last among a job's tasks. This leads to large tail task completion times and, consequently, elevated job completion times.
In this dissertation, we propose a new decentralised scheduling model, Murmuration, that uses multiple communicating scheduler instances to schedule tasks in a manner that reduces their total wait times. It achieves this by scheduling all tasks of a job so that their start times are as close together as possible, thereby ensuring small tail task completion times and better average job completion times.
Our evaluation of Murmuration using publicly available workloads on a real-world cluster shows a median job completion time that is 15–25% lower than that of the default Kubernetes scheduler under both bursty and normal request arrival rates. We show that Murmuration scales with incoming workloads, scheduling more than a million tasks in a matter of minutes, and exhibits a near-constant scheduling time per task even in very large scheduler deployments. We further enhance the design of Murmuration by incorporating queue re-ordering techniques for both scheduling jobs and executing tasks. Simulations on two industry workloads show that, with a shortest-remaining-job-first policy, Murmuration outperforms current schedulers, achieving a median job completion time that is 100x lower. Murmuration also handles scheduler failures with a performance impact of a mere 0.17% increase in job completion times at the 99th percentile. Moreover, increasing the number of scheduler instances in Murmuration improves its performance by reducing the total wait times of jobs assigned to the new schedulers.
