Repository logo

Firmament: Fast, Centralized Cluster Scheduling at Scale

Accepted version


Conference Object

Change log


Gog, IC 
Schwarzkopf, M 
Gleave, A 
Watson, RNM 
Hand, S 


Centralized datacenter schedulers can make high-quality placement decisions when scheduling tasks in a cluster. Today, however, high-quality placements come at the cost of high latency at scale, which degrades response time for interactive tasks and reduces cluster utilization. This paper describes Firmament, a centralized scheduler that scales to over ten thousand machines at sub- second placement latency even though it continuously reschedules all tasks via a min-cost max-flow (MCMF) optimization. Firmament achieves low latency by using multiple MCMF algorithms, by solving the problem incrementally, and via problem-specific optimizations. Experiments with a Google workload trace from a 12,500-machine cluster show that Firmament improves placement latency by 20 x over Quincy [22], a prior centralized scheduler using the same MCMF optimiza- tion. Moreover, even though Firmament is centralized, it matches the placement latency of distributed schedulers for workloads of short tasks. Finally, Firmament exceeds the placement quality of four widely-used central- ized and distributed schedulers on a real-world cluster, and hence improves batch task response time by 6 x.



Journal Title

Symposium on Operating Systems Design and Implementation

Conference Name

12th USENIX Symposium on Operating Systems Design and Implementation

Journal ISSN

Volume Title


This work was supported by a Google European Doc- toral Fellowship, by NSF award CNS-1413920, and by the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory (AFRL), under contract FA8750-11-C-0249.