Accelerating the Configuration Tuning of Big Data Analytics with Similarity-aware Multitask Bayesian Optimization
Publication Date
2020
Journal Title
Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
Conference Name
2020 IEEE International Conference on Big Data (Big Data)
ISSN
2639-1589
ISBN
9781728162515
Publisher
IEEE
Pages
266-275
Type
Conference Object
This Version
AM
Citation
Fekry, A., Carata, L., Pasquier, T., & Rice, A. (2020). Accelerating the Configuration Tuning of Big Data Analytics with Similarity-aware Multitask Bayesian Optimization. Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020, 266-275. https://doi.org/10.1109/BigData50022.2020.9378085
Abstract
One of the key challenges for data analytics deployment is configuration tuning. Existing approaches to configuration tuning are expensive and overlook the dynamic characteristics of the analytics environment (i.e., frequent changes in workload caused by evolving input sizes, or changes in the underlying cluster environment). Such workload and environment changes can cause significant performance degradation; retuning the configuration to accommodate them can yield up to 85% potential execution time savings.
We propose SimTune, an approach that accommodates such changes through efficient configuration tuning. SimTune combines workload characterization and Multitask Bayesian optimization to identify similarity across workloads and accelerate finding near-optimal configurations. Our experimental results show that SimTune reduces the search time for finding close-to-optimal configurations by 56-73% (at the median) when compared to existing state-of-the-art techniques. This means that the amortization of the tuning cost happens significantly faster, enabling practical tuning in the rapidly changing environment of distributed analytics.
Sponsorship
Google Cloud, Amazon AWS
Identifiers
External DOI: https://doi.org/10.1109/BigData50022.2020.9378085
This record's URL: https://www.repository.cam.ac.uk/handle/1810/312750
Rights
All rights reserved
Licence:
http://www.rioxx.net/licenses/all-rights-reserved