To Tune or Not to Tune?: In Search of Optimal Configurations for Data Analytics
Accepted version
Peer-reviewed
Authors
Fekry, A
Carata, L
Pasquier, T
Rice, Andrew https://orcid.org/0000-0002-4677-8032
Hopper, Andrew https://orcid.org/0000-0002-8169-335X
Abstract
This experimental study presents several overlooked issues that make configuration tuning and deployment for data analytics challenging. These issues include: 1) the assumption of a static workload and environment, which ignores the dynamic characteristics of the analytics environment (e.g. the frequent need to retune workloads); 2) how quickly the tuning cost is amortized, and how this influences the decision to tune; and 3) the need for comprehensive, incremental tuning across a diverse set of workloads. To support this argument, we present Tuneful, an efficient configuration tuning framework for data analytics. We show how it is designed to overcome the above issues and illustrate its applicability through experiments on two cloud service providers.
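The amortization question in point 2 can be made concrete with simple break-even arithmetic. The sketch below is an illustration only, not Tuneful's method. It assumes a fixed one-off tuning cost and a constant per-run saving that holds until the workload or environment changes and forces a retune. The function names (`break_even_runs`, `worth_tuning`) and all numbers are hypothetical.

```python
import math


def break_even_runs(tuning_cost: float, default_runtime: float, tuned_runtime: float) -> float:
    """Number of production runs after which tuning pays for itself.

    Hypothetical helper. All arguments share one cost unit (e.g. cluster-minutes):
    tuning_cost is the one-off cost of the tuning phase, and default_runtime /
    tuned_runtime are the cost of a single run before and after tuning.
    Returns math.inf when tuning yields no per-run saving.
    """
    saving_per_run = default_runtime - tuned_runtime
    if saving_per_run <= 0:
        return math.inf  # tuning never pays off
    # Round up: a partial run does not recover the remaining cost.
    return math.ceil(tuning_cost / saving_per_run)


def worth_tuning(tuning_cost: float, default_runtime: float, tuned_runtime: float,
                 runs_before_retune: int) -> bool:
    """Tuning is worthwhile only if its cost is recovered before the next retune."""
    return break_even_runs(tuning_cost, default_runtime, tuned_runtime) <= runs_before_retune


# Hypothetical numbers: 300 minutes of tuning cuts a 12-minute job to 9 minutes.
print(break_even_runs(300, 12, 9))   # 100 runs to amortize
print(worth_tuning(300, 12, 9, 50))  # False: a retune is needed before break-even
```

Under these assumptions, a workload that must be retuned frequently may never amortize an expensive tuning phase. This is why the cost of tuning, and not only the quality of the configuration it finds, affects the decision to tune.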
Keywords
Data analytics, Configuration tuning, Bayesian Optimization, Cost amortization
Journal Title
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Conference Name
KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher
ACM
Rights
All rights reserved