Accelerating the Configuration Tuning of Big Data Analytics with Similarity-aware Multitask Bayesian Optimization

cam.orpheus.counter: 24
cam.orpheus.success: Mon Apr 26 07:33:16 BST 2021 - Embargo updated
dc.contributor.author: Fekry, A
dc.contributor.author: Carata, L
dc.contributor.author: Pasquier, T
dc.contributor.author: Rice, A
dc.contributor.orcid: Rice, Andrew [0000-0002-4677-8032]
dc.date.accessioned: 2020-11-11T00:30:34Z
dc.date.available: 2020-11-11T00:30:34Z
dc.date.issued: 2020
dc.description.abstract: One of the key challenges for data analytics deployment is configuration tuning. Existing approaches to configuration tuning are expensive and overlook the dynamic characteristics of the analytics environment (i.e., frequent workload changes caused by evolving input sizes, or changes in the underlying cluster environment). Such workload/environment changes can cause significant performance degradation, and retuning the configuration to accommodate them can yield potential execution time savings of up to 85%. We propose SimTune, an approach that accommodates such changes through efficient configuration tuning. SimTune combines workload characterization and multitask Bayesian optimization to identify similarity across workloads and accelerate the search for near-optimal configurations. Our experimental results show that SimTune reduces the search time for finding close-to-optimal configurations by 56-73% (at the median) compared to existing state-of-the-art techniques. As a result, the tuning cost is amortized significantly faster, enabling practical tuning in the rapidly changing environment of distributed analytics.
dc.description.sponsorship: Google Cloud, Amazon AWS
dc.identifier.doi: 10.17863/CAM.59851
dc.identifier.isbn: 9781728162515
dc.identifier.issn: 2639-1589
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/312750
dc.language.iso: eng
dc.publisher: IEEE
dc.publisher.url: http://dx.doi.org/10.1109/bigdata50022.2020.9378085
dc.rights: All rights reserved
dc.subject: 4606 Distributed Computing and Systems Software
dc.subject: 46 Information and Computing Sciences
dc.title: Accelerating the Configuration Tuning of Big Data Analytics with Similarity-aware Multitask Bayesian Optimization
dc.type: Conference Object
dcterms.dateAccepted: 2020-10-20
prism.endingPage: 275
prism.publicationDate: 2020
prism.publicationName: Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
prism.startingPage: 266
pubs.conference-finish-date: 2020-12-13
pubs.conference-name: 2020 IEEE International Conference on Big Data (Big Data)
pubs.conference-start-date: 2020-12-10
rioxxterms.licenseref.startdate: 2020-12-10
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.type: Conference Paper/Proceeding/Abstract
rioxxterms.version: AM
rioxxterms.versionofrecord: 10.1109/BigData50022.2020.9378085

Files

Original bundle
Name: paper.pdf
Size: 550.25 KB
Format: Adobe Portable Document Format
Description: Accepted version
Licence: http://www.rioxx.net/licenses/all-rights-reserved
License bundle
Name: DepositLicenceAgreementv2.1.pdf
Size: 150.9 KB
Format: Adobe Portable Document Format