Repository logo
 

Federated Linear Dimensionality Reduction

cam.depositDate2022-01-17
cam.restrictionthesis_access_open
cam.supervisorMascolo, Cecilia
cam.supervisorCrowcroft, Jon
cam.supervisor.orcidCrowcroft, Jonathon [0000-0002-7013-0121]
dc.contributor.authorGrammenos, Andreas
dc.contributor.orcidGrammenos, Andreas [0000-0002-2525-5101]
dc.date.accessioned2022-01-21T02:33:02Z
dc.date.available2022-01-21T02:33:02Z
dc.date.submitted2021-04-30
dc.date.updated2022-01-17T21:17:46Z
dc.description.abstractIn recent years, the explosive rate of dataset expansion has offered the ability for researchers to access an unprecedented amount of information. Moreover, in addition to actual dataset size increases, the types and locations of data generators are more heterogeneous than ever -ranging from traditional servers to a myriad of IoT devices. These facts, coupled with recent emphasis on privacy and data-ownership, led to the creation of federated datasets. Such datasets are characterised by their massive size and are usually scattered across decentralised edge devices, each holding their local data samples. As exciting as these federated datasets might be, they introduce an astounding challenge: how to efficiently process federated data at scale? Naturally, given these constraints, centralisation of such datasets is often intractable, thus making traditional analytical methods inapplicable. This thesis introduces a suite of mathematical advancements that makes traditional learning algorithms applicable to the federated setting, by summarising the massive amounts of information into succinct dataset-specific representations. Concretely, we focus primarily on linear dimensionality reduction and, in particular, on Principal Component Analysis (PCA) due to its pervasiveness, along with its ability to process unstructured data. The first advancement we introduce is a novel algorithm to perform streaming and memory-limited dimesionality reduction at the edge that uses a generalisation of incremental Singular Value Decomposition (SVD). Further, we provide a rank-adaptive SVD extension able to account for distribution shifts over-time. Subsequently, building upon previous constructions, we present an (ε,δ)-differentially private federated algorithm for PCA. To achieve federation, we put forth a lightweight merging algorithm that unlocks the ability to process each subproblem locally at the edge which, in turn, through merging is propagated accordingly. We are able to guarantee differential privacy via an input-perturbation scheme in which the covariance matrix of a dataset is perturbed with a non-symmetric random Gaussian matrix. To evaluate the practicality of our innovations, we describe an algorithm able to perform task scheduling on federated data centres. The scheduler enables each decentralised node to incrementally compute its local model and independently execute scheduling decisions on whether to accept an incoming job based on the workload seen thus far. Finally, we complement our findings with an evaluation on synthetic as well as real-world datasets including sensor node measurements, hard-written images, and wine quality readings, considering a wide range of data modalities and dimensionalities.
dc.description.sponsorshipThis thesis and associated studentship was supported by The Alan Turing Institute under grants: TU/C/000003, {TU/B/000069}, and {EP/N510129/1}.
dc.identifier.doi10.17863/CAM.80273
dc.identifier.urihttps://www.repository.cam.ac.uk/handle/1810/332839
dc.language.isoeng
dc.publisher.collegeWolfson
dc.publisher.institutionUniversity of Cambridge
dc.rightsAll Rights Reserved
dc.rights.urihttps://www.rioxx.net/licenses/all-rights-reserved/
dc.subjectfederated learning
dc.subjectmachine learning
dc.subjectcomputer science
dc.subjectdimensionality reduction
dc.subjectprincipal component analysis
dc.titleFederated Linear Dimensionality Reduction
dc.typeThesis
dc.type.qualificationlevelDoctoral
dc.type.qualificationnameDoctor of Philosophy (PhD)
pubs.licence-display-nameApollo Repository Deposit Licence Agreement
pubs.licence-identifierapollo-deposit-licence-2-1
rioxxterms.licenseref.urihttps://www.rioxx.net/licenses/all-rights-reserved/
rioxxterms.typeThesis

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
gram-phd-thesis.pdf
Size:
4.56 MB
Format:
Adobe Portable Document Format
Description:
Thesis
Licence
https://www.rioxx.net/licenses/all-rights-reserved/