Bayesian pseudocoresets

Manousakas, D; Xu, Z; Mascolo, C; Campbell, T

Accepted version

Peer-reviewed

Repository URI

https://www.repository.cam.ac.uk/handle/1810/311879

Repository DOI

https://doi.org/10.17863/CAM.58971

Files

Accepted version (4.35 MB)

Type

Conference Object

Authors

Manousakas, Dionysios

https://orcid.org/0000-0002-3751-8781

Xu, Z

Mascolo, Cecilia

https://orcid.org/0000-0001-9614-4380

Campbell, T

Abstract

Standard Bayesian inference algorithms are prohibitively expensive in the regime of modern large-scale data. Recent work has found that a small, weighted subset of data (a coreset) may be used in place of the full dataset during inference, taking advantage of data redundancy to reduce computational cost. However, this approach has limitations in the increasingly common setting of sensitive, high-dimensional data. Indeed, we prove that there are situations in which the Kullback-Leibler (KL) divergence between the optimal coreset and the true posterior grows with data dimension; and as coresets include a subset of the original data, they cannot be constructed in a manner that preserves individual privacy. We address both of these issues with a single unified solution, Bayesian pseudocoresets—a small weighted collection of synthetic “pseudodata”—along with a variational optimization method to select both pseudodata and weights. The use of pseudodata (as opposed to the original datapoints) enables both the summarization of high-dimensional data and the differentially private summarization of sensitive data. Real and synthetic experiments on high-dimensional data demonstrate that Bayesian pseudocoresets achieve significant improvements in posterior approximation error compared to traditional coresets, and that pseudocoresets provide privacy without a significant loss in approximation quality.
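The contrast between a coreset and a pseudocoreset can be made concrete with a toy model. The sketch below is an illustration only and is not taken from this record: it assumes a prior θ ~ N(0, I) and likelihood x ~ N(θ, I), for which the posterior is available in closed form. The function names (`gaussian_posterior`, `kl_isotropic`, `compare`) are hypothetical, and the summary weight is fixed at n rather than optimized. It does not implement the variational optimization method or the differentially private construction described in the abstract. Under these assumptions, a single synthetic point placed at the data mean reproduces the full posterior, whereas any single original datapoint leaves a KL gap.

```python
import math
import random


def gaussian_posterior(points, weights):
    """Posterior N(mean, var * I) over theta for prior N(0, I) and
    likelihood x ~ N(theta, I), given weighted points (assumed toy model)."""
    dim = len(points[0])
    precision = 1.0 + sum(weights)
    mean = [
        sum(w * p[j] for p, w in zip(points, weights)) / precision
        for j in range(dim)
    ]
    return mean, 1.0 / precision


def kl_isotropic(mean_q, var_q, mean_p, var_p):
    """KL(N(mean_q, var_q * I) || N(mean_p, var_p * I))."""
    dim = len(mean_q)
    sq_dist = sum((a - b) ** 2 for a, b in zip(mean_q, mean_p))
    ratio = var_q / var_p
    return 0.5 * (dim * ratio + sq_dist / var_p - dim - dim * math.log(ratio))


def compare(n, dim, seed=0):
    """Return (KL for a one-point pseudocoreset, KL for the best one-point coreset)."""
    rng = random.Random(seed)
    data = [[rng.gauss(1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    full_mean, full_var = gaussian_posterior(data, [1.0] * n)

    def kl_to_full(point):
        # Size-1 summary: one point carrying weight n (weight not optimized).
        mean, var = gaussian_posterior([point], [float(n)])
        return kl_isotropic(mean, var, full_mean, full_var)

    # Pseudocoreset: a synthetic point at the data mean, not an original datapoint.
    data_mean = [sum(x[j] for x in data) / n for j in range(dim)]
    # Coreset: the best single original datapoint under the same fixed weight.
    return kl_to_full(data_mean), min(kl_to_full(x) for x in data)


for dim in (2, 50):
    kl_pseudo, kl_coreset = compare(n=200, dim=dim)
    print(f"dim={dim}: pseudocoreset KL={kl_pseudo:.3g}, coreset KL={kl_coreset:.3g}")
```

In this toy setting the posterior depends on the data only through its sum and count, which is why one well-placed synthetic point is enough. Realistic models have no such closed form, which is where an optimization over pseudodata and weights becomes necessary.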

Journal Title

Advances in Neural Information Processing Systems

Conference Name

Thirty-fourth Conference on Neural Information Processing Systems

Journal ISSN

1049-5258

Volume Title

2020-December

Publisher DOI

https://doi.org/10.17863/CAM.58971

Sponsorship

NSERC Discovery Grant, NSERC Discovery Launch Supplement, Nokia Bell Labs, Lundgren Fund, Darwin College Cambridge.

Collections

Cambridge University Research Outputs