Show simple item record

dc.contributor.authorChen, Yudong
dc.date.accessioned2022-06-14T12:36:09Z
dc.date.available2022-06-14T12:36:09Z
dc.date.submitted2021-11-01
dc.identifier.urihttps://www.repository.cam.ac.uk/handle/1810/338074
dc.description.abstractThe problem of changepoint detection and estimation has a long history, dating back to at least Page (1954, 1955). As modern technological advances, data sets of unprecedented size can be collected at high frequency. This provides statisticians with new challenges in this field. In this thesis we study the online version of the changepoint detection problem in high-dimensional settings. In Chapter 1, we survey the field of changepoint detection. We focus, in particular, on the comparison between the offline problem and the online problem, the contrast between the univariate setting and the high-dimensional setting and inference problems associated with changepoints. In Chapter 2, we introduce a novel method for high-dimensional, online changepoint detection in settings where a multivariate data stream may undergo a change in mean. The procedure works by performing Gaussian likelihood ratio tests against simple alternatives of different scales in each coordinate, and then aggregating test statistics across scales and coordinates. Our algorithm is online in the sense that both its storage requirements and worst-case computational complexity per new observation are independent of the number of previous observations. We prove that the patience, or average run length under the null, of our procedure is at least at the desired nominal level, and provide guarantees on its response delay under the alternative that depend on the sparsity of the vector of mean change. Our procedure shows excellent performance compared to existing methods in the numerical studies. Our algorithm is implemented in the R package ocd, and we also demonstrate its utility on a seismology data set. In Chapter 3, we focus on the problem of inference for high-dimensional online changepoint detection. We propose a confidence interval for the changepoint location. The procedure first identifies coordinates with large signals and then combines univariate confidence intervals constructed from each of these coordinates. We prove that the confidence interval constructed has the desired coverage level and provide a guarantee on the length of the confidence interval. Our procedure also provides an estimate for the effective support of the signal as a byproduct. Simulations confirm the practical effectiveness of our proposal, and we also illustrate its applicability on both US excess deaths data from 2017-2020 and S&P 500 data from the 2007-2008 financial crisis.
dc.description.sponsorshipCantab Capital Institute for the Mathematics of Information
dc.rightsAll Rights Reserved
dc.rights.urihttps://www.rioxx.net/licenses/all-rights-reserved/
dc.subjectchangepoint detection
dc.subjectonline algorithm
dc.subjectinference
dc.subjectconfidence interval
dc.subjectsequential method
dc.subjectsparsity
dc.titleHigh-dimensional Online Changepoint Detection
dc.typeThesis
dc.type.qualificationlevelDoctoral
dc.type.qualificationnameDoctor of Philosophy (PhD)
dc.publisher.institutionUniversity of Cambridge
dc.date.updated2022-06-13T16:59:39Z
dc.identifier.doi10.17863/CAM.85483
rioxxterms.licenseref.urihttps://www.rioxx.net/licenses/all-rights-reserved/
rioxxterms.typeThesis
cam.supervisorSamworth, Richard
cam.supervisorWang, Tengyao
cam.supervisor.orcidSamworth, Richard [0000-0003-2426-4679]
cam.depositDate2022-06-13
pubs.licence-identifierapollo-deposit-licence-2-1
pubs.licence-display-nameApollo Repository Deposit Licence Agreement


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record