Show simple item record

dc.contributor.author  Berrett, Thomas Benjamin
dc.date.accessioned  2017-10-12T13:39:15Z
dc.date.available  2017-10-12T13:39:15Z
dc.date.issued  2017-10-01
dc.identifier.uri  https://www.repository.cam.ac.uk/handle/1810/267832
dc.description.abstract  Nearest neighbour methods are a classical approach in nonparametric statistics. The k-nearest neighbour classifier can be traced back to the seminal work of Fix and Hodges (1951), and nearest neighbour methods also enjoy popularity in many other problems, including density estimation and regression. In this thesis we study their use in three different situations, providing new theoretical results on the performance of commonly used nearest neighbour methods and proposing new procedures that are shown to outperform these existing methods in certain settings. The first problem we discuss is that of entropy estimation. Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this chapter, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko (1987), based on the k-nearest neighbour distances of a sample. A careful choice of weights enables us to obtain an efficient estimator in arbitrary dimensions, given sufficient smoothness, while the original unweighted estimator is typically only efficient in up to three dimensions. A related topic of study is the estimation of the mutual information between two random vectors, and its application to testing for independence. We propose tests for the two settings in which the marginal distributions are known or unknown, and analyse their performance. Finally, we study the classical k-nearest neighbour classifier of Fix and Hodges (1951) and provide a new asymptotic expansion for its excess risk. We also show that, in certain situations, a new modification of the classifier that allows k to vary with the location of the test point can provide improvements. This has applications to the field of semi-supervised learning, where, in addition to labelled training data, we also have access to a large sample of unlabelled data.
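The abstract refers to the Kozachenko and Leonenko (1987) entropy estimator based on k-nearest neighbour distances. As a rough illustration only (this is my own minimal sketch, not code from the thesis, and the function names are invented), the standard unweighted form of that estimator can be written in a few lines of NumPy: it combines digamma terms with the log of each point's k-th nearest neighbour distance and the volume of the unit ball in d dimensions.

```python
# Hypothetical sketch of the unweighted Kozachenko-Leonenko (1987)
# k-NN entropy estimator; function names here are illustrative only.
import numpy as np
from math import gamma, log, pi

EULER_GAMMA = 0.5772156649015329

def digamma_int(m):
    # Digamma at a positive integer: psi(m) = -gamma + sum_{j=1}^{m-1} 1/j.
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))

def kl_entropy(X, k=1):
    """Unweighted Kozachenko-Leonenko estimate of differential entropy (nats)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Brute-force pairwise Euclidean distances: O(n^2), fine for a sketch.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # exclude each point itself
    rho_k = np.sort(dist, axis=1)[:, k - 1]   # k-th nearest neighbour distance
    vol_d = pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the unit d-ball
    return (digamma_int(n) - digamma_int(k) + log(vol_d)
            + d * float(np.mean(np.log(rho_k))))

rng = np.random.default_rng(0)
sample = rng.standard_normal((2000, 1))
# True entropy of N(0,1) is 0.5*log(2*pi*e), roughly 1.4189 nats;
# the estimate should land close to that value.
print(kl_entropy(sample, k=3))
```

The weighted estimators studied in the thesis replace the single k-th neighbour distance with a weighted average over several values of k; the sketch above only shows the classical unweighted version.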
dc.description.sponsorship  My PhD was funded by a Sims Scholarship.
dc.language.iso  en
dc.subject  Nonparametric statistics
dc.subject  Nearest neighbour methods
dc.subject  Entropy Estimation
dc.subject  Independence Testing
dc.subject  Classification
dc.title  Modern k-Nearest Neighbour Methods in Entropy Estimation, Independence Testing and Classification
dc.type  Thesis
dc.type.qualificationlevel  Doctoral
dc.type.qualificationname  Doctor of Philosophy (PhD)
dc.publisher.institution  University of Cambridge
dc.publisher.department  DPMMS
dc.date.updated  2017-10-12T10:25:18Z
dc.identifier.doi  10.17863/CAM.13756
dc.contributor.orcid  Berrett, Thomas Benjamin [0000-0002-2005-110X]
dc.publisher.college  Gonvi
dc.type.qualificationtitle  PhD in Mathematics
cam.supervisor  Samworth, Richard
rioxxterms.freetoread.startdate  2018-10-12

