Elucidating the solution structure of the K-means cost function using energy landscape theory

Accepted version
Peer-reviewed

Authors

Wales, David J 

Abstract

The K-means algorithm, routinely used in many scientific fields, generates clustering solutions that depend on the initial cluster coordinates. The number of solutions may be large, which can make locating the global minimum challenging. Hence, the topography of the cost function surface is crucial to understanding the performance of the algorithm. Here we employ the energy landscape approach to elucidate the topography of the K-means cost function surface for Fisher’s Iris dataset. For any number of clusters we find that the solution landscapes have a funnelled structure that is usually associated with efficient global optimisation. An analysis of the barriers between clustering solutions shows that the funnelled structures result from remarkably small barriers between almost all clustering solutions. The funnelled structure becomes less well defined as the number of clusters increases, and we analyse kinetic analogues to quantify the increased difficulty of locating the global minimum for these different landscapes.
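The dependence on initial cluster coordinates described above can be illustrated with a minimal sketch. It assumes the standard K-means cost (the sum of squared distances from each point to its nearest centre) and plain Lloyd iterations. The four-point dataset and the function names `sq_dist`, `kmeans_cost`, and `lloyd` are hypothetical choices made for illustration; this is not the Iris data or the energy landscape method used in the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points, centres):
    """K-means cost: sum of squared distances to the nearest centre."""
    return sum(min(sq_dist(p, c) for c in centres) for p in points)


def lloyd(points, centres, max_iter=100):
    """Run Lloyd iterations from the given centres; return (centres, cost)."""
    centres = [tuple(float(x) for x in c) for c in centres]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda k: sq_dist(p, centres[k]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its members
        # (an empty cluster keeps its previous centre).
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centre
            for centre, members in zip(centres, clusters)
        ]
        if updated == centres:  # converged: centres no longer move
            break
        centres = updated
    return centres, kmeans_cost(points, centres)


# Hypothetical toy data: corners of a 4 x 1 rectangle, K = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Two different starting configurations for the same data.
_, cost_a = lloyd(points, [(0.0, 0.5), (4.0, 0.5)])  # splits left / right
_, cost_b = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])  # splits bottom / top

print(cost_a, cost_b)
```

Both runs converge, but to different solutions: the left/right split has cost 1.0, while the bottom/top split is a stable solution with cost 16.0. Even this tiny example therefore has more than one converged clustering solution, which is the situation the landscape analysis in the abstract sets out to characterise.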

Keywords

40 Engineering, 34 Chemical Sciences, 51 Physical Sciences

Journal Title

The Journal of Chemical Physics

Journal ISSN

0021-9606
1089-7690

Publisher

AIP Publishing

Sponsorship

Engineering and Physical Sciences Research Council (EP/L015552/1)
EPSRC (1819290)