Gradient-Based Competitive Learning: Theory
Abstract
Deep learning has recently been used to extract relevant features for representing input data, including in the unsupervised setting. However, state-of-the-art techniques focus mostly on algorithmic efficiency and accuracy rather than on mimicking the input manifold. In contrast, competitive learning is a powerful tool for replicating the topology of the input distribution. It is cognitively and biologically inspired, as it is founded on Hebbian learning, a neuropsychological theory claiming that neurons can increase their specialization by competing for the right to respond to, or represent, a subset of the input data. This paper introduces a novel perspective by combining these two techniques: unsupervised gradient-based learning and competitive learning. The theory is based on the intuition that neural networks can learn topological structures by working directly on the transpose of the input matrix. To this end, the vanilla competitive layer and its dual are presented. The former is representative of a standard competitive layer for deep clustering, while the latter is trained on the transposed matrix. The equivalence of the layers is extensively proven both theoretically and experimentally. The dual competitive layer has better properties: unlike the vanilla layer, it directly outputs the prototypes of the data inputs, while still allowing learning by backpropagation. More importantly, this paper proves theoretically that the dual layer is better suited for handling high-dimensional data (e.g., for biological applications), because the estimation of the weights is driven by a constraining subspace which does not depend on the input dimensionality, but only on the dataset cardinality. In summary, this paper introduces a novel approach for unsupervised gradient-based competitive learning. This approach is very promising both for small datasets of high-dimensional data and for better exploiting the advantages of a deep architecture, since the dual layer integrates directly with the deep layers. A theoretical justification is also given through the analysis of the gradient flow for both the vanilla and dual layers.
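The dual-layer idea in the abstract (a layer applied to the transposed input matrix whose output is the set of prototypes, trained by gradient descent) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the quantization-error loss, the hand-written gradient, the step-size rule, and the names `quantization_loss` and `train_dual_layer` are hypothetical. The prototypes are written as `P = W @ X`, which is the transpose of applying the weights to the transposed data matrix, so each prototype is a linear combination of the samples and the weight matrix has one column per sample rather than per input feature.

```python
import numpy as np


def quantization_loss(X, P):
    """Mean squared distance from each sample to its nearest prototype."""
    d2 = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    winners = d2.argmin(axis=1)  # index of the winning prototype per sample
    return d2[np.arange(len(X)), winners].mean(), winners


def train_dual_layer(X, k, steps=200, seed=0):
    """Hypothetical dual competitive layer: prototypes are P = W @ X.

    X has shape (n, d); W has shape (k, n), so its size depends on the
    number of samples n and not on the input dimensionality d.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(k, n))
    # Assumed step-size rule: conservative value based on the largest
    # singular value of X, chosen so plain gradient descent stays stable.
    lr = 0.5 / np.linalg.norm(X, 2) ** 2
    losses = []
    for _ in range(steps):
        P = W @ X  # (k, d): the layer output is the prototype matrix
        loss, winners = quantization_loss(X, P)
        losses.append(loss)
        # Gradient of the loss with respect to the prototypes.
        G = np.zeros_like(P)
        for j in range(k):
            members = X[winners == j]
            if len(members):
                G[j] = 2.0 * (P[j] - members).sum(axis=0) / n
        # Chain rule through P = W @ X gives dL/dW = G @ X.T, shape (k, n).
        W -= lr * (G @ X.T)
    P = W @ X
    losses.append(quantization_loss(X, P)[0])
    return P, losses


# Usage on synthetic data: two well-separated 2-D blobs (made-up example).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=(-2.0, 0.0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(2.0, 0.0), scale=0.3, size=(20, 2)),
])
P, losses = train_dual_layer(X, k=2)
print("prototype matrix shape:", P.shape)
print("loss before / after:", losses[0], losses[-1])
```

In this sketch the trainable matrix `W` has `k * n` entries regardless of `d`, which mirrors the abstract's remark that the dual layer's weight estimation depends on the dataset cardinality rather than on the input dimensionality.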
Description
Funder: Politecnico di Torino
Journal ISSN
1866-9964