Structural Priors in Deep Neural Networks
Deep learning has in recent years come to dominate the previously separate fields of research in machine learning, computer vision, natural language understanding and speech recognition. Despite breakthroughs in training deep networks, there remains a lack of understanding of both the optimization and structure of deep networks. The approach advocated by many researchers in the field has been to train monolithic networks with excess complexity, and strong regularization --- an approach that leaves much to desire in efficiency. Instead we propose that carefully designing networks in consideration of our prior knowledge of the task and learned representation can improve the memory and compute efficiency of state-of-the art networks, and even improve generalization --- what we propose to denote as structural priors.
We present two such novel structural priors for convolutional neural networks, and evaluate them in state-of-the-art image classification CNN architectures. The first of these methods proposes to exploit our knowledge of the low-rank nature of most filters learned for natural images by structuring a deep network to learn a collection of mostly small, low-rank, filters. The second addresses the filter/channel extents of convolutional filters, by learning filters with limited channel extents. The size of these channel-wise basis filters increases with the depth of the model, giving a novel sparse connection structure that resembles a tree root. Both methods are found to improve the generalization of these architectures while also decreasing the size and increasing the efficiency of their training and test-time computation.
Finally, we present work towards conditional computation in deep neural networks, moving towards a method of automatically learning structural priors in deep networks. We propose a new discriminative learning model, conditional networks, that jointly exploit the accurate representation learning capabilities of deep neural networks with the efficient conditional computation of decision trees. Conditional networks yield smaller models, and offer test-time flexibility in the trade-off of computation vs. accuracy.