Title | Pruning Convolutional Neural Networks to Make them More Efficient |
Authors | Mohammad Mahadi Hassain (wayez07@gmail.com) |
Supervisor | Dr. Nabeel Mohammed |
Semester | Fall, 2018 |
In recent years, deep neural networks have achieved revolutionary performance on challenging tasks such as speech recognition, computer vision, and natural language processing. This success comes from large-scale networks learning from large amounts of data. Such models have millions of parameters, require a large amount of storage, and are typically deployed in large data-center back ends. They are difficult to deploy on devices with limited computation and energy budgets, such as mobile phones and embedded devices. To address this problem, model pruning techniques have been proposed in recent years. Much ongoing research aims to prune and compress models: reducing the number of parameters and operations in a model reduces its run-time memory, ideally without losing the original accuracy. Pruning means removing weights or entire units from the network while maintaining performance. Pruning strategies mostly rely on approximations such as removing the weights of smallest magnitude, removing the least-sensitive neurons, or ranking the weights by the sensitivity of task performance with respect to them. In our work, we used a clustering-based pruning technique. First, we took the weights of a layer and reshaped the weight matrix so that a correlation matrix could be computed over the filters in that layer. We then normalized the correlation matrix, which gave us a cross-correlation matrix, and obtained a distance matrix by subtracting the cross-correlation matrix from 1. We clustered this distance matrix with fcluster, which gave us a set of filter clusters. From each cluster we kept the one filter with the highest average value with respect to the other filters in the same cluster and pruned the rest. After retraining, the pruned model was able to maintain or exceed the performance of the original model with a smaller number of neurons.
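The filter-selection steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that "Fcluster" refers to SciPy's `scipy.cluster.hierarchy.fcluster`, that the layer weights are laid out as (kernel height, kernel width, input channels, filters), that the normalized correlation is the Pearson correlation, and that the "highest average value" means the highest mean correlation with the other filters in the same cluster. The function name `select_filters_to_keep`, the average-linkage choice, and the `threshold` value are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def select_filters_to_keep(weights, threshold=0.5):
    """Return sorted indices of the filters to keep, one per cluster.

    weights: array shaped (kh, kw, in_channels, n_filters) (assumed layout).
    threshold: distance cut-off given to fcluster (hypothetical value).
    """
    n_filters = weights.shape[-1]
    # Reshape so that each row holds one flattened filter.
    flat = weights.reshape(-1, n_filters).T
    # Normalized (Pearson) correlation between every pair of filters.
    corr = np.corrcoef(flat)
    # Distance matrix: 1 - correlation, with rounding noise cleaned up.
    dist = 1.0 - corr
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    dist = np.clip(dist, 0.0, None)
    # Hierarchical clustering on the condensed form of the distance matrix.
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=threshold, criterion="distance")

    keep = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        if len(members) == 1:
            # A filter alone in its cluster is always kept.
            keep.append(int(members[0]))
            continue
        sub = corr[np.ix_(members, members)]
        # Mean correlation with the other members (self-correlation excluded).
        mean_corr = (sub.sum(axis=1) - np.diag(sub)) / (len(members) - 1)
        keep.append(int(members[np.argmax(mean_corr)]))
    return sorted(keep)


# Toy layer: six 3x3x4 filters forming two groups of near-duplicates.
rng = np.random.default_rng(0)
base_a = np.arange(36, dtype=float)             # linear ramp pattern
base_b = np.where(np.arange(36) % 2, -1.0, 1.0)  # alternating pattern
flat_filters = np.stack(
    [base_a + 0.01 * rng.standard_normal(36) for _ in range(3)]
    + [base_b + 0.01 * rng.standard_normal(36) for _ in range(3)]
)
weights = flat_filters.T.reshape(3, 3, 4, 6)

keep = select_filters_to_keep(weights)
print("filters kept:", keep)
```

In this toy layer the two groups are nearly uncorrelated with each other, so the sketch keeps one representative from each group. In practice the pruned layer would be rebuilt from the kept filters and the network retrained, as the abstract describes; the threshold controls how aggressively filters are merged and would need tuning per layer.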