In this work, we present an algorithm, dubbed FlipOut, which can reliably identify and remove redundant connections in a neural network while it is training. In our experiments, we remove more than 90% of the connections in the networks we tested on with little to no impact on performance, and achieve the best results reported in the literature when removing 99% or more of the connections. We take this a step further by also applying quantization: each remaining connection is approximated by a less precise version of itself, which lets us store it with fewer bits of memory (8 instead of 32 per connection). We find that these two methods, pruning and quantization, are complementary, allowing us to remove 75% of the connections while storing the remaining weights with 4 times fewer bits and little degradation in accuracy. Thus, a theoretical 16x reduction in memory can be achieved.
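To make the pruning and quantization steps, and the 16x arithmetic, concrete, here is a minimal sketch. It does not implement the FlipOut criterion itself; it uses generic magnitude pruning and symmetric uniform 8-bit quantization on a random weight matrix, with all names, shapes, and thresholds chosen for illustration only.

```python
# Minimal sketch (not the FlipOut criterion): magnitude pruning plus
# uniform 8-bit quantization of a weight matrix, and the resulting
# compression factor. All values here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)  # a dense layer's weights

# --- Pruning: zero out the 75% of connections with the smallest magnitude ---
sparsity = 0.75
threshold = np.quantile(np.abs(weights), sparsity)
mask = np.abs(weights) > threshold           # keep only the largest ~25% of weights
pruned = weights * mask

# --- Quantization: approximate each remaining weight with 8 bits instead of 32 ---
scale = np.abs(pruned).max() / 127.0         # symmetric uniform quantization
quantized = np.round(pruned / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale  # values the network would use

# --- Compression arithmetic: 4x from pruning times 4x from quantization = 16x ---
kept_fraction = mask.mean()                  # ~0.25
bits_ratio = 8 / 32                          # 0.25
compression = 1.0 / (kept_fraction * bits_ratio)
print(f"kept {kept_fraction:.0%} of weights at 8/32 bits -> ~{compression:.0f}x smaller")
```

Running this prints a compression factor of roughly 16x, matching the back-of-the-envelope calculation above: keeping a quarter of the connections and a quarter of the bits per connection multiplies to a sixteenth of the original memory.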