Training Deep Neural Networks with Quantization and Structured Sparsity
Deep learning algorithms have shown tremendous success in applications ranging from self-driving cars to medical diagnosis. However, their deployment on embedded platforms, such as mobile devices, has been limited by high memory and computation requirements. The objective of this research is to compress neural network models in a hardware-friendly manner while retaining their performance. The network is compressed by applying structured sparsity constraints together with network quantization constraints. The results show promising trade-offs between test accuracy and memory footprint for certain precision and compression-ratio settings. Future work includes an extended study combining quantization with hierarchical coarse-grained sparsity.
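To make the combination of constraints concrete, the sketch below shows one way a convolutional weight tensor could be projected onto both constraint sets: whole output filters with the smallest L2 norm are zeroed (structured sparsity), and the surviving weights are rounded to a symmetric uniform grid of a given bit width (quantization). This is not the method described above; it is a minimal illustration under stated assumptions. The filter-level pruning granularity, the per-tensor symmetric quantizer, and the helper names `prune_filters`, `quantize_uniform`, `compress`, and `memory_bytes` are all hypothetical choices. In constraint-based training, a projection like this might be applied to the weights after each update, but the training loop itself is omitted.

```python
import numpy as np

def prune_filters(w, keep_ratio):
    """Hypothetical structured sparsity: keep the output filters with the
    largest L2 norm and zero the rest. w has shape (out, in, kh, kw)."""
    norms = np.sqrt((w.reshape(w.shape[0], -1) ** 2).sum(axis=1))
    n_keep = max(1, int(round(keep_ratio * w.shape[0])))
    keep = np.argsort(norms)[-n_keep:]          # indices of the largest norms
    mask = np.zeros(w.shape[0], dtype=bool)
    mask[keep] = True
    return w * mask[:, None, None, None], mask  # zero entire filters

def quantize_uniform(w, bits):
    """Assumed symmetric per-tensor quantizer with 2**(bits-1)-1 positive levels."""
    levels = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    scale = max_abs / levels
    # |w / scale| <= levels, so rounding stays inside the grid; zeros stay zero.
    return np.round(w / scale) * scale

def compress(w, bits, keep_ratio):
    """Apply the structured sparsity constraint, then the quantization constraint."""
    pruned, mask = prune_filters(w, keep_ratio)
    return quantize_uniform(pruned, bits), mask

def memory_bytes(mask, shape, bits):
    """Rough footprint of the kept filters only (ignores mask/index overhead)."""
    params_per_filter = int(np.prod(shape[1:]))
    total_bits = int(mask.sum()) * params_per_filter * bits
    return -(-total_bits // 8)                  # round up to whole bytes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 4, 3, 3))       # toy 3x3 conv layer, 8 filters
    cw, mask = compress(w, bits=4, keep_ratio=0.5)
    print(mask.sum())                                       # → 4
    print(memory_bytes(mask, w.shape, bits=4))              # → 72
    print(memory_bytes(np.ones(8, dtype=bool), w.shape, 32))  # → 1152 (float32 baseline)
```

Pruning whole filters rather than individual weights is what makes the sparsity hardware-friendly in this sketch: a zeroed filter removes an entire output channel, so the remaining computation stays dense and regular instead of requiring per-element index bookkeeping.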