Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization, and Huffman Coding