Neural Network Weight Quantization

Quantizing a neural network's weights makes the model smaller and faster to run while keeping its accuracy close to the original.

What is quantization

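Quantization converts the network's weights from high precision to low precision (for example, from 32-bit floating point, float32, to 8-bit fixed point, int8, or binarizing them to 1 bit) while keeping accuracy and other metrics close to the original. The model becomes smaller and runs faster.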
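
The conversion described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular paper or library: it assumes symmetric per-tensor quantization with a single scale factor, and the function names (quantize_int8, dequantize_int8, binarize) and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: float weights -> int8 codes plus one scale."""
    max_abs = max(abs(w) for w in weights)
    # One float scale maps the largest magnitude to 127; guard against all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


def binarize(weights):
    """1-bit quantization: keep only the sign (+1 or -1) of each weight."""
    return [1 if w >= 0 else -1 for w in weights]


weights = [0.42, -1.27, 0.03, 0.9, -0.55]  # made-up example values
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

print("int8 codes:", codes)
print("restored:  ", restored)
print("binarized: ", binarize(weights))
```

Only the integer codes and one scale need to be stored. Each restored value differs from the original by at most half a quantization step (scale / 2).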

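Quantization can be viewed as a source of noise. Neural networks are generally robust to small perturbations of their weights, which is why a quantized model usually performs close to the original.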
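
To make the noise view concrete, the sketch below quantizes a vector of random weights and measures the error that results. It assumes the same symmetric int8 scheme as a simple stand-in for a real quantizer; the weights, inputs, and vector size are random illustrative values, not data from any model.

```python
import random

random.seed(0)
n = 1000  # illustrative vector size
weights = [random.gauss(0.0, 1.0) for _ in range(n)]
inputs = [random.gauss(0.0, 1.0) for _ in range(n)]

# Symmetric int8 quantization followed by dequantization.
scale = max(abs(w) for w in weights) / 127.0
quantized = [round(w / scale) * scale for w in weights]

# The quantization "noise" is the difference between quantized and original weights.
noise = [q - w for q, w in zip(quantized, weights)]
max_noise = max(abs(e) for e in noise)

# Effect on one neuron's pre-activation (a dot product with the inputs).
exact = sum(w * x for w, x in zip(weights, inputs))
approx = sum(q * x for q, x in zip(quantized, inputs))
error_bound = sum(abs(x) for x in inputs) * scale / 2

print(f"quantization step: {scale:.5f}")
print(f"largest weight error: {max_noise:.5f}")
print(f"dot product: exact={exact:.4f} quantized={approx:.4f}")
```

Every weight error is bounded by half a quantization step, so the dot-product error is bounded by error_bound. In practice the individual errors partly cancel and the actual deviation is usually well below that bound.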

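Advantages
1. The model is smaller and runs faster.
2. int8 needs only 25% of the memory bandwidth of float32, which makes better use of caches and keeps RAM access from becoming a bottleneck.
3. Each SIMD instruction processes more values, so more operations are executed per clock cycle.
4. It is faster still if a DSP chip that accelerates 8-bit computation is available.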
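
The 25% figure and the SIMD point follow from simple arithmetic, sketched below with Python's standard array module. The layer size and the 128-bit register width are assumptions chosen for illustration; real hardware varies.

```python
from array import array

num_weights = 1_000_000  # hypothetical layer size, chosen only for illustration

# "f" stores C floats (4 bytes each on common platforms); "b" stores signed 8-bit integers.
float_weights = array("f", [0.0]) * num_weights
int8_weights = array("b", [0]) * num_weights

float_bytes = len(float_weights) * float_weights.itemsize
int8_bytes = len(int8_weights) * int8_weights.itemsize
ratio = int8_bytes / float_bytes

# Values that fit in one SIMD register, assuming a 128-bit register.
register_bits = 128
float32_lanes = register_bits // 32
int8_lanes = register_bits // 8

print(f"float32: {float_bytes} bytes, int8: {int8_bytes} bytes, ratio: {ratio:.0%}")
print(f"values per {register_bits}-bit register: float32={float32_lanes}, int8={int8_lanes}")
```

The same factor of four shows up in both places: a quarter of the bytes to move, and four times as many values per register.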

Disadvantages

Accuracy is slightly lower than that of the original model.

Train the model first, then quantize it; the quantized model is the one used at test time.
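
The train-then-quantize workflow can be sketched end to end on a toy problem. Everything here is an illustrative assumption rather than the procedure of any paper listed below: the model is a two-weight logistic regression trained with plain gradient descent, the data is synthetic, and the helper names are hypothetical.

```python
import math
import random


def predict(weights, x):
    """Classify by the sign of the weighted sum."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def accuracy(weights, data):
    return sum(predict(weights, x) == y for x, y in data) / len(data)


def train(data, steps=300, lr=1.0):
    """Step 1: train in full precision (logistic regression, batch gradient descent)."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(len(w)):
                grad[i] += (p - y) * x[i]
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def quantize_int8(weights):
    """Step 2: quantize the trained weights (symmetric, one scale per tensor)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [max(-127, min(127, round(w / scale))) for w in weights], scale


def make_data(n):
    """Synthetic points in [-1, 1]^2, labeled by a fixed linear rule."""
    points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
    return [(x, 1 if x[0] + 2 * x[1] > 0 else 0) for x in points]


random.seed(0)
train_data = make_data(200)
test_data = make_data(200)

float_w = train(train_data)
codes, scale = quantize_int8(float_w)
quant_w = [c * scale for c in codes]

# Step 3: evaluate with the quantized weights only.
float_acc = accuracy(float_w, test_data)
quant_acc = accuracy(quant_w, test_data)
print(f"float32 accuracy: {float_acc:.3f}")
print(f"int8 accuracy:    {quant_acc:.3f}")
```

Because the scale is positive, this toy classifier gives the same predictions whether it uses the dequantized weights or the integer codes directly. Larger networks generally need more care, such as quantizing activations and calibrating scales per layer.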

DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients https://arxiv.org/abs/1606.06160 https://github.com/ppwwyyxx/tensorpack/tree/master/examples/DoReFa-Net
(ECCV 2016) XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks https://arxiv.org/abs/1603.05279 https://github.com/allenai/XNOR-Net
Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1 https://arxiv.org/abs/1602.02830 https://github.com/MatthieuCourbariaux/BinaryNet
BinaryConnect: Training Deep Neural Networks with binary weights during propagations https://arxiv.org/abs/1511.00363 https://github.com/MatthieuCourbariaux/BinaryConnect
(CVPR 2016) Quantized Convolutional Neural Networks for Mobile Devices https://github.com/jiaxiang-wu/quantized-cnn
(ICLR 2016) Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding https://arxiv.org/abs/1510.00149 https://github.com/songhan/Deep-Compression-AlexNet