Neural Network Model Compression and Acceleration
An overview of common methods for compressing and accelerating neural network models (mainly CNNs)
Goal: make the model run as fast as possible and its size as small as possible, while keeping accuracy essentially unchanged
Model Compression
Changing the network structure
Use specialized architectures
e.g. ShuffleNet, MobileNet, Xception, SqueezeNet
MobileNet
Splits a standard convolution into two steps (notation: \(D_K\) is the kernel size, \(M\) and \(N\) are the numbers of input and output channels, \(D_F\) is the spatial size of the output feature map)
Depthwise Convolution
Cost: \(D_K \cdot D_K \cdot M \cdot D_F \cdot D_F\)
Pointwise Convolution
Cost: \(M \cdot N \cdot D_F \cdot D_F\)
Together, the two steps are called a Depthwise Separable Convolution
Ratio to the cost of a standard convolution: \(\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}\)
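A minimal sketch in plain Python (layer sizes are illustrative, not taken from the paper) that evaluates both sides of the ratio above:

```python
# Illustrative layer sizes (assumed, not from the MobileNet paper).
D_K, M, N, D_F = 3, 64, 128, 56

standard  = D_K * D_K * M * N * D_F * D_F   # standard convolution cost
depthwise = D_K * D_K * M * D_F * D_F       # per-channel spatial filtering
pointwise = M * N * D_F * D_F               # 1x1 convolution mixing channels

print((depthwise + pointwise) / standard)   # measured ratio
print(1 / N + 1 / D_K ** 2)                 # closed form, same value
```

With a 3x3 kernel the separable form is roughly 8-9x cheaper, since the \(\frac{1}{D_K^2}\) term dominates.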
References:
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices https://arxiv.org/abs/1707.01083
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications https://arxiv.org/abs/1704.04861 https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md
Xception: Deep Learning with Depthwise Separable Convolutions https://arxiv.org/abs/1610.02357
(ICLR 2017) SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and <0.5MB Model Size https://arxiv.org/abs/1602.07360 https://github.com/DeepScale/SqueezeNet
Pruning
Prune connections and filters
Sparsify the weights
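A minimal NumPy sketch of magnitude-based weight pruning in the spirit of the references below (the 90% sparsity target is illustrative; in practice the pruned network is fine-tuned afterwards to recover accuracy):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256))          # a toy weight matrix
w_pruned = magnitude_prune(w, sparsity=0.9)  # keep only the largest 10%
print((w_pruned == 0).mean())                # ~0.9: the matrix is now sparse
```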
References:
(ICLR 2017) Pruning Filters for Efficient ConvNets https://arxiv.org/abs/1608.08710
(ICLR 2016) Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding https://arxiv.org/abs/1510.00149 https://github.com/songhan/Deep-Compression-AlexNet
(NIPS 2015) Learning both Weights and Connections for Efficient Neural Networks https://arxiv.org/abs/1506.02626
Distillation
Use a large, high-performing network to teach a small network, so that the small network matches the large one's performance with a much smaller parameter count
The objective for training the small model (the distilled model) has two parts:
- cross entropy with the large model's softmax outputs, called the soft targets
- cross entropy with the ground-truth labels
The training loss is the weighted sum of these two terms
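A minimal NumPy sketch of this loss (the temperature \(T\) follows the Hinton et al. paper below; the values of \(T\) and the weight \(\alpha\) are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of soft-target and hard-target cross entropies."""
    soft = -(softmax(teacher_logits, T) *
             np.log(softmax(student_logits, T))).sum(axis=-1).mean()
    hard = -np.log(softmax(student_logits))[np.arange(len(labels)), labels].mean()
    # Hinton et al. additionally scale the soft term by T**2; omitted for brevity.
    return alpha * soft + (1 - alpha) * hard

student = np.array([[1.0, 2.0, 0.5], [0.2, 0.1, 3.0]])  # toy logits, 2 samples
teacher = np.array([[1.5, 2.5, 0.3], [0.0, 0.4, 3.5]])
labels  = np.array([1, 2])
print(distillation_loss(student, teacher, labels))
```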
References:
(ICLR 2017) Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer https://arxiv.org/abs/1612.03928
(NIPSW 2014) Distilling the Knowledge in a Neural Network https://arxiv.org/abs/1503.02531
Changing only the weights
Quantization
Convert the network weights from high precision to low precision (e.g. 32-bit float32 to 8-bit int8, or binarized to 1 bit) while keeping accuracy and other metrics close to the original; the model gets smaller and runs faster.
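A minimal NumPy sketch of symmetric per-tensor int8 quantization (real schemes, e.g. those in the references below, add per-channel scales, calibration, or quantization-aware training):

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights to int8 with a single linear scale."""
    scale = np.abs(w).max() / 127.0                       # largest weight -> 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_int8(w)
print(q.nbytes / w.nbytes)                     # 0.25: 4x smaller storage
print(np.abs(dequantize(q, scale) - w).max())  # small reconstruction error
```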
References:
DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients https://arxiv.org/abs/1606.06160 https://github.com/ppwwyyxx/tensorpack/tree/master/examples/DoReFa-Net
(ECCV 2016) XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks https://arxiv.org/abs/1603.05279 https://github.com/allenai/XNOR-Net
Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1 https://arxiv.org/abs/1602.02830 https://github.com/MatthieuCourbariaux/BinaryNet
BinaryConnect: Training Deep Neural Networks with binary weights during propagations https://arxiv.org/abs/1511.00363 https://github.com/MatthieuCourbariaux/BinaryConnect
(CVPR 2016) Quantized Convolutional Neural Networks for Mobile Devices https://github.com/jiaxiang-wu/quantized-cnn
(ICLR 2016) Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding https://arxiv.org/abs/1510.00149 https://github.com/songhan/Deep-Compression-AlexNet
Matrix decomposition
Low-rank decomposition (SVD, Tucker decomposition, Block Term decomposition)
Principle: the weight vectors lie mostly in a few low-rank subspaces, so the weight matrix can be reconstructed from a small number of basis vectors
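A minimal NumPy sketch of the SVD case: replace \(y = Wx\) with two thin multiplications \(y = A(Bx)\). The matrix here is synthesized to be nearly rank-\(k\); trained weight matrices are empirically close to low rank:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthesize a nearly rank-64 weight matrix (a stand-in for trained weights).
W = (rng.standard_normal((512, 64)) @ rng.standard_normal((64, 512))
     + 0.1 * rng.standard_normal((512, 512)))

k = 64
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]   # 512 x k
B = Vt[:k, :]          # k x 512: 2*512*k parameters instead of 512*512

x = rng.standard_normal(512)
print(np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x))  # small error
```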
References:
(ICCV 2017) Coordinating Filters for Faster Deep Neural Networks https://arxiv.org/abs/1703.09746 https://github.com/wenwei202/caffe
(TPAMI 2015) Accelerating Very Deep Convolutional Networks for Classification and Detection https://arxiv.org/abs/1505.06798
(NIPS 2014) Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation https://papers.nips.cc/paper/5544-exploiting-linear-structure-within-convolutional-networks-for-efficient-evaluation
Platform Acceleration
Software acceleration
Accelerating the convolution computation
im2col + GEMM: turn convolution into a matrix multiplication, then use an optimized matrix library (see the sketch after the references below)
References:
- (ICML 2017) MEC: Memory-efficient Convolution for Deep Neural Network https://arxiv.org/abs/1706.06873
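A minimal NumPy sketch of im2col + GEMM for stride-1, unpadded convolution (the slow Python loops stand in for an optimized unfold; the GEMM is the single `@`):

```python
import numpy as np

def im2col(x, k):
    """Unfold every k x k patch of a (C, H, W) input into one column."""
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, out_h * out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[:, i:i + k, j:j + k].ravel()
    return cols

def conv_gemm(x, weight):
    """Convolution as one matrix multiplication over the unfolded input."""
    n_out, C, k, _ = weight.shape
    cols = im2col(x, k)                        # (C*k*k, out_h*out_w)
    out = weight.reshape(n_out, -1) @ cols     # the GEMM
    return out.reshape(n_out, x.shape[1] - k + 1, x.shape[2] - k + 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))      # 3-channel 8x8 input
w = rng.standard_normal((4, 3, 3, 3))   # 4 filters of size 3x3
print(conv_gemm(x, w).shape)            # (4, 6, 6)
```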
FFT: convolution in the spatial domain equals pointwise multiplication in the frequency domain, turning the problem into simple elementwise multiplications (see the sketch after the references below)
References:
(BMVC 2015) Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add https://arxiv.org/abs/1601.06815
(ICLR 2015) Fast Convolutional Nets With fbfft: A GPU Performance Evaluation https://arxiv.org/abs/1412.7580
Fast Training of Convolutional Networks through FFTs https://arxiv.org/abs/1312.5851
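A minimal NumPy sketch of the identity in 1-D (zero-pad both operands to the full output length, multiply the spectra, transform back):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # signal
h = rng.standard_normal(9)    # filter

direct = np.convolve(x, h)    # direct linear convolution

n = len(x) + len(h) - 1       # full output length; pad both to n
fft_conv = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

print(np.allclose(direct, fft_conv))   # True
```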
Winograd: minimal filtering algorithms that reduce the number of multiplications in small convolutions at the cost of extra additions (see the sketch after the references below)
References:
(CODES 2016) Zero and data reuse-aware fast convolution for deep neural networks on gpu
(CVPR 2016) Fast Algorithms for Convolutional Neural Networks https://arxiv.org/abs/1509.09308
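A minimal NumPy sketch of the 1-D F(2,3) algorithm from Winograd's minimal filtering theory, as used by Lavin & Gray above: two outputs of a 3-tap filter in 4 multiplications instead of 6 (the filter-side combinations are precomputed once per filter in practice):

```python
import numpy as np

def winograd_f23(d, g):
    """F(2,3): 2 outputs of a 3-tap filter with 4 multiplications (direct: 6)."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2   # filter terms precomputable
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])   # 4 input values
g = np.array([0.5, -1.0, 2.0])       # 3-tap filter
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
print(np.allclose(winograd_f23(d, g), direct))   # True
```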
Frameworks differ in speed
- TensorFlow Mobile, TensorFlow Lite https://www.tensorflow.org/mobile/
- Caffe2 https://caffe2.ai/
- Tencent ncnn (no dependence on BLAS/NNPACK or other compute frameworks; NEON optimizations; multi-core parallelism) https://github.com/Tencent/ncnn/
- Baidu MDL (no third-party dependencies; assembly-level and NEON optimizations) https://github.com/baidu/mobile-deep-learning
Hardware acceleration
Multi-GPU parallelism
Multi-core parallelism
Use the most efficient instructions the hardware provides
e.g. if the platform supports arm64, build 64-bit rather than 32-bit libraries
e.g. use libraries that exploit the available SIMD (Single Instruction, Multiple Data) instructions
References:
- (ISCA 2016) EIE: Efficient Inference Engine on Compressed Deep Neural Network https://arxiv.org/abs/1602.01528