1. Model Compression
    1. Changing the Network Structure
        1. Using Specialized Architectures
        2. Pruning
        3. Distillation
    2. Changing Only the Weights
        1. Quantization
        2. Matrix Factorization
2. Platform Acceleration
    1. Software Acceleration
        1. Accelerating Convolution
        2. Framework Speed Differences
    2. Hardware Acceleration

An overview of common methods for compressing and accelerating neural network models (mainly CNNs).

Goal: make the model run as fast as possible and take as little space as possible, while keeping accuracy essentially unchanged.

Model Compression

Changing the Network Structure

Using Specialized Architectures

e.g. ShuffleNet, MobileNet, Xception, SqueezeNet

  • MobileNet

    Splits a standard convolution into two steps (here \(D_K\) is the kernel size, \(M\) and \(N\) are the numbers of input and output channels, and \(D_F\) is the spatial size of the output feature map):

    • Depthwise Convolution

      Cost: \(D_K \cdot D_K \cdot M \cdot D_F \cdot D_F\)

    • Pointwise Convolution

      Cost: \(M \cdot N \cdot D_F \cdot D_F\)

    Together the two steps are called a Depthwise Separable Convolution.

    Cost relative to the standard convolution: \(\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}\)
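
A minimal PyTorch sketch of the block (channel counts are illustrative, and the BatchNorm/ReLU that MobileNet places after each convolution is omitted):

    import torch
    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""
        def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
            super().__init__()
            # groups=in_ch gives every input channel its own 3x3 filter (depthwise).
            self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                       padding=1, groups=in_ch, bias=False)
            # The 1x1 convolution mixes information across channels (pointwise).
            self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.pointwise(self.depthwise(x))

    y = DepthwiseSeparableConv(64, 128)(torch.randn(1, 64, 56, 56))  # -> (1, 128, 56, 56)

For a 3x3 kernel (\(D_K = 3\)) and, say, \(N = 128\) output channels, the ratio is \(\frac{1}{128} + \frac{1}{9} \approx 0.12\), i.e. roughly 8x fewer multiply-adds.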

References:

  1. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices https://arxiv.org/abs/1707.01083

  2. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications https://arxiv.org/abs/1704.04861 https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md

  3. Xception: Deep Learning with Depthwise Separable Convolutions https://arxiv.org/abs/1610.02357

  4. (ICLR 2017) SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and <0.5MB Model Size https://arxiv.org/abs/1602.07360 https://github.com/DeepScale/SqueezeNet

Pruning

Remove individual connections or whole filters.

The resulting weights are sparse.
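
A minimal PyTorch sketch of both flavors (the sparsity level and weight shapes are illustrative):

    import torch

    def connection_prune_mask(weight: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
        """Mask that zeroes the smallest-magnitude weights (connection pruning)."""
        k = max(1, int(weight.numel() * sparsity))
        threshold = weight.abs().flatten().kthvalue(k).values
        # Multiply this mask into the weight and keep it fixed while fine-tuning.
        return (weight.abs() > threshold).float()

    def filter_l1_order(conv_weight: torch.Tensor) -> torch.Tensor:
        """Filters sorted by ascending L1 norm, the criterion in reference 1 below;
        filter pruning removes whole filters from the front of this order."""
        # conv_weight has shape (out_channels, in_channels, kH, kW).
        return conv_weight.abs().sum(dim=(1, 2, 3)).argsort()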

References:

  1. (ICLR 2017) Pruning Filters for Efficient ConvNets https://arxiv.org/abs/1608.08710

  2. (ICLR 2016) Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding https://arxiv.org/abs/1510.00149 https://github.com/songhan/Deep-Compression-AlexNet

  3. (NIPS 2015) Learning both Weights and Connections for Efficient Neural Networks https://arxiv.org/abs/1506.02626

Distillation

Use a large, high-performing network to teach a small network, so that the small network approaches the large network's performance with far fewer parameters.

The objective for training the small (distilled) model has two parts:

  1. the cross-entropy with the large model's softmax output, called the soft target
  2. the cross-entropy with the ground truth

The training loss is a weighted sum of these two terms.
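
A minimal PyTorch sketch of that loss. The temperature \(T\) comes from the Hinton et al. paper below and the weight \(\alpha\) is an illustrative hyperparameter; the KL term equals the soft-target cross-entropy up to an additive constant that does not affect gradients:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          T: float = 4.0, alpha: float = 0.7):
        # 1. Soft target: match the teacher's softened output distribution.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)  # rescale gradient magnitude
        # 2. Hard target: ordinary cross-entropy with the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        # Weighted sum of the two terms.
        return alpha * soft + (1.0 - alpha) * hard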

References:

  1. (ICLR 2017) Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer https://arxiv.org/abs/1612.03928

  2. (NIPSW 2014) Distilling the Knowledge in a Neural Network https://arxiv.org/abs/1503.02531

Changing Only the Weights

Quantization

Convert the weights from high precision to low precision (e.g. 32-bit float32 to 8-bit int8, or binarized to 1 bit). Accuracy and other metrics stay close to the original while the model gets smaller and runs faster.
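
A minimal NumPy sketch of symmetric per-tensor int8 quantization, the simplest of the schemes covered in the references (the scale choice is illustrative):

    import numpy as np

    def quantize_int8(w: np.ndarray):
        """Map float32 weights onto int8 with a single shared scale."""
        scale = np.abs(w).max() / 127.0  # largest magnitude maps to 127
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32)
    q, s = quantize_int8(w)
    print(q.nbytes / w.nbytes)                 # 0.25: the weights shrink 4x
    print(np.abs(dequantize(q, s) - w).max())  # worst-case rounding error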

References:

  1. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients https://arxiv.org/abs/1606.06160 https://github.com/ppwwyyxx/tensorpack/tree/master/examples/DoReFa-Net

  2. (ECCV 2016) XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks https://arxiv.org/abs/1603.05279 https://github.com/allenai/XNOR-Net

  3. Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1 https://arxiv.org/abs/1602.02830 https://github.com/MatthieuCourbariaux/BinaryNet

  4. BinaryConnect: Training Deep Neural Networks with binary weights during propagations https://arxiv.org/abs/1511.00363 https://github.com/MatthieuCourbariaux/BinaryConnect

  5. (CVPR 2016) Quantized Convolutional Neural Networks for Mobile Devices https://github.com/jiaxiang-wu/quantized-cnn

  6. (ICLR 2016) Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding https://arxiv.org/abs/1510.00149 https://github.com/songhan/Deep-Compression-AlexNet

Matrix Factorization

Low-rank factorization (SVD, Tucker decomposition, Block Term decomposition)

Idea: the weight vectors lie mostly in low-rank subspaces, so the weight matrix can be reconstructed from a small number of basis vectors.
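
A minimal NumPy sketch of the SVD case (matrix sizes and the retained rank are made up for illustration): keep only the top \(r\) singular vectors and replace one big matrix multiply with two thin ones.

    import numpy as np

    W = np.random.randn(512, 1024).astype(np.float32)  # fully connected weight
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    r = 64                                             # retained rank (hypothetical)
    A = U[:, :r] * S[:r]                               # 512 x r
    B = Vt[:r]                                         # r x 1024

    x = np.random.randn(1024).astype(np.float32)
    y_full = W @ x       # 512 * 1024 ~ 524k multiplies
    y_low  = A @ (B @ x) # r * 1024 + 512 * r ~ 98k multiplies
    print(np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full))  # approximation error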

References:

  1. (ICCV 2017) Coordinating Filters for Faster Deep Neural Networks https://arxiv.org/abs/1703.09746 https://github.com/wenwei202/caffe

  2. (TPAMI 2015) Accelerating Very Deep Convolutional Networks for Classification and Detection https://arxiv.org/abs/1505.06798

  3. (NIPS 2014) Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation https://papers.nips.cc/paper/5544-exploiting-linear-structure-within-convolutional-networks-for-efficient-evaluation

Platform Acceleration

Software Acceleration

Accelerating Convolution

  • im2col + GEMM: recast the convolution as a matrix multiplication, then use an optimized matrix library (see the sketch after this list)

    References:

    1. (ICML 2017) MEC: Memory-efficient Convolution for Deep Neural Network https://arxiv.org/abs/1706.06873

  • FFT: convolution in the time domain equals pointwise multiplication in the frequency domain, turning the problem into simple elementwise products (see the check after this list)

    References:

    1. (BMVC 2015) Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add https://arxiv.org/abs/1601.06815

    2. (ICLR 2015) Fast Convolutional Nets With fbfft: A GPU Performance Evaluation https://arxiv.org/abs/1412.7580

    3. Fast Training of Convolutional Networks through FFTs https://arxiv.org/abs/1312.5851

  • Winograd: minimal filtering algorithms that cut the number of multiplications for small kernels (see the F(2,3) example after this list)

    References:

    1. (CODES 2016) Zero and Data Reuse-aware Fast Convolution for Deep Neural Networks on GPU

    2. (CVPR 2016) Fast Algorithms for Convolutional Neural Networks https://arxiv.org/abs/1509.09308
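
A minimal NumPy sketch of im2col + GEMM for a single-channel input (sizes are illustrative; as in CNN "convolution", no kernel flip is performed):

    import numpy as np

    def im2col(x: np.ndarray, k: int) -> np.ndarray:
        """Unfold every k x k patch of a 2-D image into one column."""
        H, W = x.shape
        out_h, out_w = H - k + 1, W - k + 1
        cols = np.empty((k * k, out_h * out_w), dtype=x.dtype)
        for i in range(out_h):
            for j in range(out_w):
                cols[:, i * out_w + j] = x[i:i + k, j:j + k].ravel()
        return cols

    x = np.random.randn(8, 8).astype(np.float32)
    kern = np.random.randn(3, 3).astype(np.float32)
    # One row of a GEMM: the flattened kernel times the patch matrix.
    y = (kern.ravel() @ im2col(x, 3)).reshape(6, 6)

With multiple filters, the flattened kernels stack into a matrix and the whole convolution becomes a single GEMM handled by an optimized BLAS.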
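
A quick NumPy check of the FFT identity, shown for 1-D circular convolution (real CNN layers obtain linear convolution via zero-padding or overlap-and-add, as in the BMVC 2015 reference):

    import numpy as np

    n = 256
    a, b = np.random.randn(n), np.random.randn(n)
    # Direct circular convolution, O(n^2).
    direct = np.array([sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)])
    # Via the FFT, O(n log n): pointwise product in the frequency domain.
    via_fft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real
    print(np.allclose(direct, via_fft))  # True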
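
A NumPy sketch of the smallest Winograd case, F(2,3): two outputs of a 1-D 3-tap filter computed from a 4-sample tile with 4 multiplications instead of the naive 6 (transform matrices as given in the CVPR 2016 reference):

    import numpy as np

    BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], dtype=np.float32)
    G = np.array([[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]], dtype=np.float32)
    AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=np.float32)

    g = np.random.randn(3).astype(np.float32)    # filter
    d = np.random.randn(4).astype(np.float32)    # input tile
    y = AT @ ((G @ g) * (BT @ d))                # only 4 elementwise multiplications
    direct = np.array([d[0:3] @ g, d[1:4] @ g])  # naive sliding dot products
    print(np.allclose(y, direct, atol=1e-5))     # True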

Framework Speed Differences

  1. TensorFlow Mobile, TensorFlow Lite https://www.tensorflow.org/mobile/
  2. Caffe2 https://caffe2.ai/
  3. Tencent ncnn (no dependency on BLAS, NNPACK, or other compute frameworks; NEON-optimized; multi-core parallelism) https://github.com/Tencent/ncnn/
  4. Baidu MDL (no third-party dependencies; assembly-level and NEON optimizations) https://github.com/baidu/mobile-deep-learning

Hardware Acceleration

  1. Multi-GPU parallelism

  2. Multi-core parallelism

  3. Use the best instructions the hardware offers

    E.g. if the target supports arm64, compile a 64-bit library rather than a 32-bit one.

    E.g. use libraries that exploit the available SIMD (Single Instruction, Multiple Data) instruction sets.

References:

  1. (ISCA 2016) EIE: Efficient Inference Engine on Compressed Deep Neural Network https://arxiv.org/abs/1602.01528