1. 中文
    1. 简介
  2. English
    1. Introduction

Rethinking Atrous Convolution for Semantic Image Segmentation
Paper: https://arxiv.org/abs/1706.05587

在这篇论文,作者在 deeplab 的基础上重新研究了 atrous convolution,并提出了 cascaded module 和 parallel module 两种结构。

中文

简介

对于语义分割,考虑使用DCNN的两个挑战:

  1. 由于池化和卷积的降采样操作,导致细节空间信息大大减少。

    使用 atrous convolution 可以减缓这个问题。

  2. 物体有不同尺度。

    Figure 2

    Figure 2. Alternative architectures to capture multi-scale context.

    有四类方法来解决此问题:

    1. 图像金字塔

      使用图像金字塔对不同尺度输入提取特征,不同尺度的物体在不同的特征图上突出。

      RCNN [61], Contextual Deep CRFs [48], Deeplab [11], Attention to Scale [10]

    2. 编码-解码

      取编码器的多尺度特征,并从解码器恢复空间分辨率。

      Segnet [3], U-net[63], Refinenet [47], GCN [60]

    3. 上下文模块

      在原始网络的顶部级联额外的模块,用于捕捉远程信息。如DenseCRF。

      DeepLab [9], Cascade [85], piecewise [48], DPN [52], Holistic [81]

    4. 空间金字塔池化

      空间金字塔池化使用多个速率和多个有效的感受野进行池化。

      Attention to Scale [10], PSPNet [84]

English

Introduction

For the task of semantic segmentation, we consider two challenges in applying Deep Convolutional Neural Networks (DCNNs).

  1. this invariance to local image transformation may impede dense prediction tasks, where detailed spatial information is desired.

    With atrous convolution, also known as dilated convolution, is able to control the resolution at which feature responses are computed within DCNNs without requiring learning extra parameters.

  2. the existence of objects at multiple scales.

    Several methods have been proposed to handle the problem and we mainly consider four categories:

    1. Image Pyramid

      an image pyramid to extract features for each scale input where objects at different scales become prominent at different feature maps.

    2. Encoder-Decoder

      exploits multi-scale features from the encoder part and recovers the spatial resolution from the decoder part.

    3. Context module

      extra modules are cascaded on top of the original network for capturing long range information.

    4. Spatial Pyramid Pooling

      spatial pyramid pooling probes an incoming feature map with filters or pooling operations at multiple rates and multiple effective field-of-views.