  1. Problem
  2. Evaluation
  3. Dataset
  4. Samples of ADE20K
  5. Result
    1. Stanford Background
    2. SIFT Flow
    3. PASCAL-Context
  6. Reference

Scene Parsing

Problem

Segment and parse an image into regions, each associated with a semantic category.

Evaluation

  • mean pixel-wise accuracy

    the ratio of pixels that are correctly predicted.

  • class-wise IoU

    the intersection over union (IoU) of predicted and ground-truth pixels, averaged over all semantic categories.
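The two metrics above can be sketched from a confusion matrix. This is a minimal illustration, not the official evaluation code of any benchmark: the function names, the toy label maps, and the choice to skip classes that appear in neither map are all assumptions made for the example.

```python
def confusion_matrix(truth, pred, num_classes):
    """cm[t][p] = number of pixels with true label t that were predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Ratio of correctly predicted pixels to all pixels."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Per-class intersection over union, averaged over the classes that occur."""
    n = len(cm)
    ious = []
    for i in range(n):
        intersection = cm[i][i]
        truth_i = sum(cm[i])                      # pixels whose true label is i
        pred_i = sum(cm[j][i] for j in range(n))  # pixels predicted as i
        union = truth_i + pred_i - intersection
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(intersection / union)
    return sum(ious) / len(ious)


# Toy example: flattened label maps of six pixels with three classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, pred, num_classes=3)
print("pixel accuracy:", pixel_accuracy(cm))
print("mean IoU:", mean_iou(cm))
```

In practice the confusion matrix is accumulated over every image in the test set before the ratios are taken, so that rare classes are not dominated by per-image noise.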

Dataset

  • Stanford Background

    S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In IEEE 12th International Conference on Computer Vision, pages 1–8, Sept 2009.

  • SIFT Flow

    C. Liu, J. Yuen, and A. Torralba. Nonparametric scene parsing via label transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2368–2382, Dec 2011.

  • PASCAL-Context

    R. Mottaghi et al. The role of context for object detection and semantic segmentation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.

  • ADE20K

    B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba. Semantic understanding of scenes through ADE20K dataset. arXiv:1608.05442.

Dataset                   Stanford Background  SIFT Flow  ADE20K
No. of images             715                  2688       25562
No. of training images    572                  2488       20210
No. of validation images  0                    0          2000
No. of test images        143                  200        3352
No. of classes            8                    33         150

Samples of ADE20K

http://sceneparsing.csail.mit.edu/browse.php/?dirname=training/

ADE_train_00019523.jpg

ADE_train_00000278.jpg

Result

Stanford Background

Method                        Pixel Acc. (%)  Class Acc. (%)  Avg. time per image (s)
Single-scale ConvNet          66              56.5            0.35 (GPU)
Augmented CNNs                71.97           66.16           -
Superparsing                  77.5            -               10 to 300
Deep 2D LSTM (window 5x5)     77.73           68.26           1.3 (CPU)
Deep 2D LSTM (window 3x3)     78.56           68.79           3.7 (CPU)
Multi-scale ConvNet           78.8            72.4            0.6 (CPU)
RCNN2 (3 instances)           80.2            69.9            10.7 (GPU)
N-ReNet                       80.4            71.8            0.07 (GPU)
Multi-CNN + rCPN Fast         80.9            78.8            0.37 (GPU)
multiscale net + CRF on gPb   81.4            76.0            60.5 (CPU)
Zoom-out                      82.1            77.3            -
HGDN                          82.41           72.98           0.02 (GPU)
RCNN_NIPS2015                 83.1            74.8            0.03 (GPU)

SIFT Flow

Method                        Pixel Acc. (%)  Class Acc. (%)  mean IU (%)  f.w. IU (%)  Avg. time per image (s)
Augmented CNNs                49.39           44.54           -            -            -
Deep 2D LSTM (window 5x5)     68.74           22.59           -            -            1.2 (CPU)
Deep 2D LSTM (window 3x3)     70.11           20.90           -            -            3.1 (CPU)
RCNN2 (3 instances)           77.7            29.8            -            -            -
multiscale net + cover1       72.3            50.8            -            -            -
multiscale net + cover2       78.5            29.6            -            -            -
RCNN (balanced)               79.3            57.1            -            -            0.03 (GPU)
HGDN                          79.68           51.26           -            -            0.03 (GPU)
RCNN-large                    84.3            41.0            -            -            0.04 (GPU)
FCN-16s                       85.2            51.7            -            -            0.175 (GPUs)
VGG-conv5-DAG-RNN(8)          85.3            55.7            -            -            -
FCN-8s                        85.9            53.9            41.2         77.2         -
patch CRF+CNN                 88.1            53.4            -            -            -

PASCAL-Context

Method                        Pixel Acc. (%)  Class Acc. (%)  mean IU (%)  f.w. IU (%)
O2P                           -               -               18.1         -
CFM                           -               -               34.4         -
FCN-32s                       65.5            49.1            36.7         50.9
FCN-16s                       66.9            51.3            38.4         52.3
FCN-8s                        67.5            52.3            39.1         53.0
patch CRF+CNN                 71.5            53.9            -            -
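
The result tables also report "Class Acc." and "f.w. IU", which the Evaluation section does not define. Assuming they follow the definitions used in the FCN paper listed under Reference (mean per-class accuracy and frequency-weighted IU), they could be computed as sketched below. The function names and the toy confusion matrix are hypothetical and chosen only to illustrate the formulas.

```python
def class_accuracy(cm):
    """Mean over classes of (correct pixels of class i) / (true pixels of class i)."""
    accs = [cm[i][i] / sum(cm[i]) for i in range(len(cm)) if sum(cm[i]) > 0]
    return sum(accs) / len(accs)


def frequency_weighted_iou(cm):
    """Per-class IoU weighted by how often each class occurs in the ground truth."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    score = 0.0
    for i in range(n):
        truth_i = sum(cm[i])                      # pixels whose true label is i
        pred_i = sum(cm[j][i] for j in range(n))  # pixels predicted as i
        union = truth_i + pred_i - cm[i][i]
        if union > 0:
            score += truth_i * cm[i][i] / union
    return score / total


# Toy confusion matrix (rows: true class, columns: predicted class)
# with one frequent class and one rare class.
cm = [[90, 10],
      [5, 5]]
print("class accuracy:", class_accuracy(cm))
print("f.w. IU:", frequency_weighted_iou(cm))
```

With an imbalanced matrix like this one, the rare class pulls class accuracy down much more than it affects the frequency-weighted score, which is one reason a method can rank high on pixel accuracy and low on class accuracy in the tables above.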

Reference

Method                   Year  Conference  Reference Paper
Superparsing             2010  ECCV        Superparsing: Scalable nonparametric image parsing with superpixels
Single-scale ConvNet     2013  PAMI        Learning hierarchical features for scene labeling
multiscale net           2013  PAMI        Learning hierarchical features for scene labeling
Augmented CNNs           2014  BMVC        Contextually constrained deep networks for scene labeling
RCNN2 (3 instances)      2014  ICML        Recurrent convolutional neural networks for scene labeling
Multi-CNN + rCPN Fast    2014  NIPS        Recursive context propagation network for semantic scene labeling
RCNN (balanced)          2015  NIPS        Convolutional Neural Networks with Intra-layer
RCNN-large               2015  NIPS        Convolutional Neural Networks with Intra-layer
Deep 2D LSTM             2015  CVPR        Scene Labeling with LSTM Recurrent Neural Networks
Zoom-out                 2015  CVPR        Feedforward semantic segmentation with zoom-out features
FCN-16s                  2015  CVPR        Fully convolutional networks for semantic segmentation
N-ReNet                  2016  -           Combining the Best of Convolutional Layers and Recurrent Layers: A Hybrid Network for Semantic Segmentation
HGDN                     2016  CVPR        Hierarchically Gated Deep Networks for Semantic Segmentation
VGG-conv5-DAG-RNN(8)     2016  CVPR        DAG-Recurrent Neural Networks For Scene Labeling
patch CRF+CNN            2016  CVPR        Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation