Scene Parsing

Problem

Segment and parse an image into regions, each associated with a semantic category.

Evaluation metrics

Mean pixel-wise accuracy

The proportion of pixels that are correctly predicted.

Class-wise IoU

The intersection over union (IoU) of predicted and ground-truth pixels, averaged over all semantic categories.
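The results tables on this page report pixel accuracy, class accuracy, mean IU, and frequency-weighted (f.w.) IU. The following is a minimal sketch of how these four scores are commonly computed from per-pixel labels. The function name `segmentation_metrics` and the toy labels are hypothetical, and the sketch assumes that scores are averaged only over classes present in the ground truth and that every pixel is labeled; individual benchmarks may handle unlabeled pixels or absent classes differently.

```python
from collections import Counter


def segmentation_metrics(pred, truth):
    """Compute four common scene-parsing scores from flat label sequences.

    pred and truth hold one class id per pixel, in the same order.
    Assumption: scores are averaged over the classes that occur in `truth`.
    """
    if len(pred) != len(truth) or not truth:
        raise ValueError("pred and truth must be non-empty and equally long")

    pairs = Counter(zip(truth, pred))  # (true class, predicted class) -> pixel count
    truth_sizes = Counter(truth)       # ground-truth pixels per class
    pred_sizes = Counter(pred)         # predicted pixels per class
    total = len(truth)

    correct = 0
    class_accs, class_ious = [], []
    fw_iou = 0.0
    for c, t_c in truth_sizes.items():
        hit = pairs[(c, c)]                # pixels of class c predicted as c
        union = t_c + pred_sizes[c] - hit  # size of truth-or-prediction region for c
        iou = hit / union
        correct += hit
        class_accs.append(hit / t_c)
        class_ious.append(iou)
        fw_iou += (t_c / total) * iou      # weight each IoU by class frequency

    return {
        "pixel_acc": correct / total,
        "class_acc": sum(class_accs) / len(class_accs),
        "mean_iou": sum(class_ious) / len(class_ious),
        "fw_iou": fw_iou,
    }


# Toy example: six pixels, three classes, one class-0 pixel mislabeled as class 1.
truth = [0, 0, 0, 1, 1, 2]
pred = [0, 0, 1, 1, 1, 2]
scores = segmentation_metrics(pred, truth)
print(scores)
```

In the toy example, five of six pixels are correct, so pixel accuracy is 5/6, while the single error lowers the IoU of both class 0 and class 1 to 2/3. This illustrates why pixel accuracy and class-wise scores can rank methods differently when class sizes are unbalanced.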

Datasets

Stanford Background

S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. In IEEE 12th International Conference on Computer Vision, pages 1–8, 2009.

SIFT Flow

C. Liu, J. Yuen, and A. Torralba. Nonparametric scene parsing via label transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2368–2382, 2011.

PASCAL-Context

R. Mottaghi et al. The role of context for object detection and semantic segmentation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.

ADE20K

B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba. Semantic understanding of scenes through ADE20K dataset. arXiv:1608.05442.

Dataset	Stanford Background	SIFT Flow	ADE20K
Total images	715	2688	25562
Training images	572	2488	20210
Validation images	0	0	2000
Test images	143	200	3352
Classes	8	33	150

Results on Stanford Background

Method	Pixel Acc. (%)	Class Acc. (%)	Avg. time per image (s)
Single-scale ConvNet	66	56.5	0.35 (GPU)
Augmented CNNs	71.97	66.16	-
Superparsing	77.5	-	10 to 300
Deep 2D LSTM (window 5x5)	77.73	68.26	1.3 (CPU)
Deep 2D LSTM (window 3x3)	78.56	68.79	3.7 (CPU)
Multi-scale ConvNet	78.8	72.4	0.6 (CPU)
RCNN2 (3 instances)	80.2	69.9	10.7 (GPU)
N-ReNet	80.4	71.8	0.07 (GPU)
Multi-CNN + rCPN Fast	80.9	78.8	0.37 (GPU)
multiscale net + CRF on gPb	81.4	76.0	60.5 (CPU)
Zoom-out	82.1	77.3	-
HGDN	82.41	72.98	0.02 (GPU)
RCNN_NIPS2015	83.1	74.8	0.03 (GPU)

Results on SIFT Flow

Method	Pixel Acc. (%)	Class Acc. (%)	Mean IU (%)	F.w. IU (%)	Avg. time per image (s)
Augmented CNNs	49.39	44.54	-	-	-
Deep 2D LSTM (window 5x5)	68.74	22.59	-	-	1.2 (CPU)
Deep 2D LSTM (window 3x3)	70.11	20.90	-	-	3.1 (CPU)
RCNN2 (3 instances)	77.7	29.8	-	-	-
multiscale net + cover1	72.3	50.8	-	-	-
multiscale net + cover2	78.5	29.6	-	-	-
RCNN (balanced)	79.3	57.1	-	-	0.03 (GPU)
HGDN	79.68	51.26	-	-	0.03 (GPU)
RCNN-large	84.3	41.0	-	-	0.04 (GPU)
FCN-16s	85.2	51.7	-	-	0.175 (GPU)
VGG-conv5-DAG-RNN(8)	85.3	55.7	-	-	-
FCN-8s	85.9	53.9	41.2	77.2	-
patch CRF+CNN	88.1	53.4	-	-	-

Results on PASCAL-Context

Method	Pixel Acc. (%)	Class Acc. (%)	Mean IU (%)	F.w. IU (%)
O2P	-	-	18.1	-
CFM	-	-	34.4	-
FCN-32s	65.5	49.1	36.7	50.9
FCN-16s	66.9	51.3	38.4	52.3
FCN-8s	67.5	52.3	39.1	53.0
patch CRF+CNN	71.5	53.9	-	-

Method	Year	Venue	Reference paper
Superparsing	2010	ECCV	Superparsing: Scalable nonparametric image parsing with superpixels
Single-scale ConvNet	2013	PAMI	Learning hierarchical features for scene labeling
multiscale net	2013	PAMI	Learning hierarchical features for scene labeling
Augmented CNNs	2014	BMVC	Contextually constrained deep networks for scene labeling
RCNN2 (3 instances)	2014	ICML	Recurrent convolutional neural networks for scene labeling
Multi-CNN + rCPN Fast	2014	NIPS	Recursive context propagation network for semantic scene labeling
RCNN (balanced)	2015	NIPS	Convolutional Neural Networks with Intra-layer
RCNN-large	2015	NIPS	Convolutional Neural Networks with Intra-layer
Deep 2D LSTM	2015	CVPR	Scene Labeling with LSTM Recurrent Neural Networks
Zoom-out	2015	CVPR	Feedforward semantic segmentation with zoom-out features
FCN-16s	2015	CVPR	Fully convolutional networks for semantic segmentation
N-ReNet	2016		Combining the Best of Convolutional Layers and Recurrent Layers: A Hybrid Network for Semantic Segmentation
HGDN	2016	CVPR	Hierarchically Gated Deep Networks for Semantic Segmentation
VGG-conv5-DAG-RNN(8)	2016	CVPR	DAG-Recurrent Neural Networks For Scene Labeling
patch CRF+CNN	2016	CVPR	Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation