1. Result
2. Dataset
    1. Large Scale Dataset
    2. Features and annotations from INRIA
3. Reference
    1. Large Scale Dataset
    2. Some reference links

Experimental results for automatic image annotation, as reported in the cited papers.

Result

Dataset: Corel-5K, ESP Game, IAPRTC-12

                      Corel-5K                  ESP Game                  IAPRTC-12
Method          Year  P      R      F1     N+   P      R      F1     N+   P      R      F1     N+
MBRM            2004  24     25     24.5   122  18     19     18.5   209  24     23     23.5   223
JEC             2008  27     32     29.3   139  22     25     23.4   224  28     29     28.5   250
Group Sparsity  2010  30     33     31.4   146  -      -      -      -    32     29     30.4   252
CNN-R           2015  32     41.3   36.1   166  44.5   28.5   34.7   248  49     31     38.0   272
FastTag         2013  32     43     36.7   166  46     22     29.8   247  47     26     33.5   280
TagProp(σML)    2009  33     42     37.0   160  39     27     31.9   239  46     35     39.8   266
2PKNN           2012  39     40     39.5   177  51     23     31.7   245  49     32     38.7   274
GLKNN           2015  36     47     40.8   184  41     36     38.3   282  34     31     32.4   255
SVM-DMBRM       2014  36     48     41.1   197  55     25     34.4   259  56     29     38.2   283
SKL-CRM         2014  39     46     42.2   184  41     26     31.8   248  47     32     38.1   274
KCCA-2PKNN      2014  42     46     43.9   179  -      -      -      -    59     30     39.8   259
KCCA            2015  39     53     44.9   184  30     36     32.7   252  38     39     38.5   273
2PKNN+ML        2012  44     46     45.0   191  53     27     35.8   252  54     37     43.9   278
NMF-KNN         2014  38     56     45.3   150  33     26     29.1   238  -      -      -      -
CCA-KNN         2015  42     52     46.5   201  46     36     40.4   260  45     38     41.2   278
context-RM-B    2015  -      -      -      -    61     24     34.4   242  61     20     30.1   234
SLED            2015  35     51     41.5   -    -      -      -      -    49.82  47.36  48.6   -
NSIDML          2016  44.12  51.76  47.76  194  49.8   29.5   37.05  253  56.9   36.5   45.21  282
AWD-IKNN        2016  42     55     47.7   198  48     34     40.2   257  50     40     44.5   282
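
In this literature the four columns are usually defined as follows: P and R are precision and recall computed per label and averaged over all labels (in %), F1 is the harmonic mean of P and R, and N+ is the number of labels with non-zero recall. The sketch below shows one way to compute them from binary ground-truth and prediction matrices. It assumes NumPy; the function name `annotation_metrics` and the toy matrices are illustrative and are not taken from any of the cited papers, whose exact evaluation code may differ.

```python
import numpy as np

def annotation_metrics(y_true, y_pred):
    """Return (P, R, F1, N+) for binary arrays of shape (n_images, n_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=0).astype(float)  # correct predictions per label
    n_pred = y_pred.sum(axis=0)                       # how often each label was predicted
    n_true = y_true.sum(axis=0)                       # how often each label is in the ground truth
    # Labels that are never predicted (or never present) contribute 0 instead of NaN.
    precision = np.divide(tp, n_pred, out=np.zeros_like(tp), where=n_pred > 0)
    recall = np.divide(tp, n_true, out=np.zeros_like(tp), where=n_true > 0)
    p = 100 * precision.mean()
    r = 100 * recall.mean()
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0  # harmonic mean of the averaged P and R
    n_plus = int((recall > 0).sum())                  # labels recalled at least once
    return p, r, f1, n_plus

# Toy example (made-up data): 3 images, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0]]
p, r, f1, n_plus = annotation_metrics(y_true, y_pred)
print(f"P={p:.1f} R={r:.1f} F1={f1:.1f} N+={n_plus}")
```

Computing F1 from the averaged P and R, rather than averaging per-label F1 scores, matches the table: for MBRM on Corel-5K, 2 * 24 * 25 / (24 + 25) is about 24.5.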

Dataset

  • Corel 5K
    Paper: P. Duygulu, K. Barnard, J. F. de Freitas, and D. A. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In ECCV, 2002.

  • ESP Game
    Paper: L. von Ahn and L. Dabbish. Labeling images with a computer game. In SIGCHI Conference on Human Factors in Computing Systems, 2004.
    http://hunch.net/~learning/ESP-ImageSet.tar.gz

  • IAPRTC-12
    http://www.imageclef.org/photodata

Large Scale Dataset

  • NUS-WIDE
    Paper: T.-S. Chua, J. Tang, R. Hong, et al. NUS-WIDE: A real-world web image database from National University of Singapore. In ACM International Conference on Image and Video Retrieval, 2009.
    http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm

Dataset                     Corel 5K          ESP Game          IAPR TC-12        NUS-WIDE
No. of images               5000              20770             19627             269648 (209347 annotated)
No. of labels               260               268               291               81
Train images                4500              18689             17665             110K (not fixed)
Test images                 500               2081              1962              4K (not fixed)
Labels per image            3.4, 4, 5         4.7, 5, 15        5.7, 5, 23        2.4, 2
Images per label            58.6, 22, 1004    326.7, 172, 4553  347.7, 153, 4999  5701.3, 1682
No. of labels < mean-freq   195 (75.0%)       201 (75.0%)       217 (74.6%)       -

(Entry format for "Labels per image" and "Images per label": mean, median, maximum.)
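
The mean, median and maximum entries, and the count of labels below the mean frequency, can all be derived from a binary image-by-label annotation matrix. The following is a minimal sketch assuming NumPy. The name `label_statistics` is hypothetical, the toy matrix is made up, and "labels < mean-freq" is interpreted here as labels that occur in fewer images than the average label does.

```python
import numpy as np

def label_statistics(annot):
    """Summarise a binary annotation matrix of shape (n_images, n_labels)."""
    annot = np.asarray(annot, dtype=bool)
    labels_per_image = annot.sum(axis=1)  # row sums: labels attached to each image
    images_per_label = annot.sum(axis=0)  # column sums: images carrying each label

    def summary(counts):
        # (mean, median, maximum), the entry format used in the table
        return float(counts.mean()), float(np.median(counts)), int(counts.max())

    below_mean = int((images_per_label < images_per_label.mean()).sum())
    return {
        "labels_per_image": summary(labels_per_image),
        "images_per_label": summary(images_per_label),
        "labels_below_mean_freq": below_mean,
    }

# Toy example (made-up data): 4 images, 3 labels.
stats = label_statistics([[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 0]])
print(stats)
```

Applied to the real training and test matrices combined, a function like this should give figures comparable to the table, provided the same label vocabulary is used.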

Features and annotations from INRIA

http://lear.inrialpes.fr/people/guillaumin/data.php

  • gen_annotation.m
    input: the files provided on the website above
    output: train_annot.txt and test_annot.txt in each dataset folder

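    % Convert the INRIA .hvecs annotation files to tab-separated text files.
    % vec_read is the .hvecs reader; it is assumed to be on the MATLAB path.
    datasets = { 'corel5k', 'iaprtc12', 'espgame' };
    sets = { 'test', 'train' };
    for db = 1:length(datasets)
        ds = datasets{db};
        for s = 1:length(sets)
            str = sets{s};
            % Image list and binary image-by-label annotation matrix
            list = textread([ds '/' ds '_' str '_list.txt'], '%s');
            annot = logical(vec_read([ds '/' ds '_' str '_annot.hvecs']));
            % Write to <dataset>/train_annot.txt or <dataset>/test_annot.txt
            fid = fopen([ds '/' str '_annot.txt'], 'w');
            for i = 1:length(list)
                annotation = annot(i,:);   % label row of image i
                fprintf(fid, '%d', annotation(1));
                for j = 2:length(annotation)
                    fprintf(fid, '\t%d', annotation(j));
                end
                fprintf(fid, '\n');
            end
            fclose(fid);
        end
    end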
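
The stated output files, train_annot.txt and test_annot.txt, hold one image per line with one tab-separated 0/1 entry per label. The sketch below shows one way to load such a file for further processing. It is written in Python with NumPy rather than MATLAB so that it runs without the INRIA helpers. The name `load_annot` and the example path are hypothetical.

```python
import numpy as np

def load_annot(path):
    """Read a tab-separated 0/1 annotation file into a boolean matrix.

    Rows are images and columns are labels, in the order the file was written.
    """
    # ndmin=2 keeps a 2-D shape even if the file has a single line.
    return np.loadtxt(path, delimiter="\t", dtype=int, ndmin=2).astype(bool)

# Hypothetical usage, assuming the MATLAB script has already been run:
# train = load_annot("corel5k/train_annot.txt")
# print(train.shape)
```

For Corel 5K, the dataset table suggests the training matrix would have 4500 rows and 260 columns, though this depends on the files actually downloaded.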

Reference

Method          Year  Venue    Paper
MBRM            2004  CVPR     Multiple Bernoulli relevance models for image and video annotation
JEC             2008  ECCV     A new baseline for image annotation
TagProp(σML)    2009  ICCV     TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation
Group Sparsity  2010  CVPR     Automatic image annotation using group sparsity
2PKNN           2012  ECCV     Image annotation using metric learning in semantic neighbourhoods
2PKNN+ML        2012  ECCV     Image annotation using metric learning in semantic neighbourhoods
FastTag         2013  ICML     Fast image tagging
NMF-KNN         2014  CVPR     NMF-KNN: image annotation using weighted multi-view non-negative matrix factorization
SVM-DMBRM       2014  ICMR     A Hybrid Model for Automatic Image Annotation
KCCA-2PKNN      2014  ICMR     A cross-media model for automatic image annotation
SKL-CRM         2014  MIR      A sparse kernel relevance model for automatic image annotation
context-RM-B    2015  CVPR     Feature-Independent Context Estimation for Automatic Image Annotation
CNN-R           2015  ICMR     Automatic Image Annotation using Deep Learning Representations
KCCA            2015  ICMR     Automatic Image Annotation using Deep Learning Representations
CCA-KNN         2015  ICMR     Automatic Image Annotation using Deep Learning Representations
GLKNN           2015  ICMR     Graph Learning on K Nearest Neighbours for Automatic Image Annotation
SLED            2015  J. TIP   SLED: Semantic Label Embedding Dictionary Representation for Multilabel Image Annotation
NSIDML          2016  J. VCIR  Image distance metric learning based on neighborhood sets for automatic image annotation
AWD-IKNN        2016  PCM      Automatic Image Annotation using Adaptive Weighted Distance in Improved K Nearest Neighbors Framework

Large Scale Dataset

Year  Venue  Paper
2014  ICLR   Deep Convolutional Ranking for Multilabel Image Annotation
2015  ICCV   Love Thy Neighbors: Image Annotation by Exploiting Image Metadata
2015  ICMR   Large Scale Image Annotation via Deep Representation Learning and Tag Embedding Learning

Some reference links

Method          Link
TagProp(σML)    http://lear.inrialpes.fr/people/guillaumin/code.php#tagprop
Group Sparsity  http://ranger.uta.edu/~huang/codes/annotation_corel.zip
2PKNN(+ML)      http://cvit.iiit.ac.in/projects/imageAnnotation/
FastTag         http://www.cse.wustl.edu/~mchen/
NMF-KNN         http://crcv.ucf.edu/people/phd_students/mahdi/
SKL-CRM         https://github.com/sjmoran/sklcrm