Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

de Freitas, Nando

doi:10.1007/3-540-47979-1_7

P. Duygulu, K. Barnard, J.F.G. de Freitas and D.A. Forsyth

Abstract

We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. First, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and the keywords supplied with the images is then learned, using a method based on EM. This process is analogous to learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken from a large vocabulary. On a large test set, the method can predict numerous words with high accuracy. Simple methods identify words that cannot be predicted well. We show how to cluster words that individually are difficult to predict into clusters that can be predicted well; for example, we cannot predict the distinction between train and locomotive using the current set of features, but we can predict the underlying concept. The method is trained on a substantial collection of images. Extensive experimental results illustrate the strengths and weaknesses of the approach.
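
The lexicon-learning step in the abstract can be illustrated with a minimal sketch. It assumes each image has already been reduced to a list of region-type tokens ("blobs") paired with its keywords, and it uses an IBM Model 1 style EM with a uniform alignment prior. This is a simplification chosen for illustration, not necessarily the exact model in the paper. The function name `learn_lexicon`, the toy data, and the blob labels are all hypothetical.

```python
from collections import defaultdict


def learn_lexicon(pairs, iterations=20):
    """Estimate t[blob][word] ~ p(word | blob) with EM.

    pairs: list of (blobs, words) tuples, one per image. `blobs` are
    region-type tokens and `words` are the keywords supplied with the
    image. Assumption: uniform alignment prior (IBM Model 1 style).
    """
    blob_vocab = {b for blobs, _ in pairs for b in blobs}
    word_vocab = {w for _, words in pairs for w in words}

    # Start from a uniform translation table.
    t = {b: {w: 1.0 / len(word_vocab) for w in word_vocab} for b in blob_vocab}

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)

        # E-step: for each word, spread one unit of responsibility over
        # the blobs of the same image, in proportion to t[blob][word].
        for blobs, words in pairs:
            for w in words:
                z = sum(t[b][w] for b in blobs)
                if z == 0.0:
                    continue
                for b in blobs:
                    r = t[b][w] / z
                    count[b][w] += r
                    total[b] += r

        # M-step: renormalise the expected counts per blob.
        for b in blob_vocab:
            if total[b] > 0.0:
                for w in word_vocab:
                    t[b][w] = count[b][w] / total[b]
    return t


# Toy "aligned bitext": blob tokens on one side, keywords on the other.
toy_pairs = [
    (["blob_sky", "blob_grass"], ["sky", "grass"]),
    (["blob_sky", "blob_water"], ["sky", "water"]),
    (["blob_grass", "blob_tiger"], ["grass", "tiger"]),
]
lexicon = learn_lexicon(toy_pairs)
for blob in sorted(lexicon):
    best = max(lexicon[blob], key=lexicon[blob].get)
    print(blob, "->", best)
```

On this toy data, co-occurrence alone is ambiguous (every blob appears with two words per image), and the EM iterations resolve the ambiguity because each word's responsibility is shared among the competing blobs in an image.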

Book Title

European Conference on Computer Vision (ECCV)

Editor

Heyden, Anders and Sparr, Gunnar and Nielsen, Mads and Johansen, Peter

ISBN

978-3-540-43748-2

Note

Best Paper prize in Cognitive Vision

Pages

97–112

Publisher

Springer Berlin Heidelberg

Series

Lecture Notes in Computer Science

Volume

2353

Year

2002
