Information Retrieval: 2010-2011

Lecturer

Brian Harrington

Degrees

Schedule C1 — Computer Science

Schedule C1 — Mathematics and Computer Science

Schedule C — MSc in Advanced Computer Science

Term

Hilary Term 2011 (20 lectures)

Overview

The dramatic increase in the amount of data available on the Web in recent years means that automatic methods of Information Retrieval (IR) have acquired greater significance. Furthermore, this data exists in multiple forms (text, image, video, etc.), and it is becoming increasingly important that IR techniques can perform search and retrieval across these distinct formats. For the purposes of this course, IR is the study of the indexing, processing, and querying of both textual and image data.

The aim of the course is to provide an introduction to the basic principles and techniques used in IR; to demonstrate how statistical models of language can be used to solve the document retrieval problem; to explore a range of image processing techniques used in IR; and to show how combined models for language and image processing can enhance document retrieval.

Learning outcomes

to gain an understanding of the basic concepts and techniques in Information Retrieval;
to understand how statistical models of text can be used to solve problems in IR, with a focus on how the vector-space model and the language model can be applied to the document retrieval problem;
to understand how statistical models of text can be used for other IR applications, for example clustering;
to appreciate the importance of data structures, such as an index, that allow efficient access to the information in large bodies of text;
to gain experience of building a document retrieval system through the practical sessions, including the implementation of a relevance feedback system;
to gain an understanding of the basic operations of image processing that support IR;
to understand how image processing techniques for object recognition and motion detection can be used in solving the IR problem for image data;
to appreciate how combined models of language and image processing can enhance document retrieval.

Prerequisites

Prior knowledge of elementary linear algebra would be helpful but is not required for this course.

The practical portion of this course has a relatively in-depth programming component. Students will build a vector-space-based information retrieval system from scratch using a programming language of their choice. Students should be familiar with object-oriented programming, simple data structures such as hash maps, and text processing.
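To give a flavour of the kind of system the practicals involve, here is a minimal sketch in Python (just one possible choice, since students may use any language). It assumes a simple design that is not prescribed by the course: an inverted index held in hash maps (Python dicts), raw term frequency weighted by idf = log(N/df), and cosine similarity for ranking. The names `tokenize`, `VectorSpaceIndex` and `search` are hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens (no stemming or stop-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class VectorSpaceIndex:
    """Hypothetical toy vector-space retrieval system built on hash maps.

    Assumed weighting scheme: raw tf multiplied by idf = log(N / df);
    documents are ranked by cosine similarity with the query.
    """

    def __init__(self, documents):
        # documents: mapping of doc_id -> raw text
        self.n_docs = len(documents)
        # Inverted index: term -> {doc_id: term frequency}
        self.postings = defaultdict(dict)
        for doc_id, text in documents.items():
            for term, tf in Counter(tokenize(text)).items():
                self.postings[term][doc_id] = tf
        # Inverse document frequency for every indexed term
        self.idf = {t: math.log(self.n_docs / len(p)) for t, p in self.postings.items()}
        # Precompute each document's tf-idf vector length for cosine normalisation
        squared = defaultdict(float)
        for term, plist in self.postings.items():
            for doc_id, tf in plist.items():
                squared[doc_id] += (tf * self.idf[term]) ** 2
        self.doc_norm = {d: math.sqrt(v) for d, v in squared.items()}

    def search(self, query, k=10):
        """Return up to k (doc_id, cosine score) pairs, best first."""
        # Query terms absent from the index cannot match any document, so drop them
        q_tf = Counter(t for t in tokenize(query) if t in self.postings)
        q_weights = {t: tf * self.idf[t] for t, tf in q_tf.items()}
        q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
        if q_norm == 0:
            return []
        # Term-at-a-time scoring: only documents sharing a query term are touched
        scores = defaultdict(float)
        for term, q_w in q_weights.items():
            for doc_id, tf in self.postings[term].items():
                scores[doc_id] += q_w * tf * self.idf[term]
        ranked = [
            (d, s / (q_norm * self.doc_norm[d])) for d, s in scores.items() if s > 0
        ]
        ranked.sort(key=lambda pair: pair[1], reverse=True)
        return ranked[:k]
```

For example, indexing `{"d1": "the cat sat on the mat", "d2": "the dog chased the cat", "d3": "dogs and cats are pets"}` and searching for `"cat mat"` ranks d1 above d2 and does not return d3, since without stemming "cats" does not match "cat". Walking only the postings lists of the query terms, rather than every document, is what makes the inverted index worthwhile on large collections.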

Synopsis

Information Retrieval (Text Processing)

Text representation and processing

Retrieval models (Boolean, vector space, language model)

Indexing

Evaluation

Relevance feedback - real feedback, pseudo-relevance feedback

Document and concept clustering - hierarchical clustering, k-means

Web retrieval - PageRank, difficulties of Web retrieval

Document clustering

Information Retrieval (Image Processing)

Operations on images

Motion detection

Object recognition

Automatic image annotation and retrieval

Combined models of language and image processing

Reading list

Course Textbooks

Introduction to Information Retrieval (2008), by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze. http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html

Computer Vision: A Modern Approach (2003), by D. Forsyth and J. Ponce, ISBN 0-13-085198-1

Additional reading

Modern Information Retrieval (1999), by Ricardo Baeza-Yates and Berthier Ribeiro-Neto

Readings in Information Retrieval (1997), edited by Karen Sparck Jones and Peter Willett

Managing Gigabytes: Compressing and Indexing Documents and Images (1999), by Ian H. Witten, Alistair Moffat, and Timothy C. Bell

Information Retrieval (1979), by C. J. van Rijsbergen (online at http://www.dcs.gla.ac.uk/Keith/Preface.html)

Machine Vision (1995), by R. Jain, R. Kasturi and B. Schunck, McGraw-Hill, ISBN 0-07-032018-7

Feature Extraction and Image Processing (2002), by M. Nixon and A. Aguado, ISBN 0-7506-5078-8

Taking our courses

This form is not to be used by students studying for a degree in the Department of Computer Science, or by Visiting Students who are registered for Computer Science courses.

Other matriculated University of Oxford students who are interested in taking this, or other, courses in the Department of Computer Science, must complete this online form by 17.00 on Friday of 0th week of term in which the course is taught. Late requests, and requests sent by email, will not be considered. All requests must be approved by the relevant Computer Science departmental committee and can only be submitted using this form.