Information Retrieval: 2008-2009
Degrees: Schedule C1 (Computer Science), Part C (Mathematics and Computer Science)
Term: Michaelmas Term 2008 (20 lectures)
Overview
Information Retrieval (IR), for the purpose of this course, is the study of the indexing, processing, and querying of textual data. The growing importance of the Web means that IR has acquired added significance in recent years. The course will also look at how models of language similar to those used in IR can be applied to the problem of Machine Translation (MT), which is becoming increasingly important as more and more non-English text appears on the Web.
The aims of the course are to provide an introduction to the basic principles and techniques used in IR; to demonstrate how statistical models of language can be used to solve the document retrieval problem; to consider specific IR applications such as cross-language retrieval; and to show how statistical models of language can be used to develop Machine Translation systems.
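To make the idea of a statistical language model for retrieval concrete, here is a minimal Python sketch of query-likelihood ranking. Each document is treated as a unigram model, and documents are ranked by the log-probability they assign to the query. The whitespace tokenizer, the function names, and the choice of Jelinek-Mercer smoothing with a mixing weight of 0.5 are assumptions for illustration, not necessarily the formulation used in the lectures.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def query_likelihood_rank(query, docs, lam=0.5):
    """Rank document indices by log P(query | document language model), best first.

    Each document model is a unigram distribution interpolated with the
    collection model (Jelinek-Mercer smoothing, 0 < lam < 1):
    P(t|d) = lam * P_doc(t) + (1 - lam) * P_coll(t).
    """
    doc_counts = [Counter(tokenize(d)) for d in docs]
    coll_counts = Counter()
    for counts in doc_counts:
        coll_counts.update(counts)
    coll_len = sum(coll_counts.values())

    scores = []
    for counts in doc_counts:
        length = sum(counts.values())
        score = 0.0
        for term in tokenize(query):
            p_doc = counts[term] / length if length else 0.0
            p_coll = coll_counts[term] / coll_len if coll_len else 0.0
            p = lam * p_doc + (1 - lam) * p_coll
            if p > 0:
                # Terms unseen anywhere in the collection are skipped to avoid log(0).
                score += math.log(p)
        scores.append(score)
    # sorted() is stable even with reverse=True, so ties keep their original order.
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = ["the cat sat on the mat", "dogs chase cats", "the stock market fell"]
print(query_likelihood_rank("cat mat", docs))  # prints [0, 1, 2]
```

Smoothing with the collection model matters because an unsmoothed document model assigns zero probability to any query term the document lacks, which would rule that document out entirely.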
Learning outcomes
- to gain an understanding of the basic concepts and techniques in Information Retrieval;
- to understand how statistical models of text can be used to solve problems in IR, with a focus on how the vector-space model and the language model can be applied to the document retrieval problem;
- to understand how the user can be involved in the document retrieval process, through the use of relevance feedback;
- to understand how statistical models of text can be used for other IR applications, for example clustering;
- to appreciate the difficulties in carrying out document retrieval on the Web, and how the hyperlink structure can facilitate accurate retrieval;
- to appreciate the importance of data structures such as an index in allowing efficient access to the information in large bodies of text;
- to gain experience of building a document retrieval system through the practical sessions, including the implementation of a relevance feedback system;
- to understand how statistical models of language can be applied to the Machine Translation problem.
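Several of these outcomes (the vector-space model, indexing, and relevance feedback) fit together in one small example. The Python sketch below builds an in-memory inverted index, ranks documents by the cosine similarity of tf-idf vectors, and applies Rocchio-style relevance feedback to reweight a query. The class and function names, the idf formula log(N/df), and the Rocchio weights alpha = 1.0, beta = 0.75, gamma = 0.15 are illustrative assumptions; a practical system would keep compressed postings on disk rather than in Python dictionaries.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class VectorSpaceIndex:
    """Toy in-memory inverted index with tf-idf weighting and cosine ranking."""

    def __init__(self, docs):
        self.n_docs = len(docs)
        # Inverted index: term -> {doc_id: raw term frequency}.
        self.postings = defaultdict(dict)
        for doc_id, text in enumerate(docs):
            for term, tf in Counter(tokenize(text)).items():
                self.postings[term][doc_id] = tf
        # Precompute each document's tf-idf vector length for cosine normalisation.
        sq = [0.0] * self.n_docs
        for term, plist in self.postings.items():
            idf = self.idf(term)
            for doc_id, tf in plist.items():
                sq[doc_id] += (tf * idf) ** 2
        self.doc_norms = [math.sqrt(x) for x in sq]

    def idf(self, term):
        df = len(self.postings.get(term, {}))
        return math.log(self.n_docs / df) if df else 0.0

    def query_vector(self, text):
        return {t: tf * self.idf(t) for t, tf in Counter(tokenize(text)).items()}

    def doc_vector(self, doc_id):
        return {t: p[doc_id] * self.idf(t)
                for t, p in self.postings.items() if doc_id in p}

    def search(self, qvec):
        """Return (doc_id, cosine score) pairs, best first."""
        scores = defaultdict(float)
        # Term-at-a-time scoring: only documents in a query term's postings are touched.
        for term, qw in qvec.items():
            idf = self.idf(term)
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += qw * tf * idf
        qnorm = math.sqrt(sum(w * w for w in qvec.values()))
        results = [(d, s / (qnorm * self.doc_norms[d]))
                   for d, s in scores.items() if qnorm * self.doc_norms[d] > 0]
        return sorted(results, key=lambda r: r[1], reverse=True)


def rocchio(index, qvec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query towards relevant and away from non-relevant documents."""
    new_q = defaultdict(float)
    for t, w in qvec.items():
        new_q[t] += alpha * w
    for ids, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for d in ids:
            # Adding coef * vector / len(ids) per document gives coef times the centroid.
            for t, w in index.doc_vector(d).items():
                new_q[t] += coef * w / len(ids)
    # Negative weights are clipped to zero, as is common practice.
    return {t: w for t, w in new_q.items() if w > 0}


docs = [
    "information retrieval with inverted index",
    "machine translation from parallel text",
    "web retrieval and page rank",
    "statistical machine translation decoding",
]
index = VectorSpaceIndex(docs)
q = index.query_vector("retrieval")
print(index.search(q))       # documents 0 and 2 match
q2 = rocchio(index, q, relevant=[2], nonrelevant=[])
print(index.search(q2)[0])   # document 2 now ranks first
```

Marking document 2 as relevant adds its other terms to the query, which is why it moves ahead of document 0 on the second search.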
Synopsis
Basics of information retrieval
- Text representation and processing
- Retrieval models (Boolean, vector space, language model)
- Indexing
- Evaluation
- Relevance feedback - real feedback, pseudo-relevance feedback
- Document and concept clustering - hierarchical clustering, k-means
- Web retrieval - PageRank, difficulties of Web retrieval
- Cross-language retrieval - queries in one language, documents in another
- Distributional and semantic similarity - automatic thesaurus construction
- Language models for MT
- Estimation from parallel texts
- Decoding (finding the most probable translation)
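For the Web retrieval topic, PageRank can be computed by power iteration over the link graph. The Python sketch below assumes a damping factor of 0.85 (a commonly quoted value), spreads the rank of pages with no out-links uniformly over all pages (one common convention), and stops when the L1 change between iterations falls below a tolerance; the function name and the dictionary representation of the graph are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to; pages that only
    appear as link targets are treated as having no out-links.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Teleportation: every page receives an equal share of (1 - damping).
        new = {p: (1.0 - damping) / n for p in pages}
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new[p] += damping * dangling / n
        # Each page splits its damped rank equally among its out-links.
        for p, targets in links.items():
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

Each iteration preserves a total rank of 1, so the scores can be read as the stationary probabilities of a random surfer who follows a link with probability 0.85 and otherwise jumps to a random page.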
Reading list
Additional reading
- Course textbook: Introduction to Information Retrieval (2008), by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze. http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html
- Modern Information Retrieval (1999), by Ricardo Baeza-Yates and Berthier Ribeiro-Neto
- Readings in Information Retrieval (1997), edited by Karen Sparck Jones and Peter Willett
- Managing Gigabytes: Compressing and Indexing Documents and Images (1999), by Ian H. Witten, Alistair Moffat, and Timothy C. Bell
- Information Retrieval (1979), by C. J. van Rijsbergen (online at http://www.dcs.gla.ac.uk/Keith/Preface.html)
Taking our courses
The online course request form is not to be used by students studying for a degree in the Department of Computer Science, or by Visiting Students who are registered for Computer Science courses.
Other matriculated University of Oxford students who are interested in taking this, or other, courses in the Department of Computer Science must complete the online request form by 17.00 on Friday of 0th week of the term in which the course is taught. Late requests, and requests sent by email, will not be considered. All requests must be approved by the relevant Computer Science departmental committee and can only be submitted using this form.