University of Oxford, Department of Computer Science

Web Form Probing

Supervisor

Suitable for

Abstract

Background: The work will be done in the context of the large ERC project DIADEM (Domain-centric Intelligent Automated Data Extraction Methodology), whose goal is to automate web data extraction in specific application domains such as real estate and restaurants.

Principal goal of the MSc or Honour School project:

An important task in Deep Web data extraction systems is so-called form probing. Roughly, given a web form in which each field is annotated with some concept, probing the form involves tasks such as:
(i) validating its annotations by submitting the form and checking the response page, (ii) forming meaningful queries to submit, given a database of values, and (iii) devising smart approaches to reduce the number of possible queries (e.g. for fields that depend on each other, such as Min and Max price).
This proposal aims at designing probing heuristics and implementing them in an extensible (plug-in) framework.
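
To make tasks (ii) and (iii) concrete, the following is a minimal sketch in Java, not part of DIADEM itself. All names (FormProbingSketch, ProbingHeuristic, candidateQueries, prune, minNotAboveMax, and the field names minPrice and maxPrice) are hypothetical and chosen only for illustration. It assumes each annotated field comes with a small list of candidate values, builds every combination as a candidate query, and then lets pluggable heuristics discard queries that are not worth submitting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class FormProbingSketch {

    /** Hypothetical plug-in point: returns true if a candidate query should be kept. */
    interface ProbingHeuristic extends Predicate<Map<String, String>> {}

    /** Task (ii): build every combination of candidate values, one value per field. */
    static List<Map<String, String>> candidateQueries(Map<String, List<String>> valuesByField) {
        List<Map<String, String>> queries = new ArrayList<>();
        queries.add(new LinkedHashMap<>()); // start from the empty query
        for (Map.Entry<String, List<String>> field : valuesByField.entrySet()) {
            List<Map<String, String>> extended = new ArrayList<>();
            for (Map<String, String> partial : queries) {
                for (String value : field.getValue()) {
                    Map<String, String> query = new LinkedHashMap<>(partial);
                    query.put(field.getKey(), value);
                    extended.add(query);
                }
            }
            queries = extended;
        }
        return queries;
    }

    /** Task (iii): keep only the queries accepted by every registered heuristic. */
    static List<Map<String, String>> prune(List<Map<String, String>> queries,
                                           List<ProbingHeuristic> heuristics) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> query : queries) {
            boolean accepted = true;
            for (ProbingHeuristic heuristic : heuristics) {
                if (!heuristic.test(query)) {
                    accepted = false;
                    break;
                }
            }
            if (accepted) {
                kept.add(query);
            }
        }
        return kept;
    }

    /** Example heuristic for dependent fields: drop queries whose minimum exceeds the maximum. */
    static ProbingHeuristic minNotAboveMax(String minField, String maxField) {
        return query -> Integer.parseInt(query.get(minField))
                <= Integer.parseInt(query.get(maxField));
    }

    public static void main(String[] args) {
        // Assumed sample values; a real system would draw them from a domain database.
        Map<String, List<String>> values = new LinkedHashMap<>();
        values.put("minPrice", List.of("100000", "200000", "300000"));
        values.put("maxPrice", List.of("100000", "200000", "300000"));

        List<Map<String, String>> all = candidateQueries(values);
        List<Map<String, String>> useful =
                prune(all, List.of(minNotAboveMax("minPrice", "maxPrice")));

        System.out.println(all.size() + " candidate queries, " + useful.size() + " after pruning");
    }
}
```

With three values for each of the two price fields, the sketch builds 9 candidate queries and keeps 6 of them after pruning. New heuristics can be added by supplying further ProbingHeuristic instances, which is one simple way to read the "plug-in" requirement; validating annotations against response pages (task (i)) is deliberately left out because it depends on how the form is submitted.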

Skills Needed: This project requires good theoretical, analytical and software engineering skills. Knowledge of Java and web technologies is also required.

Supervision: This project will be co-supervised by Dr. Tim Furche.