PDF Analysis and Annotation
Supervisor:
Suitable for: Honour School of Computer Science, Part C
Abstract
Background: The work will be carried out within the large ERC project DIADEM (Domain-centric Intelligent Automated Data Extraction Methodology), whose goal is to automate web data extraction in specific application domains such as real estate and restaurants.
Principal goal of the MSc or Honour School project: Automated analysis of PDF layouts is central to web data extraction and search. This project aims to integrate state-of-the-art technologies for PDF analysis and semantic annotation of Web documents.
Skills Needed: This project requires good theoretical, analytical and software engineering skills. A good knowledge of Java is essential.
Supervision: This project will be co-supervised by Dr. Giorgio Orsi.
