University of Oxford, Department of Computer Science

Wrapping Wrapper Induction

Cheng Wang

Info

Date

7th February 2012 (week , Hilary Term 2012)

Time

11:30

Place

147

Abstract

Example quality makes or breaks wrapper induction, but manual
example creation severely limits the scalability of wrapper induction.
Thus, to enable web-scale wrapper induction, automatic example
generation is necessary, at an accuracy and robustness well
beyond the reach of current approaches.
To bridge this gap, we present AMBER, a system for fully automated
generation of human-quality examples from any result page
of a given domain. AMBER annotates attributes fundamental to the
considered domain and identifies records by leveraging repeated attribute
patterns. This contrasts with previous approaches, which analyze
only repeated structures in the HTML code or its rendering because
they have no domain knowledge available. Our multi-domain evaluation,
covering hundreds of sites, demonstrates that AMBER achieves an
accuracy (>98%) comparable to skilled human annotators.
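
The idea of finding records through repeated attribute patterns, rather than through repeated HTML structure, can be illustrated with a toy sketch. This is not AMBER's algorithm: the property-listing domain, the regular expressions, the flat list of text nodes, and the names `ANNOTATORS`, `annotate`, and `segment_records` are all assumptions made for illustration only.

```python
import re

# Hypothetical attribute annotators for an illustrative property-listing domain.
ANNOTATORS = {
    "price": re.compile(r"[£$]\s?\d[\d,]*"),
    "bedrooms": re.compile(r"\b\d+\s+bed(room)?s?\b", re.IGNORECASE),
}


def annotate(text):
    """Return the set of attribute types whose pattern matches the text node."""
    return {name for name, pattern in ANNOTATORS.items() if pattern.search(text)}


def segment_records(nodes, anchor="price"):
    """Start a new record at every node annotated with the anchor attribute."""
    records = []
    for text in nodes:
        labels = annotate(text)
        if anchor in labels:
            records.append([])
        if records:  # nodes before the first anchor (e.g. a page header) are skipped
            records[-1].append((text, sorted(labels)))
    return records


# A result page flattened to its text nodes (made-up data).
nodes = [
    "Search results",
    "£250,000", "3 bedroom house", "Oxford",
    "£180,000", "2 bed flat", "Reading",
]
records = segment_records(nodes)
```

Here each repeated occurrence of the assumed anchor attribute (the price) opens a new record, so the page splits into two records of three nodes each without inspecting any markup. A real system would work on the DOM and tolerate missing or noisy annotations, which this sketch does not attempt.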
