Wrapping Wrapper Induction
Cheng Wang
Info
Date | 7th February 2012 (week , Hilary Term 2012)
Time | 11:30
Place | 147
Abstract
Example quality makes or breaks wrapper induction, but creating
examples manually severely limits its scalability. Thus, to enable
web-scale wrapper induction, examples must be generated automatically,
with an accuracy and robustness well beyond the reach of current
approaches.
To bridge this gap, we present AMBER, a system for the fully automated
generation of human-quality examples from any result page of a given
domain. AMBER annotates attributes fundamental to the domain under
consideration and identifies records by leveraging repeated attribute
patterns. This contrasts with previous approaches, which, lacking
domain knowledge, analyze only repeated structures in the HTML code or
its rendering. Our multi-domain evaluation, covering hundreds of sites,
demonstrates that AMBER achieves an accuracy (>98%) comparable to that
of skilled human annotators.
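To make the idea of "identifying records by leveraging repeated attribute patterns" concrete, here is a toy sketch. It is not AMBER's algorithm, which the abstract does not describe. It assumes hypothetical inputs: the leaves of a result page in document order, each already annotated with a domain attribute label (or None if unannotated), and that every record carries the same attributes in the same order. The sketch looks for the smallest period in the sequence of attribute labels and cuts the annotated leaves into records of that length.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: (text of a page leaf, attribute label or None).
Annotated = Tuple[str, Optional[str]]


def find_period(labels: List[str]) -> Optional[int]:
    """Return the smallest p such that the label sequence consists of at
    least two exact repetitions of its first p labels, or None."""
    n = len(labels)
    for p in range(1, n // 2 + 1):
        # Require whole repetitions and an exact match at every offset.
        if n % p == 0 and all(labels[i] == labels[i + p] for i in range(n - p)):
            return p
    return None


def segment_records(leaves: List[Annotated]) -> List[List[Annotated]]:
    """Split annotated leaves into records using the repeated label pattern.
    Unannotated leaves (label None) are ignored as noise."""
    annotated = [leaf for leaf in leaves if leaf[1] is not None]
    labels = [label for _, label in annotated]
    p = find_period(labels)
    if p is None:
        return []  # no repeated attribute pattern found
    return [annotated[i:i + p] for i in range(0, len(annotated), p)]
```

For example, a page with leaves ("Flat A", "title"), ("250,000", "price"), ("Sponsored", None), ("Flat B", "title"), ("310,000", "price") yields two records, one per flat, with the unannotated "Sponsored" leaf dropped. Real result pages contain optional and multi-valued attributes, which this strict periodicity check does not handle; a practical system needs a more tolerant notion of "repeated pattern".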