Real understanding of real estate forms

Tim Furche, Georg Gottlob, Giovanni Grasso, Xiaonan Guo, Giorgio Orsi and Christian Schallhart

Abstract

Finding an apartment is a lengthy and tedious process. Even after deciding, one can never be sure of not having missed an even better offer that was just one click away. Form understanding is key to automatically accessing and processing all the relevant data, which is nowadays readily available. We introduce OPAL (Ontology-based web Pattern Analysis with Logic), a novel, purely logical approach to web form understanding: OPAL labels, structures, and groups form fields according to a domain-specific ontology linked through phenomenological rules to a logical representation of a DOM. The phenomenological rules describe how ontological concepts appear on the web; the ontology formalizes and structures common patterns of web pages observed in a domain. A unique feature of OPAL is that all domain-independent assumptions about web forms are represented in rules, whereas domain-specific assumptions are represented in the ontology. This yields a coherent logical framework that is robust in the face of changing web trends. We apply OPAL to a significant, randomly selected sample of UK real estate sites, showing that straightforward rules suffice to achieve high-precision form understanding.

Book Title
Proc. of 1st Intl Conf. on Web Intelligence, Mining and Semantics (WIMS)
Pages
13:1–13:12
Year
2011