ROSeAnn: taming online semantic annotators
Luying Chen, Stefano Ortona and Giorgio Orsi
Abstract
Named entity extractors are a popular means of enriching documents with semantic annotations. Both the overlap and the increasing diversity in the capabilities and vocabularies of these annotators motivate the need to manage and integrate semantic annotations in a coherent and uniform fashion. ROSeAnn is a framework for the management and reconciliation of semantic annotations. It provides end-users and programmers with a unified view over the results of multiple online and standalone annotators, links them to an integrated ontology of their vocabularies, and supports a variety of document formats, such as plain text, live Web pages, and PDF documents. ROSeAnn provides two pre-defined algorithms for conflict resolution: a supervised one, appropriate when representative training data is available, and an unsupervised one. It also allows application developers to define their own integration techniques and to extend the pool of annotators as new ones become available.
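To make the idea of reconciliation concrete, the following is a minimal sketch of what an unsupervised conflict-resolution step could look like. It is not one of ROSeAnn's pre-defined algorithms: it assumes a simple majority vote, and the annotator names, the labels, and the `VOCAB_TO_ONTOLOGY` mapping are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from each annotator's own vocabulary to a concept
# of a shared, integrated ontology (names are illustrative assumptions).
VOCAB_TO_ONTOLOGY = {
    ("annotator_a", "City"): "Location",
    ("annotator_a", "Person"): "Person",
    ("annotator_b", "GPE"): "Location",
    ("annotator_b", "PER"): "Person",
    ("annotator_c", "Place"): "Location",
    ("annotator_c", "Organisation"): "Organization",
}


def reconcile(annotations):
    """Reconcile annotations by majority vote per text span.

    `annotations` is a list of (annotator, start, end, label) tuples.
    Returns a dict mapping (start, end) to the winning ontology concept.
    Spans whose top concepts are tied are left out as unresolved.
    """
    votes = defaultdict(Counter)
    for annotator, start, end, label in annotations:
        concept = VOCAB_TO_ONTOLOGY.get((annotator, label))
        if concept is not None:  # ignore labels with no known mapping
            votes[(start, end)][concept] += 1

    result = {}
    for span, counter in votes.items():
        (top, top_count), *rest = counter.most_common(2)
        # Accept the top concept only if it strictly beats the runner-up.
        if not rest or rest[0][1] < top_count:
            result[span] = top
    return result


annotations = [
    ("annotator_a", 0, 6, "City"),
    ("annotator_b", 0, 6, "GPE"),
    ("annotator_c", 0, 6, "Organisation"),
    ("annotator_a", 10, 15, "Person"),
    ("annotator_c", 10, 15, "Place"),
]
reconciled = reconcile(annotations)
print(reconciled)
```

In this toy input, two of three annotators agree on the first span once their labels are mapped to the shared ontology, so it is resolved; the second span receives one vote for each of two concepts and is left unresolved. A supervised approach would instead learn from training data how much to trust each annotator for each concept.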