
Processing Geographic Language

Inderjeet Mani (Brandeis University)
Humans are able to communicate geographic information in a highly concise but vague manner, which poses interesting challenges for natural language understanding. In recent years, information extraction systems have been developed to ground geographical references in text in terms of geo-coordinates, and the tags these systems produce are used by geographical search engines and mapping tools. However, without a standard for how different types of geographical entities, such as towns, roads, and directions, should be tagged, such systems are impossible to evaluate reliably. In this talk, I will describe an annotation scheme called SpatialML that has been used to accurately mark up places, their geo-coordinates, and the spatial relationships among them in a variety of text corpora. SpatialML represents spatial relationships among geographical regions in terms of the Region Connection Calculus (RCC), and it has also been mapped to the Generalized Upper Model (GUM) ontology from the University of Bremen. SpatialML is also being used in the Cross-Language Evaluation Forum (CLEF) to assess tools that analyze geographical queries posed to search engines, and it is currently being integrated with a time markup standard (TimeML). Despite these positive trends, I will argue that a far more concerted research effort is required to address the fundamental challenges of geographic language.
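To make the RCC idea more concrete, here is a minimal illustrative sketch. It is not part of SpatialML and is not the talk's method. It assumes each place is approximated by an axis-aligned bounding box, which is a strong simplification because real geographic regions have arbitrary shapes. It then classifies a pair of boxes into one of the eight RCC-8 base relations: DC (disconnected), EC (externally connected), PO (partial overlap), EQ (equal), TPP/NTPP (tangential/non-tangential proper part), and their inverses TPPi/NTPPi. The names `Box` and `rcc8` are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for a region: a closed axis-aligned rectangle
    (e.g. a lon/lat bounding box). Assumes xmin < xmax and ymin < ymax."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def _contains(outer: Box, inner: Box) -> bool:
    # True if `inner` lies entirely within `outer` (boundaries may coincide).
    return (outer.xmin <= inner.xmin and inner.xmax <= outer.xmax
            and outer.ymin <= inner.ymin and inner.ymax <= outer.ymax)


def _touches_boundary(outer: Box, inner: Box) -> bool:
    # For a contained box, it touches the outer boundary iff some edge coincides.
    return (inner.xmin == outer.xmin or inner.xmax == outer.xmax
            or inner.ymin == outer.ymin or inner.ymax == outer.ymax)


def rcc8(a: Box, b: Box) -> str:
    """Return the RCC-8 relation holding from box `a` to box `b`."""
    if a == b:
        return "EQ"
    # Closed boxes share at least one point (boundary contact counts).
    closures_meet = (a.xmin <= b.xmax and b.xmin <= a.xmax
                     and a.ymin <= b.ymax and b.ymin <= a.ymax)
    if not closures_meet:
        return "DC"
    # Open interiors overlap (strict inequalities exclude mere edge contact).
    interiors_meet = (a.xmin < b.xmax and b.xmin < a.xmax
                      and a.ymin < b.ymax and b.ymin < a.ymax)
    if not interiors_meet:
        return "EC"
    if _contains(b, a):
        return "TPP" if _touches_boundary(b, a) else "NTPP"
    if _contains(a, b):
        return "TPPi" if _touches_boundary(a, b) else "NTPPi"
    return "PO"


if __name__ == "__main__":
    region = Box(0, 0, 10, 10)
    print(rcc8(Box(2, 2, 4, 4), region))      # NTPP: strictly inside
    print(rcc8(Box(0, 2, 4, 4), region))      # TPP: inside, shares an edge
    print(rcc8(Box(10, 0, 12, 5), region))    # EC: touches only at the border
    print(rcc8(Box(5, 5, 15, 15), region))    # PO: partial overlap
    print(rcc8(Box(20, 20, 30, 30), region))  # DC: no contact
```

Bounding boxes keep the geometry trivial, so every RCC-8 test reduces to interval comparisons. A real system would need polygon geometry, and it would also need to cope with the vagueness of the text itself, which is exactly the kind of challenge the talk highlights.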
