Ontology Alignment Evaluation Initiative - OAEI-2011.5 Campaign

Large BioMed Track

OAEI 2011.5::Large Biomedical Ontologies Track

General description

This track consists of finding alignments between the Foundational Model of Anatomy (FMA), SNOMED CT, and the National Cancer Institute Thesaurus (NCI). These ontologies are semantically rich and contain tens of thousands of classes. Note that for OAEI 2011.5 only the FMA-NCI case will be evaluated.

The UMLS Metathesaurus has been selected as the basis for the track reference alignments. UMLS is currently the most comprehensive effort for integrating independently developed medical thesauri and ontologies, including FMA, SNOMED CT, and NCI. The integration of new UMLS sources combines automatic techniques, expert assessment, and auditing protocols.

Reference alignments

Although the standard UMLS distribution does not directly provide sets of "mappings" (in the OAEI sense) between the integrated ontologies, it is relatively straightforward to extract mapping sets from the information provided in the distribution files (see UMLS-based reference alignments for details).
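As a rough illustration of why such an extraction is straightforward: in the UMLS Metathesaurus, synonymous terms from different sources share a concept unique identifier (CUI), so entities from two sources can be paired whenever they share a CUI. The sketch below assumes the distribution files have already been reduced to (CUI, source, code) triples; the function name, the toy identifiers, and the pairing rule are illustrative assumptions, and the procedure actually used for the track is the one described in the UMLS-based reference alignments page.

```python
from collections import defaultdict
from itertools import product


def extract_mappings(triples, source_a, source_b):
    """Pair codes from two sources that share the same concept identifier.

    `triples` is an iterable of (cui, source, code) tuples. This layout is
    an assumption made for the sketch, not the raw UMLS file format.
    """
    # For each CUI, collect the codes seen in source_a and in source_b.
    by_cui = defaultdict(lambda: (set(), set()))
    for cui, source, code in triples:
        if source == source_a:
            by_cui[cui][0].add(code)
        elif source == source_b:
            by_cui[cui][1].add(code)

    # Every (a, b) combination under the same CUI becomes a candidate mapping.
    mappings = set()
    for a_codes, b_codes in by_cui.values():
        mappings.update(product(a_codes, b_codes))
    return mappings


# Toy, made-up identifiers (not real UMLS data).
TOY_TRIPLES = [
    ("C_TOY_1", "FMA", "fma_heart"),
    ("C_TOY_1", "NCI", "nci_heart"),
    ("C_TOY_1", "NCI", "nci_cardiac"),
    ("C_TOY_2", "FMA", "fma_lung"),   # no NCI counterpart
    ("C_TOY_3", "NCI", "nci_liver"),  # no FMA counterpart
]

toy_mappings = extract_mappings(TOY_TRIPLES, "FMA", "NCI")
```

Concepts that appear in only one of the two sources contribute no mappings, while a concept with several codes in one source yields one mapping per combination.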

It has been noticed, however, that although these mappings have been manually curated by domain experts, they lead to a significant number of logical inconsistencies when integrated with the corresponding source ontologies. In this track we will compare the results of the matching tools against both the original UMLS mapping set and a refined/repaired set that does not lead to such inconsistencies (see UMLS-based reference alignments for details).

The reference alignments between FMA and NCI can be downloaded from the following SEALS repository links:

Data sets

Complete datasets can be downloaded as a zip file

FMA-NCI matching problem

We have split the FMA-NCI matching problem into three subtasks: (1) considering the small (overlapping) modules of FMA and NCI, (2) considering the extended (overlapping) modules, and (3) considering the whole ontologies as input. The reference alignments will be the same for the three cases. The complexity, however, will differ in terms of both performance and scalability, since larger ontologies also involve more candidate mappings.

Note that the ontologies have been normalised for the OAEI. As a result, the synonyms of concept names are provided as "rdfs:label" annotations.
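To illustrate what this normalisation means for a matcher, the following sketch collects all "rdfs:label" values per class using only the Python standard library. It assumes an RDF/XML serialisation in which classes appear as owl:Class elements with an rdf:about attribute; the embedded toy document and its IRIs are invented for the example, and the layout of the actual track files may differ. For the larger inputs, a streaming parser or a dedicated OWL library is likely to be more appropriate.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Toy document (made up); real track ontologies are far larger.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/toy#Heart">
    <rdfs:label>Heart</rdfs:label>
    <rdfs:label>Cardiac organ</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/toy#Lung">
    <rdfs:label>Lung</rdfs:label>
  </owl:Class>
</rdf:RDF>
"""


def labels_by_class(rdf_xml):
    """Return {class IRI: [label, ...]} for every named owl:Class element."""
    root = ET.fromstring(rdf_xml)
    labels = defaultdict(list)
    for cls in root.iter(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")
        if iri is None:
            continue  # skip anonymous class expressions
        for label in cls.findall(f"{{{RDFS}}}label"):
            if label.text:
                labels[iri].append(label.text.strip())
    return dict(labels)


toy_labels = labels_by_class(SAMPLE)
```

A lexical matcher would typically index every label of a class, since after normalisation the preferred name and its synonyms are not distinguished by the annotation property.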

FMA-NCI small overlapping

This dataset consists of two fragments/modules of FMA and NCI, which represent their respective overlapping parts. The FMA module contains 3,696 concepts (5% of FMA), while the NCI module contains 6,488 concepts (10% of NCI). The small modules can be downloaded from the following SEALS repository links:

FMA-NCI extended overlapping

This dataset consists of two fragments/modules of FMA and NCI, which represent their respective extended overlapping parts. The FMA (extended) module contains 28,861 concepts (37% of FMA), while the NCI (extended) module contains 25,591 concepts (38% of NCI). The extended modules can be downloaded from the following SEALS repository links:

FMA-NCI whole ontologies

This dataset consists of the whole FMA and NCI ontologies, which contain 78,989 and 66,724 concepts, respectively. Note that, if you are using the OWL API, the parameter "-DentityExpansionLimit=100000000" should be passed to the JVM. The ontologies can be downloaded from the following SEALS repository links:

Modalities

This track has two main objectives. On the one hand, it intends to evaluate the performance of matching tools when matching real, large-scale ontologies. On the other hand, it aims at creating an error-free "silver standard" reference alignment by "harmonising" the output of different matching and/or debugging tools, together with the current UMLS mapping sets.

Regarding the use of background knowledge, the OAEI rules state that a resource (i.e. a third biomedical ontology) especially designed for the test is not allowed. In particular, matching tools using UMLS as background knowledge will have an advantage, since the reference alignment is also based on UMLS. Nevertheless, it will be interesting to evaluate the performance of a tool with and without specialised background knowledge. Moreover, matching tools using UMLS may be especially helpful in the creation of the proposed "silver standard" reference alignment.

Task 1: standard matching

For this task the generated alignment should be an optimal solution to the matching problem with respect to both recall and precision. In the evaluation we will focus on the f-value (F-measure), the harmonic mean of precision and recall. Furthermore, we also encourage the creation of an error-free output, that is, the extracted mappings together with the ontologies should not lead to (many) unsatisfiabilities.
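The measures mentioned above can be sketched as follows. This is a minimal illustration, assuming that an alignment is represented as a set of (source entity, target entity) pairs and that only exact matches against the reference count as correct; the function name and the toy entity names are made up, and the official evaluation may treat mappings differently.

```python
def evaluate(system, reference):
    """Return (precision, recall, f_measure) of `system` against `reference`.

    Both arguments are sets of (source_entity, target_entity) pairs.
    """
    correct = len(system & reference)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy alignments (made-up entity names).
reference = {
    ("fma:Heart", "nci:Heart"),
    ("fma:Lung", "nci:Lung"),
    ("fma:Liver", "nci:Liver"),
    ("fma:Kidney", "nci:Kidney"),
}
system = {
    ("fma:Heart", "nci:Heart"),
    ("fma:Lung", "nci:Lung"),
    ("fma:Liver", "nci:Hepatic_Tissue"),  # wrong target
}

precision, recall, f_measure = evaluate(system, reference)
```

In this toy case two of the three proposed mappings are correct and two of the four reference mappings are found, so precision is 2/3, recall is 1/2, and the F-measure is 4/7.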

Task 2: mapping debugging (optional)

Mapping debugging tools are also welcome to provide a revised version of the original UMLS mappings, similar to the currently provided refinement.

We aim at harmonising different revised subsets of the UMLS mappings together with the outputs of the participants from Task 1 in order to create an error-free "silver standard" reference alignment. Participant outputs will also be compared against the silver standard in order to analyse how they differ from the outputs of the other tools.
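The track description does not specify how the harmonisation is carried out. Purely as an illustration, one simple possibility is a voting scheme that keeps a mapping when at least a given number of mapping sets contain it. The function name, the threshold parameter, and the toy data below are assumptions made for this sketch, not the organisers' procedure.

```python
from collections import Counter


def harmonise(alignments, min_votes):
    """Keep the mappings that occur in at least `min_votes` of the alignments.

    `alignments` is a list of collections of (source, target) pairs, e.g. tool
    outputs and revised UMLS subsets. Voting is a hypothetical strategy.
    """
    votes = Counter()
    for alignment in alignments:
        # set() ensures each alignment votes at most once per mapping.
        votes.update(set(alignment))
    return {mapping for mapping, count in votes.items() if count >= min_votes}


# Toy mapping sets (made-up entity names).
tool_a = {("fma:Heart", "nci:Heart"), ("fma:Lung", "nci:Lung")}
tool_b = {("fma:Heart", "nci:Heart"), ("fma:Liver", "nci:Liver")}
revised_umls = {("fma:Heart", "nci:Heart"), ("fma:Lung", "nci:Lung")}

silver = harmonise([tool_a, tool_b, revised_umls], min_votes=2)
```

Voting on its own does not guarantee an error-free result; the harmonised set would still need to be checked together with the ontologies, for instance by a mapping debugging tool.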

Support by SEALS

The evaluation of Task 1 will be run with the support of SEALS. This requires that you wrap your matching tool in a way that allows us to execute it on the SEALS platform.

Task 2 will be optional and will be run in an 'off-line' way.

Schedule

The schedule given at http://oaei.ontologymatching.org/2011.5/index.html#schedule is also binding for this track.

Acknowledgements

We would like to thank the participants and organisers of the OAEI campaigns, especially Christian Meilicke for his help in setting up this track.

Contact

This track is organised by Ernesto Jimenez Ruiz, Bernardo Cuenca Grau and Ian Horrocks, and supported by the SEALS and LogMap projects. If you have any problems working with the ontologies or any suggestions related to this track, feel free to write an email to ernesto [at] cs [.] ox [.] ac [.] uk