Ontology Alignment Evaluation Initiative - OAEI-2012 Campaign

Results OAEI 2012::Large BioMed Track

Results OAEI 2012 FMA-SNOMED matching problem

As the following tables show, the FMA-SNOMED matching problem was harder than the FMA-NCI problem, both in size and in complexity. Matching systems therefore required more time to complete the task and, in general, produced worse results in terms of F-measure. Furthermore, MaasMatch, Wmatch and AUTOMSv2, which were able to complete the small FMA-NCI task, failed to complete the small FMA-SNOMED task in less than 24 hours.

FMA-SNOMED small fragments

Six systems obtained an average F-measure greater than 0.75; the other six systems that completed the task (including our baseline) failed to reach a recall higher than 0.4. GOMMA-bk provided the best results in terms of both recall and F-measure, while the baseline LogMapLt provided the best precision, closely followed by ServOMapL. GOMMA-bk is clearly ahead of the other systems because it managed to produce a mapping set with very high recall; its use of background knowledge was key in this matching task.

As in the FMA-NCI matching problem, precision tends to increase when evaluating against the original UMLS mapping set, while recall decreases.
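As a sanity check, the F-measure columns of the tables are simply the harmonic mean of the corresponding precision and recall columns. A minimal Python sketch, verifying two rows (GOMMA_Bk and ServOMapL, Original UMLS columns) of the small-fragments table below:

```python
# F-measure is the harmonic mean of precision and recall.
def f_measure(precision, recall):
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Original-UMLS columns of the small-fragments table:
# GOMMA_Bk:  P=0.958, R=0.914 -> F-measure 0.935
# ServOMapL: P=0.985, R=0.694 -> F-measure 0.814
print(round(f_measure(0.958, 0.914), 3))  # 0.935
print(round(f_measure(0.985, 0.694), 3))  # 0.814
```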

Runtimes were also very positive in general: eight systems completed the task in less than 6 minutes. MapSSS required almost 1 hour, while Hertuda, HotMatch and AROMA needed 5, 9 and 14 hours to complete the task, respectively.

LogMap, unlike LogMap-noe, failed to detect and repair two unsatisfiable classes because they fell outside the computed (overlapping) ontology fragments. The remaining systems, even those producing highly precise mappings such as ServOMapL, generated mapping sets with a high incoherence degree.
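As far as the numbers allow us to check, the "Degree" column of the incoherence analysis is the ratio of unsatisfiable classes ("All Unsat.") to the total number of classes in the merged input, a total the tables do not state. A minimal Python sketch (the helper name and this interpretation are ours) recovering that implied total from two rows of the small-fragments table:

```python
# Assumption (ours): Degree = All Unsat. / (total classes in the merged
# input), so the unstated total can be recovered as All Unsat. / Degree.
def implied_total_classes(all_unsat, degree_percent):
    return all_unsat / (degree_percent / 100.0)

# Two rows of the small-fragments table should imply the same total:
t1 = implied_total_classes(13685, 58.06)  # GOMMA_Bk row  -> ~23,570
t2 = implied_total_classes(10584, 44.91)  # ServOMapL row -> ~23,567
assert abs(t1 - t2) < 50  # they agree up to rounding of the Degree column
```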


(P = Precision, R = Recall, F = F-measure)
System  Time (s)  # Mappings | Original UMLS: P  R  F | Refined UMLS (LogMap): P  R  F | Refined UMLS (Alcomo): P  R  F | Average: P  R  F | Incoherence Analysis: All Unsat.  Degree  Root Unsat.
GOMMA_Bk 148 8,598 0.958 0.914 0.935 0.860 0.912 0.885 0.862 0.912 0.886 0.893 0.913 0.903 13,685 58.06% 4,674
ServOMapL 39 6,346 0.985 0.694 0.814 0.884 0.691 0.776 0.892 0.696 0.782 0.920 0.694 0.791 10,584 44.91% 3,056
YAM++ 326 6,421 0.972 0.693 0.809 0.870 0.688 0.769 0.879 0.694 0.776 0.907 0.692 0.785 14,534 61.67% 3,150
LogMap-noe 63 6,363 0.964 0.681 0.799 0.877 0.688 0.771 0.889 0.696 0.781 0.910 0.688 0.784 0 0% 0
LogMap 65 6,164 0.965 0.660 0.784 0.876 0.666 0.756 0.889 0.674 0.767 0.910 0.667 0.769 2 0.01% 2
ServOMap 46 6,008 0.985 0.657 0.788 0.880 0.652 0.749 0.888 0.656 0.755 0.918 0.655 0.764 8,165 34.64% 2,721
GOMMA 54 3,667 0.926 0.377 0.536 0.834 0.377 0.520 0.865 0.390 0.538 0.875 0.381 0.531 2,058 8.73% 206
MapSSS 3,129 3,458 0.798 0.306 0.442 0.719 0.307 0.430 0.737 0.313 0.440 0.751 0.309 0.438 9,084 38.54% 389
AROMA 51,191 5,227 0.555 0.322 0.407 0.507 0.327 0.397 0.519 0.333 0.406 0.527 0.327 0.404 21,083 89.45% 2,296
HotMatch 31,718 2,139 0.875 0.208 0.336 0.812 0.214 0.339 0.842 0.222 0.351 0.843 0.214 0.342 907 3.85% 104
LogMapLt 14 1,645 0.975 0.178 0.301 0.902 0.183 0.304 0.936 0.189 0.315 0.938 0.183 0.307 773 3.28% 21
Hertuda 17,625 3,051 0.578 0.196 0.292 0.533 0.201 0.292 0.555 0.208 0.303 0.555 0.201 0.296 1,020 4.33% 47


FMA-SNOMED big fragments

MapSSS, HotMatch and Hertuda failed to complete the task involving the big fragments of FMA and SNOMED within 24 hours of execution.

ServOMapL provided the best results in terms of both F-measure and precision, whereas GOMMA-bk obtained the best recall. As in the FMA-NCI matching task involving big fragments, F-measures generally decreased with respect to the small matching task. The largest variations affected GOMMA-bk and GOMMA, whose average precision dropped from 0.893 and 0.875 to 0.571 and 0.389, respectively. This is an interesting fact: the background knowledge used by GOMMA-bk kept recall high but could not prevent the drop in precision. Furthermore, runtimes were 4 to 10 times higher for all systems, with the exception of AROMA, whose runtime increased only from 14 to 17 hours.

Both variants of LogMap generated a clean output: the mappings, together with the input ontologies, did not lead to any unsatisfiable class.
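The unsatisfiable classes counted in these tables arise when the mappings, combined with the axioms of the input ontologies, force some class to be empty. A toy Python sketch of the simplest such pattern (the class names are invented, and real systems use a DL reasoner rather than this shortcut):

```python
# Toy pattern: a class mapped (as equivalent) to two classes that are
# declared disjoint becomes unsatisfiable in the merged ontology.
disjoint = {("Artery", "Vein")}                      # invented axiom in ontology 2
mappings = {("fma:X", "Artery"), ("fma:X", "Vein")}  # equivalence mappings

def unsat_classes(mappings, disjoint):
    targets = {}
    for src, tgt in mappings:
        targets.setdefault(src, set()).add(tgt)
    # a source class equivalent to two disjoint targets can have no instances
    return {src for src, ts in targets.items()
            for a, b in disjoint if a in ts and b in ts}

print(unsat_classes(mappings, disjoint))  # {'fma:X'}
```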


(P = Precision, R = Recall, F = F-measure)
System  Time (s)  # Mappings | Original UMLS: P  R  F | Refined UMLS (LogMap): P  R  F | Refined UMLS (Alcomo): P  R  F | Average: P  R  F | Incoherence Analysis: All Unsat.  Degree  Root Unsat.
ServOMapL 234 6,563 0.945 0.689 0.797 0.847 0.686 0.758 0.857 0.692 0.766 0.883 0.689 0.774 55,970 32.36% 1,192
ServOMap 315 6,272 0.941 0.655 0.773 0.841 0.650 0.734 0.849 0.655 0.740 0.877 0.654 0.749 143,316 82.85% 1,320
YAM++ 3,780 7,003 0.879 0.684 0.769 0.787 0.679 0.729 0.797 0.686 0.737 0.821 0.683 0.746 69,345 40.09% 1,360
LogMap-noe 521 6,450 0.886 0.635 0.740 0.805 0.640 0.713 0.821 0.651 0.726 0.837 0.642 0.727 0 0% 0
LogMap 484 6,292 0.883 0.617 0.726 0.800 0.621 0.699 0.815 0.631 0.711 0.833 0.623 0.712 0 0% 0
GOMMA_Bk 636 12,614 0.613 0.858 0.715 0.548 0.852 0.667 0.551 0.855 0.670 0.571 0.855 0.684 75,910 43.88% 3,344
GOMMA 437 5,591 0.412 0.256 0.316 0.370 0.255 0.302 0.386 0.265 0.314 0.389 0.259 0.311 7,343 4.25% 480
AROMA 62,801 2,497 0.684 0.190 0.297 0.638 0.197 0.300 0.660 0.203 0.310 0.661 0.196 0.303 54,459 31.48% 271
LogMapLt 96 1,819 0.882 0.178 0.296 0.816 0.183 0.299 0.846 0.189 0.309 0.848 0.183 0.302 2,994 1.73% 24
MapSSS - - - - - - - - - - - - - - - - -
HotMatch - - - - - - - - - - - - - - - - -
Hertuda - - - - - - - - - - - - - - - - -


FMA-SNOMED whole ontologies

AROMA failed to complete the matching task involving the whole FMA and SNOMED ontologies in less than 24 hours.

The results in terms of both precision and recall did not change significantly and, as in the previous task, ServOMapL provided the best results in terms of F-measure and precision, while GOMMA-bk obtained the best recall.

Runtimes for ServOMap, ServOMapL, LogMapLt and LogMap (with its two variants) were in line with the previous matching task; the computation times for GOMMA, GOMMA-bk and YAM++, however, increased considerably. GOMMA (with its two variants) required more than 30 minutes, while YAM++ required more than 6 hours.

LogMap and LogMap-noe mappings, as in previous tasks, had a very low incoherence degree.


(P = Precision, R = Recall, F = F-measure)
System  Time (s)  # Mappings | Original UMLS: P  R  F | Refined UMLS (LogMap): P  R  F | Refined UMLS (Alcomo): P  R  F | Average: P  R  F | Incoherence Analysis: All Unsat.  Degree  Root Unsat.
ServOMapL 517 6,605 0.939 0.688 0.794 0.842 0.686 0.756 0.851 0.691 0.763 0.877 0.688 0.772 99,726 25.86% 2,862
ServOMap 532 6,320 0.933 0.655 0.770 0.835 0.650 0.731 0.842 0.655 0.737 0.870 0.653 0.746 273,242 70.87% 2,617
YAM++ 23,900 7,044 0.872 0.682 0.765 0.780 0.678 0.725 0.791 0.685 0.734 0.814 0.681 0.742 106,107 27.52% 3,393
LogMap 612 6,312 0.877 0.615 0.723 0.795 0.619 0.696 0.811 0.629 0.708 0.828 0.621 0.710 10 0.003% 0
LogMap-noe 791 6,406 0.866 0.616 0.720 0.782 0.617 0.690 0.801 0.631 0.706 0.816 0.621 0.706 10 0.003% 0
GOMMA_Bk 1,893 12,829 0.602 0.858 0.708 0.538 0.852 0.660 0.542 0.855 0.663 0.561 0.855 0.677 119,657 31.03% 5,289
LogMapLt 171 1,823 0.880 0.178 0.296 0.814 0.183 0.299 0.844 0.189 0.309 0.846 0.183 0.301 4,938 1.28% 37
GOMMA 1,994 5,823 0.370 0.239 0.291 0.332 0.239 0.278 0.347 0.248 0.289 0.350 0.242 0.286 10,752 2.79% 609
AROMA - - - - - - - - - - - - - - - - -
MapSSS - - - - - - - - - - - - - - - - -
HotMatch - - - - - - - - - - - - - - - - -
Hertuda - - - - - - - - - - - - - - - - -