News (22/11/2020): Datasets are now available on Zenodo.

SemTab 2020: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching

Tabular data in the form of CSV files is the common input format in a data analytics pipeline. However, a lack of understanding of the semantic structure and meaning of the content may hinder the data analytics process. Gaining this semantic understanding is therefore very valuable for data integration, data cleaning, data mining, machine learning and knowledge discovery tasks. For example, understanding what the data represents helps assess which transformations are appropriate for it.

Tables on the Web may also be the source of highly valuable data. The addition of semantic information to Web tables may enhance a wide range of applications, such as web search, question answering, and knowledge base (KB) construction.

Tabular data to Knowledge Graph (KG) matching is the process of assigning semantic tags from Knowledge Graphs (e.g., Wikidata or DBpedia) to the elements of a table. In practice, however, this task is often difficult because table metadata (e.g., table and column names) is missing, incomplete or ambiguous.
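As a rough illustration of what such matching involves (this is not part of the challenge materials), the short Python sketch below looks up a raw cell value against the public Wikidata search API (wbsearchentities) to obtain candidate entities; the function and variable names are hypothetical and the requests package is assumed.

# Illustrative sketch only: fetch candidate Wikidata entities for a table cell.
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def candidate_entities(cell_value, limit=5):
    """Return (QID, label) candidates for a raw cell string."""
    params = {
        "action": "wbsearchentities",
        "search": cell_value,
        "language": "en",
        "format": "json",
        "limit": limit,
    }
    response = requests.get(WIKIDATA_API, params=params, timeout=10)
    response.raise_for_status()
    return [(hit["id"], hit.get("label", "")) for hit in response.json().get("search", [])]

# Example: a cell from a column of capital cities.
print(candidate_entities("Oslo"))  # e.g. [('Q585', 'Oslo'), ...]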

The SemTab challenge aims at benchmarking systems dealing with the tabular data to KG matching problem, so as to facilitate their comparison on the same basis and the reproducibility of the results.

The 2020 edition of this challenge will be co-located with the 19th International Semantic Web Conference and the 15th International Workshop on Ontology Matching.



Datasets and Evaluator

The challenge datasets and ground truths are now open:

The code of the AICrowd evaluator is also available here.
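The official metrics are defined in the evaluator code linked above. Purely as an illustration of the general idea, the sketch below computes precision, recall and F1 for cell annotations against a ground truth; the data layout is hypothetical and the official scoring (in particular for column types) differs in its details.

# Illustrative only: simplified precision/recall/F1 over cell annotations.
def score_cells(submitted, ground_truth):
    """Both arguments map (table_id, row, col) -> Wikidata QID."""
    correct = sum(1 for key, qid in submitted.items()
                  if ground_truth.get(key) == qid)
    precision = correct / len(submitted) if submitted else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One correct annotation (Oslo) and one wrong one (Paris instead of Copenhagen).
gt = {("t1", 1, 0): "Q585", ("t1", 2, 0): "Q1748"}
sub = {("t1", 1, 0): "Q585", ("t1", 2, 0): "Q90"}
print(score_cells(sub, gt))  # (0.5, 0.5, 0.5)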

The target Knowledge Graph in SemTab 2020 is Wikidata. Wikidata Truthy Dump (April 24, 2020): DOI
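For those who prefer querying the live knowledge graph rather than processing the dump, the hypothetical sketch below retrieves the "instance of" (P31) classes of an entity from the public Wikidata SPARQL endpoint, the kind of lookup that is useful when annotating a column with a type; the endpoint URL is public, but the function name and user agent string are illustrative assumptions.

# Hypothetical sketch: fetch the P31 ("instance of") classes of a Wikidata entity.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

def instance_of(qid):
    """Return the QIDs of the classes an entity is an instance of."""
    query = f"SELECT ?cls WHERE {{ wd:{qid} wdt:P31 ?cls . }}"
    response = requests.get(
        SPARQL_ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "semtab-example/0.1 (illustrative)"},
        timeout=30,
    )
    response.raise_for_status()
    bindings = response.json()["results"]["bindings"]
    return [b["cls"]["value"].rsplit("/", 1)[-1] for b in bindings]

print(instance_of("Q585"))  # classes of Oslo (Q585)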

Datasets per round:


Results and Challenge Prizes

The results of all four rounds are available here. A summary of the SemTab 2020 results is also available.

SemTab-2020 slides presented during the ISWC conference.

Prizes sponsored by IBM Research:


System Papers

Papers are published in Vol-2775 of the CEUR Workshop Proceedings.

ISWC Challenge Presentations

The results of the challenge will be presented on November 5 in two sessions; see the full ISWC program here. Five participating teams will also present their systems.

Session 7A (EST: 10:20-11:20. CET: 16:20-17:20. CST: 23:20-00:20):

Session 8B (EST: 12:00-13:00. CET: 18:00-19:00. CST: 01:00-02:00):

There will also be a slot devoted to SemTab systems during the Ontology Matching workshop on November 2.


Participation: forum and registration

We have a discussion group for the challenge where we share the latest news with participants and discuss issues that arise during the evaluation rounds.

Please register your system using this Google form.

Note that participants can join SemTab in any round for any of the tasks.


Challenge Tasks

The challenge includes the following tasks, organised into several evaluation rounds: Cell Entity Annotation (CEA), Column Type Annotation (CTA) and Column Property Annotation (CPA).

The challenge will be run with the support of the AICrowd platform.

Round 4 submission: https://tinyurl.com/semtab2020-round4



Important Dates


System Papers

We encourage participants to submit a system paper. The paper should be no more than 10 pages long and formatted using the LNCS style. System papers will be reviewed by 1-2 challenge organisers and published as a volume of CEUR-WS. By submitting a paper, the authors accept the CEUR-WS publishing rules.


Organisation

This challenge is organised by Kavitha Srinivas (IBM Research), Ernesto Jimenez-Ruiz (City, University of London; University of Oslo), Oktie Hassanzadeh (IBM Research), Jiaoyan Chen (University of Oxford), Vasilis Efthymiou (IBM Research), and Vincenzo Cutrona (University of Milano-Bicocca). If you have any problems working with the datasets or any suggestions related to this challenge, do not hesitate to contact us via the discussion group.


Acknowledgements

The challenge is currently supported by the SIRIUS Centre for Research-driven Innovation and IBM Research.