Tabular data in the form of CSV files is a common input format in data analytics pipelines. However, a lack of understanding of the semantic structure and meaning of the content may hinder the data analytics process. Thus, gaining this semantic understanding is very valuable for data integration, data cleaning, data mining, machine learning and knowledge discovery tasks. For example, understanding what the data represents can help assess which transformations are appropriate for it.
Tables on the Web may also be the source of highly valuable data. The addition of semantic information to Web tables may enhance a wide range of applications, such as web search, question answering, and knowledge base (KB) construction.
Tabular data to Knowledge Graph (KG) matching is the process of assigning semantic tags from Knowledge Graphs (e.g., Wikidata or DBpedia) to the elements of the table. This task, however, is often difficult in practice because metadata (e.g., table and column names) is missing, incomplete or ambiguous.
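As a rough illustration of what such matching involves, the sketch below tags the cells of a small header-less table with Wikidata entity identifiers and then derives a class for each column by majority vote. The lookup dictionary `TOY_KG`, the function `annotate_table` and the sample table are all hypothetical and exist only for this example. A real system would query a Knowledge Graph and would have to handle ambiguous or misspelled labels, which this sketch does not attempt.

```python
import csv
import io
from collections import Counter

# Hypothetical in-memory lookup standing in for a real Knowledge Graph index.
# Maps a normalised cell label to (Wikidata entity ID, Wikidata class ID).
# Giving each entity a single class is a toy simplification.
TOY_KG = {
    "france": ("Q142", "Q6256"),   # France, a country
    "germany": ("Q183", "Q6256"),  # Germany, a country
    "paris": ("Q90", "Q515"),      # Paris, a city
    "berlin": ("Q64", "Q515"),     # Berlin, a city
}


def annotate_table(csv_text):
    """Return (cell_tags, column_tags) for a CSV table given as a string.

    cell_tags maps (row, column) to an entity ID; column_tags maps a
    column index to the class most often seen among its matched cells.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    cell_tags = {}
    column_votes = {}
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            match = TOY_KG.get(value.strip().lower())
            if match is None:
                continue  # no candidate found: leave the cell untagged
            entity, cls = match
            cell_tags[(r, c)] = entity
            column_votes.setdefault(c, Counter())[cls] += 1
    column_tags = {
        c: votes.most_common(1)[0][0] for c, votes in column_votes.items()
    }
    return cell_tags, column_tags


# A table without a header row, i.e. with its column names missing.
sample = "France,Paris\nGermany,Berlin\n"
cell_tags, column_tags = annotate_table(sample)
print(cell_tags)
print(column_tags)
```

Exact string lookup is the simplest possible matching strategy and was chosen here only to keep the example short. Because the sample table has no header, the column classes have to be inferred from the cell contents alone, which is the situation described above.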
This challenge aims at benchmarking systems dealing with the tabular data to KG matching problem, so as to facilitate their comparison on the same basis and the reproducibility of the results.
The 2020 edition of this challenge will be collocated with the 19th International Semantic Web Conference and the 15th International Workshop on Ontology Matching.
We have a discussion group for the challenge, where we share the latest news with participants and discuss issues raised during the evaluation rounds.
Please register your system using this Google form.
Note that participants can join SemTab at any round and for any of the tasks.
The challenge includes the following tasks organised into several evaluation rounds:
The challenge will be run with the support of the AICrowd platform.
Round 4 submission: https://tinyurl.com/semtab2020-round4
The target Knowledge Graph will be Wikidata.
We encourage participants to submit a system paper. The paper should be no more than 10 pages long and formatted using the LNCS style. System papers will be reviewed by 1-2 challenge organisers and published as a volume of CEUR-WS. By submitting a paper, the authors accept the CEUR-WS publishing rules.
This challenge is organised by Kavitha Srinivas (IBM Research), Ernesto Jimenez-Ruiz (City, University of London; University of Oslo), Oktie Hassanzadeh (IBM Research), Jiaoyan Chen (University of Oxford) and Vasilis Efthymiou (IBM Research). If you have any problems working with the datasets or any suggestions related to this challenge, do not hesitate to contact us via the discussion group.
The challenge is currently supported by the SIRIUS Centre for Research-driven Innovation and IBM Research.