RDFox Tests for AAAI 2015

ISG Tools

Incremental Update of Datalog Materialisation:
The Backward/Forward Algorithm

This is the webpage in support of the AAAI 2015 submission "Incremental Update of Datalog Materialisation: The Backward/Forward Algorithm". We have extended RDFox with a variant of the well known DRed algorithm and with the Backward/Forward Algorithm (B/F) presented in our submission. RDFox is an open-source, RAM-based datalog system developed in our group. A Linux version of our extension is downloadable as RDFox-linux.zip usable under Oxford's licence agreement.

Test Cases and data Sets

In our paper, we conducted our evaluation on four different test setups, the first two involving the benchmark data sets LUBM and UOBM an the latter two involving the cultural database with two different datalog programs, Claros-L and Claros-LE. For explanations of the directory structure please have a look at further down.

Our Test Results

We provide the results of our tests as separate pdf documents and csv files:

LUBM-L (pdf, csv), and UOBM-U (pdf, csv), Claros-L (pdf, csv), Claros-LE (pdf, csv),

Reproducing Test Results

On each test setup 5 different tests were run where we measured the time, the number of derivations and the number of checked triples and deleted triples, respectively for the three different approaches:

Algorithm	required Datasets
rematerialisation	(E\E^–)
incremental deletion by the DRed algorithm	E,E^–
incremental deletion by the B/F algorithm.	E,E^–

where E is the original data set and E^– is a subset of E that is to be incrementally deleted. Note that derivations and number of triples need to be measured separately from the times as the statistical subroutines introduce an asymmetric overhead.

In order to obtain E^– and E\E^– we provide the Java utility FileSplit which is used as follows

java <utility> -cp uk/ac/ox/cs/util/FileSplit -s=27012015 -l=<k> -f=<E> -o1=<E^–> -o2=<(E\E^–)> -n=<n>

where

<k> the number of lines to be extracted
<E> the .ttl file from which the lines are to be extracted. File must be present.
<E^–> the facts that need to be deleted. File will be created.
<(E\E^–)> contains all lines from <E> except those lines contained in E^–. File will be created.
<n> number of lines contained in <E>

We fixed the parameter -s as given above for our tests. It is the seed used in the random number generator to obtain reproducible results, when extracting randomly lines from <E>. The table below shows the parameter <k> used for the different test cases and the parameter <n> which only depends on the original data file <E>. Note that the data files called Claros.ttl for Claros-L and Claros-LE are the same.

Test name	25%	50%	75%	100%	-n
LUBM	2500000	5000000	7500000	10000000	133573856
UOBM	17000000	34000000	51000000	68000000	254753783
Claros-L	575000	1150000	1725000	2300000	18793277
Claros-LE	25000	50000	75000	100000	18793277

Setting up the Test Environment

To reproduce tests, please extract the archive and use the executable CppRDFox under ../RDFox/lib/

RDFox requires for each test case a specific directory structure, where the sub-directory

facts contains the original data set with extension .ttl
dlog contains the ruleset with extension .dlog
scripts contains the start-up scripts for the RDFox shell.

Create the directories dlog, facts and scripts.
Copy the .dlog file into dlog and the .ttl file into facts.
Use FileSplit.class and create the files <E^–> and <(E\E^–)> in the facts sub-directory.
Download one of the following script files into the scripts sub-directory and adapt to your needs.
Start RDFox using the following command from the Linux shell (command line)
<RDFox> -shell <absScript>
where <RDFox> is a suitable path to the executable CppRDFox and <absScript> is the absolute filename of the script you are starting.

Evaluating the Statistical Output

Times need to be evaluated without statistics! They can then be found in case of

remat.rdfox as
Materializing rules on 1 thread. Materialization time: 66.058 s.
dred.rdfox or bf.rdfox as
Materializing rules incrementally on 1 thread and the backward/forward algorithm. Materialization time: 0.006 s.
and "…the DRed algorithm." for the DRed algorithm respectively.

Number of Derivations and Number of Checked Facts and Deleted Facts can be found only when statistics are turned on. In the case of

remat.rdfox the number of derivations can be found in the line
"… : Derivations"

dred.rdfox the number of Deleted Facts can be found the second line of


		---------------- BACKWARD/FORWARD DELETION: forward reasoning ----------------
		    3979494 : Non-merged IDB tuples extracted from proved list

the number of Backward Derivatons (Bwd) can be found in


		---------------- BACKWARD/FORWARD DELETION: forward reasoning ----------------
				⋮
		     590286 : Non-delayed derivations

the number of Forward Derivations (Fwd) is the sum of the following numbers


		---------------- BACKWARD/FORWARD DELETION: deletion propagation ----------------
				⋮
		   59875463 : Derivations


		---------------- TUPLE INSERTION ----------------
				⋮
		   29301013 : Matched rule instances

bf.rdfox the number of Checked Facts can be found in


		---------------- BACKWARD/FORWARD DELETION ----------------
				⋮
		    1462837 :     Tuples added to the checked list

the number of Backward Derivations can be found in


		---------------- BACKWARD/FORWARD DELETION ----------------
				⋮
		   22072685 : Rule instances checked during backward reasoning

the number of Forward Derivations is the sum of the following numbers


		---------------- BACKWARD/FORWARD DELETION: forward reasoning ----------------
				⋮
		    7112373 :     Matched rule instances during tuple extraction


		---------------- BACKWARD/FORWARD DELETION: deletion propagation ----------------
				⋮
		   30574450 : Derivations

Researchers

Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks

Key Publications

Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks and Dan Olteanu Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF Systems. in AAAI 2014.

Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks Handling of owl:sameAs via Rewriting. in AAAI 2015.

Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks, Zhe Wu, and Jay Banerjee RDFox: A Highly-Scalable RDF Store in ISWC 2015.

Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks Combining Rewriting and Incremental Materialisation Maintenance for Datalog Programs with Equality in IJCAI 2015.

Data and Knowledge Group

― Knowledge Representation and Reasoning