Data and Knowledge Group

― Knowledge Representation and Reasoning

Incremental Update of Datalog Materialisation:
The Backward/Forward Algorithm

This is the webpage in support of the AAAI 2015 submission "Incremental Update of Datalog Materialisation: The Backward/Forward Algorithm". We have extended RDFox with a variant of the well known DRed algorithm and with the Backward/Forward Algorithm (B/F) presented in our submission. RDFox is an open-source, RAM-based datalog system developed in our group. A Linux version of our extension is downloadable as RDFox-linux.zip usable under Oxford's licence agreement.

Test Cases and data Sets

In our paper, we conducted our evaluation on four different test setups, the first two involving the benchmark data sets LUBM and UOBM an the latter two involving the cultural database with two different datalog programs, Claros-L and Claros-LE. For explanations of the directory structure please have a look at further down.

Our Test Results

We provide the results of our tests as separate pdf documents and csv files:

LUBM-L (pdf, csv), and UOBM-U (pdf, csv), Claros-L (pdf, csv), Claros-LE (pdf, csv),

Reproducing Test Results

On each test setup 5 different tests were run where we measured the time, the number of derivations and the number of checked triples and deleted triples, respectively for the three different approaches:
Algorithmrequired Datasets
rematerialisation (E\E)
incremental deletion by the DRed algorithmE,E
incremental deletion by the B/F algorithm.E,E
where E is the original data set and E is a subset of E that is to be incrementally deleted. Note that derivations and number of triples need to be measured separately from the times as the statistical subroutines introduce an asymmetric overhead.

In order to obtain E and E\E we provide the Java utility FileSplit which is used as follows
java <utility> -cp uk/ac/ox/cs/util/FileSplit -s=27012015 -l=<k> -f=<E> -o1=<E> -o2=<(E\E)> -n=<n>
where
  • <k> the number of lines to be extracted
  • <E> the .ttl file from which the lines are to be extracted. File must be present.
  • <E> the facts that need to be deleted. File will be created.
  • <(E\E)> contains all lines from <E> except those lines contained in E. File will be created.
  • <n> number of lines contained in <E>
We fixed the parameter -s as given above for our tests. It is the seed used in the random number generator to obtain reproducible results, when extracting randomly lines from <E>. The table below shows the parameter <k> used for the different test cases and the parameter <n> which only depends on the original data file <E>. Note that the data files called Claros.ttl for Claros-L and Claros-LE are the same.

Test name25%50%75%100%-n
LUBM25000005000000750000010000000133573856
UOBM17000000340000005100000068000000254753783
Claros-L57500011500001725000230000018793277
Claros-LE25000500007500010000018793277

Setting up the Test Environment

To reproduce tests, please extract the archive and use the executable CppRDFox under ../RDFox/lib/

RDFox requires for each test case a specific directory structure, where the sub-directory

  • facts contains the original data set with extension .ttl
  • dlog contains the ruleset with extension .dlog
  • scripts contains the start-up scripts for the RDFox shell.

  1. Create the directories dlog, facts and scripts.
  2. Copy the .dlog file into dlog and the .ttl file into facts.
  3. Use FileSplit.class and create the files <E> and <(E\E)> in the facts sub-directory.
  4. Download one of the following script files into the scripts sub-directory and adapt to your needs.
  5. Start RDFox using the following command from the Linux shell (command line)
    <RDFox> -shell <absScript>
    where <RDFox> is a suitable path to the executable CppRDFox and <absScript> is the absolute filename of the script you are starting.

Evaluating the Statistical Output

Times need to be evaluated without statistics! They can then be found in case of
  • remat.rdfox as
    Materializing rules on 1 thread.
    Materialization time: 66.058 s.
  • dred.rdfox or bf.rdfox as
    Materializing rules incrementally on 1 thread and the backward/forward algorithm. Materialization time: 0.006 s.
    and "…the DRed algorithm." for the DRed algorithm respectively.
Number of Derivations and Number of Checked Facts and Deleted Facts can be found only when statistics are turned on. In the case of
  • remat.rdfox the number of derivations can be found in the line
    "… : Derivations"
  • dred.rdfox the number of Deleted Facts can be found the second line of
    ---------------- BACKWARD/FORWARD DELETION: forward reasoning ----------------
        3979494 : Non-merged IDB tuples extracted from proved list
    the number of Backward Derivatons (Bwd) can be found in
    ---------------- BACKWARD/FORWARD DELETION: forward reasoning ----------------
         590286 : Non-delayed derivations
    the number of Forward Derivations (Fwd) is the sum of the following numbers
    ---------------- BACKWARD/FORWARD DELETION: deletion propagation ----------------
       59875463 : Derivations
    ---------------- TUPLE INSERTION ----------------
       29301013 : Matched rule instances
  • bf.rdfox the number of Checked Facts can be found in
    ---------------- BACKWARD/FORWARD DELETION ----------------
        1462837 :     Tuples added to the checked list
    the number of Backward Derivations can be found in
    ---------------- BACKWARD/FORWARD DELETION ----------------
       22072685 : Rule instances checked during backward reasoning
    the number of Forward Derivations is the sum of the following numbers
    ---------------- BACKWARD/FORWARD DELETION: forward reasoning ----------------
        7112373 :     Matched rule instances during tuple extraction
    ---------------- BACKWARD/FORWARD DELETION: deletion propagation ----------------
       30574450 : Derivations

Researchers

Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks

Key Publications