P-Progol User Manual
For Version 2.7.5. Later versions have been superseeded by Aleph. See http://www.comlab.ox.ac.uk/oucl/groups/machlearn/Aleph/aleph_toc.html.
This document provides reference information on P-Progol, an implementation that contains aspects of the Progol algorithm. It is not intended to be an introduction to Progol nor to Inductive Logic Programming (ILP). Note that P-Progol has been superseeded by by Aleph. See http://www.comlab.ox.ac.uk/oucl/groups/machlearn/Aleph/aleph_toc.html.
The Progol algorithm is described in detail in S.H. Muggleton (1995), Inverse Entailment and Progol, New Gen. Comput., 13:245-286, available at ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/Papers/InvEnt.ps.gz. A good introduction to the theory, implementation and applications of ILP can be found in S.H. Muggleton and L. De Raedt (1994), Inductive Logic Programming: Theory and Methods, Jnl. Logic Programming, 19,20:629--679, available at ftp://ftp.cs.york.ac.uk/pub/ML_GROUP/Papers/lpj.ps.gz.
P-Progol is intended to be a prototype for exploring ideas. It commenced in 1993 as part of a fun project undertaken by Ashwin Srinivasan and Rui Camacho at Oxford University. The main purpose was to understand ideas of inverse entailment which eventually appeared in Steve Muggleton's paper, and was accompanied by (good-natured) bun-fights about the execution speeds of programs being written by Ashwin (who was writing P-Progol) and Rui (who was writing Indlog). The earliest P-Progol implementation dates from April, 1993. There are, currently, at least 3 implementations based on the Progol algorithm: CProgol (by S. Muggleton, written in C which contains its own Prolog interpreter), Indlog (by Rui Camacho, in Prolog requiring the Yap compiler) and P-Progol (by Ashwin Srinivasan, in Prolog largely developed with Yap). The main differences in implementation (other than language) are in the search technique used and degree of user-interaction allowed. See section Related versions and programs for more details on obtaining these programs and differences between them.
During routine use, P-Progol follows a very simple procedure that can be described in 4 steps:
P-Progol is an implementation based around these 4 steps. A more advanced use of P-Progol (see section Advanced use of P-Progol) allows alteration to each of these steps.
P-Progol code is contained in a single file, usually called `progolXXX.pl' (the XXX stands for the current version number, for example progol272.pl refers to Version 2.7.2). To load P-Progol, you will need to consult this file into your Prolog compiler, with sufficient stack and heap size (the more, the better!). Here is an example of loading P-Progol into the Yap compiler, with a stack size of 5000 K bytes and heap size of 20000 K bytes:
yap -s5000 -h20000 [ Restoring file startup ] yes ?- [progol272].
P-Progol requires 3 files to construct theories. The most straightforward use of P-Progol would involve:
read_all(filestem)command. See section Read all input files.
inducecommand See section Construct a theory.
All background knowledge for P-Progol is contained in a file with
a .b extension. Background knowledge is in the form of Prolog
clauses that encode information relevant to the domain. It can also
contain any directives understood by the Prolog compiler being used
:- consult(someotherfile).). This file also contains
language and search restrictions for P-Progol. Most
basic amongst refer to modes, types and determinations
(see section Mode declarations, section Type specifications, and section Determinations).
These declare the mode of call for predicates that can appear in any clause hypothesised by P-Progol. They take the form:
RecallNumber bounds the non-determinacy of a form of predicate call,
PredicateMode specifies a legal form for calling a predicate.
RecallNumber can be either (a) a number specifying the number of
successful calls to the predicate;
or (b) * specifying that the predicate has bounded non-determinacy.
It is usually easiest to specify
RecallNumber as *.
PredicateMode is a template of the form:
Here are some examples of how they appear in a file:
:- mode(1,mem(+number,+list)). :- mode(1,dec(+integer,-integer)). :- mode(1,mult(+integer,+integer,-integer)). :- mode(1,plus(+integer,+integer,-integer)). :- mode(1,(+integer)=(#integer)). :- mode(*,has_car(+train,-car)).
Each ModeType is either (a) +T specifying that when a literal
with predicate symbol p appears in a hypothesised clause, the corresponding
argument should be an "input" variable of type T; (b) -T specifying
that the argument is an "output" variable of type T; or
(c) # specifying that it should be a constant of type T.
With these directives P-Progol ensures that for any hypothesised
clause of the form
H:- B1, B2, ..., Bc:
Types have to be specified for every argument of all predicates
to be used in constructing a hypothesis. This specification is
done within a
mode(...,...) statement (see section Mode declarations).
For P-Progol types are just names, and no type-checking
is done. Variables of different types are treated distinctly, even if
one is a sub-type of the other.
Determination statements declare the predicated that can be used to construct a hypothesis. They take the form:
The first argument is the name and arity of the target predicate, that is, the predicate that will appear in the head of hypothesised clauses. The second argument is the name and arity of a predicate that can appear in the body of such clauses. Typically there will be many determination declarations for a target predicate, corresponding to the predicates thought to be relevant in constructing hypotheses. If no determinations are present P-Progol does not construct any clauses. Determinations are only allowed for 1 target predicate on any given run of P-Progol: if multiple target determinations occur, the first one is chosen.
Here are some examples of how they appear in a file:
:- determination(eastbound/1,has_car/2). :- determination(mult/3,mult/3). :- determination(p/1,'='/2).
Positive examples of a concept to be learned with P-Progol are written in a file with a .f extension. The filestem should be the same as that used for the background knowledge. The positive examples are simply ground facts. Here are some examples of how they appear in a file:
eastbound(east1). eastbound(east2). eastbound(east3).
Code exists in Version 1.9 onwards for dealing with non-ground positive examples. However, this has never been tested rigorously.
Negative examples of a concept to be learned with P-Progol are written in a file with a .n extension. The filestem should be the same as that used for the background knowledge. The negative examples are simply ground facts. Here are some examples of how they appear in a file:
eastbound(west1). eastbound(west1). eastbound(west1).
Non-ground constraints can be a more compact way of expressing negative information.
Such constraints can be specified in the background knowledge file
(see section User-defined constraints). P-Progol is capable of learning from
positive examples only. This is done using a Bayesian evaluation function
posonly in section Evaluation functions).
Once the `filestem.b, filestem.f' and `filestem.n' files are in place, they can be read in P-Progol with the command:
This will usually produce a long list of current settings that looks like:
version=2.7.2 construct_bottom=saturation lazy_on_cost=false nodes=5000 etc.
If the flag
record is set (see section Setting P-Progol parameters)
then these settings, along with the date and machine
that is currently being used are written to the file pointed to by
recordfile (see section Setting P-Progol parameters). Note that writing the
date and machine details are written by calls to the
hostname commands. These may not be available on all platforms.
The basic command for randomly selecting examples and constructing clauses is:
When issued P-Progol does the four steps described by the basic Progol algorithm (see section The basic P-Progol algorithm). The result is usually a trace that lists clauses searched along with their positive and negative example coverage, like:
eastbound(A) :- has_car(A,B). [120/5] eastbound(A) :- has_car(A,B), short(B). [120/5] eastbound(A) :- has_car(A,B), open_car(B). [120/5] eastbound(A) :- has_car(A,B), shape(B,rectangle). [120/5]
and the final result that looks like:
[theory] [Rule 1] [Pos cover = 120 Neg cover = 0] eastbound(A) :- has_car(A,B), short(B), closed(B). [pos-neg] 
induce also reports the performance on the training data as a confusion
matrix that looks like:
[Training set performance] Actual + - + 120 0 120 Pred - 0 5 5 120 5 125 Accuracy = 100%
Performance on a test data is also reported if values for the parameters
test_neg are set (see section Setting P-Progol parameters).
induce implements a simple greedy cover-set algorithm.
P-Progol allows you to experiment with a number of
other ways of searching for answers (see section Altering the search).
The final theory constructed by P-Progol can be saved in a file
Besides automatic performance reporting, the theory constructed by P-Progol can be evaluated on examples in any data file using the command:
File is the name of the data file containing the examples.
Flag is one of
The former shows examples covered or otherwise.
Flag have to be provided.
test/4 then returns the following numbers.
Covered is the number of examples in the data file covered by current theory.
Total is the total number of examples in the data file.
Advanced use of P-Progol allows modifications to each of the steps to the basic algorithm (see section The basic P-Progol algorithm):
samplesizein section Setting P-Progol parameters). The best clause obtained from reducing each corresponding bottom clause is then added to the theory. Alternatively, no sampling need be performed, and every example can be saturated and reduced (see
induce_maxin section Altering the search).
construct_bottomin section Setting P-Progol parameters). Literals in the a bottom clause may be evaluated "lazily" (see
lazy_evaluatein section Other commands). Individual bottom clauses can be constructed and examined (see
satin section Other commands).
reducein section Other commands).
induce_maxin section Altering the search).
set/2 predicate forms the basis for setting a number of
parameter values for P-Progol. Parameters are set to values
The current value of a parameter is obtained using:
A parameter can be un-set by using:
set/2 statements for P-Progol are:
noiseis not set.
minaccis not set.
truethen clauses and coverage are cached for future use. Only clauses upto length set by
cache_clauselengthare stored in the cache.
trueand the verbosity level is at least 2 then all clauses with the same score as the current best clause are reported with the statement "exploratory clause." If verbosity level is below 2, search continues but reporting is suppressed. All internal pruning is turned off (see section Built-in and user-defined pruning).
exploreis set to
inf). Set an upper bound on the beam-width to be used in a greedy search.
truethen trace of P-Progol execution is written to a file. The filename is given by
recordis set to
truethen theorem-proving on negative examples stops once bounds set by
induce_covercommands. The best clause from the sample is added to the theory. A value of 0 turns off random sampling, and the next uncovered example in the sequence is selected.
truethen randomly generated examples are obtained after conditioning the stochastic generator with the positive examples.
bf). Sets the search strategy. See section Altering the search.
coverage). Sets the evaluation function for a search. See section Altering the search.
false). Specifies the nature of the customised refinement operator. If
falsethen standard operation results. If
userthen the user specifies a refinement operator with
autothen an automatic
refine/2definition is generated prior to search. This is useful if a bottom clause is either not constructed or is constructed lazily. No attempt is made to ensure any kind of optimality and the same clauses may result from several different refinement paths. The setting for
probabilisticis experimental and should not be used until further notice. In all cases other than a setting of
refineopis also set to
true. See section Altering the search.
saturation). Specifies the stage at which the bottom clause is constructed. If
reductionthen it is constructed lazily during the search. This is useful if the bottom clause is too large to be constructed prior to search. This also sets the flag
true. The user has to provide a refinement operator definition (using
refine/2). If not, the
refineparameter is set to
falsethen no bottom clause is constructed. The user would normally provide a refinement operator definition in this case.
restricted_sld, then examples covered are determined by forcing current hypothesised clause to be the first parent clause in a SLD resolution proof. If
sldthen this restriction is not enforced. The former strategy is efficient, but not refutation complete. It is sufficient if all that is needed is to determine how many examples are covered by the current clause, which is what is needed when P-Progol is used to construct a set of non-recursive clauses greedily (for example using the
induce/0command: see section Construct a theory).
command). Sets the stage of current execution. This is normally not user-set, and decided internally.
false). If set to
truebefore constructing a bottom clause, then variable co-references in the bottom clause are split apart by new variables. The new variables can occur at input or output positions of the head literal, and only at output positions in body literals. Equality literals between new and old variables are inserted into the bottom clause to maintain equivalence. It may also result in variable renamed versions of other literals being inserted into the bottom clause. All of this increases the search space considerably and can make the search explore redundant clauses. The current version also elects to perform variable splitting whilst constructing the bottom clause (in contrast to doing it dynamically whilst searching). This was to avoid unnecessary checks that could slow down the search when variable splitting was not required. This means the bottom clause can be extremely large, and the whole process is probably not very practical for large numbers of co-references. The procedure has not been rigourously tested to quantify this.
false). Specifies if user-defined cost-statements require clause coverages to be evaluated. This is normally not user-set, and decided internally (except in the Sicstus version See section No Auto Set of
lazy_on_costin Sicstus Progol).
false). Specifies if theorem-proving should proceed if a constraint is violated.
trueprints text associated with a literal (using the
verboseto the same value. A value of 0 shows very little.
record). For example,
set(experiment,'Run 1 with background B0').
P-Progol allows the basic search strategy to be
altered in a number of ways. Besides the
command, there are two other commands that can be used
to construct clauses. The command:
is very similar to
The only difference is that positive examples covered by a clause are
not removed prior to seeding on a new (uncovered) example. After a
induce_cover P-Progol only removes the
the examples covered by the best clause are removed from a pool of
seed examples only. After this, a new example or set of examples
is chosen from the seeds left, and the process repeats.
The theories returned by
induce_cover are dependent
on the order in which positive examples are presented. The command:
is not affected by this order, as it saturates and reduces every
example. The search is made more efficient by
remembering the coverage of the best clause obtained so far for
each example being generalised. Both
are slower than
induce, and usually produce clauses with a great
deal of overlap in coverage. A separate program will have to be
used to find some subset of these clauses that minimises this
overlap (see T-Reduce in section Related versions and programs).
Search with P-Progol is controlled by two parameters. One
sets the search strategy (
search) and the other sets the
evaluation function (
A search strategy is set using
The following search strategies are recognised by P-Progol:
nodesapplies to any 1 iteration. This is experimental.
bfsearch strategy that, starting from 1, progressively increases the upper-bound on the number of occurrences of a predicate symbol in any clause. Limit set by value for
nodesapplies to any 1 iteration. This language-based search was developed by Rui Camacho and is described in his PhD thesis.
An evaluation function is set using
The following clause evaluation functions are recognised by P-Progol:
P - N, where
Nare the positive and negative examples covered by the clause.
P - N - L + 1, where
Nare the positive and negative examples covered by the clause, and
Lthe number of literals in the clause.
laplace (laplace estimate)
Nare the positive and negative examples covered by the clause.
mis set by
mautomatically set to be the maximum likelihood estimate of the best value of
Cis the value returned by a user-defined cost function. See section User-defined cost specification.
Two sorts of pruning can be distinguished within P-Progol. Internal pruning refers to built-in pruning that performs admissible removal of clauses from a search. This is currently available for the following evaluation functions: coverage, compression, posonly, laplace, mestimate, auto_m. User-defined prune statements can be written to specify the conditions under which a user knows for certain that a clause (or its refinements) could not possibly be an acceptable hypothesis. Such clauses are pruned from the search. The "prune" definition is written in the background knowledge file (that has extension `.b'). The definition is distinguished by the fact that they are all rules of the form:
prune((ClauseHead:-ClauseBody)) :- Body.
The following example is from a pharmaceutical application that states that every extension of a clause representing a "pharmacophore" with six "pieces" is unacceptable, and that the search should be pruned at such a clause.
prune((Head:-Body)) :- violates_constraints(Body). violates_constraints(Body) :- has_pieces(Body,Pieces), violates_constraints(Body,Pieces). violates_constraints(Body,[_,_,_,_,_,_]). has_pieces(...) :-
The use of such pruning can greatly improve P-Progol's efficiency. They can be seen as a special case of providing distributional information about the hypothesis space.
The use of a user-specified cost function is a fundamental construct in statistical decision theory, and provides a general method of scoring descriptions. P-Progol allows the specification of the cost of a clause. The cost statements are written in the background knowledge file (that has extension `.b'), and are distinguished by the fact that they are all rules of the form:
ClauseLabel is the list
P is the number of positive examples covered by the
N is the number of negative examples covered by the clause
L is the number of literals in the clause.
It is usually not possible to devise automatically admissible pruning strategies for an arbitrary cost function. Thus, when using a user-defined cost measure, P-Progol places the burden of specifying a pruning strategy on the user.
P-Progol accepts integrity constraints that should not be violated by a hypothesis. These are written in the background knowledge file (that has extension `.b') and are similar to the integrity constraints in the ILP programs Clint and Claudien. The constraints are distinguished by the fact that they are all rules of the form:
Body is a set of literals that specify the condition(s) that should
not be violated by a clause found by P-Progol.
It is usual to use the
hypothesis/3 (see section Other commands) command
to obtain the clause currently being considered by P-Progol.
The following example is from a pharmaceutical
application that states that hypotheses
are unacceptable if they have fewer than three "pieces" or
which do not specify the distances between all pairs of pieces.
false:- hypothesis(Head,Body,_), has_pieces(Body,Pieces), length(Pieces,N), N =< 2. false:- hypothesis(_,Body,_), has_pieces(Body,Pieces), incomplete_distances(Body,Pieces).
The use of constraints is another way for P-Progol to obtain interesting hypothesis without negative examples. Ordinarily, this will result in a single clause that classifies every example as positive. Such clauses can be precluded by constraints. Note also that an integrity constraint does not state that a refinement of a clause that violates one or more constraints will also be unacceptable.
Constraints have a different syntax in the Sicstus version of P-Progol See section How to Express Constraints in Sicstus Progol.
allows a method of specifying the refinement operator to
be used in a search. This is done
using a Prolog definition for the predicate
The definition specifies the transitions in the
refinement graph traversed in a
The "refine" definition is written in the background knowledge
file (that has extension ".b"). The definition is distinguished
by the fact that they are all rules of the form:
This specifies that Clause1 is refined to Clause2. The definition can be nondeterministic, and the set of refinements for any one clause are obtained by repeated backtracking. For any refinement P-Progol ensures that Clause2 implies the current most specific clause. Clause2 can contain cuts ("!") in its body.
The following example is from a pharmaceutical application that states that searches for a "pharmacophore" that consists of 4 "pieces" (each piece is some functional group), and associated distances in 3-D space. Auxilliary definitions for predicates like member/2 and dist/5 are not shown. representing a "pharmacophore" with six "pieces" is unacceptable, and that the search should be pruned at such a clause.
refine(false,active(A)). refine(active(A),Clause):- member(Pred1,[hacc(A,B),hdonor(A,B),zincsite(A,B)]), member(Pred2,[hacc(A,C),hdonor(A,C),zincsite(A,C)]), member(Pred3,[hacc(A,D),hdonor(A,D),zincsite(A,D)]), member(Pred4,[hacc(A,E),hdonor(A,E),zincsite(A,E)]), Clause = (active(A):- Pred1, Pred2, dist(A,B,C,D1,E1), Pred3, dist(A,B,D,D2,E2), dist(A,C,D,D3,E3), Pred4, dist(A,B,E,D4,E4), dist(A,C,E,D5,E5), dist(A,D,E,D6,E6)).
There are a number of other useful commands in P-Progol. These are:
posterior. These show the following:
*. Mode is a mode template as in a
mode/2declaration. Declares a mode for the head of a hypothesised clause. Required when
*. Mode is a mode template as in a
mode/2declaration. Declares a mode for a literal in the body of a hypothesised clause.
modebtemplate corresponding to
determination(PSym1,PSym2)has been declared in the background file.
text(active(X),[X, 'is active']). Then the clause
active(d1)will be written as
d1 is active.
Pis the positive examples covered by the hypothesised clause,
Nis the negative examples covered by the hypothesised clause, and
Lis the number of literals in the hypothesised clause,
This is the original version of P-Progol and uses the Yap Prolog compiler. It is maintained by Ashwin Srinivasan at the University of Oxford, and can be found at http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/PProgol/pprogol.pl. If you obtain this version, please inform Ashwin Srinivasan firstname.lastname@example.org. This version is free for academic use (research and teaching). If you intend to use it for commercial purposes then please contact Ashwin Srinivasan email@example.com.
NB: Yap is available at http://www.ncc.up.pt/~vsc/Yap/. P-Progol requires Yap 4.1.15 or higher, compiled with the DEPTH_LIMIT flag set to 1 (that is, include -DDEPTH_LIMIT=1 in the compiler options).
The Yap version P-Progol was ported to use the Sicstus Prolog compiler by James Cussens, firstname.lastname@example.org, at the University of York. There are some differences as a result of underlying differences in Yap and Sicstus. This conversion refers version 2.7.1.
lazy_on_costin Sicstus Progol
Let us assume that the Sicstus version of P-Progol has been
progol.pl. Unless you intend to alter P-Progol
very frequently, it a good idea to fcompile
progol.pl to avoid
having to recompile
progol.pl each time you use it. You need to
progol.pl before fcompiling so that the operators in
Progol are recognised:
SICStus 3 #5: Fri Aug 14 18:19:52 BST 1998 | ?- [progol], fcompile(progol).
fcompiling will create a file
progol.ql. Now you can load in
load(progol). If you haven't previously fcompiled
Progol explicitly indexes examples: if the first example in the
`.f' file is
eastbound(train1) this is asserted as
example(1,pos,eastbound(train1)). It is crucial for speed that
example/3 is a compiled predicate.
Since, unlike Yap, Sicstus does not allow incremental compilation, it is
necessary for Sicstus Progol to read in the `.f' and `.n'
files, construct the
example/3 facts, (and some bookkeeping
goals) save them to a file with a `.exs' extension, and then
compile this file.
Once a `.exs' file has been created, you can fcompile it into a `.exs.ql' file. If such a file exists, Progol simply loads it, since indexing and compilation are already done.
So, if you type, say, read_all(foo). Sicstus P-Progol does the following
Remove any old `.exs' or `.exs.ql' files if you change your `.f' or `.n' files.
Constraints are expressed using
false/1 rather than
false/0 is a built-in. The argument to
false/1 will be instantiated to the current clause. Your
constraint does not have to use this instantiation, although typically
false((_Head:-Body)) :- has_literal(Body,Lit,_), functor(Lit,gender,_).
There is no depth checking in Sicstus Progol since, unlike Yap,
this is not built in to the Sicstus compiler. Setting the depth
:- set(depth,40). will simply be ignored.
Progol uses an SLP to generate a set of random examples which is used
to prevent overgeneralisation when learning from positive examples
only. In Sicstus Progol, it is necessary to declare as dynamic all type
predicates and all predicates that they call. This is so
can be used to implement the SLP. The random sample is not compiled as
in Yap Progol. Also, one can not use internal Progol predicates to
generate the random sample, after the positive examples have been read
in. (This is not something most people would want to do.)
lazy_on_costin Sicstus Progol
Yap Progol can use
clause/2 to inspect even compiled predicates,
source is used). Sicstus can't, so it can't inspect the
user cost definitions to decide is laziness is warranted. So you have to
lazy_on_cost yourself if you want it.
The Yap version of P-Progol was ported to use the SWI Prolog compiler by Stephen Moyle at the University of Oxford. Details of this are available from Stephen Moyle, email@example.com, or Ashwin Srinivasan.
A version of the Progol algorithm in the C language has been developed independently by Stephen Muggleton, and is available via ftp from the University of York. CProgol is independent of any compiler, as it contains its own interpreter for Prolog. Details of this are available from Stephen Muggleton, firstname.lastname@example.org. Some key differences to CProgol 4.2 are below:
:-, and can be non-ground. P-Progol does not require a
:-prefix, and negative examples are ground (non-ground negative information is specified as constraints.)
set(evalfn,posonly). CProgol does not allow user-specified refinement operators or cost functions. There is no equivalent of
Indlog is another Progol-like algorithm has been implemented by Rui Camacho, at the University of Porto. It uses the Yap Prolog compiler. Details of this are available from Rui Camacho, email@example.com.
T-Reduce is a companion program to P-Progol that can
be used to process the clauses found by the commands
induce_max. This finds a subset of these clauses that explain
the examples adequately, and have lesser overlap in
coverage. T-Reduce uses the Yap Prolog compiler.
Details of this are available from Ashwin Srinivasan,
get_nextbestnow records node's idb reference for efficiency
set(splitvars,true). This resulted in a number of new predicates being added to the code, predominantly under the section titled VARIABLE SPLITTING. Also introduced
flatten_lits/11that allows for equalities arising from splits to be included in bottom clause. A new predicate
split_okdoes some elementary checking for redundant literal addition during search.
abandon/0no longer supported. Users should use
integrate_termto remove check
Depth1 < Depth. These are no longer needed.
construct_callto call on a new predicate
legal_term/4. This looks nicer.
set(proof_strategy,Flag)introduced to allow full or restricted sld-resolution.
To avoid name-clashes, users must avoid using the following predicate symbols as they are used within P-Progol (Version 2.7.5). Entries in this table may change with subsequent versions.
Jump to: a - b - c - d - e - f - g - h - i - l - m - n - o - p - r - s - t - u - v - w
Jump to: b - c - d - e - g - h - i - l - m - n - p - r - s - t - v - w - y
lazy_on_costin Sicstus Progol
Jump to: a - b - c - d - e - f - g - h - i - l - m - n - o - p - q - r - s - t - u - v - w
This document was generated on 29 September 1999 using the texi2html translator version 1.52.