The Predictive Toxicology Evaluation Challenge

Can an AI program participate in scientific discovery?



Prevention of environmentally induced cancers is a health issue of unquestionable importance, and it requires an understanding of the mechanisms of chemical carcinogenesis. Vital to this are the rodent carcinogenicity tests conducted within the US National Toxicology Program (NTP) by the National Institute of Environmental Health Sciences (NIEHS). These tests have produced a large database of compounds classified as carcinogens or otherwise. The Predictive Toxicology Evaluation (PTE) project of the NIEHS provides the opportunity to compare carcinogenicity predictions on previously untested chemicals. It has led to two blind trials: PTE-1 (now complete) and PTE-2 (ongoing). Predicting the carcinogenic activity of compounds in these trials presents a formidable challenge for programs concerned with knowledge discovery. Desirable features of this problem are:

We devised the Predictive Toxicology Evaluation Challenge to give Machine Learning programs an opportunity to participate in carcinogenesis prediction. A Prolog representation of the data for the carcinogenesis problem is available. This site provides access to the following.


Challenge Rules and Conditions

  1. A consortium can submit up to 10 entries. Each entry will be given a unique identifier by us. Submissions received after August 29, 1997, will be entered into the challenge.
  2. Entries submitted on or before November 15, 1998, will be evaluated for chemical relevance by Doug Bristol (US National Institute of Environmental Health Sciences). (New)
  3. A consortium must be willing to provide the URL of a short description of their entry. This description should use the template provided.
  4. We intend to submit results obtained to date to IJCAI-99. Due acknowledgements will be made to entries that participated.
  5. Classification is into one of two classes: carcinogenic (+) or non-carcinogenic (-). Two test sets are available: PTE-1 and PTE-2. Training sets used for constructing theories must not include compounds in the test set chosen for prediction. The compounds in a test set must not be used to select amongst theories constructed with a training set.
  6. Theories will be evaluated on two scales: accuracy and explanatory power. The accuracy of a theory is defined in the usual manner, i.e., (Tp+Tn)/Total, where Tp and Tn are the numbers of true positives and true negatives predicted on the test set and Total is the total number of compounds in the test set. The explanatory power of a theory is initially a boolean property that is true if some or all of the theory can be drawn as chemical substructures. We will later amend this to incorporate the evaluation of the expert chemist.
  7. We reserve the right to amend any errors that may be brought to light either in these rules and conditions, or in any other pages comprising this site.
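Rules 5 and 6 can be checked mechanically before an entry is submitted. The Python below is a minimal sketch, not our evaluation code: it assumes that predictions and true classes are held in dictionaries keyed by compound identifier with values "+" or "-", and the names check_disjoint and accuracy are hypothetical. It covers only the training/test overlap part of rule 5; the ban on using test compounds to select amongst theories cannot be verified from the data alone.

```python
from typing import Dict, Iterable

# Hypothetical helpers illustrating rules 5 and 6; not the official scoring code.
# A prediction or true class is "+" (carcinogenic) or "-" (non-carcinogenic).


def check_disjoint(training_ids: Iterable[str], test_ids: Iterable[str]) -> None:
    """Rule 5 (part): raise if any test-set compound appears in the training set."""
    overlap = set(training_ids) & set(test_ids)
    if overlap:
        raise ValueError(f"training set contains test compounds: {sorted(overlap)}")


def accuracy(predictions: Dict[str, str], truth: Dict[str, str]) -> float:
    """Rule 6: (Tp + Tn) / Total, where Total is the size of the test set."""
    if set(predictions) != set(truth):
        raise ValueError("predictions must cover exactly the test-set compounds")
    tp = sum(1 for cid, cls in truth.items() if cls == "+" and predictions[cid] == "+")
    tn = sum(1 for cid, cls in truth.items() if cls == "-" and predictions[cid] == "-")
    return (tp + tn) / len(truth)


# Placeholder identifiers and classes, not real NTP outcomes.
truth = {"x1": "+", "x2": "-", "x3": "+", "x4": "-"}
preds = {"x1": "+", "x2": "+", "x3": "+", "x4": "-"}
check_disjoint(["x5", "x6"], truth)  # passes: no overlap with the test set
print(accuracy(preds, truth))  # 0.75
```

Here x1 and x3 are true positives and x4 is a true negative, so the accuracy is (2+1)/4 = 0.75. Requiring the prediction keys to match the test set exactly keeps Total equal to the number of test compounds, as rule 6 specifies.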

Predicting PTE-1

You can enter predictions for the 39 compounds in the PTE-1 data by pressing the appropriate button for each compound in the table below. CAS Id refers to the compound identifier in the NTP database. Prolog Id refers to the compound identifier within our carcinogenesis prediction experiments.
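CAS Registry Numbers end in a check digit, so a mistyped CAS Id can often be caught before predictions are matched to compounds. The rule is: weight the digits before the check digit 1, 2, 3, ... starting from the right, sum the products, and take the result modulo 10. The Python below is a minimal sketch of that check, assuming identifiers in the hyphenated form used in these tables; is_valid_cas is a hypothetical name, and the check is our illustration rather than part of the challenge materials.

```python
import re

# Hypothetical helper, not part of the challenge materials.
# A CAS Registry Number has the form NNNNNNN-NN-R: 2 to 7 digits,
# then 2 digits, then a single check digit R.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well formed and its check digit matches."""
    match = CAS_PATTERN.fullmatch(cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check


for cas in ["91-20-3", "91-20-4", "10026-24-1"]:
    print(cas, is_valid_cas(cas))  # True, False, True
```

For NAPHTHALENE (91-20-3) the weighted sum is 0*1 + 2*2 + 1*3 + 9*4 = 43, whose last digit matches the check digit 3. A weighted mod-10 check does not catch every possible typo, so a passing result is a sanity check, not a guarantee that the identifier is correct.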

Consortium name:

Contact E-mail address:

URL for description:

PTE-1 predictions

Example: if you predict + (i.e., carcinogenic) for a compound, select + in that compound's Prediction column.

CAS Id       Prolog Id   Compound Name                               Prediction
91-20-3      d296        NAPHTHALENE                                 + -
9005-65-6    d297        POLYSORBATE 80 (GLYCOL)                     + -
58-33-3      d298        PROMETHAZINE HYDROCHLORIDE                  + -
108-46-3     d299        RESORCINOL                                  + -
96-48-0      d300        GAMMA-BUTYROLACTONE                         + -
79-11-8      d302        MONOCHLOROACETIC ACID                       + -
100-02-7     d303        P-NITROPHENOL                               + -
1330-78-5    d304        TRICRESYL PHOSPHATE                         + -
120-32-1     d305        O-BENZYL-P-CHLOROPHENOL                     + -
3296-90-0    d306        2,2-BIS(BROMOMETHYL)-1,3-PROPANEDIOL        + -
75-65-0      d307        TERT-BUTYL ALCOHOL                          + -
119-84-6     d308        3,4-DIHYDROCOUMARIN                         + -
107-21-1     d309        ETHYLENE GLYCOL                             + -
298-59-9     d311        METHYLPHENIDATE HYDROCHLORIDE               + -
58-55-9      d312        THEOPHYLLINE                                + -
96-69-5      d313        4,4-THIOBIS(6-TERT-BUTYL-M-CRESOL)          + -
396-01-0     d314        TRIAMTERENE                                 + -
57-41-0      d315        5,5-DIPHENYLHYDANTOIN                       + -
1825-21-4    d316        PENTACHLOROANISOLE                          + -
10599-90-3   d317        CHLORAMINE                                  + -
81-11-8      d318        4,4'-DIAMINO-2,2'-STILBENEDISULFONIC ACID   + -
74-83-9      d319        METHYL BROMIDE                              + -
62-23-7      d320        P-NITROBENZOIC ACID                         + -
28407-37-6   d322        C.I. DIRECT BLUE 218                        + -
2425-85-6    d323        C.I. PIGMENT RED 3                          + -
6471-49-4    d324        C.I. PIGMENT RED 23                         + -
137-09-7     d325        2,4-DIAMINOPHENOL                           + -
103-90-2     d326        ACETAMINOPHEN (4-HYDROXYACETANILIDE)        + -
599-79-1     d327        SALICYLAZOSULFAPYRIDINE                     + -
1271-19-8    d328        TITANOCENE DICHLORIDE                       + -
6459-94-5    d329        C.I. ACID RED 114                           + -
2429-74-5    d330        C.I. DIRECT BLUE 15                         + -
91-64-5      d331        COUMARIN                                    + -
96-13-9      d332        2,3-DIBROMO-1-PROPANOL                      + -
119-93-7     d333        3,3'-DIMETHYLBENZIDINE                      + -
59820-43-8   d334        HC YELLOW 4                                 + -
100-01-6     d335        P-NITROANILINE                              + -
91-23-6      d336        O-NITROANISOLE                              + -
96-18-4      d337        1,2,3-TRICHLOROPROPANE                      + -

Explanatory Power: yes / no

Summary of results so far: PTE-1


Predicting PTE-2

You can enter predictions for the 30 compounds in the PTE-2 data by pressing the appropriate button for each compound in the table below. CAS Id refers to the compound identifier in the NTP database. Prolog Id refers to the compound identifier within our carcinogenesis prediction experiments.

Consortium name:

Contact E-mail address:

URL for description:

PTE-2 predictions

Example: if you predict + (i.e., carcinogenic) for a compound, select + in that compound's Prediction column.

CAS Id       Prolog Id   Compound Name                               Prediction
6533-68-2    t1          SCOPOLAMINE HYDROBROMIDE                    + -
76-57-3      t2          CODEINE                                     + -
147-47-7     t3          1,2-DIHYDRO-2,2,4-TRIMETHYLQUINOLINE        + -
75-52-5      t4          NITROMETHANE                                + -
109-99-9     t5          TETRAHYDROFURAN                             + -
1948-33-0    t6          T-BUTYLHYDROQUINONE                         + -
100-41-4     t7          ETHYLBENZENE                                + -
126-99-8     t8          CHLOROPRENE                                 + -
8003-22-3    t10         D & C YELLOW NO. 11                         + -
78-84-2      t11         ISOBUTYRALDEHYDE                            + -
127-00-4     t13         1-CHLORO-2-PROPANOL                         + -
111-42-2     t14         DIETHANOLAMINE                              + -
77-09-8      t15         PHENOLPHTHALEIN                             + -
110-86-1     t16         PYRIDINE                                    + -
1300-72-7    t17         XYLENESULFONIC ACID                         + -
98-00-0      t18         FURFURYL ALCOHOL                            + -
125-33-7     t19         PRIMACLONE                                  + -
111-76-2     t20         ETHYLENE GLYCOL MONOBUTYL ETHER             + -
115-11-7     t22         ISOBUTENE                                   + -
93-15-2      t23         METHYLEUGENOL                               + -
434-07-1     t24         OXYMETHOLONE                                + -
84-65-1      t25         ANTHRAQUINONE                               + -
518-82-1     t26         EMODIN                                      + -
5392-40-5    t27         CITRAL                                      + -
104-55-2     t29         CINNAMALDEHYDE                              + -
10026-24-1   t9          COBALT SULFATE HEPTAHYDRATE                 + -
1313-27-5    t12         MOLYBDENUM TRIOXIDE                         + -
1303-00-0    t21         GALLIUM ARSENIDE                            + -
7632-00-0    t28         SODIUM NITRITE                              + -
1314-62-1    t30         VANADIUM PENTOXIDE                          + -

Explanatory Power: yes / no

Summary of results so far: PTE-2


Some relevant information


Machine Learning at the Computing Laboratory