Theories for Mutagenicity: A Study in First-Order and Feature-Based Induction
Artificial Intelligence, 1996
Cited by 141 (29 self)
A classic problem from chemistry is used to test a conjecture that in domains for which data are most naturally represented by graphs, theories constructed with Inductive Logic Programming (ILP) will significantly outperform those using simpler feature-based methods. One area that has long been associated with graph-based or structural representation and reasoning is organic chemistry. In this field, we consider the problem of predicting the mutagenic activity of small molecules: a property that is related to carcinogenicity, and an important consideration in developing less hazardous drugs. By providing an ILP system with progressively more structural information concerning the molecules, we compare the predictive power of the logical theories constructed against benchmarks set by regression, neural, and tree-based methods.
1 Introduction
Constructing theories to explain observations occupies many of the creative hours of scientists and engineers. Programs from the field of Inductiv...
Structural Regression Trees
1996
Cited by 60 (10 self)
In many real-world domains the task of machine learning algorithms is to learn a theory predicting numerical values. In particular several standard test domains used in Inductive Logic Programming (ILP) are concerned with predicting numerical values from examples and relational and mostly non-determinate background knowledge. However, so far no ILP algorithm except one can predict numbers and cope with non-determinate background knowledge. (The only exception is a covering algorithm called FORS.) In this paper we present Structural Regression Trees (SRT), a new algorithm which can be applied to the above class of problems by integrating the statistical method of regression trees into ILP. SRT constructs a tree containing a literal (an atomic formula or its negation) or a conjunction of literals in each node, and assigns a numerical value to each leaf. SRT provides more comprehensible results than purely statistical methods, and can be applied to a class of problems most other ILP syste...
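The regression-tree step that SRT brings into ILP, choosing the node test whose split best reduces squared error, can be sketched as follows. This is a simplified illustration, not SRT itself: each example is assumed to be a set of ground facts with a numeric target, each candidate test is a single hypothetical existential literal built by `has_fact` (no conjunctions, no variables shared between nodes), and each leaf is assumed to predict the mean target of its partition.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Fact = Tuple[str, ...]
Example = Tuple[FrozenSet[Fact], float]  # (ground facts, numeric target)
Test = Callable[[FrozenSet[Fact]], bool]


def has_fact(pred: str, value: str) -> Test:
    """Hypothetical literal test: true if some fact pred(..., value) is present."""
    def test(facts: FrozenSet[Fact]) -> bool:
        return any(f[0] == pred and f[-1] == value for f in facts)
    return test


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def sse(values: Sequence[float]) -> float:
    """Squared error of predicting the partition mean (the assumed leaf value)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(examples: List[Example], tests: List[Test]) -> Optional[Tuple[Test, float]]:
    """Return the test whose true/false partition has the lowest total SSE."""
    best: Optional[Tuple[Test, float]] = None
    for t in tests:
        yes = [y for facts, y in examples if t(facts)]
        no = [y for facts, y in examples if not t(facts)]
        if not yes or not no:  # skip tests that do not split the examples
            continue
        score = sse(yes) + sse(no)
        if best is None or score < best[1]:
            best = (t, score)
    return best
```

Applied recursively to each partition, this selection yields a tree with a literal test in each internal node and a numerical value at each leaf, which is the shape of model the abstract describes.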
Relating chemical activity to structure: an examination of ILP successes
1995
Cited by 47 (5 self)
An important test-bed for Inductive Logic Programming (ILP) systems has been the task of relating the activity of chemical compounds to their structure. In this paper we examine the structure-activity problems that have been addressed by ILP, and evaluate empirically the extent to which a first-order representation was required. This is done by comparing ILP theories against those constructed by standard linear regression and a decision-tree learner. When propositional encodings are feasible for the feature-based algorithms, we present evidence that they are capable of matching the predictive accuracies of an ILP theory. However, as the diversity of compounds considered increases, propositional encodings become intractable. In such cases, our results provide support for the claim that ILP programs will continue to construct accurate, understandable theories. Based on this evidence, we propose future work to realise fully the potential of ILP in structure-activity problems.
Automated discovery of structural signatures of protein fold and function
Journal of Molecular Biology, 2001
Cited by 17 (5 self)
Within the collection of determined protein structures, there is a wealth of principles governing the complex sequence-conformation-function relationships. Historically, many of these principles ...
A new approach to pharmacophore mapping and QSAR analysis using Inductive Logic Programming. Application to Thermolysin inhibitors and Glycogen Phosphorylase Inhibitors
Cited by 13 (1 self)
A key problem in QSAR is the selection of appropriate descriptors to form accurate regression equations for the compounds under study. Inductive Logic Programming (ILP) algorithms are a class of machine learning algorithms that have been successfully applied to a number of SAR problems. Unlike other QSAR methods, which use attributes to describe chemical structure, ILP uses relations. This gives ILP three advantages: it does not require explicit superimposition of individual compounds in a dataset, it deals naturally with multiple conformations, and it uses a language much closer to that normally used by chemists. We unify ILP and standard regression techniques to give a QSAR method that combines the strength of ILP at describing steric structure with the familiarity and power of regression methods. Complex pharmacophores correlating with activity were identified and used as new indicator variables, along with the Comparative Molecular Field Analysis (CoMFA) prediction, to form predictive regression equations. We compared the formation of 3D-QSARs using standard CoMFA with the use of ILP on the well-studied thermolysin zinc protease inhibitor dataset and on a glycogen phosphorylase (GP) inhibitor dataset. In each case the addition of ILP variables produced statistically better results than the CoMFA analysis (P < 0.01 for the thermolysin dataset and P < 0.05 for the GP dataset). Moreover, the new ILP variables were found not to increase the complexity of the final QSAR equations and gave possible insight into the binding mechanism of the ligand-protein complex under study.
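The final regression step, combining the CoMFA prediction with ILP-derived pharmacophore indicators, can be sketched as ordinary least squares on a design matrix with an intercept, the CoMFA prediction, and one 0/1 column per pharmacophore. This is a minimal sketch under stated assumptions: the indicator values are taken as already computed (finding the pharmacophores is the ILP part and is not shown), the helper names `design_row` and `ols` are hypothetical, and the fit uses plain normal equations rather than whatever regression software the authors used.

```python
from typing import List, Sequence


def design_row(comfa_pred: float, indicators: Sequence[bool]) -> List[float]:
    """Hypothetical row: [intercept, CoMFA prediction, one 0/1 per pharmacophore]."""
    return [1.0, comfa_pred] + [1.0 if flag else 0.0 for flag in indicators]


def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, p))) / A[i][i]
    return w
```

In this form, a fitted weight on an indicator column reads directly as the estimated change in activity when that pharmacophore is present, holding the CoMFA prediction fixed.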
Mining for Causes of Cancer: Machine Learning Experiments at Various Levels of Detail
In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), Menlo Park, CA, 1997
Cited by 9 (4 self)
This paper presents, from a methodological point of view, first results of an interdisciplinary project in scientific data mining. We analyze data about the carcinogenicity of chemicals derived from the carcinogenesis bioassay program, a long-term research study performed by the US National Institute of Environmental Health Sciences. The database contains detailed descriptions of 6823 tests performed with more than 330 compounds and animals of different species, strains and sexes. The chemical structures are described at the atom and bond level, and in terms of various relevant structural properties. The goal of this paper is to investigate the effects that various levels of detail and amounts of information have on the resulting hypotheses, both quantitatively and qualitatively. We apply relational and propositional machine learning algorithms to learning problems formulated as regression or as classification tasks. In addition, these experiments have been conducted with two learning ...
Top-Down Induction of Model Trees with Regression and Splitting Nodes
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
Cited by 5 (3 self)
Model trees are an extension of regression trees that associate leaves with multiple regression models. In this paper, a method for the data-driven construction of model trees is presented, namely, the Stepwise Model Tree Induction (SMOTI) method. Its main characteristic is the induction of trees with two types of nodes: regression nodes, which perform only straight-line regression, and splitting nodes, which partition the feature space. The multiple linear model associated with each leaf is then built stepwise by combining straight-line regressions reported along the path from the root to the leaf. In this way, internal regression nodes contribute to the definition of multiple models and have a "global" effect, while straight-line regressions at leaves have only "local" effects.
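How straight-line regressions along a root-to-leaf path can be composed into one multiple linear model can be sketched for two variables: a regression node fits y on x1, the linear effect of x1 is removed from both y and the remaining variable x2, and a straight-line regression of the residuals at the leaf is folded back into a single equation. This follows the classical stepwise construction of a multiple regression from simple regressions; treating it as SMOTI's exact procedure is an assumption, splitting nodes are omitted, and the function names are hypothetical.

```python
from typing import List, Tuple


def simple_fit(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares straight line y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residuals(x: List[float], y: List[float], a: float, b: float) -> List[float]:
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def two_step_model(x1: List[float], x2: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Compose a regression node on x1 and a leaf regression on x2 into y ~ w0 + w1*x1 + w2*x2."""
    # Regression node: straight line of y on x1.
    a0, b0 = simple_fit(x1, y)
    ey = residuals(x1, y, a0, b0)
    # Remove x1's linear effect from the remaining variable x2.
    c0, d0 = simple_fit(x1, x2)
    ex2 = residuals(x1, x2, c0, d0)
    # Leaf: straight line of the residual target on the residual variable.
    a1, b1 = simple_fit(ex2, ey)
    # Substitute ex2 = x2 - c0 - d0*x1 to obtain one multiple linear model.
    return a0 + a1 - b1 * c0, b0 - b1 * d0, b1
```

The x1 coefficient of the leaf model (b0 - b1*d0) depends on the regression made higher up the path, which is one way to read the abstract's point that internal regression nodes have a "global" effect.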
Relating Chemical Structure to Activity: An Application of the Neural Folding Architecture
In Proceedings of the Fifth International GI-Workshop on Fuzzy-Neuro Systems '98, 1998
Cited by 2 (0 self)
This paper is based on the neural folding architecture (FA). The FA is a recurrent neural network architecture which is especially suited for adaptive structure processing, i.e. learning approximations of mappings from symbolic term structures to ℝ^n. The main objective of this paper is to demonstrate that the FA can be successfully applied to approximate quantitative structure activity relationships (QSARs), which play an important role during a drug design process. Several approaches for the conversion of a QSAR problem to suitable learning tasks for the FA are presented. Finally, the FA is applied to a well-known QSAR benchmark, viz. the inhibition of E. coli dihydrofolate reductase by triazines. The achieved results are compared with results of other machine learning approaches on the same QSAR benchmark, and prove that the FA is significantly better. Keywords: recurrent neural networks, folding architecture, drug design, quantitative structure activity relationships, inhibit...
Four Suggestions and a Rule Concerning the Application of ILP
Cited by 1 (0 self)
Since the late 1980s there has been a sustained research effort directed at investigating the application of Inductive Logic Programming (ILP) to problems in biology and chemistry. This essay is a personal view of some interesting issues that have arisen during my involvement in this enterprise. Many of the concerns of the broader field of Knowledge Discovery from Databases manifest themselves during the application of ILP to analyse bio-chemical data. Addressing them in this microcosm has given me some directions on the wider application of ILP, and I present these here in the form of four suggestions and one rule. Readers are invited to consider them in the context of a hypothetical Recommended Codes and Practices for the application of ILP.

