Error Reduction through Learning Multiple Descriptions
, 1996
Cited by 114 (3 self)
Learning multiple descriptions for each class in the data has been shown to reduce generalization error but the amount of error reduction varies greatly from domain to domain. This paper presents a novel empirical analysis that helps to understand this variation. Our hypothesis is that the amount of error reduction is linked to the "degree to which the descriptions for a class make errors in a correlated manner." We present a precise and novel definition for this notion and use twenty-nine data sets to show that the amount of observed error reduction is negatively correlated with the degree to which the descriptions make errors in a correlated manner. We empirically show that it is possible to learn descriptions that make less correlated errors in domains in which many ties in the search evaluation measure (e.g. information gain) are experienced during learning. The paper also presents results that help to understand when and why multiple descriptions are a help (irrelevant attribute...
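To make the idea of correlated errors concrete, here is a minimal Python sketch. The abstract does not state the paper's definition, so the measure below is an assumption chosen for illustration: for each pair of models, the fraction of examples on which both models give the same wrong prediction, averaged over all pairs. The function name `pairwise_error_correlation` and the toy data are hypothetical.

```python
from itertools import combinations


def pairwise_error_correlation(predictions, labels):
    """Average, over all model pairs, of the fraction of examples on which
    both models make the same wrong prediction.

    predictions: list of per-model prediction lists, each as long as labels.
    labels: list of true class labels.
    NOTE: an illustrative measure, not necessarily the paper's definition.
    """
    if len(predictions) < 2:
        raise ValueError("need at least two models")
    n = len(labels)
    rates = []
    for a, b in combinations(predictions, 2):
        # Count examples where the two models agree with each other
        # but disagree with the true label.
        same_error = sum(
            1 for pa, pb, y in zip(a, b, labels) if pa == pb and pa != y
        )
        rates.append(same_error / n)
    return sum(rates) / len(rates)


# Toy data (made up): three models, four examples.
labels = ["x", "y", "x", "y"]
preds = [
    ["x", "x", "x", "y"],  # wrong on the second example
    ["x", "x", "y", "y"],  # wrong on the second and third examples
    ["y", "y", "x", "y"],  # wrong on the first example
]
score = pairwise_error_correlation(preds, labels)
```

Under this measure a lower score means the models tend to fail on different examples, which is the situation in which combining them can be expected to help most.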
Discovering Generalized Episodes Using Minimal Occurrences
- In Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and Data Mining
, 1996
Cited by 112 (8 self)
Sequences of events are an important special form of data that arises in several contexts, including telecommunications, user interface studies, and epidemiology. We present a general and flexible framework for specifying classes of generalized episodes. These are recurrent combinations of events satisfying certain conditions. The framework can be instantiated to a wide variety of applications by selecting suitable primitive conditions. We present algorithms for discovering frequently occurring episodes and episode rules. The algorithms are based on the use of minimal occurrences of episodes; this makes it possible to evaluate confidences of a wide variety of rules using only a single analysis pass. We present empirical results on the behavior of the algorithms on events stemming from a WWW log.
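As a rough illustration of what a minimal occurrence is, the Python sketch below handles only the simplest case, a plain serial episode (event types that must appear in a fixed order), and not the generalized episodes or rule confidences the abstract refers to. It assumes one event per timestamp with strictly increasing times. An occurrence interval is treated as minimal when no proper subinterval also contains the episode. The function name `minimal_occurrences` and the sample event sequence are hypothetical.

```python
def minimal_occurrences(events, episode):
    """Return the minimal occurrences (start_time, end_time) of a serial episode.

    events: list of (time, event_type) pairs with strictly increasing times.
    episode: non-empty list of event types that must occur in this order.
    """
    candidates = []
    for i, (start, kind) in enumerate(events):
        if kind != episode[0]:
            continue
        pos = 1      # index of the next episode element to match
        end = start  # time of the most recently matched event
        # Greedy forward scan: gives the earliest end for this start.
        for time, k in events[i + 1:]:
            if pos == len(episode):
                break
            if k == episode[pos]:
                pos += 1
                end = time
        if pos == len(episode):
            candidates.append((start, end))
    # Earliest ends are non-decreasing in the start time, so a candidate is
    # non-minimal exactly when a later start reaches the same end.
    latest_start_for_end = {}
    for start, end in candidates:
        latest_start_for_end[end] = start  # later starts overwrite earlier ones
    return sorted((s, e) for e, s in latest_start_for_end.items())


# Made-up event sequence: (time, event type).
events = [(1, "A"), (2, "A"), (3, "B"), (5, "A"), (6, "C"), (7, "B"), (9, "C")]
found = minimal_occurrences(events, ["A", "B", "C"])
```

Here the occurrence starting at time 1 is discarded because the one starting at time 2 ends at the same time and lies inside it.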
Controlling the Complexity of Learning in Logic through Syntactic and Task-Oriented Models
- INDUCTIVE LOGIC PROGRAMMING
, 1992
Cited by 95 (7 self)
Due to the inadequacy of attribute-only representations for many learning problems, there is now a renewed interest in algorithms employing first-order logic or restricted variants thereof as their knowledge representation. In this paper, we give a brief overview of the dimensions along which the complexity of learning in such representations can be controlled. We then present RDT, a model-based learning algorithm for function-free Horn clauses with negation that introduces two new means of complexity control, namely the use of syntactic rule models, and the use of a task-oriented domain topology. We briefly describe some preliminary application results of RDT within the knowledge acquisition system MOBAL, and present directions of further research.
Learning Probabilistic Models of Link Structure
- Journal of Machine Learning Research
, 2002
Cited by 89 (11 self)
Most real-world data is heterogeneous and richly interconnected. Examples include the Web, hypertext, bibliometric data and social networks. In contrast, most statistical learning methods work with "flat" data representations, forcing us to convert our data into a form that loses much of the link structure. The recently introduced framework of probabilistic relational models (PRMs) embraces the object-relational nature of structured data by capturing probabilistic interactions between attributes of related entities. In this paper, we extend this framework by modeling interactions between the attributes and the link structure itself. An advantage of our approach is a unified generative model for both content and relational structure. We propose two mechanisms for representing a probabilistic distribution over link structures: reference uncertainty and existence uncertainty. We describe the appropriate conditions for using each model and present learning algorithms for each. We present experimental results showing that the learned models can be used to predict link structure and, moreover, the observed link structure can be used to provide better predictions for the attributes in the model.
Mutagenesis: ILP experiments in a non-determinate biological domain
- Proceedings of the 4th International Workshop on Inductive Logic Programming, volume 237 of GMD-Studien
, 1994
Cited by 85 (7 self)
This paper describes the use of Inductive Logic Programming as a scientific assistant. In particular, it details the application of the ILP system Progol to discovering structural features that can result in mutagenicity in small molecules. To discover these concepts, Progol only had access to the atomic and bond structure of the molecules. With such a primitive description and no further assistance from chemists, Progol corroborated some existing knowledge and proposed a new structural alert for mutagenicity in compounds. In the process, the experiments act as a case study in which, even with extremely limited background knowledge, an Inductive Logic Programming tool firstly complements a complex statistical model developed by skilled chemists and secondly continues to provide understandable theories when the statistical model fails. The experiments also constitute the first demonstrations of a prototype of the Progol system. Progol allows the construction of hypotheses with bounded non-determinacy by performing a best-first search within the subsumption lattice. The results here provide evidence that such searches are both viable and desirable.
Inductive Constraint Logic
, 1995
Cited by 80 (19 self)
A novel approach to learning first order logic formulae from positive and negative examples is presented. Whereas present inductive logic programming systems employ examples as true and false ground facts (or clauses), we view examples as interpretations which are true or false for the target theory. This viewpoint allows us to reconcile the inductive logic programming paradigm with classical attribute value learning in the sense that the latter is a special case of the former. Because of this property, we are able to adapt AQ and CN2 type algorithms in order to enable learning of full first order formulae. However, whereas classical learning techniques have concentrated on concept representations in disjunctive normal form, we will use a clausal representation, which corresponds to a conjunctive normal form where each conjunct forms a constraint on positive examples. This representation duality also reverses the role of positive and negative examples, both in the heuristics and in the a...
Parameter learning of logic programs for symbolic-statistical modeling
- Journal of Artificial Intelligence Research
, 2001
Cited by 77 (18 self)
We propose a logical/mathematical framework for statistical parameter learning of parameterized logic programs, i.e. definite clause programs containing probabilistic facts with a parameterized distribution. It extends the traditional least Herbrand model semantics in logic programming to distribution semantics, possible world semantics with a probability distribution which is unconditionally applicable to arbitrary logic programs including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM algorithm, the graphical EM algorithm, that runs for a class of parameterized logic programs representing sequential decision processes where each decision is exclusive and independent. It runs on a new data structure called support graphs describing the logical relationship between observations and their explanations, and learns parameters by computing inside and outside probability generalized for logic programs. The complexity analysis shows that when combined with OLDT search for all explanations for observations, the graphical EM algorithm, despite its generality, has the same time complexity as existing EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside algorithm for PCFGs, and the one for singly connected Bayesian networks that have been developed independently in each research field. Learning experiments with PCFGs using two corpora of moderate size indicate that the graphical EM algorithm can significantly outperform the Inside-Outside algorithm.
Learning the CLASSIC Description Logic: Theoretical and Experimental Results
- In Principles of Knowledge Representation and Reasoning: Proceedings of the Fourth International Conference (KR94
, 1994
Cited by 75 (6 self)
We present a series of theoretical and experimental results on the learnability of description logics. We first extend previous formal learnability results on simple description logics to C-Classic, a description logic expressive enough to be practically useful. We then experimentally evaluate two extensions of a learning algorithm suggested by the formal analysis. The first extension learns C-Classic descriptions from individuals. (The formal results assume that examples are themselves descriptions.) The second extension learns disjunctions of C-Classic descriptions from individuals. The experiments, which were conducted using several hundred target concepts from a number of domains, indicate that both extensions reliably learn complex natural concepts. 1 INTRODUCTION One well-known family of formalisms for representing knowledge are description logics, sometimes also called terminological logics or KL-ONE-type languages. Description logics have been applied in a number of contexts...
Relational Markov Models and their Application to Adaptive Web Navigation
, 2002
Cited by 74 (7 self)
Relational Markov models (RMMs) are a generalization of Markov models where states can be of different types, with each type described by a different set of variables. The domain of each variable can be hierarchically structured, and shrinkage is carried out over the cross product of these hierarchies. RMMs make effective learning possible in domains with very large and heterogeneous state spaces, given only sparse data. We apply them to modeling the behavior of web site users, improving prediction in our PROTEUS architecture for personalizing web sites. We present experiments on an e-commerce and an academic web site showing that RMMs are substantially more accurate than alternative methods, and make good predictions even when applied to previously-unvisited parts of the site.
Relational Learning Techniques for Natural Language Information Extraction
, 1998
Cited by 73 (4 self)
The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This paper presents a novel rule representation specific to natural language and a learning system, Rapier, which learns information extraction rules. Rapier takes pairs of documents and filled templates indicating the information to be ext...

