Fast Effective Rule Induction
1995. Cited by 800 (19 self)
Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recently-proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error rates higher than those of C4.5 and C4.5rules. We then propose a number of modifications resulting in an algorithm RIPPERk that is very competitive with C4.5rules with respect to error rates, but much more efficient on large samples. RIPPERk obtains error rates lower than or equivalent to C4.5rules on 22 of 37 benchmark problems, scales nearly linearly with the number of training examples, and can efficiently process noisy datasets containing hundreds of thousands of examples.
Learning to Extract Symbolic Knowledge from the World Wide Web
1998. Cited by 290 (24 self)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable world wide knowledge base whose content mirrors that of the World Wide Web. Such a ...
Learning to Construct Knowledge Bases from the World Wide Web
2000. Cited by 187 (3 self)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs. The first is an ontology that defines the classes (e.g., company, person, employee, product) and relations (e.g., employed_by, produced_by) of interest when creating the knowledge base. The second is a set of training data consisting of labeled regions of hypertext that represent instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This article describes our general a...
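
The two inputs named in this abstract (an ontology and a set of labeled hypertext regions) could be represented roughly as follows. This is only an illustrative sketch, not the authors' system: the class names `Relation`, `Ontology`, and `LabeledRegion`, the argument types assigned to `employed_by` and `produced_by`, and the example URL are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    name: str
    domain: str  # class of the first argument (assumed)
    range: str   # class of the second argument (assumed)


@dataclass
class Ontology:
    classes: List[str]
    relations: List[Relation] = field(default_factory=list)

    def validate(self) -> None:
        """Check that every relation only mentions declared classes."""
        for rel in self.relations:
            for cls in (rel.domain, rel.range):
                if cls not in self.classes:
                    raise ValueError(f"{rel.name}: unknown class {cls!r}")


@dataclass(frozen=True)
class LabeledRegion:
    url: str               # page containing the region (hypothetical field)
    span: Tuple[int, int]  # character offsets of the region (hypothetical field)
    label: str             # name of a class from the ontology


# Classes and relation names are taken from the abstract;
# the domain/range pairings are guesses for illustration.
ontology = Ontology(
    classes=["company", "person", "employee", "product"],
    relations=[
        Relation("employed_by", "employee", "company"),
        Relation("produced_by", "product", "company"),
    ],
)
ontology.validate()

training_data = [LabeledRegion("http://example.org/staff.html", (120, 134), "employee")]
```

A learner would then take `ontology` and `training_data` as its inputs; how it generalizes from them is not covered by this sketch.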
CPAR: Classification based on Predictive Association Rules
2003. Cited by 104 (3 self)
Recent studies in data mining have proposed a new classification approach, called associative classification, which, according to several reports, such as [7, 6], achieves higher classification accuracy than traditional classification approaches such as C4.5. However, the approach also suffers from two major deficiencies: (1) it generates a very large number of association rules, which leads to high processing overhead; and (2) its confidence-based rule evaluation measure may lead to overfitting.
Relational Learning Techniques for Natural Language Information Extraction
1998. Cited by 73 (4 self)
The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This paper presents a novel rule representation specific to natural language and a learning system, Rapier, which learns information extraction rules. Rapier takes pairs of documents and filled templates indicating the information to be ext...
A Flexible Learning System for Wrapping Tables and Lists in HTML Documents
In International World Wide Web Conference, 2002. Cited by 71 (3 self)
... this paper we will discuss some of the more important representational issues for wrapper learners, focusing on the specific problem of extracting text from web pages. We argue that pure DOM- or token-based representations of web pages are inadequate for the purpose of learning wrappers ...
Induction of First-Order Decision Lists: Results on Learning the Past Tense of English Verbs
Journal of Artificial Intelligence Research, 1995. Cited by 68 (16 self)
This paper presents a method for inducing logic programs from examples that learns a new class of concepts called first-order decision lists, defined as ordered lists of clauses each ending in a cut. The method, called Foidl, is based on Foil (Quinlan, 1990) but employs intensional background knowledge and avoids the need for explicit negative examples. It is particularly useful for problems that involve rules with specific exceptions, such as learning the past-tense of English verbs, a task widely studied in the context of the symbolic/connectionist debate. Foidl is able to learn concise, accurate programs for this problem from significantly fewer examples than previous methods (both connectionist and symbolic).
1. Introduction
Inductive logic programming (ILP) is a growing subtopic of machine learning that studies the induction of Prolog programs from examples in the presence of background knowledge (Muggleton, 1992; Lavrac & Dzeroski, 1994). Due to the expressiveness of first-order...
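
The decision-list idea in this abstract (ordered rules, with specific exceptions placed ahead of general ones and evaluation stopping at the first match) can be illustrated with a small Python sketch. It is not Foidl: the rules below are hand-written rather than induced, they operate on spelling only, and the names `RULES` and `past_tense` are hypothetical.

```python
from typing import Callable, List, Tuple

# A rule pairs a test on the verb with a transformation.
# Order matters: exceptions come before general rules, and evaluation
# stops at the first rule that applies, which plays the role of the cut.
Rule = Tuple[Callable[[str], bool], Callable[[str], str]]

RULES: List[Rule] = [
    (lambda v: v == "go", lambda v: "went"),         # specific exception
    (lambda v: v == "sing", lambda v: "sang"),       # specific exception
    (lambda v: v.endswith("e"), lambda v: v + "d"),  # like -> liked
    (lambda v: True, lambda v: v + "ed"),            # default rule
]


def past_tense(verb: str) -> str:
    """Apply the first matching rule and ignore all later ones."""
    for applies, transform in RULES:
        if applies(verb):
            return transform(verb)
    raise ValueError(f"no rule applies to {verb!r}")
```

Because the default rule sits last, adding a new exception only requires inserting one rule above it; no existing rule needs an explicit "unless" condition.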
Relational Learning with Statistical Predicate Invention: Better Models for Hypertext
Machine Learning, 2001. Cited by 55 (0 self)
We present a new approach to learning hypertext classifiers that combines a statistical text-learning method with a relational rule learner. This approach is well suited to learning in hypertext domains because its statistical component allows it to characterize text in terms of word frequencies, whereas its relational component is able to describe how neighboring documents are related to each other by hyperlinks that connect them. We evaluate our approach by applying it to tasks that involve learning definitions for (i) classes of pages, (ii) particular relations that exist between pairs of pages, and (iii) locating a particular class of information in the internal structure of pages. Our experiments demonstrate that this new approach is able to learn more accurate classifiers than either of its constituent methods alone.
Keywords: Relational Learning, Text Categorization, Predicate Invention, Naive Bayes
Improving the efficiency of inductive logic programming through the use of query packs
Journal of Artificial Intelligence Research, 2002. Cited by 54 (19 self)
Inductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, in order for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is described for executing such query packs. A complexity analysis shows that considerable efficiency improvements can be achieved through the use of this query pack execution mechanism. This claim is supported by empirical results obtained by incorporating support for query pack execution in two existing learning systems.
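
To give a feel for why grouping similar queries can save work, here is a heavily simplified Python sketch. It assumes propositional queries (each query is a conjunction of named boolean tests on one example), whereas the paper concerns first-order queries executed by an ILP system. The names `build_pack`, `run_pack`, and `run_separately` and the example data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Query = Tuple[str, ...]                     # conjunction of named tests
Tests = Dict[str, Callable[[dict], bool]]   # test name -> predicate


def build_pack(queries: List[Query]) -> dict:
    """Merge queries that share a prefix into a trie (the 'pack')."""
    root: dict = {"children": {}, "ends": []}
    for query in queries:
        node = root
        for name in query:
            node = node["children"].setdefault(name, {"children": {}, "ends": []})
        node["ends"].append(query)
    return root


def run_pack(pack: dict, tests: Tests, example: dict) -> Tuple[List[Query], int]:
    """Return the succeeding queries and the number of test calls made."""
    succeeded: List[Query] = []
    calls = 0

    def visit(node: dict) -> None:
        nonlocal calls
        succeeded.extend(node["ends"])
        for name, child in node["children"].items():
            calls += 1                   # each shared prefix step is tested once
            if tests[name](example):
                visit(child)

    visit(pack)
    return succeeded, calls


def run_separately(queries: List[Query], tests: Tests, example: dict) -> Tuple[List[Query], int]:
    """Baseline: evaluate every query from scratch."""
    succeeded: List[Query] = []
    calls = 0
    for query in queries:
        ok = True
        for name in query:
            calls += 1
            if not tests[name](example):
                ok = False
                break
        if ok:
            succeeded.append(query)
    return succeeded, calls


tests: Tests = {
    "adult": lambda e: e["age"] >= 18,
    "student": lambda e: e["student"],
    "employed": lambda e: e["employed"],
}
queries: List[Query] = [
    ("adult", "student"),
    ("adult", "employed"),
    ("adult", "student", "employed"),
]
example = {"age": 30, "student": True, "employed": False}

packed_result, packed_calls = run_pack(build_pack(queries), tests, example)
plain_result, plain_calls = run_separately(queries, tests, example)
```

On this toy input both strategies report the same succeeding query, but the packed run makes 4 test calls against 7 for the separate runs, because the shared `adult` and `student` prefixes are evaluated only once.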
Induction of Logic Programs: FOIL and Related Systems
New Generation Computing, 1995. Cited by 54 (1 self)
FOIL is a first-order learning system that uses information in a collection of relations to construct theories expressed in a dialect of Prolog. This paper provides an overview of the principal ideas and methods used in the current version of the system, including two recent additions. We present examples of tasks tackled by FOIL and of systems that adapt and extend its approach.
1. Introduction
All symbolic machine learning leads to the formulation or modification of theories, so the language in which theories are expressed is an important consideration. First-order theory languages have been used for at least thirty years, as documented by Sammut [1993]. Explanation-based generalisation systems [Mitchell, Keller and Kedar-Cabelli, 1986; DeJong and Mooney, 1986] have always required them, but the early and influential work of Shapiro [1983] and Sammut and Banerji [1986] also employed them in an inductive learning context. Nevertheless, first-order empirical learning, including...

