Results 1 - 10
of
47
Fast Effective Rule Induction
, 1995
"... Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recently-proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error r ..."
Abstract
-
Cited by 800 (19 self)
- Add to MetaCart
Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recently-proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error rates higher than those of C4.5 and C4.5rules. We then propose a number of modifications resulting in an algorithm RIPPERk that is very competitive with C4.5rules with respect to error rates, but much more efficient on large samples. RIPPERk obtains error rates lower than or equivalent to C4.5rules on 22 of 37 benchmark problems, scales nearly linearly with the number of training examples, and can efficiently process noisy datasets containing hundreds of thousands of examples.
Dynamically discovering likely program invariants to support program evolution
- IEEE Transactions on Software Engineering
, 2001
"... Explicitly stated program invariants can help programmers by identifying program properties that must be preserved when modifying code. In practice, however, these invari-ants are usually implicit. An alternative to expecting pro-grammers to fully annotate code with invariants is to au-tomatically i ..."
Abstract
-
Cited by 467 (63 self)
- Add to MetaCart
Explicitly stated program invariants can help programmers by identifying program properties that must be preserved when modifying code. In practice, however, these invari-ants are usually implicit. An alternative to expecting pro-grammers to fully annotate code with invariants is to au-tomatically infer invariants from the program itself. This research focuses on dynamic techniques for discovering in-variants from execution traces. This paper reports two results. First, it describes techniques for dynamically discovering invariants, along with an instru-menter and an inference engine that embody these tech-niques. Second, it reports on the application of the engine to two sets of target programs. In programs from Gries’s work on program derivation, we rediscovered predefined in-variants. In a C program lacking explicit invariants, we dis-covered invariants that assisted a software evolution task.
Clausal Discovery
- Machine Learning
, 1996
"... The clausal discovery engine Claudien is presented. Claudien is an inductive logic programming engine that fits in the knowledge discovery in databases and data mining paradigm as it discovers regularities that are valid in data. As such Claudien performs a novel induction task, which is called char ..."
Abstract
-
Cited by 170 (32 self)
- Add to MetaCart
The clausal discovery engine Claudien is presented. Claudien is an inductive logic programming engine that fits in the knowledge discovery in databases and data mining paradigm as it discovers regularities that are valid in data. As such Claudien performs a novel induction task, which is called characteristic induction from closed observations, and which is related to existing formalizations of induction in logic. In characterising induction from closed observations, the regularities are represented by clausal theories, and the data using Herbrand interpretations. Claudien also employs a novel declarative bias mechanism to define the set of clauses that may appear in a hypothesis. Keywords : Inductive Logic Programming, Knowledge Discovery in Databases, Data Mining, Learning, Induction, Semantics for Induction, Logic of Induction, Parallel Learning. 1 Introduction Despite the fact that the areas of knowledge discovery in databases [Fayyad et al., 1995] and inductive logic programmin...
Separate-and-conquer rule learning
- Artificial Intelligence Review
, 1999
"... This paper is a survey of inductive rule learning algorithms that use a separate-and-conquer strategy. This strategy can be traced back to the AQ learning system and still enjoys popularity as can be seen from its frequent use in inductive logic programming systems. We will put this wide variety of ..."
Abstract
-
Cited by 118 (29 self)
- Add to MetaCart
This paper is a survey of inductive rule learning algorithms that use a separate-and-conquer strategy. This strategy can be traced back to the AQ learning system and still enjoys popularity as can be seen from its frequent use in inductive logic programming systems. We will put this wide variety of algorithms into a single framework and analyze them along three different dimensions, namely their search, language and overfitting avoidance biases.
A Flexible Learning System for Wrapping Tables and Lists in HTML Documents
- In International World Wide Web Conference
, 2002
"... this paper we will discuss some of the more important representational issues for wrapper learners, focusing on the specific problem of extracting text from web pages. We argue that pure DOM- or token-based representations of web pages are inadequate for the purpose of learning wrappers ..."
Abstract
-
Cited by 71 (3 self)
- Add to MetaCart
this paper we will discuss some of the more important representational issues for wrapper learners, focusing on the specific problem of extracting text from web pages. We argue that pure DOM- or token-based representations of web pages are inadequate for the purpose of learning wrappers
The role of Occam’s Razor in knowledge discovery
- Data Mining and Knowledge Discovery
, 1999
"... Abstract. Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor ” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite di ..."
Abstract
-
Cited by 70 (1 self)
- Add to MetaCart
Abstract. Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor ” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy–comprehensibility trade-off.
First order jk-clausal theories are PAC-learnable
- Artificial Intelligence
, 1994
"... We present positive PAC-learning results for the nonmonotonic inductive logic programming setting. In particular, we show that first order range-restricted clausal theories that consist of clauses with up to k literals of size at most j each are polynomialsample polynomial-time PAC-learnable with on ..."
Abstract
-
Cited by 63 (27 self)
- Add to MetaCart
We present positive PAC-learning results for the nonmonotonic inductive logic programming setting. In particular, we show that first order range-restricted clausal theories that consist of clauses with up to k literals of size at most j each are polynomialsample polynomial-time PAC-learnable with one-sided error from positive examples only. In our framework, concepts are clausal theories and examples are finite interpretations. We discuss the problems encountered when learning theories which only have infinite non-trivial models and propose a way to avoid these problems using a representation change called flattening. Finally, we compare our results to PAC-learnability results for the normal inductive logic programming setting. 1
Structural Regression Trees
, 1996
"... In many real-world domains the task of machine learning algorithms is to learn a theory predicting numerical values. In particular several standard test domains used in Inductive Logic Programming (ILP) are concerned with predicting numerical values from examples and relational and mostly non-determ ..."
Abstract
-
Cited by 60 (10 self)
- Add to MetaCart
In many real-world domains the task of machine learning algorithms is to learn a theory predicting numerical values. In particular several standard test domains used in Inductive Logic Programming (ILP) are concerned with predicting numerical values from examples and relational and mostly non-determinate background knowledge. However, so far no ILP algorithm except one can predict numbers and cope with non-determinate background knowledge. (The only exception is a covering algorithm called FORS.) In this paper we present Structural Regression Trees (SRT), a new algorithm which can be applied to the above class of problems by integrating the statistical method of regression trees into ILP. SRT constructs a tree containing a literal (an atomic formula or its negation) or a conjunction of literals in each node, and assigns a numerical value to each leaf. SRT provides more comprehensible results than purely statistical methods, and can be applied to a class of problems most other ILP syste...

