Results 1  10
of
18
Database Mining: A Performance Perspective
 IEEE Transactions on Knowledge and Data Engineering
, 1993
"... We present our perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology. We describe three classes of database mining problems involving classification, associations, and sequences, and argue that these problems can be unifor ..."
Abstract

Cited by 282 (12 self)
 Add to MetaCart
We present our perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology. We describe three classes of database mining problems involving classification, associations, and sequences, and argue that these problems can be uniformly viewed as requiring discovery of rules embedded in massive data. We describe a model and some basic operations for the process of rule discovery. We show how the database mining problems we consider map to this model and how they can be solved by using the basic operations we propose. We give an example of an algorithm for classification obtained by combining the basic rule discovery operations. This algorithm not only is efficient in discovering classification rules but also has accuracy comparable to ID3, one of the current best classifiers. Index Terms. database mining, knowledge discovery, classification, associations, sequences, decision trees Current address: Computer Science De...
Automatic Construction of Decision Trees from Data: A MultiDisciplinary Survey
 Data Mining and Knowledge Discovery
, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract

Cited by 153 (1 self)
 Add to MetaCart
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, treestructured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
Wrappers For Performance Enhancement And Oblivious Decision Graphs
, 1995
"... In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are stu ..."
Abstract

Cited by 108 (8 self)
 Add to MetaCart
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious readonce decision graphs (OODGs).
A Theory of Learning Classification Rules
, 1992
"... The main contributions of this thesis are a Bayesian theory of learning classification rules, the unification and comparison of this theory with some previous theories of learning, and two extensive applications of the theory to the problems of learning class probability trees and bounding error whe ..."
Abstract

Cited by 80 (6 self)
 Add to MetaCart
The main contributions of this thesis are a Bayesian theory of learning classification rules, the unification and comparison of this theory with some previous theories of learning, and two extensive applications of the theory to the problems of learning class probability trees and bounding error when learning logical rules. The thesis is motivated by considering some current research issues in machine learning such as bias, overfitting and search, and considering the requirements placed on a learning system when it is used for knowledge acquisition. Basic Bayesian decision theory relevant to the problem of learning classification rules is reviewed, then a Bayesian framework for such learning is presented. The framework has three components: the hypothesis space, the learning protocol, and criteria for successful learning. Several learning protocols are analysed in detail: queries, logical, noisy, uncertain and positiveonly examples. The analysis is done by interpreting a protocol as a...
Some applications of treebased modelling to speech and language
 In Proceedings of the Speech and Natural Language Workshop, Cape Cod MA
"... Several applications of statistical treebased modelling are described here to problems in speech and language. Classification and regression trees are well suited to many of the pattern recognition problems encountered in this area since they (1) statistically select the most significant features i ..."
Abstract

Cited by 48 (1 self)
 Add to MetaCart
Several applications of statistical treebased modelling are described here to problems in speech and language. Classification and regression trees are well suited to many of the pattern recognition problems encountered in this area since they (1) statistically select the most significant features involved (2) provide "honest" estimates of their performance, (3) permit both categorical and continuous features to be considered, and
Decision Graphs  An Extension of Decision Trees
, 1993
"... : In this paper, we examine Decision Graphs, a generalization of decision trees. We present an inference scheme to construct decision graphs using the Minimum Message Length Principle. Empirical tests demonstrate that this scheme compares favourably with other decision tree inference schemes. This w ..."
Abstract

Cited by 35 (1 self)
 Add to MetaCart
: In this paper, we examine Decision Graphs, a generalization of decision trees. We present an inference scheme to construct decision graphs using the Minimum Message Length Principle. Empirical tests demonstrate that this scheme compares favourably with other decision tree inference schemes. This work provides a metric for comparing the relative merit of the decision tree and decision graph formalisms for a particular domain. 1 Introduction In this paper, we examine the problem of inferring a decision procedure from a set of examples. We examine the decision graph [5, 1, 16, 15, 14], a generalization of the decision tree [3, 18], and propose a method to construct decision graphs based upon Wallace's Minimum Message Length Principle (MMLP) [24, 10, 25]. The MMLP is related to Rissanen's Minimum Description Length Principle (MDLP) [21, 22, 20]. For the reader unfamiliar with minimum encoding methods (MML and MDL), a good introduction to the area is given by Georgeff [10]. We formalize ...
Bayes Risk Weighted Vector Quantization With Posterior Estimation for Image Compression and Classification
 IEEE TRANSACTIONS ON IMAGE PROCESSING
, 1996
"... Classification and compression play important roles in communicating digital information. Their combination is useful in many applications, including the detection of abnormalities in compressed medical images. In view of the similarities of compression and lowlevel classification, it is not surpri ..."
Abstract

Cited by 28 (8 self)
 Add to MetaCart
Classification and compression play important roles in communicating digital information. Their combination is useful in many applications, including the detection of abnormalities in compressed medical images. In view of the similarities of compression and lowlevel classification, it is not surprising that there are many similar methods for their design. Because some of these methods are useful for designing vector quantizers, it seems natural that vector quantization (VQ) is explored for the combined goal. We investigate several VQbased algorithms that seek to minimize both the distortion of compressed images and errors in classifying their pixel blocks. These algorithms are investigated with both full search and treestructured codes. We emphasize a nonparametric technique that minimizes both error measures simultaneously by incorporating a Bayes risk component into the distortion measure used for design and encoding. We introduce a treestructured posterior estimator to produce t...
Automatic phonetic baseform determination
 in Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing
, 1991
"... Phonetic baseforms are the basic recognition units in most large vocabulary speech recognition systems. These baseforms are usually determined by hand once a vocabulary is chosen and not modified thereafter. However, many applications of speech recognition, such as dictation transcription, are ham ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
Phonetic baseforms are the basic recognition units in most large vocabulary speech recognition systems. These baseforms are usually determined by hand once a vocabulary is chosen and not modified thereafter. However, many applications of speech recognition, such as dictation transcription, are hampered by a fixed vocabulary and require the user be able to add new words to the vocabulary. At least one phonetic baseform must be assigned to each new word to properly integrate the word into the recognition system. Dictionary lookup is often unsuccessful in determining a phonetic baseform because new words are often names or taskspecific jargon; also, talkers tend to have idiosyncratic pronunciations for a substantial fraction of words. This paper describes a series of experiments in which the phonetic baseform is deduced automatically for new words by utilizing actual utterances of the new word in conjunction with a set of automatically derived spellingtosound rules. We evaluated recognition performance on new words spoken by two different talkers when the phonetic baseforms were extracted via the above approach. The error rates on these new words were found to be comparable to or better than when the phonetic baseforms were derived by hand, thus validating the basic approach. 1
OnLine Stochastic Processes in Data Compression
, 1996
"... The ability to predict the future based upon the past in finitealphabet sequences has many applications, including communications, data security, pattern recognition, and natural language processing. By Shannon's theory and the breakthrough development of arithmetic coding, any sequence, a 1 ..."
Abstract

Cited by 15 (6 self)
 Add to MetaCart
The ability to predict the future based upon the past in finitealphabet sequences has many applications, including communications, data security, pattern recognition, and natural language processing. By Shannon's theory and the breakthrough development of arithmetic coding, any sequence, a 1 a 2 \Delta \Delta \Delta a n , can be encoded in a number of bits that is essentially equal to the minimal informationlossless codelength, P i \Gamma log 2 p(a i ja 1 \Delta \Delta \Delta a i\Gamma1 ). The goal of universal online modeling, and therefore of universal data compression, is to deduce the model of the input sequence a 1 a 2 \Delta \Delta \Delta a n that can estimate each p(a i ja 1 \Delta \Delta \Delta a i\Gamma1 ) knowing only a 1 a 2 \Delta \Delta \Delta a i\Gamma1 so that the ex...
Minimum Conditional Entropy Context Quantization
, 2000
"... We consider the problem of finding the quantizer Q that quantizes the Kdimensional causal context C i = (X i\Gammat i ; X i\Gammat 2 ; : : : ; X i\Gammat K ) of a source symbol X i into one of M conditioning states such that the conditional entropy H(X i jQ(C i )) is minimized. The resulting mi ..."
Abstract

Cited by 8 (0 self)
 Add to MetaCart
We consider the problem of finding the quantizer Q that quantizes the Kdimensional causal context C i = (X i\Gammat i ; X i\Gammat 2 ; : : : ; X i\Gammat K ) of a source symbol X i into one of M conditioning states such that the conditional entropy H(X i jQ(C i )) is minimized. The resulting minimum conditional entropy context quantizer can be used for sequential coding of the sequence X0 ; X1 ; X2 ; : : :. 1 Introduction A key problem in sequential source coding of a discrete random sequence X 0 ; X 1 ; X 2 ; \Delta \Delta \Delta is modeling the underlying conditional distribution of the source P (X i jX i\Gamma1 ); (1) where X i\Gamma1 denotes X 0 ; X 1 ; \Delta \Delta \Delta ; X i\Gamma1 , the prefix of X i . Because of model estimation considerations, it is not possible to directly use all of X i\Gamma1 as the model's context. Indeed, given a model class, the order of the model or the number of model parameters needs to be carefully selected so as not to negatively i...