Results 1  10
of
54
Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems
 Proceedings of the IEEE
, 1998
"... this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, ph ..."
Abstract

Cited by 248 (11 self)
 Add to MetaCart
this paper. Let us place it within the neural network perspective, and particularly that of learning. The area of neural networks has greatly benefited from its unique position at the crossroads of several diverse scientific and engineering disciplines including statistics and probability theory, physics, biology, control and signal processing, information theory, complexity theory, and psychology (see [45]). Neural networks have provided a fertile soil for the infusion (and occasionally confusion) of ideas, as well as a meeting ground for comparing viewpoints, sharing tools, and renovating approaches. It is within the illdefined boundaries of the field of neural networks that researchers in traditionally distant fields have come to the realization that they have been attacking fundamentally similar optimization problems.
An Active Testing Model for Tracking Roads in Satellite Images
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 1995
"... We present a new approach for tracking roads from satellite images, and thereby illustrate a general computational strategy ("active testing") for tracking 1D structures and other recognition tasks in computer vision. Our approach is related to recent work in active vision on "where to look next" a ..."
Abstract

Cited by 158 (5 self)
 Add to MetaCart
We present a new approach for tracking roads from satellite images, and thereby illustrate a general computational strategy ("active testing") for tracking 1D structures and other recognition tasks in computer vision. Our approach is related to recent work in active vision on "where to look next" and motivated by the "divideandconquer" strategy of parlor games such as "Twenty Questions." We choose "tests" (matched filters for short road segments) one at a time in order to remove as much uncertainty as possible about the "true hypothesis" (road position) given the results of the previous tests. The tests are chosen online based on a statistical model for the joint distribution of tests and hypotheses. The problem of minimizing uncertainty (measured by entropy) is formulated in simple and explicit analytical terms. To execute this entropy testing rule we then alternate between data collection and optimization: at each iteration new image data are examined and a new entropy minimizat...
Automatic Construction of Decision Trees from Data: A MultiDisciplinary Survey
 Data Mining and Knowledge Discovery
, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract

Cited by 147 (1 self)
 Add to MetaCart
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, treestructured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
Learning classification trees
 Statistics and Computing
, 1992
"... Algorithms for learning cIassification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This iutroduces Bayesian techniques for splitting, smoothing, and tree averaging. T ..."
Abstract

Cited by 124 (8 self)
 Add to MetaCart
Algorithms for learning cIassification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This iutroduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to QuinIan’s information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan’s C4 (1987) and Breiman et aL’s CART (1984) show the full Bayesian algorithm produces more accurate predictions than versions
Wrappers For Performance Enhancement And Oblivious Decision Graphs
, 1995
"... In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are stu ..."
Abstract

Cited by 107 (8 self)
 Add to MetaCart
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious readonce decision graphs (OODGs).
Split Selection Methods for Classification Trees
 STATISTICA SINICA
, 1997
"... Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares ..."
Abstract

Cited by 76 (9 self)
 Add to MetaCart
Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits and the final tree can be selected by a direct stopping rule or by pruning. Real and simulated data are used to compare QUEST with the exhaustive search approach. QUEST is shown to be substantially faster and the size and classification accuracy of its trees are typically comparable to those of exhaustive search.
A Vector Quantization Approach to Universal Noiseless Coding and Quantization
 IEEE Trans. Inform. Theory
, 1996
"... AbstractA twostage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may he noiseless codes, fixedrate quan ..."
Abstract

Cited by 44 (10 self)
 Add to MetaCart
AbstractA twostage code is a block code in which each block of data is coded in two stages: the first stage codes the identity of a block code among a collection of codes, and the second stage codes the data using the identified code. The collection of codes may he noiseless codes, fixedrate quantizers, or variablerate quantizers. We take a vector quantization approach to twostage coding, in which the first stage code can be regarded as a vector quantizer that “quantizes ” the input data of length n to one of a fixed collection of block codes. We apply the generalized Lloyd algorithm to the firststage quantizer, using induced measures of rate and distortion, to design locally optimal twostage, codes. On a source of medical images, twostage variahlerate vector quantizers designed in this way outperform standard (onestage) fixedrate vector quantizers by over 9 dB. The tail of the operational distortionrate function of the firststage quantizer determines the optimal rate of convergence of the redundancy of a universal sequence of twostage codes. We show that there exist twostage universal noiseless codes, fixedrate quantizers, and variablerate quantizers whose perletter rate and distortion redundancies converge to zero as (k/2)n ’ logn, when the universe of sources has finite dimension k. This extends the achievability part of Rissanen’s theorem from universal noiseless codes to universal quantizers. Further, we show that the redundancies converge as O(n’) when the universe of sources is countable, and as O(r~l+‘) when the universe of sources is infinitedimensional, under appropriate conditions. Index TermsTwostage, adaptive, compression, minimum description length, clustering. I.
Decision Graphs  An Extension of Decision Trees
, 1993
"... : In this paper, we examine Decision Graphs, a generalization of decision trees. We present an inference scheme to construct decision graphs using the Minimum Message Length Principle. Empirical tests demonstrate that this scheme compares favourably with other decision tree inference schemes. This w ..."
Abstract

Cited by 35 (1 self)
 Add to MetaCart
: In this paper, we examine Decision Graphs, a generalization of decision trees. We present an inference scheme to construct decision graphs using the Minimum Message Length Principle. Empirical tests demonstrate that this scheme compares favourably with other decision tree inference schemes. This work provides a metric for comparing the relative merit of the decision tree and decision graph formalisms for a particular domain. 1 Introduction In this paper, we examine the problem of inferring a decision procedure from a set of examples. We examine the decision graph [5, 1, 16, 15, 14], a generalization of the decision tree [3, 18], and propose a method to construct decision graphs based upon Wallace's Minimum Message Length Principle (MMLP) [24, 10, 25]. The MMLP is related to Rissanen's Minimum Description Length Principle (MDLP) [21, 22, 20]. For the reader unfamiliar with minimum encoding methods (MML and MDL), a good introduction to the area is given by Georgeff [10]. We formalize ...
Decision Trees For Geometric Models
, 1993
"... A fundamental problem in modelbased computer vision is that of identifying which of a given set of geometric models is present in an image. Considering a "probe" to be an oracle that tells us whether or not a model is present at a given point, we study the problem of computing efficient strategi ..."
Abstract

Cited by 32 (4 self)
 Add to MetaCart
A fundamental problem in modelbased computer vision is that of identifying which of a given set of geometric models is present in an image. Considering a "probe" to be an oracle that tells us whether or not a model is present at a given point, we study the problem of computing efficient strategies ("decision trees") for probing an image, with the goal to minimize the number of probes necessary (in the worst case) to determine which single model is present. We show that a dlg ke height binary decision tree always exists for k polygonal models (in fixed position), provided (1) they are nondegenerate (do not share boundaries) and (2) they share a common point of intersection. Further, we give an efficient algorithm for constructing such decision tress when the models are given as a set of polygons in the plane. We show that constructing a minimum height tree is NPcomplete if either of the two assumptions is omitted. We provide an efficient greedy heuristic strategy and show ...
A Global Optimization Technique for Statistical Classifier Design
 IEEE Transactions on Signal Processing
"... A global optimization method is introduced for the design of statistical classifiers that minimize the rate of misclassification. We first derive the theoretical basis for the method, based on which we develop a novel design algorithm and demonstrate its effectiveness and superior performance in the ..."
Abstract

Cited by 25 (9 self)
 Add to MetaCart
A global optimization method is introduced for the design of statistical classifiers that minimize the rate of misclassification. We first derive the theoretical basis for the method, based on which we develop a novel design algorithm and demonstrate its effectiveness and superior performance in the design of practical classifiers for some of the most popular structures currently in use. The method, grounded in ideas from statistical physics and information theory, extends the deterministic annealing approach for optimization, both to incorporate structural constraints on data assignments to classes and to minimize the probability of error as the cost objective. During the design, data are assigned to classes in probability, so as to minimize the expected classification error given a specified level of randomness, as measured by Shannon's entropy. The constrained optimization is equivalent to a free energy minimization, motivating a deterministic annealing approach in which the entropy...