Survey of clustering algorithms
IEEE Transactions on Neural Networks, 2005
Cited by 231 (3 self)
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, such as proximity measures and cluster validation, are also discussed.
A Minimum Description Length Approach to Statistical Shape Modelling
IEEE Transactions on Medical Imaging, 2001
Cited by 177 (11 self)
We describe a method for automatically building statistical shape models from a training set of example boundaries / surfaces. These models show considerable promise as a basis for segmenting and interpreting images. One of the drawbacks of the approach is, however, the need to establish a set of dense correspondences between all members of a set of training shapes. Often this is achieved by locating a set of landmarks manually on each training image, which is time-consuming and subjective in 2D, and almost impossible in 3D. We describe how shape models can be built automatically by posing the correspondence problem as one of finding the parameterization for each shape in the training set. We select the set of parameterizations that build the best model. We define best as that which minimizes the description length of the training set, arguing that this leads to models with good compactness, specificity and generalization ability. We show how a set of shape parameterizations can be represented and manipulated in order to build a minimum description length model. Results are given for several different training sets of 2D boundaries, showing that the proposed method constructs better models than other approaches including manual landmarking, the current gold standard. We also show that the method can be extended straightforwardly to 3D.
Model Selection and the Principle of Minimum Description Length
Journal of the American Statistical Association, 1998
Cited by 145 (5 self)
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical as well as the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can coexist and be compared. We illustrate th...
Universal prediction
IEEE Transactions on Information Theory, 1998
Cited by 136 (11 self)
This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described with emphasis on the analogy and the differences between results in the two settings. Index Terms: Bayes envelope, entropy, finite-state machine, linear prediction, loss function, probability assignment, redundancy-capacity, stochastic complexity, universal coding, universal prediction.
Relative Loss Bounds for Online Density Estimation with the Exponential Family of Distributions
Machine Learning, 2000
Cited by 116 (11 self)
We consider online density estimation with a parameterized density from the exponential family. The online algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example the algorithm incurs a loss, which is the negative log-likelihood of the example with respect to the past parameter of the algorithm. An offline algorithm can choose the best parameter based on all the examples. We prove bounds on the additional total loss of the online algorithm over the total loss of the best offline parameter. These relative loss bounds hold for an arbitrary sequence of examples. The goal is to design algorithms with the best possible relative loss bounds. We use a Bregman divergence to derive and analyze each algorithm. These divergences are relative entropies between two exponential distributions. We also use our methods to prove relative loss bounds for linear regression.
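The setting in this abstract can be illustrated with a minimal sketch. It assumes the simplest exponential-family member, a unit-variance Gaussian with unknown mean, and uses a plain running average as the online parameter; the function names and the initial parameter `mu0` are hypothetical choices for illustration, not the paper's algorithm or notation.

```python
import math

def nll(x, mu):
    """Negative log-likelihood of x under a unit-variance Gaussian with mean mu."""
    return 0.5 * math.log(2 * math.pi) + 0.5 * (x - mu) ** 2

def online_total_loss(xs, mu0=0.0):
    """Score each example with the parameter held before seeing it, then update."""
    mu, total = mu0, 0.0  # mu0 is an assumed starting parameter
    for t, x in enumerate(xs, start=1):
        total += nll(x, mu)      # loss with respect to the past parameter
        mu += (x - mu) / t       # running average of the examples seen so far
    return total

def offline_total_loss(xs):
    """Total loss of the single best parameter chosen with all examples in hand."""
    mu_star = sum(xs) / len(xs)  # the sample mean minimizes the Gaussian loss
    return sum(nll(x, mu_star) for x in xs)

def relative_loss(xs, mu0=0.0):
    """Additional total loss of the online algorithm over the best offline parameter."""
    return online_total_loss(xs, mu0) - offline_total_loss(xs)
```

For the sequence `[1.0, 1.0, 1.0]` with `mu0=0.0`, `relative_loss` is 0.5: the online learner pays extra only on the first example, before its parameter has moved to 1.0.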
Sequential Prediction of Individual Sequences Under General Loss Functions
IEEE Transactions on Information Theory, 1998
Cited by 75 (7 self)
We consider adaptive sequential prediction of arbitrary binary sequences when the performance is evaluated using a general loss function. The goal is to predict on each individual sequence nearly as well as the best prediction strategy in a given comparison class of (possibly adaptive) prediction strategies, called experts. By using a general loss function, we generalize previous work on universal prediction, forecasting, and data compression. However, here we restrict ourselves to the case when the comparison class is finite. For a given sequence, we define the regret as the total loss on the entire sequence suffered by the adaptive sequential predictor, minus the total loss suffered by the predictor in the comparison class that performs best on that particular sequence. We show that for a large class of loss functions, the minimax regret is either \Theta(\log N) or \Omega(\sqrt{\ell \log N}), depending on the loss function, where N is the number of predictors in the comparison class a...
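The regret defined in this abstract can be computed for a concrete predictor. The sketch below uses an exponentially weighted average of the experts' predictions with absolute loss; this is a standard textbook forecaster swapped in for illustration, not necessarily the strategy analyzed in the paper, and the function name, the learning rate `eta`, and the input layout are assumptions.

```python
import math

def regret_of_exponential_weights(expert_preds, outcomes, eta=0.5):
    """Regret of an exponentially weighted forecaster against a finite expert class.

    expert_preds[t][i] in [0, 1] is expert i's prediction at step t;
    outcomes[t] in {0, 1} is the bit revealed at step t.
    """
    n_experts = len(expert_preds[0])
    weights = [1.0] * n_experts
    learner_loss = 0.0
    expert_loss = [0.0] * n_experts
    for preds, y in zip(expert_preds, outcomes):
        # Predict with the weighted average of the experts' predictions.
        total_w = sum(weights)
        p = sum(w * q for w, q in zip(weights, preds)) / total_w
        learner_loss += abs(p - y)
        # Charge each expert its loss and shrink its weight accordingly.
        for i, q in enumerate(preds):
            loss = abs(q - y)
            expert_loss[i] += loss
            weights[i] *= math.exp(-eta * loss)
    # Learner's total loss minus the total loss of the best expert in hindsight.
    return learner_loss - min(expert_loss)
```

With a single expert the forecaster simply copies it, so the regret is zero; with several experts the regret depends on how quickly weight concentrates on the best one.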
Support vector machines for speech recognition
Proceedings of the International Conference on Spoken Language Processing, 1998
Cited by 74 (2 self)
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and overparameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
Toward a method of selecting among computational models of cognition
Psychological Review, 2002
Cited by 74 (4 self)
The question of how one should decide among competing explanations of data is at the heart of the scientific enterprise. Computational models of cognition are increasingly being advanced as explanations of behavior. The success of this line of inquiry depends on the development of robust methods to guide the evaluation and selection of these models. This article introduces a method of selecting among mathematical models of cognition known as minimum description length, which provides an intuitive and theoretically well-grounded understanding of why one model should be chosen. A central but elusive concept in model selection, complexity, can also be derived with the method. The adequacy of the method is demonstrated in 3 areas of cognitive modeling: psychophysics, information integration, and categorization. How should one choose among competing theoretical explanations of data? This question is at the heart of the scientific enterprise, regardless of whether verbal models are being tested in an experimental setting or computational models are being evaluated in simulations. A number of criteria have been proposed to assist in this endeavor, summarized nicely by Jacobs and Grainger
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
IEEE Transactions on Information Theory, 1998
Cited by 67 (7 self)
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's mi...