Results 1  10
of
231
Statistical pattern recognition: A review
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2000
"... The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques ..."
Abstract

Cited by 657 (22 self)
 Add to MetaCart
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have bean receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the wellknown methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Hidden Markov processes
 IEEE Trans. Inform. Theory
, 2002
"... Abstract—An overview of statistical and informationtheoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discretetime finitestate homogeneous Markov chain observed through a discretetime memoryless invariant channel. In recent years, the work of Baum and Petrie on finite ..."
Abstract

Cited by 170 (3 self)
 Add to MetaCart
Abstract—An overview of statistical and informationtheoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discretetime finitestate homogeneous Markov chain observed through a discretetime memoryless invariant channel. In recent years, the work of Baum and Petrie on finitestate finitealphabet HMPs was expanded to HMPs with finite as well as continuous state spaces and a general alphabet. In particular, statistical properties and ergodic theorems for relative entropy densities of HMPs were developed. Consistency and asymptotic normality of the maximumlikelihood (ML) parameter estimator were proved under some mild conditions. Similar results were established for switching autoregressive processes. These processes generalize HMPs. New algorithms were developed for estimating the state, parameter, and order of an HMP, for universal coding and classification of HMPs, and for universal decoding of hidden Markov channels. These and other related topics are reviewed in this paper. Index Terms—Baum–Petrie algorithm, entropy ergodic theorems, finitestate channels, hidden Markov models, identifiability, Kalman filter, maximumlikelihood (ML) estimation, order estimation, recursive parameter estimation, switching autoregressive processes, Ziv inequality. I.
Recognizing Imprecisely Localized, Partially Occluded and Expression Variant Faces from a Single Sample per Class
, 2002
"... The classical way of attempting to solve the face (or object) recognition problem is by using large and representative datasets. In many applications though, only one sample per class is available to the system. In this contribution, we describe a probabilistic approach that is able to compensate fo ..."
Abstract

Cited by 148 (8 self)
 Add to MetaCart
The classical way of attempting to solve the face (or object) recognition problem is by using large and representative datasets. In many applications though, only one sample per class is available to the system. In this contribution, we describe a probabilistic approach that is able to compensate for imprecisely localized, partially occluded and expression variant faces even when only one single training sample per class is available to the system. To solve the localization problem, we find the subspace (within the feature space, e.g. eigenspace) that represents this error for each of the training images. To resolve the occlusion problem, each face is divided into k local regions which are analyzed in isolation. In contrast with other approaches, where a simple voting space is used, we present a probabilistic method that analyzes how "good" a local match is. To make the recognition system less sensitive to the differences between the facial expression displayed on the training and the testing images, we weight the results obtained on each local area on the bases of how much of this local area is affected by the expression displayed on the current test image.
Segmentation of multivariate mixed data via lossy coding and compression
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2007
"... Abstract—In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmen ..."
Abstract

Cited by 68 (13 self)
 Add to MetaCart
Abstract—In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. By analyzing the coding length/rate of mixed data, we formally establish some strong connections of data segmentation to many fundamental concepts in lossy data compression and ratedistortion theory. We show that a deterministic segmentation is approximately the (asymptotically) optimal solution for compressing mixed data. We propose a very simple and effective algorithm that depends on a single parameter, the allowable distortion. At any given distortion, the algorithm automatically determines the corresponding number and dimension of the groups and does not involve any parameter estimation. Simulation results reveal intriguing phasetransitionlike behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data. Index Terms—Multivariate mixed data, data segmentation, data clustering, rate distortion, lossy coding, lossy compression, image segmentation, microarray data clustering. 1
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
 IEEE Transactions on Information Theory
, 1998
"... The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition un ..."
Abstract

Cited by 67 (7 self)
 Add to MetaCart
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's mi...
Face Recognition with Image Sets Using Manifold Density Divergence
, 2005
"... In many automatic face recognition applications, a set of a person's face images is available rather than a single image. In this paper, we describe a novel method for face recognition using image sets. We propose a flexible, semiparametric model for learning probability densities confined to highly ..."
Abstract

Cited by 65 (14 self)
 Add to MetaCart
In many automatic face recognition applications, a set of a person's face images is available rather than a single image. In this paper, we describe a novel method for face recognition using image sets. We propose a flexible, semiparametric model for learning probability densities confined to highly nonlinear but intrinsically lowdimensional manifolds. The model leads to a statistical formulation of the recognition problem in terms of minimizing the divergence between densities estimated on these manifolds. The proposed method is evaluated on a large data set, acquired in realistic imaging conditions with severe illumination variation. Our algorithm is shown to match the best and outperform other stateoftheart algorithms in the literature, achieving 94% recognition rate on average.
A tutorial introduction to the minimum description length principle
 in Advances in Minimum Description Length: Theory and Applications. 2005
"... ..."
The consistency of the BIC Markov order estimator.
"... . The Bayesian Information Criterion (BIC) estimates the order of a Markov chain (with finite alphabet A) from observation of a sample path x 1 ; x 2 ; : : : ; x n , as that value k = k that minimizes the sum of the negative logarithm of the kth order maximum likelihood and the penalty term jAj ..."
Abstract

Cited by 55 (3 self)
 Add to MetaCart
. The Bayesian Information Criterion (BIC) estimates the order of a Markov chain (with finite alphabet A) from observation of a sample path x 1 ; x 2 ; : : : ; x n , as that value k = k that minimizes the sum of the negative logarithm of the kth order maximum likelihood and the penalty term jAj k (jAj\Gamma1) 2 log n: We show that k equals the correct order of the chain, eventually almost surely as n ! 1, thereby strengthening earlier consistency results that assumed an apriori bound on the order. A key tool is a strong ratiotypicality result for Markov sample paths. We also show that the Bayesian estimator or minimum description length estimator, of which the BIC estimator is an approximation, fails to be consistent for the uniformly distributed i.i.d. process. AMS 1991 subject classification: Primary 62F12, 62M05; Secondary 62F13, 60J10 Key words and phrases: Bayesian Information Criterion, order estimation, ratiotypicality, Markov chains. 1 Supported in part by a joint N...
Algorithmic Statistics
 IEEE Transactions on Information Theory
, 2001
"... While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or ..."
Abstract

Cited by 52 (14 self)
 Add to MetaCart
While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between an individual data sample and an individual model summarizing the information in the data, for example, a finite set (or probability distribution) where the data sample typically came from. The statistical theory based on such relations between individual objects can be called algorithmic statistics, in contrast to classical statistical theory that deals with relations between probabilistic ensembles. We develop the algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic. This theory is based on twopart codes consisting of the code for the statistic (the model summarizing the regularity, the meaningful information, in the data) and the modeltodata code. In contrast to the situation in probabilistic statistical theory, the algorithmic relation of (minimal) sufficiency is an absolute relation between the individual model and the individual data sample. We distinguish implicit and explicit descriptions of the models. We give characterizations of algorithmic (Kolmogorov) minimal sufficient statistic for all data samples for both description modes in the explicit mode under some constraints. We also strengthen and elaborate earlier results on the "Kolmogorov structure function" and "absolutely nonstochastic objects" those rare objects for which the simplest models that summarize their relevant information (minimal sucient statistics) are at least as complex as the objects themselves. We demonstrate a close relation between the probabilistic notions and the algorithmic ones: (i) in both cases there is an "information nonincrease" law; (ii) it is shown that a function is a...
Spam filtering using statistical data compression models
 Journal of Machine Learning Research
, 2006
"... Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade filtering. Since spam evolves continuously and most practical applications are based on online user feedback, the task call ..."
Abstract

Cited by 52 (12 self)
 Add to MetaCart
Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade filtering. Since spam evolves continuously and most practical applications are based on online user feedback, the task calls for fast, incremental and robust learning algorithms. In this paper, we investigate a novel approach to spam filtering based on adaptive statistical data compression models. The nature of these models allows them to be employed as probabilistic text classifiers based on characterlevel or binary sequences. By modeling messages as sequences, tokenization and other errorprone preprocessing steps are omitted altogether, resulting in a method that is very robust. The models are also fast to construct and incrementally updateable. We evaluate the filtering performance of two different compression algorithms; dynamic Markov compression and prediction by partial matching. The results of our empirical evaluation indicate that compression models outperform currently established spam filters, as well as a number of methods proposed in previous studies.