Results 1  10
of
2,220
Statistical pattern recognition: A review
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2000
"... The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques ..."
Abstract

Cited by 1007 (30 self)
 Add to MetaCart
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have bean receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the wellknown methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Independent component analysis: algorithms and applications
 NEURAL NETWORKS
, 2000
"... ..."
(Show Context)
Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ¹ minimization
 PROC. NATL ACAD. SCI. USA 100 2197–202
, 2002
"... Given a ‘dictionary’ D = {dk} of vectors dk, we seek to represent a signal S as a linear combination S = ∑ k γ(k)dk, with scalar coefficients γ(k). In particular, we aim for the sparsest representation possible. In general, this requires a combinatorial optimization process. Previous work considered ..."
Abstract

Cited by 618 (38 self)
 Add to MetaCart
Given a ‘dictionary’ D = {dk} of vectors dk, we seek to represent a signal S as a linear combination S = ∑ k γ(k)dk, with scalar coefficients γ(k). In particular, we aim for the sparsest representation possible. In general, this requires a combinatorial optimization process. Previous work considered the special case where D is an overcomplete system consisting of exactly two orthobases, and has shown that, under a condition of mutual incoherence of the two bases, and assuming that S has a sufficiently sparse representation, this representation is unique and can be found by solving a convex optimization problem: specifically, minimizing the ℓ¹ norm of the coefficients γ. In this paper, we obtain parallel results in a more general setting, where the dictionary D can arise from two or several bases, frames, or even less structured systems. We introduce the Spark, ameasure of linear dependence in such a system; it is the size of the smallest linearly dependent subset (dk). We show that, when the signal S has a representation using less than Spark(D)/2 nonzeros, this representation is necessarily unique. We
Feature selection based on mutual information: Criteria of maxdependency, maxrelevance, and minredundancy
 IEEE TRANS. PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2005
"... Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first der ..."
Abstract

Cited by 542 (7 self)
 Add to MetaCart
(Show Context)
Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimalredundancymaximalrelevance criterion (mRMR), for firstorder incremental feature selection. Then, we present a twostage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminate analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
Survey of clustering algorithms
 IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2005
"... Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the ..."
Abstract

Cited by 488 (4 self)
 Add to MetaCart
(Show Context)
Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.
Kernel independent component analysis
 Journal of Machine Learning Research
, 2002
"... We present a class of algorithms for independent component analysis (ICA) which use contrast functions based on canonical correlations in a reproducing kernel Hilbert space. On the one hand, we show that our contrast functions are related to mutual information and have desirable mathematical propert ..."
Abstract

Cited by 454 (24 self)
 Add to MetaCart
We present a class of algorithms for independent component analysis (ICA) which use contrast functions based on canonical correlations in a reproducing kernel Hilbert space. On the one hand, we show that our contrast functions are related to mutual information and have desirable mathematical properties as measures of statistical dependence. On the other hand, building on recent developments in kernel methods, we show that these criteria can be computed efficiently. Minimizing these criteria leads to flexible and robust algorithms for ICA. We illustrate with simulations involving a wide variety of source distributions, showing that our algorithms outperform many of the presently known algorithms. 1.
Efficient sparse coding algorithms
 In NIPS
, 2007
"... Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higherlevel features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we ..."
Abstract

Cited by 437 (14 self)
 Add to MetaCart
(Show Context)
Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higherlevel features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1regularized least squares problem and an L2constrained least squares problem. We propose novel algorithms to solve both of these optimization problems. Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit endstopping and nonclassical receptive field surround suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons. 1
Probabilistic Independent Component Analysis
, 2003
"... Independent Component Analysis is becoming a popular exploratory method for analysing complex data such as that from FMRI experiments. The application of such 'modelfree' methods, however, has been somewhat restricted both by the view that results can be uninterpretable and by the lack of ..."
Abstract

Cited by 205 (13 self)
 Add to MetaCart
Independent Component Analysis is becoming a popular exploratory method for analysing complex data such as that from FMRI experiments. The application of such 'modelfree' methods, however, has been somewhat restricted both by the view that results can be uninterpretable and by the lack of ability to quantify statistical significance. We present an integrated approach to Probabilistic ICA for FMRI data that allows for nonsquare mixing in the presence of Gaussian noise. We employ an objective estimation of the amount of Gaussian noise through Bayesian analysis of the true dimensionality of the data, i.e. the number of activation and nonGaussian noise sources. Reduction of the data to this 'true' subspace before the ICA decomposition automatically results in an estimate of the noise, leading to the ability to assign significance to voxels in ICA spatial maps. Estimation of the number of intrinsic sources not only enables us to carry out probabilistic modelling, but also achieves an asymptotically unique decomposition of the data. This reduces problems of interpretation, as each final independent component is now much more likely to be due to only one physical or physiological process. We also describe other improvements to standard ICA, such as temporal prewhitening and variance normafisation of timeseries, the latter being particularly useful in the context of dimensionality reduction when weak activation is present. We discuss the use of prior information about the spatiotemporal nature of the source processes, and an alternativehypothesis testing approach for inference, using Gaussian mixture models. The performance of our approach is illustrated and evaluated on real and complex artificial FMRI data, and compared to the spatiotemporal accuracy of restfits obtaine...
Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
 IEEE Trans. On Audio, Speech and Lang. Processing
, 2007
"... Abstract—An unsupervised learning algorithm for the separation of sound sources in onechannel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a timevarying gain ..."
Abstract

Cited by 181 (30 self)
 Add to MetaCart
(Show Context)
Abstract—An unsupervised learning algorithm for the separation of sound sources in onechannel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a timevarying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternatively updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method enables a better separation quality than the previous algorithms. Especially, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements. Index Terms—Acoustic signal analysis, audio source separation, blind source separation, music, nonnegative matrix factorization, sparse coding, unsupervised learning. I.
Underdetermined Blind Source Separation Using Sparse Representations
, 2001
"... The scope of this work is the separation of N sources from M linear mixtures when the underlying system is underdetermined, that is, when M<N. If the input distribution is sparse the mixing matrix can be estimated either by external optimization or by clustering and, given the mixing matrix, a mi ..."
Abstract

Cited by 164 (5 self)
 Add to MetaCart
The scope of this work is the separation of N sources from M linear mixtures when the underlying system is underdetermined, that is, when M<N. If the input distribution is sparse the mixing matrix can be estimated either by external optimization or by clustering and, given the mixing matrix, a minimal l_ 1 norm representation of the sources can be obtained by solving a lowdimensional linear programming problem for each of the data points. Yet, when the signals per se do not satisfy this assumption, sparsity can still be achieved by realizing the separation in a sparser transformed domain. The approach is illustrated here for M = 2. In this case we estimate both the number of sources and the mixing matrix by the maxima of a potential function along the circle of unit length, and we obtain the minimal l_1 norm representation of each data point by a linear combination of the pair of basis vectors that enclose it. Several experiments with music and speech signals show that their timedomain representation is not sparse enough. Yet, excellent results were obtained using their shorttime Fourier transform, including the separation of up to six sources from two mixtures.