Results 1-6 of 6
Joint Induction of Shape Features and Tree Classifiers
IEEE Trans. PAMI, 1997
Cited by 75 (6 self)
We introduce a very large family of binary features for two-dimensional shapes. The salient ones for separating particular shapes are determined by inductive learning during the construction of classification trees. There is a feature for every possible geometric arrangement of local topographic codes. The arrangements express coarse constraints on relative angles and distances among the code locations and are nearly invariant to substantial affine and nonlinear deformations. They are also partially ordered, which makes it possible to narrow the search for informative ones at each node of the tree. Different trees correspond to different aspects of shape. They are statistically weakly dependent due to randomization and are aggregated in a simple way. Adapting the algorithm to a shape family is then fully automatic once training samples are provided. As an illustration, we classify handwritten digits from the NIST database; the error rate is 0.7%.
Randomized Inquiries About Shape; an Application to Handwritten Digit Recognition
1994
Cited by 4 (1 self)
We describe an approach to shape recognition based on asking relational questions about the arrangement of landmarks, basically localized and oriented boundary segments. The questions are grouped into highly structured inquiries in the form of a tree. There are, in fact, many trees, each constructed from training data based on entropy reduction. The outcome of each tree is not a classification but rather a distribution over shape classes. The final classification is based on an aggregate distribution. The framework is non-Euclidean and there is no feature vector in the standard sense. Instead, the representation of the image data is graphical and each question is associated with a labeled subgraph. The ordering of the questions is highly constrained in order to maintain computational feasibility, and dependence among the trees is reduced by randomly subsampling from the available pool of questions. Experiments are reported on the recognition of handwritten digits. Although the amount ...
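The aggregation step in this abstract can be illustrated with a minimal sketch. The abstract only says the final classification uses an aggregate distribution; plain averaging of the per-tree distributions, the function name `aggregate_tree_distributions`, and the example numbers are all assumptions made here for illustration.

```python
def aggregate_tree_distributions(tree_outputs):
    """Average per-tree class distributions; return (mean_distribution, predicted_class).

    tree_outputs: list of lists, one distribution over classes per tree.
    Averaging is an assumed aggregation rule, not taken from the paper.
    """
    if not tree_outputs:
        raise ValueError("need at least one tree output")
    n_classes = len(tree_outputs[0])
    mean = [0.0] * n_classes
    for dist in tree_outputs:
        if len(dist) != n_classes:
            raise ValueError("all trees must output the same number of classes")
        for c, p in enumerate(dist):
            mean[c] += p / len(tree_outputs)
    # The predicted class is the mode of the aggregate distribution.
    predicted = max(range(n_classes), key=lambda c: mean[c])
    return mean, predicted


# Three hypothetical trees, each emitting a distribution over three classes.
outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
mean, label = aggregate_tree_distributions(outputs)
```

Note that the second tree on its own would pick class 1, while the averaged distribution favors class 0; this is the sense in which individual trees contribute distributions rather than votes.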
Mixtures of Latent Variable Models for Density Estimation and Classification
2000
Cited by 2 (0 self)
This paper deals with the problem of probability density estimation with the goal of finding a good probabilistic representation of the data. One of the most popular density estimation methods is the Gaussian mixture model (GMM). A promising alternative to GMMs is the recently proposed family of mixtures of latent variable models. Examples of the latter are principal component analysis and factor analysis. The advantage of these models is that they are capable of representing the covariance structure with fewer parameters by choosing the dimension of a subspace in a suitable way. An empirical evaluation on a large number of data sets shows that mixtures of latent variable models almost always outperform various GMMs, both in density estimation and in Bayes classifiers. To avoid having to choose a value for the dimension of the latent subspace by a computationally expensive search technique such as cross-validation, a Bayesian treatment of mixtures of latent variable models is proposed. This framework makes it possible to determine the appropriate dimension during training, and experiments illustrate its viability.
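The "fewer parameters" claim can be made concrete with a small counting sketch. It assumes a factor-analysis covariance of the form W Wᵀ + Ψ with a d × q loading matrix W and a diagonal Ψ; the function names and the example dimensions (d = 256, q = 10) are hypothetical, and the count is a raw one that ignores the rotational redundancy in W.

```python
def full_covariance_params(d):
    """Free parameters of a symmetric d x d covariance matrix."""
    return d * (d + 1) // 2


def factor_analysis_covariance_params(d, q):
    """Raw parameter count for a covariance W W^T + Psi.

    W is a d x q loading matrix and Psi is diagonal (d variances).
    This ignores the rotational redundancy in W, so it is an upper bound.
    """
    if not 0 <= q <= d:
        raise ValueError("latent dimension q must satisfy 0 <= q <= d")
    return d * q + d


# Hypothetical example: 16 x 16 pixel images, 10-dimensional latent subspace.
d, q = 256, 10
full = full_covariance_params(d)
latent = factor_analysis_covariance_params(d, q)
print(f"full covariance: {full} parameters, latent model: {latent} parameters")
```

These are per-component counts for the covariance only; in a mixture, each component also carries a mean and a mixing weight in either model.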
A distance for partially labeled trees
Trees are a powerful data structure for representing data for which hierarchical relations can be defined. They have been applied in a number of fields, such as image analysis, natural language processing, protein structure, and music retrieval, to name a few. Procedures for comparing trees are very relevant in many tasks where tree representations are involved. The computation of these measures is usually time consuming, and different authors have proposed algorithms that are able to compute them in a reasonable time by means of approximated versions of the similarity measure. Other methods require that the trees are fully labeled for the distance to be computed. The measure utilized in this paper is able to deal with trees labeled only at the leaves and runs in O(|T1| × |T2|) time. Experiments and comparative results are provided.
Complexity Control Of Image Processing Network Architectures Through Regularization
... this paper, we will talk only about OCR applications: while our approach is general to image processing, OCR serves as a perfect test bed for new algorithms, because of the extensive literature and "universal" databases (the NIST database) available. ... and also of how big the learning set will have to be (i.e. how many data will be needed to match the unknown variables). Thus reducing the complexity of a NN is a means to reduce costs, and also, as we just saw, to increase performance: really a very good deal indeed! As a consequence of this, a lot of work has been devoted, over the years, to designing networks with carefully controlled complexity. In particular, Time Delay Neural Networks (TDNN) [24] have been used with great success for OCR [12]: a TDNN is an architecture where complexity is controlled through connectivity (connections are local) and weights (which are shared). This dramatically reduces the TDNN complexity as compared to a fully connected architecture: for example, the TDNN in [12] has 100 000 connections but only 2 500 weights, while a fully connected NN with 100 000 connections would have that many weights! Finding the optimal architecture, though, is very much an art: there does not seem to exist any systematic approach to design the appropriate pattern of connections, local or shared weights. Basically, one tests, by trial and error, various choices (for receptive field sizes and overlaps), and finally selects the architecture which yields the best performance on a validation set. This process is, of course, extremely time consuming, and one could always fear that the final architecture is not optimal, but only the best one among those tested. This problem is often considered a major weakness of the NN approach. We need a principled way to d...
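The gap between connections and weights under local, shared connectivity can be sketched with a small count. The one-dimensional layer below, the function name `local_shared_layer_counts`, and the example sizes are assumptions for illustration only; they do not describe the TDNN in [12], and biases are ignored.

```python
def local_shared_layer_counts(input_len, kernel_size, stride, n_feature_maps):
    """Count connections and distinct weights in a 1-D layer with local, shared weights.

    Each feature map has one kernel of `kernel_size` weights that is reused
    at every position along the input. Biases are ignored.
    """
    if kernel_size > input_len:
        raise ValueError("kernel does not fit in the input")
    n_positions = (input_len - kernel_size) // stride + 1
    weights = kernel_size * n_feature_maps   # stored once per feature map
    connections = weights * n_positions      # each weight is used at every position
    return connections, weights


# Hypothetical layer: 100 inputs, receptive field of 5, stride 1, 4 feature maps.
connections, weights = local_shared_layer_counts(100, 5, 1, 4)
print(f"{connections} connections share {weights} weights")
```

A fully connected layer would need one weight per connection, so the ratio of the two numbers is the saving attributable to sharing in this toy setting.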
An On-Line EM Algorithm Applied to Kernel PCA
2000
Kernel principal component analysis (PCA) is a recent method for nonlinear feature extraction. Applying kernel PCA to a data set with N patterns requires storing and finding the eigenvectors of an N × N kernel matrix. This paper describes how an Expectation-Maximization (EM) algorithm for standard PCA can be adapted to kernel PCA without having to store the kernel matrix. Experimental results are given where EM for kernel PCA extracts up to 512 nonlinear features from a data set with 15,000 examples. The extracted features lead to good performance when used as preprocessed data for a linear classifier. A novel online EM algorithm for PCA is presented and shown to further speed up the learning process.
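As a rough illustration of EM for standard (linear) PCA, the sketch below uses one well-known zero-noise form of the iteration, restricted to a single component. Whether this matches the exact updates in the paper is an assumption, and it does not cover the kernel or online variants. The function name `em_first_component` and the toy data are hypothetical.

```python
import math
import random


def em_first_component(data, n_iter=200, seed=0):
    """Estimate the first principal direction of `data` by EM-style alternation.

    data: list of equal-length rows (one row per pattern).
    The covariance matrix is never formed; each pass only touches the data.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    rng = random.Random(seed)
    c = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random initial direction
    for _ in range(n_iter):
        # E-step: latent coordinate of each pattern given the current direction.
        cc = sum(v * v for v in c)
        x = [sum(ci * yi for ci, yi in zip(c, row)) / cc for row in centered]
        # M-step: direction that best reconstructs the data from those coordinates.
        xx = sum(v * v for v in x)
        c = [sum(x[i] * centered[i][j] for i in range(n)) / xx for j in range(d)]
    norm = math.sqrt(sum(v * v for v in c))
    return [v / norm for v in c]


# Toy data lying close to the diagonal direction (1, 1).
points = [[-2.0, -2.1], [-1.0, -0.9], [0.0, 0.1], [1.0, 0.9], [2.0, 2.1]]
direction = em_first_component(points)
```

The returned vector is defined only up to sign. Each iteration costs time proportional to the size of the data, which is the property that makes this style of algorithm attractive when an explicit covariance or kernel matrix would be too large to store.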