Feature selection: Evaluation, application, and small sample performance
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
Cited by 238 (9 self)
A large number of algorithms have been proposed for feature subset selection. Our experimental results show that the sequential forward floating selection (SFFS) algorithm, proposed by Pudil et al., dominates the other algorithms tested. We study the problem of choosing an optimal feature set for land use classification based on SAR satellite images using four different texture models. Pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in classification accuracy. We also illustrate the dangers of using feature selection in small sample size situations. Index Terms: Feature selection, curse of dimensionality, genetic algorithm, node pruning, texture models, SAR image classification.
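As a rough illustration of the floating search idea named in this abstract, the following is a minimal sketch of sequential forward floating selection, not the authors' implementation. The function name `sffs`, the `score` callback (any "higher is better" criterion, such as cross-validated accuracy), and the tie handling are assumptions made for the example.

```python
from typing import Callable, Dict, FrozenSet, Tuple


def sffs(n_features: int, target_size: int,
         score: Callable[[FrozenSet[int]], float]) -> FrozenSet[int]:
    """Sketch of sequential forward floating selection (hypothetical helper)."""
    current: FrozenSet[int] = frozenset()
    best: Dict[int, Tuple[float, FrozenSet[int]]] = {}  # size -> (score, subset)
    while len(current) < target_size:
        # Forward step: add the single feature that maximizes the criterion.
        candidates = [current | {f} for f in range(n_features) if f not in current]
        current = max(candidates, key=score)
        k, s = len(current), score(current)
        if k in best and best[k][0] >= s:
            current = best[k][1]  # an earlier subset of this size was at least as good
        else:
            best[k] = (s, current)
        # Floating step: keep removing a feature while the reduced subset
        # beats the best subset of that size seen so far.
        while len(current) > 2:
            reduced = max((current - {f} for f in current), key=score)
            s_red = score(reduced)
            if s_red > best[len(reduced)][0]:
                current = reduced
                best[len(reduced)] = (s_red, reduced)
            else:
                break
    return best[target_size][1]


# Toy criterion (an assumption): each feature contributes a fixed weight.
weights = [1, 9, 5, 7]
selected = sffs(4, 2, lambda subset: sum(weights[f] for f in subset))
```

With an additive criterion like this one the floating step never fires, so the result matches plain forward selection; the backward steps matter only when features interact.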
Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions
- J. MOL. BIOL, 1997
Cited by 190 (62 self)
We explore the ability of a simple simulated annealing procedure to assemble native-like structures from fragments of unrelated protein structures with similar local sequences using Bayesian scoring functions. Environment and residue pair specific contributions to the scoring functions appear as the first two terms in a series expansion for the residue probability distributions in the protein database; the decoupling of the distance and environment dependencies of the distributions resolves the major problems with current database-derived scoring functions noted by Thomas and Dill. The simulated annealing procedure rapidly and frequently generates native-like structures for small helical proteins and better than random structures for small β-sheet-containing proteins. Most of the simulated structures have native-like solvent accessibility and secondary structure patterns, and thus ensembles of these structures provide a particularly challenging set of decoys for evaluating scoring functions. We investigate the effects of multiple sequence information and different types of conformational constraints on the overall performance of the method, and the ability of a variety of recently developed scoring functions to recognize the native-like conformations in the ensembles of simulated structures.
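The general shape of fragment insertion under simulated annealing can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's procedure: a conformation is reduced to a list of integer states, one shared fragment library stands in for fragments chosen by local sequence similarity, and a mismatch count stands in for the Bayesian scoring function. The name `anneal_fragments` and all parameters are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def anneal_fragments(length: int,
                     fragments: Sequence[Sequence[int]],
                     energy: Callable[[List[int]], float],
                     steps: int = 2000,
                     t_start: float = 2.0,
                     t_end: float = 0.05,
                     seed: int = 0) -> List[int]:
    """Toy simulated annealing with fragment-insertion moves (a sketch)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    state = [0] * length
    e = energy(state)
    best, best_e = list(state), e
    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / max(steps - 1, 1))
        # Move: overwrite a random window with a randomly chosen fragment.
        frag = rng.choice(fragments)
        pos = rng.randrange(length - len(frag) + 1)
        trial = state[:pos] + list(frag) + state[pos + len(frag):]
        e_trial = energy(trial)
        # Metropolis criterion: always accept improvements, and accept
        # worse moves with probability exp(-delta / t).
        if e_trial <= e or rng.random() < math.exp((e - e_trial) / t):
            state, e = trial, e_trial
            if e < best_e:
                best, best_e = list(state), e
    return best


# Toy energy (an assumption): number of positions that differ from a target.
target = [1, 1, 2, 2, 3, 3, 1, 1]
library = [[1, 1], [2, 2], [3, 3], [1, 2, 3]]


def mismatch(conf: List[int]) -> float:
    return float(sum(a != b for a, b in zip(conf, target)))


result = anneal_fragments(len(target), library, mismatch)
```

Returning the lowest-energy state seen, rather than the final state, guards against the chain ending on an accepted uphill move.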
Multiple Resolution Segmentation of Textured Images
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991
Cited by 102 (7 self)
This paper presents a multiple resolution algorithm for segmenting images into regions with differing statistical behavior. In addition, an algorithm is developed for determining the number of statistically distinct regions in an image and estimating the parameters of those regions. Both algorithms use a causal Gaussian autoregressive (AR) model to describe the mean, variance and spatial correlation of the image textures. Together the algorithms may be used to perform unsupervised texture segmentation. The multiple resolution segmentation algorithm first segments images at coarse resolution and then progresses to finer resolutions until individual pixels are classified. This method results in accurate segmentations and requires significantly less computation than some previously known methods. The field containing the classification of each pixel in the image is modeled as a Markov random field (MRF). Segmentation at each resolution is then performed by maximizing the a posteriori prob...
Prediction of local structure in proteins using a library of sequence-structure motifs
- J. MOL. BIOL, 1998
A PDE-based Level-Set Approach for Detection and Tracking of Moving Objects
- 1997
Cited by 50 (13 self)
This paper presents a framework for detecting and tracking moving objects in a sequence of images. Using a statistical approach, where the inter-frame difference is modeled by a mixture of two Laplacian or Gaussian distributions, and an energy minimization based approach, we reformulate the motion detection and tracking problem as a front propagation problem. The Euler-Lagrange equation of the designed energy functional is first derived and the flow minimizing the energy is then obtained. Following the work by Caselles et al. [11] and Malladi et al. [23, 24], the contours to be detected and tracked are modeled as geodesic active contours evolving toward the minimum of the designed energy, under the influence of internal and external image dependent forces. Using the level set formulation scheme of Osher and Sethian [29], complex curves can be detected and tracked and topological changes for the evolving curves are naturally managed. To reduce the computational cost required by a direct implementation of the formulation scheme of Osher and Sethian [29], a new approach exploiting aspects from the classical Narrow Band [3] and Fast Marching [33] methods is proposed and favorably compared to them. In order to further reduce the CPU time, a multi-scale approach has also been considered. Very promising experimental results are provided using real video sequences.
Statistically Efficient Estimation Using Population Coding
- 1998
Cited by 46 (7 self)
Coarse codes are widely used throughout the brain to encode sensory and motor variables. Methods designed to interpret these codes, such as population vector analysis, are either inefficient (the variance of the estimate is much larger than the smallest possible variance) or biologically implausible, like maximum likelihood. Moreover, these methods attempt to compute a scalar or vector estimate of the encoded variable. Neurons are faced with a similar estimation problem. They must read out the responses of the presynaptic neurons, but, by contrast, they typically encode the variable with a further population code rather than as a scalar. We show how a nonlinear recurrent network can be used to perform estimation in a near-optimal way while keeping the estimate in a coarse code format. This work suggests that lateral connections in the cortex may be involved in cleaning up uncorrelated noise among neurons representing similar variables.
Resource-Aware Distributed Stream Management using Dynamic Overlays
- In Proc. of 25th IEEE International Conference on Distributed Computing Systems (ICDCS-2005), 2005
Cited by 35 (10 self)
We consider distributed applications that continuously stream data across the network, where data needs to be aggregated and processed to produce a 'useful' stream of updates. Centralized approaches to performing data aggregation suffer from high communication overheads, lack of scalability, and unpredictably high processing workloads at central servers. This paper describes a scalable and efficient solution to distributed stream management based on (1) resource-awareness, which is middleware-level knowledge of underlying network and processing resources, (2) overlay-based in-network data aggregation, and (3) high-level programming constructs to describe data-flow graphs for composing useful streams. Technical contributions include a novel algorithm based on resource-aware network partitioning to support dynamic deployment of dataflow graph components across the network, where efficiency of the deployed overlay is maintained by making use of partition-level resource-awareness. Contributions also include efficient middleware-based support for component deployment, utilizing runtime code generation rather than interpretation techniques, thereby addressing both high performance and resource-constrained applications. Finally, simulation experiments and benchmarks attained with actual operational data corroborate this paper's claims.
Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum
- 2003
Cited by 26 (2 self)
We have developed an algorithm called Q5 for probabilistic classification of healthy versus disease whole serum samples using mass spectrometry. The algorithm employs principal components analysis (PCA) followed by linear discriminant analysis (LDA) on whole spectrum surface-enhanced laser desorption/ionization time of flight (SELDI-TOF) mass spectrometry (MS) data and is demonstrated on four real datasets from complete, complex SELDI spectra of human blood serum. Q5 is a closed-form, exact solution to the problem of classification of complete mass spectra of a complex protein mixture. Q5 employs a probabilistic classification algorithm built upon a dimension-reduced linear discriminant analysis. Our solution is computationally efficient; it is noniterative and computes the optimal linear discriminant using closed-form equations. The optimal discriminant is computed and verified for datasets of complete, complex SELDI spectra of human blood serum. Replicate experiments of different training/testing splits of each dataset are employed to verify robustness of the algorithm. The probabilistic classification method achieves excellent performance. We achieve sensitivity, specificity, and positive predictive values above 97% on three ovarian cancer datasets and one prostate cancer dataset. The Q5 method outperforms previous full-spectrum complex sample spectral classification techniques and can provide clues as to the molecular identities of differentially expressed proteins and peptides.
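To make "closed-form linear discriminant" concrete, here is a minimal sketch of the textbook two-class Fisher discriminant, w = S_w^{-1}(m_1 - m_0), written for two-dimensional inputs. It is not the Q5 algorithm: it assumes each sample has already been reduced to two coordinates (for instance, two principal component scores), it uses a hard midpoint threshold instead of a probabilistic output, and the names `fisher_discriminant` and `classify` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _mean(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fisher_discriminant(class0: Sequence[Point],
                        class1: Sequence[Point]) -> Tuple[Point, float]:
    """Closed-form two-class Fisher discriminant in 2-D (a sketch)."""
    m0, m1 = _mean(class0), _mean(class1)
    # Pooled within-class scatter matrix S_w = [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for pts, m in ((class0, m0), (class1, m1)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dmx, dmy = m1[0] - m0[0], m1[1] - m0[1]
    # w = S_w^{-1} (m1 - m0), using the explicit 2x2 inverse.
    w = ((syy * dmx - sxy * dmy) / det, (-sxy * dmx + sxx * dmy) / det)
    # Threshold halfway between the projected class means.
    threshold = 0.5 * ((w[0] * m0[0] + w[1] * m0[1]) + (w[0] * m1[0] + w[1] * m1[1]))
    return w, threshold


def classify(p: Point, w: Point, threshold: float) -> int:
    return 1 if w[0] * p[0] + w[1] * p[1] > threshold else 0


# Made-up toy data, not taken from the paper.
healthy: List[Point] = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
disease: List[Point] = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)]
w, thr = fisher_discriminant(healthy, disease)
```

No iteration is involved: the direction comes from one 2x2 matrix inverse, which is the sense in which such a discriminant is noniterative.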
Bayesian Estimation Methods For N-Gram Language Model Adaptation
- In Proceedings of International Conference on Spoken Language Processing, 1996
Cited by 19 (1 self)
Stochastic n-gram language models have been successfully applied in continuous speech recognition for several years. Such language models provide many computational advantages but also require huge text corpora for parameter estimation. Moreover, the texts must exactly reflect, in a statistical sense, the user's language. Estimating a language model on a sample that is not representative severely affects speech recognition performance. A solution to this problem is provided by the Bayesian learning framework. Beyond the classical estimates, a Bayes-derived interpolation model is proposed. Empirical comparisons have been carried out on a 10,000-word radiological reporting domain. Results are provided in terms of perplexity and recognition accuracy.
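One standard way Bayesian adaptation of an n-gram model can look is a MAP estimate with a Dirichlet prior centered on a background model: P(w | h) = (c(h, w) + tau * P_bg(w | h)) / (c(h) + tau). The sketch below shows that textbook form for bigrams; it is not claimed to be the interpolation model proposed in the paper. The function name `map_adapted_bigram`, the prior weight `tau`, and the toy counts are assumptions.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

Bigram = Tuple[str, str]


def map_adapted_bigram(adapt_counts: Counter,
                       background: Dict[Bigram, float],
                       tau: float) -> Callable[[str, str], float]:
    """MAP bigram estimate with a Dirichlet prior (a sketch; tau must be > 0)."""
    # Total adaptation count for each history word.
    history_totals: Counter = Counter()
    for (h, _), c in adapt_counts.items():
        history_totals[h] += c

    def prob(history: str, word: str) -> float:
        c_hw = adapt_counts[(history, word)]
        c_h = history_totals[history]
        p_bg = background.get((history, word), 0.0)
        # Equivalent to interpolating the relative frequency and the
        # background model with weight c_h / (c_h + tau) on the former.
        return (c_hw + tau * p_bg) / (c_h + tau)

    return prob


# Made-up toy numbers for illustration only.
counts = Counter({("the", "scan"): 3, ("the", "cat"): 1})
bg = {("the", "scan"): 0.1, ("the", "cat"): 0.9}
p = map_adapted_bigram(counts, bg, tau=4.0)
p_scan, p_cat = p("the", "scan"), p("the", "cat")
```

With few adaptation counts the estimate stays close to the background model, and it moves toward the in-domain relative frequencies as counts for a history grow.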

