Text Categorization Based on Regularized Linear Classification Methods
Information Retrieval, 2000
Cited by 67 (2 self)
A number of linear classification methods, such as the linear least squares fit (LLSF), logistic regression, and support vector machines (SVMs), have been applied to text categorization problems. These methods are similar in that they find hyperplanes that approximately separate a class of document vectors from its complement. However, support vector machines have so far been considered special because they have been demonstrated to achieve state-of-the-art performance. It is therefore worthwhile to understand whether such good performance is unique to the SVM design or whether it can also be achieved by other linear classification methods. In this paper, we compare a number of known linear classification methods, as well as some variants, in the framework of regularized linear systems. We discuss the statistical and numerical properties of these algorithms, with a focus on text categorization. We also provide numerical experiments that illustrate these algorithms on a number of datasets.
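The shared framework can be illustrated roughly as follows. The sketch fits a linear classifier by minimizing an average margin loss plus an L2 penalty. The loss is swapped between a squared loss on ±1 targets (LLSF-style), the logistic loss, and the SVM hinge loss. This is a minimal sketch under stated assumptions, not the paper's algorithm. Plain (sub)gradient descent, the step size `lr`, the penalty weight `lam`, the constant bias feature (which is regularized here as a simplification), and the toy data are all choices made for illustration.

```python
import math
from typing import Callable, List, Sequence

# Derivatives of three margin losses f(z), where z = y * (w . x) and y is +1 or -1.
def squared_grad(z: float) -> float:
    # f(z) = (1 - z)^2: least squares fit on +/-1 targets (LLSF-style)
    return -2.0 * (1.0 - z)

def logistic_grad(z: float) -> float:
    # f(z) = log(1 + exp(-z)); branches keep math.exp from overflowing
    if z >= 0:
        e = math.exp(-z)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(z))

def hinge_grad(z: float) -> float:
    # f(z) = max(0, 1 - z): the SVM loss (a subgradient is used at z == 1)
    return -1.0 if z < 1.0 else 0.0

def train(X: List[Sequence[float]], y: List[int],
          loss_grad: Callable[[float], float],
          lam: float = 0.01, lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Minimise (1/n) * sum_i f(y_i * w.x_i) + (lam/2) * ||w||^2 by (sub)gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 regulariser
        for xi, yi in zip(X, y):
            z = yi * sum(wj * xj for wj, xj in zip(w, xi))
            g = loss_grad(z)
            for j in range(d):
                grad[j] += g * yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    # The separating hyperplane is w . x = 0.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy "document vectors": two features plus a constant 1.0 acting as a bias term.
X = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]]
y = [1, 1, -1, -1]
for name, grad_fn in [("llsf", squared_grad), ("logistic", logistic_grad), ("svm", hinge_grad)]:
    w = train(X, y, grad_fn)
    print(name, [predict(w, x) for x in X])
```

Only the loss derivative changes between the three runs. This is the sense in which all three methods are hyperplane-finding members of one regularized family.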
On clustering of fMRI time series
1997
Cited by 29 (2 self)
Introduction. The spatio-temporal fMRI signal is a combination of several interacting components: the locally correlated hemodynamic response, the network of neuronal activations, and global components such as the cardiac cycle, breathing, etc. A priori, this implies that the signal is correlated in time and space, and that these correlations have both short- and long-range components. Clustering is a classical non-parametric approach to exploratory data analysis. By clustering, we can group signals according to a given objective function. Clustering of waveforms has already been used in fMRI signal analysis; see, e.g., (1). Clustering of stochastic data, however, is a hard optimization problem with many potential pitfalls. The "optimal" cluster configuration depends on the particular choice of clustering scheme (e.g., k-means, k-medians, hierarchical clustering; examples are legion (2)), but just as importantly on the choice of distance metric…
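A small sketch shows how much the distance metric matters. It is a hypothetical illustration, not the method of the paper. It runs plain k-means with a deterministic farthest-first start on four toy time series, twice. The first run uses the raw values (squared Euclidean distance). The second run z-scores each series first, which makes squared Euclidean distance proportional to one minus the Pearson correlation. The toy series and the helpers `kmeans` and `zscore` are assumptions introduced here.

```python
import math
from typing import List, Sequence

Series = List[float]

def zscore(s: Sequence[float]) -> Series:
    """Subtract the mean and divide by the (population) standard deviation of one series."""
    n = len(s)
    mean = sum(s) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in s) / n)
    return [(v - mean) / sd for v in s] if sd > 0 else [0.0] * n

def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(data: List[Sequence[float]], k: int = 2, iters: int = 20) -> List[int]:
    # Deterministic farthest-first start: data[0], then repeatedly the series
    # farthest from every centroid chosen so far.
    centroids = [list(data[0])]
    while len(centroids) < k:
        far = max(data, key=lambda x: min(sqdist(x, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: nearest centroid under squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sqdist(x, centroids[c])) for x in data]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy "voxels": an oscillating pattern, its inverse, and copies of
# each shifted by a constant baseline of 10.
a = [0, 1, 0, -1, 0, 1, 0, -1]
series = [a, [10 + v for v in a], [-v for v in a], [10 - v for v in a]]

print(kmeans(series))                        # raw Euclidean distance
print(kmeans([zscore(s) for s in series]))   # correlation-style distance
```

On the raw values the series group by baseline level. On the z-scored values they group by waveform shape. The clustering scheme is the same in both runs; only the metric differs.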
A Framework For Computational Anatomy
2002
Cited by 27 (12 self)
The rapid collection of brain images from healthy and diseased subjects has stimulated the development of powerful mathematical algorithms to compare, pool and average brain data across whole populations. Brain structure is so complex and variable that new approaches in computer vision, partial differential equations, and statistical field theory are being formulated to detect and visualize disease-specific patterns. We present some novel mathematical strategies for computational anatomy, focusing on the creation of population-based brain atlases. These atlases describe how the brain varies with age, gender, genetics, and over time. We review applications in Alzheimer's disease, schizophrenia and brain development, outlining some current challenges in the field.
Classifier technology and the illusion of progress
Statist. Sci., 2006
Cited by 25 (1 self)
A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.
Neural Minimal Distance Methods
Proc. 3rd Conf. on Neural Networks and Their Applications, 1997
Cited by 13 (12 self)
Minimal distance methods are simple and, in some circumstances, highly accurate. In this paper, the relations between neural and minimal distance methods are investigated. A neural realization facilitates new versions of minimal distance methods. Parametrization of distance functions, distance-based weighting of neighbors, active selection of reference vectors from the training set, and relations to case-based reasoning are discussed.
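Two of the ideas listed above, a parametrized distance function and distance-based weighting of neighbors, can be sketched as a k-nearest-neighbor classifier. This is a minimal sketch, not the neural realization studied in the paper. The Minkowski exponent `p`, the inverse-distance weights, the `eps` guard against division by zero, and the toy reference vectors are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Parametrized distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(references: List[Tuple[Sequence[float], Hashable]], query: Sequence[float],
                k: int = 3, p: float = 2.0, eps: float = 1e-9) -> Hashable:
    # Sort the reference vectors by distance to the query and keep the k nearest.
    nearest = sorted(references, key=lambda item: minkowski(item[0], query, p))[:k]
    votes: Dict[Hashable, float] = defaultdict(float)
    for x, label in nearest:
        # Distance-based weighting: closer neighbours cast larger votes.
        votes[label] += 1.0 / (minkowski(x, query, p) + eps)
    return max(votes, key=votes.get)

references = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
              ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
print(knn_predict(references, (0.0, 0.5), k=5))         # weighted vote, Euclidean
print(knn_predict(references, (5.2, 5.3), k=3, p=1.0))  # Manhattan distance
```

With k = 5 every reference vector votes, so an unweighted majority would return "b" (three votes to two). The inverse-distance weights let the two much closer "a" vectors win instead.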
On Optimal Pairwise Linear Classifiers for Normal Distributions: The Two-Dimensional Case
2001
Cited by 7 (6 self)
Computing linear classifiers is a very important problem in statistical Pattern Recognition (PR). These classifiers have been investigated extensively by the PR community because they are both easy to implement and easy to comprehend. It is well known that when dealing with normally distributed classes, the optimal discriminant function for two classes is linear only when the covariance matrices are equal. Other approaches, such as Fisher's discriminant, the perceptron algorithm, and minimum square distance classifiers, have addressed this problem by generating linear classifiers for both normal and non-normal distributions, but these classifiers are typically suboptimal. In this…
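The equal-covariance statement follows from the standard Bayes discriminant for Gaussian classes. In the notation assumed here (not taken from the paper), class ω_i has mean μ_i, covariance Σ_i and prior P(ω_i). Setting Σ_1 = Σ_2 = Σ makes the quadratic term and the log-determinant identical for both classes, so they cancel in the difference:

```latex
\begin{aligned}
g_i(\mathbf{x}) &= -\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{\top}\Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)
  - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i), \qquad i = 1, 2,\\
g_1(\mathbf{x}) - g_2(\mathbf{x}) &= \mathbf{w}^{\top}\mathbf{x} + w_0
  \quad\text{when } \Sigma_1 = \Sigma_2 = \Sigma,\\
\mathbf{w} &= \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2),\\
w_0 &= -\tfrac{1}{2}\left(\boldsymbol{\mu}_1^{\top}\Sigma^{-1}\boldsymbol{\mu}_1
  - \boldsymbol{\mu}_2^{\top}\Sigma^{-1}\boldsymbol{\mu}_2\right)
  + \ln\frac{P(\omega_1)}{P(\omega_2)}.
\end{aligned}
```

When the covariance matrices differ, the term ½ xᵀ(Σ₂⁻¹ − Σ₁⁻¹)x does not cancel. The optimal boundary is then in general quadratic.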
Hierarchical Clustering of Self-Organizing Maps for Cloud Classification
Neurocomputing, 2000
Cited by 7 (0 self)
This paper presents a new method for segmenting multispectral satellite images. The proposed method is unsupervised and consists of two steps. In the first step, the pixels of a learning set are summarized by a set of codebook vectors using a Probabilistic Self-Organizing Map (PSOM, [9]). In the second step, the codebook vectors of the map are clustered using Agglomerative Hierarchical Clustering (AHC, [7]). Each pixel then takes the label of its nearest codebook vector. A practical application to Meteosat images illustrates the relevance of our approach.
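The second step and the final labelling can be sketched as follows, assuming a codebook has already been trained. This is an illustrative sketch only. The codebook values are hypothetical stand-ins for PSOM output, and the PSOM training step is not shown. Average linkage on squared Euclidean distance is also an assumed choice, since the abstract does not say which AHC linkage is used.

```python
from itertools import combinations
from typing import List, Sequence

def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def agglomerate(codebook: List[Sequence[float]], n_clusters: int) -> List[int]:
    """Average-linkage agglomerative clustering; returns one cluster id per codebook vector."""
    clusters = [[i] for i in range(len(codebook))]

    def linkage(ca: List[int], cb: List[int]) -> float:
        # Mean squared Euclidean distance over all cross-cluster pairs.
        return sum(sqdist(codebook[i], codebook[j]) for i in ca for j in cb) / (len(ca) * len(cb))

    while len(clusters) > n_clusters:
        # combinations() yields pairs with a < b, so deleting index b keeps a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(codebook)
    for cid, members in enumerate(clusters):
        for i in members:
            labels[i] = cid
    return labels

def label_pixels(pixels, codebook, codebook_labels) -> List[int]:
    """Each pixel inherits the cluster label of its nearest codebook vector."""
    return [codebook_labels[min(range(len(codebook)), key=lambda i: sqdist(p, codebook[i]))]
            for p in pixels]

# Hypothetical codebook standing in for the vectors a trained PSOM would produce.
codebook = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (20.0, 0.0)]
codebook_labels = agglomerate(codebook, n_clusters=3)
pixels = [(0.1, 0.2), (10.2, 9.9), (19.0, 1.0)]
print(codebook_labels, label_pixels(pixels, codebook, codebook_labels))
```

Clustering the small codebook instead of every pixel keeps the quadratic cost of agglomerative clustering manageable. Pixels are labelled afterwards by a single nearest-codebook lookup.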
Development and Evaluation of Methods for Predicting Protein Levels from Tandem Mass Spectrometry Data
2005
Cited by 6 (3 self)
This work addresses a central problem of proteomics: estimating the amount of each of the thousands of proteins in a cell culture or tissue sample. Although laboratory methods involving isotopes have been developed for this problem, we seek a simpler approach, one that uses more straightforward laboratory procedures. Specifically, our aim is to use data-mining techniques to infer protein levels from the relatively cheap and abundant data available from high-throughput tandem mass spectrometry (MS/MS). In this thesis, we develop and evaluate several techniques for tackling this problem; in particular, we develop and evaluate different statistical models of MS/MS data. In addition, to evaluate their biological relevance, we test each method on three real-world datasets generated by MS/MS experiments performed on various mouse tissue samples.
Towards Formal Structural Representation of Spoken Language: An Evolving Transformation System (ETS) Approach
2005
Cited by 5 (0 self)
Speech recognition has been a very active area of research over the past twenty years. Despite evident progress, practitioners in the field generally agree that the performance of current speech recognition systems is rather suboptimal and that new approaches are needed. The motivation behind this research is the observation that the notion of representation of objects and concepts, which was once considered central in the early days of pattern recognition, has been largely marginalised by the advent of statistical approaches. As a consequence of the predominantly statistical approach to the speech recognition problem, and of the numeric, feature-vector-based nature of the representation, the classes inductively discovered from real data using decision-theoretic techniques have little meaning outside the statistical framework. This is because decision surfaces or probability distributions are difficult to analyse linguistically. Because of the latter limitation, it is doubtful that the gap between speech recognition and linguistic research can be bridged by numeric representations. This thesis investigates an alternative, structural approach to spoken language representation and categorisation…
A cellular automata approach to detecting interactions among single-nucleotide polymorphisms in complex multifactorial diseases
Pacific Symposium on Biocomputing 7, 2002
Cited by 3 (2 self)
The identification and characterization of susceptibility genes for common complex multifactorial human diseases remains a statistical and computational challenge. Parametric statistical methods such as logistic regression are limited in their ability to identify genes whose effects depend solely or partially on interactions with other genes and with environmental exposures. We introduce cellular automata (CA) as a novel computational approach for identifying combinations of single-nucleotide polymorphisms (SNPs) associated with clinical endpoints. This alternative approach is nonparametric (i.e., no hypothesis about the value of a statistical parameter is made), is model-free (i.e., assumes no particular inheritance model), and is directly applicable to case-control and discordant sib-pair study designs. We demonstrate using simulated data that the approach has good power for identifying high-order nonlinear interactions (i.e., epistasis) among four SNPs in the absence of independent main effects.

