Results 1–10 of 13
Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data
, 2000
"... Motivation: DNA microarray experiments generating thousands of gene expression measurements, are being used to gather information from tissue and cell samples regarding gene expression differences that will be useful in diagnosing disease. We have developed a new method to analyse this kind of data ..."
Abstract

Cited by 399 (1 self)
 Add to MetaCart
Motivation: DNA microarray experiments, generating thousands of gene expression measurements, are being used to gather information from tissue and cell samples regarding gene expression differences that will be useful in diagnosing disease. We have developed a new method to analyse this kind of data using support vector machines (SVMs). This analysis consists of both classification of the tissue samples and an exploration of the data for mislabeled or questionable tissue results. Results: We demonstrate the method in detail on samples consisting of ovarian cancer tissues, normal ovarian tissues, and other normal tissues. The dataset consists of expression experiment results for 97,802 cDNAs for each tissue. As a result of computational analysis, a tissue sample is discovered and confirmed to be wrongly labeled. Upon correction of this mistake and the removal of an outlier, perfect classification of tissues is achieved, but not with high confidence. We identify and analyse a subset of genes from the ovarian dataset whose expression is highly differentiated between the types of tissues. To show the robustness of the SVM method, two previously published datasets from other types of tissues or cells are analysed. The results are comparable to those previously obtained. We show that other machine learning methods also perform comparably to the SVM on many of those datasets. Availability: The SVM software is available at http://www.cs.columbia.edu/~bgrundy/svm. Contact: booch@cse.ucsc.edu
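The validation idea in this abstract — flag samples that a held-out classifier misclassifies as candidates for mislabeling — can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: a plain perceptron stands in for the SVM, and the two-gene "expression profiles" are invented toy data.

```python
def train_linear(data, epochs=50):
    """Train a simple linear classifier (a perceptron stand-in for the SVM)."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:                      # mistake: update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def leave_one_out_flags(data):
    """Flag samples misclassified when held out: candidates for mislabeling."""
    flags = []
    for i, (x, y) in enumerate(data):
        w, b = train_linear(data[:i] + data[i + 1:])
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 0:
            flags.append(i)
    return flags

# Toy two-gene "expression profiles"; sample 5 is deliberately mislabeled.
data = [([3.0, 0.1], +1), ([2.5, 0.3], +1), ([2.8, 0.2], +1),
        ([0.2, 3.1], -1), ([0.1, 2.7], -1),
        ([0.3, 2.9], +1)]
flags = leave_one_out_flags(data)   # sample 5 shows up among the flags
```

The paper additionally uses the margin of each held-out prediction to grade confidence; this sketch keeps only the binary flag.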
Ultraconservative Online Algorithms for Multiclass Problems
 Journal of Machine Learning Research
, 2001
"... In this paper we study online classification algorithms for multiclass problems in the mistake bound model. The hypotheses we use maintain one prototype vector per class. Given an input instance, a multiclass hypothesis computes a similarityscore between each prototype and the input instance and th ..."
Abstract

Cited by 249 (23 self)
 Add to MetaCart
In this paper we study online classification algorithms for multiclass problems in the mistake bound model. The hypotheses we use maintain one prototype vector per class. Given an input instance, a multiclass hypothesis computes a similarity score between each prototype and the input instance and then sets the predicted label to be the index of the prototype achieving the highest similarity. To design and analyze the learning algorithms in this paper we introduce the notion of ultraconservativeness. Ultraconservative algorithms are algorithms that update only the prototypes attaining similarity scores which are higher than the score of the correct label's prototype. We start by describing a family of additive ultraconservative algorithms where each algorithm in the family updates its prototypes by finding a feasible solution for a set of linear constraints that depend on the instantaneous similarity scores. We then discuss a specific online algorithm that seeks a set of prototypes which have a small norm. The resulting algorithm, which we term MIRA (for Margin Infused Relaxed Algorithm), is ultraconservative as well. We derive mistake bounds for all the algorithms and provide further analysis of MIRA using a generalized notion of the margin for multiclass problems.
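A minimal sketch of one member of the additive ultraconservative family described here: only prototypes whose similarity score reached the correct label's score are demoted, and the demotion is split uniformly among them. This is one feasible solution to the paper's linear constraints, not MIRA itself.

```python
def scores(protos, x):
    """Similarity score of each class prototype against instance x."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in protos]

def predict(protos, x):
    """Predicted label: index of the highest-scoring prototype."""
    s = scores(protos, x)
    return max(range(len(protos)), key=s.__getitem__)

def ultraconservative_update(protos, x, y):
    """On a mistake, promote the correct prototype by x and demote only the
    prototypes that scored at least as high as the correct label's."""
    s = scores(protos, x)
    if predict(protos, x) == y:
        return                                   # no mistake, no update
    err = [r for r in range(len(protos)) if r != y and s[r] >= s[y]]
    for r in err:                                # split the demotion uniformly
        protos[r] = [wi - xi / len(err) for wi, xi in zip(protos[r], x)]
    protos[y] = [wi + xi for wi, xi in zip(protos[y], x)]

protos = [[0.0] * 3 for _ in range(3)]           # one prototype per class
stream = [([1, 0, 0], 0), ([0, 1, 0], 1), ([0, 0, 1], 2)]
for _ in range(3):                               # a few online passes
    for x, y in stream:
        ultraconservative_update(protos, x, y)
```

On this separable toy stream the hypothesis stops making mistakes after two passes.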
Pranking with Ranking
 Advances in Neural Information Processing Systems 14
, 2001
"... We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rankprediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a si ..."
Abstract

Cited by 168 (6 self)
 Add to MetaCart
We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking.
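The online rule described here (PRank) can be sketched as a single direction vector shared by all ranks plus a set of ordered thresholds; this is a from-memory reconstruction on invented one-dimensional toy data, not the authors' code.

```python
import math

class PRank:
    """Sketch of the PRank online ranking rule: one shared direction w plus
    ordered thresholds b[0] <= ... <= b[k-2], with b[k-1] = +inf."""
    def __init__(self, dim, k):
        self.w = [0.0] * dim
        self.b = [0.0] * (k - 1) + [math.inf]
        self.k = k

    def predict(self, x):
        """Smallest rank r whose threshold the score fails to clear."""
        s = sum(wi * xi for wi, xi in zip(self.w, x))
        return min(r + 1 for r in range(self.k) if s - self.b[r] < 0)

    def update(self, x, y):
        """On a rank mistake, move w and every violated threshold one step."""
        if self.predict(x) == y:
            return
        s = sum(wi * xi for wi, xi in zip(self.w, x))
        taus = []
        for r in range(self.k - 1):
            yr = -1 if y <= r + 1 else 1     # side of threshold r that y wants
            taus.append(yr if yr * (s - self.b[r]) <= 0 else 0)
        self.w = [wi + sum(taus) * xi for wi, xi in zip(self.w, x)]
        self.b = [br - t for br, t in zip(self.b, taus)] + [math.inf]

ranker = PRank(dim=1, k=3)
stream = [([1.0], 1), ([2.0], 2), ([3.0], 3)]
for _ in range(10):                          # a few online passes
    for x, y in stream:
        ranker.update(x, y)
```

Updating the thresholds jointly with w is what keeps them ordered, which is the crux of the correctness proof mentioned in the abstract.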
Smooth Boosting and Learning with Malicious Noise
 Journal of Machine Learning Research
, 2003
"... We describe a new boosting algorithm which generates only smooth distributions which do not assign too much weight to any single example. We show that this new boosting algorithm can be used to construct efficient PAC learning algorithms which tolerate relatively high rates of malicious noise. In pa ..."
Abstract

Cited by 38 (5 self)
 Add to MetaCart
We describe a new boosting algorithm which generates only smooth distributions which do not assign too much weight to any single example. We show that this new boosting algorithm can be used to construct efficient PAC learning algorithms which tolerate relatively high rates of malicious noise. In particular, we use the new smooth boosting algorithm to construct malicious noise tolerant versions of the PAC-model p-norm linear threshold learning algorithms described in [23]. The bounds on sample complexity and malicious noise tolerance of these new PAC algorithms closely correspond to known bounds for the online p ...
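The smoothness property at the heart of this abstract can be illustrated by contrasting a capped weighting scheme with AdaBoost-style exponential weights. The capped form below is our paraphrase of the style of measure used by smooth boosters, not the paper's exact algorithm; the margins are invented toy values.

```python
import math

def smooth_weights(margins, gamma=0.1):
    """Smooth-boosting-style weighting: raw weight is capped at 1
    ((1-gamma)^(margin/2) for positive margin, 1 otherwise), so no single
    example can dominate the normalised distribution."""
    raw = [(1 - gamma) ** (m / 2) if m > 0 else 1.0 for m in margins]
    z = sum(raw)
    return [r / z for r in raw]

def exp_weights(margins):
    """AdaBoost-style exponential weights, for contrast: one badly
    misclassified example can absorb almost all of the distribution."""
    raw = [math.exp(-m) for m in margins]
    z = sum(raw)
    return [r / z for r in raw]

margins = [-5.0, 3.0, 4.0, 6.0, 2.0, 0.0]   # one badly misclassified example
```

Under malicious noise the badly misclassified example may simply be corrupted, which is why bounding its share of the distribution helps noise tolerance.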
Potential-based Algorithms in Online Prediction and Game Theory
"... In this paper we show that several known algorithms for sequential prediction problems (including Weighted Majority and the quasiadditive family of Grove, Littlestone, and Schuurmans), for playing iterated games (including Freund and Schapire's Hedge and MW, as well as the strategies of Hart and M ..."
Abstract

Cited by 32 (4 self)
 Add to MetaCart
In this paper we show that several known algorithms for sequential prediction problems (including Weighted Majority and the quasi-additive family of Grove, Littlestone, and Schuurmans), for playing iterated games (including Freund and Schapire's Hedge and MW, as well as the strategies of Hart and Mas-Colell), and for boosting (including AdaBoost) are special cases of a general decision strategy based on the notion of potential. By analyzing this strategy we derive known performance bounds, as well as new bounds, as simple corollaries of a single general theorem. Besides offering a new and unified view on a large family of algorithms, we establish a connection between potential-based analysis in learning and its counterparts independently developed in game theory. By exploiting this connection, we show that certain learning problems are instances of more general game-theoretic problems. In particular, we describe a notion of generalized regret and show its applications in learning theory.
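One of the algorithms this abstract unifies, Hedge, makes a compact concrete example of a potential-based strategy: expert weights decay multiplicatively with incurred loss. The loss sequence and learning rate below are arbitrary illustration values.

```python
import math

def hedge(loss_rounds, eta=0.5):
    """Hedge (Freund & Schapire): maintain a distribution over experts whose
    weights decay exponentially in cumulative loss."""
    n = len(loss_rounds[0])
    w = [1.0] * n
    for losses in loss_rounds:
        w = [wi * math.exp(-eta * l) for wi, l in zip(w, losses)]
        z = sum(w)
        w = [wi / z for wi in w]        # renormalise to a distribution
    return w

# Three experts over four rounds; expert 1 suffers the least cumulative loss.
loss_rounds = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
weights = hedge(loss_rounds)
```

The exponential potential here is exactly the quantity whose evolution the paper's general theorem tracks to recover the known regret bound.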
Function Tagging
"... Function tags are a contextsensitive annotation applied to words and phrases of natural language text, marking their syntactic or semantic role within a larger utterance. As researchers improve results on various other problems in pure natural language processing (e.g partofspeech tagging, parsin ..."
Abstract

Cited by 12 (0 self)
 Add to MetaCart
Function tags are a context-sensitive annotation applied to words and phrases of natural language text, marking their syntactic or semantic role within a larger utterance. As researchers improve results on various other problems in pure natural language processing (e.g. part-of-speech tagging, parsing), those who work in the more applied NLP fields (e.g. question-answering, temporal analysis) are seeking more powerful sorts of linguistic annotation as input for their own systems. Hence, function tags. In the first part of the thesis, I present the problem of function tagging: why it is an interesting problem, who has worked on similar things, and what exactly I intend to do. I briefly review the function tags of the Penn Treebank, and explain the specific metrics by which I will evaluate my work. In the second part of the thesis, I introduce the many features that I will use to train a function tagging system, and then I present some systems that make use of them: one using feature trees, one using decision trees (briefly), and one using perceptron models. For each system, I give a brief historical perspective, an ...
On Kernels, Margins, and Low-dimensional Mappings
"... Kernel functions are typically viewed as providing an implicit mapping of points into a highdimensional space, with the ability to gain much of the power of that space without incurring a high computational cost. However, the JohnsonLindenstrauss lemma suggests that in the presence of a large ..."
Abstract

Cited by 11 (4 self)
 Add to MetaCart
Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, with the ability to gain much of the power of that space without incurring a high computational cost. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can instead be viewed as a mapping to a low-dimensional space, one of dimension only Õ(1/γ²), where γ is the value of the margin. In this paper, we explore the question of whether one can efficiently compute such mappings, using only black-box access to a kernel function. We answer this question in the affirmative if our method is also allowed black-box access to the underlying distribution (i.e., unlabeled examples). We also give a lower bound, showing this is not possible for an arbitrary black-box kernel function if we do not have access to the distribution. We leave open the question of whether such mappings can be found efficiently without access to the distribution for standard kernel functions such as the polynomial kernel.
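The first ingredient of the kind of construction this abstract studies is easy to sketch: with black-box kernel access plus unlabeled samples ("landmarks") from the distribution, one gets an explicit finite-dimensional feature map x ↦ (K(x, z₁), ..., K(x, z_d)). The paper's method goes further (composing with a random projection and proving margin preservation); the RBF kernel and landmark count below are arbitrary illustration choices.

```python
import math
import random

def rbf(u, v, gamma=1.0):
    """A black-box kernel; a Gaussian (RBF) kernel is used for concreteness."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def empirical_map(kernel, landmarks):
    """Explicit map built from kernel evaluations against unlabeled landmarks:
    x -> (K(x, z_1), ..., K(x, z_d))."""
    return lambda x: [kernel(x, z) for z in landmarks]

random.seed(0)
# Unlabeled draws standing in for black-box access to the distribution.
landmarks = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(20)]
phi = empirical_map(rbf, landmarks)
features = phi([0.5, -0.2])      # a 20-dimensional explicit representation
```

The lower bound in the abstract says such unlabeled access is essential: kernel evaluations alone do not suffice for an arbitrary black-box kernel.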
Learning to Recognize 3D Objects
, 2000
"... A learning account for the problem of object recognition is developed within the PAC (Probably Approximately Correct) model of learnability. The key assumption underlying this work is that objects can be recognized (or, discriminated) using simple representations in terms of \syntactically" simpl ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
A learning account for the problem of object recognition is developed within the PAC (Probably Approximately Correct) model of learnability. The key assumption underlying this work is that objects can be recognized (or, discriminated) using simple representations in terms of "syntactically" simple relations over the raw image. Although the potential number of these simple relations could be huge, only a few of them are actually present in each observed image and a fairly small number of those observed is relevant to discriminating an object. We show that these properties can be exploited to yield an efficient learning approach in terms of sample and computational complexity, within the PAC model. No assumptions are needed on the distribution of the observed objects, and the learning performance is quantified relative to its past experience. Most importantly, the success of learning an object representation is naturally tied to the ability to represent it as a function of some in ...
On the Generalisation of Soft Margin Algorithms
 IEEE Transactions on Information Theory
, 2000
"... Generalisation bounds depending on the margin of a classier are a relatively recent development. They provide an explanation of the performance of stateoftheart learning systems such as Support Vector Machines (SVM) [12] and Adaboost [24]. The diculty with these bounds has been either their lack ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
Generalisation bounds depending on the margin of a classifier are a relatively recent development. They provide an explanation of the performance of state-of-the-art learning systems such as Support Vector Machines (SVM) [12] and AdaBoost [24]. The difficulty with these bounds has been either their lack of robustness or their looseness. The question of whether the generalisation of a classifier can be more tightly bounded in terms of a robust measure of the distribution of margin values has remained open for some time. The paper answers this open question in the affirmative and, furthermore, the analysis leads to bounds that motivate the previously heuristic soft margin SVM algorithms as well as justifying the use of the quadratic loss in neural network training algorithms. The results are extended to give bounds for the probability of failing to achieve a target accuracy in regression prediction, with a statistical analysis of Ridge Regression and Gaussian Processes as a special case. The analysis presented in the paper has also led to new boosting algorithms described elsewhere [7].
Learning in Natural Language: Theory and Algorithmic Approaches
, 2000
"... This article summarizes work on developing a learning theory account for the major learning and statistics based approaches used in natural language processing. It shows that these approaches can all be explained using a single distribution free inductive principle related to the pac model of learni ..."
Abstract

Cited by 4 (1 self)
 Add to MetaCart
This article summarizes work on developing a learning theory account for the major learning and statistics-based approaches used in natural language processing. It shows that these approaches can all be explained using a single distribution-free inductive principle related to the PAC model of learning. Furthermore, they all make predictions using the same simple knowledge representation: a linear representation over a common feature space. This is significant both to explaining the generalization and robustness properties of these methods and to understanding how these methods might be extended to learn from more structured, knowledge-intensive examples, as part of a learning-centered approach to higher-level natural language inferences.