• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

TP: Comparison of discrimination methods for the classification of tumors using gene expression data (0)

by S Dudoit, J Fridlyand, Speed
Venue:J Am Stat Assoc
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 222
Next 10 →

Multicategory Support Vector Machines, theory, and application to the classification of microarray data and satellite radiance data

by Yoonkyung Lee, Yi Lin, Grace Wahba - Journal of the American Statistical Association , 2004
"... Two-category support vector machines (SVM) have been very popular in the machine learning community for classi � cation problems. Solving multicategory problems by a series of binary classi � ers is quite common in the SVM paradigm; however, this approach may fail under various circumstances. We pro ..."
Abstract - Cited by 117 (10 self) - Add to MetaCart
Two-category support vector machines (SVM) have been very popular in the machine learning community for classi � cation problems. Solving multicategory problems by a series of binary classi � ers is quite common in the SVM paradigm; however, this approach may fail under various circumstances. We propose the multicategory support vector machine (MSVM), which extends the binary SVM to the multicategory case and has good theoretical properties. The proposed method provides a unifying framework when there are either equal or unequal misclassi � cation costs. As a tuning criterion for the MSVM, an approximate leave-one-out cross-validation function, called Generalized Approximate Cross Validation, is derived, analogous to the binary case. The effectiveness of the MSVM is demonstrated through the applications to cancer classi � cation using microarray data and cloud classi � cation with satellite radiance pro � les.

Applications of Resampling Methods to Estimate the Number of Clusters and to Improve the Accuracy of a Clustering Method

by Jane Fridlyand, Sandrine Dudoit , 2001
"... The burgeoning field of genomics, and in particular microarray experiments, have revived interest in both discriminant and cluster analysis, by raising new methodological and computational challenges. The present paper discusses applications of resampling methods to problems in cluster analysis. A r ..."
Abstract - Cited by 112 (0 self) - Add to MetaCart
The burgeoning field of genomics, and in particular microarray experiments, have revived interest in both discriminant and cluster analysis, by raising new methodological and computational challenges. The present paper discusses applications of resampling methods to problems in cluster analysis. A resampling method, known as bagging in discriminant analysis, is applied to increase clustering accuracy and to assess the confidence of cluster assignments for individual observations. A novel prediction-based resampling method is also proposed to estimate the number of clusters, if any, in a dataset. The performance of the proposed and existing methods are compared using simulated data and gene expression data from four recently published cancer microarray studies.

Feature selection for high-dimensional genomic microarray data

by Eric P. Xing, Michael I. Jordan, Richard M. Karp - In Proceedings of the Eighteenth International Conference on Machine Learning , 2001
"... We report on the successful application of feature selection methods to a classification problem in molecular biology involving only 72 data points in a 7130 dimensional space. Our approach is a hybrid of filter and wrapper approaches to feature selection. We make use of a sequence of simple filters ..."
Abstract - Cited by 95 (2 self) - Add to MetaCart
We report on the successful application of feature selection methods to a classification problem in molecular biology involving only 72 data points in a 7130 dimensional space. Our approach is a hybrid of filter and wrapper approaches to feature selection. We make use of a sequence of simple filters, culminating in Koller and Sahami’s (1996) Markov Blanket filter, to decide on particular feature subsets for each subset cardinality. We compare between the resulting subset cardinalities using cross validation. The paper also investigates regularization methods as an alternative to feature selection, showing that feature selection methods are preferable in this problem. 1.

BagBoosting for tumor classification with gene expression data

by Marcel Dettling - Bioinformatics , 2004
"... Motivation: Microarray experiments are expected to contribute significantly to the progress in cancer treatment by enabling a precise and early diagnosis. They create a need for class prediction tools, which can deal with a large number of highly correlated input variables, perform feature selection ..."
Abstract - Cited by 79 (1 self) - Add to MetaCart
Motivation: Microarray experiments are expected to contribute significantly to the progress in cancer treatment by enabling a precise and early diagnosis. They create a need for class prediction tools, which can deal with a large number of highly correlated input variables, perform feature selection and provide class probability estimates that serve as a quantification of the predictive uncertainty. A very promising solution is to combine the two ensemble schemes bagging and boosting to a novel algorithm called BagBoosting. Results: When bagging is used as a module in boosting, the resulting classifier consistently improves the predictive performance and the probability estimates of both bagging and boosting on real and simulated gene expression data. This quasi-guaranteed improvement can be obtained by simply making a bigger computing effort. The advantageous predictive potential is also confirmed by comparing BagBoosting to several established class prediction tools for microarray data.

Classification of Multiple Cancer Types by Multicategory Support Vector Machines Using Gene Expression Data

by Y. Lee, C.-K. Lee, Yoonkyung Lee, Yoonkyung Lee, Cheol-koo Lee, Cheol-koo Lee - Journal of the American Statistical Association , 2002
"... Monitoring gene expression profiles is a novel approach in cancer diagnosis. Several studies showed that prediction of cancer types using gene expression data is promising and very informative. The Support Vector Machine (SVM) is one of the classification methods successfully applied to the cancer d ..."
Abstract - Cited by 68 (4 self) - Add to MetaCart
Monitoring gene expression profiles is a novel approach in cancer diagnosis. Several studies showed that prediction of cancer types using gene expression data is promising and very informative. The Support Vector Machine (SVM) is one of the classification methods successfully applied to the cancer diagnosis problems using gene expression data. However, its optimal extension to more than two classes was not obvious, which might impose limitations in its application to multiple tumor types. In this paper, we analyze a couple of published multiple cancer types data sets by the multicategory SVM, which is a recently proposed extension of the binary SVM.

Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data

by Baolin Wu, Tom Abbott, David Fishman, Walter Mcmurray, Gil Mor, Kathryn Stone, David Ward, Kenneth Williams, Hongyu Zhao - Bioinformatics , 2003
"... ..."
Abstract - Cited by 47 (2 self) - Add to MetaCart
Abstract not found

CLICK and EXPANDER: a system for clustering and visualizing gene expression data

by Roded Sharan, Adi Maron-Katz, Ron Shamir - Bioinformatics , 2003
"... Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar exp ..."
Abstract - Cited by 42 (6 self) - Add to MetaCart
Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. Results: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms.

Statistical Issues in cDNA Microarray Data Analysis

by Gordon K. Smyth, Yee Hwa Yang, Terry Speed , 2003
"... This article summarizes some of the issues involved and provides a brief review of the analysis tools which are available to researchers to deal with them. Any microarray experiment involves a number of distinct stages. Firstly there is the design of the experiment. The researchers must decide which ..."
Abstract - Cited by 40 (2 self) - Add to MetaCart
This article summarizes some of the issues involved and provides a brief review of the analysis tools which are available to researchers to deal with them. Any microarray experiment involves a number of distinct stages. Firstly there is the design of the experiment. The researchers must decide which genes are to be printed on the arrays, which sources of RNA are to be hybridized to the arrays and on how many arrays the hybridizations will be replicated. Secondly, after hybridization, there follows a number of data-cleaning steps or `low-level analysis' of the microarray data. The microarray images must be processed to acquire red and green foreground and background intensities for each spot. The acquired red/green ratios must be normalized to adjust for dye-bias and for any systematic variation other than that due to the differences between the RNA samples being studied. Thirdly, the normalized ratios are analyzed by various graphical and numerical means to select differentially expressed genes or to find groups of genes whose expression profiles can reliably classify the different RNA sources into meaningful groups. The sections of this article correspond roughly to the various analysis steps. The following notation will be used throughout the article. The foreground red and green intensities will be written Pp and 9p for each spot. The background intensities will be Pf and 9f . The background-corrected intensities will be P and 9 where usually P Pp Pf 0 # and 9 9p 9f 0 # . The log-differential expression ratio will be vyq # E P 9 0 for each spot. Finally, the log-intensity of the spot will be vyq 3 P9 0 , a measure of the overall brightness of the spot. (The letter E is a mnemonic for minus as vyq vyq E P 9 0 # while 3 is a mnemonic for add as #vyq vyq #...

Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma

by Gavin J. Gordon, Roderick V. Jensen, Li-li Hsiao, Steven R. Gullans, Joshua E. Blumenstock, Sridhar Ramaswamy, William G. Richards, David J. Sugarbaker, Raphael Bueno - Cancer Res , 2002
"... The pathological distinction between malignant pleural mesothelioma (MPM) and adenocarcinoma (ADCA) of the lung can be cumbersome using established methods. We propose that a simple technique, based on the expression levels of a small number of genes, can be useful in the early and accurate diagnosi ..."
Abstract - Cited by 35 (1 self) - Add to MetaCart
The pathological distinction between malignant pleural mesothelioma (MPM) and adenocarcinoma (ADCA) of the lung can be cumbersome using established methods. We propose that a simple technique, based on the expression levels of a small number of genes, can be useful in the early and accurate diagnosis of MPM and lung cancer. This method is designed to accurately distinguish between genetically disparate tissues using gene expression ratios and rationally chosen thresholds. Here we have tested the fidelity of ratio-based diagnosis in differentiating between MPM and lung cancer in 181 tissue samples (31 MPM and 150 ADCA). A training set of 32 samples (16 MPM and 16 ADCA) was used to identify pairs of genes with highly significant, inversely correlated expression levels to form a total of 15 diagnostic ratios using expression profiling data. Any single ratio of the 15 examined was at least 90 % accurate in predicting diagnosis for the remaining 149 samples (e.g., test set). We then examined (in the test

Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems

by Jieping Ye, Bin Yu - Journal of Machine Learning Research , 2005
"... A generalized discriminant analysis based on a new optimization criterion is presented. The criterion extends the optimization criteria of the classical Linear Discriminant Analysis (LDA) when the scatter matrices are singular. An efficient algorithm for the new optimization problem is presented. Th ..."
Abstract - Cited by 31 (10 self) - Add to MetaCart
A generalized discriminant analysis based on a new optimization criterion is presented. The criterion extends the optimization criteria of the classical Linear Discriminant Analysis (LDA) when the scatter matrices are singular. An efficient algorithm for the new optimization problem is presented. The solutions to the proposed criterion form a family of algorithms for generalized LDA, which can be characterized in a closed form. We study two specific algorithms, namely Uncorrelated LDA (ULDA) and Orthogonal LDA (OLDA). ULDA was previously proposed for feature extraction and dimension reduction, whereas OLDA is a novel algorithm proposed in this paper. The features in the reduced space of ULDA are uncorrelated, while the discriminant vectors of OLDA are orthogonal to each other. We have conducted a comparative study on a variety of real-world data sets to evaluate ULDA and OLDA in terms of classification accuracy.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University