## Challenges in the analysis of mass-throughput data: A technical commentary from the statistical machine learning perspective (2006)

Venue: CANCER INFORMATICS

Citations: 1 (0 self)

### BibTeX

    @ARTICLE{Aliferis06challengesin,
      author  = {Constantin F. Aliferis and Alexander Statnikov and Ioannis Tsamardinos},
      title   = {Challenges in the analysis of mass-throughput data: A technical commentary from the statistical machine learning perspective},
      journal = {CANCER INFORMATICS},
      year    = {2006},
      volume  = {2},
      pages   = {133--162}
    }

### Abstract

Sound data analysis is critical to the success of modern molecular medicine research that involves collection and interpretation of mass-throughput data. The novel nature and high dimensionality of such datasets pose a series of nontrivial data analysis problems. This technical commentary discusses the problems of over-fitting, error estimation, the curse of dimensionality, causal versus predictive modeling, integration of heterogeneous types of data, and lack of standard protocols for data analysis. We attempt to shed light on the nature and causes of these problems and to outline viable methodological approaches to overcome them.

### Citations

9946 | Statistical Learning Theory - Vapnik - 1998
Citation Context: ...ation error based on the training error. Such bounds are provided by Statistical Theory for statistical predictive models and by Computational Learning Theory for machine learning predictive models ([Vapnik, 1998], [Anthony, 1992], [Kearns, 1994], and [Herbrich, 2002]). Typically, statistical methods attempt to estimate the probability distribution of the data. The distribution estimate can then be used to cr... |

7493 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988
Citation Context: ...misleading results [Spirtes 2000]. Fortunately, it was shown, relatively recently (1980s), that it is possible to soundly infer causal relations from observational data in many practical cases (see [Pearl, 1988], [Spirtes, 2000], and [Glymour, 1999]). Since then, algorithms that infer such causal relations have been developed that can greatly reduce the number of experiments required to discover the causal ... |

5369 | Neural Networks for Pattern Recognition - Bishop - 1995
Citation Context: ...ng methods such as Neural networks can be configured to exploit dimensionality reduction implicitly (eg, by using fewer hidden units than input units in multi-layered feed-forward NN architectures) [Bishop, 1995]. Forward and backward model selection (which is a special form of “wrapper”-style feature selection) procedures have been criticized unfavorably to PCA by some authors on the basis of several accoun... |

3389 | An Introduction to the Bootstrap - Efron, Tibshirani - 1993 |

2157 | Cluster analysis and display of genome-wide expression patterns - Eisen, Spellman, et al. - 1998 |

1271 | Causality: Models, Reasoning, and Inference - Pearl - 2000
Citation Context: ...requirement (the “Causal Markov Condition”) that relates causal graph structure to independencies among variables in the data distribution modeled by the Bayesian Network (for technical details see [Pearl, 2000]). Interpretation of causation in such graphs is intuitive: a variable in the graph causes another variable directly if and only if there is an edge from the former to the latter (provided the graph... |

616 | An Introduction to Computational Learning Theory - Kearns, Vazirani - 1994 |

582 | Bayesian interpolation - MacKay - 1991 |

540 | A tutorial on support vector regression - Smola, Schölkopf - 2004 |

517 | Minimum information about a microarray experiment (MIAME)-toward standards for microarray data - Brazma - 2001
Citation Context: ...S [Fananapazir, 2005] that we discuss below. While considerable efforts have been made to develop data storage and dissemination standards (eg, the MIAME standard for microarray gene expression data [Brazma, 2001] and the Human Proteome Organization’s PSI for mass spectrometry data [Orchard, 2004], [Hermjakob, 2004]) there is limited agreement as to which are the best methods to process, analyze, and interpre... |

480 | A practical guide to support vector classification, 2007. Paper available at http://www.csie.ntu.edu.tw/∼cjlin/papers/guide/guide.pdf - Hsu, Chang, et al. |

380 | Computer systems that learn - Weiss, Kulikowski - 1991 |

340 | A scaled conjugate gradient algorithm for fast supervised learning - Møller - 1993 |

286 | Large margin DAGs for multiclass classification - Platt, Cristianini, et al. - 2000 |

261 | Estimating the error rate of a prediction rule: Improvement on cross-validation - Efron - 1983
Citation Context: ...ontrary to cross-validation, bootstrap estimators require sampling with replacement of a large number of bootstrap samples; several methods exist to estimate the generalization error thereafter (see [Efron, 1983], [Efron, 1994], and [Efron, 1997]). While bootstrap methods have lower variance of the predictions, their computational cost is typically high and ... |
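
The leave-one-out bootstrap scheme that the excerpt contrasts with cross-validation can be sketched in a few lines. This is our own illustrative construction (the toy midpoint-threshold classifier and all function names are ours, not from the paper): each bootstrap sample is drawn with replacement, a model is fit on it, and the error is measured on the points left out of that sample.

```python
import random
import statistics

def bootstrap_error(xs, ys, fit, predict, n_boot=200, seed=0):
    """Leave-one-out bootstrap error: average out-of-bag error over
    n_boot bootstrap samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]      # sample with replacement
        out = sorted(set(range(n)) - set(idx))          # out-of-bag indices
        ys_boot = [ys[i] for i in idx]
        if not out or len(set(ys_boot)) < 2:            # skip degenerate samples
            continue
        model = fit([xs[i] for i in idx], ys_boot)
        mistakes = [predict(model, xs[i]) != ys[i] for i in out]
        errors.append(sum(mistakes) / len(mistakes))
    return statistics.mean(errors)

# Toy classifier: threshold at the midpoint of the two class means.
def fit(xs, ys):
    m0 = statistics.mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = statistics.mean(x for x, y in zip(xs, ys) if y == 1)
    return (m0 + m1) / 2

def predict(threshold, x):
    return int(x > threshold)

xs = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
err = bootstrap_error(xs, ys, fit, predict)
print(err)  # 0.0 on this cleanly separable toy data
```

The computational cost the excerpt mentions is visible here: each of the `n_boot` iterations refits the model from scratch, whereas k-fold cross-validation needs only k fits.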

243 | Selection bias in gene extraction on the basis of microarray gene-expression data - Ambroise, McLachlan |

241 | Function minimization by conjugate gradients - Fletcher, Reeves - 1964 |

207 | On bias, variance, 0/1 loss, and the curse-of-dimensionality - Friedman - 1997
Citation Context: ...ecific classifier is capable of producing (variance) (L-A), the error of the latter model relative to the optimal model (bias) (O-L), and the error of the optimal model O (see [Domingos, 2000] and [Friedman, 1997] for mathematical details and examples of analyses of specific learners). ... |

189 | Learning Bayesian network structure from massive datasets: The 'sparse candidate' algorithm - Friedman, Nachman, et al. - 1999 |

176 | On the learnability and design of output codes for multiclass problems - Crammer, Singer - 2000 |

168 | Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis - Harrell - 2001
Citation Context: ...selection) procedures have been criticized unfavorably to PCA by some authors on the basis of several accounts, the most important of which is overfitting the data (for example see the discussion in [Harrell, 2001]). Indeed if extensive model selection and fitting are performed at the same time (as ... |

156 | Support vector machines for multi-class pattern recognition - Weston, Watkins - 1999 |

140 | Learning Bayesian Networks is NP-Hard - Chickering, Geiger, et al. - 1994 |

136 | Categorical Data Analysis, 2nd ed - Agresti - 2002
Citation Context: ...the interaction effects (see Figure 13). In predictor spaces with 10,000 predictors, for example, even 2nd-order effects become impractical to examine exhaustively in tractable computational time [Agresti, 2002]. (d) Irrelevant or superfluous dimensions may severely affect methods that calculate distances in the input space. It is easy to see, for example, why k-Nearest Neighbors as a learning method is s... |
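
Point (d) above, that distance-based learners such as k-Nearest Neighbors suffer in high-dimensional input spaces, can be shown numerically: as irrelevant dimensions are added, distances concentrate and the nearest and farthest points become almost indistinguishable. The following stdlib-only sketch is our illustration, not code from the cited works:

```python
import math
import random

def distance_spread(dim, n_points=200, seed=1):
    """Relative contrast (max - min) / min of Euclidean distances from the
    origin to random points in the unit hypercube [0, 1]^dim. A small value
    means all points look roughly equidistant, so 'nearest' loses meaning."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.sqrt(sum(x * x for x in p)))
    return (max(dists) - min(dists)) / min(dists)

for d in (2, 10, 100, 1000):
    print(d, round(distance_spread(d), 3))  # spread shrinks as d grows
```

With 1,000 dimensions the spread is a small fraction of what it is in 2 dimensions, which is exactly why a handful of relevant features drowned in thousands of superfluous ones degrades k-NN unless feature selection is applied first.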

136 | Comparing support vector machines with Gaussian kernels to radial basis function classifiers - Schölkopf, Sung, et al. - 1997 |

128 | A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer - Paik, Shak, et al. - 2004 |

125 | Large-scale Bayesian logistic regression for text categorization - Genkin, Lewis, et al. - 2006 |

119 | Improvements on cross-validation: The .632+ bootstrap method - Efron, Tibshirani - 1997 |

117 | Restart procedures for the conjugate gradient method - Powell - 1977 |

113 | The HUPO PSI’s molecular interaction format - a community standard for the representation of protein interaction data - Hermjakob - 2004 |

100 | Is cross-validation valid for small-sample microarray classification - Braga-Neto, Dougherty - 2004 |

96 | Kernel logistic regression and the import vector machine - Zhu, Hastie - 2002 |

95 | A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis - Statnikov, Aliferis, et al. - 2005 |

87 | Variable selection using SVM-based criteria - Rakotomamonjy - 2003
Citation Context: ...e cases be sensitive to the curse of dimensionality (see [Hastie 2001] for an example) and can benefit from additional feature selection beyond their regularization (see [Hardin 2004], [Guyon, 2002], and [Rakotomamonjy, 2003]). (d) Heuristic methods Various techniques from the fields of Operations Research and Non-Linear Optimization have been applied to neural network optimization as alternatives to the standard gradie... |

86 | Tutorial on Practical Prediction Theory for Classification - Langford - 2005
Citation Context: ...or delta. Unlike bias-variance decomposition, COLT bounds are independent of the learning task. From the large field of COLT we suggest [Anthony, 1992] and [Kearns, 1994] and the recent tutorial by [Langford, 2005] as accessible introductions. It is worth noting one interesting theoretical concept from COLT that is pertinent to over-fitting. The VC (Vapnik-Chervonenkis) dimension (not to be confused with the ... |

66 | Pitfalls in the Use of DNA Microarray Data for Diagnostic and Prognostic Classification - Simon, Radmacher, et al. - 2003 |

60 | Overfitting in making comparisons between variable selection methods - Reunanen - 2003
Citation Context: ...stimates. However, as long as the stepwise procedure is conducted separately from model fitting and error estimation, the final model and its error estimates (respectively) will not be over-fitted [Reunanen, 2003]. (e) Penalizing complexity/parameters One method for avoiding over-fitting without resorting to low-performance high-bias classifiers is to design and apply learners that trade off complexity for “... |
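
The selection-bias pitfall behind this recommendation (also the subject of the Ambroise and McLachlan entry above) can be demonstrated on pure-noise data. The sketch below is our own construction (all names are illustrative): choosing a feature on the full dataset before leave-one-out cross-validation yields an optimistic accuracy estimate, while re-selecting inside each fold stays near chance.

```python
import random
import statistics

def best_feature(rows, labels):
    """Index of the feature with the largest gap between class means."""
    def gap(j):
        a = [r[j] for r, y in zip(rows, labels) if y == 0]
        b = [r[j] for r, y in zip(rows, labels) if y == 1]
        return abs(statistics.mean(a) - statistics.mean(b))
    return max(range(len(rows[0])), key=gap)

def classify(rows, labels, j, x):
    """Nearest-class-mean classifier on feature j."""
    m0 = statistics.mean(r[j] for r, y in zip(rows, labels) if y == 0)
    m1 = statistics.mean(r[j] for r, y in zip(rows, labels) if y == 1)
    return 0 if abs(x - m0) <= abs(x - m1) else 1

def loocv(rows, labels, select_inside_cv):
    j_all = best_feature(rows, labels)   # selection on ALL data (biased)
    hits = 0
    for i in range(len(rows)):
        tr_r = rows[:i] + rows[i + 1:]
        tr_y = labels[:i] + labels[i + 1:]
        j = best_feature(tr_r, tr_y) if select_inside_cv else j_all
        hits += classify(tr_r, tr_y, j, rows[i][j]) == labels[i]
    return hits / len(rows)

# Pure noise: 500 Gaussian features, none genuinely related to the label.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(30)]
labels = [i % 2 for i in range(30)]

biased = loocv(rows, labels, select_inside_cv=False)
honest = loocv(rows, labels, select_inside_cv=True)
print(biased, honest)  # selecting outside CV looks far better than chance
```

Because the features are pure noise, any accuracy well above 0.5 under the biased protocol is an artifact of letting the held-out samples influence feature selection, which is exactly the over-fitting of error estimates that the excerpt warns against.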

56 | Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry - Listgarten, Emili - 2005 |

50 | Gauss-Newton approximation to Bayesian regularization - Foresee, Hagan - 1997 |

46 | HITON, a novel Markov blanket algorithm for optimal variable selection - Aliferis, Tsamardinos, et al. - 2003 |

44 | Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations - Diamandis - 2004
Citation Context: ...idization, and newer variants such as tiled expression arrays, high-resolution mass spectrometry, liquid chromatography-mass spectrometry, and others ([Fortina, 2002], [Valle, 2004], [Khoury, 2003], [Diamandis, 2004], and [Petricoin, 2002]). These methods are capable of producing genotypic, transcriptional, proteomic, and other measurements about cellular function on a massive scale. In the near future such tech... |

44 | Strong completeness and faithfulness in Bayesian networks - Meek - 1995
Citation Context: ...red by feature selection and SVMs as explained earlier) is that under the condition of faithful distributions (which is the vast majority of distributions, with very few deviations known so far [Meek, 1995]), universal approximator learners (such as SVMs), and standard loss functions (ie, mean squared error, area under the ROC curve), the Markov Blanket predictors are guaranteed to be the smallest optim... |

43 | Towards principled feature selection: Relevancy, filters and wrappers - Tsamardinos, Aliferis - 2003 |

41 | A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection - Yasui, Pepe, et al. |

40 | Causation, Prediction and Search. 2nd edition - Spirtes, Glymour, et al. - 2001 |

38 | Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients - Qu, Adam, et al. - 2002 |

37 | A Derivation of Conjugate Gradients - Beale - 1972
Citation Context: ...esearch and Non-Linear Optimization have been applied to neural network optimization as alternatives to the standard gradient descent optimization (for example, see [Fletcher, 1964], [Powell, 1977], [Beale, 1972], [Moller, 1993], [MacKay, 1992], and [Foresee, 1997]). Also weight decay, momentum, random restarts and other heuristic and quasi-heuristic methods have been devised to ... |

35 | Making logistic regression a core data mining tool with tr-irls - Komarek, Moore - 2005 |

33 | Algorithms for large scale Markov blanket discovery - Tsamardinos, Aliferis, et al. |

32 | Computational Learning Theory: An Introduction. Cambridge Tracts - Anthony, Biggs - 1992 |