Predictive learning via rule ensembles
, 2005
Cited by 35 (1 self)
General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables. These rule ensembles are shown to produce predictive accuracy comparable to the best methods. However, their principal advantage lies in interpretation. Because of its simple form, each rule is easy to understand, as is its influence on individual predictions, selected subsets of predictions, or globally over the entire space of joint input variable values. Similarly, the degree of relevance of the respective input variables can be assessed globally, locally in different regions of the input space, or at individual prediction points. Techniques are presented for automatically identifying those variables that are involved in interactions with other variables, the strength and degree of those interactions, as well as the identities of the other variables with which they interact. Graphical representations are used to visualize both main and interaction effects.
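The abstract above describes a model that is a linear combination of rules, each rule being a conjunction of simple statements about individual inputs. A minimal sketch of that structure follows. It only illustrates how such a model would score one observation; the variable names, thresholds, and weights are hypothetical, and it does not show how the paper derives or weights its rules.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Rule = Callable[[Row], bool]


def make_rule(conditions: List[Tuple[str, str, float]]) -> Rule:
    """Build a rule as a conjunction of simple threshold statements.

    Each condition is (variable, operator, threshold) with operator
    "<=" or ">". The rule fires only if every condition holds.
    """
    for _, op, _ in conditions:
        if op not in ("<=", ">"):
            raise ValueError(f"unsupported operator: {op}")

    def rule(row: Row) -> bool:
        for name, op, threshold in conditions:
            value = row[name]
            holds = value <= threshold if op == "<=" else value > threshold
            if not holds:
                return False
        return True

    return rule


def predict(row: Row, intercept: float,
            weighted_rules: List[Tuple[float, Rule]]) -> float:
    """Score a row as intercept + sum of weights of the rules that fire."""
    return intercept + sum(w for w, rule in weighted_rules if rule(row))


# Hypothetical ensemble: two rules with made-up weights.
ensemble = [
    (2.0, make_rule([("age", ">", 50.0), ("bmi", ">", 30.0)])),
    (-1.0, make_rule([("age", "<=", 30.0)])),
]

print(predict({"age": 60.0, "bmi": 32.0}, 0.5, ensemble))
print(predict({"age": 25.0, "bmi": 22.0}, 0.5, ensemble))
```

Because each rule contributes a fixed weight whenever it fires, the effect of a rule on a single prediction can be read directly from its weight, which is the interpretability property the abstract emphasizes.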
LASSO-Patternsearch Algorithm with Application to Ophthalmology and Genomic Data
, 2008
Cited by 10 (8 self)
The LASSO-Patternsearch algorithm is proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The patterns considered are those that arise naturally from the log linear expansion of the multivariate Bernoulli density. The method is designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. A LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. The patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli outcomes, modified to act
Exploring Interactions in High-Dimensional Genomic Data: An Overview of Logic Regression, with Applications
- Journal of Multivariate Analysis
, 2004
Identification of SNP interactions using logic regression
- Biostat
, 2007
Cited by 4 (2 self)
Interactions of single nucleotide polymorphisms (SNPs) are assumed to be responsible for complex diseases such as sporadic breast cancer. Important goals of studies concerned with such genetic data are thus to identify combinations of SNPs that lead to a higher risk of developing a disease and to measure the importance of these interactions. There are many approaches based on classification methods such as CART and Random Forests that allow measuring the importance of single variables. With none of these methods, however, can the importance of combinations of variables be quantified directly. In this paper, we show how logic regression can be employed to identify SNP interactions explanatory for the disease status in a case-control study and propose two measures for quantifying the importance of these interactions for classification. These approaches are then applied, on the one hand, to simulated data sets, and on the other hand, to the SNP data of the GENICA study, a study dedicated to the identification of genetic and gene-environment interactions associated with sporadic breast cancer.
Loss-Based Cross-Validated deletion/substitution/addition algorithms in estimation
- APPLICATIONS IN GENOMICS. STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY
, 2004
Identifying interacting SNPs using Monte Carlo logic regression. Genetic Epidemiology, 28(2):157–170
, 2005
Cited by 1 (0 self)
Interactions are frequently at the center of interest in single-nucleotide polymorphism (SNP) association studies. When interacting SNPs are in the same gene or in genes that are close in sequence, such interactions may suggest which haplotypes are associated with a disease. Interactions between unrelated SNPs may suggest genetic pathways. Unfortunately, data sets are often still too small to definitively determine whether interactions between SNPs occur. Also, competing sets of interactions could often be of equal interest. Here we propose Monte Carlo logic regression, an exploratory tool that combines Markov chain Monte Carlo and logic regression, an adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates such as SNPs. The goal of Monte Carlo logic regression is to generate a collection of (interactions of) SNPs that may be associated with a disease outcome, and that warrant further investigation. As such, the models that are fitted in the Markov chain are not combined into a single model, as is often done in Bayesian model averaging procedures. Instead, the most frequently occurring patterns in these models are tabulated. The method is applied to a study of heart disease with 779 participants and 89 SNPs. A simulation study is carried out to investigate the performance of the Monte Carlo logic regression approach. Genet. Epidemiol. 28:157–170, 2005.
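The abstract above states that the fitted models are not averaged into one model; instead, the most frequently occurring patterns are tabulated. A minimal sketch of that tabulation step follows. It assumes each sampled model is already available as a collection of patterns, with a pattern represented as a set of literal names; the sample data and the `tabulate_patterns` helper are hypothetical, and the Markov chain sampler itself is not shown.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

# A pattern is a conjunction of literals, e.g. {"SNP3", "not SNP7"}.
Pattern = FrozenSet[str]


def tabulate_patterns(
    models: Iterable[Iterable[Pattern]],
) -> List[Tuple[Pattern, float]]:
    """Return each pattern with the fraction of models containing it,
    ordered from most to least frequent."""
    counts: Counter = Counter()
    n_models = 0
    for model in models:
        n_models += 1
        # Count a pattern at most once per model.
        counts.update(set(model))
    if n_models == 0:
        return []
    return [(p, c / n_models) for p, c in counts.most_common()]


# Hypothetical output of a sampler: four visited models.
ab = frozenset({"SNP1", "SNP2"})
c = frozenset({"not SNP5"})
sampled_models = [[ab], [ab, c], [c], [ab]]

for pattern, freq in tabulate_patterns(sampled_models):
    print(sorted(pattern), freq)
```

Reporting frequencies rather than a single best model fits the exploratory goal stated in the abstract: patterns that recur across many visited models are candidates for further investigation.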
Detecting High-Order Interactions of Single Nucleotide Polymorphisms Using Genetic Programming
Cited by 1 (1 self)
Motivation: Not individual single nucleotide polymorphisms (SNPs), but high-order interactions of SNPs are assumed to be responsible for complex diseases such as cancer. Therefore, one of the major goals of genetic association studies concerned with such genotype data is the identification of these high-order interactions. This search is additionally impeded by the fact that these interactions are often explanatory for only a relatively small subgroup of patients. Most of the feature selection methods proposed in the literature, unfortunately, fail at this task, since they can either only identify individual variables or interactions of a low order, or try to find rules that are explanatory for a high percentage of the observations. In this paper, we present a procedure based on genetic programming and multi-valued logic that enables the identification of high-order interactions of categorical variables such as SNPs. This method, called GPAS (Genetic Programming for Association Studies), can not only be used for feature selection, but can also be employed for discrimination. Results: In an application to the genotype data from the GENICA study, an association study concerned with sporadic breast cancer, GPAS is able to identify high-order interactions of SNPs leading to a considerably increased breast cancer risk for different subsets of patients that are not found by other feature selection methods. As an application to a subset of the HapMap data shows, GPAS is not restricted to association studies comprising several tens of SNPs, but can also be employed to analyze whole-genome data. Availability: Software is available on request from the authors. Contact:
Biostatistics Advance Access published March 23, 2006
Haplotype data capture the genetic variation among individuals in a population and among populations. An understanding of this variation and the ancestral history of haplotypes is important in genetic association studies of complex disease. We introduce a method for detecting associations between disease and haplotypes in a candidate gene region or candidate block with little or no recombination. A perfect phylogeny demonstrates the evolutionary relationship between single-nucleotide polymorphisms (SNPs) in the haplotype blocks. Our approach extends the logic regression technique of Ruczinski et al. (2003) to a Bayesian framework, and constrains the model space to that of a perfect phylogeny. Environmental factors, as well as their interactions with SNPs, may be incorporated into the regression framework. We demonstrate our method on simulated data from a coalescent model, as well as data from a candidate gene study of sarcoidosis.
Biostatistics (2003), 4, 4, pp. 523–538
This article presents methods that address this need. We focus on and-or combinations of biomarker results that we call logic rules and present novel definitions for the ROC curve and the area under the curve (AUC) that are applicable to this class of combination tests. Our estimates of the ROC and AUC are amenable to statistical inference including comparisons of tests and regression analysis. The methods are applied to data on free and total PSA levels among prostate cancer cases and matched controls enrolled in the Physicians' Health Study.
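To make the notion of an and-or logic rule concrete, the sketch below evaluates one AND rule and one OR rule on two markers and reports the true positive and false positive rates, which give a single operating point in ROC space. It does not reproduce the article's ROC or AUC definitions for combination tests; the marker values, thresholds, and helper names are hypothetical.

```python
from typing import Callable, List, Tuple

Subject = Tuple[float, float]  # (marker_a, marker_b), hypothetical markers
LogicRule = Callable[[Subject], bool]


def rates(rule: LogicRule, cases: List[Subject],
          controls: List[Subject]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for a rule."""
    tpr = sum(rule(s) for s in cases) / len(cases)
    fpr = sum(rule(s) for s in controls) / len(controls)
    return tpr, fpr


# Hypothetical thresholds: positive if marker_a is high, marker_b is low.
def and_rule(s: Subject) -> bool:
    return s[0] > 4.0 and s[1] < 0.25


def or_rule(s: Subject) -> bool:
    return s[0] > 4.0 or s[1] < 0.25


# Made-up data for illustration only.
cases = [(5.0, 0.1), (6.0, 0.3), (3.0, 0.1), (7.0, 0.2)]
controls = [(2.0, 0.3), (5.0, 0.4), (1.0, 0.1), (3.0, 0.3)]

print("AND:", rates(and_rule, cases, controls))
print("OR: ", rates(or_rule, cases, controls))
```

On any data set, the AND rule can never flag more subjects than the OR rule built from the same two statements, so it trades sensitivity for specificity. Sweeping the thresholds would trace out a set of such operating points.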

