Results 1-10 of 20
Supplement to “Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies.” DOI:10.1214/12-AOAS586SUPP
, 2013
Efficient Algorithms for Multivariate Linear Mixed Models in Genome-wide Association Studies
Cited by 3 (0 self)
Multivariate linear mixed models (mvLMMs) have been widely used in many areas of genetics, and have attracted considerable recent interest in genome-wide association studies (GWASs). However, existing methods for calculating the likelihood ratio test statistics in mvLMMs are time consuming and, without approximations, cannot be directly applied to analyze even two traits jointly in a typical-size GWAS. Here, we present a novel algorithm for computing parameter estimates and test statistics (likelihood ratio and Wald) in mvLMMs that i) reduces per-iteration optimization complexity from cubic to linear in the number of samples; and ii) in GWAS analyses, reduces per-marker complexity from cubic to approximately quadratic (or linear if the relatedness matrix is of low rank) in the number of samples. The new method effectively generalizes both the EMMA (Efficient Mixed Model Association) algorithm and the GEMMA (Genome-wide EMMA) algorithm to the multivariate case, making likelihood ratio tests in GWASs with mvLMMs possible, for the first time, for tens of thousands of samples and a moderate number of phenotypes (< 10). With real examples, we show that, as expected, the new method is orders of magnitude faster than competing methods both in variance component estimation in a single mvLMM and in GWAS applications. The method is implemented in the GEMMA software package, freely available at
REFINING GENETICALLY INFERRED RELATIONSHIPS USING TREELET COVARIANCE SMOOTHING
, 2012
Cited by 2 (1 self)
Recent technological advances coupled with large sample sets have uncovered many factors underlying the genetic basis of traits and the predisposition to complex disease, but much is left to discover. A common thread to most genetic investigations is familial relationships. Close relatives can be identified from family records, and more distant relatives can be inferred from large panels of genetic markers. Unfortunately, these empirical estimates can be noisy, especially regarding distant relatives. We propose a new method for denoising genetically inferred relationship matrices by exploiting the underlying structure due to hierarchical groupings of correlated individuals. The approach, which we call Treelet Covariance Smoothing, employs a multiscale decomposition of covariance matrices to improve estimates of pairwise relationships. On both simulated and real data, we show that smoothing leads to better estimates of the relatedness amongst distantly related individuals. We illustrate our method with a large genome-wide association study and estimate the “heritability” of body mass index quite accurately. Traditionally
Predictive Accuracy: a Function of Genetic Distance
How well conclusions generalise to populations other than the one used in a statistical analysis is a key question in any genome-wide association study (GWAS) or genomic selection (GS) study. In addition to the influence of exogenous factors, an important consideration is how far a target population is from the training population the genome-wide model is estimated from. Furthermore, the former may not be available along with the latter, either because the data have yet to be collected (e.g. multistage studies) or because the individuals do not exist yet (e.g. future generations in a breeding program). Naturally, we expect the predictive ability of the model to decay as the two populations become increasingly unrelated, that is, as their genetic distance increases. An interesting question then is: for how distant a target population can we reliably predict a certain phenotype, given a training population and an associated genome-wide model?

Clustering, Genetic Distance and Relatedness

The genetic distance between the two populations is known as kinship or relatedness, and can be estimated from marker profiles in several ways, including allele sharing [4] and allelic correlation [1]. Allelic correlation in particular is interesting because it measures relatedness as a linear relationship, which can be easily handled by classic statistical procedures. For example, if the target population is unknown we can:
• compute the kinship matrix of all samples from their marker profiles using allelic correlation;
• use the kinship matrix as a distance matrix and split the population in two subsets using nearest neighbour (knn) with k = 2;
• take the largest subset as the new training population to estimate the genome-wide model and the smallest as the new target population to estimate predictive power reliably [5].
The two subsamples are guaranteed to minimise the average kinship coefficient between their elements, because knn maximises the average Euclidean distance and thus minimises the average allelic correlation. This is in contrast with cross-validation, which produces pairs of homogeneous training and test sets.
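The three steps listed above can be illustrated with a minimal sketch. It rests on assumptions that are not in the abstract: kinship is taken as the average product of standardised allele counts (one common form of allelic correlation), and the split uses a simple seed-based partition as a stand-in for the clustering step, which the abstract names only briefly. The function names `allelic_correlation_kinship` and `split_training_target` and the toy genotypes are hypothetical.

```python
import math


def allelic_correlation_kinship(genotypes):
    """Kinship from allele counts (rows = samples, columns = markers).

    Assumption: kinship is the average product of standardised allele
    counts. At least one marker must be polymorphic.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        if var == 0:
            continue  # a monomorphic marker carries no information
        sd = math.sqrt(var)
        columns.append([(x - mean) / sd for x in col])
    m = len(columns)
    return [[sum(c[i] * c[k] for c in columns) / m for k in range(n)]
            for i in range(n)]


def split_training_target(kinship):
    """Split samples in two groups (illustrative stand-in, not the paper's method).

    The least related pair of samples seeds the two groups; every other
    sample joins the seed it is more related to. Returns the larger group
    (training) first and the smaller group (target) second.
    """
    n = len(kinship)
    pairs = ((i, k) for i in range(n) for k in range(i + 1, n))
    seed_a, seed_b = min(pairs, key=lambda p: kinship[p[0]][p[1]])
    groups = {seed_a: [seed_a], seed_b: [seed_b]}
    for i in range(n):
        if i in groups:  # the seeds are already placed
            continue
        closer = seed_a if kinship[i][seed_a] >= kinship[i][seed_b] else seed_b
        groups[closer].append(i)
    larger, smaller = sorted(groups.values(), key=len, reverse=True)
    return sorted(larger), sorted(smaller)


# Toy data: five samples, six markers, allele counts in {0, 1, 2}.
genotypes = [
    [0, 0, 0, 2, 2, 2],
    [0, 0, 1, 2, 2, 2],
    [0, 1, 0, 2, 2, 1],
    [2, 2, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
]
kinship = allelic_correlation_kinship(genotypes)
training, target = split_training_target(kinship)
print("training:", training, "target:", target)
```

In this toy example the first three samples share most alleles and end up in the training set, while the last two form the target set. A real analysis would use a proper clustering routine on thousands of markers.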
Open Access
A note on the rationale for estimating genealogical coancestry from molecular markers
Received; accepted Corresponding author:
"... analytical comparison of the principal component method and ..."
Edited by:
, 2013
†These authors have contributed equally to this work.
Objectives: We present an up-to-date review of STRUCTURE software: one of the most widely used population analysis tools that allows researchers to assess patterns of genetic structure in a set of samples. STRUCTURE can identify subsets of the whole sample by detecting allele frequency differences within the data and can assign individuals to those subpopulations based on analysis of likelihoods. The review covers STRUCTURE’s most commonly used ancestry and frequency models, plus an overview of the main applications of the software in human genetics including case-control association studies (CCAS), population genetics, and forensic analysis. The review is accompanied by supplementary material providing a step-by-step guide to running STRUCTURE.
Methods: With reference to a worked example, we explore the effects of changing the principal analysis parameters on STRUCTURE results when analyzing a uniform set of human genetic data. Use of the supporting software, CLUMPP and distruct, is detailed, and we provide an overview and worked example of STRAT software, applicable to CCAS.
METHODOLOGY ARTICLE Open Access
Comparison of multimarker logistic regression models, with application to a genomewide scan of schizophrenia
James MS Wason, Frank Dudbridge
Background: Genome-wide association studies (GWAS) are a widely used study design for detecting genetic causes of complex diseases. Current studies provide good coverage of common causal SNPs, but not rare ones. A popular method to detect rare causal variants is haplotype testing. A disadvantage of this approach is that many parameters are estimated simultaneously, which can mean a loss of power and slower fitting to large datasets. Haplotype testing effectively tests both the allele frequencies and the linkage disequilibrium (LD) structure of the data. LD has previously been shown to be mostly attributable to LD between adjacent SNPs. We propose a generalised linear model (GLM) which models the effects of each SNP in a region as well as the statistical interactions between adjacent pairs. This is compared to two other commonly used multimarker GLMs: one with a main-effect parameter for each SNP; one with a parameter for each haplotype.
Results: We show the haplotype model has higher power for rare untyped causal SNPs, the main-effects model