Results 1 - 3 of 3
Foundational principles for large scale inference: Illustrations through correlation mining, 2015
Abstract
When can reliable inference be drawn in the “Big Data” context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far smaller than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for “Big Data”. Sample complexity, however, has received relatively little attention, especially in the setting where the sample size n is fixed and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional
Selection and Estimation for Mixed Graphical Models, 2014
Abstract
We consider the problem of estimating the parameters in a pairwise graphical model in which the distribution of each node, conditioned on the others, may have a different parametric form. In particular, we assume that each node’s conditional distribution is in the exponential family. We identify restrictions on the parameter space required for the existence of a well-defined joint density, and establish the consistency of the neighbourhood selection approach for graph reconstruction in high dimensions when the true underlying graph is sparse. Motivated by our theoretical results, we investigate the selection of edges between nodes whose conditional distributions take different parametric forms, and show that efficiency can be gained if edge estimates obtained from the regressions of particular nodes are used to reconstruct the graph. These results are illustrated with examples of Gaussian, Bernoulli, Poisson and exponential distributions. Our theoretical findings are corroborated by evidence from simulation studies.
unknown title
Abstract
A convex pseudo-likelihood framework for high dimensional partial correlation estimation with convergence guarantees