Results 1–10 of 19
Statistical Themes and Lessons for Data Mining
, 1997
Cited by 32 (3 self)
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.
Modelling functional integration: a comparison of structural equation and dynamic causal models
 NeuroImage
, 2004
Cited by 18 (2 self)
The brain appears to adhere to two fundamental principles of functional organisation, functional integration and functional specialisation, where the integration within and among specialised areas is mediated by effective connectivity. In this paper we review two different approaches to modelling effective connectivity from fMRI data: Structural Equation Models (SEMs) and Dynamic Causal Models (DCMs). Common to both approaches are model comparison frameworks in which inferences can be made about effective connectivity per se and about how that connectivity can be changed by perceptual or cognitive set. Underlying the two approaches, however, are two very different generative models. In DCM a distinction is made between the ‘neuronal level’ and the ‘hemodynamic level’. Experimental inputs cause changes in effective connectivity expressed at the level of neurodynamics, which in turn cause changes in the observed hemodynamics. In SEM, changes in effective connectivity lead directly to changes in the covariance structure of the observed hemodynamics. Because changes in effective connectivity in the brain occur at a neuronal level, DCM is the preferred model for fMRI data. This review focuses on the underlying assumptions and limitations of each model and demonstrates their application to data from a study of attention to visual motion.
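The two generative models contrasted in this abstract can be written compactly. The following is the standard bilinear form of the DCM neuronal state equation and the implied-covariance view taken by SEM, using the conventional symbols from this literature (z: neuronal states, u: experimental inputs, A: fixed connectivity, B^(j): input-modulated connectivity, C: driving inputs, Ψ: error covariance); a notational sketch, not a reproduction of the paper's equations:

```latex
% DCM: inputs u modulate effective connectivity at the neuronal level;
% the states z then drive the observed hemodynamics through a separate model.
\dot{z} = \Bigl(A + \sum_j u_j B^{(j)}\Bigr) z + C u

% SEM: path coefficients B act directly on the covariance of the observed data.
\Sigma(\theta) = (I - B)^{-1} \Psi (I - B)^{-\top}
```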
The TETRAD Project: Constraint Based Aids to Causal Model Specification
 MULTIVARIATE BEHAVIORAL RESEARCH
Bayesian SEM: A more flexible representation of substantive theory. Submitted for publication. Retrieved from http://www.statmodel.com/download/ BSEMv4.pdf
, 2010
Cited by 7 (5 self)
This paper proposes a new approach to factor analysis and structural equation modeling using Bayesian analysis. The new approach replaces parameter specifications of exact zeros with approximate zeros based on informative, small-variance priors. It is argued that this produces an analysis that better reflects substantive theories. The proposed Bayesian approach is particularly beneficial in applications where parameters are added to a conventional model such that a non-identified model is obtained if maximum-likelihood estimation is applied. This approach is useful for measurement aspects of latent variable modeling, such as with CFA and the measurement part of SEM. Two application areas are studied: cross-loadings and residual correlations in CFA. The approach encompasses three elements: model testing, model estimation, and model modification. Monte Carlo simulations and real data are analyzed using Mplus.
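The "approximate zero" idea can be sketched with a one-parameter conjugate update. This is a toy illustration, not the Mplus implementation: the known factor scores `eta`, the `posterior_loading` helper, and all numbers are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
eta = rng.normal(size=n)             # latent factor scores, treated as known for the toy
y = 0.15 * eta + rng.normal(size=n)  # outcome with a small, near-zero cross-loading

def posterior_loading(y, eta, prior_var, noise_var=1.0):
    """Conjugate normal posterior for lambda ~ N(0, prior_var) in y = lambda*eta + e."""
    post_prec = eta @ eta / noise_var + 1.0 / prior_var
    post_mean = (eta @ y / noise_var) / post_prec
    return post_mean, 1.0 / post_prec

# An informative small-variance prior (the "approximate zero") versus a near-flat prior:
m_small, v_small = posterior_loading(y, eta, prior_var=0.01)
m_flat, v_flat = posterior_loading(y, eta, prior_var=100.0)
```

The small-variance prior shrinks the cross-loading estimate toward zero without forcing it to be exactly zero, which is the mechanism the abstract contrasts with conventional exact-zero specifications.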
Generalized measurement models
, 2004
Cited by 7 (4 self)
Given a set of random variables, it is often the case that their associations can be explained by hidden common causes. We present a set of well-defined assumptions and a provably correct algorithm that allow us to identify some of these hidden common causes. The assumptions are fairly general and sometimes weaker than those used in practice by, for instance, econometricians, psychometricians, social scientists, and researchers in many other fields where latent variable models are important and tools such as factor analysis are applicable. The goal is automated knowledge discovery: identifying latent variables that can be used across different applications and causal models and that shed new light on a data-generating process. Our approach is evaluated through simulations and three real-world cases.
Bayesian analysis using Mplus: Technical implementation
, 2010
Cited by 7 (6 self)
In this note we describe the implementation details for estimating latent variable models with the Bayesian estimator in Mplus. The algorithm used in Mplus is Markov chain Monte Carlo (MCMC) based on the Gibbs sampler; see Gelman et al. (2004).
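As a minimal illustration of Gibbs sampling — the general technique the note names, not the actual Mplus algorithm — consider a bivariate normal with unit variances and correlation rho, where each full conditional is a univariate normal and the sampler alternates between them:

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8
n_iter, burn = 6000, 1000
x = y = 0.0
draws = np.empty((n_iter, 2))
for t in range(n_iter):
    # Full conditionals of a standard bivariate normal with correlation rho:
    # x | y ~ N(rho*y, 1 - rho^2), and symmetrically for y | x.
    x = rng.normal(rho * y, np.sqrt(1.0 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1.0 - rho**2))
    draws[t] = x, y

# After burn-in, the draws approximate the joint distribution.
emp_rho = np.corrcoef(draws[burn:].T)[0, 1]
```

The same alternate-between-full-conditionals scheme is what a Gibbs sampler for a latent variable model does, with factor scores and parameters taking the roles of x and y.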
The hidden life of latent variables: Bayesian learning with mixed graph models
, 2008
Cited by 7 (3 self)
Directed acyclic graphs (DAGs) have been widely used as a representation of conditional independence in machine learning and statistics. Moreover, hidden or latent variables are often an important component of graphical models. However, DAG models suffer from an important limitation: the family of DAGs is not closed under marginalization of hidden variables. This means that in general we cannot use a DAG to represent the independencies over a subset of variables in a larger DAG. Directed mixed graphs (DMGs) are a representation that includes DAGs as a special case, and overcomes this limitation. This paper introduces algorithms for performing Bayesian inference in Gaussian and probit DMG models. An important requirement for inference is the characterization of the distribution over parameters of the models. We introduce a new distribution for covariance matrices of Gaussian DMGs. We discuss and illustrate how several Bayesian machine learning tasks can benefit from the principle presented here: the power to model dependencies that are generated from hidden variables, but without necessarily modelling such variables explicitly.
Bayesian inference for Gaussian mixed graph models
 Proceedings of 22nd Conference on Uncertainty in Artificial Intelligence
, 2006
Cited by 5 (3 self)
We introduce priors and algorithms to perform Bayesian inference in Gaussian models defined by acyclic directed mixed graphs. Such a class of graphs, composed of directed and bidirected edges, is a representation of conditional independencies that is closed under marginalization and arises naturally from causal models which allow for unmeasured confounding. Monte Carlo methods and a variational approximation for such models are presented. Our algorithms for Bayesian inference allow the evaluation of posterior distributions for several quantities of interest, including causal effects that are not identifiable from the data alone but can be inferred when informative prior knowledge about confounding is available.
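The Gaussian parameterization behind such models can be sketched directly: directed edges enter through a coefficient matrix B, bidirected edges through off-diagonal entries of the error covariance Omega, and the implied covariance is (I − B)⁻¹ Ω (I − B)⁻ᵀ, the standard Gaussian SEM/mixed-graph form. The graph and the numbers below are made up for the sketch:

```python
import numpy as np

# Toy ADMG on (X1, X2, X3): directed edges X1 -> X2 -> X3,
# plus a bidirected edge X1 <-> X3 standing in for a hidden confounder.
B = np.array([[0.0, 0.0, 0.0],
              [0.7, 0.0, 0.0],    # X1 -> X2 with coefficient 0.7
              [0.0, 0.5, 0.0]])   # X2 -> X3 with coefficient 0.5
Omega = np.array([[1.0, 0.0, 0.4],   # 0.4 = error covariance for X1 <-> X3
                  [0.0, 1.0, 0.0],
                  [0.4, 0.0, 1.0]])

# Implied covariance of the observed variables.
inv = np.linalg.inv(np.eye(3) - B)
Sigma = inv @ Omega @ inv.T
```

Note that Sigma[2, 0] mixes the directed path X1 -> X2 -> X3 with the bidirected edge, which is exactly why such effects are not identifiable from the covariance alone without prior knowledge about the confounding.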
The modelsize effect on traditional and modified tests of covariance structures
 Structural Equation Modeling
, 2007
Cited by 5 (4 self)
According to Kenny and McCoach (2003), chi-square tests of structural equation models produce inflated Type I error rates when the degrees of freedom increase. So far, the amount of this bias in large models has not been quantified. In a Monte Carlo study of confirmatory factor models with a range of 48 to 960 degrees of freedom, it was found that the traditional maximum likelihood ratio statistic, T_ML, overestimates nominal Type I error rates by up to 70% under conditions of multivariate normality. Some alternative statistics for the correction of model-size effects were also investigated: the scaled Satorra–Bentler statistic, T_SC; the adjusted Satorra–Bentler statistic ...
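The benchmark the study measures against can be sketched as follows: if the test statistic really followed its chi-square reference distribution, the rejection rate under H0 would sit at the nominal alpha for any df. The function and the df value are assumptions for the sketch; the model-size bias reported above arises precisely because the real statistic departs from this idealized distribution at high df.

```python
import numpy as np
from scipy import stats

def reject_rate_under_h0(df, alpha=0.05, n_sims=20000, seed=0):
    """Rejection rate when the statistic exactly follows its chi-square reference."""
    rng = np.random.default_rng(seed)
    t = rng.chisquare(df, size=n_sims)       # idealized null draws of the statistic
    crit = stats.chi2.ppf(1 - alpha, df)     # chi-square critical value at alpha
    return float((t > crit).mean())

rate = reject_rate_under_h0(df=480)          # a df value inside the study's 48-960 range
```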