Clustering Association Rules
, 1997
Cited by 114 (0 self)
We consider the problem of clustering two-dimensional association rules in large databases. We present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS. Association rule clustering is useful when the user desires to segment the data. We measure the quality of the segmentation generated by ARCS using the Minimum Description Length (MDL) principle for encoding the clusters, on several databases that include noise and errors. Scaleup experiments show that ARCS, using the BitOp algorithm, scales linearly with the amount of data.
1 Introduction
Data mining, or the efficient discovery of interesting patterns from large collections of data, has been recognized as an important area of database research. The most commonly sought patterns are association rules, as introduced in [AIS93b]. Intuitively, an association rule identifies a frequently occurring pattern of information in a database. Consider a supermarket database w...
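The abstract names BitOp only as a "geometric-based" algorithm, so the following is a simplified sketch in its spirit, not the paper's procedure. It assumes the two-dimensional rules are laid out as a bitmap grid (one row per bin of the first attribute, bit j set when the rule for column bin j holds). ANDing consecutive row bitmaps exposes all-ones rectangles, and the largest one is greedily taken as a cluster until no bits remain. All function names are hypothetical.

```python
def runs(mask, width):
    """Yield (start, end) column ranges of consecutive 1 bits in `mask`."""
    col = 0
    while col < width:
        if (mask >> col) & 1:
            start = col
            while col < width and (mask >> col) & 1:
                col += 1
            yield start, col - 1
        else:
            col += 1


def largest_rectangle(rows, width):
    """Return (area, r0, r1, c0, c1) of the largest all-ones rectangle."""
    best = None
    for r0 in range(len(rows)):
        mask = (1 << width) - 1
        for r1 in range(r0, len(rows)):
            mask &= rows[r1]              # columns set in every row r0..r1
            if not mask:
                break
            for c0, c1 in runs(mask, width):
                area = (r1 - r0 + 1) * (c1 - c0 + 1)
                if best is None or area > best[0]:
                    best = (area, r0, r1, c0, c1)
    return best


def bitop_style_clusters(rows, width):
    """Greedily cover the set bits with rectangles, largest first."""
    rows = list(rows)
    clusters = []
    while any(rows):
        _, r0, r1, c0, c1 = largest_rectangle(rows, width)
        clusters.append((r0, r1, c0, c1))
        col_mask = ((1 << (c1 - c0 + 1)) - 1) << c0
        for r in range(r0, r1 + 1):
            rows[r] &= ~col_mask          # clear the bits this cluster covers
    return clusters
```

For the grid rows `[0b0110, 0b0110, 0b0001]` with width 4, the sketch first returns the 2 x 2 block spanning rows 0-1 and columns 1-2, then the single cell at row 2, column 0.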
Bayesian Model Assessment In Factor Analysis
, 2004
Cited by 58 (8 self)
Factor analysis has been one of the most powerful and flexible tools for assessing multivariate dependence and codependence. Loosely speaking, it could be argued that the origin of its success rests in its very exploratory nature, where various kinds of data relationships among the variables under study can be iteratively verified and/or refuted. Bayesian inference in factor analytic models has received renewed attention in recent years, partly due to computational advances but also partly due to applied work that generates factor structures, as exemplified by recent work in financial time series modeling. The focus of our current work is on exploring questions of uncertainty about the number of latent factors in a multivariate factor model, combined with methodological and computational issues of model specification and model fitting. We explore reversible jump MCMC methods that build on sets of parallel Gibbs sampling-based analyses to generate suitable empirical proposal distributions and that address the challenging problem of finding efficient proposals in high-dimensional models. Alternative MCMC methods based on bridge sampling are discussed, and these fully Bayesian MCMC approaches are compared with a collection of popular model selection methods in empirical studies.
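The paper's own approach to the number of factors is reversible jump MCMC. For contrast, the following is a minimal sketch of one non-Bayesian alternative: fit Gaussian maximum-likelihood factor analysis for each candidate number of factors k by EM, then pick the k with the smallest BIC. Whether BIC is among the methods the paper compares is an assumption. The function names are hypothetical, and the mean parameters are left out of the count because they are the same for every k.

```python
import numpy as np


def fa_loglik(S, Lam, Psi, n):
    """Gaussian log-likelihood of Sigma = Lam Lam' + diag(Psi) given covariance S."""
    p = S.shape[0]
    Sigma = Lam @ Lam.T + np.diag(Psi)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * n * (p * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(Sigma, S)))


def fit_fa_em(X, k, n_iter=2000, tol=1e-8):
    """Maximum-likelihood factor analysis via EM (Rubin-Thayer style updates)."""
    n, p = X.shape
    S = np.cov(X, rowvar=False, bias=True)          # ML covariance (divides by n)
    w, V = np.linalg.eigh(S)                         # eigenvalues in ascending order
    Lam = V[:, -k:] * np.sqrt(w[-k:])                # start from the top-k principal axes
    Psi = np.maximum(np.diag(S) - (Lam ** 2).sum(axis=1), 1e-3)
    ll_old = -np.inf
    for _ in range(n_iter):
        Sigma = Lam @ Lam.T + np.diag(Psi)
        B = np.linalg.solve(Sigma, Lam).T            # k x p: Lam' Sigma^{-1}
        Ezz = np.eye(k) - B @ Lam + B @ S @ B.T      # average E[z z' | x]
        Lam = S @ B.T @ np.linalg.inv(Ezz)           # M-step for the loadings
        Psi = np.maximum(np.diag(S - Lam @ B @ S), 1e-6)  # floor guards against Heywood cases
        ll = fa_loglik(S, Lam, Psi, n)
        if ll - ll_old < tol:
            break
        ll_old = ll
    return Lam, Psi, ll


def select_num_factors(X, k_max):
    """Return the k in 1..k_max with the smallest BIC, plus all BIC values."""
    n, p = X.shape
    bics = {}
    for k in range(1, k_max + 1):
        _, _, ll = fit_fa_em(X, k)
        n_params = p * k - k * (k - 1) // 2 + p      # loadings (up to rotation) + uniquenesses
        bics[k] = -2 * ll + n_params * np.log(n)
    return min(bics, key=bics.get), bics
```

A criterion such as BIC commits to a single k. The fully Bayesian approach the abstract describes instead yields posterior uncertainty across k.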
The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis
 Psychological Methods
, 1996
Cited by 42 (1 self)
Monte Carlo computer simulations were used to investigate the performance of three χ² test statistics in confirmatory factor analysis (CFA). The normal-theory maximum likelihood χ² (ML), Browne's asymptotic distribution-free χ² (ADF), and the Satorra-Bentler rescaled χ² (SB) were examined under varying conditions of sample size, model specification, and multivariate distribution. For properly specified models, ML and SB showed no evidence of bias under normal distributions across all sample sizes, whereas ADF was biased at all but the largest sample sizes. ML was increasingly overestimated with increasing nonnormality, but both SB (at all sample sizes) and ADF (only at large sample sizes) showed no evidence of bias. For misspecified models, ML was again inflated with increasing nonnormality, but both SB and ADF were underestimated with increasing nonnormality. It appears that the power of the SB and ADF test statistics to detect a model misspecification is attenuated given nonnormally distributed data. Confirmatory factor analysis (CFA) has become an increasingly popular method of investigating the structure of data sets in psychology. In contrast to traditional exploratory factor analysis, which does not place strong a priori restrictions on the structure of the model being tested, CFA requires the investigator to specify both the number of factors
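For reference, the three statistics compared above are usually written in the standard textbook forms below. These forms are not reproduced from the paper, and the notation is assumed: S is the sample covariance of p observed variables, Σ(θ̂) the model-implied covariance, N the sample size, d the model degrees of freedom, s and σ(θ) the vectors of non-duplicated covariance elements, Γ̂ the estimated asymptotic covariance of s (built from fourth-order moments), Ŵ the normal-theory weight matrix, and Δ̂ = ∂σ(θ)/∂θ′.

```latex
F_{ML}(\hat\theta) = \ln\lvert\Sigma(\hat\theta)\rvert - \ln\lvert S\rvert
  + \operatorname{tr}\!\left(S\,\Sigma(\hat\theta)^{-1}\right) - p,
\qquad T_{ML} = (N-1)\,F_{ML}(\hat\theta)

T_{ADF} = (N-1)\,\min_{\theta}\,\bigl(s - \sigma(\theta)\bigr)^{\top}\hat\Gamma^{-1}\bigl(s - \sigma(\theta)\bigr)

T_{SB} = \frac{T_{ML}}{\hat c},
\qquad \hat c = \frac{\operatorname{tr}(\hat U\hat\Gamma)}{d},
\qquad \hat U = \hat W - \hat W\hat\Delta\bigl(\hat\Delta^{\top}\hat W\hat\Delta\bigr)^{-1}\hat\Delta^{\top}\hat W
```

Under multivariate normality, Γ = W⁻¹, so tr(UΓ) = d and the scaling constant ĉ is close to 1. This is consistent with ML and SB behaving alike under normal data in the results above.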
Panel Data Models with Interactive Fixed Effects
, 2005
Cited by 40 (4 self)
This paper considers large-N, large-T panel data models with unobservable multiple interactive effects. These models are useful for both micro- and macroeconometric modeling. In earnings studies, for example, workers' motivation, persistence, and diligence combine to influence earnings, in addition to innate ability as usually argued. In macroeconomics, the interactive effects represent unobservable common shocks and their heterogeneous responses over cross sections. Since the interactive effects are allowed to be correlated with the regressors, they are treated as fixed-effects parameters to be estimated along with the common slope coefficients. The model is estimated by the least squares method, which provides the interactive-effects counterpart of the within estimator. We first consider model identification, and then derive the rate of convergence and the limiting distribution of the interactive-effects estimator of the common slope coefficients. The estimator is shown to be √(NT)-consistent. This rate is valid even in the presence of correlations and heteroskedasticity in both dimensions, in striking contrast with the fixed-T framework, in which serial correlation and heteroskedasticity imply non-identification. The asymptotic distribution is not necessarily centered at zero, and bias-corrected estimators are derived. We also derive the constrained estimator and its limiting distribution, imposing additivity coupled with interactive effects. The problem of testing additive versus interactive effects is also studied. We also derive identification conditions for models with a grand mean, time-invariant regressors, and common regressors. It is shown that there exists a set of necessary and sufficient identification conditions for those models. Given identification, the rate of convergence and limiting results continue to hold. Key words and phrases: incidental parameters, additive effects, interactive effects, factor
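One common way to compute a least squares estimator of this kind is to alternate between two steps. Given the slopes β, the factors come from the leading eigenvectors of the residual matrix. Given the factors F, β is obtained by projecting them out, β = (Σᵢ Xᵢ′M_F Xᵢ)⁻¹ Σᵢ Xᵢ′M_F Yᵢ. The following is a minimal sketch of that iteration under assumed conventions: Y is T x N, the K regressors are stacked in a K x T x N array, the number of factors r is known, and F′F/T = I. The function name is hypothetical, and the sketch omits the paper's bias correction.

```python
import numpy as np


def interactive_fe(Y, X, r, n_iter=500, tol=1e-9):
    """Alternate between slopes and factors; Y: (T, N), X: (K, T, N), r factors."""
    K, T, N = X.shape
    # Start from pooled OLS, which ignores the interactive effects.
    beta = np.linalg.lstsq(X.reshape(K, -1).T, Y.ravel(), rcond=None)[0]
    for _ in range(n_iter):
        W = Y - np.tensordot(beta, X, axes=1)          # residual matrix, T x N
        _, eigvec = np.linalg.eigh(W @ W.T)            # eigenvalues ascending
        F = np.sqrt(T) * eigvec[:, -r:]                # top-r factors, F'F/T = I
        M = np.eye(T) - F @ F.T / T                    # projection off the factor space
        MX = np.einsum("ts,ksn->ktn", M, X)            # M_F applied to every X_i
        A = np.einsum("ktn,ltn->kl", X, MX)            # sum_i X_i' M_F X_i
        b = np.einsum("ktn,tn->k", MX, Y)              # sum_i X_i' M_F Y_i
        new_beta = np.linalg.solve(A, b)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    Lam = (Y - np.tensordot(beta, X, axes=1)).T @ F / T  # loadings, N x r
    return beta, F, Lam
```

Pooled OLS is biased here when the regressors load on the factors. Projecting with M_F removes that correlation, which is why the iteration recovers β.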
The Theoretical Status of Latent Variables
 Psychological Review
, 2003
Cited by 25 (3 self)
This article examines the theoretical status of latent variables as used in modern test theory models. First, it is argued that a consistent interpretation of such models requires a realist ontology for latent variables. Second, the relation between latent variables and their indicators is discussed. It is maintained that this relation can be interpreted as a causal one but that in measurement models for interindividual differences the relation does not apply to the level of the individual person. To substantiate intraindividual causal conclusions, one must explicitly represent individual-level processes in the measurement model. Several research strategies that may be useful in this respect are discussed, and a typology of constructs is proposed on the basis of this analysis. The need to link individual processes to latent variable models for interindividual differences is emphasized. Consider the following sentence: “Einstein would not have been able to come up with his E = mc² had he not possessed such an extraordinary intelligence.” What does this sentence express? It relates observable behavior (Einstein’s writing E = mc²) to an unobservable attribute (his extraordinary intelligence), and it does so by assigning to the unobservable attribute a causal role in
A Taxonomy for Spatiotemporal Connectionist Networks Revisited: The Unsupervised Case
 Neural Computation
, 2003
Cited by 21 (1 self)
Spatiotemporal connectionist networks (STCNs) comprise an important class of neural models that can deal with patterns distributed both in time and space. In this paper, we widen the application domain of the taxonomy for supervised STCNs recently proposed by Kremer (2001) to the unsupervised case. This is possible through a reinterpretation of the state vector as a vector of latent (hidden) variables, as proposed by Meinicke (2000). The goal of this generalized taxonomy is then to provide a nonlinear generative framework for describing unsupervised spatiotemporal networks, making it easier to compare and contrast their representational and operational characteristics. Computational properties, representational issues, and learning are also discussed, and a number of references to the relevant source publications are provided. It is argued that the proposed approach is simple and more powerful than previous attempts, from both a descriptive and a predictive viewpoint. We also discuss the relation of this taxonomy to automata theory and state-space modeling, and suggest directions for further work.
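To make "state vector as a vector of latent variables" concrete, the following is a minimal sketch of a generic nonlinear generative state-space model. A hidden state evolves as s_t = tanh(A s_{t-1}) and emits an observed pattern x_t = C s_t, each step optionally with Gaussian noise. The tanh update, the linear emission, and all names are illustrative assumptions. This is not a specific network from the taxonomy.

```python
import numpy as np


def sample_latent_state_model(A, C, s0, T, state_sd=0.0, obs_sd=0.0, seed=0):
    """Generate T steps of s_t = tanh(A s_{t-1}) + noise, x_t = C s_t + noise."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s0, dtype=float)
    states, observations = [], []
    for _ in range(T):
        s = np.tanh(A @ s) + state_sd * rng.normal(size=s.shape)   # latent (hidden) state update
        x = C @ s + obs_sd * rng.normal(size=C.shape[0])           # observed spatial pattern
        states.append(s)
        observations.append(x)
    return np.array(states), np.array(observations)
```

In this framing, an unsupervised STCN learns A and C (or their nonlinear counterparts) from the observations alone. The states are never observed directly.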
Macroeconomic Forecasting Using Many Predictors
 Advances in Econometrics, Theory and Applications, Eighth World Congress of the Econometric Society
, 2000
Cited by 18 (0 self)
This paper is based on research carried out jointly with James H. Stock, whom I thank for comments on this paper. I thank Jean-Philippe Laforte for research assistance. This research was supported by the National Science Foundation (SBR-9730489). (Version WC_2b)
AIRCRAFT MULTIDISCIPLINARY DESIGN OPTIMIZATION USING DESIGN OF EXPERIMENTS THEORY AND RESPONSE SURFACE MODELING METHODS
, 1997
Cited by 18 (2 self)
Design engineers often employ numerical optimization techniques to assist in the evaluation and comparison of new aircraft configurations. While the use of numerical optimization methods is largely successful, the presence of numerical noise in realistic engineering optimization problems often inhibits the use of many gradient-based optimization techniques. Numerical noise causes inaccurate gradient calculations, which in turn slow or prevent convergence during optimization. The problems created by numerical noise are particularly acute in aircraft design applications, where a single aerodynamic or structural analysis of a realistic aircraft configuration may require tens of CPU hours on a supercomputer. The computational expense of the analyses and the convergence difficulties created by numerical noise are significant obstacles to performing aircraft multidisciplinary design optimization. To address these issues, a procedure has been developed to create two types of noise-free mathematical models for use in aircraft optimization studies. These two methods use elements of statistical analysis, and the overall procedure for using the methods is made computationally affordable by the application of parallel computing techniques. The first
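The title pairs design of experiments with response surface modeling. The following is a minimal sketch of that combination, under assumptions not taken from the paper: sample the design space with a face-centered central composite design on [-1, 1]^d, then fit a full quadratic polynomial by least squares. The fitted polynomial is smooth, so its gradients are free of the numerical noise in the underlying analysis. The function names and the stand-in "noisy analysis" are hypothetical.

```python
import itertools

import numpy as np


def quadratic_features(X):
    """Columns: 1, x_i, and x_i * x_j for i <= j (a full second-order model)."""
    X = np.atleast_2d(X)
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)


def face_centered_ccd(d):
    """Face-centered central composite design: corners, face centers, and the center."""
    corners = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))
    faces = np.vstack([s * np.eye(d)[i] for i in range(d) for s in (-1.0, 1.0)])
    return np.vstack([corners, faces, np.zeros((1, d))])


def fit_response_surface(X, y):
    """Least-squares quadratic surrogate; returns a smooth predictor."""
    coef, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    return lambda x_new: quadratic_features(x_new) @ coef
```

For d = 2 the design has 9 points for 6 coefficients. Each point would be one expensive analysis, and the points are independent, which is where parallel computing can help.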
Investigating Spearman’s hypothesis by means of multigroup confirmatory factor analysis
 Multivariate Behavioral Research
, 2000
Cited by 17 (6 self)
Differences between blacks and whites on cognitive ability tests have been attributed to a fundamental difference between these groups in general intelligence (or g, as it is denoted). The hypothesized difference in g gives rise to Spearman’s hypothesis, which states that the differences in the means of the tests are related to the tests’ factor loadings on g. Jensen has investigated this hypothesis by correlating the differences in means with the tests’ g loadings. The aim of the present article is to investigate black-white (B-W) differences using multigroup confirmatory factor analysis. The advantages of multigroup confirmatory factor analysis over Jensen’s test of Spearman’s hypothesis are discussed. A published data set is analyzed. Strict factorial invariance is tested and judged to be tenable. Various models are tested, which do and do not incorporate g. It is observed that it is difficult to distinguish between several hypotheses, including and excluding g, concerning group differences. The inability to distinguish between competing models using multigroup confirmatory factor analysis makes it difficult to draw clear conclusions about the exact nature of black-white differences in cognitive abilities. The implications of the results for Jensen’s test of Spearman’s hypothesis are discussed.
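The vector-correlation test attributed to Jensen above can be sketched in a few lines. For each test, standardize the group mean difference by a pooled standard deviation, then correlate that vector with the tests' g loadings. The use of a Pearson correlation and the function name are assumptions here, and the numbers in the usage check are hypothetical, not data from the article. Multigroup CFA, as the abstract argues, instead models the group differences within a fitted factor model rather than summarizing them with one correlation.

```python
import numpy as np


def correlated_vectors(mean_diffs, pooled_sds, g_loadings):
    """Pearson correlation between standardized group differences and g loadings."""
    d = np.asarray(mean_diffs, dtype=float) / np.asarray(pooled_sds, dtype=float)  # effect size per test
    g = np.asarray(g_loadings, dtype=float)
    return float(np.corrcoef(d, g)[0, 1])
```

A correlation near 1 is read as consistent with Spearman's hypothesis. The article's point is that such a single number cannot separate competing factor models that fit the same data about equally well.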
Intergovernmental grants as a tactical instrument: empirical evidence from Swedish municipalities
 Journal of Public Economics
Cited by 15 (1 self)
Are grants to Swedish municipalities tactical, that is, do parties use them in order to get elected? In this paper, the theoretical model of Lindbeck & Weibull and Dixit & Londregan is tested, using panel data on 255 Swedish municipalities for the years 1981-1995. The empirical implication of the theory is that groups with many swing voters will receive larger grants than other groups. In the paper, a new method of estimating the number of swing voters is proposed and used. The results support the hypothesis that intergovernmental grants are used in order to win votes.