Algebraic Algorithms for Sampling from Conditional Distributions
Annals of Statistics, 1995
Cited by 152 (12 self)
We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include generating tables with fixed row and column sums and higher dimensional analogs. The algorithms involve finding bases for associated polynomial ideals and so an excursion into computational algebraic geometry.
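As a minimal sketch of the simplest case mentioned in the abstract (two-way tables with fixed row and column sums), the random walk below proposes a +1/-1 change on a randomly chosen 2x2 sub-rectangle and rejects any proposal that would create a negative cell. It assumes a uniform target distribution over tables and a table with at least two rows and two columns; the function names `basic_move_step` and `sample_table` are illustrative and do not come from the paper.

```python
import random


def basic_move_step(table, rng):
    """One step of a random walk on tables with fixed row and column sums."""
    n_rows, n_cols = len(table), len(table[0])
    # Pick two distinct rows and two distinct columns.
    i1, i2 = rng.sample(range(n_rows), 2)
    j1, j2 = rng.sample(range(n_cols), 2)
    sign = rng.choice((1, -1))
    # The move adds `sign` on one diagonal of the 2x2 sub-rectangle and
    # subtracts it on the other, so every row and column sum is unchanged.
    changes = {(i1, j1): sign, (i2, j2): sign, (i1, j2): -sign, (i2, j1): -sign}
    if any(table[i][j] + d < 0 for (i, j), d in changes.items()):
        return table  # reject: the chain stays where it is
    new_table = [row[:] for row in table]
    for (i, j), d in changes.items():
        new_table[i][j] += d
    return new_table


def sample_table(start, n_steps, seed=0):
    """Run the walk for n_steps from a starting table and return the final table."""
    rng = random.Random(seed)
    table = [row[:] for row in start]
    for _ in range(n_steps):
        table = basic_move_step(table, rng)
    return table


if __name__ == "__main__":
    start = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]
    print(sample_table(start, 1000, seed=42))
```

Because the proposal is symmetric and rejected proposals leave the table unchanged, the walk has the uniform distribution on the reachable tables as a stationary distribution. Targeting another distribution, such as the hypergeometric, would need an additional Metropolis acceptance step, which is omitted here.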
Matching and Record Linkage
Business Survey Methods, 1995
Cited by 77 (14 self)
INTRODUCTION. Matching has a long history of uses in statistical surveys and administrative data development. A business register consisting of names, addresses, and other identifying information such as total financial receipts might be constructed from tax and employment data bases (see chapters by Colledge, Nijhowne, and Archer). A survey of retail establishments or agricultural establishments might combine results from an area frame and a list frame. To produce a combined estimator, units from the area frame would need to be identified in the list frame (see Vogel-Kott chapter). To estimate the size of a (sub)population via capture-recapture techniques, one needs to accurately determine units common to two or more independent listings (Sekar and Deming 1949; Scheuren 1983; Winkler 1989b). Samples must be drawn appropriately to estimate overlap (Deming and Gleser 1959). Rather than develop a special survey to collect data for policy decisions, it might be more appropriate t
Improved Decision Rules In The Fellegi-Sunter Model Of Record Linkage
Proceedings of the Section on Survey Research Methods, American Statistical Association, 1993
Cited by 29 (12 self)
Many applications of the Fellegi-Sunter model use simplifying assumptions and ad hoc modifications to improve matching efficacy. Because of model misspecification, distinctive approaches developed in one application typically cannot be used in other applications and do not always make use of advances in statistical and computational theory. An Expectation-Maximization (EMH) algorithm that constrains the estimates to a convex subregion of the parameter space is given. The EMH algorithm provides probability estimates that yield better decision rules than unconstrained estimates. The algorithm is related to results of Meng and Rubin (1993) on Multi-Cycle Expectation-Conditional Maximization algorithms and makes use of results of Haberman (1977) that hold for large classes of loglinear models. Key Words: MCECM Algorithm, Latent Class, Computer Matching, Error Rate. This paper provides a theory for obtaining constrained maximum likelihood estimates for latent-class, loglinear models on finite ...
An Application Of The Fellegi-Sunter Model Of Record Linkage To The 1990 U.S. Decennial Census
Technical report, US Bureau of the Census, 1987
Cited by 21 (4 self)
This paper describes a methodology for computer matching the Post Enumeration Survey with the Census. Computer matching is the first stage of a process for producing adjusted Census counts. All crucial matching parameters are computed solely using characteristics of the files being matched. No a priori knowledge of truth of matches is assumed. No previously created lookup tables are needed. The methods are illustrated with numerical results using files from the 1988 Dress Rehearsal Census for which the truth of matches is known. Key words and phrases. EM Algorithm; String Comparator Metric; LP Algorithm; Decision Rule; Error Rate. 1. INTRODUCTION This paper describes a particular application of the Fellegi-Sunter (1969) model of record linkage. New computational methods are used for computer matching the Post Enumeration Survey (PES) with the Census. The PES is used to produce adjusted Census counts. Computer matching is the first stage of PES processing. All crucial matching paramete...
The Gifi System Of Descriptive Multivariate Analysis
Statistical Science, 1998
Cited by 12 (3 self)
The Gifi system of analyzing categorical data through nonlinear varieties of classical multivariate analysis techniques is reviewed. The system is characterized by the optimal scaling of categorical variables which is implemented through alternating least squares algorithms. The main technique of homogeneity analysis is presented, along with its extensions and generalizations leading to nonmetric principal components analysis and canonical correlation analysis. A brief account of stability issues and areas of applications of the techniques is also given.
Pattern discovery by residual analysis and recursive partitioning
IEEE Transactions on Knowledge and Data Engineering, 1999
Cited by 7 (2 self)
Abstract: In this paper, a novel method of pattern discovery is proposed. It is based on the theoretical formulation of a contingency table of events. Using residual analysis and recursive partitioning, statistically significant events are identified in a data set. These events constitute the important information contained in the data set and are easily interpretable as simple rules, contour plots, or parallel axes plots. In addition, an informative probabilistic description of the data is automatically furnished by the discovery process. Following a theoretical formulation, experiments with real and simulated data will demonstrate the ability to discover subtle patterns amid noise, the invariance to changes of scale, cluster detection, and discovery of multidimensional patterns. It is shown that the pattern discovery method offers the advantages of easy interpretation, rapid training, and tolerance to noncentralized noise. Index Terms: Pattern discovery, residual analysis, recursive partitioning, events, contingency tables.
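The abstract does not say which residual is used, so the following is only a rough sketch of what residual analysis on a two-way contingency table can look like, assuming the standard adjusted residual under the independence model. The function names `adjusted_residuals` and `significant_cells` and the 1.96 cutoff are illustrative assumptions, and the recursive partitioning step is not shown.

```python
import math


def adjusted_residuals(table):
    """Adjusted residuals of a two-way table under the independence model.

    Assumes every row and column total is positive and smaller than the
    grand total, so that each variance term is strictly positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / math.sqrt(variance))
        result.append(out_row)
    return result


def significant_cells(table, cutoff=1.96):
    """Return (row, column, residual) for cells whose |residual| exceeds the cutoff."""
    residuals = adjusted_residuals(table)
    return [
        (i, j, d)
        for i, row in enumerate(residuals)
        for j, d in enumerate(row)
        if abs(d) > cutoff
    ]


if __name__ == "__main__":
    for cell in significant_cells([[30, 10], [10, 30]]):
        print(cell)
```

Under independence each adjusted residual is approximately standard normal, which is why a cutoff near 1.96 is a common choice for flagging a cell at roughly the 5% level.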
HOMOGENEITY ANALYSIS
Cited by 1 (0 self)
Abstract. The Gifi system of analyzing categorical data through nonlinear varieties of classical multivariate analysis techniques is reviewed. The system is characterized by the optimal scaling of categorical variables which is implemented through alternating least squares algorithms. The main technique of homogeneity analysis is presented, along with its extensions and generalizations leading to nonmetric principal components analysis and canonical correlation analysis. Several examples are used to illustrate the methods. A brief account of stability issues and areas of applications of the techniques is also given. Key words and phrases: Optimal scaling, alternating least squares, multivariate techniques, loss functions, stability.
Latent Class Analysis in SAS®: Promise, Problems, and Programming
SAS Global Forum 2007, Statistics and Data Analysis, Paper 192-2007
Latent class analysis (LCA) is an important tool for marketing professionals who must characterize subgroups within large and heterogeneous populations. LCA is also of interest to clinical professionals who must place clients in diagnostic or prognostic categories when a gold standard for doing so is poorly defined. Attempts to bring LCA into the SAS® mainstream are fairly recent. The paper discusses these efforts and demonstrates a SAS macro that combines PROC CATMOD with conventional DATA steps to perform LCA. The macro is demonstrated on data wherein four binary observed variables permit estimation of two hypothesized latent classes. LCA is a categorical analog to factor analysis, and posits the existence of unobserved classes to explain the pattern of association observed in a multidimensional contingency table. LCA estimates two types of parameters: (1) latent class prevalences and (2) probabilities, conditional on class membership, of individuals' responses on each observed variable. The SAS macro estimates these parameters using a classic expectation-maximization (E-M) algorithm. Maximization steps specify a log-linear model in PROC CATMOD while expectation steps employ standard data step programming. The presentation illustrates the usefulness of LCA and probes certain problems and limitations associated with constructing and interpreting LC models. LC parameter estimates are sensitive to their initial values, and the classic E-M approach does not estimate standard errors. Bootstrapping of standard errors and replicated analyses using a grid of initial estimates are among the approaches that can address these limitations.
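The two parameter types and the E-M iteration described above can be sketched in plain Python. This is not the paper's SAS macro: it assumes binary items that are independent within each class, and it replaces the PROC CATMOD log-linear maximization step with the closed-form update available under that assumption. The function name `lca_em`, its arguments, and the response counts are hypothetical.

```python
import random
from itertools import product


def lca_em(patterns, counts, n_classes=2, n_iter=200, seed=0):
    """Fit a latent class model to binary response patterns with E-M.

    patterns: list of 0/1 tuples, one per distinct response pattern
    counts:   how often each pattern was observed
    Returns (prevalences, cond), where cond[c][k] = P(item k = 1 | class c).
    """
    rng = random.Random(seed)
    n_items = len(patterns[0])
    total = sum(counts)
    prevalences = [1.0 / n_classes] * n_classes
    # Random starting values; estimates are sensitive to these.
    cond = [[rng.uniform(0.2, 0.8) for _ in range(n_items)] for _ in range(n_classes)]
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities for each pattern.
        posteriors = []
        for x in patterns:
            joint = []
            for c in range(n_classes):
                p = prevalences[c]
                for k, xk in enumerate(x):
                    p *= cond[c][k] if xk else 1.0 - cond[c][k]
                joint.append(p)
            norm = sum(joint)
            posteriors.append([p / norm for p in joint])
        # M-step: closed-form updates from the expected class counts.
        for c in range(n_classes):
            weight = sum(n * r[c] for n, r in zip(counts, posteriors))
            prevalences[c] = weight / total
            for k in range(n_items):
                positives = sum(
                    n * r[c] * x[k] for n, r, x in zip(counts, posteriors, patterns)
                )
                cond[c][k] = positives / weight
    return prevalences, cond


if __name__ == "__main__":
    patterns = list(product((0, 1), repeat=4))  # four binary observed variables
    counts = [120, 30, 28, 12, 31, 11, 10, 9, 29, 12, 11, 10, 13, 12, 14, 138]
    prevalences, cond = lca_em(patterns, counts)
    print(prevalences)
    print(cond)
```

Rerunning with several `seed` values is a simple way to check the sensitivity to initial estimates noted in the abstract. As with the classic E-M approach it describes, this sketch produces no standard errors.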
FIRST
Children’s production and comprehension of politeness in requests: Relationships to behavioural adjustment, temperament and empathy

