The State of Record Linkage and Current Research Problems
- Statistical Research Division, U.S. Census Bureau, 1999
Cited by 172 (7 self)
This paper provides an overview of methods and systems developed for record linkage. Modern record linkage begins with the pioneering work of Newcombe and is especially based on the formal mathematical model of Fellegi and Sunter. In their seminal work, Fellegi and Sunter introduced many powerful ideas for estimating record linkage parameters and other ideas that still influence record linkage today. Record linkage research is characterized by its synergism of statistics, computer science, and operations research. Many difficult algorithms have been developed and put in software systems. Record linkage practice is still very limited. Some limits are due to existing software. Other limits are due to the difficulty in automatically estimating matching parameters and error rates, with current research highlighted by the work of Larsen and Rubin. Keywords: computer matching, modeling, iterative fitting, string comparison, optimization.
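The Fellegi and Sunter model mentioned in this abstract scores a candidate record pair by a likelihood ratio of its field agreement pattern and then applies a three-way decision rule. The following is a minimal sketch of that idea under a conditional independence assumption. The field names, the m and u probabilities, and the two thresholds are hypothetical values chosen for illustration; none of them come from the paper.

```python
import math

# Hypothetical per-field probabilities (illustrative only, not from the paper):
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
M_PROBS = {"surname": 0.95, "first_name": 0.90, "birth_year": 0.85}
U_PROBS = {"surname": 0.01, "first_name": 0.05, "birth_year": 0.02}


def match_weight(agreement):
    """Sum log2 likelihood ratios over fields, assuming conditional independence."""
    total = 0.0
    for field, agrees in agreement.items():
        m, u = M_PROBS[field], U_PROBS[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Three-way rule: link, non-link, or possible link (held for clerical review).

    The thresholds are hypothetical; in practice they are set from target error rates.
    """
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


all_agree = match_weight({"surname": True, "first_name": True, "birth_year": True})
mixed = match_weight({"surname": True, "first_name": False, "birth_year": True})
all_disagree = match_weight({"surname": False, "first_name": False, "birth_year": False})
```

Estimating the m and u probabilities automatically is the hard part that the abstract refers to; this sketch simply takes them as given.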
Matching and Record Linkage
- Business Survey Methods, 1995
Cited by 77 (14 self)
INTRODUCTION. Matching has a long history of uses in statistical surveys and administrative data development. A business register consisting of names, addresses, and other identifying information such as total financial receipts might be constructed from tax and employment data bases (see chapters by Colledge, Nijhowne, and Archer). A survey of retail establishments or agricultural establishments might combine results from an area frame and a list frame. To produce a combined estimator, units from the area frame would need to be identified in the list frame (see Vogel-Kott chapter). To estimate the size of a (sub)population via capture-recapture techniques, one needs to accurately determine units common to two or more independent listings (Sekar and Deming 1949; Scheuren 1983; Winkler 1989b). Samples must be drawn appropriately to estimate overlap (Deming and Gleser 1959). Rather than develop a special survey to collect data for policy decisions, it might be more appropriate t
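The capture-recapture use mentioned in this abstract depends directly on how accurately the units common to two lists are identified. As a rough illustration, the sketch below uses the classical two-list estimator (population size estimated as n1 * n2 / m, where m is the number of matched units). Treating this as the estimator the chapter has in mind is an assumption, and the function name and counts are made up for the example.

```python
def dual_system_estimate(n_list_a, n_list_b, n_matched):
    """Classical two-list capture-recapture estimate of total population size.

    Assumes the two lists are independent and that matching is error free.
    """
    if n_matched <= 0:
        raise ValueError("at least one matched unit is needed")
    return n_list_a * n_list_b / n_matched


# Hypothetical counts: 60 units on list A, 50 on list B, 20 found on both.
estimate_true_matches = dual_system_estimate(60, 50, 20)

# If the matching step misses 5 of the 20 true matches, the estimate inflates.
estimate_missed_matches = dual_system_estimate(60, 50, 15)
```

With these made-up counts, missing a quarter of the true matches raises the estimate from 150 to 200, which is why the abstract stresses accurate determination of the overlap.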
Recursive Analysis Of Linked Data Files
- Proceedings of the 1996 Census Bureau Annual Research Conference, 1996
Cited by 5 (1 self)
This paper demonstrates a methodology for analyzing two or more files when the only common information is name and address that is subject to significant error. Such a situation might arise with lists of businesses. We assume that a small proportion of records can be accurately matched. With the matched pairs we build an edit/imputation model and add predicted quantitative values, via a regression analysis, to each file. Matching is then repeated with the common quantitative data and with name and address information. If necessary, the edit/impute, regression, and matching steps can be repeated in a recursive fashion. In large measure the ideas of Neter, Maynes, and Ramanathan (1965) are revised but with new tools. KEYWORDS: Edit, Imputation, Record Linkage, Regression Analysis, Recursive Processes. 1. INTRODUCTION. To make the best decisions, researchers and policymakers often need more information than is available in a single data base or in summary statistics from multiple files. S...
Regression Analysis with Linked Data
- 2004
Cited by 5 (0 self)
Record linkage, or exact matching, can be used to join together two files that contain information on the same individuals, but lack unique personal identification codes. The possibility of errors in linkage causes problems for estimating the relationships between variables on the two files. The effect is analogous to the impact of measurement error. A model of a linear regression relationship between variables in linked files is proposed. Assuming the probabilities that pairs of records are links are known, an unbiased estimator of the regression coefficients is derived. Methods for estimating the linkage probabilities by using mixture models are discussed. A consistent estimator of the covariance matrix of the proposed estimator is proposed. A bootstrap estimator is used to reflect the impact of the uncertainty in record linkage model parameters on the estimators of the regression parameters. A simulation study compares the performance of the proposed estimator and alternatives.
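To make the idea of correcting a regression for linkage error concrete, here is a minimal sketch for a single covariate with no intercept. It assumes a hypothetical matrix q where q[i][j] is the known probability that the outcome linked to record i really belongs to record j. Under a linear model the expected linked outcome is then beta times a q-weighted average of the covariates, so regressing on that weighted covariate is unbiased for beta. This is one simple construction consistent with the abstract, not necessarily the exact estimator derived in the paper, and all data values are invented.

```python
def naive_slope(x, z):
    """Least-squares slope through the origin, ignoring linkage error."""
    return sum(xi * zi for xi, zi in zip(x, z)) / sum(xi * xi for xi in x)


def adjusted_slope(x, z, q):
    """Slope using w_i = sum_j q[i][j] * x[j] in place of x_i.

    Since E[z_i] = beta * w_i under the assumed model, this estimator is
    unbiased for beta when the linkage probabilities q are known.
    """
    w = [sum(qij * xj for qij, xj in zip(row, x)) for row in q]
    return sum(wi * zi for wi, zi in zip(w, z)) / sum(wi * wi for wi in w)


# Hypothetical setup: 4 records, each linked correctly with probability 0.8
# and to the next record (cyclically) with probability 0.2.
x = [1.0, 2.0, 3.0, 4.0]
q = [
    [0.8, 0.2, 0.0, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.0, 0.0, 0.8, 0.2],
    [0.2, 0.0, 0.0, 0.8],
]
true_beta = 2.0

# Use the expected linked outcomes (no noise) to isolate the linkage bias.
z = [true_beta * sum(qij * xj for qij, xj in zip(row, x)) for row in q]

beta_naive = naive_slope(x, z)
beta_adjusted = adjusted_slope(x, z, q)
```

In this noise-free example the adjusted slope recovers the true coefficient while the naive slope does not, which mirrors the measurement-error analogy drawn in the abstract.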
A Double Sampling Scheme Model for . . .
- 1974
A general double sampling scheme model which employs a combination of an error-free measurement process and a faulty measurement process is developed. The model allows estimation of measurement error variance and elimination of measurement process bias. The model is applied to two specific survey situations, a self-enumeration survey and an interviewer conducted survey. Using a cost function which reflects the relative cost of the error-free measurement process and the faulty measurement process, optimum values for the sample sizes are derived and the optimum number of interviewers is indicated. For various values of the parameters the DSS model is compared to using only the faulty measurement process or only the error-free measurement process and the preferred sampling scheme is indicated.

