Results 1 - 10 of 81
TAILOR: A Record Linkage Toolbox, 2002
"... Data cleaning is a vital process that ensures the quality of data stored in real-world databases. Data cleaning problems are frequently encountered in many research areas, such as knowledge discovery in databases, data warehousing, system integration and e-services. The process of identifying the record pairs that represent the same entity (duplicate records), commonly known as record linkage, is one of the essential elements of data cleaning. In this paper, we address the record linkage problem by adopting a machine learning approach. Three models are proposed and are analyzed empirically. Since ..."
Cited by 90 (9 self)
Record Linkage: Current Practice and Future Directions, CSIRO Mathematical and Information Sciences, 2003
"... Record linkage is the task of quickly and accurately identifying records corresponding to the same entity from one or more data sources. Record linkage is also known as data cleaning, entity reconciliation or identification, and the merge/purge problem. This paper presents the "standard" probabilistic record linkage model and the associated algorithm. Recent work in information retrieval, federated database systems and data mining has proposed alternatives to key components of the standard algorithm. The impact of these alternatives on the standard approach is assessed. The key question ..."
Cited by 51 (0 self)
Beyond Probabilistic Record Linkage: Using Neural Networks and Complex Features to Improve Genealogical Record Linkage
"... Probabilistic record linkage has been used for many years in a variety of industries, including medical, government, private sector and research groups. The formulas used for probabilistic record linkage have been recognized by some as being equivalent to the naïve Bayes classifier. While this method can produce useful results, it is not difficult to improve accuracy by using one of a host of other machine learning or neural network algorithms. Even a simple single-layer perceptron tends to outperform the naïve Bayes classifier, and thus traditional probabilistic record linkage ..."
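The snippet above notes that probabilistic record linkage formulas are equivalent to a naïve Bayes classifier. A minimal Python sketch of that idea, assuming the usual Fellegi-Sunter formulation with binary field agreement: each field contributes log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, and adding a prior log-odds to that same sum gives a naïve Bayes posterior, provided fields are conditionally independent given match status. The field names, m/u probabilities, and thresholds are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical per-field (m, u) probabilities, illustrative only:
#   m = P(field agrees | the pair is a true match)
#   u = P(field agrees | the pair is a non-match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birth_year": (0.85, 0.02),
}


def match_weight(agreements: dict[str, bool]) -> float:
    """Sum of per-field log2 likelihood ratios (agreement/disagreement weights)."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PROBS[field]
        if agrees:
            total += math.log2(m / u)  # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight: float, lower: float, upper: float) -> str:
    """Two-threshold decision: link, non-link, or send to clerical review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


def posterior_match_probability(agreements: dict[str, bool], prior: float) -> float:
    """Naive Bayes view: posterior log-odds = prior log-odds + the same weight sum."""
    log_odds = math.log2(prior / (1 - prior)) + match_weight(agreements)
    return 1.0 / (1.0 + 2.0 ** (-log_odds))


pair = {"surname": True, "given_name": True, "birth_year": False}
print(classify(match_weight(pair), lower=0.0, upper=10.0))  # possible match
```

The only difference between the two views is the prior term: the record-linkage weight omits it and absorbs it into the choice of thresholds.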
Learnable Similarity Functions and Their Applications to Clustering and Record Linkage, 2004
"... rship (Xing et al. 2003), and relative comparisons (Schultz & Joachims 2004). These approaches have shown improvements over traditional similarity functions for different data types such as vectors in Euclidean space, strings, and database records composed of multiple text fields. While these in ..."
"... problems in machine learning and data mining. In preliminary work, we proposed two learnable similarity functions for strings that adapt distance computations given training pairs of equivalent and non-equivalent strings (Bilenko & Mooney 2003a). The first function is based on a probabilistic model ..."
Cited by 10 (0 self)
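The snippet says only that the first learnable string similarity is "based on a probabilistic model" that adapts distance computations to training pairs. As a hedged illustration of what adapting a string distance can mean, here is a dynamic-programming edit distance whose operation costs are pluggable parameters. A learner would fit those costs from equivalent and non-equivalent pairs; this sketch does not do that, and the `tuned_*` costs are hand-picked hypothetical stand-ins, not learned values or the paper's model.

```python
from typing import Callable


def weighted_edit_distance(
    a: str,
    b: str,
    sub_cost: Callable[[str, str], float],
    indel_cost: Callable[[str], float],
) -> float:
    """Edit distance where each substitution/insertion/deletion has its own cost."""
    n, m = len(a), len(b)
    # dp[i][j] = cheapest cost of transforming a[:i] into b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + indel_cost(a[i - 1])
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + indel_cost(b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + indel_cost(a[i - 1]),  # delete a[i-1]
                dp[i][j - 1] + indel_cost(b[j - 1]),  # insert b[j-1]
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # copy or substitute
            )
    return dp[n][m]


def unit_sub(x: str, y: str) -> float:
    """Plain Levenshtein substitution cost."""
    return 0.0 if x == y else 1.0


def unit_indel(ch: str) -> float:
    """Plain Levenshtein insertion/deletion cost."""
    return 1.0


# Hypothetical "tuned" costs: case changes and punctuation edits are cheap.
def tuned_sub(x: str, y: str) -> float:
    if x == y:
        return 0.0
    return 0.2 if x.lower() == y.lower() else 1.0


def tuned_indel(ch: str) -> float:
    return 0.1 if not ch.isalnum() else 1.0
```

With unit costs, "Main St." and "main St" are 2 edits apart; with the tuned costs, the same pair costs about 0.3, which is the kind of domain-specific notion of similarity that learned costs aim to capture.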
Learning to Combine Trained Distance Metrics for Duplicate Detection in Databases, 2002
"... The problem of identifying approximately duplicate records in databases has previously been studied as record linkage, the merge/purge problem, hardening soft databases, and field matching. Most existing approaches have focused on efficient algorithms for locating potential duplicates rather than precise similarity metrics for comparing records. In this paper, we present a domain-independent method for improving duplicate detection accuracy using machine learning. First, trainable distance metrics are learned for each field, adapting to the specific notion of similarity that is appropriate ..."
Cited by 42 (3 self)
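The snippet describes learning one distance metric per field and then combining them into a duplicate decision. A rough Python sketch of the combining step only, under stated assumptions: difflib's `SequenceMatcher.ratio()` stands in for a trained per-field metric, a plain perceptron is swapped in as the combining classifier (not necessarily the learner the paper uses), and the record schema and labelled training vectors are hypothetical.

```python
from difflib import SequenceMatcher

FIELDS = ("name", "address", "city")  # hypothetical record schema


def field_similarities(r1: dict[str, str], r2: dict[str, str]) -> list[float]:
    """One similarity per field; difflib's ratio stands in for a trained metric."""
    return [SequenceMatcher(None, r1[f].lower(), r2[f].lower()).ratio() for f in FIELDS]


def train_perceptron(examples, epochs: int = 20, lr: float = 0.1):
    """Learn weights that combine per-field similarities; labels are +1 / -1."""
    w = [0.0] * len(FIELDS)
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side of (or on) the boundary: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def is_duplicate(w: list[float], b: float, x: list[float]) -> bool:
    """Positive combined score means the pair is predicted to be a duplicate."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


# Hypothetical labelled similarity vectors (name, address, city), not real data.
TRAIN = [
    ([0.95, 0.90, 1.00], +1),
    ([0.85, 0.80, 0.90], +1),
    ([0.20, 0.30, 0.50], -1),
    ([0.10, 0.25, 0.40], -1),
]

w, b = train_perceptron(TRAIN)
rec = {"name": "J. Smith", "address": "12 Main St.", "city": "Austin"}
print(is_duplicate(w, b, field_similarities(rec, rec)))
```

Keeping the per-field metric and the combiner separate means either part can be replaced (for example, a learned string distance per field, or a different classifier) without touching the other.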
Naive Bayes Classifiers: A Probabilistic Detection Model for Breast Cancer
"... Naive Bayes is one of the most effective statistical and probabilistic classification algorithms. The health care environment is “information loaded” but “knowledge deprived”, so effective analysis tools are built to extract knowledge by discovering hidden relationships in data. The aim of this work is to design a graphical user interface for entering a patient's screening record and estimating the probability that a woman will develop breast cancer in the future, using the Naive Bayes classifier, a probabilistic classifier. As breast cancer is considered to be the second leading cause of cancer deaths ..."
Cited by 1 (1 self)
STATISTICAL MODELS AND ANALYSIS TECHNIQUES FOR LEARNING IN RELATIONAL DATA, 2006
"... Many data sets routinely captured by organizations are relational in nature, from marketing and sales transactions to scientific observations and medical records. Relational data record characteristics of heterogeneous objects and persistent relationships among those objects (e.g., citation graphs, the World Wide Web, genomic structures). These data offer unique opportunities to improve model accuracy, and thereby decision-making, if machine learning techniques can effectively exploit the relational information. This work focuses on how to learn accurate statistical models of complex ..."
Cited by 13 (0 self)
Inferring Ancestries Efficiently in Admixed Populations with Linkage Disequilibrium
"... Much effort has recently been invested in developing methods for determining the ancestral origin of chromosomal segments in admixed individuals. Motivations for this task include the study of population history, such as bottleneck effects and migration, the assessment of population stratification for ad ..."
"... with sufficient ancestral haplotypes, this framework can provide higher accuracy in inferring ancestral origin. Key words: algorithms, computational molecular biology, genetic mapping, genetic variations, machine learning, Markov chains."
Predictive Modeling of Implantation Outcome in an In Vitro Fertilization Setting: An Application of Machine Learning Methods
"... ization (IVF) treatment increase the number of successful pregnancies while elevating the risk of multiple gestations. IVF-associated multiple pregnancies exhibit significant financial, social, and medical implications. Clinicians need to decide the number of embryos to be transferred considering th ..."
"... in embryo-based implantation prediction. Multiple embryo implantations were predicted at a 63.8% sensitivity level. Predictions using the proposed model resulted in higher accuracy compared with expert judgment alone (on average, 75.7% and 60.1%, respectively). Conclusions. A machine learning ..."