Finding Approximate Matches in Large Lexicons (1995)

by Justin Zobel, Philip Dart

Concepts of Adaptive Information Filtering

by Daniel Remy Tauritz (promotor: Prof. J. N. Kok; other: Dr. T. Bäck), 1996
Cited by 1 (1 self)
concepts and algorithms

Comparing Inverted Files and Signature Files for Searching a Large Lexicon

by Ben Carterette, Fazli Can - Communications of the ACM, 1996
Cited by 1 (0 self)
Signature files and inverted files are well-known index structures. In this paper we undertake a direct comparison of the two for searching for partially-specified queries in a large lexicon stored in main memory. Using n-grams to index lexicon terms, a bit-sliced signature file can be compressed to a smaller size than an inverted file if each n-gram sets only one bit in the term signature. With a signature width less than half the number of unique n-grams in the lexicon, the signature file method is about as fast as the inverted file method, and significantly smaller. Greater flexibility in memory usage and faster index generation time make signature files appropriate for searching large lexicons or other collections in an environment where memory is at a premium.
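The one-bit-per-n-gram idea in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's method: it builds a plain per-term signature rather than a bit-sliced or compressed file, and the function names, the CRC32 hash, the bigram size, and the 64-bit width are all assumptions chosen for the example.

```python
import zlib


def ngrams(text, n=2):
    """Return the set of character n-grams in text (hypothetical helper)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def signature(grams, width=64):
    """Set exactly one bit per n-gram; width and hash are assumed values."""
    sig = 0
    for gram in grams:
        sig |= 1 << (zlib.crc32(gram.encode("utf-8")) % width)
    return sig


def candidates(fragment, lexicon, width=64, n=2):
    """Find lexicon terms containing fragment.

    The signature test is only a filter: hash collisions can let
    non-matching terms through, so each survivor is verified directly.
    """
    query_sig = signature(ngrams(fragment, n), width)
    hits = []
    for term in lexicon:
        term_sig = signature(ngrams(term, n), width)
        if term_sig & query_sig == query_sig and fragment in term:
            hits.append(term)
    return hits
```

A term that contains the fragment always passes the filter, because its n-grams include every n-gram of the fragment. A narrower signature saves memory but admits more false candidates for the verification step to discard.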

MAL4:6- Using Data Mining for Record Linkage

by Burdette Pixton, Christophe Giraud-Carrier
Cited by 1 (1 self)
This paper presents a first attempt at using pedigree-based data to improve record linkage. It describes a composite metric for similarity and a mechanism to extract relevant generational features. Results on a large data set demonstrate promise.

Information Access to Historical Documents from the Early New High German Period

by Andreas Hauser, Markus Heller, Elisabeth Leiss, Klaus U. Schulz, Christiane Wanzeck
Cited by 1 (0 self)
With the new interest in historical documents, the insight grew that electronic access to these texts causes many specific problems. In the first part of the paper we survey the present role of digital historical documents. After collecting central facts and observations on historical language change, we comment on the difficulties that result for retrieval and data mining on historical texts. In the second part of the paper we report on our own work in the area, with a focus on special matching strategies that help to relate modern language keywords to old variants. The basis of our studies is a collection of documents from the Early New High German period. These texts come with a very rich spectrum of word variants and spelling variations.

Similarity Searching in the CORDIS Text Database

by Euripides G.M. Petrakis, Kostas Tzeras, 2001
Similarity searching in text databases with multiple field types is still an open problem. We focus our attention on the "COmmunity Research and Development Information Service" (CORDIS) database of the European Union and we evaluate the effectiveness of many text retrieval methods in terms of precision, recall and ranking quality. Our experiments indicate that different field types should be handled by different retrieval methods.

Finding Variants of Out-of-Vocabulary Words in Arabic

by Abdusalam F. A. Nwesri, S. M. M. Tahaghoghi, Falk Scholer
Transliteration of a word into another language often leads to multiple spellings. Unless an information retrieval system recognises different forms of transliterated words, a significant number of documents will be missed when users specify only one spelling variant. Using two different datasets, we evaluate several approaches to finding variants of foreign words in Arabic, and show that the longest common subsequence (LCS) technique is the best overall.
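For readers unfamiliar with the LCS technique named in the abstract above, here is a minimal sketch of the standard dynamic-programming computation. The normalisation by the longer string's length is an assumption made for illustration; the abstract does not say how the authors score or rank variants, and the Latin-script sample strings are hypothetical stand-ins.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Standard dynamic programming, keeping only the previous row.
    """
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, 1):
            if ch == other:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(a, b):
    """LCS length divided by the longer length (assumed normalisation)."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


# Two hypothetical spelling variants of one transliterated name.
print(lcs_similarity("mohammed", "muhamad"))
```

Candidate variants would then be ranked by this score, with a cut-off chosen empirically.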

Utilizing Stacking for Feature Reduction in Graph-Based Genealogical Record Linkage

by Stephen Ivie, Yao Huang Lin, Christophe Giraud-Carrier
Genealogy research is centered on collecting records about an individual from various sources and combining the information to gain a larger historical perspective about that individual, commonly in the form of a pedigree. Data extraction, the internet, and other technological advancements have made large amounts of digital genealogical data more accessible. Discovering the relevancy of a digital record to a given pedigree involves determining whether the individual described in the record is in actuality an individual within the pedigree. This process is called Genealogical Record Linkage (GRL). GRL can be automated through data mining techniques by creating machine-learned models from hand-labeled comparisons. In this paper, we compare two such models, a tabular approach and a graph-based stacking approach, and report the successful application of both on a large, post-blocking database. We also note the successful integration of these approaches in an open source distributed genealogy program that finds relevant matches to a given pedigree from multiple online repositories.

Automatic recognition of handwritten medical forms

by Robert Jay Milewski, Venu Govindaraju, Anurag Bhardwaj
for search engines

IMPROVING RECORD LINKAGE THROUGH PEDIGREES

by Burdette Pixton, 2006
Abstract not found