Results 1 - 10 of 127

Approximate String Joins in a Database (Almost) for Free - Erratum

by Luis Gravano, Panagiotis G. Ipeirotis, H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Divesh Srivastava - In VLDB, 2003
"... case the result returned by the Figure 1 query is incomplete and suffers from "false negatives," in contrast to our claim to the contrary in [GIJ 01b]. In general, the string pairs that are omitted are pairs of short strings. Even when these strings match within small edit distance, t ..."
Abstract - Cited by 210 (16 self)
negatives are only pairs of short strings, we can join all pairs of these small strings, using only the length filter, and UNION the result with the result of the SQL query described in [GIJ 01b]. We list the modified query in Figure 2. 2 Experimental Results We now experimentally measure the number

Web Data Integration Using Approximate String Join

by Yingping Huang, Gregory Madey - In WWW, 2004
"... • Web data integration is an important preprocessing step for web mining and data analysis. • Approximate string processing is a fundamental ..."
Abstract - Cited by 4 (0 self)
• Web data integration is an important preprocessing step for web mining and data analysis. • Approximate string processing is a fundamental

Accuracy of Approximate String Joins Using Grams

by Oktie Hassanzadeh, Mohammad Sadoghi, Renee J. Miller, 2007
"... Approximate join is an important part of many data cleaning and integration methodologies. Various similarity measures have been proposed for accurate and efficient matching of string attributes. The accuracy of the similarity measures highly depends on the characteristics of the data such as amount ..."
Abstract - Cited by 3 (1 self)
in these methodologies. We present an overview of several similarity measures based on q-grams. We then thoroughly compare their accuracy on several datasets with different characteristics. Since the efficiency of approximate joins depend on the similarity threshold they use, we study how the value of the threshold

String...

by Luis Gravano, Panagiotis G. Ipeirotis, H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Divesh Srivastava
"... String data is ubiquitous, and its management has taken on particular importance in the past few years. Approximate queries are very important on string data especially for more complex queries involving joins. This is due, for example, to the prevalence of typographical errors in data, and multiple ..."
Abstract
relational optimizers. We demonstrate experimentally the benefits of our technique over the direct use of UDFs, using commercial database systems and real data. To study the I/O and CPU behavior of approximate string join algorithms with variations in edit distance and q-gram length, we also describe

Using Approximate String Matching Techniques to Join Street Names of Residential Addresses

by Supervisor Prof, Johann Gamper, Tutor Nikolaus Augsten, 2004
"... For many administrative tasks at the Municipality of Bolzano-Bozen a number of autonomous databases have to be accessed. In order to compute these tasks more efficiently, the content of these databases should be linked automatically. A promising join attribute are residential addresses, as they a ..."
Abstract
techniques in order to find good matches for street names. In this thesis I analyze the accuracy and efficiency of two approximate string matching algorithms for matching street names: q-gram and edit-distance. This analysis is based on experiments using street

String Similarity Measures and Joins with Synonyms

by Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, Haiyong Wang
"... A string similarity measure quantifies the similarity between two text strings for approximate string matching or comparison. For example, the strings “Sam” and “Samuel” can be considered similar. Most existing work that computes the similarity of two strings only considers syntactic similarities, ..."
Abstract - Cited by 1 (0 self)
both signature and length filtering strategies, for efficient string similarity joins with synonyms. We develop an estimator to approximate the size of candidates to enable an online selection of signature filters to further improve the efficiency. This estimator provides strong low-error, high

Approximate joins for data-centric XML

by Nikolaus Augsten, Michael Böhlen, Curtis Dyreson, Johann Gamper - In Proceedings of the International Conference on Data Engineering (ICDE). IEEE Computer Society, 2008
"... Abstract- In data integration applications, a join matches elements that are common to two data sources. Often, however, elements are represented slightly different in each source, so an approximate join must be used. For XML data, most approximate join strategies are based on some ordered tree matc ..."
Abstract - Cited by 10 (4 self)
trees. The approximate join algorithm based on windowed pq-grams is implemented as an equality join on strings which avoids the costly computation of the distance between every pair of input trees. Our experiments with synthetic and real world data confirm the analytic results and suggest that our

Scalable String Similarity Search/Join with Approximate Seeds and Multiple Backtracking

by Enrico Siragusa, David Weese, Knut Reinert
"... We present in this paper scalable algorithms for optimal string similarity search and join. Our methods are variations of those applied in Masai [15], our recently published tool for mapping high-throughput DNA sequencing data with unpreceded speed and accuracy. The key features of our approach are ..."
Abstract - Cited by 1 (1 self)
We present in this paper scalable algorithms for optimal string similarity search and join. Our methods are variations of those applied in Masai [15], our recently published tool for mapping high-throughput DNA sequencing data with unpreceded speed and accuracy. The key features of our approach

Using q-grams in a DBMS for Approximate String Processing

by Luis Gravano, Panagiotis G. Ipeirotis, H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Lauri Pietarinen, Divesh Srivastava, 2001
"... String data is ubiquitous, and its management has taken on particular importance in the past few years. Approximate queries are very important on string data. This is due, for example, to the prevalence of typographical errors in data, and multiple conventions for recording attributes such as name a ..."
Abstract - Cited by 38 (1 self)
databases by exploiting facilities already available in them. At the core, our technique relies on generating short substrings of length q, called q-grams, and processing them using standard methods available in the DBMS. The proposed technique enables various approximate string processing methods in a DBMS

Approximate Joins for Data Centric XML

by Nikolaus Augsten, Michael Böhlen, Curtis Dyreson, Johann Gamper - INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2008
"... Abstract — In data integration applications, a join matches elements that are common to two data sources. Often, however, elements are represented slightly different in each source, so an approximate join must be used. For XML data, most approximate join strategies are based on some ordered tree mat ..."
Abstract
trees. The approximate join algorithm based on windowed pq-grams is implemented as an equality join on strings and avoids to evaluate the distance between every pair of input trees. Our experiments with synthetic and real world data confirm the analytic results and suggest that our technique is both