Results 1 - 6 of 6
Cleansing databases of misspelled proper nouns
In CleanDB Workshop, 2006
"... The paper presents a data cleansing technique for string databases. We propose and evaluate an algorithm that identifies a group of strings that consists of (multiple) occurrences of a correctly spelled string plus nearby misspelled strings. All strings in a group are replaced by the most frequent s ..."
Cited by 3 (1 self)
... with sampling to efficiently identify and cleanse a database. The experimental evaluation shows that for proper nouns the center calculation and border detection algorithms are robust and even very small sample sizes yield good results.
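The cleansing idea in the abstract above (group a frequent, correctly spelled string with nearby misspellings, then rewrite the whole group to its most frequent member) can be sketched as a greedy clustering. This is a minimal sketch, not the paper's algorithm: the hypothetical `cleanse` function assumes plain Levenshtein distance and a fixed threshold `max_dist` in place of the paper's center calculation and border detection, and it does no sampling.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def cleanse(values: list[str], max_dist: int = 1) -> list[str]:
    """Hypothetical greedy cleansing: the most frequent unassigned string
    becomes a group center, and every unassigned string within max_dist
    edits of it is rewritten to that center."""
    counts = Counter(values)
    mapping: dict[str, str] = {}
    for center, _ in counts.most_common():
        if center in mapping:
            continue  # already absorbed into a more frequent group
        mapping[center] = center
        for other in counts:
            if other not in mapping and levenshtein(center, other) <= max_dist:
                mapping[other] = center
    return [mapping[v] for v in values]


rows = ["Smith", "Smith", "Smith", "Smth", "Smitj", "Jones", "Jones", "Jnes"]
print(cleanse(rows))
# ['Smith', 'Smith', 'Smith', 'Smith', 'Smith', 'Jones', 'Jones', 'Jones']
```

Processing candidates in descending frequency is what makes the most frequent spelling win: a rare misspelling can only become a center if nothing more frequent lies within `max_dist` of it.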
Searching Proper Names in Databases
"... Identifying names (e.g., author names or company names) is still an open problem. In this paper we review known similarity measures. These measures deal with phonetic similarity, typing errors and plain string similarity. We show experimentally that all three approaches lead to significant bet ..."
Cited by 8 (0 self)
... vagueness of queries and uncertainty of knowledge. In this paper, we focus on searching proper nouns. Here the vagueness stems from limited knowledge a user has about his information need. If he is, e.g., searching for papers of a certain author in a bibliographic database, he will not be successful if he ...
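Phonetic similarity, one of the three families of measures the abstract mentions, is often illustrated with Soundex, which maps names that sound alike to the same four-character code. The sketch below implements American Soundex; whether the paper evaluates this exact variant is an assumption, and the function name `soundex` is ours.

```python
# American Soundex letter classes; vowels, y, h, and w carry no digit.
_CODES = {
    letter: digit
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    last = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != last:
            code += digit
            if len(code) == 4:
                break
        # h and w do not separate equal codes; vowels (and y) do
        if c not in "hw":
            last = digit
    return (code + "000")[:4]


for n in ["Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"]:
    print(n, soundex(n))
# Robert R163 / Rupert R163 / Ashcraft A261 / Tymczak T522 / Pfister P236
```

Two spellings match phonetically when their codes agree: "Robert" and "Rupert" both map to R163 even though a plain string comparison treats them as fairly different.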
Algorithms for Grapheme-Phoneme Translation for English and French: Applications
Computational Linguistics, 1997
"... Letter-to-sound rules, also known as grapheme-to-phoneme rules, are important computational tools and have been used for a variety of purposes including word or name lookups for database searches and speech synthesis. These rules are especially useful when integrated into database searches on names ..."
Cited by 43 (0 self)
... on names and addresses, since they can complement orthographic search algorithms that make use of permutation, deletion, and insertion by allowing for a comparison with the phonetic equivalent. In databases, phonetics can help retrieve a word or a proper name without the user needing to know the correct ...
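The orthographic operations named in the snippet (permutation, deletion, insertion) can be illustrated by generating every variant of a query at one such edit and looking the variants up in a name list. A minimal sketch: `one_edit_variants` and `orthographic_lookup` are hypothetical helpers, "permutation" is interpreted here as swapping two adjacent letters, and substitution is deliberately left out because the snippet does not list it.

```python
import string


def one_edit_variants(word: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """All strings one deletion, adjacent permutation, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = {left + right[1:] for left, right in splits if right}
    permutations = {left + right[1] + right[0] + right[2:]
                    for left, right in splits if len(right) > 1}
    insertions = {left + ch + right for left, right in splits for ch in alphabet}
    return deletions | permutations | insertions


def orthographic_lookup(query: str, names: list[str]) -> list[str]:
    """Names equal to the query or one edit away (case-insensitive)."""
    index = {name.lower(): name for name in names}
    q = query.lower()
    candidates = {q} | one_edit_variants(q)
    return sorted(index[c] for c in candidates if c in index)


print(orthographic_lookup("martin", ["Martin", "Marten", "Martins", "Matrin"]))
# ['Martin', 'Martins', 'Matrin']
```

"Marten" is missed because reaching it requires a substitution, which is the kind of gap the snippet says a phonetic comparison can help cover.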
Spellchecking and Error Correcting System for text paragraphs written in Punjabi Language using Hybrid approach
"... ABSTRACT: Spell-checking is the process of detecting and correcting incorrectly spelled words in a paragraph. A spell-checking system first detects the incorrect words and then provides the best possible corrections. A spell-checking system is a combination of handcrafted rules of the langu ..."
... application that analyzes possible misspellings in a text by referring to the accepted spellings in a database. The database stores correct words of the target language for which the spell-checker is to be built, including proper nouns for males, females, countries, states, rivers ...
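The dictionary-lookup step described above (flag every word missing from the database of accepted spellings, then propose close entries) might look like this. Assumptions: the hypothetical `check_paragraph` splits on whitespace and strips ASCII punctuation plus the danda (।), and ranks suggestions with the standard library's `difflib.get_close_matches` (a similarity ratio, not edit distance); the paper's hybrid rule-based approach is not reproduced. Romanized place names stand in for Gurmukhi entries.

```python
import difflib
import string

# Characters stripped from token edges: ASCII punctuation plus the danda.
_PUNCT = string.punctuation + "\u0964"


def check_paragraph(text: str, dictionary: list[str],
                    n: int = 3, cutoff: float = 0.6) -> dict[str, list[str]]:
    """Map each word not found in the dictionary to up to n suggestions."""
    known = set(dictionary)
    report: dict[str, list[str]] = {}
    for raw in text.split():
        word = raw.strip(_PUNCT)
        if word and word not in known and word not in report:
            report[word] = difflib.get_close_matches(word, dictionary, n=n, cutoff=cutoff)
    return report


places = ["Punjab", "Amritsar", "Ravi", "Sutlej"]
print(check_paragraph("Amritsr, Punjab. Ravi", places))
# {'Amritsr': ['Amritsar']}
```

Splitting on whitespace rather than using a `\w+` regex is deliberate: Gurmukhi vowel signs are combining marks, which `\w` does not match, so a regex tokenizer would break Punjabi words apart.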
Visualization of Hash-functions
"... to have prepared it using only the stated sources and aids. All passages taken from sources have been marked as such. This work has not been submitted in the same or a similar form to any examination board. Darmstadt, 26.06.2012 (T. Kilian) Contents ..."
Innovation Activities and the Incentives for Vertical Acquisitions and Integration
"... ABSTRACT We examine the incentives for firms to vertically integrate through acquisitions and production. We develop a new firm-specific measure of vertical relatedness and integration using 10-K product text. We find that firms in high R&D industries are less likely to become targets in vertic ..."
... by such vertical links, we remove any mentions of customers and suppliers using 81 phrases listed in the Internet Appendix. Ultimately, we represent both firm vocabularies and the commodity vocabularies from BEA as vectors with length 60,507, which is the number of nouns and proper nouns appearing in 10-K ...
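Comparing a firm's 10-K noun vocabulary with a commodity vocabulary comes down to comparing two vectors over the same noun list. The sketch assumes binary (presence/absence) entries and cosine similarity, a common choice for text vectors; the paper's exact weighting and similarity measure are not given in the snippet, and the four-word vocabulary is a toy stand-in for the 60,507 nouns.

```python
import math


def binary_vector(words: list[str], vocabulary: list[str]) -> list[int]:
    """1 where a vocabulary noun occurs in the document, else 0."""
    present = set(words)
    return [1 if term in present else 0 for term in vocabulary]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocab = ["steel", "engine", "tire", "software"]   # toy stand-in for 60,507 nouns
firm = binary_vector(["engine", "tire", "steel", "engine"], vocab)  # [1, 1, 1, 0]
commodity = binary_vector(["steel", "tire"], vocab)                 # [1, 0, 1, 0]
print(round(cosine(firm, commodity), 4))  # 0.8165
```

Because both vectors share one fixed noun list, every firm can be scored against every commodity with the same function; with binary entries the score depends only on how many nouns the two vocabularies share relative to their sizes.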