Results 11-16 of 16
tRuEcasIng
, 2003
Cited by 15 (0 self)
Abstract
Truecasing is the process of restoring case information to badly-cased or noncased text. This paper explores truecasing issues and proposes a statistical, language-modeling-based truecaser which achieves an accuracy of on news articles. Task-based evaluation shows a 26% F-measure improvement in named entity recognition when truecasing is used.
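To make the idea of a language-model-based truecaser concrete, here is a minimal sketch. The abstract does not give the model details, so everything here is an assumption: a bigram model over cased tokens with add-alpha smoothing (`alpha`), casing candidates limited to the forms seen in training, and a Viterbi search over those candidates. The class `BigramTruecaser` and the toy corpus are hypothetical and are not the paper's system.

```python
import math
from collections import Counter, defaultdict


class BigramTruecaser:
    """Toy truecaser (hypothetical): picks each token's casing by Viterbi
    search under an add-alpha smoothed bigram model over cased tokens."""

    def __init__(self, sentences, alpha=0.1):
        self.alpha = alpha
        self.bigrams = Counter()           # (prev, word) counts
        self.contexts = Counter()          # how often each token is a bigram history
        self.variants = defaultdict(set)   # lowercased form -> casings seen in training
        vocab = set()
        for sent in sentences:
            tokens = ["<s>"] + sent.split()
            for tok in tokens[1:]:
                self.variants[tok.lower()].add(tok)
                vocab.add(tok)
            self.bigrams.update(zip(tokens, tokens[1:]))
            self.contexts.update(tokens[:-1])
        self.vocab_size = len(vocab) + 1   # +1 reserves some mass for unseen tokens

    def _logprob(self, prev, word):
        # Add-alpha smoothed log P(word | prev).
        num = self.bigrams[(prev, word)] + self.alpha
        den = self.contexts[prev] + self.alpha * self.vocab_size
        return math.log(num / den)

    def truecase(self, text):
        # best maps a cased form to (log score, best casing path ending in it).
        best = {"<s>": (0.0, [])}
        for word in text.lower().split():
            candidates = self.variants.get(word) or {word}  # unseen: keep lowercase
            best = {
                cand: max(
                    (score + self._logprob(prev, cand), path + [cand])
                    for prev, (score, path) in best.items()
                )
                for cand in candidates
            }
        return " ".join(max(best.values())[1])


corpus = [
    "Apple released a new phone .",
    "He ate an apple today .",
    "The new phone from Apple sold well .",
    "She bought an apple .",
]
tc = BigramTruecaser(corpus)
print(tc.truecase("apple released a new phone ."))  # Apple released a new phone .
print(tc.truecase("she bought an apple ."))         # She bought an apple .
```

The bigram context is what separates the company name from the fruit here: "Apple" is preferred sentence-initially before "released", while "apple" wins after "an". A higher-order model would capture more context at the cost of sparser counts.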
Mixed Language Query Disambiguation
- In ACL-99: The 37th Annual Meeting of the Association for Computational Linguistics
, 1999
Cited by 4 (0 self)
Abstract
We propose a mixed language query disambiguation approach that uses co-occurrence information from monolingual data only. A mixed language query consists of words in a primary language and a secondary language. Our method translates the query into monolingual queries in either language. Two novel features for disambiguation, namely contextual word voting and 1-best contextual word, are introduced and compared to a baseline feature, the nearest neighbor. Average query translation accuracy for the two features is 81.37% and 83.72%, respectively, compared to a baseline accuracy of 75.50%.
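The three features can be sketched as selection rules over candidate translations of the secondary-language word. The definitions below are one plausible reading of the feature names, not the paper's exact formulation: "nearest neighbor" lets the closest context word decide, "contextual word voting" lets each context word vote for the candidate it co-occurs with most, and "1-best contextual word" trusts the single strongest (context word, candidate) pair. The co-occurrence table `COOC`, the placeholder token `"<w>"`, and all function names are hypothetical.

```python
from collections import Counter


def cooc(table, a, b):
    """Symmetric lookup of a monolingual co-occurrence score (0.0 if unseen)."""
    return table.get((a, b), table.get((b, a), 0.0))


def nearest_neighbor(query, pos, candidates, table):
    """Baseline: the context word closest to position `pos` decides."""
    context = [i for i in range(len(query)) if i != pos]
    nearest = min(context, key=lambda i: abs(i - pos))
    return max(candidates, key=lambda c: cooc(table, query[nearest], c))


def contextual_word_voting(query, pos, candidates, table):
    """Each context word votes for the candidate it co-occurs with most."""
    votes = Counter()
    for i, word in enumerate(query):
        if i != pos:
            votes[max(candidates, key=lambda c: cooc(table, word, c))] += 1
    return votes.most_common(1)[0][0]


def one_best_contextual_word(query, pos, candidates, table):
    """Pick the candidate in the single strongest (context word, candidate) pair."""
    context = [w for i, w in enumerate(query) if i != pos]
    _, best = max((cooc(table, w, c), c) for w in context for c in candidates)
    return best


# Hypothetical co-occurrence scores from a primary-language corpus.
COOC = {
    ("loan", "bank"): 6.0, ("loan", "shore"): 0.1,
    ("rate", "bank"): 2.0, ("rate", "shore"): 0.5,
    ("river", "bank"): 3.0, ("river", "shore"): 5.0,
}
query = ["loan", "rate", "river", "<w>"]  # "<w>" marks the secondary-language word
candidates = ["bank", "shore"]            # its candidate translations

print(nearest_neighbor(query, 3, candidates, COOC))          # shore
print(contextual_word_voting(query, 3, candidates, COOC))    # bank
print(one_best_contextual_word(query, 3, candidates, COOC))  # bank
```

In this toy query the baseline is misled by the adjacent word "river", while both context-wide features pick "bank". The numbers are invented to illustrate the mechanics and say nothing about the reported accuracies.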
Unsupervised Text Mining
, 1997
Cited by 3 (3 self)
Abstract
We describe the results of performing text mining on a challenging problem in natural language processing: word sense disambiguation. We compare two methods of unsupervised learning, Ward's minimum-variance clustering and the EM algorithm, that distinguish the meaning of an ambiguous word based only on features that can be automatically identified in text. This is a significant advantage over most previous approaches, which require a training sample where the meanings of ambiguous words have been manually disambiguated. Creating sense-tagged text sufficient to serve as a training sample is expensive and time-consuming, and is yet another example of the knowledge acquisition bottleneck. We present experimental results showing the application of each of these algorithms to the disambiguation of three nouns using five different feature sets. We find that these methods can distinguish two senses of bill with accuracy of up to 82 percent, three senses of interest ...
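Ward's minimum-variance criterion can be illustrated with a naive agglomerative clustering over per-occurrence feature vectors. At each step it merges the pair of clusters whose union least increases the total within-cluster sum of squares; for clusters of sizes n_a and n_b with centroids c_a and c_b, that increase is (n_a n_b / (n_a + n_b)) * ||c_a - c_b||^2. This is a minimal sketch only: the binary feature vectors (imagined as "context contains pay / amount / senate / vote" for occurrences of bill) are hypothetical, the paper's five feature sets are not reproduced, and the EM alternative is not shown.

```python
def ward_cluster(points, k):
    """Naive Ward agglomerative clustering: merge clusters until k remain.
    Returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    centroids = [[float(x) for x in p] for p in points]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                dist2 = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                cost = na * nb / (na + nb) * dist2  # increase in within-cluster SS
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # New centroid is the size-weighted mean of the two merged centroids.
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        del centroids[b]
    return clusters


# Hypothetical binary features per occurrence of "bill":
# [pay, amount, senate, vote]
points = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
print(ward_cluster(points, 2))  # [[0, 1, 4], [2, 3, 5]]
```

The O(n^3) pairwise search keeps the criterion easy to read; practical implementations update merge costs incrementally with the Lance-Williams recurrence instead.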
Information Access by Natural Language Processing
Cited by 1 (0 self)
Abstract
Contents
1. Motivation
2. Disambiguation and Topic Assignment
   2.1 The Lexical Disambiguation Algorithm
      2.1.1 Some Algorithmic Details
      2.1.2 Efficiency and Implementation Details
   2.2 The Classification Algorithm
   2.3 Preprocessing
      2.3.1 Document Preprocessing
         2.3.1.1 Stemming
      2.3.2 Thesaurus Preprocessing
3. Software Design of IAGO!
   3.1 Overall Design
   3.2 The Internet Directory
      3.2.1 The Directory User-Interface
   3.3 Search by Word Senses
      3.3.1 Lexical Disambiguation Filter
4. Experiments
   4.1 IAGO! 0.1
      4.1.1 Motivations for IAGO! 0.1
      4.1.2 Initial Results and Remedial Measures
   4.2 Evaluating the Internet Directory
   4.3 Evaluating Disambiguation
      4.3.2 Results
      4.3.3 Evaluating Search by Word Senses
5. Limitations and Possible Improvements
   5.1 Efficiency
   5.2 Improving Disambiguation
      5.2.1 Limits of the Approach
      5.2.2 Thesaural Categories as Word Sense Proxies
      5.2.3 Incompleteness of the Thesaurus
      5.2.4 Multi-Word Phrases
      5.2.5 Word Elements
      5.2.6 Using the Word Sense Distribution to Improve Disambiguation
   5.3 Topic Assignment
      5.3.1 Thesaural Categories as Topics
      5.3.2 Multiple Categorization and Ranking
      5.3.3 Disambiguation
      5.3.4 Common Words
      5.3.5 Multilingual Considerations
6. Extensions and Other Applications
   6.1 Automated Summarization
   6.2 Quality
   6.3 Integrating Information from Pictures
   6.4 Query Expansion
7. Conclusion
8. Acknowledgments
9. References

Information Access by Natural Language Processing
Isaac Cheng and Robert Wilensky
Abstract
We explore the hypothesis that lexical disambiguation could be applied to provide useful information access services. Specifically, we refined a lexical disambiguation method and used it in ...
Word-Sense Disambiguation
Abstract
This paper describes a program that disambiguates English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories. Roget's categories serve as approximations of conceptual classes. The categories listed for a word in Roget's index tend to correspond to sense distinctions; thus selecting the most likely category provides a useful level of sense disambiguation. The selection of categories is accomplished by identifying and weighting words that are indicative of each category when seen in context, using a Bayesian theoretical framework.
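The category-selection step can be sketched as follows: each context word gets a per-category weight log(P(w|cat) / P(w)), and the candidate category with the highest summed weight over the context wins. This is a minimal sketch under stated assumptions, following the abstract's description only loosely: weights are estimated from context words observed near each category's members, smoothed with add-alpha (`alpha`), and the category prior is taken as uniform. The category labels `ANIMAL` and `TOOL`, the training contexts, and the function names are hypothetical.

```python
import math
from collections import Counter


def train_weights(category_contexts, alpha=0.5):
    """category_contexts maps a category to a list of context word lists.
    Returns weights[cat][w] = log(P(w | cat) / P(w)), add-alpha smoothed."""
    cat_counts = {
        cat: Counter(w for ctx in ctxs for w in ctx)
        for cat, ctxs in category_contexts.items()
    }
    total = Counter()
    for counts in cat_counts.values():
        total.update(counts)
    vocab = set(total)
    n_total = sum(total.values())
    weights = {}
    for cat, counts in cat_counts.items():
        n_cat = sum(counts.values())
        weights[cat] = {
            w: math.log(((counts[w] + alpha) / (n_cat + alpha * len(vocab)))
                        / (total[w] / n_total))
            for w in vocab
        }
    return weights


def disambiguate(context, candidate_categories, weights):
    """Choose the category whose indicative words best match the context.
    Words never seen in training contribute 0 (uniform prior assumed)."""
    return max(candidate_categories,
               key=lambda cat: sum(weights[cat].get(w, 0.0) for w in context))


training = {
    "ANIMAL": [["bird", "wing", "nest"], ["feather", "bird", "fly"]],
    "TOOL": [["lift", "steel", "build"], ["machine", "lift", "heavy"]],
}
weights = train_weights(training)
print(disambiguate(["bird", "fly", "nest"], ["ANIMAL", "TOOL"], weights))     # ANIMAL
print(disambiguate(["heavy", "steel", "lift"], ["ANIMAL", "TOOL"], weights))  # TOOL
```

Summing log ratios treats context words as independent evidence, which is the naive-Bayes-style simplification that makes this scoring cheap over large contexts.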
Mitsubishi Electric Research Laboratories
- In Proceedings of the International Symposium on Non-Photorealistic Animation and Rendering (Annecy)
, 2002
Abstract
In this paper we describe a system to show some limited effects on a static toy-car model and present techniques that can be used in similar setups. Our focus is on creating apparent motion for animation ...

