Discovering Word Senses from Text
- In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining
, 2002
Cited by 159 (10 self)
Inventories of manually compiled dictionaries usually serve as a source for word senses. However, they often include many rare senses while missing corpus/domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning words to their most similar clusters. After assigning an element to a cluster, we remove their overlapping features from the element. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of discovered senses.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Clustering.
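The assignment step described in the abstract above (assign a word to its most similar cluster, strip the overlapping features, repeat) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name a similarity measure or a stopping rule, so cosine similarity, the `threshold` parameter, the function names, and the toy feature vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse feature vectors (dicts).

    Assumption: the abstract does not say which similarity measure is used.
    """
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def assign_senses(word_vec, centroids, threshold=0.2):
    """Return the names of the clusters a word is assigned to, in order.

    word_vec:  feature -> weight for one word
    centroids: cluster name -> centroid feature vector (hypothetical input;
               in the paper these come from the committees)
    threshold: hypothetical cutoff below which no further sense is assigned
    """
    remaining = dict(word_vec)      # features not yet explained by a sense
    candidates = dict(centroids)    # clusters not yet assigned
    senses = []
    while candidates and remaining:
        # Pick the cluster most similar to what is left of the word.
        name, sim = max(
            ((n, cosine(remaining, c)) for n, c in candidates.items()),
            key=lambda pair: pair[1],
        )
        if sim < threshold:
            break
        senses.append(name)
        centroid = candidates.pop(name)
        # Remove the features that overlap with the assigned cluster, so
        # that less frequent senses can surface on the next pass.
        remaining = {f: w for f, w in remaining.items() if f not in centroid}
    return senses


# Toy data, invented for illustration only.
plant = {"grow": 3.0, "water": 2.0, "leaf": 2.0, "factory": 1.0, "worker": 1.0}
centroids = {
    "flora": {"grow": 1.0, "water": 1.0, "leaf": 1.0},
    "facility": {"factory": 1.0, "worker": 1.0, "build": 1.0},
    "animal": {"fur": 1.0, "tail": 1.0},
}
print(assign_senses(plant, centroids))  # ['flora', 'facility']
```

In this toy run the dominant "flora" features are removed after the first assignment, which is what lets the weaker "facility" sense rise above the cutoff on the second pass.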
Evaluating Natural Language Processing Systems
, 1993
Cited by 104 (0 self)
This report presents a detailed analysis and review of NLP evaluation, in principle and in practice. Part 1 examines evaluation concepts and establishes a framework for NLP system evaluation. This makes use of experience in the related area of information retrieval and the analysis also refers to evaluation in speech processing. Part 2 surveys significant evaluation work done so far, for instance in machine translation, and discusses the particular problems of generic system evaluation. The conclusion is that evaluation strategies and techniques for NLP need much more development, in particular to take proper account of the influence of system tasks and settings. Part 3 develops a general approach to NLP evaluation, aimed at methodologically-sound strategies for test and evaluation motivated by comprehensive performance factor identification. The analysis throughout the report is supported by extensive illustrative examples. This work was carried out under the UK Science and Engineeri...
Data mining for hypertext: A tutorial survey
- ACM SIGKDD Explorations
, 2000
Cited by 61 (0 self)
With over 800 million pages covering most areas of human endeavor, the World-wide Web is a fertile ground for data mining research to make a difference to the effectiveness of information search. Today, Web surfers access the Web through two dominant interfaces: clicking on hyperlinks and searching via keyword queries. This process is often tentative and unsatisfactory. Better support is needed for expressing one's information need and dealing with a search result in more structured ways than available now. Data mining and machine learning have significant roles to play towards this end. In this paper we will survey recent advances in learning and mining problems related to hypertext in general and the Web in particular. We will review the continuum of supervised to semi-supervised to unsupervised learning problems, highlight the specific challenges which distinguish data mining in the hypertext domain from data mining in the context of data warehouses, and summarize the key areas of ...
Automatic Cross-Language Retrieval Using Latent Semantic Indexing
, 1997
Cited by 57 (1 self)
We describe a method for fully automated cross-language document retrieval in which no query translation is required. Queries in one language can retrieve documents in other languages (as well as the original language). This is accomplished by a method that automatically constructs a multilingual semantic space using Latent Semantic Indexing (LSI). Strong test results for the cross-language LSI (CL-LSI) method are presented for a new French-English collection. We also provide evidence that this automatic method performs comparably to a retrieval method based on machine translation (MT-LSI), and explore several practical training methods. By all available measures, CL-LSI performs quite well and is widely applicable.
Learning Dependency Translation Models as Collections of Finite State Head Transducers
- Computational Linguistics
, 2000
Cited by 57 (3 self)
The paper defines weighted head transducers, finite-state machines that perform middle-out string transduction. These transducers are strictly more expressive than the special case of standard left-to-right finite-state transducers. Dependency transduction models are then defined as collections of weighted head transducers that are applied hierarchically. A dynamic programming search algorithm is described for finding the optimal transduction of an input string with respect to a dependency transduction model. A method for automatically training a dependency transduction model from a set of input-output example strings is presented. The method first searches for hierarchical alignments of the training examples guided by correlation statistics, and then constructs the transitions of head transducers that are consistent with these alignments. Experimental results are given for applying the training method to translation from English to Spanish and Japanese.
Interactive Translation of Conversational Speech
, 1996
Cited by 53 (7 self)
... discuss their usability and performance. 1.0 Introduction Multilinguality will take on spoken form when information services are to extend beyond national boundaries or across language groups. Database access by speech will need to handle multiple languages to service customers from different language groups. Public service operators (emergency, police, telephone operators and others) frequently receive requests from foreigners unable to speak the national language. Already multilingual spoken language services are growing. Telephone companies in the US (AT&T Language Line), Europe and Japan now offer language translation services over the telephone, provided by human operators. Movies and television broadcasts are routinely translated and delivered either by dubbing, subtitles or multilingual transcripts. With the drive of automating information services, therefore, comes a growing need for automate
Automatic Cross-Language Information Retrieval using Latent Semantic Indexing
- Cross-Language Information Retrieval, chapter 5
, 1998
Cited by 36 (5 self)
We describe a method for fully automated cross-language document retrieval in which no query translation is required. Queries in one language can retrieve documents in other languages (as well as the original language). This is accomplished by a method that automatically constructs a multilingual semantic space using Latent Semantic Indexing (LSI). We present strong preliminary test results for our cross-language LSI (CL-LSI) method for a French-English collection. We also provide some evidence that this automatic method performs comparably to a retrieval method based on machine translation (MT-LSI).
Semantic-based Transfer
, 1996
Cited by 35 (7 self)
This article presents a new semantic-based transfer approach developed and applied within the Verbmobil Machine Translation project. We give an overview of the declarative transfer formalism together with its procedural realization. Our approach is discussed and compared with several other approaches from the MT literature. The results presented in this article have been implemented and integrated into the Verbmobil system.
The Use of Lexical Semantics in Interlingual Machine Translation
- MACHINE TRANSLATION
, 1992
Translating with non-contiguous phrases
- In EMNLP
, 2005
Cited by 23 (6 self)
This paper presents a phrase-based statistical machine translation method based on non-contiguous phrases, i.e. phrases with gaps. A method for producing such phrases from word-aligned corpora is proposed. A statistical translation model is also presented that deals with such phrases, as well as a training method based on the maximization of translation accuracy, as measured with the NIST evaluation metric. Translations are produced by means of a beam-search decoder. Experimental results are presented that demonstrate how the proposed method generalizes better from the training data.

