Machine Learning in Automated Text Categorization
- ACM Computing Surveys, 2002
Cited by 839 (13 self)
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
An introduction to boosting and leveraging
- Advanced Lectures on Machine Learning, LNCS, 2003
Exploring automatic word sense disambiguation with decision lists and the Web
- Proceedings of the Semantic Annotation And Intelligent, 2000
Cited by 35 (4 self)
The most effective paradigm for word sense disambiguation, supervised learning, seems to be stuck because of the knowledge acquisition bottleneck. In this paper we take an in-depth study of the performance of decision lists on two publicly available corpora and an additional corpus automatically acquired from the Web, using the fine-grained highly polysemous senses in WordNet. Decision lists are shown to be a versatile state-of-the-art technique. The experiments reveal, among other facts, that SemCor can be an acceptable (0.7 precision for polysemous words) starting point for an all-words system. The results on the DSO corpus show that for some highly polysemous words 0.7 precision seems to be the current state-of-the-art limit. On the other hand, independently constructed hand-tagged corpora are not mutually useful, and a corpus automatically acquired from the Web is shown to fail. Introduction Recent trends in word sense disambiguation (Ide & Veronis, 1998) show that ...
Querying the Web: A Multiontology Disambiguation Method
- 2006
Cited by 30 (10 self)
The lack of explicit semantics in the current Web can lead to ambiguity problems: for example, current search engines return unwanted information since they do not take into account the exact meaning given by the user to the keywords used. Though disambiguation is a very well-known problem in Natural Language Processing and other domains, traditional methods are not flexible enough to work in a Web-based context. In this paper we have identified some desirable properties that a Web-oriented disambiguation method should fulfill, and make a proposal according to them. The proposed method processes a set of related keywords in order to discover and extract their implicit semantics, obtaining their most suitable senses according to their context. The possible senses are extracted from the knowledge represented by a pool of ontologies available in the Web. This method applies an iterative disambiguation algorithm that uses a semantic relatedness measure based on Google frequencies. Our proposal makes explicit the semantics of keywords by means of ontology terms; this information can be used for different purposes, such as improving the search and retrieval of underlying relevant information.
Word Translation Disambiguation Using Bilingual Bootstrapping
- Computational Linguistics, 2002
Cited by 29 (2 self)
This paper proposes a new method for word translation disambiguation using a machine learning technique called 'Bilingual Bootstrapping'. Bilingual Bootstrapping makes use of, in learning, a small number of classified data and a large number of unclassified data in the source and the target languages in translation. It constructs classifiers in the two languages in parallel and repeatedly boosts the performances of the classifiers by further classifying data in each of the two languages and by exchanging between the two languages information regarding the classified data. Experimental results indicate that word translation disambiguation based on Bilingual Bootstrapping consistently and significantly outperforms the existing methods based on 'Monolingual Bootstrapping'.
Word sense disambiguation: a survey
- ACM Computing Surveys, 2009
Cited by 28 (9 self)
Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the motivations for solving the ambiguity of words and provide a description of the task. We overview supervised, unsupervised, and knowledge-based approaches. The assessment of WSD systems is discussed in the context of the Senseval/Semeval campaigns, aiming at the objective evaluation of systems participating in several different disambiguation tasks. Finally, applications, open problems, and future directions are discussed.
Combined Optimization of Feature Selection and Algorithm Parameters in Machine Learning of Language
- In Proc, 2003
Cited by 12 (8 self)
Comparative machine learning experiments have become an important methodology in empirical approaches to natural language processing (i) to investigate which machine learning algorithms have the 'right bias' to solve specific natural language processing tasks, and (ii) to investigate which sources of information add to accuracy in a learning approach.
An Empirical Study of the Domain Dependence of Supervised Word Sense Disambiguation Systems
- In Proc. of EMNLP/VLC00, 2000
Cited by 11 (0 self)
This paper describes a set of experiments carried out to explore the domain dependence of alternative supervised Word Sense Disambiguation algorithms. The aim of the work is threefold: studying the performance of these algorithms when tested on a corpus different from the one they were trained on; exploring their ability to tune to new domains; and demonstrating empirically that the Lazy-Boosting algorithm outperforms state-of-the-art supervised WSD algorithms in both previous situations.
A Comparison between Supervised Learning Algorithms for Word Sense Disambiguation
- In Proc. of CoNLL-2000. ACL, 2000
Cited by 10 (1 self)
This paper describes a set of comparative experiments, including cross-corpus evaluation, between five alternative algorithms for supervised Word Sense Disambiguation (WSD), namely Naive Bayes, Exemplar-based learning, SNOW, Decision Lists, and Boosting. Two main conclusions can be drawn: 1) The LazyBoosting algorithm outperforms the other four state-of-the-art algorithms in terms of accuracy and ability to tune to new domains; 2) The domain dependence of WSD systems seems very strong and suggests that some kind of adaptation or tuning is required for cross-corpus application.
Improving Term Extraction by System Combination using Boosting
- 2001
Cited by 10 (0 self)
Term extraction is the task of automatically detecting, from textual corpora, lexical units that designate concepts in thematically restricted domains (e.g. medicine). Current systems for term extraction integrate linguistic and statistical cues to perform the detection of terms.

