A Sequential Model for Multi-Class Classification. EMNLP ’01
, 2001
Cited by 32 (11 self)
Many classification problems require decisions among a large number of competing classes. These tasks, however, are not handled well by general-purpose learning methods and are usually addressed in an ad hoc fashion. We suggest a general approach: a sequential learning model that uses classifiers to sequentially restrict the number of competing classes while maintaining, with high probability, the presence of the true outcome in the candidate set. Some theoretical and computational properties of the model are discussed, and we argue that these are important in NLP-like domains. The advantages of the model are illustrated in a part-of-speech tagging experiment.
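A minimal sketch of the idea of sequentially shrinking a candidate set, assuming each stage is a scoring function with a fixed threshold. The stage functions, the toy lexicon, and the "never empty the set" guard are hypothetical illustrations, not the paper's classifiers.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

# A stage scores how plausible class `label` is for `instance`; higher is better.
Scorer = Callable[[Dict[str, str], str], float]


def sequential_restrict(instance: Dict[str, str], classes: Set[str],
                        stages: Sequence[Tuple[Scorer, float]]) -> Set[str]:
    """Run the stages in order, each keeping only classes that meet its threshold."""
    candidates = set(classes)
    for scorer, threshold in stages:
        kept = {c for c in candidates if scorer(instance, c) >= threshold}
        # Guard (an assumption of this sketch): never let a stage empty the set.
        if kept:
            candidates = kept
    return candidates


# Hypothetical toy stages for part-of-speech tagging.
LEXICON = {"run": {"NN", "VB"}, "the": {"DT"}}


def lexicon_stage(instance: Dict[str, str], tag: str) -> float:
    # Unknown words allow every tag.
    allowed = LEXICON.get(instance["word"], {tag})
    return 1.0 if tag in allowed else 0.0


def context_stage(instance: Dict[str, str], tag: str) -> float:
    # Toy rule: directly after a determiner, only nouns survive.
    if instance["prev_tag"] == "DT":
        return 1.0 if tag == "NN" else 0.0
    return 1.0


STAGES = [(lexicon_stage, 0.5), (context_stage, 0.5)]
TAGS = {"NN", "VB", "JJ", "DT"}
print(sequential_restrict({"word": "run", "prev_tag": "DT"}, TAGS, STAGES))  # prints {'NN'}
```

Each stage only has to be cheap and high-recall; the expensive final decision is left to whatever classifier runs on the surviving candidates.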
Scaling Up Context-Sensitive Text Correction
, 2001
Cited by 21 (8 self)
The main challenge, beyond accuracy, in building a realistic system with context-sensitive inference capabilities is scalability. This paper studies this problem in the context of a learning-based approach to context-sensitive text correction, the task of fixing spelling errors that result in valid words, such as substituting "to" for "too", "casual" for "causal", and so on. Research papers on this problem have developed algorithms that can achieve fairly high accuracy, in many cases over 90%. However, this level of performance is not sufficient for a large-coverage practical system, since it implies low sentence-level performance. We examine and offer solutions to several issues relating to scaling up a context-sensitive text correction system. In particular, we suggest methods to reduce memory requirements while maintaining a high level of performance, and show that the system can still adapt to new domains. Most important, we show how to significantly increase the coverage of the system to realistic levels while providing a very high level of performance, at the 99% level.
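To make the task concrete, here is a small sketch of choosing between members of a confusion set such as {"to", "too"} from surrounding words. It swaps in a plain add-one-smoothed naive Bayes model over a fixed context window; the `train`/`correct` names, the window size, and the toy corpus are assumptions, and this is not the paper's learning architecture.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def train(sentences: List[List[str]], confusion_set: Set[str],
          window: int = 2) -> Tuple[Dict[str, Counter], Counter]:
    """Count context words within `window` tokens of each confusion-set member."""
    counts = {w: Counter() for w in confusion_set}
    totals: Counter = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in confusion_set:
                totals[tok] += 1
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        counts[tok][tokens[j]] += 1
    return counts, totals


def correct(tokens: List[str], i: int, counts: Dict[str, Counter],
            totals: Counter, window: int = 2) -> str:
    """Pick the confusion-set member with the highest naive Bayes score at position i."""
    vocab_size = len(set().union(*counts.values()))
    context = [tokens[j]
               for j in range(max(0, i - window), min(len(tokens), i + window + 1))
               if j != i]
    best, best_score = None, -math.inf
    for word in sorted(counts):
        n = sum(counts[word].values())
        score = math.log(totals[word] + 1)  # add-one smoothed prior
        for c in context:
            # Add-one smoothed likelihood, with one extra slot for unseen words.
            score += math.log((counts[word][c] + 1) / (n + vocab_size + 1))
        if score > best_score:
            best, best_score = word, score
    return best


corpus = [["i", "want", "to", "go"], ["it", "is", "too", "hot"],
          ["we", "need", "to", "eat"], ["far", "too", "cold", "today"]]
counts, totals = train(corpus, {"to", "too"})
print(correct(["you", "want", "too", "go"], 2, counts, totals))  # prints to
```

The 90% vs. 99% point in the abstract follows from compounding: a sentence with several confusable words is fully correct only if every decision is.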
Learning active classifiers
- Proceedings of the Thirteenth International Conference on Machine Learning (ICML96)
, 1996
Cited by 17 (5 self)
Most classification algorithms are "passive", in that they assign a class-label to each instance based only on the description given, even if that description is incomplete. By contrast, an active classifier can -- at some cost -- obtain the values of missing attributes, before deciding upon a class label. This can be useful when considering, for example, whether to extract some information from the web for a critical decision or whether to gather information for a medical test or experiment. The expected utility of using an active classifier depends on both the cost required to obtain the additional attribute values and the penalty incurred if the classifier outputs the wrong classification. This paper analyzes the problem of learning optimal active classifiers, using a variant of the probably-approximately-correct (PAC) model. After defining the framework, we show that this task can be achieved efficiently when the active classifier is allowed to perform only (at most) a constant number of tests. We then show that, in more general environments, the task is often intractable.
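The utility trade-off described above (test cost versus misclassification penalty) can be illustrated with a one-step decision rule. This sketch assumes the class posteriors before and after observing a single missing attribute are already known; it shows only the expected-cost comparison, not the PAC learning analysis, and all names and numbers are hypothetical.

```python
from typing import Dict


def expected_penalty(posterior: Dict[str, float], penalty: float) -> float:
    """Expected penalty when predicting the most probable class."""
    return penalty * (1.0 - max(posterior.values()))


def should_test(posterior: Dict[str, float],
                outcome_probs: Dict[str, float],
                posterior_given_outcome: Dict[str, Dict[str, float]],
                test_cost: float, penalty: float) -> bool:
    """True if paying `test_cost` to observe one missing attribute lowers expected cost."""
    cost_now = expected_penalty(posterior, penalty)
    # Average the post-test penalty over the possible test outcomes.
    cost_after = test_cost + sum(
        p * expected_penalty(posterior_given_outcome[o], penalty)
        for o, p in outcome_probs.items())
    return cost_after < cost_now


prior = {"sick": 0.5, "healthy": 0.5}
outcomes = {"pos": 0.5, "neg": 0.5}
after = {"pos": {"sick": 0.9, "healthy": 0.1},
         "neg": {"sick": 0.1, "healthy": 0.9}}
print(should_test(prior, outcomes, after, test_cost=10.0, penalty=100.0))  # True
print(should_test(prior, outcomes, after, test_cost=45.0, penalty=100.0))  # False
```

Here classifying immediately costs 50 in expectation, while testing costs the test price plus about 10, so the test pays off only when it is priced below roughly 40.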
Measures and Applications of Lexical Distributional Similarity
, 2003
Cited by 14 (0 self)
This thesis is concerned with the measurement and application of lexical distributional similarity. Two words are said to be distributionally similar if they appear in similar contexts. This loose definition, however, has led to many measures being proposed or adopted from fields such as geometry, statistics, Information Retrieval (IR) and Information Theory. Our aim is to investigate the properties that make a good measure of lexical distributional similarity. We start by introducing the concept of lexical distributional similarity. We discuss potential applications, which can be roughly divided into distributional or language modelling applications and semantic applications, and methods of evaluation (Chapter 2). We look at existing measures of distributional similarity and carry out an empirical comparison of fifteen of these measures, paying particular attention to the effects of word frequency (Chapter 3). We propose a new general framework for distributional similarity based on lexical substitutability, which we measure using the IR concepts of precision and recall. This framework allows us to investigate the key factors in similarity: asymmetry, the relative influence of different contexts, and the extent to which words share a context (Chapter 4). Finally, we consider the application of distributional similarity in language modelling (Chapter 5) and as a predictor of semantic similarity, using human judgements of similarity and a spelling correction task (Chapter 6).
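A sketch of how precision and recall can be read as a similarity between two words' context profiles, and why the result is asymmetric. It assumes raw co-occurrence counts as feature weights and a harmonic-mean combination; the thesis's own weighting functions and combination schemes are not reproduced here.

```python
from typing import Dict

Features = Dict[str, float]  # context -> weight (raw counts in this sketch)


def cr_precision(w1: Features, w2: Features) -> float:
    """Share of w1's feature weight that falls on contexts w2 also occurs in."""
    total = sum(w1.values())
    shared = sum(weight for f, weight in w1.items() if f in w2)
    return shared / total if total else 0.0


def cr_recall(w1: Features, w2: Features) -> float:
    """Share of w2's feature weight covered by w1's contexts (precision, reversed)."""
    return cr_precision(w2, w1)


def cr_f1(w1: Features, w2: Features) -> float:
    """Harmonic mean of precision and recall; symmetric even though its parts are not."""
    p, r = cr_precision(w1, w2), cr_recall(w1, w2)
    return 2 * p * r / (p + r) if p + r else 0.0


dog = {"bark": 3, "walk": 1}
animal = {"bark": 1, "walk": 1, "eat": 2}
print(cr_precision(dog, animal), cr_recall(dog, animal))  # 1.0 0.5
```

Every context of "dog" is shared with "animal" (precision 1.0), but "dog" covers only half of the weight of "animal" (recall 0.5): substituting a specific word by a general one behaves differently from the reverse.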
Discriminative learning of selectional preference from unlabeled text
- In Proc. of EMNLP
, 2008
Cited by 12 (2 self)
We present a discriminative method for learning selectional preferences from unlabeled text. Positive examples are taken from observed predicate-argument pairs, while negatives are constructed from unobserved combinations. We train a Support Vector Machine classifier to distinguish the positive from the negative instances. We show how to partition the examples for efficient training with 57 thousand features and 6.5 million training instances. The model outperforms other recent approaches, achieving excellent correlation with human plausibility judgments. Compared to Mutual Information, it identifies 66% more verb-object pairs in unseen text and correctly resolves 37% more pronouns in a pronoun resolution experiment.
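The training-set construction described above (observed pairs as positives, unobserved combinations as negatives) can be sketched as follows. The uniform sampling of one unobserved argument per positive is an assumption made for illustration; the abstract does not specify the sampling scheme, and SVM training is omitted.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (predicate, argument)


def build_examples(observed: Set[Pair], arguments: List[str],
                   neg_per_pos: int = 1, seed: int = 0) -> List[Tuple[Pair, int]]:
    """Label observed pairs +1 and sampled unobserved pairings of the same predicate -1."""
    rng = random.Random(seed)
    positives = sorted(observed)
    negatives: List[Pair] = []
    for predicate, _ in positives:
        # Candidate negatives: arguments never seen with this predicate.
        unseen = [a for a in arguments if (predicate, a) not in observed]
        k = min(neg_per_pos, len(unseen))
        negatives.extend((predicate, a) for a in rng.sample(unseen, k))
    return [(p, 1) for p in positives] + [(n, -1) for n in negatives]


observed = {("eat", "apple"), ("drink", "water")}
for pair, label in build_examples(observed, ["apple", "water", "rock"]):
    print(pair, label)
```

Because the negatives are only unobserved rather than known to be implausible, some of them will be noisy; a discriminative learner is expected to tolerate that noise.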
Recall systems: Efficient learning and use of category indices
- In International Conference on Artificial Intelligence and Statistics (AISTATS)
, 2007
Cited by 6 (5 self)
We introduce the framework of recall systems for efficient learning and retrieval of categories when the number of categories is large. A recall system here is a simple feature-based intermediate filtering step that reduces the potential categories for an instance to a small, manageable set. The correct categories from this set can then be determined using traditional classifiers. We present a formalization of the index learning problem and establish its NP-hardness and hardness of approximation. We then give an efficient heuristic for learning indices and evaluate it on several large data sets. In our experiments, the index is learned within minutes and reduces the number of categories by several orders of magnitude without affecting overall classification quality.
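A sketch of what a recall step does at lookup time: an inverted index from features to categories, with the candidate set taken as the union over an instance's features. The index here is built by a naive co-occurrence rule (`min_count` is a hypothetical parameter); it is not the paper's index-learning heuristic.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_index(training: Iterable[Tuple[Set[str], str]],
                min_count: int = 1) -> Dict[str, Set[str]]:
    """Map each feature to the categories it co-occurred with at least `min_count` times."""
    cooccur: Dict[str, Counter] = defaultdict(Counter)
    for features, category in training:
        for f in features:
            cooccur[f][category] += 1
    return {f: {c for c, n in cats.items() if n >= min_count}
            for f, cats in cooccur.items()}


def recall(index: Dict[str, Set[str]], features: Iterable[str]) -> Set[str]:
    """Candidate categories: the union of categories indexed under the instance's features."""
    candidates: Set[str] = set()
    for f in features:
        candidates |= index.get(f, set())
    return candidates


training = [({"wheel", "engine"}, "car"),
            ({"wing", "engine"}, "plane"),
            ({"fur", "tail"}, "cat")]
index = build_index(training)
print(sorted(recall(index, {"engine"})))  # ['car', 'plane']
```

Only the recalled candidates would then be passed to the per-category classifiers, which is where the savings come from when there are many categories.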
Pattern-Based Disambiguation for Natural Language Processing
, 2000
Cited by 5 (0 self)
A wide range of natural language problems can be viewed as disambiguating between a small set of alternatives based upon the string context surrounding the ambiguity site. In this paper we demonstrate that classification accuracy can be improved by invoking a more descriptive feature set than what is typically used. We present a technique that disambiguates by learning regular expressions describing the string contexts in which the ambiguity sites appear.
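A sketch of disambiguation driven by regular expressions over the string context, with `___` marking the ambiguity site. The patterns are hand-written, hypothetical examples for "there"/"their"; the paper learns its expressions, and its scoring scheme may differ from the simple match count used here.

```python
import re
from typing import Dict, List

# Hypothetical hand-written patterns; "___" marks the ambiguity site in the context string.
PATTERNS: Dict[str, List[re.Pattern]] = {
    "there": [re.compile(r"___ (is|are|was|were)\b"), re.compile(r"\bover ___")],
    "their": [re.compile(r"___ (own|house|car)\b")],
}


def disambiguate(context: str, patterns: Dict[str, List[re.Pattern]], default: str) -> str:
    """Choose the alternative with the most matching context patterns."""
    scores = {alt: sum(1 for p in pats if p.search(context))
              for alt, pats in patterns.items()}
    best = max(scores, key=scores.get)
    # Fall back to a default when no pattern fires for any alternative.
    return best if scores[best] > 0 else default


print(disambiguate("they sold ___ house", PATTERNS, default="there"))  # their
```

Regular expressions can express conditions such as "followed by a form of *be*" that fixed word-position features capture only indirectly.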
Prediction games in infinitely rich worlds
- In Utility Based Data Mining Workshop (UBDM at KDD)
, 2006
Cited by 4 (3 self)
categories, every experience would be new, and one couldn’t make sense of one’s world. Furthermore, higher intelligence requires large numbers of categories, perhaps millions and beyond. Acquiring and robustly detecting categories appears to be a complex task, as categories inter-relate in complex ways and occur in diverse conditions. We may then ask: how can a system learn so many complex inter-related categories? We propose and explore an avenue that we call prediction games in infinitely rich worlds. In these games, the world is a source of an unlimited stream of information. The games are played by a prediction system that in effect repeatedly experiments with its world and learns from its experiments. The system converts its input stream from the world into a sequence of learning episodes for itself. Each learning episode consists of the system hiding parts of the input, guessing (predicting) them using the remainder of the input (the local context), and updating itself by comparing its observations with its predictions. The goal of the system is to improve its
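The episode loop described above (hide part of the input, predict it from the local context, update from the comparison) can be sketched with a deliberately simple learner. The bigram counter used as the predictor and the names `BigramPredictor` and `play` are assumptions; the abstract does not describe the system's actual predictor.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class BigramPredictor:
    """Predicts a hidden token from the token just before it (its local context)."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def predict(self, left: str) -> Optional[str]:
        seen = self.counts.get(left)
        return seen.most_common(1)[0][0] if seen else None

    def update(self, left: str, hidden: str) -> None:
        self.counts[left][hidden] += 1


def play(stream: List[str], predictor: BigramPredictor) -> Tuple[int, int]:
    """Turn the stream into episodes: hide each token, guess it, then learn from the outcome."""
    correct = total = 0
    for left, hidden in zip(stream, stream[1:]):
        guess = predictor.predict(left)   # guess the hidden token from context
        correct += int(guess == hidden)   # compare prediction with observation
        total += 1
        predictor.update(left, hidden)    # update from this episode
    return correct, total


print(play("a b a b a b".split(), BigramPredictor()))  # (3, 5)
```

No external labels are needed: the stream itself supplies the answer to every episode, which is what lets such a system keep learning from an unlimited input.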
Context-Based Word Prediction and Classification
Cited by 4 (4 self)
This paper presents a new approach to the word prediction problem. Word prediction is a natural language processing problem that tries to predict the correct word in a given context. Word completion utilities, writing aids, and language translation are among the most common applications of word prediction. In this paper, we describe a new method to predict the correct word given its context. A data mining tool is used as a classifier to predict the correct word in the given context. The method has been implemented, and the test results are promising. Compared with similar methods, the approach requires a very small training text to reach an accuracy approaching 93% correct predictions.
A first experimental demonstration of massive knowledge infusion
- Proc. 11th International Conference on Principles of Knowledge Representation and Reasoning
Cited by 4 (1 self)
A central goal of Artificial Intelligence is to create systems that embody commonsense knowledge in a form reliable enough to be used for reasoning in novel situations. Knowledge Infusion is an approach to this problem in which the commonsense knowledge is acquired by learning. In this paper we report on experiments on a corpus of half a million sentences of natural language text that test whether commonsense knowledge can be usefully acquired through this approach. We examine the task of predicting a deleted word from the remainder of a sentence for some 268 target words. As a baseline we consider how well this task can be performed using learned rules based on the words within a fixed distance of the target word and their parts of speech. This captures an approach that has previously been shown to be highly successful for a variety of natural language tasks. We then go on to learn from the corpus rules that embody commonsense knowledge in addition to the knowledge used in the baseline case. We show that chaining learned commonsense rules together leads to measurable improvements in prediction performance on our task compared with the baseline. This is apparently the first experimental demonstration that commonsense knowledge can be learned from natural inputs on a massive scale reliably enough that chaining the learned rules is efficacious for reasoning.
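The baseline's input representation (words and parts of speech within a fixed distance of the deleted target) can be sketched as positional feature extraction. The feature-string format and the `window_features` name are hypothetical; the rule learner that consumes these features is not shown.

```python
from typing import List, Set


def window_features(words: List[str], tags: List[str], target: int, k: int = 2) -> Set[str]:
    """Words and POS tags within k positions of the deleted target, keyed by offset."""
    feats: Set[str] = set()
    for offset in range(-k, k + 1):
        j = target + offset
        if offset == 0 or not 0 <= j < len(words):
            continue  # skip the deleted word itself and positions off the sentence edge
        feats.add(f"w[{offset:+d}]={words[j]}")
        feats.add(f"t[{offset:+d}]={tags[j]}")
    return feats


words = ["the", "cat", "sat", "on", "the", "mat"]
tags = ["DT", "NN", "VBD", "IN", "DT", "NN"]
print(sorted(window_features(words, tags, target=2, k=1)))
```

Features like these see only a local window, which is the gap the learned commonsense rules are meant to fill.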

