Applying Winnow to Context-Sensitive Spelling Correction (1996)

by Andrew Golding, Dan Roth

Results 1 - 10 of 55

Learning to Resolve Natural Language Ambiguities: A Unified Approach

by Dan Roth, 1998
Cited by 154 (75 self)
We analyze a few of the commonly used statistics based and machine learning algorithms for natural language disambiguation tasks and observe that they can be recast as learning linear separators in the feature space. Each of the methods makes a priori assumptions, which it employs, given the data, when searching for its hypothesis. Nevertheless, as we show, it searches a space that is as rich as the space of all linear separators. We use this to build an argument for a data driven approach which merely searches for a good linear separator in the feature space, without further assumptions on the domain or a specific problem. We present such an approach - a sparse network of linear separators, utilizing the Winnow learning algorithm - and show how to use it in a variety of ambiguity resolution problems. The learning approach presented is attribute-efficient and, therefore, appropriate for domains having very large number of attributes. In particular, we present an extensive experimental ...
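
The abstract names Winnow but does not spell out its update rule. As a rough illustration only, here is a minimal sketch of the textbook multiplicative-update Winnow for sparse Boolean features. It is not the sparse network architecture the paper presents, and the names `winnow_train`, `winnow_predict`, `alpha`, and `epochs` are assumptions introduced for this example.

```python
def winnow_train(examples, n_features, alpha=2.0, epochs=10):
    """Train a single Winnow linear separator.

    examples: list of (active_feature_indices, label), label in {0, 1}.
    Only the active (non-zero) features of an example are touched,
    which is what makes the update cheap on sparse data.
    """
    weights = [1.0] * n_features      # all weights start at 1
    theta = float(n_features)         # a common threshold choice (assumption)
    for _ in range(epochs):
        mistakes = 0
        for active, label in examples:
            score = sum(weights[i] for i in active)
            predicted = 1 if score >= theta else 0
            if predicted == label:
                continue              # mistake-driven: no update when correct
            mistakes += 1
            # Promote on a missed positive, demote on a false positive.
            factor = alpha if label == 1 else 1.0 / alpha
            for i in active:
                weights[i] *= factor
        if mistakes == 0:
            break                     # consistent with all training examples
    return weights, theta


def winnow_predict(weights, theta, active):
    """Predict 1 if the summed weight of the active features reaches theta."""
    return 1 if sum(weights[i] for i in active) >= theta else 0
```

Because updates are multiplicative and touch only active features, the number of mistakes grows slowly with the number of irrelevant attributes, which is the sense in which such a learner is called attribute-efficient.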

Machine-Learning Research -- Four Current Directions

by Thomas G. Dietterich
Cited by 102 (1 self)
Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, (c) reinforcement learning, and (d) learning complex stochastic models.

Mistake-Driven Learning in Text Categorization

by Ido Dagan, Yael Karov, Dan Roth - In EMNLP-97, The Second Conference on Empirical Methods in Natural Language Processing, 1997
Cited by 87 (7 self)
Learning problems in the text processing domain often map the text to a space whose dimensions are the measured features of the text, e.g., its words. Three characteristic properties of this domain are (a) very high dimensionality, (b) both the learned concepts and the instances reside very sparsely in the feature space, and (c) a high variation in the number of active features in an instance. In this work we study three mistake-driven learning algorithms for a typical task of this nature - text categorization. We argue

Active learning with committees for text categorization

by Ray Liere, Prasad Tadepalli - In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1997
Cited by 70 (0 self)
In many real-world domains, supervised learning requires a large number of training examples. In this paper, we describe an active learning method that uses a committee of learners to reduce the number of training examples required for learning. Our approach is similar to the Query by Committee framework, where disagreement among the committee members on the predicted label for the input part of the example is used to signal the need for knowing the actual value of the label. Our experiments are conducted in the text categorization domain, which is characterized by a large number of features, many of which are irrelevant. We report here on experiments using a committee of Winnow-based learners and demonstrate that this approach can reduce the number of labeled training examples required over that used by a single Winnow learner by 1-2 orders of magnitude.
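
The disagreement signal described in this abstract can be sketched in a few lines. This is only a generic illustration of committee-based query selection, assuming each committee member is a callable that returns a label. The helper names `committee_disagrees` and `select_queries` are hypothetical, and the paper's actual committee construction and update procedure are not given in the abstract.

```python
from typing import Callable, List, Sequence

# A committee member maps a feature vector to a predicted label (assumption).
Classifier = Callable[[Sequence[float]], int]


def committee_disagrees(committee: List[Classifier], x: Sequence[float]) -> bool:
    """Return True when at least two members predict different labels for x."""
    votes = {member(x) for member in committee}
    return len(votes) > 1


def select_queries(committee: List[Classifier],
                   pool: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the unlabeled examples whose label is worth requesting."""
    return [x for x in pool if committee_disagrees(committee, x)]
```

Examples on which every member agrees are skipped, so labeling effort is spent only where the current hypotheses conflict.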

Automatic Rule Acquisition for Spelling Correction

by Lidia Mangu, Eric Brill - In Proceedings of the 14th International Conference on Machine Learning, 1997
Cited by 59 (4 self)
This paper describes a new approach to automatically learning linguistic knowledge for spelling correction. A major feature of this approach is the fact that the acquired knowledge is captured in a small set of easily understood rules, as opposed to a large set of opaque features and weights. A perspicuous representation is advantageous in order to best exploit human intuition to understand and improve upon the acquired knowledge of the system.

Learning Action Strategies for Planning Domains

by Roni Khardon - Artificial Intelligence, 1997
Cited by 58 (2 self)
This paper reports on experiments where techniques of supervised machine learning are applied to the problem of planning. The input to the learning algorithm is composed of a description of a planning domain, planning problems in this domain, and solutions for them. The output is an efficient algorithm --- a strategy --- for solving problems in that domain. We test the strategy on an independent set of planning problems from the same domain, so that success is measured by its ability to solve complete problems. A system, L2Act, has been developed in order to perform these experiments. We have experimented with the blocks world domain, and the logistics domain, using strategies in the form of a generalization of decision lists, where the rules on the list are existentially quantified first order expressions. The learning algorithm is a variant of Rivest's [39] algorithm, improved with several techniques that reduce its time complexity. As the experiments demonstrate, generalization is a...

A neuroidal architecture for cognitive computation

by Leslie G. Valiant - Journal of the ACM, 2000
Cited by 32 (4 self)
An architecture is described for designing systems that acquire and manipulate large amounts of unsystematized, or so-called commonsense, knowledge. Its aim is to exploit to the full those aspects of computational learning that are known to offer powerful solutions in the acquisition and maintenance of robust knowledge bases. The architecture makes explicit the requirements on the basic computational tasks that are to be performed and is designed to make these computationally tractable even for very large databases. The main claims are that (i) the basic learning and deduction tasks are provably tractable and (ii) tractable learning offers viable approaches to a range of issues that have been previously identified as problematic for artificial intelligence systems that are programmed. Among the issues that learning offers to resolve are robustness to inconsistencies, robustness to incomplete information and resolving among alternatives. Attribute-efficient learning algorithms, which allow learning from few examples in large dimensional systems, are fundamental to the approach. Underpinning the overall architecture is a new principled approach to manipulating relations in learning systems. This approach, of independently quantified arguments, allows propositional learning algorithms to be applied systematically to learning relational concepts in polynomial time and in a modular fashion.

Robust Logics

by Leslie G. Valiant
Cited by 27 (6 self)
Suppose that we wish to learn from examples and counter-examples a criterion for recognizing whether an assembly of wooden blocks constitutes an arch. Suppose also that we have preprogrammed recognizers for various relationships e.g. on-top-of(x; y), above(x; y), etc. and believe that some possibly complex expression in terms of these base relationships should suffice to approximate the desired notion of an arch. How can we formulate such a relational learning problem so as to exploit the benefits that are demonstrably available in propositional learning, such as attribute-efficient learning by linear separators, and error-resilient learning? We believe that learning in a general setting that allows for multiple objects and relations in this way is a fundamental key to resolving the following dilemma that arises in the design of intelligent systems: Mathematical logic is an attractive language of description because it has clear semantics and sound proof procedures. However, as a basis for large programmed systems it leads to brittleness because, in practice, consistent usage of the various predicate names throughout a system cannot be guaranteed, except in application areas such as mathematics where the viability of the axiomatic method has been demonstrated independently. In this paper we develop the following approach to circumventing this dilemma. We suggest that brittleness can be overcome by using a new kind of logic in which each statement is learnable. By allowing the system to learn rules empirically from the environment, relative to any particular programs it may have for recognizing some base predicates, we enable the system to acquire a set of statements approximately consistent with each other and with the world, without the need for a globally knowledgeable and consistent programmer. We illustrate

Mining features for sequence classification

by Neal Lesh, Mohammed J. Zaki, Mitsunori Ogihara, 1998
Cited by 24 (1 self)
Classification algorithms are difficult to apply to sequential examples, such as plan executions or text, because there is a vast number of potentially useful features for describing each example. Past work on feature selection has focused on searching the space of all subsets of the available features, which is intractable for large feature sets. We adapt data mining techniques to act as a preprocessor to select features for standard classification algorithms such as Naive Bayes and Winnow. We apply our algorithm to the task of predicting whether or not a plan will succeed or fail, during plan execution. The features produced by our algorithm improve classification accuracy by 10-50% in our experiments.

Scaling Up Context-Sensitive Text Correction

by Andrew J. Carlson, Jeffrey Rosen, Dan Roth, 2001
Cited by 21 (8 self)
The main challenge in an effort to build a realistic system with context-sensitive inference capabilities, beyond accuracy, is scalability. This paper studies this problem in the context of a learning-based approach to context sensitive text correction -- the task of fixing spelling errors that result in valid words, such as substituting to for too, casual for causal, and so on. Research papers on this problem have developed algorithms that can achieve fairly high accuracy, in many cases over 90%. However, this level of performance is not sufficient for a large coverage practical system since it implies a low sentence level performance. We examine and offer solutions to several issues relating to scaling up a context sensitive text correction system. In particular, we suggest methods to reduce the memory requirements while maintaining a high level of performance and show that this can still allow the system to adapt to new domains. Most important, we show how to significantly increase the coverage of the system to realistic levels, while providing a very high level of performance, at the 99% level.
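
The remark that 90% accuracy implies low sentence-level performance can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that a sentence contains several independent correction decisions, each right with the same probability. Neither the independence assumption nor the figure of five decisions per sentence comes from the abstract, and `sentence_accuracy` is a hypothetical helper name.

```python
def sentence_accuracy(per_decision_accuracy: float, decisions_per_sentence: int) -> float:
    """Probability that every decision in a sentence is correct,
    assuming the decisions are independent (an illustrative assumption)."""
    return per_decision_accuracy ** decisions_per_sentence


# Hypothetical sentence with five confusable words.
for p in (0.90, 0.99):
    print(f"per-decision {p:.2f} -> per-sentence {sentence_accuracy(p, 5):.3f}")
```

Under these assumptions, 90% per-decision accuracy leaves only about 59% of sentences fully correct, while 99% leaves about 95%, which is consistent with the abstract's argument for pushing performance to the 99% level.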