General convergence results for linear discriminant updates
Machine Learning, 1997
Cited by 73 (0 self)
Abstract. The problem of learning linear-discriminant concepts can be solved by various mistake-driven update procedures, including the Winnow family of algorithms and the well-known Perceptron algorithm. In this paper we define the general class of “quasi-additive” algorithms, which includes Perceptron and Winnow as special cases. We give a single proof of convergence that covers a broad subset of algorithms in this class, including both Perceptron and Winnow, but also many new algorithms. Our proof hinges on analyzing a generic measure of progress construction that gives insight as to when and how such algorithms converge. Our measure of progress construction also permits us to obtain good mistake bounds for individual algorithms. We apply our unified analysis to new algorithms as well as existing algorithms. When applied to known algorithms, our method “automatically” produces close variants of existing proofs (recovering similar bounds), thus showing that, in a certain sense, these seemingly diverse results are fundamentally isomorphic. However, we also demonstrate that the unifying principles are more broadly applicable, and analyze a new class of algorithms that smoothly interpolate between the additive-update behavior of Perceptron and the multiplicative-update behavior of Winnow.
On kernel methods for relational learning
In Proc. of the International Conference on Machine Learning, 2003
Cited by 53 (5 self)
Kernel methods have gained a great deal of popularity in the machine learning community as a method to learn indirectly in high-dimensional feature spaces. Those interested in relational learning have recently begun to cast learning from structured and relational data in terms of kernel operations. We describe a general family of kernel functions built up from a description language of limited expressivity and use it to study the benefits and drawbacks of kernel learning in relational domains. Learning with kernels in this family directly models learning over an expanded feature space constructed using the same description language. This allows us to examine issues of time complexity in terms of learning with these and other relational kernels, and how these relate to generalization ability. The tradeoffs between using kernels in a very high-dimensional implicit space versus a restricted feature space are highlighted through two experiments, in bioinformatics and in natural language processing.
Relational Learning via Propositional Algorithms: An Information Extraction Case Study
2001
Cited by 39 (11 self)
This paper develops a new paradigm for relational learning which allows for the representation and learning of relational information using propositional means. This paradigm suggests different tradeoffs than those in the traditional approach to this problem (the ILP approach) and as a result it enjoys several significant advantages over it. In particular, the new paradigm is more flexible and allows the use of any propositional algorithm, including probabilistic algorithms, within it. We evaluate the new approach on an important and relation-intensive task, Information Extraction, and show that it outperforms existing methods while being orders of magnitude more efficient.
Text Chunking based on a Generalization of Winnow
Journal of Machine Learning Research, 2001
Cited by 34 (0 self)
This paper describes a text chunking system based on a generalization of the Winnow algorithm.
A neuroidal architecture for cognitive computation
Journal of the ACM, 2000
Cited by 32 (4 self)
Abstract. An architecture is described for designing systems that acquire and manipulate large amounts of unsystematized, or so-called commonsense, knowledge. Its aim is to exploit to the full those aspects of computational learning that are known to offer powerful solutions in the acquisition and maintenance of robust knowledge bases. The architecture makes explicit the requirements on the basic computational tasks that are to be performed and is designed to make these computationally tractable even for very large databases. The main claims are that (i) the basic learning and deduction tasks are provably tractable and (ii) tractable learning offers viable approaches to a range of issues that have been previously identified as problematic for artificial intelligence systems that are programmed. Among the issues that learning offers to resolve are robustness to inconsistencies, robustness to incomplete information and resolving among alternatives. Attribute-efficient learning algorithms, which allow learning from few examples in large dimensional systems, are fundamental to the approach. Underpinning the overall architecture is a new principled approach to manipulating relations in learning systems. This approach, of independently quantified arguments, allows propositional learning algorithms to be applied systematically to learning relational concepts in polynomial time and in a modular fashion.
Relational Representations that Facilitate Learning
2000
Cited by 21 (9 self)
Given a collection of objects in the world, along with some relations that hold among them, a fundamental problem is how to learn definitions of some relations and concepts of interest in terms of the given relations. These definitions might be quite complex and, inevitably, might require the use of quantified expressions. Attempts to use first-order languages for these purposes are hampered by the fact that relational inference is intractable and, consequently, so is the problem of learning relational definitions. This work develops an expressive relational representation language that allows the use of propositional learning algorithms when learning relational definitions. The representation serves as an intermediate level between a raw description of observations in the world and a propositional learning system that attempts to learn definitions for concepts and relations. It allows for hierarchical composition of relational expressions that can be evaluated efficientl...
Many-Layered Learning
Neural Computation, 2002
Cited by 19 (1 self)
We explore incremental assimilation of new knowledge by sequential learning. Of particular interest is how a network of many knowledge layers can be constructed in an on-line manner, such that the learned units represent building blocks of knowledge that serve to compress the overall representation and facilitate transfer. We motivate the need for many layers of knowledge, and we advocate sequential learning as an avenue for promoting construction of layered knowledge structures. Finally, our novel STL algorithm demonstrates an efficient method for simultaneously acquiring and organizing a collection of concepts and functions from a stream of rich but otherwise unstructured information.
Learning with feature description logics
Proceedings of the 12th International Conference on Inductive Logic Programming, 2002
Cited by 18 (4 self)
Abstract. We present a paradigm for efficient learning and inference with relational data using propositional means. The paradigm utilizes description logics and concepts graphs in the service of learning relational models using efficient propositional learning algorithms. We introduce a Feature Description Logic (FDL), a relational (frame-based) language that supports efficient inference, along with a generation function that uses inference with descriptions in the FDL to produce features suitable for use by learning algorithms. These are used within a learning framework that is shown to learn relational representations efficiently and accurately in terms of the FDL descriptions. The paradigm was designed to support learning in domains that are relational but where the amount of data and size of representation learned are very large; we exemplify it here, for clarity, on the classical ILP task of learning family relations. This paradigm provides a natural solution to the problem of learning and representing relational data; it extends and unifies several lines of work in KRR and Machine Learning in ways that provide hope for a coherent usage of learning and reasoning methods in large-scale intelligent inference.
Text Chunking using Regularized Winnow
In: ACL, 2001
Cited by 8 (0 self)
Many machine learning methods have recently been applied to natural language processing tasks. Among them, the Winnow algorithm has been argued to be particularly suitable for NLP problems, due to its robustness to irrelevant features. However, in theory, Winnow may not converge for nonseparable data. To remedy this problem, a modification called regularized Winnow has been proposed. In this paper, we apply this new method to text chunking. We show that this method achieves state-of-the-art performance with significantly less computation than previous approaches.
Enhanced Answer Type Inference from Questions using Sequential Models
In Proceedings of HLT/EMNLP 2005, 2005
Cited by 6 (1 self)
Question classification is an important step in factual question answering (QA) and other dialog systems. Several attempts have been made to apply statistical machine learning approaches, including Support Vector Machines (SVMs) with sophisticated features and kernels. Curiously, the payoff beyond a simple bag-of-words representation has been small. We show that most questions reveal their class through a short contiguous token subsequence, which we call its informer span. Perfect knowledge of informer spans can enhance accuracy from 79.4% to 88% using linear SVMs on standard benchmarks. In contrast, standard heuristics based on shallow pattern-matching give only a 3% improvement, showing that the notion of an informer is non-trivial. Using a novel multi-resolution encoding of the question’s parse tree, we induce a Conditional Random Field (CRF) to identify informer spans with about 85% accuracy. Then we build a meta-classifier using a linear SVM on the CRF output, enhancing accuracy to 86.2%, which is better than all published numbers.

