• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms (2002)

by Michael Collins
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 260
Next 10 →

Markov Logic Networks

by Matthew Richardson, Pedro Domingos - Machine Learning , 2006
"... Abstract. We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects ..."
Abstract - Cited by 363 (27 self) - Add to MetaCart
Abstract. We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.

Support vector machine learning for interdependent and structured output spaces

by Ioannis Tsochantaridis, Thomas Hofmann, Thorsten Joachims, Yasemin Altun - In ICML , 2004
"... Learning general functional dependencies is one of the main goals in machine learning. Recent progress in kernel-based methods has focused on designing flexible and powerful input representations. This paper addresses the complementary issue of problems involving complex outputs suchas multiple depe ..."
Abstract - Cited by 211 (14 self) - Add to MetaCart
Learning general functional dependencies is one of the main goals in machine learning. Recent progress in kernel-based methods has focused on designing flexible and powerful input representations. This paper addresses the complementary issue of problems involving complex outputs suchas multiple dependent output variables and structured output spaces. We propose to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. The resulting optimization problem is solved efficiently by a cutting plane algorithm that exploits the sparseness and structural decomposition of the problem. We demonstrate the versatility and effectiveness of our method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment. 1.

Large margin methods for structured and interdependent output variables

by Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, Yasemin Altun - JOURNAL OF MACHINE LEARNING RESEARCH , 2005
"... Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary ..."
Abstract - Cited by 208 (10 self) - Add to MetaCart
Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e. exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.

New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron

by Michael Collins, Nigel Duffy, Florham Park , 2002
"... This paper introduces new learning algorithms for natural language processing based on the perceptron algorithm. We show how the algorithms can be efficiently applied to exponential sized representations of parse trees, such as the "all subtrees" (DOP) representation described by (Bod 98), or a r ..."
Abstract - Cited by 164 (5 self) - Add to MetaCart
This paper introduces new learning algorithms for natural language processing based on the perceptron algorithm. We show how the algorithms can be efficiently applied to exponential sized representations of parse trees, such as the "all subtrees" (DOP) representation described by (Bod 98), or a representation tracking all sub-fragments of a tagged sentence. We give experimental results showing significant improvements on two tasks: parsing Wall Street Journal text, and named-entity extraction from web data.

Hidden Markov Support Vector Machines

by Yasemin Altun, Ioannis Tsochantaridis, Thomas Hofmann , 2003
"... This paper presents a novel discriminative learning technique for label sequences based on a combination of the two most successful learning algorithms, Support Vector Machines and Hidden Markov Models which we call Hidden Markov Support Vector Machine. ..."
Abstract - Cited by 154 (7 self) - Add to MetaCart
This paper presents a novel discriminative learning technique for label sequences based on a combination of the two most successful learning algorithms, Support Vector Machines and Hidden Markov Models which we call Hidden Markov Support Vector Machine.

Efficiently Inducing Features of Conditional Random Fields

by Andrew McCallum , 2003
"... Conditional Random Fields (CRFs) are undirected graphical models, a special case of which correspond to conditionally-trained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the input. Faced with ..."
Abstract - Cited by 142 (9 self) - Add to MetaCart
Conditional Random Fields (CRFs) are undirected graphical models, a special case of which correspond to conditionally-trained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the input. Faced with

Learning structured prediction models: a large margin approach

by Ben Taskar, Vassil Chatalbashev, Daphne Koller, Carlos Guestrin , 2004
"... We consider large margin estimation in a broad range of prediction models where inference involves solving combinatorial optimization problems, for example, weighted graphcuts or matchings. Our goal is to learn parameters such that inference using the model reproduces correct answers on the training ..."
Abstract - Cited by 127 (7 self) - Add to MetaCart
We consider large margin estimation in a broad range of prediction models where inference involves solving combinatorial optimization problems, for example, weighted graphcuts or matchings. Our goal is to learn parameters such that inference using the model reproduces correct answers on the training data. Our method relies on the expressive power of convex optimization problems to compactly capture inference or solution optimality in structured prediction models. Directly embedding this structure within the learning formulation produces concise convex problems for efficient estimation of very complex and diverse models. We describe experimental results on a matching task, disulfide connectivity prediction, showing significant improvements over state-of-the-art methods. 1.

Semi-Markov conditional random fields for information extraction

by Sunita Sarawagi, William W. Cohen - In Advances in Neural Information Processing Systems 17 , 2004
"... We describe semi-Markov conditional random fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semi-CRF on an input sequence x outputs a “segmentation ” of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements x ..."
Abstract - Cited by 119 (7 self) - Add to MetaCart
We describe semi-Markov conditional random fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semi-CRF on an input sequence x outputs a “segmentation ” of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements xi of x. Importantly, features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian. In spite of this additional power, exact learning and inference algorithms for semi-CRFs are polynomial-time—often only a small constant factor slower than conventional CRFs. In experiments on five named entity recognition problems, semi-CRFs generally outperform conventional CRFs. 1

Discriminative fields for modeling spatial dependencies in natural images

by Sanjiv Kumar, Martial Hebert - In NIPS , 2003
"... In this paper we present Discriminative Random Fields (DRF), a discriminative framework for the classification of natural image regions by incorporating neighborhood spatial dependencies in the labels as well as the observed data. The proposed model exploits local discriminative models and allows to ..."
Abstract - Cited by 86 (2 self) - Add to MetaCart
In this paper we present Discriminative Random Fields (DRF), a discriminative framework for the classification of natural image regions by incorporating neighborhood spatial dependencies in the labels as well as the observed data. The proposed model exploits local discriminative models and allows to relax the assumption of conditional independence of the observed data given the labels, commonly used in the Markov Random Field (MRF) framework. The parameters of the DRF model are learned using penalized maximum pseudo-likelihood method. Furthermore, the form of the DRF model allows the MAP inference for binary classification problems using the graph min-cut algorithms. The performance of the model was verified on the synthetic as well as the real-world images. The DRF model outperforms the MRF model in the experiments. 1

Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron

by Michael Collins, Florham Park , 2002
"... We describe algorithms that rerank the top N hypotheses from a maximum-entropy tagger, the application being named-entity recognition in a corpus of web data. The first approach uses a boosting algorithm for ranking problems. The second approach uses the voted perceptron algorithm. Both algorithms g ..."
Abstract - Cited by 60 (2 self) - Add to MetaCart
We describe algorithms that rerank the top N hypotheses from a maximum-entropy tagger, the application being named-entity recognition in a corpus of web data. The first approach uses a boosting algorithm for ranking problems. The second approach uses the voted perceptron algorithm. Both algorithms give comparable, significant improvements over the maximum-entropy baseline. The voted perceptron algorithm can be considerably more efficient to train, at some cost in computation on test examples.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University