Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data
 IN ICML
, 2004
Cited by 163 (11 self)
In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges, a distributed state representation as in dynamic Bayesian networks (DBNs), and parameters are tied across slices. Since exact
Contrastive estimation: Training log-linear models on unlabeled data
 In Proc. of ACL
, 2005
Cited by 153 (16 self)
Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and named-entity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabeled data, we require unsupervised estimation methods for log-linear models; few exist. We describe a novel approach, contrastive estimation. We show that the new technique can be intuitively understood as exploiting implicit negative evidence and is computationally efficient. Applied to a sequence labeling problem (POS tagging given a tagging dictionary and unlabeled text), contrastive estimation outperforms EM (with the same feature set), is more robust to degradations of the dictionary, and can largely recover by modeling additional features.
Table Extraction Using Conditional Random Fields
, 2003
Cited by 140 (8 self)
The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often contain tables in order to communicate densely packed, multidimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form.
Location-Based Activity Recognition using Relational Markov Networks
Cited by 139 (13 self)
In this paper we define a general framework for activity recognition by building upon and extending Relational Markov Networks. Using the example of activity recognition from location data, we show that our model can represent a variety of features including temporal information such as time of day, spatial information extracted from geographic databases, and global constraints such as the number of homes or workplaces of a person. We develop an efficient inference and learning technique based on MCMC. Using GPS location data collected by multiple people we show that the technique can accurately label a person’s activity locations. Furthermore, we show that it is possible to learn good models from less data by using priors extracted from other people’s data.
Accelerated training of conditional random fields with stochastic gradient methods
 In ICML
, 2006
Cited by 131 (6 self)
We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limited-memory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques.
cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models
 In Proceedings of ACL System Demonstrations
, 2010
Cited by 122 (48 self)
We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.
Extracting places and activities from GPS traces using hierarchical conditional random fields
 International Journal of Robotics Research
, 2007
Cited by 114 (3 self)
Learning patterns of human behavior from sensor data is extremely important for high-level activity inference. We show how to extract a person’s activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent model of a person’s activities and places. In contrast to existing techniques, our approach takes high-level context into account in order to detect the significant places of a person. Our experiments show significant improvements over existing techniques. Furthermore, they indicate that our system is able to robustly estimate a person’s activities using a model that is trained from data collected by other persons.
Learning the structure of Markov logic networks
 In Proceedings of the 22nd International Conference on Machine Learning
, 2005
Cited by 109 (19 self)
Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. The algorithm performs a beam or shortest-first search of the space of clauses, guided by a weighted pseudo-likelihood measure. This requires computing the optimal weights for each candidate structure, but we show how this can be done efficiently. The algorithm can be used to learn an MLN from scratch, or to refine an existing knowledge base. We have applied it in two real-world domains, and found that it outperforms using off-the-shelf ILP systems to learn the MLN structure, as well as pure ILP, purely probabilistic and purely knowledge-based approaches.
Hidden conditional random fields for phone classification
 in Interspeech
, 2005
Cited by 104 (7 self)
In this paper, we show the novel application of hidden conditional random fields (HCRFs) – conditional random fields with hidden state sequences – for modeling speech. Hidden state sequences are critical for modeling the non-stationarity of speech signals. We show that HCRFs can easily be trained using the simple direct optimization technique of stochastic gradient descent. We present the results on the TIMIT phone classification task and show that HCRFs outperform comparable ML and CML/MMI trained HMMs. In fact, HCRF results on this task are the best single classifier results known to us. We note that the HCRF framework is easily extensible to recognition since it is a state and label sequence modeling technique. We also note that HCRFs have the ability to handle complex features without any change in training procedure.