Results 11 - 20 of 660
Accelerated training of conditional random fields with stochastic gradient methods
In ICML, 2006
Cited by 141 (6 self)
We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limited-memory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques.
Solving Large Scale Linear Prediction Problems Using Stochastic Gradient Descent Algorithms
ICML 2004: Proceedings of the Twenty-First International Conference on Machine Learning. Omnipress, 2004
Cited by 122 (10 self)
Linear prediction methods, such as least squares for regression, logistic regression and support vector machines for classification, have been extensively used in statistics and machine learning. In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. This class of methods, related to online algorithms such as perceptron, are both efficient and very simple to implement.
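As a rough illustration of the kind of method this abstract describes, the following is a minimal sketch of SGD on an L2-regularized logistic loss for a linear classifier. It is not the paper's algorithm or code: the function names `sgd_logistic` and `predict`, the fixed step size `eta`, the regularization weight `lam`, and the toy data are all assumptions chosen for the example.

```python
import math
import random


def sgd_logistic(data, dim, lam=0.01, eta=0.1, epochs=50, seed=0):
    """Plain SGD on L2-regularized logistic loss (labels in {-1, +1}).

    Hypothetical helper for illustration; hyperparameters are arbitrary.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            margin = max(-30.0, min(30.0, margin))  # guard against overflow in exp
            # d/dw log(1 + exp(-margin)) = -y * x / (1 + exp(margin))
            g = -y / (1.0 + math.exp(margin))
            # one stochastic step on loss + (lam / 2) * ||w||^2
            w = [wi - eta * (g * xi + lam * wi) for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the linear score, as a label in {-1, +1}."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Toy usage: the last feature is a constant bias term.
toy = [([1.0, 2.0, 1.0], 1), ([2.0, 1.5, 1.0], 1),
       ([-1.0, -2.0, 1.0], -1), ([-2.0, -1.0, 1.0], -1)]
weights = sgd_logistic(toy, dim=3)
```

Each update touches a single example, which is why the per-step cost is independent of the data set size; a decaying step size is a common refinement that this sketch omits.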
Extracting places and activities from GPS traces using hierarchical conditional random fields
International Journal of Robotics Research, 2007
Cited by 119 (3 self)
Learning patterns of human behavior from sensor data is extremely important for high-level activity inference. We show how to extract a person’s activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent model of a person’s activities and places. In contrast to existing techniques, our approach takes high-level context into account in order to detect the significant places of a person. Our experiments show significant improvements over existing techniques. Furthermore, they indicate that our system is able to robustly estimate a person’s activities using a model that is trained from data collected by other persons.
Hidden conditional random fields for phone classification
In Interspeech, 2005
Cited by 114 (7 self)
In this paper, we show the novel application of hidden conditional random fields (HCRFs) – conditional random fields with hidden state sequences – for modeling speech. Hidden state sequences are critical for modeling the non-stationarity of speech signals. We show that HCRFs can easily be trained using the simple direct optimization technique of stochastic gradient descent. We present the results on the TIMIT phone classification task and show that HCRFs outperform comparable ML and CML/MMI trained HMMs. In fact, HCRF results on this task are the best single classifier results known to us. We note that the HCRF framework is easily extensible to recognition since it is a state and label sequence modeling technique. We also note that HCRFs have the ability to handle complex features without any change in training procedure.
Discriminative training of Markov logic networks
In Proc. of the Natl. Conf. on Artificial Intelligence, 2005
Cited by 107 (19 self)
Many machine learning applications require a combination of probability and first-order logic. Markov logic networks (MLNs) accomplish this by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. Model parameters (i.e., clause weights) can be learned by maximizing the likelihood of a relational database, but this can be quite costly and lead to suboptimal results for any given prediction task. In this paper we propose a discriminative approach to training MLNs, one which optimizes the conditional likelihood of the query predicates given the evidence ones, rather than the joint likelihood of all predicates. We extend Collins’s (2002) voted perceptron algorithm for HMMs to MLNs by replacing the Viterbi algorithm with a weighted satisfiability solver. Experiments on entity resolution and link prediction tasks show the advantages of this approach compared to generative MLN training, as well as compared to purely probabilistic and purely logical approaches.
Entity Resolution with Markov Logic
In ICDM, 2006
Cited by 105 (10 self)
Entity resolution is the problem of determining which records in a database refer to the same entities, and is a crucial and expensive step in the data mining process. Interest in it has grown rapidly in recent years, and many approaches have been proposed. However, they tend to address only isolated aspects of the problem, and are often ad hoc. This paper proposes a well-founded, integrated solution to the entity resolution problem based on Markov logic. Markov logic combines first-order logic and probabilistic graphical models by attaching weights to first-order formulas, and viewing them as templates for features of Markov networks. We show how a number of previous approaches can be formulated and seamlessly combined in Markov logic, and how the resulting learning and inference problems can be solved efficiently. Experiments on two citation databases show the utility of this approach, and evaluate the contribution of the different components.
Learning CRFs using Graph Cuts
Cited by 104 (8 self)
Many computer vision problems are naturally formulated as random fields, specifically MRFs or CRFs. The introduction of graph cuts has enabled efficient and optimal inference in associative random fields, greatly advancing applications such as segmentation, stereo reconstruction and many others. However, while fast inference is now widespread, parameter learning in random fields has remained an intractable problem. This paper shows how to apply fast inference algorithms, in particular graph cuts, to learn parameters of random fields with similar efficiency. We find optimal parameter values under standard regularized objective functions that ensure good generalization. Our algorithm enables learning of many parameters in reasonable time, and we explore further speedup techniques. We also discuss extensions to non-associative and multi-class problems. We evaluate the method on image segmentation and geometry recognition.
Exploiting dictionaries in named entity extraction: Combining semi-Markov extraction processes and data integration methods
In Proceedings of the ACM SIGKDD Conference, 2004
Cited by 98 (6 self)
We consider the problem of improving named entity recognition (NER) systems by using external dictionaries—more specifically, the problem of extending state-of-the-art NER systems by incorporating information about the similarity of extracted entities to entities in an external dictionary. This is difficult because most high-performance named entity recognition systems operate by sequentially classifying words as to whether or not they participate in an entity name; however, the most useful similarity measures score entire candidate names. To correct this mismatch we formalize a semi-Markov extraction process which relaxes the usual Markov assumptions. This process is based on sequentially classifying segments of several adjacent words, rather than single words. In addition to allowing a natural way of coupling NER and high-performance record linkage methods, this formalism also allows the direct use of other useful entity-level features, and provides a more natural formulation of the NER problem than sequential word classification. Experiments in multiple domains show that the new model can substantially improve extraction performance, relative to previously published methods for using external dictionaries in NER.
Kernel Conditional Random Fields: Representation and Clique Selection
In ICML, 2004
Cited by 96 (5 self)
Kernel conditional random fields (KCRFs) are introduced as a framework for discriminative modeling of graph-structured data. A representer theorem for conditional graphical models is given which shows how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. A procedure for greedily selecting cliques in the dual representation is then proposed, which allows sparse representations. By incorporating kernels and implicit feature spaces into conditional graphical models, the framework enables semi-supervised learning algorithms for structured data through the use of graph kernels.
Learning conditional random fields for stereo
In CVPR, 2007
Cited by 92 (3 self)
State-of-the-art stereo vision algorithms utilize color changes as important cues for object boundaries. Most methods impose heuristic restrictions or priors on disparities, for example by modulating local smoothness costs with intensity gradients. In this paper we seek to replace such heuristics with explicit probabilistic models of disparities and intensities learned from real images. We have constructed a large number of stereo datasets with ground-truth disparities, and we use a subset of these datasets to learn the parameters of Conditional Random Fields (CRFs). We present experimental results illustrating the potential of our approach for automatically learning the parameters of models with richer structure than standard hand-tuned MRF models.