Results 1  10
of
193
Unsupervised NamedEntity Extraction from the Web: An Experimental Study
 ARTIFICIAL INTELLIGENCE
, 2005
"... The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domainindependent, and scalable manner. The paper presents an overview of KNOWITALL’s novel architecture and design princip ..."
Abstract

Cited by 300 (38 self)
 Add to MetaCart
The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domainindependent, and scalable manner. The paper presents an overview of KNOWITALL’s novel architecture and design principles, emphasizing its distinctive ability to extract information without any handlabeled training examples. In its first major run, KNOWITALL extracted over 50,000 facts, but suggested a challenge: How can we improve KNOWITALL’s recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domainspecific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies subclasses in order to boost recall. List Extraction locates lists of class instances, learns a “wrapper ” for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL’s domainindependent methods, the methods also obviate handlabeled training examples. The paper reports on experiments, focused on namedentity extraction, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4fold to 8fold increase in recall, while maintaining high precision, and discovered over 10,000 cities missing from the Tipster Gazetteer.
Learning to combine bottomup and topdown segmentation
 in: European Conference on Computer Vision
"... Abstract. Bottomup segmentation based only on lowlevel cues is a notoriously difficult problem. This difficulty has lead to recent topdown segmentation algorithms that are based on classspecific image information. Despite the success of topdown algorithms, they often give coarse segmentations t ..."
Abstract

Cited by 106 (0 self)
 Add to MetaCart
Abstract. Bottomup segmentation based only on lowlevel cues is a notoriously difficult problem. This difficulty has lead to recent topdown segmentation algorithms that are based on classspecific image information. Despite the success of topdown algorithms, they often give coarse segmentations that can be significantly refined using lowlevel cues. This raises the question of how to combine both topdown and bottomup cues in a principled manner. In this paper we approach this problem using supervised learning. Given a training set of ground truth segmentations we train a fragmentbased segmentation algorithm which takes into account both bottomup and topdown cues simultaneously, in contrast to most existing algorithms which train topdown and bottomup modules separately. We formulate the problem in the framework of Conditional Random Fields (CRF) and derive a feature induction algorithm for CRF, which allows us to efficiently search over thousands of candidate fragments. Whereas pure topdown algorithms often require hundreds of fragments, our simultaneous learning procedure yields algorithms with a handful of fragments that are combined with lowlevel cues to efficiently compute high quality segmentations. 1
Efficient structure learning of Markov networks using L1regularization
 In NIPS
, 2006
"... Markov networks are widely used in a wide variety of applications, in problems ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to ..."
Abstract

Cited by 106 (2 self)
 Add to MetaCart
Markov networks are widely used in a wide variety of applications, in problems ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to the lack of effective algorithms for learning Markov network structure from data. In this paper, we provide a computationally effective method for learning Markov network structure from data. Our method is based on the use of L1 regularization on the weights of the loglinear model, which has the effect of biasing the model towards solutions where many of the parameters are zero. This formulation converts the Markov network learning problem into a convex optimization problem in a continuous space, which can be solved using efficient gradient methods. A key issue in this setting is the (unavoidable) use of approximate inference, which can lead to errors in the gradient computation when the network structure is dense. Thus, we explore the use of different feature introduction schemes and compare their performance. We provide results for our method on synthetic data, and on two real world data sets: modeling the joint distribution of pixel values in the MNIST data, and modeling the joint distribution of genetic sequence variations in the human HapMap data. We show that our L1based method achieves considerably higher generalization performance than the more standard L2based method (a Gaussian parameter prior) or pure maximumlikelihood learning. We also show that we can learn MRF network structure at a computational cost that is not much greater than learning parameters alone, demonstrating the existence of a feasible method for this important problem. 1
Learning the structure of Markov logic networks
 In Proceedings of the 22nd International Conference on Machine Learning
, 2005
"... Markov logic networks (MLNs) combine logic and probability by attaching weights to firstorder clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive l ..."
Abstract

Cited by 94 (18 self)
 Add to MetaCart
Markov logic networks (MLNs) combine logic and probability by attaching weights to firstorder clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. The algorithm performs a beam or shortestfirst search of the space of clauses, guided by a weighted pseudolikelihood measure. This requires computing the optimal weights for each candidate structure, but we show how this can be done efficiently. The algorithm can be used to learn an MLN from scratch, or to refine an existing knowledge base. We have applied it in two realworld domains, and found that it outperforms using offtheshelf ILP systems to learn the MLN structure, as well as pure ILP, purely probabilistic and purely knowledgebased approaches. 1.
Extracting places and activities from gps traces using hierarchical conditional random fields
 International Journal of Robotics Research
, 2007
"... Learning patterns of human behavior from sensor data is extremely important for highlevel activity inference. We show how to extract a person’s activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent mod ..."
Abstract

Cited by 85 (2 self)
 Add to MetaCart
Learning patterns of human behavior from sensor data is extremely important for highlevel activity inference. We show how to extract a person’s activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent model of a person’s activities and places. In contrast to existing techniques, our approach takes highlevel context into account in order to detect the significant places of a person. Our experiments show significant improvements over existing techniques. Furthermore, they indicate that our system is able to robustly estimate a person’s activities using a model that is trained from data collected by other persons. 1
WebScale Information Extraction in KnowItAll
, 2004
"... Manually querying search engines in order to accumulate a large body of factual information is a tedious, errorprone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from ..."
Abstract

Cited by 85 (6 self)
 Add to MetaCart
Manually querying search engines in order to accumulate a large body of factual information is a tedious, errorprone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KNOWITALL, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domainindependent, and scalable manner.
Discriminative training of markov logic networks
 In Proc. of the Natl. Conf. on Artificial Intelligence
, 2005
"... Many machine learning applications require a combination of probability and firstorder logic. Markov logic networks (MLNs) accomplish this by attaching weights to firstorder clauses, and viewing these as templates for features of Markov networks. Model parameters (i.e., clause weights) can be lear ..."
Abstract

Cited by 84 (18 self)
 Add to MetaCart
Many machine learning applications require a combination of probability and firstorder logic. Markov logic networks (MLNs) accomplish this by attaching weights to firstorder clauses, and viewing these as templates for features of Markov networks. Model parameters (i.e., clause weights) can be learned by maximizing the likelihood of a relational database, but this can be quite costly and lead to suboptimal results for any given prediction task. In this paper we propose a discriminative approach to training MLNs, one which optimizes the conditional likelihood of the query predicates given the evidence ones, rather than the joint likelihood of all predicates. We extend Collins’s (2002) voted perceptron algorithm for HMMs to MLNs by replacing the Viterbi algorithm with a weighted satisfiability solver. Experiments on entity resolution and link prediction tasks show the advantages of this approach compared to generative MLN training, as well as compared to purely probabilistic and purely logical approaches.
Kernel Conditional Random Fields: Representation and Clique Selection
 in ICML
, 2004
"... Kernel conditional random fields (KCRFs) are introduced as a framework for discriminative modeling of graphstructured data. A representer theorem for conditional graphical models is given which shows how kernel conditional random fields arise from risk minimization procedures defined using Me ..."
Abstract

Cited by 80 (5 self)
 Add to MetaCart
Kernel conditional random fields (KCRFs) are introduced as a framework for discriminative modeling of graphstructured data. A representer theorem for conditional graphical models is given which shows how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. A procedure for greedily selecting cliques in the dual representation is then proposed, which allows sparse representations. By incorporating kernels and implicit feature spaces into conditional graphical models, the framework enables semisupervised learning algorithms for structured data through the use of graph kernels.
Collective multilabel classification
 In CIKM
, 2005
"... Common approaches to multilabel classification learn independent classifiers for each category, and employ ranking or thresholding schemes for classification. Because they do not exploit dependencies between labels, such techniques are only wellsuited to problems in which categories are independen ..."
Abstract

Cited by 75 (1 self)
 Add to MetaCart
Common approaches to multilabel classification learn independent classifiers for each category, and employ ranking or thresholding schemes for classification. Because they do not exploit dependencies between labels, such techniques are only wellsuited to problems in which categories are independent. However, in many domains labels are highly interdependent. This paper explores multilabel conditional random field (CRF) classification models that directly parameterize label cooccurrences in multilabel classification. Experiments show that the models outperform their singlelabel counterparts on standard text corpora. Even when multilabels are sparse, the models improve subset classification error by as much as 40%.