Results 1 - 10
of
110
Unsupervised Named-Entity Extraction from the Web: An Experimental Study
- ARTIFICIAL INTELLIGENCE
, 2005
"... The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOW-ITALL’s novel architecture and design princip ..."
Abstract
-
Cited by 205 (37 self)
- Add to MetaCart
The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domain-independent, and scalable manner. The paper presents an overview of KNOW-ITALL’s novel architecture and design principles, emphasizing its distinctive ability to extract information without any hand-labeled training examples. In its first major run, KNOW-ITALL extracted over 50,000 facts, but suggested a challenge: How can we improve KNOW-ITALL’s recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domain-specific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies sub-classes in order to boost recall. List Extraction locates lists of class instances, learns a “wrapper ” for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL’s domainindependent methods, the methods also obviate hand-labeled training examples. The paper reports on experiments, focused on named-entity extraction, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KNOW-ITALL a 4-fold to 8-fold increase in recall, while maintaining high precision, and discovered over 10,000 cities missing from the Tipster Gazetteer.
Efficient structure learning of Markov networks using L1regularization
- In NIPS
, 2006
"... Markov networks are widely used in a wide variety of applications, in problems ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to ..."
Abstract
-
Cited by 71 (1 self)
- Add to MetaCart
Markov networks are widely used in a wide variety of applications, in problems ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to the lack of effective algorithms for learning Markov network structure from data. In this paper, we provide a computationally effective method for learning Markov network structure from data. Our method is based on the use of L1 regularization on the weights of the log-linear model, which has the effect of biasing the model towards solutions where many of the parameters are zero. This formulation converts the Markov network learning problem into a convex optimization problem in a continuous space, which can be solved using efficient gradient methods. A key issue in this setting is the (unavoidable) use of approximate inference, which can lead to errors in the gradient computation when the network structure is dense. Thus, we explore the use of different feature introduction schemes and compare their performance. We provide results for our method on synthetic data, and on two real world data sets: modeling the joint distribution of pixel values in the MNIST data, and modeling the joint distribution of genetic sequence variations in the human HapMap data. We show that our L1-based method achieves considerably higher generalization performance than the more standard L2-based method (a Gaussian parameter prior) or pure maximum-likelihood learning. We also show that we can learn MRF network structure at a computational cost that is not much greater than learning parameters alone, demonstrating the existence of a feasible method for this important problem. 1
Learning the structure of Markov logic networks
- In Proceedings of the 22nd International Conference on Machine Learning
, 2005
"... Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive l ..."
Abstract
-
Cited by 67 (15 self)
- Add to MetaCart
Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. The algorithm performs a beam or shortestfirst search of the space of clauses, guided by a weighted pseudo-likelihood measure. This requires computing the optimal weights for each candidate structure, but we show how this can be done efficiently. The algorithm can be used to learn an MLN from scratch, or to refine an existing knowledge base. We have applied it in two real-world domains, and found that it outperforms using off-the-shelf ILP systems to learn the MLN structure, as well as pure ILP, purely probabilistic and purely knowledge-based approaches. 1.
Web-Scale Information Extraction in KnowItAll
, 2004
"... Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from ..."
Abstract
-
Cited by 61 (6 self)
- Add to MetaCart
Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KNOWITALL, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner.
Identifying gene and protein mentions in text using conditional random fields
- BMC Bioinformatics
"... Applying information extraction techniques in the biological domain has been a growing research area over the past few years. Numerous large scale copora have been developed [10] or are being developed [4] ..."
Abstract
-
Cited by 57 (5 self)
- Add to MetaCart
Applying information extraction techniques in the biological domain has been a growing research area over the past few years. Numerous large scale copora have been developed [10] or are being developed [4]
Discriminative training of markov logic networks
- In Proc. of the Natl. Conf. on Artificial Intelligence
, 2005
"... Many machine learning applications require a combination of probability and first-order logic. Markov logic networks (MLNs) accomplish this by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. Model parameters (i.e., clause weights) can be lear ..."
Abstract
-
Cited by 54 (13 self)
- Add to MetaCart
Many machine learning applications require a combination of probability and first-order logic. Markov logic networks (MLNs) accomplish this by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. Model parameters (i.e., clause weights) can be learned by maximizing the likelihood of a relational database, but this can be quite costly and lead to suboptimal results for any given prediction task. In this paper we propose a discriminative approach to training MLNs, one which optimizes the conditional likelihood of the query predicates given the evidence ones, rather than the joint likelihood of all predicates. We extend Collins’s (2002) voted perceptron algorithm for HMMs to MLNs by replacing the Viterbi algorithm with a weighted satisfiability solver. Experiments on entity resolution and link prediction tasks show the advantages of this approach compared to generative MLN training, as well as compared to purely probabilistic and purely logical approaches.
Kernel Conditional Random Fields: Representation and Clique Selection
- in ICML
, 2004
"... Kernel conditional random fields (KCRFs) are introduced as a framework for discriminative modeling of graph-structured data. A representer theorem for conditional graphical models is given which shows how kernel conditional random fields arise from risk minimization procedures defined using Me ..."
Abstract
-
Cited by 53 (4 self)
- Add to MetaCart
Kernel conditional random fields (KCRFs) are introduced as a framework for discriminative modeling of graph-structured data. A representer theorem for conditional graphical models is given which shows how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. A procedure for greedily selecting cliques in the dual representation is then proposed, which allows sparse representations. By incorporating kernels and implicit feature spaces into conditional graphical models, the framework enables semi-supervised learning algorithms for structured data through the use of graph kernels.
Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison
, 2004
"... Our KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an autonomous, domain-independent, and scalable manner. In its first ..."
Abstract
-
Cited by 53 (10 self)
- Add to MetaCart
Our KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an autonomous, domain-independent, and scalable manner. In its first
Learning to combine bottom-up and top-down segmentation
- in: European Conference on Computer Vision
"... Abstract. Bottom-up segmentation based only on low-level cues is a notoriously difficult problem. This difficulty has lead to recent top-down segmentation algorithms that are based on class-specific image information. Despite the success of top-down algorithms, they often give coarse segmentations t ..."
Abstract
-
Cited by 53 (0 self)
- Add to MetaCart
Abstract. Bottom-up segmentation based only on low-level cues is a notoriously difficult problem. This difficulty has lead to recent top-down segmentation algorithms that are based on class-specific image information. Despite the success of top-down algorithms, they often give coarse segmentations that can be significantly refined using low-level cues. This raises the question of how to combine both top-down and bottom-up cues in a principled manner. In this paper we approach this problem using supervised learning. Given a training set of ground truth segmentations we train a fragment-based segmentation algorithm which takes into account both bottom-up and top-down cues simultaneously, in contrast to most existing algorithms which train top-down and bottom-up modules separately. We formulate the problem in the framework of Conditional Random Fields (CRF) and derive a feature induction algorithm for CRF, which allows us to efficiently search over thousands of candidate fragments. Whereas pure top-down algorithms often require hundreds of fragments, our simultaneous learning procedure yields algorithms with a handful of fragments that are combined with low-level cues to efficiently compute high quality segmentations. 1
Extracting places and activities from gps traces using hierarchical conditional random fields
- International Journal of Robotics Research
, 2007
"... Learning patterns of human behavior from sensor data is extremely important for high-level activity inference. We show how to extract a person’s activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent mod ..."
Abstract
-
Cited by 52 (2 self)
- Add to MetaCart
Learning patterns of human behavior from sensor data is extremely important for high-level activity inference. We show how to extract a person’s activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent model of a person’s activities and places. In contrast to existing techniques, our approach takes high-level context into account in order to detect the significant places of a person. Our experiments show significant improvements over existing techniques. Furthermore, they indicate that our system is able to robustly estimate a person’s activities using a model that is trained from data collected by other persons. 1

