Design challenges and misconceptions in named entity recognition
- Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL), 2009
Cited by 23 (3 self)
We analyze some of the fundamental design challenges and misconceptions that underlie the development of an efficient and robust NER system. In particular, we address issues such as the representation of text chunks, the inference approach needed to combine local NER decisions, the sources of prior knowledge and how to use them within an NER system. In the process of comparing several solutions to these challenges we reach some surprising conclusions, as well as develop an NER system that achieves 90.8 F1 score on the CoNLL-2003 NER shared task, the best reported result for this dataset.
A new perceptron algorithm for sequence labeling with non-local features
- In Proceedings of EMNLP, 2007
Cited by 17 (1 self)
We cannot use non-local features with current major methods of sequence labeling such as CRFs due to concerns about complexity. We propose a new perceptron algorithm that can use non-local features. Our algorithm allows the use of all types of non-local features whose values are determined from the sequence and the labels. The weights of local and non-local features are learned together in the training process with guaranteed convergence. We present experimental results from the CoNLL 2003 named entity recognition (NER) task to demonstrate the performance of the proposed algorithm.
Efficient inference with cardinality-based clique potentials
- In Proc. 24th ICML, 2007
Cited by 15 (0 self)
Many collective labeling tasks require inference on graphical models where the clique potentials depend only on the number of nodes that get a particular label. We design efficient inference algorithms for various families of such potentials. Our algorithms are exact for arbitrary cardinality-based clique potentials on binary labels and for max-like and majority-like clique potentials on multiple labels. Moving towards more complex potentials, we show that inference becomes NP-hard even on cliques with homogeneous Potts potentials. We present a 13/15-approximation algorithm with runtime sub-quadratic in the clique size. In contrast, the best known previous guarantee for graphs with Potts potentials is only 0.5. We perform empirical comparisons on real and synthetic data, and show that our proposed methods are an order of magnitude faster than the well-known Tree-based reparameterization (TRW) and graph-cut algorithms.
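To make the binary-label case concrete, here is a minimal sketch of one standard way to do exact MAP inference when the clique potential depends only on how many nodes take label 1. It is not necessarily the authors' algorithm. The function name, the `gains` representation (each node's score for label 1 minus its score for label 0), and the `count_potential` callback are assumptions introduced for illustration. The idea is that for any fixed count k the best assignment labels the k highest-gain nodes as 1, so sorting once and scanning every k suffices.

```python
def map_cardinality_binary(gains, count_potential):
    """Exact MAP for binary labels under a cardinality-based clique potential.

    gains[i]: node i's score for label 1 minus its score for label 0 (assumed input).
    count_potential(k): clique score when exactly k nodes take label 1 (assumed input).
    Returns (best_score, labels), with the score measured relative to all-zero labels.
    """
    # For a fixed k, the optimal choice is the k nodes with the largest gains.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)

    best_score = count_potential(0)
    best_k = 0
    running = 0.0
    for k, i in enumerate(order, start=1):
        running += gains[i]
        score = running + count_potential(k)
        if score > best_score:
            best_score, best_k = score, k

    labels = [0] * len(gains)
    for i in order[:best_k]:
        labels[i] = 1
    return best_score, labels
```

The sort dominates the cost, so this sketch runs in O(n log n) time for a clique of n nodes, assuming `count_potential` is constant-time.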
Stacked graphical models for efficient inference in markov random fields
- In Proceedings of the 2007 SIAM International Conference on Data Mining, 2007
Cited by 12 (2 self)
In collective classification, classes are predicted simultaneously for a group of related instances, rather than predicting a class for each instance separately. Collective classification has been widely used for classification on relational datasets. However, the inference procedure used in collective classification usually requires many iterations and thus is expensive. We propose stacked graphical learning, a meta-learning scheme in which a base learner is augmented by expanding one instance’s features with predictions on other related instances. Stacked graphical learning is efficient, especially during inference, capable of capturing dependencies easily, and can be implemented with any kind of base learner. In experiments on eight datasets, stacked graphical learning is 40 to 80 times faster than Gibbs sampling during inference.
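The feature-expansion idea in this abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the perceptron base learner, the choice to aggregate neighbors by the mean of their predicted labels, and the use of the base model's predictions on its own training set (a simplification; held-out predictions may be preferable). It is not the authors' implementation.

```python
def predict_one(w, x):
    """Binary prediction from weights w (bias stored as the last entry)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1 if score > 0 else 0


def train_perceptron(X, y, epochs=20):
    """Train a plain binary perceptron; stands in for 'any kind of base learner'."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, label in zip(X, y):
            if predict_one(w, x) != label:
                step = 1.0 if label == 1 else -1.0
                for i, v in enumerate(x):
                    w[i] += step * v
                w[-1] += step
    return w


def expand(X, preds, neighbors):
    """Append the mean predicted label of each instance's neighbors as a feature."""
    out = []
    for i, x in enumerate(X):
        nbrs = neighbors[i]
        mean = sum(preds[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        out.append(list(x) + [mean])
    return out


def fit_stacked(X, y, neighbors):
    """Train the base model, then a second model on the expanded features."""
    base = train_perceptron(X, y)
    base_preds = [predict_one(base, x) for x in X]
    top = train_perceptron(expand(X, base_preds, neighbors), y)
    return base, top


def predict_stacked(models, X, neighbors):
    """Inference is two fixed passes: base predictions, then the stacked model."""
    base, top = models
    base_preds = [predict_one(base, x) for x in X]
    return [predict_one(top, x) for x in expand(X, base_preds, neighbors)]
```

Because inference is a fixed number of passes rather than an iterative sampler, its cost does not depend on a convergence criterion, which is consistent with the speed advantage over Gibbs sampling that the abstract reports.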
Regular expression learning for information extraction
- In EMNLP, 2008
Cited by 8 (5 self)
Regular expressions have served as the dominant workhorse of practical information extraction for several years. However, there has been little work on reducing the manual effort involved in building high-quality, complex regular expressions for information extraction tasks. In this paper, we propose ReLIE, a novel transformation-based algorithm for learning such complex regular expressions. We evaluate the performance of our algorithm on multiple datasets and compare it against the CRF algorithm. We show that ReLIE, in addition to being an order of magnitude faster, outperforms CRF under conditions of limited training data and cross-domain data. Finally, we show how the accuracy of CRF can be improved by using features extracted by ReLIE.
Recognizing Named Entities in Tweets
Cited by 6 (0 self)
The challenges of Named Entity Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model under a semi-supervised learning framework to tackle these challenges. The KNN-based classifier conducts pre-labeling to collect global coarse evidence across tweets, while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of KNN and semi-supervised learning.
Notes on Stacked Graphical Learning for Efficient Inference
- Department, Carnegie Mellon University, 2007
Cited by 4 (4 self)
In collective classification, classes are predicted simultaneously for a group of related instances, rather than predicting a class for each instance separately. Collective classification has been widely used for classification on relational datasets. However, the inference procedure used in collective classification usually requires many iterations and thus is expensive. We propose stacked graphical learning, a meta-learning scheme in which a base learner is augmented by expanding one instance’s features with predictions on other related instances. Stacked graphical learning is efficient, especially during inference, capable of capturing dependencies easily, and can be implemented with any kind of base learner. In experiments on eight datasets, stacked graphical learning is 40 to 80 times faster than Gibbs sampling during inference. We also give theoretical analysis to better understand the algorithm.
Personal Name Classification in Web Queries
- 2008
Cited by 4 (0 self)
Personal names are an important kind of query in Web search, yet they are special in many ways. Strategies for retrieving information on personal names should therefore be different from the strategies for other types of queries. To improve the search quality for personal names, a first step is to detect whether a query is a personal name. Despite the importance of this problem, relatively little previous research has been done on this topic. Since Web queries are usually short, conventional supervised machine-learning algorithms cannot be applied directly. An alternative is to apply some heuristic rules coupled with name-term dictionaries. However, when the dictionaries are small, this method tends to produce false negatives; when the dictionaries are large, it tends to generate false positives. A more serious problem is that this method cannot provide a good tradeoff between precision and recall. To solve these problems, we propose an approach based on the construction of probabilistic name-term dictionaries and personal name grammars, and use it to predict the probability that a query is a personal name. In this paper, we develop four different methods for building probabilistic name-term dictionaries, in which each term is assigned the probability of being a name term. We compare our approach with baseline algorithms such as dictionary-based look-up methods and supervised classification algorithms, including logistic regression and SVM, on some manually labeled test sets. The results validate the effectiveness of our approach, whose F1 value exceeds 79.8% and outperforms the best baseline by more than 11.3%.
Tsinghua University at the summarization track of TAC 2008
"... The three authors should be all first-authors. ..."

