Named Entity Recognition in Tweets: An Experimental Study
2011
Cited by 6 (3 self)
People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at:
Distributed word clustering for large scale class-based language modeling in machine translation
- In ACL International Conference Proceedings, 2008
Cited by 5 (1 self)
In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes. In this paper we investigate the effects of applying such a technique to higher-order n-gram models trained on large corpora. We introduce a modification of the exchange clustering algorithm with improved efficiency for certain partially class-based models and a distributed version of this algorithm to efficiently obtain automatic word classifications for large vocabularies (>1 million words) using such large training corpora (>30 billion tokens). The resulting clusterings are then used in training partially class-based language models. We show that combining them with word-based n-gram models in the log-linear model of a state-of-the-art statistical machine translation system leads to improvements in translation quality as indicated by the BLEU score.
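To make the idea of partitioning the vocabulary into equivalence classes concrete, here is a minimal sketch of the textbook fully class-based bigram model, p(w | v) = p(c(w) | c(v)) * p(w | c(w)). It is not the partially class-based variant or the exchange clustering algorithm from the paper; the function names, the toy corpus, and the hand-written word-to-class map are all assumptions made for illustration.

```python
from collections import Counter

def train_class_bigram(tokens, word2class):
    """Collect the counts needed for a fully class-based bigram model."""
    classes = [word2class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_bigrams = Counter(zip(classes, classes[1:]))
    # How often each class occurs as the left element of a bigram.
    left_counts = Counter(classes[:-1])
    return word_counts, class_counts, class_bigrams, left_counts

def class_bigram_prob(prev, word, word2class, model):
    """Maximum-likelihood estimate of p(c(word) | c(prev)) * p(word | c(word))."""
    word_counts, class_counts, class_bigrams, left_counts = model
    c_prev, c_word = word2class[prev], word2class[word]
    if left_counts[c_prev] == 0 or class_counts[c_word] == 0:
        return 0.0
    p_class = class_bigrams[(c_prev, c_word)] / left_counts[c_prev]
    p_word = word_counts[word] / class_counts[c_word]
    return p_class * p_word

# Hypothetical toy data; a real system would learn word2class by clustering.
tokens = "the cat sat on the mat".split()
word2class = {"the": "DET", "cat": "NOUN", "mat": "NOUN",
              "sat": "VERB", "on": "PREP"}
model = train_class_bigram(tokens, word2class)
p_mat = class_bigram_prob("the", "mat", word2class, model)
p_cat = class_bigram_prob("the", "cat", word2class, model)
```

Because "cat" and "mat" share a class, they also share the class-transition statistics after "the", which is how class-based models reduce sparsity. No smoothing is applied in this sketch.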
Performance Prediction for Exponential Language Models
Cited by 5 (3 self)
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.
Prediction games in infinitely rich worlds
- In Utility Based Data Mining Workshop (UBDM at KDD), 2006
Cited by 4 (3 self)
categories, every experience would be new, and one couldn’t make sense of one’s world. Furthermore, higher intelligence requires large numbers of categories, perhaps millions and beyond. Acquiring and robustly detecting categories appears to be a complex task, as categories inter-relate in complex ways and occur in diverse conditions. We may then ask: how can a system learn so many complex inter-related categories? We propose and explore an avenue that we call prediction games in infinitely rich worlds. In these games, the world is a source of an unlimited stream of information. The games are played by a prediction system that in effect repeatedly experiments with its world and learns from its experiments. The system converts its input stream from the world into a sequence of learning episodes for itself. Each learning episode consists of the system hiding parts of the input, guessing (predicting) them using the remainder of the input (the local context), and updating itself based on comparing its observations with its predictions. The goal of the system is to improve its
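The learning-episode loop described in this abstract (hide part of the input, predict it from the local context, then update) can be sketched as follows. This is only an illustration under stated assumptions: the `ContextPredictor` class, its additive count update, and the one-token context window are hypothetical choices, not the system described in the paper.

```python
from collections import defaultdict

class ContextPredictor:
    """Hypothetical predictor: each context feature votes for hidden items."""

    def __init__(self):
        # weights[feature][target] = accumulated evidence for target
        self.weights = defaultdict(lambda: defaultdict(float))

    def predict(self, context):
        scores = defaultdict(float)
        for feature in context:
            for target, weight in self.weights[feature].items():
                scores[target] += weight
        return max(scores, key=scores.get) if scores else None

    def update(self, context, observed):
        # Simple count-based update; a real system would use a richer rule.
        for feature in context:
            self.weights[feature][observed] += 1.0

def play(stream, predictor, window=1):
    """Turn a stream into episodes: hide one item, guess it, then learn."""
    correct = 0
    for i, hidden in enumerate(stream):
        left = [("left", w) for w in stream[max(0, i - window):i]]
        right = [("right", w) for w in stream[i + 1:i + 1 + window]]
        context = left + right
        guess = predictor.predict(context)
        correct += int(guess == hidden)
        predictor.update(context, hidden)
    return correct

stream = "a b a b a b a b".split()
predictor = ContextPredictor()
first_pass = play(stream, predictor)
second_pass = play(stream, predictor)
```

On this toy stream the predictor misses only the first two items on the first pass, since it has no evidence yet, and gets every item on the second pass.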
Morphologically motivated language models in speech recognition
- In Proceedings of International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning, 2005
Cited by 4 (2 self)
Language modelling in large vocabulary speech recognition has traditionally been based on words. A lexicon of the most common words of the language in question is created and the recogniser is limited to consider only the words in the lexicon. In Finnish, however, it is more difficult to create an extensive lexicon, since the compounding of words, numerous inflections and suffixes increase the number of commonly used word forms considerably. The problem is that reasonably sized lexica lack many common words, and for very large lexica, it is hard to estimate a reliable language model. We have previously reported a new approach for improving the recognition of inflecting or compounding languages in large vocabulary continuous speech recognition tasks. Significant reductions in error rates have been obtained by replacing a traditional word lexicon with a lexicon based on morpheme-like word fragments learnt directly from data. In this paper, we evaluate these so-called statistical morphs further, and compare them to grammatical morphs and very large word lexica using n-gram language models of different orders. When compared to the best word model, the morph models seem to be clearly more effective with respect to entropy, and give 30% relative error-rate reductions in a Finnish recognition task. Furthermore, the statistical morphs seem to be slightly better than the rule-based grammatical morphs.
Large-scale many-class learning
Cited by 4 (3 self)
A number of tasks, such as large-scale text categorization and word prediction, can benefit from efficient learning and classification when the number of classes (categories), in addition to instances and features, is large, that is, in the thousands and beyond. We investigate learning of sparse category indices to address this challenge. An index is a weighted bipartite graph mapping features to categories. On presentation of an instance, the index retrieves and scores a small set of candidate categories. The candidates can then be ranked and the ranking or the scores can be used for category assignment. We present novel online index learning algorithms. When compared to other approaches, including one-versus-rest and top-down learning and classification using support vector machines, we find that indexing is highly advantageous in terms of space and time efficiency, at both training and classification times, while yielding similar and often better accuracies. On problems with hundreds of thousands of instances and thousands of categories, the index is learned in minutes, while other methods can take orders of magnitude longer. As we explain, the design of the algorithm makes it convenient to maintain a constraint on the number of prediction connections a feature is allowed to make. This constraint is crucial in yielding efficient learning and classification.
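As an illustration of the retrieval step, the sketch below represents an index as a nested dictionary (feature to category to weight), scores candidate categories for an instance, and caps the number of connections per feature. The function names, weights, and toy data are hypothetical, and the paper's online index learning algorithms are not reproduced here.

```python
from collections import defaultdict
import heapq

def score_candidates(index, instance, top_k=3):
    """Retrieve and rank categories reachable from the instance's features."""
    scores = defaultdict(float)
    for feature, value in instance.items():
        for category, weight in index.get(feature, {}).items():
            scores[category] += value * weight
    return heapq.nlargest(top_k, scores.items(), key=lambda kv: kv[1])

def cap_connections(index, max_per_feature):
    """Keep only the strongest outgoing edges of each feature."""
    return {
        feature: dict(heapq.nlargest(max_per_feature, edges.items(),
                                     key=lambda kv: kv[1]))
        for feature, edges in index.items()
    }

# Hypothetical hand-written index; a real one would be learned from data.
index = {
    "goal": {"sports": 0.9, "business": 0.2},
    "stock": {"business": 0.8},
    "match": {"sports": 0.7},
}
instance = {"goal": 1.0, "match": 1.0}
ranked = score_candidates(index, instance)
capped = cap_connections(index, max_per_feature=1)
```

Only categories connected to an active feature are ever scored, so the cost depends on the number of edges touched rather than on the total number of categories.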
An empirical generative framework for computational modeling of language acquisition
2010
Shallow Parsing using Probabilistic Grammatical Inference
2002
Cited by 3 (1 self)
This paper presents a machine learning approach to shallow parsing using techniques of grammatical inference. We first learn a deterministic probabilistic automaton that models the joint distribution of chunk and part-of-speech tags, and then use this automaton as a transducer to find the most likely chunk tag sequence using a dynamic programming algorithm. The resulting transducers can also be combined with statistical POS taggers. We also discuss an efficient means of incorporating lexical information together with an application of bagging that improve our results.
Combining Morphosyntactic Enriched Representation with n-best Reranking in Statistical Translation
Cited by 3 (0 self)
The purpose of this work is to explore the integration of morphosyntactic information into the translation model itself, by enriching words with their morphosyntactic categories. We investigate word disambiguation using morphosyntactic categories, n-best hypotheses reranking, and the combination of both methods with word or morphosyntactic n-gram language model reranking. Experiments are carried out on the English-to-Spanish translation task. Using the morphosyntactic language model alone does not result in any improvement in performance. However, combining morphosyntactic word disambiguation with a word-based 4-gram language model results in a relative improvement in the BLEU score of 2.3% on the development set and 1.9% on the test set.
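N-best reranking is commonly implemented as a weighted sum of feature scores per hypothesis, followed by a re-sort. The sketch below assumes that log-linear form; the feature names (`tm`, `lm4`), weights, and scores are invented for illustration and are not taken from the paper.

```python
def rerank(nbest, weights):
    """Sort hypotheses by the weighted sum of their log-domain feature scores."""
    def total(hypothesis):
        return sum(weights[name] * value
                   for name, value in hypothesis["features"].items())
    return sorted(nbest, key=total, reverse=True)

# Hypothetical 2-best list with translation-model and 4-gram LM log scores.
nbest = [
    {"text": "hypothesis A", "features": {"tm": -2.0, "lm4": -5.0}},
    {"text": "hypothesis B", "features": {"tm": -2.5, "lm4": -3.0}},
]
with_lm = rerank(nbest, {"tm": 1.0, "lm4": 1.0})
without_lm = rerank(nbest, {"tm": 1.0, "lm4": 0.0})
```

With the language-model feature switched on, hypothesis B overtakes hypothesis A, which shows how an added model can change the chosen translation without altering the decoder's n-best list.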
Use of hidden Markov models and factored language models for automatic chord recognition
- In Proc. ISMIR, 2009
Cited by 3 (0 self)
This paper focuses on automatic extraction of acoustic chord sequences from a musical piece. Standard and factored language models are analyzed in terms of applicability to the chord recognition task. Pitch class profile vectors that represent harmonic information are extracted from the given audio signal. The resulting chord sequence is obtained by running a Viterbi decoder on trained hidden Markov models and subsequent lattice rescoring, applying the language model weight. We performed several experiments using the proposed technique. Results obtained on 175 manually-labeled songs provided an increase in accuracy of about 2%.
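For readers unfamiliar with the decoding step, here is a minimal log-space Viterbi decoder over a toy two-chord hidden Markov model. It assumes discrete observation symbols in place of pitch class profile vectors and omits the lattice rescoring with a language model weight; all probabilities and names are hypothetical.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for the observations."""
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this time step.
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def logs(table):
    """Convert a nested probability table to log probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()}
            for k, row in table.items()}

# Hypothetical toy model: two chords, two discretized observation symbols.
states = ["C", "G"]
log_start = {"C": math.log(0.5), "G": math.log(0.5)}
log_trans = logs({"C": {"C": 0.8, "G": 0.2}, "G": {"C": 0.2, "G": 0.8}})
log_emit = logs({"C": {"c_like": 0.9, "g_like": 0.1},
                 "G": {"c_like": 0.1, "g_like": 0.9}})
chords = viterbi(["c_like", "c_like", "g_like", "g_like"],
                 states, log_start, log_trans, log_emit)
```

Working in log space avoids numerical underflow on long sequences, which matters for frame-level audio features.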

