Named entity recognition in tweets: An experimental study.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011
Cited by 143 (11 self)
People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by rebuilding the NLP pipeline, beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp
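A minimal Labeled LDA sketch can make the distant-supervision idea concrete. It assumes each entity string is treated as one "document" made of its context words, and that a Freebase-style dictionary lists the types each string may take; strings missing from the dictionary are allowed every type. This is a generic collapsed Gibbs sampler, not the T-NER code, and the function name, toy data, and hyperparameters are hypothetical.

```python
import random
from collections import defaultdict

def labeled_lda(docs, allowed, num_types, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for Labeled LDA (illustrative sketch).

    docs:    entity string -> list of context words (one "document" per entity)
    allowed: entity string -> list of permitted type ids (its label set)
    Returns entity string -> list of per-type probabilities.
    """
    rng = random.Random(seed)
    vocab_size = len({w for words in docs.values() for w in words})
    n_dz = {d: [0] * num_types for d in docs}            # type counts per entity
    n_zw = [defaultdict(int) for _ in range(num_types)]  # word counts per type
    n_z = [0] * num_types                                # total words per type
    z = {}                                               # type assigned to each token

    # Initialize each token with a random type from its entity's label set.
    for d, words in docs.items():
        z[d] = []
        for w in words:
            t = rng.choice(allowed[d])
            z[d].append(t)
            n_dz[d][t] += 1
            n_zw[t][w] += 1
            n_z[t] += 1

    for _ in range(iters):
        for d, words in docs.items():
            labels = allowed[d]
            for i, w in enumerate(words):
                # Remove the token's current assignment from the counts.
                t = z[d][i]
                n_dz[d][t] -= 1
                n_zw[t][w] -= 1
                n_z[t] -= 1
                # Resample, restricted to the types this entity may take.
                weights = [
                    (n_dz[d][k] + alpha) * (n_zw[k][w] + beta) / (n_z[k] + vocab_size * beta)
                    for k in labels
                ]
                t = rng.choices(labels, weights=weights)[0]
                z[d][i] = t
                n_dz[d][t] += 1
                n_zw[t][w] += 1
                n_z[t] += 1

    # Per-entity type distribution; types outside the label set get zero mass.
    theta = {}
    for d in docs:
        labels = allowed[d]
        total = sum(n_dz[d][k] for k in labels) + alpha * len(labels)
        theta[d] = [(n_dz[d][k] + alpha) / total if k in labels else 0.0
                    for k in range(num_types)]
    return theta

# Hypothetical toy data: 0 = PERSON, 1 = LOCATION, 2 = COMPANY.
docs = {
    "obama": ["said", "president", "speech", "said"],
    "paris": ["flight", "hotel", "visit", "party"],
    "apple": ["iphone", "stock", "launch", "iphone"],
    "zappos": ["stock", "launch", "shoes"],
}
allowed = {"obama": [0], "paris": [0, 1], "apple": [2], "zappos": [0, 1, 2]}
theta = labeled_lda(docs, allowed, num_types=3)
```

The dictionary acts only as a constraint: an ambiguous string such as "paris" still gets its mix of PERSON and LOCATION from context, while an unlisted string such as "zappos" may take any type.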
Benchmarking the extraction and disambiguation of named entities on the semantic web
In Proceedings of the 9th International Conference on Language Resources and Evaluation, 2014
Cited by 7 (2 self)
Named entity recognition and disambiguation are of primary importance for extracting information and for populating knowledge bases. Detecting and classifying named entities has traditionally been undertaken by the natural language processing community, whilst linking entities to external resources, such as those in DBpedia, has been tackled by the Semantic Web community. Because these tasks are treated in different communities, there is as yet no overview of their combined performance. We present an approach that combines the state of the art in named entity recognition from the natural language processing domain with named entity linking from the Semantic Web community. We report on experiments and results that give more insight into the strengths and limitations of current approaches to these tasks. Our approach relies on the numerous web extractors supported by the NERD framework, which we combine with a machine learning algorithm to optimize recognition and linking of named entities. We test our approach on four standard data sets composed of two diverse text types, namely newswire and microposts.
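The abstract does not name the machine learning algorithm used to combine the NERD extractors. As a deliberately simpler stand-in, the sketch below merges the outputs of several extractors by weighted voting, where the per-extractor weights are assumed to come from some separate training step. Extractor names, spans, and weights are hypothetical.

```python
from collections import defaultdict

def combine_extractors(predictions, weights, threshold=0.5):
    """Merge entity predictions from several extractors by weighted voting.

    predictions: extractor name -> {(start, end): entity type}
    weights:     extractor name -> non-negative weight (assumed learned elsewhere)
    A span is kept if its winning type collects at least `threshold`
    of the total extractor weight.
    """
    total = sum(weights.values())
    votes = defaultdict(lambda: defaultdict(float))
    for extractor, spans in predictions.items():
        for span, entity_type in spans.items():
            votes[span][entity_type] += weights[extractor]
    merged = {}
    for span, by_type in votes.items():
        entity_type, score = max(by_type.items(), key=lambda kv: kv[1])
        if total > 0 and score / total >= threshold:
            merged[span] = entity_type
    return merged

# Hypothetical outputs of three extractors on the same tweet.
predictions = {
    "extractor_a": {(0, 5): "PER", (10, 15): "LOC"},
    "extractor_b": {(0, 5): "PER"},
    "extractor_c": {(0, 5): "ORG"},
}
weights = {"extractor_a": 1.0, "extractor_b": 1.0, "extractor_c": 1.0}
merged = combine_extractors(predictions, weights)
```

Here the span (0, 5) is kept as PER with two of three votes, while (10, 15) is dropped because only one extractor proposed it.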
Nerit: Named Entity Recognition for Informal Text
2012
Cited by 4 (2 self)
A generic open world named entity disambiguation approach for tweets
In Proceedings of the 5th International Conference on Knowledge Discovery and Information Retrieval (KDIR), 2013
Cited by 4 (4 self)
Social media is a rich source of information. To make use of this information, it is sometimes necessary to extract and disambiguate named entities. In this paper, we focus on named entity disambiguation (NED) in Twitter messages. NED in tweets is challenging in two ways. First, the limited length of a tweet makes it hard to gather enough context, on which many disambiguation techniques depend. Second, many named entities in tweets do not exist in a knowledge base (KB). We draw on ideas from information retrieval (IR) and NED to propose solutions to both challenges. For the first problem, we exploit the gregarious nature of tweets to gather the context needed for disambiguation. For the second, we look for an alternative home page when no Wikipedia page represents the entity. Given a mention, we obtain a list of Wikipedia candidates from the YAGO KB in addition to top-ranked pages from the Google search engine. We use a Support Vector Machine (SVM) to rank the candidate pages and find the best representative entities. Experiments on two data sets show better disambiguation results than the baselines and a competitor.
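Once a linear SVM has been trained, ranking candidates reduces to sorting them by the decision value w·x + b. The sketch below assumes that setup; the feature names (context similarity, reciprocal search rank, a Wikipedia flag), the weights, and the page identifiers are all hypothetical and are not the paper's feature set.

```python
def svm_score(features, weights, bias=0.0):
    """Decision value w.x + b of a linear SVM whose weights are assumed already trained."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items()) + bias

def rank_candidates(candidates, weights, bias=0.0):
    """Order candidate pages (Wikipedia or other home pages) by SVM decision value."""
    ranked = sorted(candidates, key=lambda c: svm_score(c[1], weights, bias), reverse=True)
    return [page for page, _ in ranked]

# Hypothetical weights and candidates for one mention.
weights = {"context_sim": 2.0, "google_rank_recip": 1.0, "is_wikipedia": 0.3}
candidates = [
    ("wikipedia:Candidate_A", {"context_sim": 0.1, "google_rank_recip": 0.0, "is_wikipedia": 1.0}),
    ("wikipedia:Candidate_B", {"context_sim": 0.2, "google_rank_recip": 0.5, "is_wikipedia": 1.0}),
    ("homepage:example.org", {"context_sim": 0.7, "google_rank_recip": 1.0, "is_wikipedia": 0.0}),
]
ranked = rank_candidates(candidates, weights)
```

Because non-Wikipedia pages sit in the same candidate pool, a home page can win when its context similarity is high, which is one way to cover entities missing from the KB.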
A hybrid framework for scalable Opinion Mining in Social Media: detecting polarities and attitude targets
Simple and Knowledge-intensive Generative Model for Named Entity Recognition
Almost all of the existing work on Named Entity Recognition (NER) consists of the following pipeline stages: part-of-speech tagging, segmentation, and named entity type classification. The requirement of hand-labeled training data for these stages makes it very expensive to extend to different domains and entity classes. Even with a large amount of hand-labeled data, existing techniques for NER on informal text, such as social media, perform poorly due to a lack of reliable capitalization, irregular sentence structure, and a wide range of vocabulary. In this paper, we address the lack of hand-labeled training data by taking advantage of weak supervision signals. We present our approach in two parts. First, we propose a novel generative model that combines ideas from the Hidden Markov Model (HMM) and n-gram language models into what we call an N-gram Language Markov Model (NLMM). Second, we use large-scale weak supervision signals from sources such as Wikipedia titles and the corresponding click counts to estimate the parameters of the NLMM. Our model is simple and can be implemented without Expectation Maximization or other expensive iterative training techniques. Even with this simple model, our approach to NER on informal text outperforms existing systems trained on formal English and matches state-of-the-art NER systems trained on hand-labeled Twitter messages. Because our model does not require hand-labeled data, we can adapt our system to other domains and named entity classes very easily. We demonstrate the flexibility of our approach by successfully applying it, with very little extra work, to the different domain of extracting food dishes from restaurant reviews.
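The NLMM itself is not specified in enough detail here to reproduce. The sketch below shows only a simpler, related step: a Viterbi-style dynamic program that segments a tweet into entity and non-entity n-grams, assuming per-phrase entity probabilities have already been estimated from weak signals such as Wikipedia titles and click counts. The probability table, the constant non-entity score, and the function name are hypothetical.

```python
import math

def segment(tokens, entity_prob, max_n=3, other_logp=math.log(0.5)):
    """Return the entity segments of the best-scoring segmentation of `tokens`.

    entity_prob: lower-cased phrase -> probability the phrase is an entity
                 (assumed to be estimated beforehand, e.g. from click counts).
    other_logp:  log-score for emitting one token as a non-entity (assumed constant).
    """
    n = len(tokens)
    best = [-math.inf] * (n + 1)  # best[i]: best score for tokens[:i]
    back = [None] * (n + 1)       # back[i]: (start of last segment, is_entity)
    best[0] = 0.0
    for i in range(1, n + 1):
        # Option 1: token i-1 on its own, not an entity.
        best[i], back[i] = best[i - 1] + other_logp, (i - 1, False)
        # Option 2: tokens[j:i] form one entity n-gram of length <= max_n.
        for j in range(max(0, i - max_n), i):
            p = entity_prob.get(" ".join(tokens[j:i]).lower(), 0.0)
            if p > 0.0 and best[j] + math.log(p) > best[i]:
                best[i], back[i] = best[j] + math.log(p), (j, True)
    # Follow the back-pointers to recover the entity segments.
    entities, i = [], n
    while i > 0:
        j, is_entity = back[i]
        if is_entity:
            entities.append(" ".join(tokens[j:i]))
        i = j
    return entities[::-1]

probs = {"new york": 0.9, "york": 0.3, "new": 0.01}  # hypothetical scores
result = segment("I love New York pizza".split(), probs)
```

Scoring whole n-grams lets "New York" beat the weaker single-token reading "York" even though the tweet carries no reliable capitalization signal in general.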
Exploiting Language Models to Classify Events from Twitter
Classifying events in Twitter is challenging because tweet texts contain a large amount of temporal data with a lot of noise and various kinds of topics. In this paper, we propose a method to classify events from Twitter. We first find the distinguishing terms between tweets in events and measure their similarities with learned language models such as ConceptNet and a latent Dirichlet allocation method for selectional preferences (LDA-SP), which have been widely studied for computational linguistic relations using large text corpora. Relationships between term words in tweets are discovered by checking them under each model. We then propose a method to compute the similarity between tweets based on their features, including common term words and relationships among their distinguishing term words. This similarity is explicit and convenient to apply with k-nearest neighbor techniques for classification. We carried out experiments on the Edinburgh Twitter Corpus showing that our method achieves competitive results for classifying events.
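The paper's similarity also uses term relationships drawn from ConceptNet and LDA-SP. The sketch below keeps only the common-term part, using Jaccard overlap of term sets as an assumed similarity, to show how such a measure plugs into k-nearest neighbor classification. The toy tweets and event labels are hypothetical.

```python
from collections import Counter

def jaccard(a, b):
    """Share of distinct terms two tweets have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knn_classify(tweet, labeled, k=3):
    """Label `tweet` by majority vote among its k most similar labeled tweets."""
    neighbors = sorted(labeled, key=lambda item: jaccard(tweet, item[0]), reverse=True)[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

labeled = [
    ("earthquake hits city".split(), "disaster"),
    ("strong earthquake felt".split(), "disaster"),
    ("team wins final".split(), "sports"),
    ("final score team".split(), "sports"),
]
label = knn_classify("earthquake felt downtown".split(), labeled)
```

Swapping in a richer similarity (for example one that rewards related but non-identical terms) only changes `jaccard`; the k-NN step stays the same.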
Universidade Federal do Amazonas
Microblog platforms such as Twitter are being increasingly adopted by Web users, yielding an important source of data for web search and mining applications. Tasks such as Named Entity Recognition are at the core of many of these applications, but the effectiveness of existing tools is seriously compromised when applied to Twitter data, since messages are terse, poorly worded, and posted in many different languages. Also, Twitter follows a streaming paradigm, which requires entities to be recognized in real time. In view of these challenges and the inappropriateness of existing tools, we propose a novel approach for Named Entity Recognition on Twitter data called FS-NER (Filter-Stream Named Entity Recognition). FS-NER is characterized by the use of filters that process unlabeled Twitter messages, which makes it much more practical than existing supervised CRF-based approaches. These filters can be combined flexibly, either in sequence or in parallel. Moreover, because the filters are not language dependent, FS-NER can be applied to different languages without laborious adaptation. Through a systematic evaluation using three Twitter collections and seven entity types, we show that FS-NER performs 3% better than a CRF-based baseline, besides being orders of magnitude faster and much more practical.
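The abstract does not describe FS-NER's individual filters, so the sketch below only shows what combining filters "in sequence or in parallel" can look like. It assumes a filter is a boolean test on one token in context; the two filters (a capitalization test and a dictionary lookup) and all names are hypothetical, not FS-NER's own.

```python
from typing import Callable, List, Set

# A filter inspects token i of a tokenized message and accepts or rejects it.
Filter = Callable[[List[str], int], bool]

def capitalized(tokens: List[str], i: int) -> bool:
    return tokens[i][:1].isupper()

def in_dictionary(terms: Set[str]) -> Filter:
    lowered = {t.lower() for t in terms}
    return lambda tokens, i: tokens[i].lower() in lowered

def in_sequence(*filters: Filter) -> Filter:
    """Chain filters: a token must pass every one of them in turn."""
    return lambda tokens, i: all(f(tokens, i) for f in filters)

def in_parallel(*filters: Filter) -> Filter:
    """Run filters side by side: a token is kept if any of them accepts it."""
    return lambda tokens, i: any(f(tokens, i) for f in filters)

def recognize(tokens: List[str], f: Filter) -> List[str]:
    return [tok for i, tok in enumerate(tokens) if f(tokens, i)]

tokens = "watching Lakers game in LA with lebron".split()
sports_terms = in_dictionary({"lakers", "lebron"})
strict = recognize(tokens, in_sequence(capitalized, sports_terms))
lenient = recognize(tokens, in_parallel(capitalized, sports_terms))
```

Sequential composition trades recall for precision (only "Lakers" survives), while parallel composition does the opposite; because combinators return ordinary filters, they can be nested.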
Linguistic Engineering Group Polish Academy of Sciences
This paper reports on experiments aimed at tuning a rule-based NER system, designed for detecting names in Polish online news, to the processing of targeted Twitter streams. In particular, we explore whether the performance of the baseline NER system can be improved through the incremental application of knowledge-poor methods for name matching and guessing. We study various settings and combinations of the methods and present evaluation results on five corpora gathered from Twitter, centred around major events and known individuals.
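The abstract does not spell out the name-matching methods. One knowledge-poor heuristic that suits Polish inflection is to treat two tokens as the same name when they share a long enough common prefix, so that an inflected surname such as "Tuska" can be linked to an already recognized "Donald Tusk". The thresholds and function names below are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n

def tokens_match(a: str, b: str, max_suffix: int = 2, min_stem: int = 3) -> bool:
    """Same name if the tokens differ only in (at most) a short inflectional ending."""
    need = max(min_stem, min(len(a), len(b)) - max_suffix)
    return common_prefix_len(a, b) >= need

def match_name(mention: str, known_names: list):
    """Return the first known full name whose tokens cover every mention token, else None."""
    mention_tokens = mention.split()
    for name in known_names:
        name_tokens = name.split()
        if all(any(tokens_match(m, n) for n in name_tokens) for m in mention_tokens):
            return name
    return None

known = ["Lech Wałęsa", "Donald Tusk", "Jan Kowalski"]
```

For example, `match_name("Wałęsy", known)` returns "Lech Wałęsa", while "Kowalczyk" is not matched to "Kowalski" because their shared stem is too short.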
Slovak Republic
In this paper we evaluate eight well-known Information Extraction (IE) tools on the task of Named Entity Recognition (NER) in microposts. We chose six NLP tools and two Wikipedia concept extractors for the evaluation. Our intent was to see how these tools perform on the relatively short texts of microposts. The evaluation dataset was adopted from the MSM 2013 IE Challenge. It contains manually annotated microposts, with classification restricted to four entity types: PER, LOC, ORG and MISC.
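A per-type breakdown over the four MSM 2013 entity types can be computed as sketched below. The sketch assumes exact span-and-type matching, which is not stated in the abstract, and represents entities as hypothetical (tweet id, start, end, type) tuples.

```python
TYPES = ("PER", "LOC", "ORG", "MISC")

def per_type_scores(gold, predicted):
    """Precision, recall and F1 per entity type under exact span-and-type matching.

    gold, predicted: sets of (tweet_id, start, end, entity_type) tuples.
    """
    scores = {}
    for t in TYPES:
        g = {e for e in gold if e[3] == t}
        p = {e for e in predicted if e[3] == t}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[t] = (precision, recall, f1)
    return scores

gold = {(1, 0, 5, "PER"), (1, 10, 15, "LOC"), (2, 0, 3, "ORG")}
predicted = {(1, 0, 5, "PER"), (1, 10, 15, "ORG"), (2, 0, 3, "ORG")}
scores = per_type_scores(gold, predicted)
```

In this toy case the mislabeled span (1, 10, 15) costs recall for LOC and precision for ORG at the same time, which is why per-type scores can diverge sharply from a single overall figure.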