Mining evidences for named entity disambiguation. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013.
"... Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and link them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
(Show Context)
Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in constructing a high-quality information network or knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a knowledge base. Most of the proposed algorithms assume that a knowledge base can provide enough explicit and useful information to help disambiguate a mention to the right entity. However, existing knowledge bases are rarely complete (and likely never will be), leading to poor performance on short queries whose contexts are not well known. In such cases, we need to collect additional evidences scattered in internal and external corpora to augment the knowledge bases and enhance their disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidences across documents. With specific modeling of a “background topic” and “unknown entities”, our model is able to harvest useful evidences out of noisy information. Experimental results show that our proposed method outperforms the state-of-the-art approaches significantly, boosting the disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.
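To make the “background topic” and “unknown entity” idea concrete, here is a minimal Python sketch (assumed, not the authors' code) of the generative choice the abstract describes: each context word is explained either by the candidate entity's own language model, by a shared background distribution, or by a uniform fallback for unknown entities. All names, distributions, and weights below are toy assumptions.

    import random

    # Toy data (illustrative only): an entity-specific language model, a
    # corpus-wide "background topic", and a small vocabulary for the fallback.
    entity_lm = {"jordan": {"basketball": 0.6, "bulls": 0.4}}
    background_lm = {"the": 0.5, "said": 0.5}
    vocab = ["basketball", "bulls", "the", "said"]

    def sample_word(entity, p_background=0.3, p_unknown=0.1):
        r = random.random()
        if r < p_background:                  # word explained by background topic
            dist = background_lm
        elif r < p_background + p_unknown:    # mention refers to an unknown entity:
            return random.choice(vocab)       # fall back to a uniform model
        else:                                 # word drawn from the entity's model
            dist = entity_lm[entity]
        words, probs = zip(*dist.items())
        return random.choices(words, weights=probs)[0]

    print(sample_word("jordan"))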
A Scalable Gibbs Sampler for Probabilistic Entity Linking
"... Abstract. Entity linking involves labeling phrases in text with their referent entities, such as Wikipedia or Freebase entries. This task is chal-lenging due to the large number of possible entities, in the millions, and heavy-tailed mention ambiguity. We formulate the problem in terms of probabilis ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Entity linking involves labeling phrases in text with their referent entities, such as Wikipedia or Freebase entries. This task is challenging due to the large number of possible entities, in the millions, and heavy-tailed mention ambiguity. We formulate the problem in terms of probabilistic inference within a topic model, where each topic is associated with a Wikipedia article. To deal with the large number of topics we propose a novel efficient Gibbs sampling scheme which can also incorporate side information, such as the Wikipedia graph. This conceptually simple probabilistic approach achieves state-of-the-art performance in entity linking on the Aida-CoNLL dataset.
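As a rough illustration of the formulation, the sketch below resamples one mention's entity assignment with weights that combine a mention-entity prior and a coherence bonus from the Wikipedia link graph. The actual sampler in the paper is far more elaborate; the prior, graph, and smoothing constant here are invented.

    import random

    # Invented example data: a mention-entity prior and a tiny Wikipedia graph.
    prior = {"Paris": {"Paris_France": 0.9, "Paris_Texas": 0.1}}
    wiki_links = {("Paris_France", "France"), ("Paris_Texas", "Texas")}

    def resample(mention, other_entities, alpha=0.1):
        """One Gibbs step: redraw the entity for `mention`, weighting each
        candidate by its prior times a link-graph coherence bonus with the
        entities currently assigned in the rest of the document."""
        candidates = list(prior[mention])
        weights = []
        for e in candidates:
            coherence = sum((e, o) in wiki_links or (o, e) in wiki_links
                            for o in other_entities)
            weights.append(prior[mention][e] * (coherence + alpha))
        return random.choices(candidates, weights=weights)[0]

    print(resample("Paris", ["France"]))   # almost surely Paris_France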
Scalable probabilistic entity-topic modeling. arXiv preprint arXiv:1309.0337, 2013.
"... We present an LDA approach to entity disambiguation. Each topic is asso-ciated with a Wikipedia article and topics generate either content words or entity mentions. Training such models is challenging because of the topic and vocabulary size, both in the millions. We tackle these problems using a no ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
We present an LDA approach to entity disambiguation. Each topic is associated with a Wikipedia article and topics generate either content words or entity mentions. Training such models is challenging because of the topic and vocabulary size, both in the millions. We tackle these problems using a novel distributed inference and representation framework based on a parallel Gibbs sampler guided by the Wikipedia link graph, and pipelines of MapReduce allowing fast and memory-frugal processing of large datasets. We report state-of-the-art performance on a public dataset.
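One plausible reading of how Wikipedia's link structure can guide inference at this scale is by pruning each mention's topic space to a handful of candidates before sampling. The sketch below uses made-up anchor-text statistics and is only an illustration, not the paper's pipeline.

    from collections import Counter

    # Made-up anchor statistics: how often each anchor text links to an article.
    anchor_counts = {"jaguar": Counter({"Jaguar_Cars": 80,
                                        "Jaguar_(animal)": 60,
                                        "Jacksonville_Jaguars": 10})}

    def candidate_topics(mention, top_k=2):
        """Prune the million-topic space to the k most frequent link targets
        before the sampler ever runs."""
        return [article for article, _ in anchor_counts[mention].most_common(top_k)]

    print(candidate_topics("jaguar"))   # ['Jaguar_Cars', 'Jaguar_(animal)']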
A Context-Aware Topic Model for Statistical Machine Translation
"... Lexical selection is crucial for statistical ma-chine translation. Previous studies separately exploit sentence-level contexts and document-level topics for lexical selection, neglecting their correlations. In this paper, we propose a context-aware topic model for lexical selec-tion, which not only ..."
Abstract
- Add to MetaCart
Lexical selection is crucial for statistical machine translation. Previous studies separately exploit sentence-level contexts and document-level topics for lexical selection, neglecting their correlations. In this paper, we propose a context-aware topic model for lexical selection, which not only models local contexts and global topics but also captures their correlations. The model uses target-side translations as hidden variables to connect document topics and source-side local contextual words. In order to learn the hidden variables and distributions from data, we introduce a Gibbs sampling algorithm for statistical estimation and inference. A new translation probability based on the distributions learned by the model is integrated into a translation system for lexical selection. Experimental results on NIST Chinese-English test sets demonstrate that 1) our model significantly outperforms previous lexical selection methods and 2) modeling correlations between local words and global topics can further improve translation quality.
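The translation probability the abstract mentions can be pictured as a topic mixture: a target translation is scored by interpolating topic-conditioned translation distributions with the document's topic proportions. Every number and name in the toy sketch below is illustrative, not a learned parameter from the paper.

    # Toy values only: document topic proportions and topic-conditioned
    # translation tables for one ambiguous source word "w".
    p_topic_given_doc = {"finance": 0.7, "nature": 0.3}
    p_trans = {("w", "finance"): {"bank": 0.9, "shore": 0.1},
               ("w", "nature"):  {"bank": 0.2, "shore": 0.8}}

    def p_target(source, target):
        """p(target | source, doc) as a mixture over document topics."""
        return sum(p_topic_given_doc[z] * p_trans[(source, z)].get(target, 0.0)
                   for z in p_topic_given_doc)

    print(p_target("w", "shore"))   # 0.7*0.1 + 0.3*0.8 = 0.31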
Cross-lingual Link Discovery between Chinese and English Wiki Knowledge Bases
"... Wikipedia is an online multilingual encyclopedia that contains a very large number of articles covering most written languages. However, one critical issue for Wikipedia is that the pages in different languages are rarely linked except for the cross-lingual link between pages about the same subject. ..."
Abstract
- Add to MetaCart
Wikipedia is an online multilingual encyclopedia that contains a very large number of articles covering most written languages. However, one critical issue for Wikipedia is that pages in different languages are rarely linked except for the cross-lingual links between pages about the same subject. This can pose serious difficulties for humans and machines trying to seek information from sources in different languages. To address this issue, we propose a hybrid approach that exploits anchor strength, topic relevance, and an entity knowledge graph to automatically discover cross-lingual links. In addition, we develop CELD, a system for automatically linking key terms in Chinese documents with English concepts. As demonstrated in the experimental evaluation, the proposed model outperforms several baselines on the NTCIR data set, which was designed especially for cross-lingual link discovery evaluation.
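A hypothetical sketch of the hybrid scoring idea: combine the three signals named in the abstract (anchor strength, topic relevance, knowledge-graph score) with interpolation weights and rank candidate English concepts. The feature values and weights are invented; the paper's actual combination may differ.

    # Invented features and weights, for illustration only.
    def link_score(anchor_strength, topic_relevance, kg_score, w=(0.5, 0.3, 0.2)):
        """Weighted combination of the three signals from the abstract."""
        return w[0] * anchor_strength + w[1] * topic_relevance + w[2] * kg_score

    # Rank candidate English concepts for one Chinese key term.
    candidates = {"Concept_A": (0.8, 0.6, 0.9), "Concept_B": (0.4, 0.7, 0.2)}
    best = max(candidates, key=lambda c: link_score(*candidates[c]))
    print(best)   # Concept_A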