Learning to find interesting connections in Wikipedia
Cited by 1 (1 self)
To help users answer the question of how (real-world) entities or concepts are related, we may need to go well beyond the borders of traditional information retrieval systems. In this paper, we explore the possibility of exploiting the Wikipedia link graph as a knowledge base for finding interesting connections between two or more given concepts, each described by a Wikipedia article. We use a modified Spreading Activation algorithm to identify connections between input concepts. The main challenge in our approach lies in assessing the strength of the relation defined by a link between articles. We propose two approaches for link weighting and assess them in a user evaluation. Our results show a strong correlation between the weighting methods used and user preferences, indicating that the Wikipedia link graph can be used as a valuable semantic resource.
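As a rough illustration of the idea in this abstract, the sketch below runs a plain spreading activation over a small weighted link graph and reports the articles reached from both input concepts. It is not the modified algorithm or either weighting scheme from the paper, which the abstract does not specify; the function names, the `decay` and `threshold` parameters, and the toy graph are all assumptions made for the example.

```python
from collections import defaultdict


def spread_activation(graph, sources, decay=0.5, threshold=0.01, max_steps=3):
    """Propagate activation from `sources` along weighted outgoing links.

    graph: dict mapping node -> dict of {neighbor: link weight in [0, 1]}.
    Returns a dict mapping each reached node to its accumulated activation.
    """
    activation = defaultdict(float)
    for source in sources:
        activation[source] = 1.0
    frontier = {source: 1.0 for source in sources}

    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = energy * weight * decay
                # Drop contributions that have become too weak to matter.
                if passed >= threshold:
                    next_frontier[neighbor] += passed
        if not next_frontier:
            break
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


def connecting_nodes(graph, a, b, **kwargs):
    """Rank nodes activated from both `a` and `b` (hypothetical scoring)."""
    act_a = spread_activation(graph, [a], **kwargs)
    act_b = spread_activation(graph, [b], **kwargs)
    shared = (set(act_a) & set(act_b)) - {a, b}
    scored = [(node, act_a[node] * act_b[node]) for node in shared]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy link graph with made-up weights (hypothetical data).
toy_graph = {
    "A": {"X": 0.8, "Y": 0.2},
    "B": {"X": 0.6, "Z": 0.9},
}
connections = connecting_nodes(toy_graph, "A", "B")
```

In this toy graph only `X` is linked from both `A` and `B`, so it is the single candidate connection; the product of the two activations is just one possible way to rank candidates.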
Using the Wiktionary Graph Structure for Synonym Detection
Cited by 1 (0 self)
This paper presents our work on using the graph structure of Wiktionary for synonym detection. We implement semantic relatedness metrics using both a direct measure of information flow on the graph and a comparison of the lists of vertices found to be “close” to a given vertex. Our algorithms, evaluated on the ESL 50, TOEFL 80 and RDWP 300 data sets, perform better than or comparably to existing semantic relatedness measures.
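The second kind of metric mentioned above, comparing the vertices that are "close" to each word, can be sketched as follows. This is only one plausible reading: closeness is taken here to mean "within a fixed number of hops", and the overlap is measured with the Jaccard index. Both choices, as well as the function names and the toy graph, are assumptions and not the paper's definitions.

```python
from collections import deque


def close_vertices(graph, start, radius=2):
    """Return all vertices within `radius` hops of `start` (excluding it)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == radius:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen - {start}


def neighborhood_similarity(graph, a, b, radius=2):
    """Jaccard overlap of the two neighborhoods (assumed relatedness score)."""
    near_a = close_vertices(graph, a, radius)
    near_b = close_vertices(graph, b, radius)
    union = near_a | near_b
    return len(near_a & near_b) / len(union) if union else 0.0


# Toy word graph (hypothetical data, not taken from Wiktionary).
word_graph = {
    "big": ["large", "size"],
    "large": ["big", "size"],
    "size": ["big", "large"],
    "cat": ["animal"],
    "animal": ["cat"],
}
score_synonyms = neighborhood_similarity(word_graph, "big", "large", radius=1)
score_unrelated = neighborhood_similarity(word_graph, "big", "cat", radius=1)
```

A synonym test question could then be answered by picking the candidate with the highest score against the target word.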
of the Russian Academy of Sciences
We present a novel method for key term extraction from text documents. In our method, a document is modeled as a graph of semantic relationships between the terms of that document. We exploit the following remarkable feature of the graph: the terms related to the main topics of the document tend to bunch up into densely interconnected subgraphs or communities, while unimportant terms fall into weakly interconnected communities, or even become isolated vertices. We apply graph community detection techniques to partition the graph into thematically cohesive groups of terms. We introduce a criterion function to select the groups that contain key terms and discard the groups with unimportant terms. To weight terms and determine the semantic relatedness between them, we exploit information extracted from Wikipedia. This approach has two advantages. First, it allows effective processing of multi-theme documents. Second, it is good at filtering out noise in the document, such as navigation bars or headers in web pages. Evaluations of the method show that it outperforms existing methods, producing key terms with higher precision and recall. Additional experiments on web pages indicate that our method is substantially more effective than existing methods on noisy and multi-theme documents.
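The grouping step described in this abstract can be illustrated with a deliberately simplified sketch. Instead of a real community detection algorithm, it keeps only edges whose relatedness passes a threshold and takes connected components as groups; instead of the paper's criterion function, it simply discards groups below a minimum size, which removes isolated vertices. The threshold, the size rule, and the relatedness scores are all assumptions for the example.

```python
def term_groups(relatedness, threshold=0.3, min_size=2):
    """Group terms by connected components of the thresholded term graph.

    relatedness: dict mapping (term1, term2) pairs to a relatedness score.
    Connected components stand in for community detection here.
    """
    adjacency = {}
    for (term1, term2), score in relatedness.items():
        adjacency.setdefault(term1, set())
        adjacency.setdefault(term2, set())
        if score >= threshold:
            adjacency[term1].add(term2)
            adjacency[term2].add(term1)

    seen = set()
    groups = []
    for term in adjacency:
        if term in seen:
            continue
        # Depth-first traversal to collect one component.
        stack, component = [term], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        # Stand-in selection rule: drop isolated or tiny groups.
        if len(component) >= min_size:
            groups.append(component)
    return sorted(groups, key=len, reverse=True)


# Hypothetical relatedness scores for terms found on a noisy web page.
scores = {
    ("graph", "vertex"): 0.8,
    ("vertex", "edge"): 0.7,
    ("graph", "menu"): 0.05,
    ("login", "menu"): 0.1,
}
key_term_groups = term_groups(scores)
```

Here the navigation terms `menu` and `login` end up isolated and are dropped, while the three graph-related terms form one group.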
Effective Extraction of Thematically Grouped Key Terms From Text
We present a novel method for extracting key terms from text documents. The important and novel feature of our method is that it produces groups of key terms, where each group contains key terms semantically related to one of the main themes of the document. Our method is based on a combination of two techniques: a Wikipedia-based semantic relatedness measure of terms and an algorithm for detecting the community structure of a network. One advantage of our method is that it does not require any training, as it works on top of the Wikipedia knowledge base. Our experimental evaluation using human judgments shows that our method produces key terms with high precision and recall.
Determining the Spatial Reader Scopes of News Sources
Information sources on the Internet (e.g. web versions of newspapers) usually have an implicit spatial reader scope, termed the audience location, which is the geographical location for which the content has been primarily produced. Knowledge of the spatial reader scope facilitates the construction of a news search engine that provides readers with a set of news sources relevant to the location in which they are interested. In particular, it plays an important role in disambiguating toponyms (i.e. textual specifications of geographical locations) in news articles, as the process of selecting an interpretation for a toponym often reduces to selecting the interpretation that seems natural to those familiar with the audience location. The key to determining the spatial reader scope of news sources is the notion of a local lexicon, which for a location s is a set of concepts such as, but not limited to, names of people, landmarks, and historical events that are spatially related to s. Techniques to automatically generate the local lexicon of a location by using the link structure of Wikipedia are described and evaluated. A key contribution is the improvement of existing methods used in the semantic relatedness domain to extract concepts spatially related to a given location from Wikipedia. Experimental results indicate that knowledge of the audience location significantly improves the disambiguation of textually specified locations in news articles and that using local lexicons is an effective method to determine the spatial reader scopes of news sources.
A Wikipedia Based Semantic Graph Model for Topic Tracking in Blogosphere
- PROCEEDINGS OF THE TWENTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE
There are two key issues for information diffusion in the blogosphere: (1) blog posts are usually short, noisy and contain multiple themes, and (2) information diffusion through the blogosphere is primarily driven by the “word-of-mouth” effect, which makes topics evolve very fast. This paper presents a novel topic tracking approach that deals with these issues by modeling a topic as a semantic graph, in which the semantic relatedness between terms is learned from Wikipedia. For a given topic or post, the named entities, Wikipedia concepts, and their semantic relatedness are extracted to generate the graph model. Noise is filtered out by a graph clustering algorithm. To handle topic evolution, the topic model is enriched by using Wikipedia as background knowledge. Furthermore, graph edit distance is used to measure the similarity between a topic and its posts. The proposed method is tested on real-world blog data. Experimental results show the advantage of the proposed method for tracking topics in short, noisy texts.
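To make the last step concrete, here is a minimal sketch of an edit distance between a topic graph and a post graph. It assumes that every node is uniquely identified by its term label, so no node matching is needed and the distance reduces to counting node and edge insertions and deletions with unit cost. General graph edit distance is much harder to compute, and the abstract does not say which variant or cost model the paper uses; the function name and the example graphs are hypothetical.

```python
def labeled_graph_edit_distance(nodes_a, edges_a, nodes_b, edges_b):
    """Unit-cost edit distance between two graphs with unique node labels.

    Counts the nodes and undirected edges that appear in exactly one graph.
    """
    nodes_a, nodes_b = set(nodes_a), set(nodes_b)
    # frozenset makes each edge order-independent: (u, v) equals (v, u).
    edges_a = {frozenset(edge) for edge in edges_a}
    edges_b = {frozenset(edge) for edge in edges_b}
    return len(nodes_a ^ nodes_b) + len(edges_a ^ edges_b)


# Hypothetical topic and post graphs built from extracted terms.
topic_nodes = {"election", "vote", "senate"}
topic_edges = {("election", "vote"), ("vote", "senate")}
post_nodes = {"election", "vote", "poll"}
post_edges = {("vote", "election"), ("vote", "poll")}

distance = labeled_graph_edit_distance(
    topic_nodes, topic_edges, post_nodes, post_edges
)
```

A smaller distance means the post graph is closer to the topic graph; a tracker could, for example, accept posts whose distance falls under a tuned cutoff.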

