Context-Sensitive Semantic Smoothing for the Language Modeling Approach to Genomic IR
- ACM SIGIR 2006, Aug 6-11, 2006
Cited by 15 (7 self)
Semantic smoothing, which incorporates synonym and sense information into language models, is an effective and potentially significant way to improve retrieval performance. Implemented semantic smoothing models, such as the translation model, which statistically maps document terms to query terms, and a number of works that have followed it, have shown good experimental results. However, these models are unable to incorporate contextual information, so the resulting translation may be mixed and fairly general. To overcome this limitation, we propose a novel context-sensitive semantic smoothing method that decomposes a document or a query into a set of weighted context-sensitive topic signatures and then translates those topic signatures into query terms. In detail, we solve this problem by (1) choosing concept pairs as topic signatures and adopting an ontology-based approach to extract concept pairs; (2) estimating the translation model for each topic signature using the EM algorithm; and (3) expanding document and query models based on topic signature translations. The new smoothing method is evaluated on the TREC 2004/05 Genomics Track collections, and significant improvements are obtained. The MAP (mean average precision) achieves a maximal gain of 33.6% over the simple language model, as well as a 7.8% gain over the language model with context-insensitive semantic smoothing.
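Step (3) of the abstract above, expanding a document model with topic signature translations, can be written as a mixture of two models. The formula below is only a sketch of that idea: the symbols (the mixing coefficient \lambda, the baseline model p_b, and the topic signatures t_k) are assumptions introduced here for illustration, not notation taken from this listing.

```latex
p_{bt}(w \mid d) \;=\; (1 - \lambda)\, p_b(w \mid d) \;+\; \lambda \sum_{k} p(w \mid t_k)\, p(t_k \mid d)
```

Here p_b(w | d) stands for a conventionally smoothed document language model, p(t_k | d) for the weight of topic signature t_k in document d, and p(w | t_k) for the translation probability from that signature to term w. Setting \lambda = 0 recovers the baseline model.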
Topic Signature Language Models for Ad Hoc Retrieval
, 2007
Cited by 7 (1 self)
Semantic smoothing, which incorporates synonym and sense information into language models, is an effective and potentially significant way to improve retrieval performance. Previously implemented semantic smoothing models such as the translation model have shown good experimental results. However, these models are unable to incorporate contextual information. To overcome this limitation, we propose a novel context-sensitive semantic smoothing method that decomposes a document into a set of weighted context-sensitive topic signatures and then maps those topic signatures into query terms. The language model with such context-sensitive semantic smoothing is referred to as the topic signature language model. In detail, we implement two types of topic signatures, depending on whether an ontology exists in the application domain. One is the ontology-based concept and the other is the multiword phrase. The mapping probabilities from each topic signature to individual terms are estimated through the EM algorithm. Document models based on topic signature mapping are then derived. The new smoothing method is evaluated on the TREC 2004/2005 Genomics Track with ontology-based concepts, as well as the TREC Ad Hoc Track (Disks 1, 2, and 3) with multiword phrases. Both experiments show significant improvements over the two-stage language model, as well as the language model with context-insensitive semantic smoothing.
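The abstract above says only that the mapping probabilities are estimated with EM. One common way to set this up, shown here as a hedged sketch rather than the authors' implementation, treats the term counts of the documents containing a topic signature as draws from a mixture of the signature's own model and a background collection model. The function name `estimate_mapping`, the fixed mixture weight `alpha`, and the toy counts are all assumptions made for illustration.

```python
def estimate_mapping(counts, background, alpha=0.5, iterations=50):
    """Estimate p(w | topic signature) with EM under a two-component mixture.

    counts:     term -> count in the documents that contain the signature
    background: term -> collection probability (must be > 0 for every term)
    alpha:      assumed fixed weight of the background component
    """
    total = sum(counts.values())
    # Start from the maximum-likelihood estimate.
    p = {w: c / total for w, c in counts.items()}
    for _ in range(iterations):
        # E-step: probability that an occurrence of w came from the signature model.
        resp = {
            w: (1 - alpha) * p[w] / ((1 - alpha) * p[w] + alpha * background[w])
            for w in counts
        }
        # M-step: re-estimate the signature model from the weighted counts.
        norm = sum(counts[w] * resp[w] for w in counts)
        p = {w: counts[w] * resp[w] / norm for w in counts}
    return p


# Toy data (hypothetical): "the" is frequent everywhere, so it is discounted.
counts = {"gene": 30, "protein": 20, "the": 50}
background = {"gene": 0.01, "protein": 0.01, "the": 0.5}
mapping = estimate_mapping(counts, background)
```

In this setup, terms that are common across the whole collection lose probability mass to terms that are specific to the signature, which is the intended effect of separating out the background component.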
Concept learning and information inferencing on a high-dimensional semantic space
- ACM SIGIR 2004 Workshop on Mathematical/Formal Methods in Information Retrieval (MF/IR'2004), 2004
Cited by 4 (2 self)
How to automatically capture a significant portion of relevant background knowledge and keep it up-to-date has been a challenging problem encountered in current research on logic-based information retrieval. This paper addresses this problem by investigating various information inference mechanisms based on a high-dimensional semantic space constructed from a text corpus using the Hyperspace Analogue to Language (HAL) model. Additionally, the Singular Value Decomposition (SVD) algorithm is considered as an alternative way to enhance the quality of the HAL matrix as well as a mechanism for inferring implicit associations. The different characteristics of these inference mechanisms are demonstrated using examples from the Reuters-21578 collection. Our hope is that the techniques discussed in this paper provide a basis for logic-based IR to progress to large-scale applications. Keywords: Logic-based Information Retrieval, Information Inference
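For readers unfamiliar with HAL, the sketch below shows the usual way such a space is built: a window slides over the text and each word accumulates weights for the words that precede it, with closer words weighted more heavily. This is a minimal illustration under stated assumptions, not the paper's code; the function name `build_hal`, the window size, and the sample sentence are hypothetical.

```python
from collections import defaultdict


def build_hal(tokens, window=5):
    """Return hal[word][preceding_word] -> accumulated co-occurrence weight.

    A word at distance d (1 <= d <= window) before the current word
    contributes a weight of window - d + 1, so adjacent words count most.
    """
    hal = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            hal[word][tokens[j]] += window - distance + 1
    return hal


# Hypothetical sample text.
tokens = "the effects of spreading pollution on the population".split()
hal = build_hal(tokens, window=3)
```

Each row `hal[word]` can then be read as a vector describing `word` by its preceding context; the matrix of all rows is the kind of object to which SVD could be applied.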
Using Markov Chains to Exploit Word Relationships in Information Retrieval
Cited by 3 (0 self)
Document expansion and query expansion aim to add related terms to document and query representations in order to make them more complete. However, most previous studies are limited in two respects: they use either query expansion or document expansion, but not both, and expansion has been limited to directly related words. In this paper, we propose a more general approach: both document and query representations are expanded, and the expansion process also exploits indirect term relationships. The whole process is implemented through Markov chains. Our experiments show that each of these extensions brings additional improvements.
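The idea of reaching indirectly related terms through a Markov chain can be illustrated with a short random walk over a term-to-term transition table. The sketch below is an assumption-laden toy, not the paper's formulation: the functions `step` and `expand`, the geometric step weights controlled by `gamma`, and the transition probabilities are all made up for illustration.

```python
def step(dist, transitions):
    """Advance a term distribution by one step of the chain."""
    nxt = {}
    for term, p in dist.items():
        # Terms without outgoing transitions are treated as absorbing.
        for neighbour, t in transitions.get(term, {term: 1.0}).items():
            nxt[neighbour] = nxt.get(neighbour, 0.0) + p * t
    return nxt


def expand(initial, transitions, gamma=0.5, steps=2):
    """Mix the initial distribution with its 1..steps-step walks.

    Step k receives weight gamma**k; the weights are normalised to sum to 1.
    """
    weights = [gamma ** k for k in range(steps + 1)]
    total = sum(weights)
    expanded = {}
    current = dict(initial)
    for k, weight in enumerate(weights):
        if k > 0:
            current = step(current, transitions)
        for term, p in current.items():
            expanded[term] = expanded.get(term, 0.0) + (weight / total) * p
    return expanded


# Hypothetical relationships: "transport" is related to "car" only via "vehicle".
transitions = {
    "car": {"car": 0.6, "vehicle": 0.4},
    "vehicle": {"vehicle": 0.7, "transport": 0.3},
    "transport": {"transport": 1.0},
}
expanded_query = expand({"car": 1.0}, transitions, steps=2)
```

With a single step only the directly related term "vehicle" is added; the second step is what brings in "transport". The same function could be applied to a document's term distribution, which mirrors the abstract's point that both sides are expanded.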
Assisting Concept Location in Software Comprehension
Cited by 2 (0 self)
Concept location, the problem of associating human-oriented concepts with their counterpart solution-domain concepts, is a fundamental problem that lies at the heart of software comprehension. Recent research has attempted to alleviate the impact of the concept location problem through the application of methods drawn from the Information Retrieval (IR) community. Here we present a new approach based on a complementary IR method which also has a sound basis in cognitive theory. We compare our approach to related work through an experiment and present our conclusions. ...
A Context-Theoretic Framework for Compositionality in Distributional Semantics
Cited by 1 (0 self)
Formalizing “meaning as context” mathematically leads to a new, algebraic theory of meaning, in which composition is bilinear and associative. These properties are shared by other methods that have been proposed in the literature, including the tensor product, vector addition, pointwise multiplication, and matrix multiplication. Entailment can be represented by a vector lattice ordering, inspired by a strengthened form of the distributional hypothesis, and a degree of entailment is defined in the form of a conditional probability. Approaches to the task of recognizing textual entailment, including the use of subsequence matching, lexical entailment probability, and latent Dirichlet allocation, can be described within our framework.
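For reference, the two properties named in the abstract above have standard definitions. Writing the composition of two meaning vectors u and v as u \cdot v (a notation assumed here, not taken from the abstract), bilinearity and associativity state:

```latex
(\alpha u + \beta v) \cdot w = \alpha\,(u \cdot w) + \beta\,(v \cdot w), \qquad
w \cdot (\alpha u + \beta v) = \alpha\,(w \cdot u) + \beta\,(w \cdot v), \qquad
(u \cdot v) \cdot w = u \cdot (v \cdot w)
```

for all vectors u, v, w and scalars \alpha, \beta. Pointwise multiplication of vectors is a simple instance: scaling or summing either argument scales or sums the result, and the grouping of three factors does not matter.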
A Comparison of Various Approaches for Using Probabilistic Dependencies
this article is to study several estimates of relevance models which will be computed based on differing approaches for incorporating term dependency information. In this way, we hope to shed light on the relative merits of term dependency information, as well as provide a theoretical framework for such investigations
Information Flow Analysis with Chinese Text
This article investigates the effectiveness of an information inference mechanism on Chinese text. The information inference derives implicit associations via computation of information flow on a high-dimensional conceptual space, which is approximated by a cognitively motivated lexical semantic space model, namely Hyperspace Analogue to Language (HAL). A dictionary-based Chinese word segmentation system was used to segment words. To evaluate the Chinese-based information flow model, it is applied to query expansion, in which a set of test queries is expanded automatically via information flow computations and documents are retrieved. Standard recall-precision measures are used to measure performance. Experimental results for TREC-5 Chinese queries and the People's Daily corpus suggest that the Chinese information flow model significantly increases average precision, though the increase is not as high as that achieved with English corpora. Nevertheless, there is justification to believe that the HAL-based information flow model, and in turn our psychologistic stance on the next generation of information processing systems, have a promising degree of language independence.
Distributed Systems Technology
This paper describes part of a solution to the interpretation of human-readable policy documents for semi-automatic conformance checking. Using a socio-cognitively motivated representation of shared knowledge, and applying appropriate inference mechanisms from a normative perspective, a mechanism to automatically detect potentially non-conforming blog entries is detailed. Candidate non-conforming blog entries are flagged for a human to make a judgement on whether they should be published. Data from a public corporate blog are analysed, and the results suggest the methodology has merit.

