Concept-Based Information Retrieval Using Explicit Semantic Analysis
Cited by 30 (0 self)
Information retrieval systems traditionally rely on textual keywords to index and retrieve documents. Keyword-based retrieval may return inaccurate and incomplete results when different keywords are used to describe the same concept in the documents and in the queries. Furthermore, the relationship between these related keywords may be semantic rather than syntactic, and capturing it thus requires access to comprehensive human world knowledge. Concept-based retrieval methods have attempted to tackle these difficulties by using manually built thesauri, by relying on term co-occurrence data, or by extracting latent word relationships and concepts from a corpus. In this article we introduce a new concept-based retrieval approach based on Explicit Semantic Analysis (ESA), a recently proposed method that augments keyword-based text representation with concept-based features, automatically extracted from massive human knowledge repositories such as Wikipedia. Our approach generates new text features automatically, and we have found that high-quality feature selection becomes crucial in this setting to make the retrieval more focused. However, due to the lack of labeled data, traditional feature selection methods cannot be used, hence we propose new methods that use self-generated labeled training data. The resulting system is evaluated on several TREC datasets, showing superior performance over previous state-of-the-art results.
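To make the idea of concept-based features concrete, the following is a minimal sketch of an ESA-style representation, not the authors' system. It assumes three tiny hand-written "concept articles" standing in for Wikipedia pages, and every name in it (`CONCEPTS`, `build_inverted_index`, `esa_vector`, `cosine`) is hypothetical. Each text is mapped to a weighted vector over concepts, so a query and a document can match even when they share no keywords.

```python
import math
from collections import Counter

# Hypothetical toy "concept articles" standing in for Wikipedia pages (assumption).
CONCEPTS = {
    "Automobile": "car engine wheel road driver vehicle",
    "Medicine": "doctor patient drug treatment disease",
    "Computing": "computer software program algorithm data",
}

def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would do far more."""
    return text.lower().split()

def build_inverted_index(concepts):
    """Map each word to {concept name: TF-IDF weight of the word in that concept}."""
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    df = Counter(word for counts in tf.values() for word in counts)
    index = {}
    for name, counts in tf.items():
        for word, count in counts.items():
            idf = math.log(n / df[word]) + 1.0  # +1 keeps every weight positive
            index.setdefault(word, {})[name] = count * idf
    return index

def esa_vector(text, index):
    """Sum the concept weights of every known word in the text."""
    vec = Counter()
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            vec[concept] += weight
    return dict(vec)

def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

INDEX = build_inverted_index(CONCEPTS)
query = esa_vector("vehicle driver", INDEX)
doc_a = esa_vector("car on the road", INDEX)   # shares no keyword with the query
doc_b = esa_vector("drug treatment", INDEX)
```

Here `query` and `doc_a` have no word in common, yet both map to the "Automobile" concept, so their concept-space similarity is high while `doc_b` scores zero. The feature selection step described in the abstract is not modeled in this sketch.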
Argumentative Feedback: A Linguistically-motivated Term Expansion for Information Retrieval
Cited by 6 (1 self)
We report on the development of a new automatic feedback model to improve information retrieval in digital libraries. Our hypothesis is that some particular sentences, selected based on argumentative criteria, can be more useful than others to perform well-known feedback information retrieval tasks. The argumentative model we explore is based on four disjoint classes, which have been very regularly observed in scientific reports: PURPOSE, METHODS, RESULTS, CONCLUSION. To test this hypothesis, we use the Rocchio algorithm as baseline. While Rocchio selects the features to be added to the original query based on statistical evidence, we propose to base our feature selection also on argumentative criteria. Thus, we restrict the expansion to features appearing only in sentences classified into one of our argumentative categories. Our results, obtained on the OHSUMED collection, show a significant improvement when expansion is based on PURPOSE (mean average precision = +23%) and CONCLUSION (mean average precision = +41%) contents rather than on other argumentative contents. These results suggest that argumentation is an important linguistic dimension that could benefit information retrieval.
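The restriction described above can be sketched as a small variant of Rocchio feedback. This is an illustration under stated assumptions, not the paper's implementation: it uses only the positive part of Rocchio (no non-relevant centroid), raw term counts as weights, and sentence labels that are assumed to come from some external argumentative classifier. The function name `rocchio_expand` and its parameters are hypothetical.

```python
from collections import Counter

def rocchio_expand(query_terms, feedback_docs, allowed_classes,
                   alpha=1.0, beta=0.75, top_k=3):
    """Positive-only Rocchio expansion restricted by sentence class.

    feedback_docs: list of documents, each a list of (label, sentence) pairs.
    Only sentences whose label is in allowed_classes contribute terms.
    Returns {term: weight} for the original terms plus top_k new terms.
    """
    query_vec = Counter(query_terms)

    # Sum term counts over the allowed sentences of all feedback documents.
    summed = Counter()
    for doc in feedback_docs:
        for label, sentence in doc:
            if label in allowed_classes:
                summed.update(sentence.lower().split())

    n_docs = len(feedback_docs)
    scores = {term: alpha * count for term, count in query_vec.items()}
    for term, count in summed.items():
        # beta * (average count per feedback document)
        scores[term] = scores.get(term, 0.0) + beta * count / n_docs

    # Keep the original query terms and add the best-scoring new terms.
    new_terms = sorted((t for t in summed if t not in query_vec),
                       key=lambda t: (-scores[t], t))[:top_k]
    return {t: scores[t] for t in list(query_vec) + new_terms}

# Hypothetical feedback documents with hand-assigned sentence labels.
docs = [
    [("PURPOSE", "evaluate aspirin for stroke prevention"),
     ("METHODS", "randomized trial with placebo")],
    [("CONCLUSION", "aspirin reduces stroke risk"),
     ("METHODS", "placebo controlled cohort")],
]
expanded = rocchio_expand(["aspirin"], docs, {"PURPOSE", "CONCLUSION"}, top_k=1)
```

In this toy run, "placebo" appears in both feedback documents but only in METHODS sentences, so it can never enter the expanded query when expansion is limited to PURPOSE and CONCLUSION.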
Towards semantic search and inference in electronic medical records: an approach using concept-based information retrieval
- Australasian Medical Journal, 2012
Cited by 4 (2 self)
Background: This paper presents a novel approach to searching electronic medical records that is based on concept matching rather than keyword matching. Aims: The concept-based approach is intended to overcome specific challenges we identified in searching medical records. Method: Queries and documents were transformed from their term-based originals into medical concepts as defined by the SNOMED-CT ontology. Results: Evaluation on a real-world collection of medical records showed our concept-based approach outperformed a keyword baseline by 25% in Mean Average Precision. Conclusion: The concept-based approach provides a framework for further development of inference-based search systems for dealing with medical data.
Mining Linguistic Cues for Query Expansion: Applications to Drug Interaction Search
Cited by 3 (0 self)
Given a drug under development, what are other drugs or biochemical compounds that it might interact with? Early answers to this question, by mining the literature, are valuable for pharmaceutical companies, both monetarily and in avoiding public relations nightmares. Inferring drug-drug interactions is also important in designing combination therapies for complex diseases including cancers. We study this problem as one of mining linguistic cues for query expansion. By using (only) positive instances of drug interactions, we show how we can extract linguistic cues which can then be used to expand and reformulate queries to improve the effectiveness of drug interaction search. Our approach integrates many learning paradigms: partially supervised classification, association measures for collocation mining, and feature selection in supervised learning. We demonstrate compelling results on using positive examples from the DrugBank database to seed MEDLINE searches for drug interactions. In particular, we show that purely data-driven linguistic cues can be effectively mined and applied to realize a successful domain-specific query expansion framework.
AEHRC & QUT at TREC 2011 Medical Track: a concept-based information retrieval approach
Cited by 2 (1 self)
The Australian e-Health Research Centre and Queensland University of Technology recently participated in the TREC 2011 Medical Records Track. This paper reports on our methods, results and experience using a concept-based information retrieval approach. Our concept-based approach is intended to overcome specific challenges we identify in searching medical records. Queries and documents are transformed from their term-based originals into medical concepts as defined by the SNOMED-CT ontology. Results show our concept-based approach performed above the median in all three performance metrics: bref (+12%), R-prec (+18%) and
Scenario-Driven Information Retrieval: Supporting Rule-Based Monitoring of Subsea Operations
- Information Technology and Control
Cited by 1 (0 self)
The production systems used by the subsea petroleum industry are knowledge- and information-intensive. Any problem needs to be solved quickly and efficiently, avoiding decommissioning or waiting for the symptoms to escalate. This requires precise information to be supplied on time. For this reason we have proposed rule-based monitoring of device performance. However, covering all possible cases by rules is a labour-intensive and nontrivial task. Therefore, in this paper we propose a scenario-driven information retrieval approach to complement rule-based monitoring. The main objective is to automatically formulate a query that is sent to a vector-space model information retrieval engine every time incomplete inference happens, i.e. when a specific case has no rules defined.
DutchHatTrick: Semantic query modeling, ConText, section detection, and match score maximization.
Cited by 1 (0 self)
This report discusses the collaborative work of the ErasmusMC, University of Twente, and the University of Amsterdam on the TREC 2011 Medical track. Here, the task is to retrieve patient visits from the University of Pittsburgh NLP Repository for 35 topics. The repository consists of 101,711 patient reports, and a patient visit was recorded in one or more reports.
Modeling Users for Adaptive Information Retrieval by Capturing User Intent
Cited by 1 (0 self)
In this chapter, we study and present our results on the problem of employing a cognitive user model for Information Retrieval (IR) in which a user's intent is captured and used for improving his/her effectiveness in an information seeking task. The user intent is captured by analyzing the commonality of the retrieved relevant documents. The effectiveness of our user model is evaluated with regard to retrieval performance using an evaluation methodology which allows us to compare with the existing approaches from the information retrieval community while assessing the new features offered by our user model. We compare our approach with the Ide dec-hi approach using term frequency-inverse document frequency weighting, which is considered to be the best traditional approach to relevance feedback. We use the CRANFIELD, CACM and MEDLINE collections, which are very popular in the information retrieval community, to evaluate relevance feedback techniques. The results show that our approach performs better in the initial runs and works competitively with Ide dec-hi in the feedback runs. Additionally, we evaluate the effects of our user modeling approach with human analysts. The results show that our approach retrieves more relevant documents for a specific analyst compared to a keyword-based information retrieval application called Verity Query Language.
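For readers unfamiliar with the Ide dec-hi baseline named above, the following is a minimal sketch of that relevance feedback rule: the new query is the old query plus the vectors of all judged-relevant documents, minus the vector of the single highest-ranked non-relevant document. The function name `ide_dec_hi`, the sparse-dict representation, and the choice to drop non-positive weights are assumptions made for illustration; the chapter's own experimental setup is not reproduced here.

```python
def ide_dec_hi(query, ranked_docs, relevant_ids):
    """Ide dec-hi feedback on sparse term-weight vectors.

    query: {term: weight}
    ranked_docs: list of (doc_id, {term: weight}) in retrieval rank order
    relevant_ids: set of doc_ids judged relevant
    """
    new_query = dict(query)

    # Add every judged-relevant document vector.
    for doc_id, vec in ranked_docs:
        if doc_id in relevant_ids:
            for term, weight in vec.items():
                new_query[term] = new_query.get(term, 0.0) + weight

    # Subtract only the highest-ranked non-relevant document, if any.
    for doc_id, vec in ranked_docs:
        if doc_id not in relevant_ids:
            for term, weight in vec.items():
                new_query[term] = new_query.get(term, 0.0) - weight
            break

    # Assumption: terms whose weight is not positive are dropped.
    return {t: w for t, w in new_query.items() if w > 0}

# Hypothetical ranked list with unit term weights.
ranked = [
    ("d1", {"heart": 1.0, "attack": 1.0}),
    ("d2", {"heart": 1.0, "shaped": 1.0}),   # top-ranked non-relevant document
    ("d3", {"cardiac": 1.0, "attack": 1.0}),
    ("d4", {"heart": 1.0, "candy": 1.0}),    # non-relevant, but not subtracted
]
updated = ide_dec_hi({"heart": 1.0}, ranked, {"d1", "d3"})
```

Only `d2` is subtracted, which is what distinguishes dec-hi from variants that subtract every non-relevant document.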
Expanding the Space of Plausible Solutions in a Medical
In well-defined domains such as Physics, Mathematics, and Chemistry, solutions to a posed problem can objectively be classified as correct or incorrect. In ill-defined domains such as medicine, the classification of solutions to a patient problem as correct or incorrect is much more complex. Typical tutoring systems accept only a small set of approved solutions for each problem scenario fed to the system. Plausible student solutions that fall outside the scope of this small set of approved solutions are rejected as being incorrect, even though these solutions may be acceptable or close to acceptable. This leads to brittleness in the evaluation of student solutions. This paper describes a tutoring system for medical problem-based learning (PBL), which can accept a wide variety of plausible solutions without placing an extensive burden on knowledge acquisition. A widely