Results 1 - 10 of 19
Learning to rank answers on large online QA collections
- In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT), 2008
Cited by 20 (3 self)
Abstract
This work describes an answer ranking engine for non-factoid questions built using a large online community-generated question-answer collection (Yahoo! Answers). We show how such collections may be used to effectively set up large supervised learning experiments. Furthermore we investigate a wide range of feature types, some exploiting NLP processors, and demonstrate that using them in combination leads to considerable improvements in accuracy.
Learning Graph Walk Based Similarity Measures for Parsed Text
Cited by 7 (3 self)
Abstract
We consider a parsed text corpus as an instance of a labelled directed graph, where nodes represent words and weighted directed edges represent the syntactic relations between them. We show that graph walks, combined with existing techniques of supervised learning, can be used to derive a task-specific word similarity measure in this graph. We also propose a new path-constrained graph walk method, in which the graph walk process is guided by high-level knowledge about meaningful edge sequences (paths). Empirical evaluation on the task of named entity coordinate term extraction shows that this framework is preferable to vector-based models for small-sized corpora. It is also shown that the path-constrained graph walk algorithm yields both performance and scalability gains.
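To make the graph-walk idea concrete, here is a minimal sketch of a fixed-length random walk over a labelled directed graph in which each outgoing edge is followed with probability proportional to a weight attached to its edge label. The label weights, the step count and the toy parse graph are hypothetical stand-ins for the parameters a supervised learner would tune; the sketch does not implement the paper's path-constrained variant.

```python
from collections import defaultdict

def graph_walk(graph, label_weights, start, steps):
    """Distribution over nodes after `steps` random-walk steps from `start`.

    graph: node -> list of (edge_label, neighbour) pairs (directed, labelled).
    label_weights: edge_label -> non-negative weight (hypothetical values that a
    learner would tune). Each outgoing edge is followed with probability
    proportional to the weight of its label.
    """
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in dist.items():
            edges = graph.get(node, [])
            total = sum(label_weights.get(label, 0.0) for label, _ in edges)
            if total == 0.0:
                # No usable outgoing edge: keep the mass where it is.
                nxt[node] += mass
                continue
            for label, nbr in edges:
                nxt[nbr] += mass * label_weights.get(label, 0.0) / total
        dist = dict(nxt)
    return dist

# Hypothetical toy parse graph: "flew to London", "flew to Paris", "flew yesterday".
graph = {
    "London": [("prep_to-inv", "flew")],
    "Paris": [("prep_to-inv", "flew")],
    "flew": [("prep_to", "London"), ("prep_to", "Paris"), ("tmod", "yesterday")],
}
weights = {"prep_to": 1.0, "prep_to-inv": 1.0, "tmod": 0.5}
print(graph_walk(graph, weights, "London", 2))
```

With these toy weights, two steps from London place 0.4 of the probability mass on Paris and only 0.2 on yesterday, so the coordinate term outranks the temporal modifier; in a ranking setting the query node itself would normally be excluded from the results.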
Adaptive Graph Walk Based Similarity Measures in Entity-Relation Graphs
, 2008
Cited by 2 (0 self)
Abstract
Relational or semi-structured data is naturally represented by a graph schema, where nodes denote entities and directed typed edges represent the relations between them. Such graphs are heterogeneous in the sense that they describe different types of objects and multiple types of links. For example, email data can be described in a graph that includes messages, persons, dates and other objects; in this graph, a message may be associated with a person through different relations, such as “sent-to”, “sent-from” and so on. In the past, researchers have suggested applying random graph walks in order to elicit a measure of similarity between entities that are not directly connected in a graph. In this thesis, we suggest a general framework in which different arbitrary queries (for instance, “what persons are most related to this email message?”) are addressed using random walks. Naturally, there are many types of queries possible that correspond to various flavors of inter-entity similarity; several learning techniques are therefore suggested and evaluated that adapt the graph-walk ...
Ranking Community Answers by Modeling Question-Answer Relationships via Analogical Reasoning
Cited by 2 (1 self)
Abstract
The method of finding high-quality answers has a significant impact on users’ satisfaction in a community question answering system. However, due to the lexical gap between questions and answers as well as the spam typically contained in user-generated content, filtering and ranking answers is very challenging. Existing solutions mainly focus on generating redundant features or on finding textual clues using machine learning techniques; none of them treats questions and their answers as relational data, modeling them instead as independent information. Moreover, they consider only the answers to the current question and ignore any previous knowledge that would help bridge the lexical and semantic gap. We assume that answers are connected to their questions by various types of links, i.e. positive links indicating high-quality answers and negative links indicating incorrect answers or user-generated spam, and we propose an analogical reasoning-based approach that measures the analogy between the new question-answer linkages and those of some previous relevant knowledge which contains only positive links; the candidate answer with the most analogous link to the supporting set is assumed to be the best answer. We conducted our experiments on 29.8 million Yahoo! Answers question-answer threads and showed the effectiveness of the proposed approach.
Learning to Rank Answers to Non-Factoid Questions from Web Collections
Cited by 2 (0 self)
Abstract
This work investigates the use of linguistically motivated features to improve search, in particular for ranking answers to non-factoid questions. We show that it is possible to exploit existing large collections of question–answer pairs (from online social Question Answering sites) to extract such features and train ranking models which combine them effectively. We investigate a wide range of feature types, some exploiting natural language processing, such as coarse word sense disambiguation, named-entity identification, syntactic parsing, and semantic role labeling. Our experiments demonstrate that linguistic features, in combination, yield considerable improvements in accuracy. Depending on the system settings, we measure relative improvements of 14% to 21% in Mean Reciprocal Rank and Precision@1, providing some of the most compelling evidence to date that complex linguistic features such as word senses and semantic roles can have a significant impact on large-scale information retrieval tasks.
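The gains above are stated as relative improvements in Mean Reciprocal Rank (MRR) and Precision@1. A small sketch of how these quantities are conventionally computed follows; the per-question relevance judgments are invented toy data, not the paper's results, and the function names are illustrative.

```python
def mean_reciprocal_rank(runs):
    """runs: one list per question of booleans in rank order (True = correct answer).
    A question with no correct answer in its list contributes 0."""
    total = 0.0
    for labels in runs:
        for rank, correct in enumerate(labels, start=1):
            if correct:
                total += 1.0 / rank
                break
    return total / len(runs)

def precision_at_1(runs):
    """Fraction of questions whose top-ranked answer is correct."""
    return sum(1 for labels in runs if labels and labels[0]) / len(runs)

def relative_improvement(baseline, system):
    """Relative gain of `system` over `baseline`, e.g. 0.14 means +14%."""
    return (system - baseline) / baseline

# Toy judgments for three questions (hypothetical data).
runs = [[False, True, False], [True, False], [False, False, True]]
print(mean_reciprocal_rank(runs))  # (1/2 + 1 + 1/3) / 3
print(precision_at_1(runs))        # 1/3
```

Note that a relative improvement is measured against the baseline score, so a 14% relative gain on a low baseline can correspond to a small absolute change.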
Linguistic and Semantic Passage Retrieval Strategies for Question Answering
, 2009
Cited by 1 (1 self)
Abstract
Question Answering (QA) is the task of searching a large text collection for specific answers to questions posed in natural language. Many QA systems rely heavily on Natural Language Processing (NLP) technology, such as syntactic and semantic parsing and named entity recognition, for question analysis and for answer generation. To access the text collection, QA systems often use off-the-shelf bag-of-words Information Retrieval (IR) solutions, which rank results by matching a set of key terms extracted from the question. There is a fundamental disconnect between the capabilities of the bag-of-words retrieval model and the retrieval needs of the QA system. Bag-of-words IR retrieves documents matching a query, but the QA system really needs documents that contain answers. Through question analysis, the QA system has compiled a sophisticated information need representation for what constitutes an answer to the question. This representation is composed of a set of linguistic and semantic constraints ...
An Inverted Index for Storing and Retrieving Grammatical Dependencies
Cited by 1 (0 self)
Abstract
Web count statistics gathered from search engines have been widely used as a resource in a variety of NLP tasks. For some tasks, however, the information they exploit is not fine-grained enough. We propose an inverted index over grammatical relations as a fast and reliable resource to access more general and also more detailed frequency information. To build the index, we use a dependency parser to parse a large corpus. We extract binary dependency relations, such as he-subj-say (he is the subject of say), as index terms and construct the index using publicly available open-source indexing software. The unit we index over is the sentence. The index can be used to extract grammatical relations and frequency counts for these relations. The framework also makes it possible to search for partial dependencies (for example, the frequency of he occurring in subject position), words, strings and combinations of these. One possible application is the disambiguation of syntactic structures.
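A minimal sketch of the indexing idea, assuming dependency triples have already been produced by a parser: each relation becomes an index term in the abstract's dependent-relation-head form (he-subj-say), posting lists hold sentence ids because the sentence is the indexing unit, and partial dependencies are supported by also indexing wildcard terms. A plain Python dictionary stands in for the open-source indexing software, and the `*` wildcard-term scheme is a hypothetical design choice, not necessarily the paper's.

```python
from collections import defaultdict

def build_index(parsed_sentences):
    """Map dependency terms to the set of sentence ids that contain them.

    parsed_sentences: one list per sentence of (head, relation, dependent)
    triples, as a dependency parser might emit them.
    """
    index = defaultdict(set)
    for sent_id, triples in enumerate(parsed_sentences):
        for head, rel, dep in triples:
            index[f"{dep}-{rel}-{head}"].add(sent_id)  # full relation, e.g. he-subj-say
            index[f"{dep}-{rel}-*"].add(sent_id)       # partial: dep in this role, any head
            index[f"*-{rel}-{head}"].add(sent_id)      # partial: any dep in this role of head
    return index

def sentence_frequency(index, term):
    """Number of indexed sentences containing `term` (the sentence is the unit)."""
    return len(index.get(term, ()))

# Hypothetical pre-parsed, lemmatized sentences.
sentences = [
    [("say", "subj", "he"), ("say", "obj", "it")],  # "he said it"
    [("leave", "subj", "he")],                       # "he left"
    [("say", "subj", "she")],                        # "she said"
]
idx = build_index(sentences)
print(sentence_frequency(idx, "he-subj-say"))  # 1
print(sentence_frequency(idx, "he-subj-*"))    # 2
```

Because postings are sets of sentence ids, the counts are sentence frequencies: a relation occurring twice in one sentence is counted once.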
Rank Learning for Factoid Question Answering with Linguistic and Semantic Constraints
Cited by 1 (0 self)
Abstract
This work presents a general rank-learning framework for passage ranking within Question Answering (QA) systems using linguistic and semantic features. The framework enables query-time checking of complex linguistic and semantic ...
Theory, Design
Abstract
This paper introduces a theoretical framework for focused retrieval, based on a formalism called the annotation graph. Annotation graph-based retrieval provides a rich retrieval representation that directly supports query-time constraint-checking of arbitrary relations. This representation can support focused retrieval tasks, such as Question Answering systems, which often have information needs containing constraint types that cannot be queried easily under many retrieval models. The problem of annotation graph-based retrieval is mapped onto existing XML element retrieval functionality in the Indri search engine. The remainder of the paper identifies and discusses the issues that emerged and illustrates by example what, in our opinion, constitute the upcoming research challenges facing the focused retrieval community.
Query formulation
Abstract
Structured documents contain elements defined by the author(s) and annotations assigned by other people or processes. Structured documents pose challenges for probabilistic retrieval models when there are mismatches between the structured query and the actual structure in a relevant document, or erroneous structure introduced by an annotator. This paper makes three contributions. First, a new generative retrieval model is proposed to deal with the mismatch problem. This new model extends the basic keyword language model by treating structure as a hidden variable during the generation process. Second, variations of the model are compared. Third, term-level and structure-level smoothing strategies are studied. Evaluation was conducted with INEX XML retrieval and question-answering retrieval tasks. Experimental results indicate that the optimal structured retrieval model is task dependent, that two-level Dirichlet smoothing significantly outperforms two-level Jelinek-Mercer smoothing, and that with accurate structured queries, the proposed structured retrieval model significantly outperforms keyword retrieval on both QA and INEX datasets.
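For readers comparing the two smoothing families mentioned above, a sketch of the standard single-level Dirichlet-prior and Jelinek-Mercer estimates for a query-likelihood language model follows. The abstract does not spell out how the paper combines term-level and structure-level smoothing, so `two_level_dirichlet` is an assumed chaining (element model backing off to its document, which backs off to the collection), labelled hypothetical; the parameter values in the example are placeholders, not tuned settings.

```python
import math

def dirichlet(tf, length, p_bg, mu):
    """Dirichlet-prior smoothing: (tf + mu * p_bg) / (length + mu)."""
    return (tf + mu * p_bg) / (length + mu)

def jelinek_mercer(tf, length, p_bg, lam):
    """Jelinek-Mercer (linear) smoothing: (1 - lam) * tf / length + lam * p_bg."""
    ml = tf / length if length else 0.0
    return (1.0 - lam) * ml + lam * p_bg

def two_level_dirichlet(tf_elem, elem_len, tf_doc, doc_len, p_coll, mu_elem, mu_doc):
    # Hypothetical two-level scheme: the element model backs off to its
    # document model, which in turn backs off to the collection model.
    p_doc = dirichlet(tf_doc, doc_len, p_coll, mu_doc)
    return dirichlet(tf_elem, elem_len, p_doc, mu_elem)

def query_log_likelihood(query, term_prob):
    """Score a candidate as the sum of log p(w | candidate) over query terms."""
    return sum(math.log(term_prob(w)) for w in query)

print(dirichlet(2, 10, 0.01, mu=100))        # 3 / 110
print(jelinek_mercer(2, 10, 0.01, lam=0.5))  # 0.5 * 0.2 + 0.5 * 0.01
```

The practical difference is that Dirichlet smoothing adapts to length (short elements lean more on the background model), whereas Jelinek-Mercer applies the same interpolation weight regardless of length, which is one plausible reason the two behave differently on short XML elements.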

