Results 1 - 10
of
11
Machine Learned Sentence Selection Strategies for Query-Biased Summarization
- SIGIR LEARNING TO RANK WORKSHOP
, 2008
"... It has become standard for search engines to augment result lists with document summaries. Each document summary consists of a title, abstract, and a URL. In this work, we focus on the task of selecting relevant sentences for inclusion in the abstract. In particular, we investigate how machine learn ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
It has become standard for search engines to augment result lists with document summaries. Each document summary consists of a title, abstract, and a URL. In this work, we focus on the task of selecting relevant sentences for inclusion in the abstract. In particular, we investigate how machine learning-based approaches can effectively be applied to the problem. We analyze and evaluate several learning to rank approaches, such as ranking support vector machines (SVMs), support vector regression (SVR), and gradient boosted decision trees (GBDTs). Our work is the first to evaluate SVR and GBDTs for the sentence selection task. Using standard TREC test collections, we rigorously evaluate various aspects of the sentence selection problem. Our results show that the effectiveness of the machine learning approaches varies across collections with different characteristics. Furthermore, the results show that GBDTs provide a robust and powerful framework for the sentence selection task and significantly outperform SVR and ranking SVMs on several data sets.
Ad-hoc Object Retrieval in the Web of Data
"... Semantic Search refers to a loose set of concepts, challenges and techniques having to do with harnessing the information of the growing Web of Data (WoD) for Web search. Here we propose a formal model of one specific semantic search task: ad-hoc object retrieval. We show that this task provides a s ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Semantic Search refers to a loose set of concepts, challenges and techniques having to do with harnessing the information of the growing Web of Data (WoD) for Web search. Here we propose a formal model of one specific semantic search task: ad-hoc object retrieval. We show that this task provides a solid framework to study some of the semantic search problems currently tackled by commercial Web search engines. We connect this task to the traditional ad-hoc document retrieval and discuss appropriate evaluation metrics. Finally, we carry out a realistic evaluation of this task in the context of a Web search application.
Columbia University in the Novelty Track at TREC 2004
"... Our system for the Novelty Track at TREC 2004 looks beyond sentence boundaries as well as within sentences to identify novel, nonduplicative passages. It tries to identify text spans of two or more sentences that encompass mini-segments of new information. At the same time, we avoid any pairwise com ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Our system for the Novelty Track at TREC 2004 looks beyond sentence boundaries as well as within sentences to identify novel, nonduplicative passages. It tries to identify text spans of two or more sentences that encompass mini-segments of new information. At the same time, we avoid any pairwise comparison of sentences, but rely on the presence of previously unseen terms to provide evidence of novelty. The system is guided by a number of parameters, both weights and thresholds, that are learned automatically with a randomized hill-climbing algorithm. During learning, we varied the target function to produce configurations that emphasize either precision or recall. We also implemented a straightforward vector-space model as a comparison and to test a combined approach. 1.
The Effect Of Smoothing In Language Models For Novelty Detection
"... The novelty task consists of finding relevant and novel sentences in a ranking of documents given a query. In the literature, different techniques have been applied to address this problem. Nevertheless, little is known about Language Models for novelty detection and, especially, the effect of smoot ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
The novelty task consists of finding relevant and novel sentences in a ranking of documents given a query. In the literature, different techniques have been applied to address this problem. Nevertheless, little is known about Language Models for novelty detection and, especially, the effect of smoothing on the selection of novel sentences. Language Models can be used to study novelty and relevance in a principled way. These statistical models have been shown to perform well empirically in many Information Retrieval tasks. In this work we study formally the effects of smoothing on novelty detection. To this aim, we compare different techniques based on the Kullback-Leibler divergence and we analyze the sensitivity of retrieval performance to the smoothing parameters. The ability of Language Modeling estimation methods to handle quantitatively the uncertainty associated to the use of natural language is a powerful tool that can drive the future development of noveltybased mechanisms.
The Nature of Novelty Detection ∗
, 2006
"... Sentence level novelty detection aims at spotting sentences with novel information from an ordered sentence list. In the task, sentences appearing later in the list with no new meanings are eliminated. For the task of novelty detection, the contributions of this paper are three-fold. First, conceptu ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Sentence level novelty detection aims at spotting sentences with novel information from an ordered sentence list. In the task, sentences appearing later in the list with no new meanings are eliminated. For the task of novelty detection, the contributions of this paper are three-fold. First, conceptually, this paper reveals the computational nature of the task currently overlooked by the Novelty community − Novelty as a combination of partial overlap (PO) and complete overlap (CO) relations between sentences. We define partial overlap between two sentences as a sharing of common facts, while complete overlap is when one sentence covers all of the meanings of the other sentence. Second, technically, a novel approach, the selected pool method is provided which follows naturally from the PO-CO computational structure. We provide formal error analysis for selected pool and methods based on this PO-CO framework. We address the question how accurate must the PO judgments be to outperform the baseline pool method. Third, experimentally, results were presented for all the three novelty datasets currently available. Results show that the selected pool is significantly better or no worse than the current methods, an indication that the term overlap criterion for the PO judgments could be adequately accurate.
Finding New News: Novelty Detection in Broadcast News
"... Abstract. The automatic detection of novelty, or newness, as part of an information retrieval system would greatly improve a searcher’s experience by presenting “documents ” in order of how much extra information they add to what is already known instead of how similar they are to a user’s query. In ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract. The automatic detection of novelty, or newness, as part of an information retrieval system would greatly improve a searcher’s experience by presenting “documents ” in order of how much extra information they add to what is already known instead of how similar they are to a user’s query. In this paper we present a novelty detection system evaluated on the AQUAINT text collection as part of our TREC 2004 Novelty Track experiments. Subsequent to participation in TREC, the algorithm has been evaluated on another collection with its parameters optimized and we present those results here. We also discuss how we are extending the text-only approach to novelty detection to also include input from video analysis. 1
Highly Frequent Terms and Sentence Retrieval
"... Abstract. In this paper we propose a novel sentence retrieval method based on extracting highly frequent terms from top retrieved documents. We compare it against state of the art sentence retrieval techniques, including those based on pseudo-relevant feedback, showing that the approach is robust an ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Abstract. In this paper we propose a novel sentence retrieval method based on extracting highly frequent terms from top retrieved documents. We compare it against state of the art sentence retrieval techniques, including those based on pseudo-relevant feedback, showing that the approach is robust and competitive. Our results reinforce the idea that top retrieved data is a valuable source to enhance retrieval systems. This is especially true for short queries because there are usually few querysentence matching terms. Moreover, the approach is particularly promising for weak queries. We demonstrate that this novel method is able to improve significantly the precision at top ranks when handling poorly specified information needs.
Novelty as a Form of Contextual Re-ranking: Efficient KLD Models and Mixture Models
"... Current Information Retrieval systems are often based on topicality. They estimate relevance by comparing the similarity between the user query and each document. These systems do not take into account important contextual information. More specifically, they do not often apply mechanisms to filter ..."
Abstract
- Add to MetaCart
Current Information Retrieval systems are often based on topicality. They estimate relevance by comparing the similarity between the user query and each document. These systems do not take into account important contextual information. More specifically, they do not often apply mechanisms to filter out redundant information. We interpret context here as the set of chunks of text from the ranked set of documents that the user has already seen. This is a valuable contextual information to guide the retrieval processes in a way that avoids redundancy. It is desirable that the ranking of results is composed by relevant but also novel material. This means that each document must provide to the user unseen information which is related to his need. In this work we study different novelty detection approaches that make good use of this contextual information. We show that these techniques can be applied effectively and efficiently at the sentence level. Symposium Themes: Context-aware retrieval models. 1.
Combining Named Entities and Tags for Novel Sentence Detection ABSTRACT
"... Novel sentence detection aims at identifying novel information from an incoming stream of sentences. Our research applies named entity recognition (NER) and part-of-speech (POS) tagging on sentence-level novelty detection and proposes a mixed method to utilize these two techniques. Furthermore, we d ..."
Abstract
- Add to MetaCart
Novel sentence detection aims at identifying novel information from an incoming stream of sentences. Our research applies named entity recognition (NER) and part-of-speech (POS) tagging on sentence-level novelty detection and proposes a mixed method to utilize these two techniques. Furthermore, we discuss the performance when setting different history sentence sets. Experimental results of different approaches on TREC’04 Novelty Track show that our new combined method outperforms some other novelty detection methods in terms of precision and recall. The experimental observations of each approach are also discussed.
Learning to Rank Relevant and Novel Documents through User Feedback ABSTRACT
"... We consider the problem of learning to rank relevant and novel documents so as to directly maximize a performance metric called Expected Global Utility (EGU), which has several desirable properties: (i) It measures retrieval performance in terms of relevant as well as novel information, (ii) gives m ..."
Abstract
- Add to MetaCart
We consider the problem of learning to rank relevant and novel documents so as to directly maximize a performance metric called Expected Global Utility (EGU), which has several desirable properties: (i) It measures retrieval performance in terms of relevant as well as novel information, (ii) gives more importance to top ranks to reflect common browsing behavior of users, as opposed to existing objective functions based on set-coverage, (iii) accommodates different levels of tolerance towards redundancy, which is not taken into account by existing evaluation measures, and (iv) extends naturally to the evaluation of session-based retrieval comprising multiple ranked lists. Our ground truth is defined in terms of “information nuggets”, which are obviously not known to the retrieval system when processing a new user query. Therefore, our approach uses observable query and document features (words and named entities) as surrogates for nuggets, whose weights are learned based on user feedback in an iterative search session. The ranked list is produced to maximize the weighted coverage of these surrogate nuggets. The optimization of such coverage-based metrics is known to be NP-hard. Therefore, we use a greedy algorithm and show that it guarantees good performance due to the submodularity of the objective function. Our experiments on Topic Detection and Tracking data show that the proposed approach represents an efficient and effective retrieval strategy for maximizing EGU, as compared to a purely-relevance based ranking approach that uses Indri, as well as a MMR-based approach for non-redundant ranking.

