Document compaction for efficient query biased snippet generation (2009)
Venue: ECIR 2009, 31st European Conference on Information Retrieval, volume 5478 of LNCS
Citations: 8 (2 self)
Citations
3230 | Modern Information Retrieval
- Baeza-Yates, Ribeiro-Neto
- 1999
Citation Context ... term’s weight is a numerical value that reflects its importance or impact in that document. We investigate two separate term weighting approaches used in existing text retrieval models, namely TF·IDF [1,18] and Kullback-Leibler divergence [4,7]. TF·IDF weighting. This is similar to Luhn’s concept of term significance [9]. However, in addition to the term’s frequency in a document (TF), the inverse docum...
2212 | On information and sufficiency
- Kullback, Leibler
- 1951
Citation Context ...t reflects its importance or impact in that document. We investigate two separate term weighting approaches used in existing text retrieval models, namely TF·IDF [1,18] and Kullback-Leibler divergence [4,7]. TF·IDF weighting. This is similar to Luhn’s concept of term significance [9]. However, in addition to the term’s frequency in a document (TF), the inverse document frequency (IDF) – reciprocal of th...
961 | A study of smoothing methods for language models applied to information retrieval.
- Zhai, Lafferty
- 2004
Citation Context ...e term t. Kullback-Leibler divergence. The Kullback-Leibler divergence (KLD) estimates the similarity between a document and query by measuring the relative entropy between their corresponding models [7,16]. This measure has also been used as a means of identifying terms for which that document is most likely to be retrieved [3]. Based on this premise, we use KLD to assign a term a weight that indicates it...
798 | The automatic creation of literature abstracts
- Luhn
- 1958
Citation Context ...ge TREC web collections of 10 GB and 100 GB. 2 Related Work Snippets are a form of extractive document summaries. The use of automatic extractive document summarisation dates back to the 1950s, when Luhn [9] proposed that a summary should be composed of the most significant sentences in a document; significant sentences contain clusters of significant terms, and a term is considered significant based on ...
329 | Inverted files for text search engines
- Zobel, Moffat
Citation Context ... term’s weight is a numerical value that reflects its importance or impact in that document. We investigate two separate term weighting approaches used in existing text retrieval models, namely TF·IDF [1,18] and Kullback-Leibler divergence [4,7]. TF·IDF weighting. This is similar to Luhn’s concept of term significance [9]. However, in addition to the term’s frequency in a document (TF), the inverse docum...
303 | The Challenges of Automatic Summarization.
- Hahn, Mani
- 2000
Citation Context ...ant based on its frequency in the document. Similar sentence selection principles have since been the general theme in much of the work in document summarisation in text information retrieval systems [6,10,11,13]. For summaries presented in search result list captions, Tombros and Sanderson study the advantages of query-biased summaries [13], in which sentence fragments that best match the query are selected....
236 | Summarizing text documents: sentence selection and evaluation metrics
- Goldstein, Kantrowitz, et al.
- 1999
Citation Context ...ant based on its frequency in the document. Similar sentence selection principles have since been the general theme in much of the work in document summarisation in text information retrieval systems [6,10,11,13]. For summaries presented in search result list captions, Tombros and Sanderson study the advantages of query-biased summaries [13], in which sentence fragments that best match the query are selected....
151 | Advantages of query biased summaries in information retrieval
- Tombros, Sanderson
- 1998
Citation Context ... that consist of parts of the document (sentences or sentence fragments) that are in some way pertinent to the query. It is perhaps not surprising that users prefer query-biased snippets over generic [13]: by showing the user how the query terms are used in the context of a document, query-biased snippets reduce the need to refer to the full document. However, this quality comes at the cost of increas...
136 | Exploring the similarity space.
- Zobel, Moffat
- 1998
Citation Context ...rare across the collection are given higher weight. The weight of a term t is computed as a product of TF and IDF components. Here, we use a combination of TF and IDF as specified by Zobel and Moffat [17]: $\mathrm{TF} = 1 + \ln(f_{d,t})$, $\mathrm{IDF} = \ln(N / df)$, where $f_{d,t}$ is the raw count of the term t in the document d, N is the total number of documents in the collection, while df is the count of documents that contain...
96 | An information-theoretic approach to automatic query expansion.
- Carpineto, Mori, et al.
- 2001
Citation Context ...t reflects its importance or impact in that document. We investigate two separate term weighting approaches used in existing text retrieval models, namely TF·IDF [1,18] and Kullback-Leibler divergence [4,7]. TF·IDF weighting. This is similar to Luhn’s concept of term significance [9]. However, in addition to the term’s frequency in a document (TF), the inverse document frequency (IDF) – reciprocal of th...
56 | A task-oriented study on the influencing effects of query-biased summarization in the web searching.
- White, Jose, et al.
- 2003
Citation Context ...s generate worse snippets than would have been generated had the full documents been present. Rather than employing human judges to make this decision, as is the case with previous snippet evaluation [13,15], we use simple textual comparison. Snippets generated using the full (unpruned) document are taken as ideal. We generate snippets using the pruned documents and compare them to those generated using ...
52 | Fast generation of result snippets in web search
- Turpin, Tsegay, et al.
- 2007
Citation Context ... of queries for which a document may be fetched, it must retain each document in some form, to be searched and processed with respect to a query each time the document is highly ranked. Turpin et al. [14] studied the computation involved in snippet generation, finding that 70%–80% of snippet generation time is spent fetching a document ... M. Boughanem et al. (Eds.): ECIR 2009, LNCS 5478, pp. 509–520, 200...
39 | A document-centric approach to static index pruning in text retrieval systems.
- Büttcher, Clarke
- 2006
Citation Context ... query by measuring the relative entropy between their corresponding models [7,16]. This measure has also been used as a means of identifying terms for which that document is most likely to be retrieved [3]. Based on this premise, we use KLD to assign a term a weight that indicates its significance in a document, $\mathrm{KLD}(t, d, c) = P(t|d) \log \frac{P(t|d)}{P(t|c)}$ (1) where P(t|M) which computes the probability of...
33 | Generic summaries for indexing in information retrieval
- Sakai, Sparck-Jones
- 2001
Citation Context ...ant based on its frequency in the document. Similar sentence selection principles have since been the general theme in much of the work in document summarisation in text information retrieval systems [6,10,11,13]. For summaries presented in search result list captions, Tombros and Sanderson study the advantages of query-biased summaries [13], in which sentence fragments that best match the query are selected....
29 | Efficient text summarization using lexical chains
- Silber, McCoy
- 2000
Citation Context ...e time it took users to complete search tasks. Despite the utility of query-biased summaries for web search, surprisingly little published work addresses methods for generating them. Silber and McCoy [12] propose an efficient method of generating an intermediate document representation that can then be used to formulate summaries. T...
26 | Improving web search efficiency via a locality based static pruning method.
- Moura
- 2005
Citation Context ...ent of past queries, which is our aim in this work. Several authors have utilized document compaction or pruning schemes for information retrieval tasks other than snippet generation. De Moura et al. [5] use document pruning to retain only the “novel” sentences in a document, with the pruned documents then used to construct a smaller (pruned) inverted index which supports phrase queries. Lu and Calla...
19 | Pruning long documents for distributed information retrieval
- Lu, Callan
- 2002
Citation Context ...se document pruning to retain only the “novel” sentences in a document, with the pruned documents then used to construct a smaller (pruned) inverted index which supports phrase queries. Lu and Callan [8] propose an approach that selects and retains keywords in a document in order to reduce the size of sample documents in a distributed retrieval environment. Billerbeck and Zobel [2] use a keyword-base...
4 | Efficient query expansion with auxiliary data structures.
- Billerbeck, Zobel
- 2006
Citation Context ...ies. Lu and Callan [8] propose an approach that selects and retains keywords in a document in order to reduce the size of sample documents in a distributed retrieval environment. Billerbeck and Zobel [2] use a keyword-based document pruning approach to construct document surrogates for efficient query expansion. Alternatively, document size can be reduced by lossless means, that is, by compression. U...
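The two term-weighting schemes quoted in the citation contexts above (TF·IDF after Zobel and Moffat, and the per-term Kullback-Leibler weight of Equation 1) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the IDF component is ln(N / df), which is a reading of the partly garbled formula in the excerpt, and that P(t|M) is a plain relative frequency (term count divided by total length), which the truncated excerpt does not confirm. The function names `tfidf_weights` and `kld_weights` and their parameters are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(doc_terms, doc_freq, num_docs):
    """Weight each term as TF * IDF, with TF = 1 + ln(f_dt).

    Assumption: IDF = ln(N / df); the excerpt's formula is partly garbled.
    Terms missing from doc_freq are skipped.
    """
    weights = {}
    for term, f_dt in Counter(doc_terms).items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # no collection statistics for this term
        tf = 1.0 + math.log(f_dt)
        idf = math.log(num_docs / df)
        weights[term] = tf * idf
    return weights


def kld_weights(doc_terms, collection_counts, collection_length):
    """Weight each term as P(t|d) * log(P(t|d) / P(t|c)) (Equation 1).

    Assumption: P(t|M) is estimated as count / length (maximum likelihood).
    Terms missing from collection_counts are skipped.
    """
    doc_length = len(doc_terms)
    weights = {}
    for term, f_dt in Counter(doc_terms).items():
        cf = collection_counts.get(term, 0)
        if cf == 0:
            continue  # P(t|c) would be zero, so the ratio is undefined
        p_td = f_dt / doc_length
        p_tc = cf / collection_length
        weights[term] = p_td * math.log(p_td / p_tc)
    return weights


if __name__ == "__main__":
    doc = ["snippet", "snippet", "query"]
    print(tfidf_weights(doc, {"snippet": 10, "query": 1}, num_docs=10))
    print(kld_weights(doc, {"snippet": 50, "query": 1}, collection_length=100))
```

Under either scheme, a pruning step along the lines the excerpts describe would keep the highest-weighted terms or the sentences containing them. The selection rule itself is not shown in the quoted contexts, so it is left out here.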