Results 1 - 10 of 121
Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
In Proceedings of SIGIR’94, 1994
Cited by 289 (9 self)
The 2–Poisson model for term frequencies is used to suggest ways of incorporating certain variables in probabilistic models for information retrieval. The variables concerned are within-document term frequency, document length, and within-query term frequency. Simple weighting functions are developed, and tested on the TREC test collection. Considerable performance improvements (over simple inverse collection frequency weighting) are demonstrated.
Advances in Domain Independent Linear Text Segmentation
2000
Cited by 100 (1 self)
This paper describes a method for linear text segmentation which is twice as accurate and over seven times as fast as the state-of-the-art (Reynar, 1998). Inter-sentence similarity is replaced by rank in the local context. Boundary locations are discovered by divisive clustering.
Automatic Content-Based Retrieval of Broadcast News
Proceedings of ACM Multimedia. San Francisco: ACM, 1995
Cited by 54 (7 self)
This paper presents current work on a video retrieval project at Cambridge University and Olivetti Research Limited (ORL). We show that statistical methods developed for text retrieval are also effective for retrieving and browsing multimedia documents. These methods allow rapid retrieval of news broadcasts by information content determined from teletext subtitles. Information retrieval results for experiments performed on a large archive of news broadcasts are presented. This is made possible by the ORL Medusa system, which allows practical recording, storage, and playback of tens of gigabytes of multimedia data. This work is a step towards practical retrieval of multimedia documents, where the information content is determined from speech recognition performed on the audio soundtrack. We describe the project background, the ORL Medusa multimedia system, and retrieval application, as well as the news broadcast corpus and methods of browsing the retrieved news stories.
An Adaptive Agent for Automated Web Browsing
1995
Cited by 54 (0 self)
The current exponential growth of the Internet precipitates a need for new tools to help people cope with the volume of information. To complement recent work on creating searchable indexes of the World Wide Web and systems for filtering incoming e-mail and Usenet news articles, we describe a system which learns to browse the Internet on behalf of a user. Every day it presents a selection of interesting Web pages. The user evaluates each page, and given this feedback the system adapts and attempts to produce better pages the following day. After demonstrating that our system is able to learn a model of a user with a single well-defined interest, we present an initial experiment where over the course of 24 days the output of our system was compared to both randomly-selected and human-selected pages. It consistently performed better than the random pages, and was better than the human-selected pages half of the time.
Latent Semantic Analysis for Text Segmentation
In Proceedings of EMNLP, 2001
Cited by 44 (1 self)
This paper describes a method for linear text segmentation that is more accurate or at least as accurate as state-of-the-art methods (Utiyama and Isahara, 2001
Open-Vocabulary Speech Indexing for Voice and Video Mail Retrieval
1996
Cited by 39 (2 self)
This paper presents recent work on a multimedia retrieval project at Cambridge University and Olivetti Research Limited (ORL). We present novel techniques that allow extremely rapid audio indexing, at rates approaching several thousand times real time. Unlike other methods, these techniques do not depend on a fixed vocabulary recognition system or on keywords that must be known well in advance. Using statistical methods developed for text, these indexing techniques allow rapid and efficient retrieval and browsing of audio and video documents. This paper presents the project background, the indexing and retrieval techniques, and a video mail retrieval application incorporating content-based audio indexing, retrieval, and browsing.
Scalable Techniques for Clustering the Web
In Proc. of the WebDB Workshop, 2000
Cited by 36 (5 self)
Clustering is one of the most crucial techniques for dealing with the massive amount of information present on the web. Clustering can either be performed once offline, independent of search queries, or performed online on the results of search queries. Our offline approach aims to efficiently cluster similar pages on the web, using the technique of Locality-Sensitive Hashing (LSH), in which web pages are hashed in such a way that similar pages have a much higher probability of collision than dissimilar pages. Our preliminary experiments on the Stanford WebBase have shown that the hash-based scheme can be scaled to millions of URLs.
Improving web search results using affinity graph
In Proceedings of the 28th annual international ACM SIGIR, 2005
Cited by 34 (1 self)
In this paper, we propose a novel ranking scheme named Affinity Ranking (AR) to re-rank search results by optimizing two metrics: (1) diversity, which indicates the variance of topics in a group of documents; (2) information richness, which measures the coverage of a single document to its topic. Both metrics are calculated from a directed link graph named Affinity Graph (AG). AG models the structure of a group of documents based on the asymmetric content similarities between each pair of documents. Experimental results on Yahoo! Directory, ODP Data, and Newsgroup data demonstrate that our proposed ranking algorithm significantly improves the search performance. Specifically, the algorithm achieves a relative improvement of 31% in diversity and 12% in information richness within the top 10 search results.
News Video Classification Using SVM-based Multimodal Classifiers and Combination Strategies
In ACM Multimedia, Juan-les-Pins, 2002
Cited by 28 (4 self)
Video classification is the first step toward multimedia content understanding. When video is classified into conceptual categories, it is usually desirable to combine evidence from multiple modalities. However, combination strategies in previous studies were usually ad hoc. We investigate a meta-classification combination strategy using a Support Vector Machine, and compare it with probability-based strategies. Text features from closed captions and visual features from images are combined to classify broadcast news video. The experimental results show that combining multimodal classifiers can significantly improve recall and precision, and our meta-classification strategy gives better precision than the approach of taking the product of the posterior probabilities.
Video Mail Retrieval: The Effect of Word Spotting Accuracy on Precision
Proceedings of ICASSP 95, 1995
Cited by 26 (7 self)
The goal of the Video Mail Retrieval project is to integrate state-of-the-art document retrieval methods with high-accuracy word spotting to yield a robust and efficient retrieval system. This paper describes a preliminary study to determine the extent to which retrieval precision is affected by word spotting performance. It includes a description of the database design, the word spotting algorithm, and the information retrieval method used. Results are presented which show audio retrieval performance very close to that of text.

