Abstract:
We present a principled methodology for filtering news stories by formal measures of information novelty, and show how the techniques can be used to custom-tailor newsfeeds based on information that a user has already reviewed. We review methods for analyzing novelty and then describe Newsjunkie, a system that personalizes news by identifying the novelty of stories in the context of stories the user has already reviewed. Newsjunkie employs novelty-analysis algorithms that represent articles in terms of words and named entities. The algorithms analyze inter- and intra-document dynamics by considering how information evolves over time, both from article to article and within individual articles. We report the results of a user study that gauges the value of the approach relative to legacy time-based review of newsfeeds, and that compares the performance of alternative distance metrics used to estimate the dissimilarity between candidate new articles and sets of previously reviewed articles.
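The abstract does not name the distance metrics it compares. The sketch below illustrates the general idea of scoring a candidate article against the pooled content of previously reviewed articles, assuming two common choices: cosine distance between term-count vectors and Kullback-Leibler divergence between additively smoothed word distributions. The tokenizer, the function names (`bag_of_words`, `cosine_distance`, `kl_distance`, `rank_by_novelty`) and the smoothing parameter `alpha` are hypothetical. The sketch also ignores named entities and intra-document dynamics, so it is not Newsjunkie's implementation.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Counts = Dict[str, int]


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a deliberately simple tokenizer (an assumption)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_distance(candidate: Counts, history: Counts) -> float:
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(n * history.get(w, 0) for w, n in candidate.items())
    norm_c = math.sqrt(sum(n * n for n in candidate.values()))
    norm_h = math.sqrt(sum(n * n for n in history.values()))
    if norm_c == 0 or norm_h == 0:
        return 1.0  # nothing to compare against: treat as maximally novel
    return 1.0 - dot / (norm_c * norm_h)


def kl_distance(candidate: Counts, history: Counts, alpha: float = 0.5) -> float:
    """KL(candidate || history) over the joint vocabulary.

    Additive smoothing (hypothetical alpha) gives every word nonzero mass,
    so words never seen in the history do not make the divergence infinite.
    """
    vocab = set(candidate) | set(history)
    if not vocab:
        return 0.0
    v = len(vocab)
    total_c = sum(candidate.values()) + alpha * v
    total_h = sum(history.values()) + alpha * v
    kl = 0.0
    for w in vocab:
        p = (candidate.get(w, 0) + alpha) / total_c
        q = (history.get(w, 0) + alpha) / total_h
        kl += p * math.log(p / q)
    return kl


def rank_by_novelty(candidates: List[str], reviewed: List[str],
                    metric: Callable[[Counts, Counts], float]) -> List[str]:
    """Order candidates from most to least novel relative to the pooled
    word counts of every article the user has already reviewed."""
    history: Counter = Counter()
    for doc in reviewed:
        history.update(bag_of_words(doc))
    scored = [(metric(bag_of_words(c), history), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


if __name__ == "__main__":
    reviewed = ["Storm hits coast, thousands evacuated",
                "Storm damage along the coast estimated"]
    candidates = ["Storm hits coast again",
                  "Governor announces relief fund for storm victims"]
    print(rank_by_novelty(candidates, reviewed, cosine_distance))
    print(rank_by_novelty(candidates, reviewed, kl_distance))
```

Pooling all reviewed articles into one history distribution is only one design choice. An alternative is to take the minimum distance to any single reviewed article, which penalizes a candidate that closely repeats even one earlier story.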