Abstract:
Text retrieval systems typically produce a ranking of documents and let a user decide how far down that ranking to go. In contrast, programs that filter text streams, software that categorizes documents, agents which alert users, and many other IR systems must make decisions without human input or supervision. It is important to define what constitutes good effectiveness for these autonomous systems, tune the systems to achieve the highest possible effectiveness, and estimate how the effectiveness changes as new data is processed. We show how to do this for binary text classification systems, emphasizing that different goals for the system lead to different optimal behaviors. Optimizing and estimating effectiveness is greatly aided if classifiers that explicitly estimate the probability of class membership are used.

1 Introduction

Ranked retrieval is the information retrieval (IR) researcher's favorite tool for dealing with information overload. Ranked retrieval systems display docum...