Evaluating and Optimizing Autonomous Text Classification Systems (1995) [72 citations — 9 self]

by David Lewis

Abstract:

Text retrieval systems typically produce a ranking of documents and let a user decide how far down that ranking to go. In contrast, programs that filter text streams, software that categorizes documents, agents which alert users, and many other IR systems must make decisions without human input or supervision. It is important to define what constitutes good effectiveness for these autonomous systems, tune the systems to achieve the highest possible effectiveness, and estimate how the effectiveness changes as new data is processed. We show how to do this for binary text classification systems, emphasizing that different goals for the system lead to different optimal behaviors. Optimizing and estimating effectiveness is greatly aided if classifiers that explicitly estimate the probability of class membership are used.

1 Introduction

Ranked retrieval is the information retrieval (IR) researcher's favorite tool for dealing with information overload. Ranked retrieval systems display docum...
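The abstract's claim that different goals lead to different optimal behaviors can be illustrated with a small sketch. It is not taken from the paper: it assumes a binary classifier that already outputs well-calibrated probabilities of class membership and applies the standard minimum-expected-loss decision rule. The function names `decide` and `expected_errors` and the loss parameters are hypothetical, chosen only for this example.

```python
def decide(probabilities, loss_false_positive=1.0, loss_false_negative=1.0):
    """Return one True/False assignment per document.

    Assumption: each value in `probabilities` is a calibrated estimate of
    P(document belongs to the class). Assigning the class costs
    (1 - p) * loss_false_positive in expectation; withholding it costs
    p * loss_false_negative. Assigning is the cheaper choice exactly when
    p exceeds the threshold computed below.
    """
    threshold = loss_false_positive / (loss_false_positive + loss_false_negative)
    return [p > threshold for p in probabilities]


def expected_errors(probabilities, decisions):
    """Expected number of wrong decisions under the same calibration assumption."""
    return sum((1.0 - p) if assigned else p
               for p, assigned in zip(probabilities, decisions))


if __name__ == "__main__":
    probs = [0.9, 0.6, 0.4, 0.1]  # made-up probability estimates

    # Goal 1: minimize the expected number of errors (equal losses).
    print(decide(probs))

    # Goal 2: a missed document is three times as costly as a false alarm.
    print(decide(probs, loss_false_positive=1.0, loss_false_negative=3.0))

    # The estimates also give an effectiveness estimate without new labels.
    print(expected_errors(probs, decide(probs)))
```

With equal losses the rule assigns the class only when the estimated probability exceeds 0.5; weighting missed documents more heavily lowers the threshold to 0.25, so the same estimates produce a different set of decisions. Because the probabilities are explicit, the expected error count can be computed from the estimates alone, which is the kind of benefit the abstract attributes to probabilistic classifiers.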
