Applying Support Vector Machines to the TREC-2001 Batch Filtering and Routing Tasks (2001)

by David D. Lewis (Independent)
In Text Retrieval Conference (TREC-10)

Abstract:

this paper. Here's the history:
1. Avi Arampatzis wrote (15-August-2001) to the TREC filtering mailing list, worrying that using only the top 1000 documents in the routing evaluation wouldn't be meaningful, because there were too many positive test documents.
2. As part of the discussion of Avi's observation, I wrote: "While I have not looked at the test data labels, I'll go out on a limb and predict that many groups will have have [sic] test set precision @ 1000 over 90% for a nontrivial number of topics. That suggests that any interesting differences between systems will only kick [sic] among documents well below rank 1000..."
4. Chris Buckley wrote "I'd be surprised with P @ 1000 of over 90% for any topics except those that are defined by a single keyword. That's a comment about reliability of the target categorization, not on system performance. Ie, the system may find 950 documents that should be in the category, but only 850 of those were actually assigned the category."
5. I wrote to Chris off the list, betting him a dinner that some system would get P @ 1000 of over 90% for some topic that was not defined by a single keyword. We discussed a bit how "single keyword" would be defined, and he accepted. (Basically, if Chris can write a single-word query that gets P @ 1000 of 90% or more, the topic doesn't count.)
6. Separately, Paul Kantor wrote to me and the list that "I will buy you a dinner if any system gets 90% @ 1000 for any topic." These were much looser terms than I'd already proposed to Chris, so I happily accepted.
7. Paul conceded on the list on September 6, 2001, after the preliminary results were released and several groups reported 90% @ 1000 results on 30 or so of the topics. I had a nice dinner with Paul at TREC 2001.
8. I am eagerly awaiting the result of Chris Buckley'...
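Every bet in this history turns on precision at 1000 (P @ 1000): the fraction of a system's top 1000 ranked test documents that carry the topic's label. The sketch below is a minimal illustration, not the official TREC scoring code; the function name `precision_at_k` is hypothetical, and it assumes that when a ranking is shorter than k, the missing ranks count as non-relevant (so the denominator is always k).

```python
from typing import Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 1000) -> float:
    """Fraction of the top-k ranked document IDs that appear in `relevant`.

    Assumption (labeled): if `ranking` has fewer than k entries, the empty
    ranks are treated as non-relevant, so we always divide by k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Count labeled-relevant documents among the first k ranks only.
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


if __name__ == "__main__":
    # Buckley's hypothetical: of the top 1000 documents, 950 "should" be in
    # the category, but only 850 were actually assigned it in the test labels.
    ranking = [f"d{i}" for i in range(1000)]   # hypothetical document IDs
    truly_on_topic = set(ranking[:950])
    labeled_on_topic = set(ranking[:850])

    p_true = precision_at_k(ranking, truly_on_topic)
    p_measured = precision_at_k(ranking, labeled_on_topic)
    print(p_true, p_measured)    # 0.95 0.85
    print(p_measured >= 0.90)    # False: this topic would not win the bet
```

Run on Buckley's numbers, the sketch shows his point: label noise alone caps measured P @ 1000 at 0.85 even though the system's "real" precision is 0.95, so clearing the 90% bar requires test labels that are themselves highly consistent.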
