Accelerated Focused Crawling through Online Relevance Feedback (2002) [47 citations — 2 self]

by Soumen Chakrabarti, Kunal Punera, Mallela Subramanyam

Abstract:

The organization of HTML into a tag tree structure, which is rendered by browsers as roughly rectangular regions with embedded text and HREF links, greatly helps surfers locate and click on links that best satisfy their information need. Can an automatic program emulate this human behavior and thereby learn to predict the relevance of an unseen HREF target page w.r.t. an information need, based on information limited to the HREF source page? Such a capability would be of great interest in focused crawling and resource discovery, because it can fine-tune the priority of unvisited URLs in the crawl frontier, and reduce the number of irrelevant pages which are fetched and discarded.
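The idea of re-prioritizing unvisited URLs in the crawl frontier can be illustrated with a minimal sketch. It is not the authors' implementation: the `CrawlFrontier` class, the `score_link` function, and the example URLs are hypothetical names introduced here, and the anchor-text overlap score is only a toy stand-in for a learned relevance predictor that sees nothing but the HREF source page.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple


def score_link(anchor_text: str, topic_terms: Set[str]) -> float:
    """Toy stand-in for a learned predictor (an assumption, not the paper's model):
    the fraction of anchor-text tokens that belong to the topic vocabulary."""
    tokens = anchor_text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in topic_terms for t in tokens) / len(tokens)


class CrawlFrontier:
    """Priority queue of unvisited URLs, highest predicted relevance first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []  # (-score, insertion order, url)
        self._best: Dict[str, float] = {}              # best known score per pending URL
        self._visited: Set[str] = set()
        self._counter = 0

    def add(self, url: str, score: float) -> None:
        """Insert a URL, or raise its priority if a better score arrives later."""
        if url in self._visited:
            return
        if score <= self._best.get(url, float("-inf")):
            return
        self._best[url] = score
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self) -> Optional[str]:
        """Return the most promising unvisited URL, skipping stale heap entries."""
        while self._heap:
            neg_score, _, url = heapq.heappop(self._heap)
            if url in self._visited or self._best.get(url) != -neg_score:
                continue  # outdated entry left behind by a later priority update
            self._visited.add(url)
            del self._best[url]
            return url
        return None


frontier = CrawlFrontier()
topic = {"cycling", "bike"}
frontier.add("http://example.org/a", score_link("mountain bike trails", topic))
frontier.add("http://example.org/b", score_link("cycling bike", topic))
frontier.add("http://example.org/c", score_link("stock quotes", topic))
# A second, more relevant link to the same target raises its priority.
frontier.add("http://example.org/a", score_link("bike", topic))
crawl_order = [frontier.pop(), frontier.pop(), frontier.pop()]
```

In this sketch a URL whose incoming links look off-topic sinks to the bottom of the queue, so a crawler with a fixed fetch budget would reach it last or not at all, which is the saving the abstract describes.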

Citations

2329 Introduction to modern information retrieval – Salton - 1983
640 Combining labeled and unlabeled data with co-training – Blum, Mitchell - 1998
514 A comparison of event models for naive bayes text classification – McCallum, Nigam - 1998
468 An agent that assists web browsing – Lieberman - 1995
339 Focused crawling: a new approach to topic-specific (web) resource discovery – Chakrabarti, Berg, et al. - 1999
254 Enhanced hypertext categorization using hyperlinks – Chakrabarti, Dom, et al. - 1998
244 Automatic resource compilation by analyzing hyperlink structure and associated text – Chakrabarti, Dom, et al. - 1998
200 Efficient crawling through URL ordering – Cho, Garcia-Molina, et al. - 1998
186 Automated learning of decision rules for text categorization – Apte, Damerau - 1994
175 A method for disambiguating word senses in a large corpus – Gale, Church, et al. - 1993
147 Focused crawling using context graphs – Diligenti, Coetzee, et al. - 2000
143 Stochastic models for the web graph – Kumar, Raghavan, et al. - 2000
113 Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow – McCallum
88 Using reinforcement learning to spider the web efficiently – Rennie, McCallum - 1999
85 Topical locality in the web – Davison - 2000
70 Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies – Chakrabarti, Dom, Agrawal, Raghavan - 1998
68 Adaptive retrieval agents: Internalizing local context and scaling up to the web. Machine Learning 39(2/3):203–242 – Menczer, Belew - 2000
58 Intelligent crawling on the World Wide Web with arbitrary predicates – Aggarwal, Al-Garawi, et al. - 2001
57 Evaluating topic-driven Web crawlers – Menczer, Pant, et al. - 2001
49 Integrating the Document Object Model with Hyperlinks for Enhanced Topic Distillation and Information Extraction – Chakrabarti
44 Information retrieval in the world-wide web: Making client-based searching feasible – De Bra, Post - 1994
41 Exploring the web with reconnaissance Agents – Lieberman, Fry, et al.
17 WTMS: A System for Collecting and Analyzing Topic-specific Web Information – Mukherjea - 2000
8 Searching for arbitrary information in the WWW: the fish-search for Mosaic – De Bra, Post - 1994
7 Regression by Classification – Torgo, Gama - 1996
4 The shark-search algorithm. An application: Tailored Web site mapping – Hersovici, Jacovi, et al. - 1998
3 Topic distillation and spectral filtering – Chakrabarti, Dom, et al. - 1999
3 WebWatcher: A tour guide for the web – Joachims, Freitag, et al. - 1997
2 Links tell us about lexical and semantic Web content – Menczer - 2001
1 Mining the Web – Mitchell - 2001
1 Focused crawling using TFIDF centroid. Hypertext Retrieval and Mining (CS610) class project, Apr. 2001. Details available from manyam@cs.utexas.edu – Subramanyam, Phanindra, et al.