Building Domain-Specific Search Engines with Machine Learning Techniques (1999)

by Andrew McCallum, Kamal Nigam, Jason Rennie, Kristie Seymore
Citations: 58 - 6 self

BibTeX

@MISC{Mccallum99buildingdomain-specific,
    author = {Andrew McCallum and Kamal Nigam and Jason Rennie and Kristie Seymore},
    title = {Building Domain-Specific Search Engines with Machine Learning Techniques},
    year = {1999}
}

Abstract

Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. For example, www.campsearch.com allows complex queries by age group, size, location and cost over summer camps. Unfortunately, these domain-specific search engines are difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new research in reinforcement learning, text classification and information extraction that automates efficient spidering, populating topic hierarchies, and identifying informative text segments. Using these techniques, we have built a demonstration system: a search engine for computer science research papers. It already contains over 33,000 papers and is publicly available at www.cora.jprc.com. 1 Introduction As the amount of information on the World ...

Citations

6231 Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird - 1977
3116 A tutorial on hidden Markov models and selected applications in speech recognition - Rabiner - 1989
1964 Dynamic Programming - Bellman - 1957
1134 Reinforcement learning: A survey - Kaelbling, Littman, et al. - 1996
632 Text classification from labeled and unlabeled documents using - Nigam, McCallum, et al.
619 A comparison of event models for naive Bayes text classification - McCallum, Nigam - 1998
496 Statistical Language Learning - Charniak - 1993
460 Wrapper induction for information extraction - Kushmerick, Weld, et al. - 1997
422 An Inequality and Associated Maximization Technique in Statistical Estimation of a Markov Process - Baum - 1972
290 Learning to extract symbolic knowledge from the World Wide Web - Craven, DiPasquo, Freitag, et al. - 1998
290 Webwatcher: A tour guide for the world wide web - Joachims, Freitag, et al. - 1997
268 Naive (Bayes) at forty: The independence assumption in information retrieval - Lewis - 1998
253 Efficient crawling through URL ordering - Cho, Garcia-Molina, et al. - 1998
238 Nymble: a High-Performance Learning Name-finder - Bikel, Miller, et al. - 1997
208 Bayes and Empirical Bayes Methods for Data Analysis - Carlin, Louis - 1996
149 On structuring probabilistic dependencies in stochastic language modeling. Computer Speech and Language - Ney, Essen, et al. - 1994
111 Modeling web sources for information integration - Knoblock, Minton, et al. - 1998
110 Bayesian Learning of Probabilistic Language Models - Stolcke - 1994
99 CiteSeer: An Autonomous Web Agent for Automatic Retrieval and Identification of Interesting - Bollacker, Lawrence, et al. - 1998
79 Statistical models for co-occurrence data - Hofmann, Puzicha - 1998
76 Information extraction using hidden Markov models. Master’s thesis - Leek
63 A machine learning architecture for optimizing web search engines - Boyan, Freitag, et al. - 1996
53 A web-based information system that reasons with structured collections of text - Cohen - 1998
45 ARACHNID: Adaptive retrieval agents choosing heuristic neighborhoods for information discovery - Menczer - 1997
35 Learning Page-independent Heuristics for Extracting Data from the Web. accepted for WWW-99 - Cohen, Fan
27 Error bounds for convolutional codes and an asymptotically optimum decoding algorithm - Viterbi - 1967
18 Improving text classification by shrinkage in a hierarchy of classes - McCallum, Rosenfeld, et al. - 1998
11 Text classification from labeled and unlabeled documents using EM - Nigam, McCallum, et al. - 2011
9 Digital libraries based on full-text retrieval - Witten, Nevill-Manning, Cunningham - 1996
8 Efficient crawling through URL ordering - Cho, Garcia-Molina, et al. - 1998
7 Nymble: a high-performance learning name-finder - Bikel, Miller, et al. - 1997
5 Improving text classification by shrinkage in a hierarchy of classes - McCallum, Rosenfeld, et al. - 1998
3 Regression using classification algorithms. Intelligent Data Analysis 1(4) - Torgo, Gama - 1997
1 Regression using classification algorithms. Intelligent Data Analysis 1(4) - Torgo, Gama - 1997