Modeling and Managing Content Changes in Text Databases (2007)

by Panagiotis G. Ipeirotis, Alexandros Ntoulas, Junghoo Cho, Luis Gravano
Citations: 18 (4 self)

BibTeX

@MISC{Ipeirotis07modelingand,
    author = {Panagiotis G. Ipeirotis and Alexandros Ntoulas and Junghoo Cho and Luis Gravano},
    title = {Modeling and Managing Content Changes in Text Databases},
    year = {2007}
}

Abstract

Large amounts of (often valuable) information are stored in web-accessible text databases. “Metasearchers” provide unified interfaces to query multiple such databases at once. For efficiency, metasearchers rely on succinct statistical summaries of the database contents to select the best databases for each query. So far, database selection research has largely assumed that databases are static, so the associated statistical summaries do not evolve over time. However, databases are rarely static and the statistical summaries that describe their contents need to be updated periodically to reflect content changes. In this article, we first report the results of a study showing how the content summaries of 152 real web databases evolved over a period of 52 weeks. Then, we show how to use “survival analysis” techniques in general, and Cox’s proportional hazards regression in particular, to model database changes over time and predict when we should update each content summary. Finally, we exploit our change model to devise update schedules that keep the summaries up to date by contacting databases only when needed, and then we evaluate the
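
Illustrative sketch (not taken from the article): the abstract names Cox's proportional hazards regression but does not give the fitted model, so the Python below only shows the general shape of such a predictor. It assumes a Weibull baseline hazard, and the function names, covariates, coefficients, and threshold are all hypothetical. Under a proportional hazards model the probability that a summary is still up to date at time t is S(t | x) = S0(t)^exp(b·x). With the assumed baseline S0(t) = exp(-rate * t^shape), this can be solved for the time at which that probability falls to a chosen threshold.

```python
import math
from typing import Sequence


def relative_risk(covariates: Sequence[float], coefficients: Sequence[float]) -> float:
    """Cox relative-risk term exp(b . x) for one database."""
    return math.exp(sum(b * x for b, x in zip(coefficients, covariates)))


def survival_probability(t: float,
                         covariates: Sequence[float],
                         coefficients: Sequence[float],
                         baseline_rate: float,
                         baseline_shape: float) -> float:
    """Probability that the content summary is still up to date at time t.

    Assumes a Weibull baseline S0(t) = exp(-baseline_rate * t**baseline_shape);
    this baseline is an assumption of the sketch, not a result from the article.
    """
    risk = relative_risk(covariates, coefficients)
    return math.exp(-baseline_rate * risk * t ** baseline_shape)


def next_update_time(covariates: Sequence[float],
                     coefficients: Sequence[float],
                     baseline_rate: float,
                     baseline_shape: float,
                     threshold: float) -> float:
    """Time at which the survival probability drops to `threshold`.

    Solves exp(-baseline_rate * risk * t**baseline_shape) = threshold for t.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    risk = relative_risk(covariates, coefficients)
    return (-math.log(threshold) / (baseline_rate * risk)) ** (1.0 / baseline_shape)


if __name__ == "__main__":
    # Hypothetical covariate (e.g. a scaled measure of past change) and coefficient.
    weeks = next_update_time([1.0], [0.5], baseline_rate=0.05,
                             baseline_shape=1.0, threshold=0.8)
    print(f"contact the database again after about {weeks:.1f} weeks")
```

In this sketch a database with a larger relative risk is scheduled for an earlier update, which is the behaviour an update schedule that contacts databases "only when needed" would rely on.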

Keyphrases

text database, managing content change, content summary, update schedule, change model, unified interface, associated statistical summary, content change, real web database, database selection research, web-accessible text database, succinct statistical summary, database change, database content, statistical summary, large amount, multiple database, proportional hazard regression, survival analysis technique
