• Documents
  • Authors
  • Tables
  • Log in
  • Sign up
  • MetaCart
  • DMCA
  • Donate

CiteSeerX logo

Tools

Sorted by:
Try your query at:
Semantic Scholar Scholar Academic
Google Bing DBLP
Results 1 - 10 of 3,029
Next 10 →

Mining Text Snippets for Images on the Web

by Anitha Kannan, Simon Baker, Krishnan Ramnath, Juliet Fiss, Dahua Lin, Lucy Vanderwende
"... Images are often used to convey many different concepts or illustrate many different stories. We propose an algorithm to mine multiple diverse, relevant, and interesting text snippets for images on the web. Our algorithm scales to all images on the web. For each image, all webpages that contain it a ..."
Abstract - Add to MetaCart
Images are often used to convey many different concepts or illustrate many different stories. We propose an algorithm to mine multiple diverse, relevant, and interesting text snippets for images on the web. Our algorithm scales to all images on the web. For each image, all webpages that contain

Data Mining: Concepts and Techniques

by Jiawei Han, Micheline Kamber , 2000
"... Our capabilities of both generating and collecting data have been increasing rapidly in the last several decades. Contributing factors include the widespread use of bar codes for most commercial products, the computerization of many business, scientific and government transactions and managements, a ..."
Abstract - Cited by 3142 (23 self) - Add to MetaCart
, and advances in data collection tools ranging from scanned texture and image platforms, to on-line instrumentation in manufacturing and shopping, and to satellite remote sensing systems. In addition, popular use of the World Wide Web as a global information system has flooded us with a tremendous amount

Matching words and pictures

by Kobus Barnard, Pinar Duygulu, David Forsyth, Nando De Freitas, David M. Blei, Michael I. Jordan - JOURNAL OF MACHINE LEARNING RESEARCH , 2003
"... We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation ..."
Abstract - Cited by 665 (40 self) - Add to MetaCart
We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto

Web Document Clustering: A Feasibility Demonstration

by Oren Zamir, Oren Etzioni , 1998
"... Abstract Users of Web search engines are often forced to sift through the long ordered list of document “snippets” returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on the major s ..."
Abstract - Cited by 435 (3 self) - Add to MetaCart
that clusters based on snippets are almost as good as clusters created using the full text of Web documents. To satisfy the stringent requirements of the Web domain, we introduce an incremental, linear time (in the document collection size) algorithm called Suffix Tree Clustering (STC). which creates clusters

Survey of clustering data mining techniques

by Pavel Berkhin , 2002
"... Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in math ..."
Abstract - Cited by 408 (0 self) - Add to MetaCart
applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern

Sentiwordnet: A publicly available lexical resource for opinion mining

by Andrea Esuli, Fabrizio Sebastiani - In In Proceedings of the 5th Conference on Language Resources and Evaluation (LRECÕ06 , 2006
"... Opinion mining (OM) is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. OM has a rich set of applications, ranging from tracking users’ opinions about products ..."
Abstract - Cited by 376 (5 self) - Add to MetaCart
Opinion mining (OM) is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. OM has a rich set of applications, ranging from tracking users’ opinions about

Keyword Extraction for Social Snippets ∗

by Zhenhui Li, Ding Zhou, Yun-fang Juan, Jiawei Han
"... Today, a huge amount of text is being generated for social purposes on social networking services on the Web. Unlike traditional documents, such text is usually extremely short and tends to be informal. Analysis of such text benefit many applications such as advertising, search, and content filterin ..."
Abstract - Cited by 12 (0 self) - Add to MetaCart
Today, a huge amount of text is being generated for social purposes on social networking services on the Web. Unlike traditional documents, such text is usually extremely short and tends to be informal. Analysis of such text benefit many applications such as advertising, search, and content

Improve Web search using image snippets

by Xiao-bing Xue, Zhi-hua Zhou, Zhongfei (mark Zhang - In Proceeding of the 21st National Conference on Artificial Intelligence , 2006
"... The Web has become the largest information repository over the world. Therefore, effectively and efficiently searching the Web becomes a key challenge. Previous research on Web search mainly attempts to exploit the text in the Web pages and the link information between the pages. This paper shows th ..."
Abstract - Cited by 7 (1 self) - Add to MetaCart
that the Web search performance can be enhanced if image information is considered. In detail, a new Web search framework is proposed, where image snippets are extracted for the Web pages, which are then provided along with text snippets to the user such that it is much easier and more accurate for the user

Mining the Web for Bilingual Text

by Philip Resnik - In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics , 1999
"... STRAND (Resnik, 1998) is a language-independent system for automatic discovery of text in parallel translation on the World Wide Web. This paper extends the preliminary STRAND results by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance. ..."
Abstract - Cited by 135 (7 self) - Add to MetaCart
STRAND (Resnik, 1998) is a language-independent system for automatic discovery of text in parallel translation on the World Wide Web. This paper extends the preliminary STRAND results by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance

A large-scale study of the evolution of web pages

by Dennis Fetterly, Mark Manasse, Marc Najork, Janet L. Wiener - In Proceedings of the 12th International World Wide Web Conference , 2003
"... How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who m ..."
Abstract - Cited by 241 (5 self) - Add to MetaCart
mine the web, including all the popular search engines, but few studies have been performed to date to answer them. One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum
Next 10 →
Results 1 - 10 of 3,029
Powered by: Apache Solr
  • About CiteSeerX
  • Submit and Index Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University