Reinforcing web-object categorization through interrelationships. Data (2006)

by G R Xue, Y Yu, D Shen, Q Yang, H J Zeng, Z Chen

Results 1 - 5 of 5

Using the wisdom of the crowds for keyword generation

by Ariel Fuxman, Kannan Achan, Panayiotis Tsaparas, Rakesh Agrawal - In WWW, 2008
Cited by 26 (3 self)
In the sponsored search model, search engines are paid by businesses that are interested in displaying ads for their site alongside the search results. Businesses bid for keywords, and their ad is displayed when the keyword is queried to the search engine. An important problem in this process is keyword generation: given a business that is interested in launching a campaign, suggest keywords that are related to that campaign. We address this problem by making use of the query logs of the search engine. We identify queries related to a campaign by exploiting the associations between queries and URLs as they are captured by the user’s clicks. These queries form good keyword suggestions since they capture the “wisdom of the crowd” as to what is related to a site. We formulate the problem as a semi-supervised learning problem, and propose algorithms within the Markov Random Field model. We perform experiments with real query logs, and we demonstrate that our algorithms scale to large query logs and produce meaningful results.
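The query-URL association described in this abstract can be illustrated with a much simpler baseline than the paper's method. The sketch below is not the Markov Random Field algorithm the authors propose; it is a one-hop co-click count, assuming a click log given as (query, clicked URL) pairs. The function name `suggest_keywords`, the `top_k` parameter, and the example URLs are hypothetical.

```python
from collections import Counter

def suggest_keywords(click_log, seed_urls, top_k=3):
    """Rank queries by the number of clicks they sent to the seed URLs.

    click_log: iterable of (query, clicked_url) pairs (assumed format).
    seed_urls: URLs belonging to the advertiser's campaign.
    """
    seeds = set(seed_urls)
    counts = Counter()
    for query, url in click_log:
        if url in seeds:
            counts[query] += 1
    # Most frequently co-clicked queries come first.
    return [query for query, _ in counts.most_common(top_k)]

# Toy click log; all queries and URLs are made up for illustration.
click_log = [
    ("running shoes", "shoes.example.com"),
    ("running shoes", "shoes.example.com"),
    ("trail sneakers", "shoes.example.com"),
    ("pizza near me", "pizza.example.com"),
]
suggestions = suggest_keywords(click_log, ["shoes.example.com"])
print(suggestions)
```

A one-hop count like this only finds queries that clicked directly on the campaign's own pages; the semi-supervised formulation in the abstract is meant to propagate relatedness further through the click graph.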

Web Page Classification: Features and Algorithms

by Xiaoguang Qi, Brian D. Davison, 2007
Cited by 16 (0 self)
Classification of web page content is essential to many tasks in web information retrieval such as maintaining web directories and focused crawling. The uncontrolled nature of web content presents additional challenges to web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. As we review work in web page classification, we note the importance of these web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.

Query-log mining for detecting spam

by Carlos Castillo, Claudio Corsi, Debora Donato, Paolo Ferragina, Aristides Gionis, 2008
Cited by 3 (1 self)
Every day millions of users search for information on the web via search engines, and provide implicit feedback on the results shown for their queries by clicking on them or not. This feedback is encoded in the form of a query log that consists of a sequence of search actions, one per user query, each describing the following information: (i) terms composing a query, (ii) documents returned by the search engine, (iii) documents that have been clicked, (iv) the rank of those documents in the list of results, (v) date and time of the search action/click, (vi) an anonymous identifier for each session, and more. In this work, we investigate the idea of characterizing the documents and the queries belonging to a given query log with the goal of improving algorithms for detecting spam, both at the document level and at the query level.
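The six fields listed in this abstract map naturally onto a record type. The following is a minimal sketch of one possible representation, not the log format the authors actually used; the class name `SearchAction`, its field names, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class SearchAction:
    """One query-log entry; field names are hypothetical."""
    session_id: str            # (vi) anonymous session identifier
    timestamp: datetime        # (v) date and time of the search action
    query_terms: List[str]     # (i) terms composing the query
    returned_docs: List[str]   # (ii) documents returned, in rank order
    clicked_docs: List[str] = field(default_factory=list)  # (iii) clicked documents

    def clicked_ranks(self) -> List[int]:
        """(iv) 1-based rank of each clicked document in the result list."""
        return [self.returned_docs.index(doc) + 1 for doc in self.clicked_docs]

# Sample entry with made-up values.
action = SearchAction(
    session_id="s-001",
    timestamp=datetime(2008, 1, 15, 9, 30),
    query_terms=["cheap", "flights"],
    returned_docs=["a.example.com", "b.example.com", "c.example.com"],
    clicked_docs=["b.example.com"],
)
print(action.clicked_ranks())
```

Deriving the rank from the ordered result list, rather than storing it separately, keeps the two from drifting apart; a real log would more likely store the rank explicitly, since result pages can contain duplicates.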

Moderated Class membership Interchange in Iterative Multi relational Graph Classifier

by Peter Vojtek, Mária Bieliková
Organizing information resources into classes helps significantly in searching the massive volumes of online documents available through the Web or other information sources such as electronic mail, digital libraries, and corporate databases. Existing classification methods are often based only on the document's own content, i.e. its attributes. Considering relations in the web document space brings better results. We adopt multi-relational classification that interconnects attribute-based classifiers with iterative optimization based on relational heterogeneous graph structures, so that different types of instances and various relation types can be classified together. We establish a moderated class membership spreading mechanism in multi-relational graphs and compare the impact of various levels of regulation in a collective inference classifier. The experiments, based on large-scale graphs originating from the MAPEKUS research project data set (web portals of scientific libraries), demonstrate that moderated class membership spreading significantly increases the accuracy of the relational classifier (up to 10%) and protects instances with heterophilic neighborhoods from being misclassified.

Query-log mining for detecting polysemy and spam

by Carlos Castillo, Claudio Corsi, Debora Donato, Paolo Ferragina, Aristides Gionis, 2008
Abstract not found