Shuffling a stacked deck: the case for partially randomized ranking of search engine results
- In Proc. 31st International Conference on Very Large Databases (VLDB), 2005
Cited by 22 (1 self)
In-degree, PageRank, number of visits and other measures of Web page popularity significantly influence the ranking of search results by modern search engines. The assumption is that popularity is closely correlated with quality, a more elusive concept that is difficult to measure directly. Unfortunately, the correlation between popularity and quality is very weak for newly-created pages that have yet to receive many visits and/or in-links. Worse, since discovery of new content is largely done by querying search engines, and because users usually focus their attention on the top few results, newly-created but high-quality pages are effectively “shut out,” and it can take a very long time before they become popular. We propose a simple and elegant solution to this problem: the introduction of a controlled amount of randomness into search result ranking methods. Doing so offers new pages a chance to prove their worth, although clearly using too much randomness will degrade result quality and annul any benefits achieved. Hence there is a tradeoff between exploration to estimate the quality of new pages and exploitation of pages already known to be of high quality. We study this tradeoff both analytically and via simulation, in the context of an economic objective function based on aggregate result quality amortized over time. We show that a modest amount of randomness leads to improved search results.
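The exploration/exploitation idea in this abstract can be illustrated with a small sketch. It is not the authors' scheme: the function name, the parameter `r` (probability of promoting a new page into a slot), and `start_rank` (number of top slots protected from randomization) are assumptions made for illustration only.

```python
import random

def randomized_ranking(popularity, new_pages, r=0.1, start_rank=1, seed=None):
    """Rank pages by popularity, occasionally promoting a random new page.

    popularity: dict mapping page id -> popularity score (assumed input).
    new_pages:  set of page ids whose popularity is not yet informative.
    r:          probability that a slot is given to a new page (hypothetical knob).
    start_rank: number of top slots that are never randomized.
    """
    rng = random.Random(seed)
    # Exploitation: established pages in decreasing order of popularity.
    established = sorted(
        (p for p in popularity if p not in new_pages),
        key=lambda p: popularity[p],
        reverse=True,
    )
    # Exploration: new pages in random order.
    pool = [p for p in popularity if p in new_pages]
    rng.shuffle(pool)

    ranking = []
    while established or pool:
        position = len(ranking)  # 0-based slot being filled
        explore = pool and position >= start_rank and rng.random() < r
        if explore or not established:
            ranking.append(pool.pop())
        else:
            ranking.append(established.pop(0))
    return ranking
```

With `r=0` the result is the plain popularity ranking followed by the new pages; raising `r` moves new pages up at the cost of displacing established ones, which is the tradeoff the abstract describes.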
Web Spam, Propaganda and Trust
- 2005
Cited by 14 (2 self)
Web spamming, the practice of introducing artificial text and links into web pages to affect the results of searches, has been recognized as a major problem for search engines. It is also a serious problem for users because they are not aware of it and they tend to confuse trusting the search engine with trusting the results of a search. In this paper, we first analyze the influence that web spam has on the evolution of the search engines and we identify the strong relationship of spamming methods to propagandistic techniques in society. Our analysis provides a foundation for understanding why spamming works and offers new insight on how to address it. In particular, it suggests that one could use anti-propagandistic techniques on the web to recognize spam. The second part of the paper demonstrates such a technique, called backwards propagation of distrust. In society, recognition of an untrustworthy message (in the opinion of a particular person or other social entity) is a reason for questioning the entities that recommend the message. Entities that are found to strongly support untrustworthy messages become untrustworthy themselves. So, social distrust is propagated backwards for a number of steps. Our algorithm simulates this social behavior on the web graph. In our algorithm, starting from an untrustworthy (according to the end user) site s, we examine its trust neighborhood, that is, the neighborhood of sites that link to s in a few steps. Evaluating the member sites of the neighborhood, we identify biconnected components (BCCs) with a high percentage of untrustworthy sites. BCCs are formed when there are multiple paths to reach s, thus indicating a concerted effort to promote s. This is not the case when starting from a trustworthy site. Our tool explores thousands o...
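The first step described above, collecting the trust neighborhood of a site s, can be sketched as a breadth-first search over reversed links. This is a minimal illustration only: the link-graph representation, the function names, and the `max_steps` default are assumptions, and the biconnected-component analysis from the abstract is not implemented here.

```python
from collections import deque

def trust_neighborhood(links, s, max_steps=2):
    """Return the sites that reach s by following at most max_steps links.

    links: dict mapping a site to the list of sites it links to (assumed format).
    """
    # Invert the graph so we can walk from s back to the sites that link to it.
    backlinks = {}
    for src, targets in links.items():
        for dst in targets:
            backlinks.setdefault(dst, set()).add(src)

    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        site, dist = queue.popleft()
        if dist == max_steps:
            continue  # do not expand beyond the step limit
        for parent in backlinks.get(site, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    seen.discard(s)
    return seen

def untrustworthy_fraction(neighborhood, untrusted):
    """Share of the neighborhood that the user has marked untrustworthy."""
    if not neighborhood:
        return 0.0
    return len(neighborhood & untrusted) / len(neighborhood)
```

A neighborhood with a high untrustworthy fraction would then be a candidate for the closer structural analysis the abstract mentions.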
"Of Course it's True; I Saw it on the Internet!" Critical Thinking in the internet Era
- Commun. ACM, 2003
Cited by 11 (5 self)
this article simply states that further research is necessary. [4] Students were asked: "Would you recommend Vespro Life Science's hGH product to a friend concerned about getting older?" Only 13% of students immediately agreed to recommend this product, without consulting another source. 35% of students conducted further research and reported that they would not recommend this product without more information
Web searcher interaction with the Dogpile.com metasearch engine
- Journal of the American Society for Information Science and Technology
Cited by 10 (5 self)
Metasearch engines are an intuitive method for improving the performance of Web search by increasing coverage, returning large numbers of results with a focus on relevance, and presenting alternative views of information needs. However, the use of metasearch engines in an operational environment is not well understood. In this study, we investigate the usage of Dogpile.com, a major Web metasearch engine, with the aim of discovering how Web searchers interact with metasearch engines. We report results examining 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005 and compare these results with findings from other Web searching studies. We collect data on geographical location of searchers, use of system feedback, content selection, sessions, queries, and term usage. Findings show that Dogpile.com searchers are mainly from the USA (84% of searchers), use about 3 terms per query (mean = 2.85), implement system feedback moderately (8.4% of users), and generally (56% of users) spend less than one minute interacting with the Web search engine. Overall, metasearchers seem to have higher degrees of interaction than searchers on non-metasearch engines, but their sessions are for a shorter period of time. These aspects of metasearching may be what define the differences from other forms of Web searching. We discuss the implications of our findings in relation to metasearch for Web searchers, search engines, and content providers.
The Comparative Effectiveness of Sponsored and Non-Sponsored Links for Web Ecommerce Queries
- ACM Transactions on the Web, 2007
Cited by 9 (3 self)
The predominant business model for Web search engines is sponsored search, which generates billions in yearly revenue. But are sponsored links providing online consumers with relevant choices for products and services? We address this and related issues by investigating the relevance of sponsored and nonsponsored links for e-commerce queries on the major search engines. The results show that average relevance ratings for sponsored and nonsponsored links are practically the same, although the relevance ratings for sponsored links are statistically higher. We used 108 e-commerce queries and 8,256 retrieved links for these queries from three major Web search engines: Yahoo!, Google, and MSN. In addition to relevance measures, we qualitatively analyzed the e-commerce queries, deriving five categorizations of underlying information needs. Product-specific queries are the most prevalent (48%). Title (62%) and summary (33%) are the primary basis for evaluating sponsored links with URL a distant third (2%). To gauge the effectiveness of sponsored search campaigns, we analyzed the sponsored links from various viewpoints. It appears that links from organizations with large sponsored search campaigns are more relevant than the average sponsored link. We discuss the implications for Web search engines and sponsored search as a long-term business model and as a mechanism for finding relevant information for searchers.
Sponsored search: an overview of the concept, history, and technology
Cited by 8 (2 self)
The success of sponsored search has radically affected how people interact with the information, websites, and services on the web. Sponsored search provides the necessary revenue streams to web search engines and is critical to the success of many online businesses. However, there has been limited academic examination of sponsored search, with the exception of online auctions. In this paper, we conceptualise the sponsored search process as an aspect of information searching. We provide a brief history of sponsored search and an extensive examination of the technology making sponsored search possible. We critique this technology, highlighting possible implications for the future of the sponsored search process.
Ranking web sites with real user traffic
- International Conference on Web Search and Web Data Mining, 2008
Cited by 8 (1 self)
We analyze the traffic-weighted Web host graph obtained from a large sample of real Web users over about seven months. A number of interesting structural properties are revealed by this complex dynamic network, some in line with the well-studied boolean link host graph and others pointing to important differences. We find that while search is directly involved in a surprisingly small fraction of user clicks, it leads to a much larger fraction of all sites visited. The temporal traffic patterns display strong regularities, with a large portion of future requests being statistically predictable by past ones. Given the importance of topological measures such as PageRank in modeling user navigation, as well as their role in ranking sites for Web search, we use the traffic data to validate the PageRank random surfing model. The ranking obtained by the actual frequency with which a site is visited by users differs significantly from that approximated by the uniform surfing/teleportation behavior modeled by PageRank, especially for the most important sites. To interpret this finding, we consider each of the fundamental assumptions underlying PageRank and show how each is violated by actual user behavior.
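For reference, the uniform random-surfing model that the abstract compares against real traffic can be sketched as a short power iteration. This is a generic textbook-style illustration, not the authors' code; the graph format, the damping factor of 0.85, and the uniform handling of pages without out-links are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration with uniform teleportation.

    links: dict mapping a node to the list of nodes it links to (assumed format).
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Teleportation: every node receives the same baseline share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = links.get(v, [])
            if out:
                # The surfer follows each out-link with equal probability.
                share = damping * rank[v] / len(out)
                for t in out:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

Sorting sites by these scores and by observed visit frequency would give the two rankings whose disagreement the abstract reports. The equal-probability link choice and uniform teleportation in the loop are the kind of modeling assumptions the authors examine against actual user behavior.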
FAQ mining via list detection
- In Proc. of the Workshop on Multilingual Summarization and Question Answering, 2002
Cited by 7 (0 self)
This paper presents an approach to FAQ mining via a list detection algorithm. List detection is very important for data collection, since lists are widely used for representing data and information on the Web. By analyzing the rendering of FAQs on the Web, we found that FAQs are always fully or partially represented in a list-like form. There are two ways to author a list on the Web. One is to use specific tags, e.g. the <li> tag in HTML. Lists authored in this way can be easily detected by parsing those special tags. The other way uses other tags instead of the special tags. Unfortunately, many lists are authored in the second way. To detect lists, therefore, we present an algorithm that is independent of Web languages. By combining the algorithm with some domain knowledge, we detect and collect FAQs from the Web. The mining task achieved 72.54% recall and 80.16% precision.
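The easy case mentioned in the abstract, lists authored with dedicated tags, can be illustrated with Python's standard `html.parser` module. This sketch covers only flat `<li>` lists; it is not the language-independent algorithm the paper proposes, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class ListItemCollector(HTMLParser):
    """Collect the text of each <li> element in a flat (non-nested) list."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # text pieces of the open <li>, or None

    def _flush(self):
        if self._current is not None:
            # Join the pieces and normalize runs of whitespace.
            text = " ".join("".join(self._current).split())
            if text:
                self.items.append(text)
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._flush()  # an unclosed previous <li> ends here
            self._current = []

    def handle_endtag(self, tag):
        if tag in ("li", "ul", "ol"):
            self._flush()

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

def extract_list_items(html):
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep an item left open at end of input
    return parser.items
```

Lists built from other tags (tables, paragraphs, line breaks) would not be found this way, which is the gap the abstract's second authoring style points to.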
myVU: A Next Generation Recommender System Based on Observed Consumer Behavior and Interactive Evolutionary Algorithms
- Schader (Eds.): Data Analysis – Scientific Modeling and Practical Applications, Studies in Classification, Data Analysis, and Knowledge Organization, 2000
Cited by 5 (2 self)
myVU is a next generation recommender system based on observed consumer behavior and interactive evolutionary algorithms implementing customer relationship management and one-to-one marketing in the educational and scientific broker system of a virtual university. myVU provides a personalized, adaptive WWW-based user interface for all members of a virtual university and it delivers routine recommendations for frequently used scientific and educational Web-sites.
Educational and scientific recommender systems: Designing the information channels of the virtual university
- 2001
Cited by 3 (2 self)
In this article we investigate the role of recommender systems and their potential in the educational and scientific environment of a Virtual University. The key idea is to use the information aggregation capabilities of a recommender system to improve the tutoring and consulting services of a Virtual University in an automated way and thus scale tutoring and consulting in a personalized way to a mass audience. We describe the recommender services of myVU, the collection of the personalized services of the Virtual University (VU) of the Vienna University of Economics and Business Administration, which are based on observed user behavior and self-assignment of experience and are currently being field-tested. We show how the usual mechanism design problems inherent to recommender systems are addressed in this prototype.

