Information retrieval on the Web
- ACM Computing Surveys
, 2000
Cited by 58 (0 self)
In this paper we review studies of the growth of the Internet and technologies that are useful for information search and retrieval on the Web. We present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts, and Web sites. Although numerical figures vary, overall trends cited ...
Web Spam, Propaganda and Trust
, 2005
Cited by 14 (2 self)
Web spamming, the practice of introducing artificial text and links into web pages to affect the results of searches, has been recognized as a major problem for search engines. It is also a serious problem for users because they are not aware of it and they tend to confuse trusting the search engine with trusting the results of a search. In this paper, we first analyze the influence that web spam has on the evolution of the search engines and we identify the strong relationship of spamming methods to propagandistic techniques in society. Our analysis provides a foundation for understanding why spamming works and offers new insight on how to address it. In particular, it suggests that one could use anti-propagandistic techniques on the web to recognize spam. The second part of the paper demonstrates such a technique, called backwards propagation of distrust. In society, recognition of an untrustworthy message (in the opinion of a particular person or other social entity) is a reason for questioning the entities that recommend the message. Entities that are found to strongly support untrustworthy messages become untrustworthy themselves. So, social distrust is propagated backwards for a number of steps. Our algorithm simulates this social behavior on the web graph. Starting from a site s that the end user considers untrustworthy, we examine its trust neighborhood, that is, the set of sites that link to s within a few steps. Evaluating the members of this neighborhood, we identify a biconnected component (BCC) with a high percentage of untrustworthy sites. BCCs are formed when there are multiple paths to reach s, indicating a concerted effort to promote s. This is not the case when starting from a trustworthy site. Our tool explores thousands o...
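The neighborhood-plus-BCC step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' tool: the function names `trust_neighborhood`, `biconnected_components` and `suspicious_components`, the default `steps=3`, and the `threshold=0.5` cutoff for "a high percentage of untrustworthy sites" are all hypothetical choices. Links are treated as undirected when finding BCCs, using Tarjan's edge-stack algorithm.

```python
from collections import defaultdict, deque


def trust_neighborhood(links, s, steps):
    """Return s plus every site that reaches s in at most `steps` links."""
    incoming = defaultdict(set)  # incoming[v] = sites that link to v
    for src, dst in links:
        incoming[dst].add(src)
    seen = {s}
    frontier = deque([(s, 0)])
    while frontier:  # breadth-first search over reversed links
        node, depth = frontier.popleft()
        if depth == steps:
            continue
        for pred in incoming[node]:
            if pred not in seen:
                seen.add(pred)
                frontier.append((pred, depth + 1))
    return seen


def biconnected_components(nodes, edges):
    """Tarjan's edge-stack algorithm on the undirected version of the graph.

    Returns a list of vertex sets. The DFS is recursive: fine for small
    neighborhoods, but very deep graphs may hit Python's recursion limit.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    disc, low, stack, comps = {}, {}, [], []

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)  # discovery order 0, 1, 2, ...
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:  # tree edge
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:  # u separates v's subtree: pop one BCC
                    comp = set()
                    while True:
                        edge = stack.pop()
                        comp.update(edge)
                        if edge == (u, v):
                            break
                    comps.append(comp)
            elif disc[v] < disc[u]:  # back edge to an ancestor
                stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for n in nodes:
        if n not in disc:
            dfs(n, None)
    return comps


def suspicious_components(links, s, untrusted, steps=3, threshold=0.5):
    """BCCs containing s whose share of known-untrusted sites is >= threshold.

    `untrusted` must be a set of site names (hypothetical user judgments).
    """
    hood = trust_neighborhood(links, s, steps)
    sub_edges = [(u, v) for u, v in links if u in hood and v in hood]
    flagged = []
    for comp in biconnected_components(hood, sub_edges):
        # more than two sites means s is reached along more than one path
        if s in comp and len(comp) > 2:
            share = len(comp & untrusted) / len(comp)
            if share >= threshold:
                flagged.append((comp, share))
    return flagged


if __name__ == "__main__":
    links = [("a", "s"), ("b", "s"), ("c", "a"), ("c", "b"), ("x", "c")]
    for comp, share in suspicious_components(links, "s", {"s", "a", "b"}):
        print(sorted(comp), share)
```

In this toy graph the only flagged component is {a, b, c, s} with a share of 0.75, because c reaches s along two paths; the single link from x is a bridge and forms no such component. Propagating distrust backwards would then mark c as a candidate for review.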
An efficient algorithm to rank Web resources
, 2000
Cited by 6 (0 self)
How to rank Web resources is critical to Web Resource Discovery (search engines). This paper not only points out the weaknesses of current approaches, but also presents an in-depth analysis of the multidimensionality and subjectivity of rank algorithms. From a dynamics viewpoint, this paper abstracts a user's Web surfing actions as a Markov model. Based on this model, we propose a new rank algorithm. The result of our rank algorithm, which synthesizes the relevance, authority, integrativity and novelty of each Web resource, can be computed efficiently by solving a system of linear equations rather than by iteration. © 2000 Published by Elsevier Science B.V. All rights reserved.
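To illustrate the idea of computing a Markov-model rank directly from linear equations instead of by iteration, here is a sketch using the classic random-surfer model as a stand-in; the abstract does not give the paper's own equations, and this example does not model relevance, integrativity or novelty. The surfer follows a random out-link with probability `d` (an assumed damping factor of 0.85) and otherwise jumps to a random page; pages without out-links are assumed to link to every page. The stationary scores then satisfy (I − d·Pᵀ)r = ((1 − d)/n)·1, which is solved here with plain Gaussian elimination. The names `solve_linear` and `markov_rank` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        acc = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - acc) / m[row][row]
    return x


def markov_rank(out_links, d=0.85):
    """Stationary random-surfer scores obtained from one linear solve."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    # p[i][j] = probability of moving from page i to page j via a link
    p = [[0.0] * n for _ in range(n)]
    for u in nodes:
        targets = out_links.get(u, [])
        if targets:
            for v in targets:
                p[idx[u]][idx[v]] += 1.0 / len(targets)
        else:  # dangling page: assume it links to every page
            for j in range(n):
                p[idx[u]][j] = 1.0 / n
    # (I - d * P^T) r = (1 - d) / n; for d < 1 this matrix is nonsingular
    a = [[(1.0 if i == j else 0.0) - d * p[j][i] for j in range(n)]
         for i in range(n)]
    b = [(1.0 - d) / n] * n
    return dict(zip(nodes, solve_linear(a, b)))


if __name__ == "__main__":
    print(markov_rank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```

Because P is row-stochastic, the solution already sums to 1, so no normalization or convergence test is needed; for large Web graphs a sparse solver would replace the dense elimination shown here.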
An analysis of factors used in search engine ranking
- In: Adversarial Information Retrieval on the Web (2005)
Cited by 6 (1 self)
This paper investigates the influence of different page features on the ranking of search engine results. We use Google as our testbed and analyze the result rankings for several queries of different categories using statistical methods. We reformulate the problem of learning the underlying, hidden scores as binary classification. We then apply both linear and non-linear methods to this problem. In all cases, we split the data into a training set and a test set to obtain a meaningful, unbiased estimator for the quality of our predictor. Although our results clearly show that the scoring function cannot be approximated well using only the observed features, we do obtain many interesting insights along the way and discuss ways of obtaining a better estimate, as well as the principal limitations in trying to do so.
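One common way to turn "learn the hidden scores" into binary classification, offered here as an assumption since the abstract does not spell out the authors' construction, is the pairwise transform: for two results of the same query, the example is the difference of their feature vectors, labeled +1 if the first is ranked higher and −1 otherwise. The sketch below pairs that with a plain perceptron as a simple linear stand-in, and splits train and test sets by query. The feature vectors, the `hidden` weights and all function names are hypothetical.

```python
import random


def score(w, x):
    """Linear score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_examples(rankings):
    """Turn ranked result lists into (feature difference, label) pairs.

    rankings: one list per query, each holding feature vectors ordered
    best-first. Label +1 means the first page of the pair ranks higher.
    """
    examples = []
    for results in rankings:
        for i in range(len(results)):
            for j in range(i + 1, len(results)):
                diff = [a - b for a, b in zip(results[i], results[j])]
                examples.append((diff, 1))                 # i outranks j
                examples.append(([-v for v in diff], -1))  # mirrored pair
    return examples


def train_perceptron(examples, epochs=20):
    """Plain perceptron: predicts sign(w . x) for a difference vector x."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            if y * score(w, x) <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def accuracy(w, examples):
    """Fraction of pairs whose order the weights predict correctly."""
    return sum(1 for x, y in examples if y * score(w, x) > 0) / len(examples)


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [2.0, -1.0, 0.5]  # hypothetical hidden scoring weights
    queries = []
    for _ in range(40):
        pages = [[rng.random() for _ in hidden] for _ in range(10)]
        pages.sort(key=lambda f: score(hidden, f), reverse=True)
        queries.append(pages)
    # split by query so no query contributes pairs to both sets
    train, test = queries[:30], queries[30:]
    w = train_perceptron(pairwise_examples(train))
    print("test pairwise accuracy:", accuracy(w, pairwise_examples(test)))
```

Splitting by query rather than by pair keeps near-duplicate pairs from the same result list out of the test set, which is one way to keep the quality estimate unbiased in the sense the abstract describes.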
Combining Text-, Link-, and Classification-based Retrieval Methods to Enhance Information Discovery on the Web
, 2002
Information Retrieval on the Web: Selected Topics
- IBM Research, Tokyo Research Laboratory
, 1999
Cited by 4 (0 self)
In this paper we review studies on the growth of the Internet and technologies that are useful for information search and retrieval on the Web. In the first section, we present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts and Web sites. Although the numerical figures vary, the overall trends cited by the sources are consistent and point to exponential growth during the coming decade. Internet users are also increasingly using search engines and search services to find specific information of interest. However, users are not satisfied with the performance of the current generation of search engines; the slow speed of retrieval, communication delays, and poor quality of retrieved results (e.g., noise and broken links) are commonly cited problems. The main body of our paper focuses on linear algebraic models and techniques for solving these problems. Keywords: clustering, indexing, information retrieval, Internet, late...
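As a minimal, hypothetical illustration of what a linear algebraic retrieval model looks like, the sketch below shows the basic vector space model: a term-document matrix of raw counts, with documents ranked by the cosine of the angle between each document column and the query vector. It is not claimed to be the specific technique the survey emphasizes; the whitespace tokenizer and the function names are assumptions.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw term counts."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for c in counts] for t in vocab]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return (document index, cosine score) pairs, best match first."""
    vocab, matrix = term_document_matrix(docs)
    q = Counter(query.lower().split())
    qvec = [q[t] for t in vocab]  # query terms outside the vocabulary are dropped
    scored = [(j, cosine(qvec, [row[j] for row in matrix]))
              for j in range(len(docs))]
    return sorted(scored, key=lambda p: (-p[1], p[0]))


if __name__ == "__main__":
    docs = ["web search engines", "linear algebra models",
            "search engine ranking on the web"]
    print(rank("web search", docs))
```

Practical systems usually reweight the raw counts and may work with a low-rank approximation of the matrix; both refinements are omitted to keep the sketch short.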
On the evolution of search engine rankings
- In: Proceedings of the 2009 WEBIST Conference
Cited by 3 (3 self)
Since the early days of the web, users have been relying on them to get informed and make decisions. When the web was relatively small, web directories were built and maintained by human experts who screened and categorized pages according to their characteristics. By the mid-1990s, however, it was apparent that the human-expert model of categorizing web pages did not scale. The first search engines appeared and they have been evolving ever since, taking over the role that web directories used to play. But what need makes a search engine evolve? Beyond the financial objectives, there is a need for quality in search results. Users interact with search engines through search query results. Search engines know that the quality of their ranking will determine how successful they are. If users perceive the results as valuable and reliable, they will use the search engine again. Otherwise, it is easy for them to switch to another search engine. Search results, however, are not simply based on well-designed scientific principles; they are also influenced by web spammers. Web spamming, the practice of introducing artificial text and links into web pages to affect the results of web searches, has been recognized as a major search engine problem. It is also a serious problem for users because they are not aware of it and they tend to confuse trusting the search engine with trusting the results of a search. In this paper, we analyze the influence that web spam has on the evolution of the search engines and we identify the strong relationship of spamming methods on the web to propagandistic techniques in society. Our analysis provides a foundation for understanding why spamming works and offers new insight on how to address it. In particular, it suggests that one could use social anti-propagandistic techniques to recognize web spam.
Commercial web site links
- Internet Research: Electronic Networking Applications and Policy
Cited by 2 (0 self)
Every hyperlink pointing at a web site is a potential source of new visitors, especially one near the top of a results page from a popular search engine. The order of the links in a search results page is often decided by an algorithm that takes into account the number and quality of links to all matching pages. The number of standard links targeted at a site is therefore doubly important, yet little research has touched on the actual interlinkage between business web sites, which numerically dominate the web. This paper discusses business use of the web and related search engine design issues, as well as research on general and academic links, before reporting on a survey of the links published by a relatively random collection of business web sites. The results indicate that around 66% of web sites do carry external links, most of which are targeted at a specific purpose, but that about 17% publish general links, with implications for those designing and marketing web sites.
Literature Review
, 2001
...this paper, IR will imply text-based retrieval unless explicitly stated otherwise.

