Link-Based Characterization and Detection of Web Spam
In AIRWeb, 2006
Cited by 38 (8 self)
We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. Using this approach we are able to detect 80.4% of the Web spam in our sample, with only 1.1% of false positives.
A reference collection for Web spam
SIGIR Forum, 2006
Cited by 36 (12 self)
We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating whether the hosts include Web spam aspects or not. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.
Implicit: An agent-based recommendation system for web search
In Proceedings of the 4th International Conference on Autonomous Agents and Multi-Agent Systems, 2005
Cited by 19 (8 self)
The amount of information on the Internet is increasing very fast and, as a result, search is becoming an increasingly hard task. A common solution is to use authority-based search engines. However, for a community of people with similar interests, the quality of results can be improved by also exploiting implicit knowledge. We propose an agent-based recommendation system for supporting communities of people in searching the Web by means of a popular search engine. Agents use data mining techniques to learn and discover users' behaviors, and they interact to share knowledge about the users. We also present a set of experimental results showing, in terms of precision and recall, how interaction increases the performance of the system.
Evaluating the information quality of web sites: A methodology based on fuzzy computing with words
Journal of the American Society for Information Science and Technology, 2006
Cited by 13 (8 self)
An evaluation methodology based on fuzzy computing with words aimed at measuring the information quality of Web sites containing documents is presented. This methodology is qualitative and user oriented because it generates linguistic recommendations on the information quality of the content-based Web sites based on users' perceptions. It is composed of two main components, an evaluation scheme to analyze the information quality of Web sites and a measurement method to generate the linguistic recommendations. The evaluation scheme is based on both technical criteria related to the Web site structure and criteria related to the content of information on the Web sites. It is user driven because the chosen criteria are easily understandable by the users, in such a way that Web visitors can assess them by means of linguistic evaluation judgments. The measurement method is user centered because it generates linguistic recommendations of the Web sites based on the visitors' linguistic evaluation judgments. To combine the linguistic evaluation judgments we introduce two new majority guided linguistic aggregation operators, the Majority guided Linguistic Induced Ordered Weighted Averaging (MLIOWA) and weighted MLIOWA operators, which generate the linguistic recommendations according to the majority of the evaluation judgments provided by different visitors. The use of this methodology could improve tasks such as information filtering and evaluation on the World Wide Web.
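The MLIOWA operators named in this abstract belong, by name, to the Ordered Weighted Averaging (OWA) family. As a rough illustration of the base idea only, the sketch below implements plain numeric OWA: the inputs are sorted in descending order and the weights are applied by position rather than by source. It is not the majority-guided linguistic variant, whose definition the abstract does not give, and the five-level label scale is a hypothetical stand-in for linguistic judgments.

```python
def owa(values, weights):
    """Plain Ordered Weighted Averaging: weights attach to rank positions."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Sort descending so weights[0] applies to the largest input.
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical label scale: judgments are encoded as indices into this list.
LABELS = ["very low", "low", "medium", "high", "very high"]


def aggregate_labels(judgments, weights):
    """Aggregate label judgments by OWA over their indices (assumed encoding)."""
    indices = [LABELS.index(j) for j in judgments]
    return LABELS[round(owa(indices, weights))]
```

With weights `[1, 0, 0]` the operator behaves like a maximum, with `[0, 0, 1]` like a minimum, and with equal weights like an arithmetic mean, which is why the choice of weights controls how strongly the aggregate follows the majority.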
On local estimations of PageRank: A mean field approach
Internet Mathematics
Cited by 1 (0 self)
PageRank is a key element in the success of search engines, allowing them to rank the most important hits in the top screen of results. One key aspect that distinguishes PageRank from other prestige measures such as in-degree is its global nature. From the information provider perspective, this makes it difficult or impossible to predict how their pages will be ranked. Consequently a market has emerged for the optimization of search engine results. Here we study the accuracy with which PageRank can be approximated by in-degree, a local measure made freely available by search engines. Theoretical and empirical analyses lead us to conclude that, given the weak degree correlations in the Web link graph, the approximation can be relatively accurate, giving service and information providers an effective new marketing tool.
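To make the comparison between a global and a local measure concrete, here is a minimal sketch that computes PageRank by power iteration and sets it beside an estimate that depends only on in-degree. The linear form of the estimate (teleport term plus a share proportional to in-degree over mean in-degree) is an assumption about what a mean-field argument yields, not a formula quoted from the abstract, and the toy graph and function names are hypothetical.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {node: [successor, ...]}."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the uniform teleport share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


def in_degree(out_links):
    """Count incoming links per node."""
    deg = {v: 0 for v in out_links}
    for targets in out_links.values():
        for t in targets:
            deg[t] += 1
    return deg


def in_degree_estimate(out_links, damping=0.85):
    """Assumed linear estimate of PageRank from in-degree alone."""
    deg = in_degree(out_links)
    n = len(deg)
    mean_deg = sum(deg.values()) / n  # assumes the graph has at least one link
    return {v: (1.0 - damping) / n + damping * deg[v] / (n * mean_deg)
            for v in deg}


# Hypothetical four-page graph used only for illustration.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
exact = pagerank(toy_graph)
estimate = in_degree_estimate(toy_graph)
```

On this toy graph both measures put page `c` first, but they disagree on `a`: it has a single in-link, yet that link comes from the top-ranked page, which is exactly the kind of degree correlation the estimate ignores.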
Web Spam Detection: link-based and content-based techniques
The Web is both an excellent medium for sharing information and an attractive platform for delivering products and services. This platform is, to some extent, mediated by search engines in order to meet the needs of users seeking information. Search engines are the “dragons” that keep a valuable treasure: information [13]. Given the vast amount of information available on the Web, it is customary to answer queries with only a small set of results (typically 10 or 20 pages at most). Search engines must then rank Web pages in order to create a short list of high-quality results for users. Web spam can significantly deteriorate the quality of search engine results. Thus there is a large incentive for commercial search engines to detect spam pages efficiently and accurately. Here we present the main techniques recently introduced for Web spam detection and demotion.
Applying Aggregation Operators for Information Access Systems: An Application in Digital Libraries
Nowadays, information access on the Web is a major problem in the computer science community. Any major advance in the field of information access on the Web requires the collaboration of different methodologies and research areas. In this paper, the role of aggregation operators in information access on the Web is analyzed. We present some Web methodologies, such as search engines, recommender systems, and Web quality evaluation models, and analyze the way aggregation operators help toward the success of their activities. We also show an application of the aggregation operators in digital libraries. In particular, we introduce a Web information system to analyze the quality of digital libraries that implements an important panel of aggregation operators to obtain the quality assessments. © 2008 Wiley Periodicals, Inc.
Determining Attributes to Maximize Visibility of Objects
IEEE Transactions on Knowledge and Data Engineering
In recent years, there has been significant interest in the development of ranking functions and efficient top-k retrieval algorithms to help users in ad-hoc search and retrieval in databases (e.g., buyers searching for products in a catalog). We introduce a complementary problem: how to guide a seller in selecting the best attributes of a new tuple (e.g., a new product) to highlight so that it stands out in the crowd of existing competitive products and is widely visible to the pool of potential buyers. We develop several formulations of this problem. Although the problems are NP-complete, we give several exact and approximation algorithms that work well in practice. One type of exact algorithm is based on Integer Programming (IP) formulations of the problems. Another class of exact methods is based on maximal frequent itemset mining algorithms. The approximation algorithms are based on greedy heuristics. A detailed performance study illustrates the benefits of our methods on real and synthetic data.
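One way to picture a greedy heuristic for this kind of problem is sketched below. The abstract does not spell out its formulations, so the setting here is an assumption: each past query is a set of required Boolean attributes, a product is visible to a query only if every required attribute is among those it highlights, and the seller may highlight at most `m` attributes. The function names and sample data are hypothetical, and this is one plausible greedy rule rather than the authors' algorithm.

```python
def visibility(selected, queries):
    """Number of queries whose required attributes are all highlighted."""
    return sum(1 for q in queries if q <= selected)


def greedy_select(product_attrs, queries, m):
    """Pick up to m attributes, one at a time, by best immediate gain."""
    available = set(product_attrs)
    # Queries needing an attribute the product lacks can never be satisfied.
    reachable = [q for q in queries if q <= available]
    selected = set()
    for _ in range(min(m, len(available))):
        # Sorted for a deterministic choice among equally good attributes.
        candidates = sorted(available - selected)

        def score(attr):
            satisfied = visibility(selected | {attr}, reachable)
            # Tie-break: how many reachable queries mention this attribute.
            pending = sum(1 for q in reachable if attr in q)
            return (satisfied, pending)

        selected.add(max(candidates, key=score))
    return selected


# Hypothetical query log and product for illustration.
query_log = [{"ac"}, {"ac", "gps"}, {"gps"}, {"turbo", "sunroof"}, {"leather"}]
car = {"ac", "gps", "sunroof", "turbo"}
chosen = greedy_select(car, query_log, m=2)
```

Because a query with two required attributes yields no gain until both are chosen, a purely gain-driven greedy step can stall; the tie-break on pending queries is a small hedge against that, and it is also why greedy gives an approximation rather than an exact answer.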

