Results 1 - 10
of
19
A survey on pagerank computing
- Internet Mathematics
, 2005
"... Abstract. This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. T ..."
Abstract
-
Cited by 42 (0 self)
- Add to MetaCart
Abstract. This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underly PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank and the role of such indices in applications other than web search. We also discuss linkbased search personalization and outline some aspects of PageRank infrastructure from associated measures of convergence to link preprocessing. 1.
Link spam alliances
- In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB
, 2005
"... Link spam is used to increase the ranking of certain target web pages by misleading the connectivity-based ranking algorithms in search engines. In this paper we study how web pages can be interconnected in a spam farm in order to optimize rankings. We also study alliances, that is, interconnections ..."
Abstract
-
Cited by 34 (1 self)
- Add to MetaCart
Link spam is used to increase the ranking of certain target web pages by misleading the connectivity-based ranking algorithms in search engines. In this paper we study how web pages can be interconnected in a spam farm in order to optimize rankings. We also study alliances, that is, interconnections of spam farms. Our results identify the optimal structures and quantify the potential gains. In particular, we show that alliances can be synergistic and improve the rankings of all participants. We believe that the insights we gain will be useful in identifying and combating link spam. 1
Using Rank Propagation and Probabilistic Counting for Link-Based Spam Detection
- In Proceedings of the Workshop on Web Mining and Web Usage Analysis (WebKDD
, 2006
"... This paper describes a technique for automating the detection of Web link spam, that is, groups of pages that are linked together with the sole purpose of obtaining an undeservedly high score in search engines. The problem of Web spam is widespread and di#cult to solve, mostly due to the large size ..."
Abstract
-
Cited by 26 (12 self)
- Add to MetaCart
This paper describes a technique for automating the detection of Web link spam, that is, groups of pages that are linked together with the sole purpose of obtaining an undeservedly high score in search engines. The problem of Web spam is widespread and di#cult to solve, mostly due to the large size of web collections that makes many algorithms unfeasible in practice.
Measuring Article Quality in Wikipedia: Models and Evaluation
, 2007
"... Wikipedia has grown to be the world largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia artic ..."
Abstract
-
Cited by 17 (2 self)
- Add to MetaCart
Wikipedia has grown to be the world largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in collaborative authoring of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors derived from the article edit history. Our Basic model is designed based on the mutual dependency between article quality and their author authority. The PeerReview model introduces the review behavior into measuring article quality. Finally, our ProbReview models extend PeerReview with partial reviewership of contributors as they edit various portions of the articles. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate the effectiveness of our quality measurement models in resembling human judgement.
Efficiency and nash equilibria in a scrip system for p2p networks
- IN ACM CONFERENCE ON ELECTRONIC COMMERCE
, 2006
"... A model of providing service in a P2P network is analyzed. It is shown that by adding a scrip system, a mechanism that admits a reasonable Nash equilibrium that reduces free riding can be obtained. The effect of varying the total amount of money (scrip) in the system on efficiency (i.e., social welf ..."
Abstract
-
Cited by 17 (4 self)
- Add to MetaCart
A model of providing service in a P2P network is analyzed. It is shown that by adding a scrip system, a mechanism that admits a reasonable Nash equilibrium that reduces free riding can be obtained. The effect of varying the total amount of money (scrip) in the system on efficiency (i.e., social welfare) is analyzed, and it is shown that by maintaining the appropriate ratio between the total amount of money and the number of agents, efficiency is maximized. The work has implications for many online systems, not only P2P networks but also a wide variety of online forums for which scrip systems are popular, but formal analyses have been lacking.
Link-based similarity search to fight web spam
- In AIRWEB
, 2006
"... www.ilab.sztaki.hu/websearch We investigate the usability of similarity search in fighting Web spam based on the assumption that an unknown spam page is more similar to certain known spam pages than to honest pages. In order to be successful, search engine spam never appears in isolation: we observe ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
www.ilab.sztaki.hu/websearch We investigate the usability of similarity search in fighting Web spam based on the assumption that an unknown spam page is more similar to certain known spam pages than to honest pages. In order to be successful, search engine spam never appears in isolation: we observe link farms and alliances for the sole purpose of search engine ranking manipulation. The artificial nature and strong inside connectedness however gave rise to successful algorithms to identify search engine spam. One example is trust and distrust propagation, an idea originating in recommender systems and P2P networks, that yields spam classificators by spreading information along hyperlinks from white and blacklists. While most previous results use PageRank variants for propagation, we form classifiers by investigating similarity top lists of an unknown page along various measures such as co-citation, companion, nearest neighbors in low dimensional projections and SimRank. We test our method over two data sets previously used to measure spam filtering algorithms. 1.
Local computation of pagerank contributions
- In WAW
, 2007
"... Abstract. Motivated by the problem of detecting link-spam, we consider the following graph-theoretic primitive: Given a webgraph G, a vertex v in G, and a parameter δ ∈ (0, 1), compute the set of all vertices that contribute to v at least a δ fraction of v’s PageRank. We call this set the δ-contribu ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
Abstract. Motivated by the problem of detecting link-spam, we consider the following graph-theoretic primitive: Given a webgraph G, a vertex v in G, and a parameter δ ∈ (0, 1), compute the set of all vertices that contribute to v at least a δ fraction of v’s PageRank. We call this set the δ-contributing set of v. To this end, we define the contribution vector of v to be the vector whose entries measure the contributions of every vertex to the PageRank of v. A local algorithm is one that produces a solution by adaptively examining only a small portion of the input graph near a specified vertex. We give an efficient local algorithm that computes an ɛ-approximation of the contribution vector for a given vertex by adaptively examining O(1/ɛ) vertices. Using this algorithm, we give a local approximation algorithm for the primitive defined above. Specifically, we give an algorithm that returns a set containing the δcontributing set of v and at most O(1/δ) vertices from the δ/2-contributing set of v, and which does so by examining at most O(1/δ) vertices. We also give a local algorithm for solving the following problem: If there exist k vertices that contribute a ρ-fraction to the PageRank of v, find a set of k vertices that contribute at least a (ρ − ɛ)-fraction to the PageRank of v. In this case, we prove that our algorithm examines at most O(k/ɛ) vertices. 1
Robust PageRank and Locally Computable Spam Detection Features
, 2008
"... Since the link structure of the web is an important element in ranking systems on search engines, web spammers widely use the link structure of the web to increase the rank of their pages. Various link-based features of web pages have been introduced and have proven effective at identifying link spa ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Since the link structure of the web is an important element in ranking systems on search engines, web spammers widely use the link structure of the web to increase the rank of their pages. Various link-based features of web pages have been introduced and have proven effective at identifying link spam. One particularly successful family of features (as described in the SpamRank algorithm), is based on examining the sets of pages that contribute most to the PageRank of a given vertex, called supporting sets. In a recent paper, the current authors described an algorithm for efficiently computing, for a single specified vertex, an approximation of its supporting sets. In this paper, we describe several linkbased spam-detection features, both supervised and unsupervised, that can be derived from these approximate supporting sets. In particular, we examine the size of a node’s supporting sets and the approximate l2 norm of the PageRank contributions from other nodes. As a supervised feature, we examine the composition of a node’s supporting sets. We perform experiments on two labeled real data sets to demonstrate the effectiveness of these features for spam detection, and demonstrate that these features can be computed efficiently. Furthermore, we design a variation of PageRank (called Robust PageRank) that incorporates some of these features into its ranking, argue that this variation is more robust against link spam engineering, and give an algorithm for approximating Robust PageRank.
Blogosphere: Research Issues, Tools, and Applications
"... Weblogs, or Blogs, have facilitated people to express their thoughts, voice their opinions, and share their experiences and ideas. Individuals experience a sense of community, a feeling of belonging, a bonding that members matter to one another and their niche needs will be met through online intera ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Weblogs, or Blogs, have facilitated people to express their thoughts, voice their opinions, and share their experiences and ideas. Individuals experience a sense of community, a feeling of belonging, a bonding that members matter to one another and their niche needs will be met through online interactions. Its open standards and low barrier to publication have transformed information consumers to producers. This has created a plethora of open-source intelligence, or “collective wisdom ” that acts as the storehouse of overwhelming amounts of knowledge about the members, their environment and the symbiosis between them. Nonetheless, vast amounts of this knowledge still remain to be discovered and exploited in its suitable way. In this paper, we introduce various state-of-the-art research issues, review some key elements of research such as tools and methodologies in Blogosphere, and present a case study of identifying the influential bloggers in a community to exemplify the integration of some major aspects discussed in this paper. Towards the end, we also compare and contrast the blogosphere and social networks and the research therein. 1. INTRODUCTION TO
Sketching landscapes of page farms
- In SDM’07
"... The Web is a very large social network. It is important and interesting to understand the “ecology ” of the Web: the general relations of Web pages to their environment. The understanding of such relations has a few important applications, including Web community identification and analysis, and Web ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
The Web is a very large social network. It is important and interesting to understand the “ecology ” of the Web: the general relations of Web pages to their environment. The understanding of such relations has a few important applications, including Web community identification and analysis, and Web spam detection. In this paper, we propose the notion of page farm, which is the set of pages contributing to (a major portion of) the PageRank score of a target page. We try to understand the “landscapes ” of page farms in general: how are farms of Web pages similar to or different from each other? In order to sketch the landscapes of page farms, we need to extract page farms extensively. We show that computing page farms is NP-hard, and develop a simple greedy algorithm. Then, we analyze the farms of a large number of (over 3 million) pages randomly sampled from the Web, and report some interesting findings. Most importantly, the landscapes of page farms tend to also follow the power law distribution. Moreover, the landscapes of page farms strongly reflect the importance of the Web pages. 1

