Searching the Workplace Web
, 2003
Cited by 46 (4 self)
The social impact from the World Wide Web cannot be underestimated, but technologies used to build the Web are also revolutionizing the sharing of business and government information within intranets. In many ways the lessons learned from the Internet carry over directly to intranets, but others do not apply. In particular, the social forces that guide the development of intranets are quite different, and the determination of a "good answer" for intranet search is quite different than on the Internet. In this paper we study the problem of intranet search. Our approach focuses on the use of rank aggregation, and allows us to examine the effects of different heuristics on ranking of search results.
The connectivity sonar: detecting site functionality by structural patterns
- In Proceedings of the Fourteenth ACM Conference on Hypertext and Hypermedia
, 2003
Cited by 45 (1 self)
Web sites today serve many different functions, such as corporate sites, search engines, e-stores, and so forth. As sites are created for different purposes, their structure and connectivity characteristics vary. However, this research argues that sites of similar role exhibit similar structural patterns, as the functionality of a site naturally induces a typical hyperlinked structure and typical connectivity patterns to and from the rest of the Web. Thus, the functionality of Web sites is reflected in a set of structural and connectivity-based features that form a typical signature. In this paper, we automatically categorize sites into eight distinct functional classes, and highlight several search-engine related applications that could make immediate use of such technology. We purposely limit our categorization algorithms by tapping connectivity and structural data alone, making no use of any content analysis whatsoever. When applying two classification algorithms to a set of 202 sites of the eight defined functional categories, the algorithms correctly classified between 54.5% and 59% of the sites. On some categories, the precision of the classification exceeded 85%. An additional result of this work indicates that the structural signature can be used to detect spam rings and mirror sites, by clustering sites with almost identical signatures.
Clustering aggregation
- In Proceedings of the 21st International Conference on Data Engineering (ICDE)
, 2005
Cited by 45 (2 self)
We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings. This problem, clustering aggregation, appears naturally in various contexts. For example, clustering categorical data is an instance of the problem: each categorical variable can be viewed as a clustering of the input rows. Moreover, clustering aggregation can be used as a meta-clustering method to improve the robustness of clusterings. The problem formulation does not require a priori information about the number of clusters, and it gives a natural way for handling missing values. We give a formal statement of the clustering-aggregation problem, we discuss related work, and we suggest a number of algorithms. For several of the methods we provide theoretical guarantees on the quality of the solutions. We also show how sampling can be used to scale the algorithms for large data sets. We give an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions.
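The problem statement in this abstract can be illustrated with a small sketch. It assumes that "agreement" is measured by counting object pairs that one clustering groups together and the other separates; that distance, the function names, and the toy data are illustrative assumptions, and picking the best of the input clusterings is only a simple baseline, not necessarily one of the algorithms the paper proposes.

```python
from itertools import combinations

def disagreement(c1, c2):
    """Count object pairs placed together by one clustering and apart by the other.

    Each clustering is a list of cluster labels, one label per object.
    """
    n = len(c1)
    return sum(
        (c1[i] == c1[j]) != (c2[i] == c2[j])
        for i, j in combinations(range(n), 2)
    )

def total_disagreement(candidate, clusterings):
    """Sum of pairwise disagreements between a candidate and every input clustering."""
    return sum(disagreement(candidate, c) for c in clusterings)

def best_input_clustering(clusterings):
    """Baseline: return the input clustering with the smallest total disagreement."""
    return min(clusterings, key=lambda c: total_disagreement(c, clusterings))

# Hypothetical toy input: three clusterings of the same five objects.
clusterings = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
]
best = best_input_clustering(clusterings)
print(best, total_disagreement(best, clusterings))
```

Because the objective only compares pairs of objects, the number of clusters never has to be fixed in advance, which matches the abstract's remark that no a priori cluster count is required.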
A survey on pagerank computing
- Internet Mathematics
, 2005
Cited by 42 (0 self)
This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underlie PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank and the role of such indices in applications other than web search. We also discuss link-based search personalization and outline some aspects of PageRank infrastructure from associated measures of convergence to link preprocessing.
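As a rough illustration of what "computing a PageRank vector" from hyperlink structure alone involves, the following is a minimal power-iteration sketch. The damping factor of 0.85, the uniform redistribution of rank from pages without out-links, the function name, and the toy graph are all assumptions made for this example; the survey itself covers many variants and accelerations that are not shown here.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration on a link graph given as {page: [pages it links to]}."""
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread uniformly (an assumption).
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                # Each page splits its rank evenly among its out-links.
                new[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical four-page web graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Note that the scores depend only on the link structure: no page text is consulted at any point, which is the property the abstract emphasizes.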
Hybrid voting protocols and hardness of manipulation
- In Proceedings of the 16th International Symposium on Algorithms and Computation
, 2005
Cited by 40 (2 self)
This paper addresses the problem of constructing voting protocols that are hard to manipulate. We describe a general technique for obtaining a new protocol by combining two or more base protocols, and study the resulting class of (vote-once) hybrid voting protocols, which also includes most previously known manipulation-resistant protocols. We show that for many choices of underlying base protocols, including some that are easily manipulable, their hybrids are NP-hard to manipulate, and demonstrate that this method can be used to produce manipulation-resistant protocols with unique combinations of useful features.
Mining Anchor Text for Query Refinement
- WWW2004
, 2004
Cited by 39 (1 self)
When searching large hypertext document collections, it is often possible that there are too many results available for ambiguous queries. Query refinement is an interactive process of query modification that can be used to narrow down the scope of search results. We propose a new method for automatically generating refinements or related terms to queries by mining anchor text for a large hypertext document collection. We show that the usage of anchor text as a basis for query refinement produces high quality refinement suggestions that are significantly better in terms of perceived usefulness compared to refinements that are derived using the document content. Furthermore, our study suggests that anchor text refinements can also be used to augment traditional query refinement algorithms based on query logs, since they typically differ in coverage and produce different refinements. Our results are based on experiments on an anchor text collection of a large corporate intranet.
Personalized Web search for improving retrieval effectiveness
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2004
Cited by 38 (1 self)
Current Web search engines are built to serve all users, independent of the special needs of any individual user. Personalization of Web search is to carry out retrieval for each user incorporating his/her interests. We propose a novel technique to learn user profiles from users’ search histories. The user profiles are then used to improve retrieval effectiveness in Web search. A user profile and a general profile are learned from the user’s search history and a category hierarchy, respectively. These two profiles are combined to map a user query into a set of categories which represent the user’s search intention and serve as a context to disambiguate the words in the user’s query. Web search is conducted based on both the user query and the set of categories. Several profile learning and category mapping algorithms and a fusion algorithm are provided and evaluated. Experimental results indicate that our technique to personalize Web search is both effective and efficient.
Rank-aware query optimization
- In SIGMOD Conference
, 2004
Cited by 36 (6 self)
Ranking is an important property that needs to be fully supported by current relational query engines. Recently, several rank-join query operators have been proposed based on rank aggregation algorithms. Rank-join operators progressively rank the join results while performing the join operation. The new operators have a direct impact on traditional query processing and optimization. We introduce a rank-aware query optimization framework that fully integrates rank-join operators into relational query engines. The framework is based on extending the System R dynamic programming algorithm in both enumeration and pruning. We define ranking as an interesting property that triggers the generation of rank-aware query plans. Unlike traditional join operators, optimizing for rank-join operators depends on estimating the input cardinality of these operators. We introduce a probabilistic model for estimating the input cardinality, and hence the cost of a rank-join operator. To our knowledge, this paper is the first effort in estimating the needed input size for optimal rank aggregation algorithms. Costing ranking plans, although challenging, is key to the full integration of rank-join operators in real-world query processing engines. We experimentally evaluate our framework by modifying the query optimizer of an open-source database management system. The experiments show the validity of our framework and the accuracy of the proposed estimation model.
The Complexity of Bribery in Elections
, 2006
Cited by 34 (14 self)
We study the complexity of influencing elections through bribery: How computationally complex is it for an external actor to determine whether by a certain amount of bribing voters a specified candidate can be made the election’s winner? We study this problem for election systems as varied as scoring protocols and Dodgson voting, and in a variety of settings regarding the nature of the voters, the size of the candidate set, and the specification of the input. We obtain both polynomial-time bribery algorithms and proofs of the intractability of bribery. Our results indicate that the complexity of bribery is extremely sensitive to the setting. For example, we find settings where bribing weighted voters is NP-complete in general but if weights are represented in unary then the bribery problem is in P. We provide a complete classification of the complexity of bribery for the broad class of elections (including plurality, Borda, k-approval, and veto) known as scoring protocols.
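The scoring protocols named in this abstract (plurality, Borda, k-approval, veto) all follow one pattern: a candidate in position i of a ballot earns the i-th entry of a fixed score vector. The sketch below illustrates only that shared pattern, not the bribery algorithms or hardness proofs of the paper; the function names and the seven-voter profile are hypothetical.

```python
def scoring_winner(votes, score_vector):
    """Return (sorted list of winners, score table) under a scoring protocol.

    Each vote is a full ranking of the candidates, most preferred first.
    """
    scores = {}
    for ballot in votes:
        for position, candidate in enumerate(ballot):
            scores[candidate] = scores.get(candidate, 0) + score_vector[position]
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best), scores

def plurality(m):
    return [1] + [0] * (m - 1)        # only the top choice scores

def borda(m):
    return list(range(m - 1, -1, -1))  # m-1 points for first place, 0 for last

def k_approval(m, k):
    return [1] * k + [0] * (m - k)     # the top k choices score one point each

def veto(m):
    return [1] * (m - 1) + [0]         # everyone except the last choice scores

# Hypothetical profile: 7 voters, 3 candidates.
votes = [("a", "b", "c")] * 3 + [("b", "c", "a")] * 2 + [("c", "b", "a")] * 2
for name, vector in [("plurality", plurality(3)), ("borda", borda(3)), ("veto", veto(3))]:
    print(name, scoring_winner(votes, vector))
```

On this profile plurality elects a, while Borda and veto elect b, a small reminder that the outcome, and hence any question about influencing it, depends on which score vector is in force.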
PageRank, HITS and a Unified Framework for Link Analysis
Cited by 32 (2 self)
Two popular webpage ranking algorithms are HITS and PageRank. HITS emphasizes mutual reinforcement between authority and hub webpages, while PageRank emphasizes hyperlink weight normalization and web surfing based on random walk models. We systematically generalize/combine these concepts into a unified framework. The ranking framework contains a large algorithm space; HITS and PageRank are two extreme ends in this space. We study several normalized ranking algorithms which are intermediate between HITS and PageRank, and obtain closed-form solutions. We show that, to first-order approximation, all ranking algorithms in this framework, including PageRank and HITS, lead to the same ranking, which is highly correlated with ranking by indegree.

