Results 11 - 20
of
415
Searching the Web
- ACM TRANSACTIONS ON INTERNET TECHNOLOGY
, 2001
"... We offer an overview of current Web search engine design. After introducing a generic search engine architecture, we examine each engine component in turn. We cover crawling, local Web page storage, indexing, and the use of link analysis for boosting search performance. The most common design and im ..."
Abstract
-
Cited by 162 (1 self)
- Add to MetaCart
(Show Context)
We offer an overview of current Web search engine design. After introducing a generic search engine architecture, we examine each engine component in turn. We cover crawling, local Web page storage, indexing, and the use of link analysis for boosting search performance. The most common design and implementation techniques for each of these components are presented. For this presentation we draw from the literature and from our own experimental search engine testbed. Emphasis is on introducing the fundamental concepts and the results of several performance analyses we conducted to compare different designs.
Context in Web Search
- IEEE Data Engineering Bulletin
"... Web search engines generally treat search requests in isolation. The results for a given query are identical, independent of the user, or the context in which the user made the request. Nextgeneration search engines will make increasing use of context information, either by using explicit or implici ..."
Abstract
-
Cited by 157 (0 self)
- Add to MetaCart
(Show Context)
Web search engines generally treat search requests in isolation. The results for a given query are identical, independent of the user, or the context in which the user made the request. Nextgeneration search engines will make increasing use of context information, either by using explicit or implicit context information from users, or by implementing additional functionality within restricted contexts. Greater use of context in web search may help increase competition and diversity on the web. 1
Topical Locality in the Web
- In Proceedings of the 23rd Annual International Conference on Research and Development in Information Retrieval (SIGIR 2000
, 2000
"... Most web pages are linked to others with related content. This idea, combined with another that says that text in, and possibly around, HTML anchors describe the pages to which they point, is the foundation for a usable WorldWide Web. In this paper, we examine to what extent these ideas hold by empi ..."
Abstract
-
Cited by 156 (8 self)
- Add to MetaCart
(Show Context)
Most web pages are linked to others with related content. This idea, combined with another that says that text in, and possibly around, HTML anchors describe the pages to which they point, is the foundation for a usable WorldWide Web. In this paper, we examine to what extent these ideas hold by empirically testing whether topical locality mirrors spatial locality of pages on the Web. In particular, we find that the likelihood of linked pages having similar textual content to be high; the similarity of sibling pages increases when the links from the parent are close together; titles, descriptions, and anchor text represent at least part of the target page; and that anchor text may be a useful discriminator among unseen child pages. These results show the foundations necessary for the success of many web systems, including search engines, focused crawlers, linkage analyzers, and intelligent web agents.
Graph mining: laws, generators, and algorithms
- ACM COMPUT SURV (CSUR
, 2006
"... How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M: N relation in ..."
Abstract
-
Cited by 132 (7 self)
- Add to MetaCart
How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M: N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: “How can we generate synthetic but realistic graphs? ” To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey give an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.
Challenges in Web Search Engines
, 2002
"... This article presents a high-level discussion of some problems in information retrieval that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas. ..."
Abstract
-
Cited by 128 (0 self)
- Add to MetaCart
This article presents a high-level discussion of some problems in information retrieval that are unique to web search engines. The goal is to raise awareness and stimulate research in these areas.
Friends and neighbors on the web,”
- Social Networks,
, 2003
"... Abstract The Internet has become a rich and large repository of information about us as individuals. Anything from the links and text on a user's homepage to the mailing lists the user subscribes to are reflections of social interactions a user has in the real world. In this paper we devise te ..."
Abstract
-
Cited by 119 (1 self)
- Add to MetaCart
(Show Context)
Abstract The Internet has become a rich and large repository of information about us as individuals. Anything from the links and text on a user's homepage to the mailing lists the user subscribes to are reflections of social interactions a user has in the real world. In this paper we devise techniques and tools to mine this information in order to extract social networks and the exogenous factors underlying the networks' structure. In an analysis of two data sets, from Stanford University and the Massachusetts Institute of Technology (MIT), we show that some factors are better indicators of social connections than others, and that these indicators vary between user populations. Our techniques provide potential applications in automatically inferring real world connections and discovering, labeling, and characterizing communities.
GraphChi: Large-scale Graph Computation On just a PC
- In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, OSDI’12
, 2012
"... Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains c ..."
Abstract
-
Cited by 115 (6 self)
- Add to MetaCart
(Show Context)
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC. 1
A Framework For Community Identification in Dynamic Social Networks
, 2007
"... We propose frameworks and algorithms for identifying communities in social networks that change over time. Communities are intuitively characterized as “unusually densely knit ” subsets of a social network. This notion becomes more problematic if the social interactions change over time. Aggregating ..."
Abstract
-
Cited by 114 (6 self)
- Add to MetaCart
We propose frameworks and algorithms for identifying communities in social networks that change over time. Communities are intuitively characterized as “unusually densely knit ” subsets of a social network. This notion becomes more problematic if the social interactions change over time. Aggregating social networks over time can radically misrepresent the existing and changing community structure. Instead, we propose an optimization-based approach for modeling dynamic community structure. We prove that finding the most explanatory community structure is NP-hard and APX-hard, and propose algorithms based on dynamic programming, exhaustive search, maximum matching, and greedy heuristics. We demonstrate empirically that the heuristics trace developments of community structure accurately for several synthetic and real-world examples.
Using PageRank to Characterize Web Structure
"... Recent work on modeling the web graph has dwelt on capturing the degree distributions observed on the web. Pointing out that this represents a heavy reliance on “local” properties of the web graph, we study the distribution of PageRank values on the web. Our measurements suggest that PageRank value ..."
Abstract
-
Cited by 114 (0 self)
- Add to MetaCart
Recent work on modeling the web graph has dwelt on capturing the degree distributions observed on the web. Pointing out that this represents a heavy reliance on “local” properties of the web graph, we study the distribution of PageRank values on the web. Our measurements suggest that PageRank values on the web follow a power law. We then develop generative models for the web graph that explain this observation and moreover remain faithful to previously studied degree distributions. We analyze these models and compare the analysis to both snapshots from the web and to graphs generated by simulations on the new models. To our knowledge this represents the first modeling of the web that goes beyond fitting degree distributions on the web.
A survey on pagerank computing
- Internet Mathematics
, 2005
"... Abstract. This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. T ..."
Abstract
-
Cited by 106 (0 self)
- Add to MetaCart
Abstract. This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underly PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank and the role of such indices in applications other than web search. We also discuss linkbased search personalization and outline some aspects of PageRank infrastructure from associated measures of convergence to link preprocessing. 1.