Results 1 - 10 of 223
Focused crawling: a new approach to topic-specific Web resource discovery
1999
Cited by 411 (8 self)
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource discovery system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exemplary documents. Rather than collecting and indexing all accessible Web documents to be able to answer all possible ad-hoc queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant regions of the Web. This leads to significant savings in hardware and network resources, and helps keep the crawl more up-to-date. To achieve such goal-directed crawling, we designed two hypertext mining programs that guide our crawler: a classifier that evaluates the relevance of a hypertext document with respect to the focus topics, ...
The synchronization of periodic routing messages
IEEE/ACM Transactions on Networking, 1994
Cited by 241 (8 self)
The paper considers a network with many apparently independent periodic processes and discusses one method by which these processes can inadvertently become synchronized. In particular, we study the synchronization of periodic routing messages, and offer guidelines on how to avoid inadvertent synchronization. Using simulations and analysis, we study the process of synchronization and show that the transition from unsynchronized to synchronized traffic is not one of gradual degradation but is instead a very abrupt ‘phase transition’: in general, the addition of a single router will convert a completely unsynchronized traffic stream into a completely synchronized one. We show that synchronization can be avoided by the addition of randomization to the traffic sources and quantify how much randomization is necessary. In addition, we argue that the inadvertent synchronization of periodic processes is likely to become an increasing problem in computer networks.
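The remedy named in this abstract, adding randomization to the traffic sources, might be sketched as a timer that draws each interval at random around the nominal period. The function name `jittered_interval`, the `jitter_fraction` parameter, the 30-second period, and the uniform distribution are all assumptions made for illustration; the abstract does not say which distribution or how much jitter the paper recommends.

```python
import random

def jittered_interval(period, jitter_fraction, rng):
    """Return one timer interval drawn uniformly from
    period * [1 - jitter_fraction, 1 + jitter_fraction].

    Hypothetical helper: the jitter model is an assumption, not the paper's.
    """
    if not 0.0 <= jitter_fraction <= 1.0:
        raise ValueError("jitter_fraction must lie in [0, 1]")
    return period * (1.0 + rng.uniform(-jitter_fraction, jitter_fraction))

# Example: a nominal 30-second update timer with +/-50% jitter (assumed values).
rng = random.Random(42)
intervals = [jittered_interval(30.0, 0.5, rng) for _ in range(1000)]
print(min(intervals), max(intervals))
```

Because each router re-draws its interval independently, two routers that happen to fire together once are unlikely to stay aligned on later rounds.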
Data mules: Modeling a three-tier architecture for sparse sensor networks
In IEEE SNPA Workshop, 2003
Cited by 237 (4 self)
This paper presents and analyzes an architecture that exploits the serendipitous movement of mobile agents in an environment to collect sensor data in sparse sensor networks. The mobile entities, called MULEs, pick up data from sensors when in close range, buffer it, and drop off the data to wired access points when in proximity. This leads to substantial power savings at the sensors as they only have to transmit over a short range. Detailed performance analysis is presented based on a simple model of the system incorporating key system variables such as number of MULEs, sensors and access points. The performance metrics observed are the data success rate (the fraction of generated data that reaches the access points) and the required buffer capacities on the sensors and the MULEs. The modeling along with simulation results can be used for further analysis and provide certain guidelines for deployment of such systems.
The Power of Two Choices in Randomized Load Balancing
IEEE Transactions on Parallel and Distributed Systems, 1996
Cited by 159 (22 self)
Suppose that n balls are placed into n bins, each ball being placed into a bin chosen independently and uniformly at random. Then, with high probability, the maximum load in any bin is approximately $\log n / \log \log n$. Suppose instead that each ball is placed sequentially into the least full of d bins chosen independently and uniformly at random. It has recently been shown that the maximum load is then only $\log \log n / \log d + O(1)$ with high probability. Thus giving each ball two choices instead of just one leads to an exponential improvement in the maximum load. This result demonstrates the power of two choices, and it has several applications to load balancing in distributed systems. In this thesis, we expand upon this result by examining related models and by developing techniques for stu...
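The balls-and-bins process described in this abstract can be tried out with a small simulation. The sketch below is illustrative only: the function name `max_load`, the seed, and the choice n = 100,000 are assumptions, and ties between equally loaded bins are broken by sampling order, which the abstract does not specify.

```python
import random

def max_load(n, d, rng):
    """Place n balls into n bins one at a time; each ball goes into the
    least full of d bins sampled independently and uniformly at random
    (with replacement). Returns the largest final bin load."""
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(d)]
        # min() keeps the first candidate on ties (an arbitrary tie-break).
        target = min(candidates, key=lambda b: bins[b])
        bins[target] += 1
    return max(bins)

rng = random.Random(0)
n = 100_000
one_choice = max_load(n, 1, rng)   # classic single-choice placement
two_choices = max_load(n, 2, rng)  # "power of two choices"
print(one_choice, two_choices)
```

Running it with d = 1 and d = 2 side by side should show the two-choice maximum load sitting noticeably below the single-choice one, in line with the bounds quoted above.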
The Electrical Resistance of a Graph Captures Its Commute and Cover Times
Cited by 118 (4 self)
View an n-vertex, m-edge undirected graph as an electrical network with unit resistors as edges. We extend known relations between random walks and electrical networks by showing that resistance in this network is intimately connected with the lengths of random walks on the graph. For example, the commute time between two vertices s and t (the expected length of a random walk from s to t and back) is precisely characterized by the effective resistance $R_{st}$ between s and t: commute time $= 2 m R_{st}$. As a corollary, the cover time (the expected length of a random walk visiting all vertices) is characterized by the maximum resistance R in the graph to within a factor of log n: $mR \le \text{cover time} \le O(mR \log n)$. For many graphs, the bounds on cover time obtained in this manner are better than those obtained from previous techniques such as the eigenvalues of the adjacency matrix. In particular, we improve known bounds on cover times for high-degree graphs and expanders, and give new proofs of known results for multidimensional meshes. Moreover, resistance seems to provide an intuitively appealing and tractable approach to these problems.
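The identity commute time = 2m·R_st can be checked exactly on a small graph. The sketch below is not taken from the paper: the helper names (`solve`, `hitting_time`, `effective_resistance`) and the example graph (a triangle with one pendant vertex) are assumptions chosen for illustration. It computes hitting times from the standard first-step equations and the effective resistance from the Laplacian with vertex t grounded, using exact rational arithmetic.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[n] for row in M]

def hitting_time(adj, s, t):
    """Expected steps for a simple random walk from s to first reach t (s != t).
    Uses h(v) = 1 + average of h over the neighbours of v, with h(t) = 0."""
    others = [v for v in adj if v != t]
    idx = {v: i for i, v in enumerate(others)}
    A = []
    for v in others:
        row = [Fraction(0)] * len(others)
        row[idx[v]] += 1
        for u in adj[v]:
            if u != t:
                row[idx[u]] -= Fraction(1, len(adj[v]))
        A.append(row)
    return solve(A, [1] * len(others))[idx[s]]

def effective_resistance(adj, s, t):
    """Potential at s when t is grounded and one unit of current enters at s,
    with every edge a unit resistor (s != t)."""
    others = [v for v in adj if v != t]
    idx = {v: i for i, v in enumerate(others)}
    A = []
    for v in others:
        row = [Fraction(0)] * len(others)
        row[idx[v]] += len(adj[v])
        for u in adj[v]:
            if u != t:
                row[idx[u]] -= 1
        A.append(row)
    b = [1 if v == s else 0 for v in others]
    return solve(A, b)[idx[s]]

# Assumed example graph: triangle 0-1-2 plus a pendant vertex 3 attached to 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
m = sum(len(nbrs) for nbrs in adj.values()) // 2
commute = hitting_time(adj, 0, 3) + hitting_time(adj, 3, 0)
R = effective_resistance(adj, 0, 3)
print(commute, 2 * m * R)
```

On this graph m = 4 and R between vertices 0 and 3 is 5/3 (a 1-ohm edge in parallel with a 2-ohm path, in series with one more edge), so both printed values are 40/3.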
System-Level Power Optimization: Techniques and Tools
ACM Transactions on Design Automation of Electronic Systems, 2000
Analysis of Branch Prediction via Data Compression
In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems, 1996
Cited by 79 (3 self)
Branch prediction is an important mechanism in modern microprocessor design. The focus of research in this area has been on designing new branch prediction schemes. In contrast, very few studies address the theoretical basis behind these prediction schemes. Knowing this theoretical basis helps us to evaluate how good a prediction scheme is and how much we can expect to improve its accuracy.
Classifying scheduling policies with respect to unfairness in an M/GI/1
Proc. of SIGMETRICS’03, 2003
Cited by 75 (13 self)
It is common to classify scheduling policies based on their mean response times. Another important, but sometimes opposing, performance metric is a scheduling policy’s fairness. For example, a policy that biases towards short jobs so as to minimize mean response time may end up being unfair to long jobs. In this paper we define three types of unfairness and demonstrate large classes of scheduling policies that fall into each type. We end with a discussion on which jobs are the ones being treated unfairly.
Capacity and delay tradeoffs for ad-hoc mobile networks
IEEE Transactions on Information Theory, 2005
Cited by 71 (2 self)
We consider the throughput/delay tradeoffs for scheduling data transmissions in a mobile ad-hoc network. To reduce delays in the network, each user sends redundant packets along multiple paths to the destination. Assuming the network has a cell-partitioned structure and users move according to a simplified independent and identically distributed (i.i.d.) mobility model, we compute the exact network capacity and the exact end-to-end queueing delay when no redundancy is used. The capacity-achieving algorithm is a modified version of the Grossglauser-Tse 2-hop relay algorithm and provides O(N) delay (where N is the number of users). We then show that redundancy cannot increase capacity, but can significantly improve delay. The following necessary tradeoff is established: delay/rate ≥ O(N). Two protocols that use redundancy and operate near the boundary of this curve are developed, with delays of O(√N) and O(log(N)), respectively. Networks with non-i.i.d. mobility are also considered and shown through simulation to closely match the performance of i.i.d. systems in the O(√N) delay regime. Index Terms: fundamental limits, queueing analysis, stochastic systems, wireless networks.
Local search characteristics of incomplete SAT procedures
Artificial Intelligence, 2000
Cited by 53 (2 self)
Effective local search methods for finding satisfying assignments of CNF formulae exhibit several systematic characteristics in their search. We identify a series of measurable characteristics of local search behavior that are predictive of problem solving efficiency. These measures are shown to be useful for diagnosing inefficiencies in given search procedures, tuning parameters, and predicting the value of innovations to existing strategies. We then introduce a new local search method, SDF (“smoothed descent and flood”), that builds upon the intuitions gained by our study. SDF works by greedily descending in an informative objective (that considers how strongly clauses are satisfied, in addition to counting the number of unsatisfied clauses) and, once trapped in a local minimum, “floods” this minimum by re-weighting unsatisfied clauses to create a new descent direction. The resulting procedure exhibits superior local search characteristics under our measures. We show that this method can compete with state-of-the-art techniques, and significantly reduces the number of search steps relative to many recent methods.

