Results 1–10 of 48
Integrating the Document Object Model with Hyperlinks for Enhanced Topic Distillation and Information Extraction
, 2001
"... Topic distillation is the process of finding authoritative Web pages and comprehensive "hubs" which reciprocally endorse each other and are relevant to a given query. Hyperlink-based topic distillation has been traditionally applied to a macroscopic Web model where documents are nodes in a d ..."
Abstract

Cited by 69 (2 self)
Topic distillation is the process of finding authoritative Web pages and comprehensive "hubs" which reciprocally endorse each other and are relevant to a given query. Hyperlink-based topic distillation has been traditionally applied to a macroscopic Web model where documents are nodes in a directed graph and hyperlinks are edges. Macroscopic models miss valuable clues such as banners, navigation panels, and template-based inclusions, which are embedded in HTML pages using markup tags. Consequently, results of macroscopic distillation algorithms have been deteriorating in quality as Web pages are becoming more complex. We propose a uniform fine-grained model for the Web in which pages are represented by their tag trees (also called their Document Object Models or DOMs) and these DOM trees are interconnected by ordinary hyperlinks. Surprisingly, macroscopic distillation algorithms do not work in the fine-grained scenario. We present a new algorithm suitable for the fine-grained model. It can disaggregate hubs into coherent regions by segmenting their DOM trees. Mutual endorsement between hubs and authorities involves these regions, rather than single nodes representing complete hubs. Anecdotes and measurements using a 28-query, 366,000-document benchmark suite, used in earlier topic distillation research, reveal two benefits from the new algorithm: distillation quality improves, and a by-product of distillation is the ability to extract relevant snippets from hubs which are only partially relevant to the query.
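The macroscopic mutual-reinforcement scheme this abstract contrasts with its fine-grained DOM model can be illustrated with a minimal sketch of a classic HITS-style iteration. This is not the authors' algorithm; the function name and graph representation are illustrative only.

```python
def hits(links, iterations=20):
    """Classic HITS-style hub/authority mutual reinforcement.

    links: dict mapping each page to the list of pages it points to.
    Returns (hub, auth) score dicts after the given number of iterations.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking in.
        auth = {p: sum(hub[q] for q in links if p in links[q]) for p in pages}
        # Hub score: sum of authority scores of pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize so scores stay bounded across iterations.
        for d in (auth, hub):
            norm = sum(v * v for v in d.values()) ** 0.5 or 1.0
            for p in d:
                d[p] /= norm
    return hub, auth
```

The fine-grained model described above replaces the whole-page nodes in this sketch with DOM subtrees, so endorsement can flow between coherent regions of a hub rather than entire pages.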
Improved scheduling algorithms for minsum criteria
 Automata, Languages and Programming, volume 1099 of Lecture Notes in Computer Science
, 1996
"... Abstract. We consider the problem of finding near-optimal solutions for a variety of NP-hard scheduling problems for which the objective is to minimize the total weighted completion time. Recent work has led to the development of several techniques that yield constant worst-case bounds in a number ..."
Abstract

Cited by 65 (18 self)
Abstract. We consider the problem of finding near-optimal solutions for a variety of NP-hard scheduling problems for which the objective is to minimize the total weighted completion time. Recent work has led to the development of several techniques that yield constant worst-case bounds in a number of settings. We continue this line of research by providing improved performance guarantees for several of the most basic scheduling models, and by giving the first constant performance guarantee for a number of more realistically constrained scheduling problems. For example, we give an improved performance guarantee for minimizing the total weighted completion time subject to release dates on a single machine, and subject to release dates and/or precedence constraints on identical parallel machines. We also give improved bounds on the power of preemption in scheduling jobs with release dates on parallel machines. We give improved online algorithms for many more realistic scheduling models, including environments with parallelizable jobs, jobs contending for shared resources, tree precedence-constrained jobs, as well as shop scheduling models. In several of these cases, we give the first constant performance guarantee achieved online. Finally, one of the consequences of our work is the surprising structural property that there are schedules that simultaneously approximate the optimal makespan and the optimal weighted completion time to within small constants. Not only do such schedules exist, but we can find approximations to them with an online algorithm.
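The simplest member of the family of problems this abstract studies is minimizing total weighted completion time on one machine with no release dates or precedence constraints, where Smith's rule (weighted shortest processing time first) is exactly optimal. A minimal sketch, not taken from the paper:

```python
def wspt_schedule(jobs):
    """Smith's rule (WSPT): order jobs by processing_time / weight.

    jobs: list of (processing_time, weight) pairs.
    Returns (ordered jobs, total weighted completion time).
    Optimal for 1 || sum w_j C_j; the paper's harder variants add
    release dates, precedence constraints, and parallel machines.
    """
    order = sorted(jobs, key=lambda j: j[0] / j[1])
    t, total = 0, 0
    for p, w in order:
        t += p          # completion time C_j of this job
        total += w * t  # accumulate w_j * C_j
    return order, total
```

With release dates or precedence constraints this greedy rule is no longer optimal, which is where the constant-factor guarantees discussed above come in.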
Enhanced Topic Distillation using Text, Markup Tags, and Hyperlinks
 In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval , ACM
, 2001
"... Topic distillation is the analysis of hyperlink graph structure to identify mutually reinforcing authorities (popular pages) and hubs (comprehensive lists of links to authorities). Topic distillation is becoming common in Web search engines, but the best-known algorithms model the Web graph at a coa ..."
Abstract

Cited by 64 (1 self)
Topic distillation is the analysis of hyperlink graph structure to identify mutually reinforcing authorities (popular pages) and hubs (comprehensive lists of links to authorities). Topic distillation is becoming common in Web search engines, but the best-known algorithms model the Web graph at a coarse grain, with whole pages as single nodes. Such models may lose vital details in the markup tag structure of the pages, and thus lead to a tightly linked irrelevant subgraph winning over a relatively sparse relevant subgraph, a phenomenon called topic drift or contamination. The problem gets especially severe in the face of increasingly complex pages with navigation panels and advertisement links. We present an enhanced topic distillation algorithm which analyzes text, the markup tag trees that constitute HTML pages, and hyperlinks between pages. It thereby identifies subtrees which have high text- and hyperlink-based coherence w.r.t. the query. These subtrees get preferential treatment in the mutual reinforcement process. Using over 50 queries, 28 from earlier topic distillation work, we analyzed over 700,000 pages and obtained quantitative and anecdotal evidence that the new algorithm reduces topic drift. Topic areas: Citation and Link Analysis, Machine Learning for IR, Web IR.
Resource Scheduling for Parallel Database and Scientific Applications
 in Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures
, 1996
"... We initiate a study of resource scheduling problems in parallel database and scientific applications. Based on this study we formulate a problem. In our formulation, jobs specify their running times and amounts of a fixed number of other resources (like memory, IO) they need. The resource-time trade ..."
Abstract

Cited by 32 (5 self)
We initiate a study of resource scheduling problems in parallel database and scientific applications. Based on this study we formulate a problem. In our formulation, jobs specify their running times and amounts of a fixed number of other resources (like memory, IO) they need. The resource-time tradeoff may be fundamentally different for different resource types. The processor resource is malleable, meaning we can trade processors for time gracefully. Other resources may not be malleable. One way to model them is to assume no malleability: the entire requirement of those resources has to be reserved for a job to begin execution, and no smaller quantity is acceptable. The jobs also have precedences amongst them; in our applications, the precedence structure may be restricted to being a collection of trees or series-parallel graphs. Not much is known about considering precedence and non-malleable resource constraints together. For many other problems, it has been possible to find schedule...
Black-Box Randomized Reductions in Algorithmic Mechanism Design
"... Abstract—We give the first black-box reduction from arbitrary approximation algorithms to truthful approximation mechanisms for a non-trivial class of multi-parameter problems. Specifically, we prove that every packing problem that admits an FPTAS also admits a truthful-in-expectation randomized mech ..."
Abstract

Cited by 25 (5 self)
Abstract—We give the first black-box reduction from arbitrary approximation algorithms to truthful approximation mechanisms for a non-trivial class of multi-parameter problems. Specifically, we prove that every packing problem that admits an FPTAS also admits a truthful-in-expectation randomized mechanism that is an FPTAS. Our reduction makes novel use of smoothed analysis, by employing small perturbations as a tool in algorithmic mechanism design. We develop a “duality” between linear perturbations of the objective function of an optimization problem and of its feasible set, and use the “primal” and “dual” viewpoints to prove the running time bound and the truthfulness guarantee, respectively, for our mechanism.
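The reduction above takes an FPTAS for a packing problem as its starting point. The canonical example of such an FPTAS, included here only as background and not drawn from the paper, is the profit-scaling scheme for 0/1 knapsack: round values down to a coarse grid, solve exactly by dynamic programming on scaled values, and lose at most an eps fraction of the optimum.

```python
def knapsack_fptas(values, weights, capacity, eps=0.1):
    """Profit-scaling FPTAS for 0/1 knapsack.

    Scales values by eps * max(values) / n, runs an exact DP indexed by
    scaled value (dp[v] = minimum weight achieving scaled value v), and
    returns a (1 - eps)-approximation of the optimal value.
    """
    n = len(values)
    scale = eps * max(values) / n
    scaled = [int(v / scale) for v in values]
    dp = {0: 0}  # scaled value -> minimum weight achieving it
    best = 0
    for i in range(n):
        new_dp = dict(dp)
        for v, w in dp.items():
            nv, nw = v + scaled[i], w + weights[i]
            if nw <= capacity and nw < new_dp.get(nv, float("inf")):
                new_dp[nv] = nw
                best = max(best, nv)
        dp = new_dp
    return best * scale  # approximate optimal value
```

The mechanism-design contribution of the paper is precisely that such an algorithm can be converted, black-box, into a truthful-in-expectation mechanism with the same approximation guarantee.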
A Generic Program for Sequential Decision Processes
 Programming Languages: Implementations, Logics, and Programs
, 1995
"... This paper is an attempt to persuade you of my viewpoint by presenting a novel generic program for a certain class of optimisation problems, named sequential decision processes. This class was originally identified by Richard Bellman in his pioneering work on dynamic programming [4]. It is a perfect ..."
Abstract

Cited by 13 (2 self)
This paper is an attempt to persuade you of my viewpoint by presenting a novel generic program for a certain class of optimisation problems, named sequential decision processes. This class was originally identified by Richard Bellman in his pioneering work on dynamic programming [4]. It is a perfect example of a class of problems which are very much alike, but which has until now escaped solution by a single program. Those readers who have followed some of the work that Richard Bird and I have been doing over the last five years [6, 7] will recognise many individual examples: all of these have now been unified. The point of this observation is that even when you are on the lookout for generic programs, it can take a rather long time to discover them. The presentation below will follow that earlier work, by referring to the calculus of relations and the relational theory of data types. I shall however attempt to be light on the formalism, as I do not regard it as essential to the main thesis of this paper. Undoubtedly there are other (perhaps more convenient) notations in which the same ideas could be developed. This paper does assume some degree of familiarity with a lazy functional programming language such as Haskell, Hope, Miranda
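The abstracts in this listing that mention dynamic programming rest on Bellman's idea that a sequential decision process is solved by one generic recursion: the value of a state is the best over available actions of immediate cost plus the value of the successor state. A minimal generic sketch (the paper works in Haskell and the relational calculus; this Python version and the coin-change instance are illustrative only):

```python
from functools import lru_cache

def solve_sdp(actions, cost, step, is_final):
    """Generic Bellman-style solver for a sequential decision process.

    actions(s)  -> iterable of actions available in state s
    cost(s, a)  -> immediate cost of taking action a in state s
    step(s, a)  -> successor state
    is_final(s) -> True when no further decisions are needed
    Returns a memoized value function mapping states to optimal cost.
    """
    @lru_cache(maxsize=None)
    def value(state):
        if is_final(state):
            return 0
        options = [cost(state, a) + value(step(state, a))
                   for a in actions(state)]
        return min(options) if options else float("inf")
    return value

# Instance: fewest coins summing to a target, with coins 1, 4, 5.
coins = (1, 4, 5)
best = solve_sdp(
    actions=lambda s: [c for c in coins if c <= s],
    cost=lambda s, a: 1,          # each coin used costs 1
    step=lambda s, a: s - a,      # remaining amount to cover
    is_final=lambda s: s == 0,
)
```

The point of the paper is that many such instances, previously solved by separate programs, are captured by one generic program of this shape.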
Sharing-Aware Algorithms for Virtual Machine Colocation
"... Virtualization technology enables multiple virtual machines (VMs) to run on a single physical server. VMs that run on the same physical server can share memory pages that have identical content, thereby reducing the overall memory requirements on the server. We develop sharing-aware algorithms that ..."
Abstract

Cited by 10 (1 self)
Virtualization technology enables multiple virtual machines (VMs) to run on a single physical server. VMs that run on the same physical server can share memory pages that have identical content, thereby reducing the overall memory requirements on the server. We develop sharing-aware algorithms that can colocate VMs with similar page content on the same physical server to optimize the benefits of inter-VM sharing. We show that inter-VM sharing occurs in a largely hierarchical fashion, where the sharing can be attributed to VMs running the same OS platform, OS version, software libraries, or applications. We propose two hierarchical sharing models: a tree model and a more general cluster-tree model. Using a set of VM traces, we show that up to 67% of the inter-VM sharing is captured by the tree model and up to 82% is captured by the cluster-tree model. Next, we study two problem variants of critical interest to a virtualization service provider: the VM maximization problem that determines the most profitable subset of the VMs that can be packed into the given set of servers, and the VM packing problem that determines the smallest set of servers that can accommodate a set of VMs. While both variants are NP-hard, we show that both admit provably good approximation schemes in the hierarchical sharing models. We show that VM maximization for the tree and cluster-tree models can be approximated in polynomial time to within a (1 − 1/e) factor of optimal. Further, we show that VM packing can be approximated in polynomial time to within a factor of O(log n) of optimal for cluster-trees and to within a factor of 3 of optimal for trees, where n is the number of VMs. Finally, we evaluate our VM packing algorithm for the tree sharing model on real-world VM traces and show that our algorithm can exploit most of the available inter-VM sharing to achieve a 32% to 50% reduction in servers and a 25% to 57% reduction in memory footprint compared to sharing-oblivious algorithms.
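The sharing-oblivious baseline that the abstract's algorithms are measured against is ordinary bin packing: each VM's full memory footprint is placed without credit for shared pages. A minimal first-fit sketch of such a baseline (illustrative only, not the authors' algorithm):

```python
def first_fit(vm_sizes, server_capacity):
    """Sharing-oblivious VM packing via first-fit bin packing.

    Each VM's full memory size is charged to the first server with
    enough free capacity; a new server is opened when none fits.
    Returns the number of servers used.
    """
    servers = []  # remaining free capacity per open server
    for size in vm_sizes:
        for i, free in enumerate(servers):
            if size <= free:
                servers[i] = free - size
                break
        else:
            servers.append(server_capacity - size)
    return len(servers)
```

A sharing-aware algorithm in the paper's tree model would instead charge shared pages only once per server, which is the source of the reported 32% to 50% reduction in servers.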
Partially-Ordered Knapsack and Applications to Scheduling
, 2002
"... In the partially-ordered knapsack problem (POK) we are given a set N of items and a partial order ≺ on N. Each item has a size and an associated weight. The objective is to pack a subset N′ ⊆ N of maximum weight in a knapsack of bounded size. N′ should be precedence-closed, i.e., be a valid pref ..."
Abstract

Cited by 9 (0 self)
In the partially-ordered knapsack problem (POK) we are given a set N of items and a partial order ≺ on N. Each item has a size and an associated weight. The objective is to pack a subset N′ ⊆ N of maximum weight in a knapsack of bounded size. N′ should be precedence-closed, i.e., be a valid prefix of ≺. POK is a natural generalization, for which very little is known, of the classical Knapsack problem. In this paper we present both positive and negative results.
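To make the definition concrete, here is an exact brute-force POK solver for tiny instances: it enumerates subsets, keeps only precedence-closed ones, and maximizes weight under the size budget. Exponential time, for illustration only; all names are hypothetical and the paper's results concern approximability, not this enumeration.

```python
from itertools import combinations

def pok_exact(items, size, weight, preds, capacity):
    """Exact brute-force partially-ordered knapsack.

    items:    list of item identifiers
    size:     dict item -> size
    weight:   dict item -> weight
    preds:    dict item -> list of predecessors that must accompany it
    capacity: knapsack size bound
    Returns (best weight, best precedence-closed subset).
    """
    def closed(subset):
        # Precedence-closed: every predecessor of a chosen item is chosen.
        return all(p in subset for i in subset for p in preds.get(i, []))
    best = (0, frozenset())
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            s = frozenset(combo)
            if closed(s) and sum(size[i] for i in s) <= capacity:
                w = sum(weight[i] for i in s)
                if w > best[0]:
                    best = (w, s)
    return best
```

With an empty partial order this reduces to classical knapsack, which is the sense in which POK generalizes it.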
Universal sequencing on an unreliable machine
, 2011
"... We consider scheduling on an unreliable machine that may experience unexpected changes in processing speed or even full breakdowns. We aim for a universal solution that performs well without adaptation for any possible machine behavior. Our objective is to minimize ∑ wj f(Cj) for any nondecreasing, ..."
Abstract

Cited by 6 (2 self)
We consider scheduling on an unreliable machine that may experience unexpected changes in processing speed or even full breakdowns. We aim for a universal solution that performs well without adaptation for any possible machine behavior. Our objective is to minimize ∑ wj f(Cj) for any nondecreasing, nonnegative, differentiable cost function f(Cj). We design a deterministic algorithm that finds a universal scheduling sequence with a solution value within 4 times the value of an optimal clairvoyant algorithm that knows the machine behavior in advance. A randomized version of this algorithm attains in expectation a ratio of e. We also show that both results are best possible among all universal solutions. Our algorithms can be adapted to run in polynomial time with slightly increased cost. When jobs have individual release dates, the situation changes drastically. Even if all weights are equal, there are instances for which any universal solution is a factor of Ω(log n / log log n) worse than an optimal sequence for any unbounded cost function. Motivated by this hardness, we study the special case when the processing time of each job is proportional to its weight. We present a nontrivial algorithm with a small constant performance guarantee.