Results 1-10 of 721
Automatic Subspace Clustering of High Dimensional Data
Data Mining and Knowledge Discovery, 2005
Cited by 687 (12 self)
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
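The idea of finding dense regions in subspaces can be illustrated with a small grid-based sketch. It is not the authors' implementation: it assumes coordinates scaled to [0, 1), uses hypothetical parameter names `xi` (intervals per axis) and `tau` (density threshold), and omits the steps that connect dense units into clusters and produce minimized DNF descriptions.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, dims, xi, tau):
    """Return grid cells of the subspace `dims` holding more than tau * len(points) points.

    Assumption: every coordinate lies in [0, 1); each axis is cut into xi equal intervals.
    """
    counts = Counter(
        tuple(min(int(p[d] * xi), xi - 1) for d in dims) for p in points
    )
    threshold = tau * len(points)
    return {cell for cell, count in counts.items() if count > threshold}


def dense_subspaces(points, xi=4, tau=0.3, max_dim=2):
    """Bottom-up search over subspaces.

    A k-dimensional subspace is examined only if every one of its
    (k-1)-dimensional projections already contained a dense unit.
    """
    n_dims = len(points[0])
    found = {}
    for d in range(n_dims):
        units = dense_units(points, (d,), xi, tau)
        if units:
            found[(d,)] = units
    for k in range(2, max_dim + 1):
        added = False
        for dims in combinations(range(n_dims), k):
            if all(sub in found for sub in combinations(dims, k - 1)):
                units = dense_units(points, dims, xi, tau)
                if units:
                    found[dims] = units
                    added = True
        if not added:
            break
    return found
```

Because cells are counted with a `Counter`, the result does not depend on the order of the input records, which mirrors one of the requirements listed in the abstract.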
On the power of unique 2-prover 1-round games
In Proceedings of the 34th Annual ACM Symposium on Theory of Computing, 2002
Cited by 269 (19 self)
A 2-prover game is called unique if the answer of one prover uniquely determines the answer of the second prover and vice versa (we implicitly assume games to be one-round games). The value of a 2-prover game is the maximum acceptance probability of the verifier over all the prover strategies. We make the following conjecture regarding the power of unique 2-prover games, which we call the Unique Games Conjecture. The Unique Games Conjecture: For arbitrarily small constants $\zeta, \delta > 0$, there exists a constant $k = k(\zeta, \delta)$ such that it is NP-hard to determine whether a unique 2-prover game with answers from a domain of size $k$ has value at least $1 - \zeta$ or at most $\delta$. We show that a positive resolution of this conjecture would imply the following hardness results:
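To make the notion of the value of a unique game concrete, the sketch below computes it by brute force for a tiny instance. It assumes the common constraint-graph formulation (each constraint is a permutation between the labels of two variables, and the verifier picks a constraint uniformly at random); the function name and input encoding are illustrative and do not come from the paper.

```python
from itertools import product


def unique_game_value(n_vars, k, constraints):
    """Maximum fraction of constraints satisfiable by any labeling.

    constraints: list of (u, v, perm), where perm is a tuple of length k that
    is a permutation of range(k); the constraint is satisfied when
    label[v] == perm[label[u]]. Since perm is a permutation, each answer
    determines the other uniquely, which is the "unique" property.
    """
    best = 0
    # Exhaustive search over all k ** n_vars labelings: only viable for toy sizes.
    for labels in product(range(k), repeat=n_vars):
        satisfied = sum(
            1 for u, v, perm in constraints if labels[v] == perm[labels[u]]
        )
        best = max(best, satisfied)
    return best / len(constraints)
```

The exhaustive search takes time exponential in the number of variables, so it only illustrates the definition; the conjecture concerns the hardness of distinguishing values near 1 from values near 0.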
Selection of Views to Materialize in a Data Warehouse
1997
Cited by 233 (5 self)
A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost of maintaining the selected views, given a limited amount of resource, e.g., materialization time, storage space, etc. In this article, we develop a theoretical framework for the general problem of selection of views in a data warehouse. We present competitive polynomial-time heuristics for selection of views to optimize total query response time, for some important special cases of the general data warehouse scenario, viz.: (i) an AND view graph, where each query/view has a unique evaluation, and (ii) an OR view graph, in which any view can be computed from any one of its related views, e.g.,...
Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval
In Proceedings of SIGIR, 2003
Cited by 202 (5 self)
We present a nontraditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. This means that the utility of a document in a ranking is dependent on other documents in the ranking, violating the assumption of independent relevance which is assumed in most traditional retrieval methods. Subtopic retrieval poses challenges for evaluating performance, as well as for developing effective algorithms. We propose a framework for evaluating subtopic retrieval which generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
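The maximal marginal relevance (MMR) strategy mentioned in the abstract can be sketched in its generic greedy form: each step picks the document that best trades off relevance against similarity to documents already ranked. This is an illustration only, not the paper's language-model variant; `relevance`, `similarity`, and the trade-off weight `lam` are assumed inputs.

```python
def mmr_rank(relevance, similarity, lam=0.5, k=None):
    """Greedy MMR ranking.

    relevance:  dict mapping document id -> relevance score
    similarity: function (doc_a, doc_b) -> similarity in [0, 1]
    lam:        weight on relevance; (1 - lam) weights the redundancy penalty
    """
    remaining = set(relevance)
    ranking = []
    limit = len(relevance) if k is None else k
    while remaining and len(ranking) < limit:

        def score(doc):
            # Redundancy is the highest similarity to anything already ranked.
            redundancy = max((similarity(doc, s) for s in ranking), default=0.0)
            return lam * relevance[doc] - (1 - lam) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        ranking.append(best)
        remaining.remove(best)
    return ranking
```

With `lam=1.0` the redundancy term vanishes and the ranking reduces to ordering by relevance alone, which corresponds to the independent-relevance assumption the abstract argues against.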
Topology Control and Routing in Ad hoc Networks: A Survey
SIGACT News, 2002
Cited by 156 (0 self)
In this article, we review some of the characteristic features of ad hoc networks, formulate problems and survey research work done in the area. We focus on two basic problem domains: topology control, the problem of computing and maintaining a connected topology among the network nodes, and routing. This article is not intended to be a comprehensive survey on ad hoc networking. The choice of the problems discussed in this article is somewhat biased by the research interests of the author.
A polylogarithmic approximation algorithm for the group Steiner tree problem
Journal of Algorithms, 2000
Cited by 146 (11 self)
The group Steiner tree problem is a generalization of the Steiner tree problem where we are given several subsets (groups) of vertices in a weighted graph, and the goal is to find a minimum-weight connected subgraph containing at least one vertex from each group. The problem was introduced by Reich and Widmayer and finds applications in VLSI design. The group Steiner tree problem generalizes the set covering problem, and is therefore at least as hard. We give a randomized $O(\log^3 n \log k)$-approximation algorithm for the group Steiner tree problem on an $n$-node graph, where $k$ is the number of groups. The best previous performance guarantee was $(1 + \frac{\ln k}{2})\sqrt{k}$ (Bateman, Helvig, Robins and Zelikovsky).
Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks
Cited by 140 (12 self)
Influence maximization, defined by Kempe, Kleinberg, and Tardos (2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling prevalent viral marketing in large-scale online social networks. Prior solutions, such as the greedy algorithm of Kempe et al. (2003) and its improvements, are slow and not scalable, while other heuristic algorithms do not provide consistently good performance on influence spreads. In this paper, we design a new heuristic algorithm that is easily scalable to millions of nodes and edges in our experiments. Our algorithm has a simple tunable parameter for users to control the balance between the running time and the influence spread of the algorithm. Our results from extensive simulations on several real-world and synthetic networks demonstrate that our algorithm is currently the best scalable solution to the influence maximization problem: (a) our algorithm scales beyond million-sized graphs where the greedy algorithm becomes infeasible, and (b) in all size ranges, our algorithm performs consistently well in influence spread: it is always among the best algorithms, and in most cases it significantly outperforms all other scalable heuristics, by as much as a 100%–260% increase in influence spread.
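For orientation, the greedy baseline that the abstract describes as slow can be sketched as follows. This is not the paper's heuristic: it is a minimal Monte Carlo version of greedy seed selection under an independent cascade model, with a single assumed propagation probability `p` for every edge and illustrative function names.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """One independent-cascade run; returns the number of activated nodes.

    graph: dict mapping node -> list of out-neighbours.
    Each newly activated node gets one chance to activate each inactive
    neighbour, succeeding with probability p.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    chosen = []
    for _ in range(min(k, len(nodes))):

        def estimated_spread(candidate):
            trial = chosen + [candidate]
            return sum(simulate_ic(graph, trial, p, rng) for _ in range(runs)) / runs

        best = max((n for n in nodes if n not in chosen), key=estimated_spread)
        chosen.append(best)
    return chosen
```

Every candidate is re-simulated `runs` times in every round, so the cost grows with the number of nodes, the number of seeds, and the number of simulations, which suggests why this approach becomes impractical on very large graphs.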
THE PRIMAL-DUAL METHOD FOR APPROXIMATION ALGORITHMS AND ITS APPLICATION TO NETWORK DESIGN PROBLEMS
Cited by 135 (5 self)
The primal-dual method is a standard tool in the design of algorithms for combinatorial optimization problems. This chapter shows how the primal-dual method can be modified to provide good approximation algorithms for a wide variety of NP-hard problems. We concentrate on results from recent research applying the primal-dual method to problems in network design.
Constant-Time Distributed Dominating Set Approximation
In Proc. of the 22nd ACM Symposium on the Principles of Distributed Computing (PODC), 2003
Cited by 133 (25 self)
Finding a small dominating set is one of the most fundamental problems of traditional graph theory. In this paper, we present a new fully distributed approximation algorithm based on LP relaxation techniques. For an arbitrary parameter $k$ and maximum degree $\Delta$, our algorithm computes a dominating set of expected size $O(k \Delta^{2/k} \log \Delta \cdot |DS_{OPT}|)$ in $O(k^2)$ rounds, where each node has to send $O(k^2 \Delta)$ messages of size $O(\log \Delta)$. This is the first algorithm which achieves a nontrivial approximation ratio in a constant number of rounds.
What Cannot Be Computed Locally!
In Proceedings of the 23rd ACM Symposium on the Principles of Distributed Computing (PODC), 2004
Cited by 129 (27 self)
We give time lower bounds for the distributed approximation of minimum vertex cover (MVC) and related problems such as minimum dominating set (MDS). In $k$ communication rounds, MVC and MDS can only be approximated by factors $\Omega(n^{c/k^2}/k)$ and $\Omega(\Delta^{1/k}/k)$ for some constant $c$, where $n$ and $\Delta$ denote the number of nodes and the largest degree in the graph. The number of rounds required in order to achieve a constant or even only a polylogarithmic approximation ratio is at least $\Omega(\sqrt{\log n / \log\log n})$ and $\Omega(\log \Delta / \log\log \Delta)$. By a simple reduction, the latter lower bounds also hold for the construction of maximal matchings and maximal independent sets.