Results 1 - 10
of
776
Automatic Subspace Clustering of High Dimensional Data
- Data Mining and Knowledge Discovery
, 2005
"... Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the or ..."
Abstract
-
Cited by 724 (12 self)
- Add to MetaCart
(Show Context)
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
On the power of unique 2-prover 1-round games
- In Proceedings of the 34th Annual ACM Symposium on Theory of Computing
, 2002
"... ABSTRACT A 2-prover game is called unique if the answer of one prover uniquely determines the answer of the second prover and vice versa (we implicitly assume games to be one round games). The value of a 2-prover game is the maximum acceptance probability of the verifier over all the prover strategi ..."
Abstract
-
Cited by 312 (20 self)
- Add to MetaCart
ABSTRACT A 2-prover game is called unique if the answer of one prover uniquely determines the answer of the second prover and vice versa (we implicitly assume games to be one round games). The value of a 2-prover game is the maximum acceptance probability of the verifier over all the prover strategies. We make the following conjecture regarding the power of unique 2-prover games, which we call the Unique Games Conjecture: The Unique Games Conjecture: For arbitrarily small constants i; ffi? 0, there exists a constant k = k(i; ffi) such that it is NP-hard to determine whether a unique 2-prover game with answers from a domain of size k has value at least 1 \Gamma i or at most ffi. We show that a positive resolution of this conjecture would imply the following hardness results:
Selection of Views to Materialize in a Data Warehouse
, 1997
"... . A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decisionsupport or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The ..."
Abstract
-
Cited by 246 (5 self)
- Add to MetaCart
(Show Context)
. A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decisionsupport or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost of maintaining the selected views, given a limited amount of resource, e.g., materialization time, storage space etc. In this article, we develop a theoretical framework for the general problem of selection of views in a data warehouse. We present competitive polynomial-time heuristics for selection of views to optimize total query response time, for some important special cases of the general data warehouse scenario, viz.: (i) an AND view graph, where each query/view has a unique evaluation, and (ii) an OR view graph, in which any view can be computed from any one of its related views, e.g.,...
Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval
- In Proceedings of SIGIR
, 2003
"... We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. This means that the utility of a document in a ranking is dependent on other documents in the ranking, v ..."
Abstract
-
Cited by 217 (6 self)
- Add to MetaCart
We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics of a query topic. This means that the utility of a document in a ranking is dependent on other documents in the ranking, violating the assumption of independent relevance which is assumed in most traditional retrieval methods. Subtopic retrieval poses challenges for evaluating performance, as well as for developing effective algorithms. We propose a framework for evaluating subtopic retrieval which generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
Scalable Influence Maximization for Prevalent Viral Marketing in Large-Scale Social Networks
"... Influence maximization, defined by Kempe, Kleinberg, and Tardos (2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling preval ..."
Abstract
-
Cited by 183 (14 self)
- Add to MetaCart
(Show Context)
Influence maximization, defined by Kempe, Kleinberg, and Tardos (2003), is the problem of finding a small set of seed nodes in a social network that maximizes the spread of influence under certain influence cascade models. The scalability of influence maximization is a key factor for enabling prevalent viral marketing in largescale online social networks. Prior solutions, such as the greedy algorithm of Kempe et al. (2003) and its improvements are slow and not scalable, while other heuristic algorithms do not provide consistently good performance on influence spreads. In this paper, we design a new heuristic algorithm that is easily scalable to millions of nodes and edges in our experiments. Our algorithm has a simple tunable parameter for users to control the balance between the running time and the influence spread of the algorithm. Our results from extensive simulations on several real-world and synthetic networks demonstrate that our algorithm is currently the best scalable solution to the influence maximization problem: (a) our algorithm scales beyond million-sized graphs where the greedy algorithm becomes infeasible, and (b) in all size ranges, our algorithm performs consistently well in influence spread — it is always among the best algorithms, and in most cases it significantly outperforms all other scalable heuristics to as much as 100%–260 % increase in influence spread.
Topology Control and Routing in Ad hoc Networks: A Survey
- SIGACT News
, 2002
"... this article, we review some of the characteristic features of ad hoc networks, formulate problems and survey research work done in the area. We focus on two basic problem domains: topology control, the problem of computing and maintaining a connected topology among the network nodes, and routing. T ..."
Abstract
-
Cited by 164 (0 self)
- Add to MetaCart
(Show Context)
this article, we review some of the characteristic features of ad hoc networks, formulate problems and survey research work done in the area. We focus on two basic problem domains: topology control, the problem of computing and maintaining a connected topology among the network nodes, and routing. This article is not intended to be a comprehensive survey on ad hoc networking. The choice of the problems discussed in this article are somewhat biased by the research interests of the author
A polylogarithmic approximation algorithm for the group Steiner tree problem
- Journal of Algorithms
, 2000
"... The group Steiner tree problem is a generalization of the Steiner tree problem where we ae given several subsets (groups) of vertices in a weighted graph, and the goal is to find a minimum-weight connected subgraph containing at least one vertex from each group. The problem was introduced by Reich a ..."
Abstract
-
Cited by 149 (9 self)
- Add to MetaCart
(Show Context)
The group Steiner tree problem is a generalization of the Steiner tree problem where we ae given several subsets (groups) of vertices in a weighted graph, and the goal is to find a minimum-weight connected subgraph containing at least one vertex from each group. The problem was introduced by Reich and Widmayer and finds applications in VLSI design. The group Steiner tree problem generalizes the set covering problem, and is therefore at least as had. We give a randomized O(log 3 n log k)-approximation algorithm for the group Steiner tree problem on an n-node graph, where k is the number of groups. The best previous ink)v/ (Bateman, Helvig, performance guarantee was (1 + - Robins and Zelikovsky).
The multiplicative weights update method: a meta algorithm and applications
, 2005
"... Algorithms in varied fields use the idea of maintaining a distribution over a certain set and use the multiplicative update rule to iteratively change these weights. Their analysis are usually very similar and rely on an exponential potential function. We present a simple meta algorithm that unifies ..."
Abstract
-
Cited by 147 (13 self)
- Add to MetaCart
(Show Context)
Algorithms in varied fields use the idea of maintaining a distribution over a certain set and use the multiplicative update rule to iteratively change these weights. Their analysis are usually very similar and rely on an exponential potential function. We present a simple meta algorithm that unifies these disparate algorithms and drives them as simple instantiations of the meta algorithm. 1
Near-optimal nonmyopic value of information in graphical models
- IN ANNUAL CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE
, 2005
"... A fundamental issue in real-world systems, such as sensor networks, is the selection of observations which most effectively reduce uncertainty. More specifically, we address the long standing problem of nonmyopically selecting the most informative subset of variables in a graphical model. We present ..."
Abstract
-
Cited by 146 (25 self)
- Add to MetaCart
(Show Context)
A fundamental issue in real-world systems, such as sensor networks, is the selection of observations which most effectively reduce uncertainty. More specifically, we address the long standing problem of nonmyopically selecting the most informative subset of variables in a graphical model. We present the first efficient randomized algorithm providing a constant factor (1 − 1/e − ε) approximation guarantee for any ε> 0 with high confidence. The algorithm leverages the theory of submodular functions, in combination with a polynomial bound on sample complexity. We furthermore prove that no polynomial time algorithm can provide a constant factor approximation better than (1 − 1/e) unless P = NP. Finally, we provide extensive evidence of the effectiveness of our method on two complex real-world datasets.
Maximizing non-monotone submodular functions
- IN PROCEEDINGS OF 48TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS
, 2007
"... Submodular maximization generalizes many important problems including Max Cut in directed/undirected graphs and hypergraphs, certain constraint satisfaction problems and maximum facility location problems. Unlike the problem of minimizing submodular functions, the problem of maximizing submodular fu ..."
Abstract
-
Cited by 146 (18 self)
- Add to MetaCart
Submodular maximization generalizes many important problems including Max Cut in directed/undirected graphs and hypergraphs, certain constraint satisfaction problems and maximum facility location problems. Unlike the problem of minimizing submodular functions, the problem of maximizing submodular functions is NP-hard. In this paper, we design the first constant-factor approximation algorithms for maximizing nonnegative submodular functions. In particular, we give a deterministic local search 1 2-approximation and a randomized-approximation algo-