Efficient Query Evaluation on Probabilistic Databases
, 2004
Cited by 275 (36 self)
We describe a system that supports arbitrarily complex SQL queries with “uncertain” predicates. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is efficient query evaluation, a problem that has not received attention in the past. We describe an optimization algorithm that can efficiently compute most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.
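To make the Monte-Carlo idea concrete, here is a minimal sketch that estimates the probability of a Boolean query by sampling possible worlds. It is a naive sampler, not the algorithm from the paper, and it assumes tuples are independent; the function name `estimate_query_probability` and the tuple ids are hypothetical.

```python
import random

def estimate_query_probability(tuple_probs, query, trials=20000, seed=0):
    """Estimate P(query is true) by sampling possible worlds.

    tuple_probs: dict mapping a tuple id to its marginal probability
                 (tuples are assumed independent; this is our assumption).
    query:       function that takes the set of tuple ids present in a
                 world and returns True or False.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Sample one possible world: keep each tuple with its own probability.
        world = {t for t, p in tuple_probs.items() if rng.random() < p}
        if query(world):
            hits += 1
    return hits / trials

# Hypothetical data: the query holds if r1 joins with s1, or r2 joins with s2.
probs = {"r1": 0.5, "s1": 0.8, "r2": 0.3, "s2": 0.9}

def query(world):
    return ("r1" in world and "s1" in world) or ("r2" in world and "s2" in world)

estimate = estimate_query_probability(probs, query)
# The two disjuncts share no tuples, so the exact value has a closed form here.
exact = 1 - (1 - 0.5 * 0.8) * (1 - 0.3 * 0.9)
```

For this toy query the exact probability is easy to compute, which makes it a convenient check on the estimate. Sampling becomes useful for queries where no such closed form is available.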
Consistency of spectral clustering
, 2004
Cited by 170 (11 self)
Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
A new approach to the minimum cut problem
- Journal of the ACM
, 1996
Cited by 83 (8 self)
This paper presents a new approach to finding minimum cuts in undirected graphs. The fundamental principle is simple: the edges in a graph’s minimum cut form an extremely small fraction of the graph’s edges. Using this idea, we give a randomized, strongly polynomial algorithm that finds the minimum cut in an arbitrarily weighted undirected graph with high probability. The algorithm runs in $O(n^2 \log^3 n)$ time, a significant improvement over the previous $\tilde{O}(mn)$ time bounds based on maximum flows. It is simple and intuitive and uses no complex data structures. Our algorithm can be parallelized to run in RNC with $n^2$ processors; this gives the first proof that the minimum cut problem can be solved in RNC. The algorithm does more than find a single minimum cut; it finds all of them. With minor modifications, our algorithm solves two other problems of interest. Our algorithm finds all cuts with value within a multiplicative factor of $\alpha$ of the minimum cut’s in expected $\tilde{O}(n^{2\alpha})$ time, or in RNC with $n^{2\alpha}$ processors. The problem of finding a minimum multiway cut of a graph into $r$ pieces is solved in expected $\tilde{O}(n^{2(r-1)})$ time, or in RNC with $n^{2(r-1)}$ processors. The “trace” of the algorithm’s execution on these two problems forms a new compact data structure for representing all small cuts and all multiway cuts in a graph. This data structure can be efficiently transformed into the
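The principle stated in this abstract, that minimum-cut edges are a small fraction of all edges, suggests contracting randomly chosen edges until two super-vertices remain. The sketch below shows only that basic repeated-contraction idea for unweighted, connected multigraphs; it is not the paper's full algorithm and does not achieve its running time. The names `contract_once` and `min_cut` are hypothetical.

```python
import random

def contract_once(n, edges, rng):
    """Run one random contraction on a connected undirected multigraph.

    n: number of vertices, labeled 0..n-1; edges: list of (u, v) pairs.
    Returns the number of edges crossing the resulting two-way cut.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    remaining = n
    order = edges[:]
    # Scanning a random permutation and skipping self-loops is equivalent
    # to repeatedly picking a uniformly random surviving edge.
    rng.shuffle(order)
    for u, v in order:
        if remaining == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # contract the edge (u, v)
            remaining -= 1
    return sum(1 for u, v in edges if find(u) != find(v))

def min_cut(n, edges, trials, seed=0):
    """Best cut found over several independent contraction runs."""
    rng = random.Random(seed)
    return min(contract_once(n, edges, rng) for _ in range(trials))

# Two triangles joined by a single bridge edge; the true minimum cut is 1.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
result = min_cut(6, edges, trials=200)
```

Each run returns the size of some cut, so the minimum over runs never underestimates the true value. More runs only raise the chance of hitting an actual minimum cut.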
An Algorithm for Clustering cDNAs for Gene Expression Analysis
- In RECOMB99: Proceedings of the Third Annual International Conference on Computational Molecular Biology
, 1999
Cited by 35 (4 self)
We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques. A similarity graph is defined and clusters in that graph correspond to highly connected subgraphs. A polynomial algorithm to compute them efficiently is presented. Our algorithm produces a clustering with some provably good properties. The application that motivated this study was gene expression analysis, where a collection of cDNAs must be clustered based on their oligonucleotide fingerprints. The algorithm has been tested intensively on simulated libraries and was shown to outperform extant methods. It demonstrated robustness to high noise levels. In a blind test on real cDNA fingerprint data the algorithm obtained very good results. Utilizing the results of the algorithm would have saved over 70% of the cDNA sequencing cost on that data set.
1 Introduction
Cluster analysis seeks grouping of data elements into subsets, so that elements in the same subset are in some sense more cl...
Computing All Small Cuts in an Undirected Network
- SIAM Journal on Discrete Mathematics
, 1994
Cited by 28 (2 self)
Let $\lambda(N)$ denote the weight of a minimum cut in an edge-weighted undirected network $N$, and let $n$ and $m$ denote the numbers of vertices and edges, respectively. It is known that $O(n^{2k})$ is an upper bound on the number of cuts with weights less than $k\lambda(N)$, where $k \ge 1$ is a given constant. This paper first shows that all cuts of weights less than $k\lambda(N)$ can be enumerated in $O(m^2 n + n^{2k} m)$ time without using the maximum flow algorithm. The paper then proves for $k < 4/3$ that $\binom{n}{2}$ is a tight upper bound on the number of cuts of weights less than $k\lambda(N)$, and that all those cuts can be enumerated in $O(m^2 n + mn^2 \log n)$ time.
Keywords: minimum cuts, graphs, edge-splitting, polynomial algorithm
Abbreviated title: Computing Small Cuts
AMS subject classifications: 05C35, 05C40
1 Introduction
Let N stand for an undirected network with its edges being weighted by nonnegative real numbers. Counting the number of cuts with small weights, and deriving upper and lower bounds on their...
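As a small sanity check on counting small cuts, the sketch below enumerates every cut of a tiny network by brute force and counts those lighter than k times the minimum cut weight. It takes exponential time and is unrelated to the paper's enumeration algorithm; the function names and the example network are hypothetical. On a unit-weight cycle, removing any two edges gives a minimum cut, so a cycle on n vertices has n(n-1)/2 minimum cuts.

```python
from itertools import combinations
from math import comb

def cut_weight(side, edges):
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for u, v, w in edges if (u in side) != (v in side))

def all_cuts(n, edges):
    """Yield (side, weight) for every cut of a graph on vertices 0..n-1.

    Vertex 0 is always placed in `side`, so each cut is listed once.
    """
    others = range(1, n)
    for size in range(0, n - 1):  # `side` must not contain every vertex
        for extra in combinations(others, size):
            side = frozenset((0,) + extra)
            yield side, cut_weight(side, edges)

def small_cuts(n, edges, k):
    """Return (minimum cut weight, list of cuts lighter than k times it)."""
    cuts = list(all_cuts(n, edges))
    lam = min(w for _, w in cuts)
    return lam, [c for c in cuts if c[1] < k * lam]

# Hypothetical example: a cycle on 5 vertices with unit edge weights.
n = 5
cycle = [(i, (i + 1) % n, 1) for i in range(n)]
lam, small = small_cuts(n, cycle, k=1.3)
```

Here `lam` is 2 and `small` holds exactly `comb(5, 2)` cuts, so the cycle is one instance where the number of small cuts equals n(n-1)/2.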
Towards Network Denial Of Service Resistant Protocols
, 2000
Cited by 26 (0 self)
Networked and distributed systems have introduced a new significant threat to the availability of data and services: network denial of service attacks. A well known example is TCP SYN flooding. In general, any stateful handshake protocol is vulnerable to similar attacks. This paper examines network denial of service in detail and surveys and compares different approaches towards preventing the attacks. As a conclusion, a number of protocol design principles are identified as essential in designing network denial of service resistant protocols, and examples are provided on applying the principles.
Towards a Distributed Platform for Resource-Constrained Devices
- PROC. OF IEEE 22ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2002)
, 2002
Cited by 24 (3 self)
- Add to MetaCart
Many visions of the future predict a world with pervasive computing, where computing services and resources permeate the environment. In these visions, people will want to execute a service on any available device without worrying about whether the service has been tailored for the device. We believe that it will be difficult to create services that can execute well on the wide variety of devices that are being developed because of problems with diversity and resource constraints. We believe
Nonlinear Dimensionality Reduction of Data Manifolds With Essential Loops
, 2005
Cited by 11 (2 self)
Numerous methods or algorithms have been designed to solve the problem of nonlinear dimensionality reduction (NLDR). However, very few among them are able to embed efficiently 'circular' manifolds like cylinders or tori, which have one or more essential loops. This paper presents a simple and fast procedure that can tear or cut those manifolds, i.e. break their essential loops, in order to make their embedding in a low-dimensional space easier. The key idea is the following: starting from the available data points, the tearing procedure represents the underlying manifold by a graph and then builds a maximum subgraph that no longer contains loops. Because it works with a graph, the procedure can preprocess data for all NLDR techniques that use the same representation. Recent techniques using geodesic distances (Isomap, geodesic Sammon's mapping, geodesic CCA, etc.) or $K$-ary neighborhoods (LLE, hLLE, Laplacian eigenmaps) fall in that category. After describing the tearing procedure in detail, the paper comments on a few experimental results.
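One simple reading of "a subgraph with no loops" is a spanning tree of the neighborhood graph. The sketch below builds a breadth-first spanning tree over points arranged on a ring and reports which edge is dropped, which is where the loop gets torn. The abstract does not spell out the paper's actual procedure, so this illustrates only the general idea; `spanning_tree` and the ring example are hypothetical.

```python
from collections import deque

def spanning_tree(adjacency, root):
    """Return the edges of a breadth-first spanning tree of a connected graph."""
    visited = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in visited:
                visited.add(v)
                tree_edges.append((u, v))
                queue.append(v)
    return tree_edges

# Hypothetical example: 8 points on a circle, each linked to its two neighbors.
n = 8
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
tree = spanning_tree(ring, root=0)

# Edges of the ring that the tree leaves out: the loop is cut there.
ring_edges = {frozenset((i, (i + 1) % n)) for i in range(n)}
removed = ring_edges - {frozenset(e) for e in tree}
```

With the root at point 0, the search grows in both directions around the ring. The single removed edge joins points 3 and 4, opposite the root, and the remaining 7 edges form a path that can be unrolled.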

