Efficient Query Evaluation on Probabilistic Databases
, 2004
Cited by 347 (38 self)
We describe a system that supports arbitrarily complex SQL queries with "uncertain" predicates. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is efficient query evaluation, a problem that has not received attention in the past. We describe an optimization algorithm that can compute most queries efficiently. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a Monte Carlo simulation algorithm.
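The Monte Carlo idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a tuple-independent table (each tuple is present independently with a given probability) and a Boolean query whose lineage is a disjunction of tuple sets. This is naive possible-world sampling, not necessarily the estimator used in the paper, and the names `estimate_query_probability`, `tuple_probs`, and `lineage` are hypothetical.

```python
import random


def estimate_query_probability(tuple_probs, lineage, samples=20000, seed=0):
    """Estimate P(query is true) by sampling possible worlds.

    tuple_probs: dict mapping a tuple id to its independent presence probability.
    lineage: list of sets of tuple ids; the query holds in a world if at
             least one of these sets is entirely present (an assumption
             made for this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Draw one possible world: keep each tuple with its own probability.
        world = {t for t, p in tuple_probs.items() if rng.random() < p}
        # The query is satisfied if some clause is fully contained in the world.
        if any(clause <= world for clause in lineage):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    probs = {"a": 0.5, "b": 0.5, "c": 0.5}
    # Query is true if (a and b) or c; the exact value is 1 - 0.75 * 0.5 = 0.625.
    print(estimate_query_probability(probs, [{"a", "b"}, {"c"}]))
```

The estimate converges at the usual 1/sqrt(samples) rate, so very small probabilities would need a more careful estimator than this one.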
Consistency of spectral clustering
, 2004
Cited by 286 (15 self)
Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
A new approach to the minimum cut problem
 Journal of the ACM
, 1996
Cited by 95 (8 self)
This paper presents a new approach to finding minimum cuts in undirected graphs. The fundamental principle is simple: the edges in a graph’s minimum cut form an extremely small fraction of the graph’s edges. Using this idea, we give a randomized, strongly polynomial algorithm that finds the minimum cut in an arbitrarily weighted undirected graph with high probability. The algorithm runs in O(n^2 log^3 n) time, a significant improvement over the previous Õ(mn) time bounds based on maximum flows. It is simple and intuitive and uses no complex data structures. Our algorithm can be parallelized to run in RNC with n^2 processors; this gives the first proof that the minimum cut problem can be solved in RNC. The algorithm does more than find a single minimum cut; it finds all of them. With minor modifications, our algorithm solves two other problems of interest. Our algorithm finds all cuts with value within a multiplicative factor of α of the minimum cut’s in expected Õ(n^(2α)) time, or in RNC with n^(2α) processors. The problem of finding a minimum multiway cut of a graph into r pieces is solved in expected Õ(n^(2(r-1))) time, or in RNC with n^(2(r-1)) processors. The “trace” of the algorithm’s execution on these two problems forms a new compact data structure for representing all small cuts and all multiway cuts in a graph. This data structure can be efficiently transformed into the
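The principle stated in this abstract, that minimum-cut edges are a small fraction of all edges, is what makes random edge contraction likely to leave a minimum cut intact. The following is a minimal sketch of plain repeated random contraction on an unweighted multigraph. It is not the paper's recursive algorithm and does not achieve its running time; the names `contract_once` and `min_cut` and the trial count are illustrative assumptions.

```python
import random


def contract_once(edges, n, rng):
    """Run one random-contraction trial on vertices 0..n-1.

    Processing the edges in a uniformly random order and merging the
    endpoints of each edge that still joins two different groups is
    equivalent to contracting uniformly random remaining edges.
    Returns the number of original edges crossing the final 2-way cut.
    """
    parent = list(range(n))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    groups = n
    order = list(edges)
    rng.shuffle(order)
    for u, v in order:
        if groups == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:  # skip edges that became self-loops
            parent[ru] = rv
            groups -= 1
    return sum(1 for u, v in edges if find(u) != find(v))


def min_cut(edges, n, trials=200, seed=0):
    """Return the smallest cut seen over several independent trials."""
    rng = random.Random(seed)
    return min(contract_once(edges, n, rng) for _ in range(trials))


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    graph = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    print(min_cut(graph, 6))
```

Each trial only returns an upper bound on the minimum cut, so the result is correct with high probability rather than with certainty; more trials reduce the failure probability.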
An Algorithm for Clustering cDNAs for Gene Expression Analysis
 In RECOMB99: Proceedings of the Third Annual International Conference on Computational Molecular Biology
, 1999
Cited by 45 (4 self)
We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques. A similarity graph is defined and clusters in that graph correspond to highly connected subgraphs. A polynomial algorithm to compute them efficiently is presented. Our algorithm produces a clustering with some provably good properties. The application that motivated this study was gene expression analysis, where a collection of cDNAs must be clustered based on their oligonucleotide fingerprints. The algorithm has been tested intensively on simulated libraries and was shown to outperform extant methods. It demonstrated robustness to high noise levels. In a blind test on real cDNA fingerprint data the algorithm obtained very good results. Utilizing the results of the algorithm would have saved over 70% of the cDNA sequencing cost on that data set. 1 Introduction Cluster analysis seeks grouping of data elements into subsets, so that elements in the same subset are in some sense more cl...
Computing All Small Cuts in an Undirected Network
 SIAM Journal on Discrete Mathematics
, 1994
Cited by 31 (2 self)
Let λ(N) denote the weight of a minimum cut in an edge-weighted undirected network N, and n and m denote the numbers of vertices and edges, respectively. It is known that O(n^(2k)) is an upper bound on the number of cuts with weights less than kλ(N), where k ≥ 1 is a given constant. This paper first shows that all cuts of weights less than kλ(N) can be enumerated in O(m^2 n + n^(2k) m) time without using the maximum flow algorithm. The paper then proves for k < 4/3 that (n choose 2) is a tight upper bound on the number of cuts of weights less than kλ(N), and that all those cuts can be enumerated in O(m^2 n + mn^2 log n) time. Keywords: minimum cuts, graphs, edge-splitting, polynomial algorithm. Abbreviated title: Computing Small Cuts. AMS subject classifications: 05C35, 05C40. 1 Introduction Let N stand for an undirected network with its edges being weighted by nonnegative real numbers. Counting the number of cuts with small weights, and deriving upper and lower bounds on their...
Towards a Distributed Platform for Resource-Constrained Devices
 Proc. of IEEE 22nd International Conference on Distributed Computing Systems (ICDCS 2002)
, 2002
Cited by 31 (3 self)
Many visions of the future predict a world with pervasive computing, where computing services and resources permeate the environment. In these visions, people will want to execute a service on any available device without worrying about whether the service has been tailored for the device. We believe that it will be difficult to create services that can execute well on the wide variety of devices that are being developed because of problems with diversity and resource constraints. We believe
Towards Network Denial Of Service Resistant Protocols
, 2000
Cited by 31 (0 self)
Networked and distributed systems have introduced a new significant threat to the availability of data and services: network denial of service attacks. A well-known example is TCP SYN flooding. In general, any stateful handshake protocol is vulnerable to similar attacks. This paper examines network denial of service in detail and surveys and compares different approaches towards preventing the attacks. As a conclusion, a number of protocol design principles are identified as essential in designing network denial of service resistant protocols, and examples are provided of applying the principles.
Assessing significance of connectivity and conservation in protein interaction networks
 Journal of Computational Biology
, 2006
Cited by 17 (4 self)
Computational and comparative analysis of protein-protein interaction (PPI) networks enable understanding of the modular organization of the cell through identification of functional modules and protein complexes. These analysis techniques generally rely on topological features such as connectedness, based on the premise that functionally related proteins are likely to interact densely and that these interactions follow similar evolutionary trajectories. Significant recent work in our lab and in other labs has focused on efficient algorithms for identification of modules and their conservation. Application of these methods to a variety of networks has yielded novel biological insights. In spite of algorithmic advances, development of a comprehensive infrastructure for interaction databases is in relative infancy compared to corresponding sequence analysis tools such as BLAST and CLUSTAL. One critical component of this infrastructure is a measure of the statistical significance of a match or a dense subcomponent. Corresponding sequence-based measures such as E-values are key components of sequence matching tools. In the absence of an analytical measure, conventional methods rely on computer simulations based on ad hoc models for quantifying significance. This paper presents the first such effort, to the best of our knowledge, aimed at analytically quantifying statistical significance