Advances in metric embedding theory
 IN STOC ’06: PROCEEDINGS OF THE THIRTY-EIGHTH ANNUAL ACM SYMPOSIUM ON THEORY OF COMPUTING
, 2006
Cited by 36 (13 self)
Metric Embedding plays an important role in a vast range of application areas such as computer vision, computational biology, machine learning, networking, statistics, and mathematical psychology, to name a few. The theory of metric embedding has received much attention in recent years from mathematicians as well as computer scientists and has been applied in many algorithmic applications. A cornerstone of the field is a celebrated theorem of Bourgain which states that every finite metric space on n points embeds in Euclidean space with O(log n) distortion. Bourgain’s result is best possible when considering the worst-case distortion over all pairs of points in the metric space. Yet, it is possible that an embedding can do much better in terms of the average distortion. Indeed, in most practical applications of metric embedding the main criterion for the quality of an embedding is its average distortion over all pairs. In this paper we provide an embedding with constant average distortion for arbitrary metric spaces, while maintaining the same worst-case bound provided by Bourgain’s theorem. In fact, our embedding possesses a much stronger property. We define the ℓq-distortion of a uniformly distributed pair of points. Our embedding achieves the best possible ℓq-distortion for all 1 ≤ q ≤ ∞ simultaneously. These results have several algorithmic implications, e.g. an O(1) approximation for the unweighted uncapacitated quadratic assignment problem. The results are based on novel embedding methods which improve on previous methods in another important aspect: the dimension. The dimension of an embedding is of very high importance, in particular in applications, and much effort has been invested in analyzing it. However, no previous result im
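As a rough illustration of the quantities this abstract contrasts, the following Python sketch computes the worst-case, average, and ℓq-distortion of an embedding given as two distance matrices. It assumes the embedding is first rescaled so that no pair is contracted, and it takes the ℓq-distortion to be (E[dist(u,v)^q])^(1/q) over a uniformly random pair of points. These normalizations, the function names, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations
import math


def pair_distortions(d_orig, d_emb):
    """Per-pair expansion after rescaling so that no pair is contracted."""
    n = len(d_orig)
    ratios = [d_emb[i][j] / d_orig[i][j] for i, j in combinations(range(n), 2)]
    smallest = min(ratios)
    return [r / smallest for r in ratios]


def lq_distortion(distortions, q):
    """(E[dist^q])^(1/q) over a uniform pair; q = infinity gives the worst case."""
    if math.isinf(q):
        return max(distortions)
    return (sum(d ** q for d in distortions) / len(distortions)) ** (1.0 / q)


# Hypothetical example: the 4-cycle with its shortest-path metric,
# embedded on a line at positions 0, 1, 2, 3.
d_orig = [[0, 1, 2, 1],
          [1, 0, 1, 2],
          [2, 1, 0, 1],
          [1, 2, 1, 0]]
points = [0, 1, 2, 3]
d_emb = [[abs(a - b) for b in points] for a in points]

dist = pair_distortions(d_orig, d_emb)
worst = lq_distortion(dist, math.inf)   # q = infinity
average = lq_distortion(dist, 1)        # q = 1
```

In this toy example only one of the six pairs is stretched, so the worst-case distortion is 3 while the average distortion is 4/3, which is the kind of gap between worst case and average that the abstract describes.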
Embedding metrics into ultrametrics and graphs into spanning trees with constant average distortion, 2006. Arxiv
Cited by 18 (8 self)
This paper addresses the basic question of how well a tree can approximate distances of a metric space or a graph. Given a graph, the problem of constructing a spanning tree in a graph which strongly preserves distances in the graph is a fundamental problem in network design. We present scaling distortion embeddings where the distortion scales as a function of ε, with the guarantee that for each ε the distortion of a fraction 1 − ε of all pairs is bounded accordingly. Such a bound implies, in particular, that the average distortion and ℓq-distortions are small. Specifically, our embeddings have constant average distortion and O(√(log n)) ℓ2-distortion. This follows from the following results: we prove that any metric space embeds into an ultrametric with scaling distortion O(√(1/ε)). For the graph setting we prove that any weighted graph contains a spanning tree with scaling distortion O(√(1/ε)). These bounds are tight even for embedding in arbitrary trees. For probabilistic embedding into spanning trees we prove a scaling distortion of Õ(log²(1/ε)), which implies constant ℓq-distortion for every fixed q < ∞.
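The scaling-distortion guarantee can be read as a statement about quantiles of the per-pair distortions. The sketch below assumes the per-pair distortions of some embedding are already available as a list and reports, for a given ε, the smallest bound that holds for at least a 1 − ε fraction of pairs. The function name and the sample values are hypothetical and only illustrate the definition.

```python
import math


def distortion_at(distortions, eps):
    """Smallest bound satisfied by at least a (1 - eps) fraction of pairs."""
    ordered = sorted(distortions)
    # Number of pairs that must respect the bound (at least one).
    need = max(math.ceil((1.0 - eps) * len(ordered)), 1)
    return ordered[need - 1]


# Hypothetical per-pair distortions of some embedding on 10 pairs.
sample = [1.0, 1.0, 1.2, 1.5, 2.0, 2.0, 3.0, 4.0, 8.0, 20.0]

# Empirical scaling profile: the bound grows as eps shrinks.
profile = {eps: distortion_at(sample, eps) for eps in (0.5, 0.25, 0.05)}
```

A profile bounded by C·√(1/ε) integrates to 2C over ε in (0, 1], which is one way to see why a scaling distortion of this form is compatible with constant average distortion.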
A uniform projection method for motif discovery in DNA sequences
 IEEE/ACM Transactions on Computational Biology and Bioinformatics
, 2004
Cited by 11 (0 self)
Abstract—Buhler and Tompa [5] introduced the random projection algorithm for the motif discovery problem and demonstrated that this algorithm performs well on both simulated and biological samples. We describe a modification of the random projection algorithm, called the uniform projection algorithm, which utilizes a different choice of projections. We replace the random selection of projections by a greedy heuristic that approximately equalizes the coverage of the projections. We show that this change in selection of projections leads to improved performance on motif discovery problems. Furthermore, the uniform projection algorithm is directly applicable to other problems where the random projection algorithm has been used, including comparison of protein sequence databases. Index Terms—Motif discovery, transcription factor binding sites, random projection, combinatorial designs, low-discrepancy sequences.
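To make the contrast concrete, here is a minimal sketch of two ways of choosing projections, where a projection is a set of k positions out of a motif of length l. The greedy rule shown, which always takes the positions covered least often so far, is only one plausible reading of "approximately equalizes the coverage"; it is an assumption for illustration and not necessarily the authors' heuristic. All function and parameter names are hypothetical.

```python
import random


def random_projections(l, k, m, rng):
    """m projections, each a uniformly random set of k positions out of l."""
    return [sorted(rng.sample(range(l), k)) for _ in range(m)]


def balanced_projections(l, k, m, rng):
    """Greedy choice: each projection takes the k least-covered positions."""
    coverage = [0] * l
    chosen = []
    for _ in range(m):
        order = list(range(l))
        rng.shuffle(order)                      # random tie-breaking
        order.sort(key=lambda p: coverage[p])   # stable sort keeps shuffled order among ties
        projection = sorted(order[:k])
        for p in projection:
            coverage[p] += 1
        chosen.append(projection)
    return chosen


def coverage_spread(projections, l):
    """Difference between the most and least frequently covered position."""
    counts = [0] * l
    for projection in projections:
        for p in projection:
            counts[p] += 1
    return max(counts) - min(counts)


rng = random.Random(0)
uniformish = balanced_projections(l=15, k=7, m=20, rng=rng)
purely_random = random_projections(l=15, k=7, m=20, rng=rng)
spread = coverage_spread(uniformish, 15)
```

With this greedy rule the coverage counts of any two positions never differ by more than one, whereas purely random selection gives no such guarantee.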
INVITED PAPER
Cited by 2 (2 self)
Technology advances suggest that the data deluge, network bandwidth and computer performance will continue their exponential increase. Computers will exhibit 64-128 cores in some 5 years. Consequences include a growing importance of data mining and data analysis capabilities that need to perform well on both parallel and distributed Grid systems. We discuss a class of such algorithms important in chemoinformatics, bioinformatics and demographic studies. We present a unified formalism and initial performance results for clustering and dimension reduction algorithms using annealing to avoid local minima. This uses a runtime, CCR/DSS, that combines the features of the MPI, parallel threaded and service paradigms.
PARALLEL CLUSTERING AND DIMENSIONAL SCALING ON MULTICORE SYSTEMS
Technology advances suggest that the data deluge, network bandwidth and computer performance will continue their exponential increase. Computers will exhibit 64-128 cores in some 5 years. Consequences include a growing importance of data mining and data analysis capabilities that need to perform well on both parallel and distributed Grid systems. We discuss a class of such algorithms important in chemoinformatics, bioinformatics and demographic studies. We present a unified formalism and initial performance results for clustering and dimension reduction algorithms using annealing to avoid local minima. This uses a runtime, CCR/DSS, that combines the features of the MPI, parallel threaded and service paradigms.
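The abstract mentions annealing only in passing, so the following is a sketch under stated assumptions rather than the authors' algorithm. It shows a serial, one-dimensional, deterministic-annealing style clustering: points are softly assigned to centres with weights proportional to exp(-distance²/T), and the temperature T is lowered gradually so that the centres separate smoothly instead of being fixed by an arbitrary initial guess. The cooling schedule, parameter values, and names are all hypothetical, and none of the parallel runtime described in the abstract is modelled.

```python
import math


def annealed_clustering(points, k, t_start=10.0, t_end=0.01, cooling=0.8, iters=20):
    """Soft k-means on 1-D data with a slowly decreasing temperature."""
    mean = sum(points) / len(points)
    # Start all centres near the data mean, with a tiny symmetric perturbation.
    centers = [mean + 1e-3 * (i - (k - 1) / 2.0) for i in range(k)]
    t = t_start
    while t > t_end:
        for _ in range(iters):
            sums = [0.0] * k
            weights = [0.0] * k
            for x in points:
                d2 = [(x - c) ** 2 for c in centers]
                base = min(d2)  # subtract the minimum for numerical stability
                w = [math.exp(-(d - base) / t) for d in d2]
                z = sum(w)
                for j in range(k):
                    p = w[j] / z          # soft assignment of x to centre j
                    weights[j] += p
                    sums[j] += p * x
            # Move each centre to the weighted mean of the points assigned to it.
            centers = [sums[j] / weights[j] if weights[j] > 0 else centers[j]
                       for j in range(k)]
        t *= cooling                      # lower the temperature
    return sorted(centers)


# Hypothetical data: two well-separated groups.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers = annealed_clustering(data, k=2)
```

At high temperature every point contributes almost equally to every centre; as T drops the assignments harden, and for this toy data the two centres settle on the means of the two groups.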
Istituto di Informatica e
Tandem repetitions within protein amino acid sequences often correspond to regular secondary structures and form multi-repeat 3D assemblies of varied size and function. Developing internal repetitions is one of the evolutionary mechanisms that proteins employ to adapt their structure and function under evolutionary pressure. While there is keen interest in understanding such phenomena, detection of repeating structures based only on sequence analysis is considered an arduous task, since structure and function are often preserved even under considerable sequence divergence. In this paper we present PTRStalker, a new algorithm for ab initio detection of very fuzzy tandem repeats in protein amino acid sequences. In the reported results we show that by feeding PTRStalker with amino acid sequences from the UniProtKB/Swiss-Prot database we detect novel tandemly repeated structures not captured by other state-of-the-art tools.
EMBEDDING METRICS INTO ULTRAMETRICS AND GRAPHS INTO SPANNING TREES WITH CONSTANT AVERAGE DISTORTION
Abstract. This paper addresses the basic question of how well a tree can approximate distances of a metric space or a graph. Given a graph, the problem of constructing a spanning tree in a graph which strongly preserves distances in the graph is a fundamental problem in network design. We present scaling distortion embeddings where the distortion scales as a function of ε, with the guarantee that for each ε simultaneously, the distortion of a fraction 1 − ε of all pairs is bounded accordingly. Quantitatively, we prove that any finite metric space embeds into an ultrametric with scaling distortion O(√(1/ε)). For the graph setting, we prove that any weighted graph contains a spanning tree with scaling distortion O(√(1/ε)). These bounds are tight even for embedding into arbitrary trees. These results imply that the average distortion of the embedding is constant and that the ℓ2-distortion is O(√(log n)). For probabilistic embedding into spanning trees we prove a scaling distortion of Õ(log²(1/ε)), which implies constant ℓq-distortion for every fixed q < ∞.
A novel approach to embedding of metric spaces
, 2009
An embedding of one metric space (X, d) into another (Y, ρ) is an injective map f: X → Y. The central genre of problems in the area of metric embedding is finding such maps in which the distances between points do not change “too much”. Metric Embedding plays an important role in a vast range of application areas such as computer vision, computational biology, machine learning, networking, statistics, and mathematical psychology, to name a few. The mathematical theory of metric embedding is well studied in both pure and applied analysis and has more recently been a source of interest for computer scientists as well. Most of this work is focused on the development of bi-Lipschitz mappings between metric spaces. In this work we present new concepts in metric embeddings as well as new embedding methods for metric spaces. We focus on finite metric spaces; however, some of the concepts and methods may be applicable in other settings as well. One of the main cornerstones in finite metric embedding theory is a celebrated theorem of Bourgain which states that every finite metric space on n points embeds in Euclidean