A spectral algorithm for seriation and the consecutive ones problem
SIAM Journal on Computing, 1998
Cited by 35 (0 self)
Abstract. In applications ranging from DNA sequencing through archeological dating to sparse matrix reordering, a recurrent problem is the sequencing of elements in such a way that highly correlated pairs of elements are near each other. That is, given a correlation function f reflecting the desire for each pair of elements to be near each other, find all permutations π with the property that if π(i) < π(j) < π(k) then f(i, j) ≥ f(i, k) and f(j, k) ≥ f(i, k). This seriation problem is a generalization of the well-studied consecutive ones problem. We present a spectral algorithm for this problem that has a number of interesting features. Whereas most previous applications of spectral techniques provide only bounds or heuristics, our result is an algorithm that correctly solves a nontrivial combinatorial problem. In addition, spectral methods are being successfully applied as heuristics to a variety of sequencing problems, and our result helps explain and justify these applications.
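The abstract does not spell out the algorithm. The spectral idea it refers to can be sketched in three steps. First, form the graph Laplacian L = D − A of the correlation (similarity) matrix. Second, take the eigenvector of its second-smallest eigenvalue (the Fiedler vector). Third, sort the elements by the entries of that vector.

The pure-Python sketch below makes several assumptions:
- The similarity matrix is small, dense, symmetric and given as a list of lists.
- `fiedler_order` is a hypothetical helper name.
- Plain power iteration on a shifted Laplacian stands in for a proper eigensolver.

It returns a single ordering and does not attempt the tie handling needed to enumerate all valid permutations.

```python
import math


def fiedler_order(sim, iters=2000):
    """Order items by the Fiedler vector of the Laplacian of `sim`.

    sim: symmetric n x n list of lists of nonnegative similarities
    (assumed input format; the diagonal is ignored). Returns a list of
    item indices; its reverse is an equally valid ordering.
    """
    n = len(sim)
    if n < 2:
        return list(range(n))
    # Degrees for the graph Laplacian L = D - A, with A = sim off the diagonal.
    deg = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    # Every eigenvalue of L lies in [0, 2 * max degree] (Gershgorin), so the
    # largest eigenvalues of M = c*I - L correspond to the smallest of L.
    c = 2 * max(deg) + 1.0

    def apply_m(x):
        # M x = c*x - D x + A x
        return [
            (c - deg[i]) * x[i]
            + sum(sim[i][j] * x[j] for j in range(n) if j != i)
            for i in range(n)
        ]

    # Deterministic start vector. Projecting out the constant vector (the
    # eigenvector of L for eigenvalue 0) at every step leaves the Fiedler
    # vector (second-smallest eigenvalue of L) as the dominant direction.
    x = [i - (n - 1) / 2 for i in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
        x = apply_m(x)
    # Sort items by their Fiedler-vector entry.
    return sorted(range(n), key=lambda i: x[i])
```

Consider items at line positions [0, 3, 1, 4, 2] with similarity 1/(1 + |p − q|). The returned order should follow the positions, giving [0, 2, 4, 1, 3] or its reverse. For larger inputs, a library eigensolver would be the natural replacement for the power-iteration loop.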
The history of the cluster heat map
The American Statistician, 2009
Cited by 4 (0 self)
The cluster heat map is an ingenious display that simultaneously reveals row and column hierarchical cluster structure in a data matrix. It consists of a rectangular tiling with each tile shaded on a color scale to represent the value of the corresponding element of the data matrix. The rows (columns) of the tiling are ordered such that similar rows (columns) are near each other. On the vertical and horizontal margins of the tiling there are hierarchical cluster trees. This cluster heat map is a synthesis of several different graphic displays developed by statisticians over more than a century. We locate the earliest sources of this display in late 19th century publications. And we trace a diverse 20th century statistical literature that provided a foundation for this most widely used of all bioinformatics displays.
Seriation in the presence of errors: NP-hardness of l∞-fitting Robinson structures to dissimilarity matrices
Cited by 2 (2 self)
In this paper, we establish that the following fitting problem is NP-hard: given a finite set X and a dissimilarity measure d on X (d is a symmetric function from X × X to the nonnegative real numbers, vanishing on the diagonal), we wish to find a Robinsonian dissimilarity dR on X minimizing the l∞-error ||d − dR||∞ = max_{x,y∈X} |d(x, y) − dR(x, y)| between d and dR. Recall that a dissimilarity dR on X is called monotone (or Robinsonian) if there exists a total order ≺ on X such that x ≺ z ≺ y implies dR(x, y) ≥ max{dR(x, z), dR(z, y)}. Robinsonian dissimilarities appear in seriation and clustering problems, in sparse matrix ordering, and in DNA sequencing.
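The Robinsonian condition above can be checked directly for a given total order. This minimal sketch assumes that d is a symmetric list-of-lists matrix and that the order is a list of element indices. `is_robinsonian_order` and `find_robinsonian_order` are hypothetical names. The exhaustive search is exponential and only meant for very small X. It says nothing about the l∞-fitting problem, which is the part shown to be NP-hard.

```python
from itertools import combinations, permutations


def is_robinsonian_order(d, order):
    """Check the Robinson condition for dissimilarity d under a total order.

    order lists element indices left to right (x1 < x2 < ... in the order).
    Returns True iff every x < z < y satisfies d[x][y] >= max(d[x][z], d[z][y]).
    """
    for a, b, c in combinations(range(len(order)), 3):
        x, z, y = order[a], order[b], order[c]
        if d[x][y] < max(d[x][z], d[z][y]):
            return False
    return True


def find_robinsonian_order(d):
    """Brute force over all orders (only feasible for very small X)."""
    for order in permutations(range(len(d))):
        if is_robinsonian_order(d, list(order)):
            return list(order)
    return None
```

Take the 4-cycle dissimilarity, with adjacent vertices at distance 1 and opposite vertices at distance 2. No order satisfies the condition, so `find_robinsonian_order` returns None.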
Minimum Contradiction Matrices in Whole Genome
Cited by 1 (1 self)
Abstract: Minimum contradiction matrices are a useful complement to distance-based phylogenies. A minimum contradiction matrix $Y^n$ represents phylogenetic information in the form of an ordered distance matrix. A matrix element $Y^n_{i,j}$ corresponds to the distance from a reference vertex n to the path (i, j). For an X-tree or a split network, the minimum contradiction matrix $Y^n$ is a Robinson matrix. It therefore fulfills all the inequalities defining perfect order: $Y^n_{i,j} \ge Y^n_{i,k}$ and $Y^n_{k,j} \ge Y^n_{k,i}$ for $i \le j \le k \le n$. In real phylogenetic data, some taxa may contradict the inequalities for perfect order. Contradictions to perfect order correspond to deviations from a tree or from a split network topology. Efficient algorithms that search for the best order are presented and tested on whole genome phylogenies with 184 taxa, including many Bacteria, Archaea and Eukaryota. After optimization, taxa are classified in their correct domains and phyla. Several significant deviations from perfect order correspond to well-documented evolutionary events.
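To make "contradictions to perfect order" concrete, the sketch below scores an ordering of a matrix Y. It sums the squared amounts by which the ordering violates the perfect-order inequalities. Those inequalities are read here as Y[i][j] ≥ Y[i][k] and Y[k][j] ≥ Y[k][i] whenever taxa i, j, k appear in that order, so values do not increase away from the diagonal.

The score, the function names and the brute-force search are illustrative assumptions. They are not the criterion or the efficient algorithms presented in the paper. Y is assumed to cover only the taxa other than the reference vertex n.

```python
from itertools import combinations, permutations


def contradiction_score(Y, order):
    """Illustrative contradiction score of an ordering of matrix Y.

    For positions a < b < c holding taxa i, j, k, perfect order is read as
    Y[i][j] >= Y[i][k] and Y[k][j] >= Y[k][i]. Each violated inequality
    adds its squared excess; a score of 0 means perfect order.
    """
    score = 0
    for a, b, c in combinations(range(len(order)), 3):
        i, j, k = order[a], order[b], order[c]
        score += max(0, Y[i][k] - Y[i][j]) ** 2
        score += max(0, Y[k][i] - Y[k][j]) ** 2
    return score


def best_order_brute_force(Y):
    """Exhaustive search for a minimum-contradiction order (tiny inputs only)."""
    return min(permutations(range(len(Y))), key=lambda p: contradiction_score(Y, p))
```

A score of 0 corresponds to a matrix that is Robinson (in the similarity sense) for the given order. Positive scores localize which triples of taxa disagree with it.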
Seriation in the Presence of Errors: A Factor 16 Approximation Algorithm for l∞-Fitting Robinson Structures to Distances
Algorithmica, 2007
The classical seriation problem consists of finding a permutation of the rows and columns of the distance (or, more generally, dissimilarity) matrix d on a finite set X so that small values are concentrated as close as possible to the main diagonal, whereas large values fall as far from it as possible. This goal is best achieved by considering the Robinson property: a distance dR on X is Robinsonian if its matrix can be symmetrically permuted so that its elements do not decrease when moving away from the main diagonal along any row or column. If the distance d fails to satisfy the Robinson property, then we are led to the problem of finding a reordering of d that is as close as possible to a Robinsonian distance. In this paper, we present a factor 16 approximation algorithm for the following NP-hard fitting problem: given a finite set X and a dissimilarity d on X, we wish to find a Robinsonian dissimilarity dR on X minimizing the l∞-error ||d − dR||∞ = max_{x,y∈X} |d(x, y) − dR(x, y)| between d and dR.
An Approximation Algorithm for l∞-Fitting Robinson Structures to Distances
www.stacs-conf.org
Abstract. In this paper, we present a factor 16 approximation algorithm for the following NP-hard distance fitting problem: given a finite set X and a distance d on X, find a Robinsonian distance dR on X minimizing the l∞-error ||d − dR||∞ = max_{x,y∈X} |d(x, y) − dR(x, y)|. A distance dR on a finite set X is Robinsonian if its matrix can be symmetrically permuted so that its elements do not decrease when moving away from the main diagonal along any row or column. Robinsonian distances generalize ultrametrics and line distances, and occur in seriation problems and in classification.
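For a fixed order, there is a simple (non-optimal) way to obtain a Robinsonian dissimilarity dR ≥ d. Set dR(x, y) to the largest value of d over all pairs lying between x and y in the order, inclusive. This is the smallest dissimilarity that is Robinsonian for that order and dominates d. The sketch below computes it with a dynamic program over interval lengths and then evaluates the l∞-error.

The function names are hypothetical, and d is assumed nonnegative with a zero diagonal. This is a baseline for illustration, not the factor 16 approximation algorithm of the paper.

```python
def linf_error(d, dr):
    """||d - dR||_inf = max over x, y of |d(x, y) - dR(x, y)|."""
    n = len(d)
    return max(abs(d[x][y] - dr[x][y]) for x in range(n) for y in range(n))


def upper_robinson_fit(d, order):
    """Smallest dissimilarity >= d that is Robinsonian for the given order.

    For positions p < q, the fitted value is the maximum of d over all pairs
    inside the interval [p, q] of the order (a simple baseline only).
    """
    n = len(order)
    m = [[0] * n for _ in range(n)]  # m[p][q]: interval maximum, by position
    for length in range(1, n):
        for p in range(n - length):
            q = p + length
            # The interval [p, q] contains [p, q-1], [p+1, q] and the pair (p, q).
            m[p][q] = max(m[p][q - 1], m[p + 1][q], d[order[p]][order[q]])
    # Map positions back to element indices.
    dr = [[0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            x, y = order[p], order[q]
            dr[x][y] = dr[y][x] = m[p][q]
    return dr
```

With d = [[0, 4, 1], [4, 0, 1], [1, 1, 0]], the order [0, 1, 2] gives an l∞-error of 3. The order [0, 2, 1] gives 0, because d is already Robinsonian for that order.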
Variants of the Consecutive-Ones Property Motivated by the Reconstruction of Ancestral Species
2012
The polynomial-time decidable Consecutive-Ones Property (C1P) of binary matrices, formally introduced in 1965 by Fulkerson and Gross [52], has since found applications in many areas. In this thesis, we propose and study several variants of this property that are motivated by the reconstruction of ancestral species. We first propose the Gapped C1P, or the (k,δ)-Consecutive-Ones Property ((k,δ)-C1P): a binary matrix M has the (k,δ)-C1P for integers k and δ if the columns of M can be permuted such that each row contains at most k blocks of 1’s and no two neighboring blocks of 1’s are separated by a gap of more than δ 0’s. The C1P is equivalent to the (1,0)-C1P. We show that for every bounded and unbounded k ≥ 2, δ ≥ 1 with (k,δ) ≠ (2,1), deciding the (k,δ)-C1P is NP-complete [55]. We also provide an algorithm for a relevant case of the (2,1)-C1P. We then study the (k,δ)-C1P with a bound d on the maximum number of 1’s in any row (the maximum degree) of M. We show that the (d,k,δ)-Consecutive-Ones Property ((d,k,δ)-C1P) is polynomial-time decidable when all three parameters are fixed constants. Since fixing d also fixes k (k ≤ d), the only case left to consider
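The (k,δ)-C1P definition translates into a row test: count the blocks of 1's and measure the 0-gaps between neighbouring blocks. The sketch below applies that test to every row under every column permutation. The function names are hypothetical.

The exhaustive search is exponential, so it only illustrates the definition on tiny matrices. The ordinary C1P, i.e. the (1,0)-C1P, has dedicated polynomial-time algorithms such as PQ-trees, which are not shown here.

```python
from itertools import permutations


def blocks_of_ones(row):
    """Return (start, end) pairs, end exclusive, of maximal runs of 1's."""
    runs, start = [], None
    for i, bit in enumerate(row):
        if bit and start is None:
            start = i
        elif not bit and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def row_ok(row, k, delta):
    """At most k blocks of 1's, and every gap between neighbouring blocks <= delta."""
    runs = blocks_of_ones(row)
    if len(runs) > k:
        return False
    # Gap length = start of the next block minus end of the previous one.
    return all(nxt[0] - prev[1] <= delta for prev, nxt in zip(runs, runs[1:]))


def has_k_delta_c1p(matrix, k, delta):
    """Try every column permutation; return one that works, or None."""
    ncols = len(matrix[0]) if matrix else 0
    for perm in permutations(range(ncols)):
        if all(row_ok([row[c] for c in perm], k, delta) for row in matrix):
            return list(perm)
    return None
```

Consider the matrix with rows 110, 011 and 101. It has no (1,0)-C1P ordering, since no column order makes all three pairs of columns adjacent. It does have the (2,1)-C1P: in the original column order, the row 101 forms two blocks separated by a single 0.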
Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering
2009
For hierarchical clustering, dendrograms provide convenient and powerful visualization. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data and/or they fail to represent structure between and within clusters simultaneously. In this paper we extend (dissimilarity) matrix shading with several reordering steps based on seriation. Both methods, matrix shading and seriation, have been well known for a long time. However, only recent algorithmic improvements allow seriation to be used for larger problems. Furthermore, seriation is used in a novel stepwise process (within each cluster and between clusters), which leads to a visualization technique that is independent of the dimensionality of the data. A major advantage is that it presents the structure between clusters and the micro-structure within clusters in one concise plot. This not only allows judging cluster quality but also makes a mis-specified number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples.
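The stepwise reordering described above orders the clusters first and then the objects within each cluster. It can be sketched as follows. Both steps use a greedy nearest-neighbour chain as a hypothetical stand-in for a proper seriation method, and clusters are compared by their average between-cluster dissimilarity. The function names and these choices are assumptions for illustration, not the procedure from the paper. The resulting order would then be used to permute the rows and columns of the shaded dissimilarity matrix.

```python
def greedy_order(items, dist):
    """Nearest-neighbour chain, used here as a simple stand-in for seriation."""
    items = list(items)
    if len(items) <= 2:
        return items
    # Start from the item with the largest total dissimilarity (a likely end).
    start = max(items, key=lambda a: sum(dist(a, b) for b in items))
    order, remaining = [start], set(items) - {start}
    while remaining:
        last = order[-1]
        # Append the closest unplaced item; ties broken by item value.
        nxt = min(remaining, key=lambda b: (dist(last, b), b))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def dissimilarity_plot_order(d, labels):
    """Two-step reordering: clusters first, then objects within each cluster.

    d: symmetric dissimilarity matrix (list of lists); labels: cluster id per
    object. Returns the object order for a shaded dissimilarity plot.
    """
    clusters = sorted(set(labels))
    members = {c: [i for i, lab in enumerate(labels) if lab == c] for c in clusters}

    def cluster_dist(a, b):
        # Average dissimilarity between members of clusters a and b.
        pairs = [(i, j) for i in members[a] for j in members[b]]
        return sum(d[i][j] for i, j in pairs) / len(pairs)

    # Step 1: order the clusters (between-cluster structure).
    cluster_seq = greedy_order(clusters, cluster_dist)
    # Step 2: order the objects inside each cluster (micro-structure).
    order = []
    for c in cluster_seq:
        order.extend(greedy_order(members[c], lambda i, j: d[i][j]))
    return order
```

Each cluster's members are emitted together, so clusters appear as contiguous diagonal blocks in the plot. The between-cluster step places similar clusters next to each other.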
A Continuous Rating Method for Preferential Voting
2008
A method is given for quantitatively rating the social acceptance of the different options that are the subject of a preferential vote. The proposed method is proved to satisfy certain desirable conditions, including a majority principle, a property of clone consistency, and the continuity of the ratings with respect to the data. One can view this method as a quantitative complement to a qualitative method introduced in 1997 by Markus Schulze. It is also related to certain methods of one-dimensional scaling or cluster analysis.

