Results 1-9 of 9
Clustering Gene Expression Patterns
1999
Cited by 275 (10 self)
Abstract
Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. The corresponding algorithmic problem is to cluster multi-condition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is $O(n^2 (\log n)^c)$. We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its p...
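The abstract does not describe the practical heuristic itself. The sketch below is a CAST-style (Cluster Affinity Search Technique) greedy affinity search, offered only as an illustration of threshold-based clustering of expression profiles; that the paper's heuristic works along these lines is our assumption, not something this listing confirms. The similarity matrix `sim` (for example, Pearson correlations of expression profiles rescaled to [0, 1]), the threshold `t`, and the name `cast_cluster` are hypothetical.

```python
from typing import List, Sequence, Set


def cast_cluster(sim: Sequence[Sequence[float]], t: float,
                 max_steps: int = 10_000) -> List[List[int]]:
    """Greedy CAST-style clustering of items 0..n-1 (illustrative sketch).

    sim[i][j] is a similarity in [0, 1] with sim[i][i] == 1 (assumed);
    t is a hypothetical affinity threshold.
    """
    unassigned: Set[int] = set(range(len(sim)))
    clusters: List[List[int]] = []
    while unassigned:
        current: Set[int] = set()
        # affinity[i] = total similarity of unassigned item i to the open cluster
        affinity = {i: 0.0 for i in unassigned}
        for _ in range(max_steps):
            changed = False
            # ADD: the outside item with the highest affinity joins if its
            # average similarity to the current members is at least t
            outside = sorted(unassigned - current)
            if outside:
                best = max(outside, key=lambda i: affinity[i])
                if affinity[best] >= t * len(current):
                    current.add(best)
                    for i in affinity:
                        affinity[i] += sim[i][best]
                    changed = True
            # REMOVE: the weakest member leaves if its average similarity
            # to the cluster (itself included) drops below t
            if current:
                worst = min(sorted(current), key=lambda i: affinity[i])
                if affinity[worst] < t * len(current):
                    current.remove(worst)
                    for i in affinity:
                        affinity[i] -= sim[i][worst]
                    changed = True
            if not changed:
                break
        if not current:  # safety net: never loop forever on an empty cluster
            current = {min(unassigned)}
        clusters.append(sorted(current))
        unassigned -= current
    return clusters


# Two groups of co-expressed genes: {0, 1} and {2, 3}.
sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.9],
       [0.1, 0.1, 0.9, 1.0]]
print(cast_cluster(sim, 0.5))  # [[0, 1], [2, 3]]
```

The `max_steps` cap is a defensive choice: alternating add and remove steps are not guaranteed to settle in every input, so the sketch bounds them rather than relying on convergence.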
Identifying Distinctive Subsequences in Multivariate Time Series by Clustering
Proc. ACM SIGKDD, 1999
Cited by 30 (2 self)
Abstract
Most time series comparison algorithms attempt to discover what the members of a set of time series have in common. We investigate a different problem, determining what distinguishes time series in that set from other time series obtained from the same source. In both cases the goal is to identify shared patterns, though in the latter case those patterns must be distinctive as well. An efficient incremental algorithm for identifying distinctive subsequences in multivariate, real-valued time series is described and evaluated with data from two very different sources: the response of a set of bandpass filters to human speech and the sensors of a mobile robot.
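To make the problem statement concrete, here is a brute-force baseline, not the paper's efficient incremental algorithm: every length-`w` window of a target series is scored by how well it recurs in all target series and how poorly it matches any background series. The Euclidean window distance, the margin score, and all function names are our assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

Series = Sequence[Sequence[float]]  # time steps x channels (multivariate, real-valued)


def windows(series: Series, w: int) -> List[Series]:
    """All contiguous length-w subsequences of a series."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]


def window_dist(a: Series, b: Series) -> float:
    """Euclidean distance between two equal-length multivariate windows."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def best_match(candidate: Series, series: Series) -> float:
    """Distance from candidate to its closest occurrence in series."""
    return min(window_dist(candidate, win)
               for win in windows(series, len(candidate)))


def most_distinctive(targets: List[Series], background: List[Series],
                     w: int) -> Optional[Tuple[float, int, int]]:
    """Return (score, target index, start) of the most distinctive window.

    score = closest match in any background series minus the worst
    closest match across the target series; larger is more distinctive.
    """
    best: Optional[Tuple[float, int, int]] = None
    for s_idx, series in enumerate(targets):
        for start, cand in enumerate(windows(series, w)):
            shared = max(best_match(cand, t) for t in targets)     # must be common
            other = min(best_match(cand, b) for b in background)   # must be rare
            score = other - shared
            if best is None or score > best[0]:
                best = (score, s_idx, start)
    return best


# Univariate example: both targets contain the pattern [5, 5]; the background does not.
targets = [[[0.0], [0.0], [5.0], [5.0], [0.0]],
           [[0.0], [5.0], [5.0], [0.0], [0.0]]]
background = [[[0.0], [0.0], [0.0], [0.0], [0.0]],
              [[1.0], [0.0], [1.0], [0.0], [1.0]]]
print(most_distinctive(targets, background, 2))  # window [5, 5] in target 0 at start 2
```

This baseline costs roughly quadratic time in total series length per candidate, which is exactly the kind of expense an incremental algorithm is meant to avoid.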
Performance Criteria for Graph Clustering and Markov Cluster Experiments
National Research Institute for Mathematics and Computer Science in the ..., 2000
Cited by 25 (1 self)
Abstract
In [6], a cluster algorithm for graphs called the Markov cluster algorithm (MCL algorithm) was introduced. The algorithm is based on simulating (stochastic) flow in graphs by alternating two operators, expansion and inflation. The results in [8] establish an intrinsic relationship between the corresponding algebraic process (the MCL process) and cluster structure in the iterands and the limits of the process. Several kinds of experiments conducted with the MCL algorithm are described here. Test cases with varying homogeneity characteristics are used to establish some of the algorithm's particular strengths and weaknesses. In general the algorithm performs well, except for graphs that are very homogeneous (such as weakly connected grids) and for which the natural cluster diameter (i.e., the diameter of a subgraph induced by a natural cluster) is large. This can be understood in terms of the flow characteristics of the MCL algorithm and the heuristic on which the...
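The abstract names the two MCL operators without defining them. A minimal pure-Python sketch follows, assuming the commonly used formulation: add self-loops, make the adjacency matrix column-stochastic, then alternate expansion (squaring the matrix) and inflation (entrywise power followed by column renormalization) until the matrix stops changing. The inflation value 2.0, the pruning threshold, and the function names are illustrative choices, not taken from the report.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                out[i][j] = m[i][j] / s
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj: Matrix, inflation: float = 2.0, max_iter: int = 100,
        tol: float = 1e-9, prune: float = 1e-6) -> List[List[int]]:
    n = len(adj)
    # Self-loops are a common MCL preprocessing step (assumed here).
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along paths of length 2
        # inflation: entrywise power strengthens strong flow, weakens weak flow
        inflated = normalize_columns([[v ** inflation for v in row]
                                      for row in expanded])
        # prune negligible flow so the zero pattern can settle, then renormalize
        new = normalize_columns([[v if v > prune else 0.0 for v in row]
                                 for row in inflated])
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows that keep flow are attractors; the columns they attract form a cluster.
    clusters: List[List[int]] = []
    for i in range(n):
        members = [j for j in range(n) if m[i][j] > 0.0]
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(mcl(adj))
```

Raising `inflation` generally yields finer clusters; the weakness the abstract reports for homogeneous graphs with large cluster diameter is consistent with flow that cannot concentrate before inflation prunes it.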
Reinterpreting the Category Utility Function
2001
Cited by 4 (0 self)
Abstract
The category utility function is a partition quality scoring function applied in some machine-learning clustering programs. We reinterpret this function in terms of the data variance explained by a clustering or, equivalently, in terms of the classical square-error clustering criterion that underlies the K-Means and Ward methods. This analysis suggests extensions of the scoring function to situations with differently standardized and mixed-scale data.
Keywords: clustering, data standardization, contingency coefficient, correlation ratio, weighting features, mixed-scale data
Author: Boris Mirkin
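The variance reinterpretation can be checked numerically. Assuming the common Gluck-Corter form CU = (1/K) Σ_k p_k Σ_f Σ_v [P(f=v | k)² − P(f=v)²] and coding each category v of each nominal feature f as a 0/1 dummy column, the identity Σ_k p_k P(v|k)² − P(v)² = Σ_k p_k (P(v|k) − P(v))² (which holds because Σ_k p_k P(v|k) = P(v)) makes CU equal to the between-cluster variance of the dummy columns divided by K. Since total variance is fixed, maximizing CU for fixed K is then minimizing the within-cluster square error. The dummy coding, function names, and toy data are our assumptions.

```python
from collections import Counter
from typing import Hashable, List, Sequence

Row = Sequence[Hashable]  # one entity described by nominal features


def category_utility(rows: List[Row], labels: List[int]) -> float:
    """Category utility (assumed Gluck-Corter form), averaged over the K clusters."""
    n = len(rows)
    clusters = sorted(set(labels))
    n_features = len(rows[0])
    # Overall category frequencies per feature.
    overall = [Counter(r[f] for r in rows) for f in range(n_features)]
    total = 0.0
    for c in clusters:
        members = [r for r, lab in zip(rows, labels) if lab == c]
        p_c = len(members) / n
        gain = 0.0
        for f in range(n_features):
            within = Counter(r[f] for r in members)
            gain += sum((cnt / len(members)) ** 2 for cnt in within.values())
            gain -= sum((cnt / n) ** 2 for cnt in overall[f].values())
        total += p_c * gain
    return total / len(clusters)


def explained_dummy_variance(rows: List[Row], labels: List[int]) -> float:
    """Between-cluster variance summed over 0/1 dummy columns (one per category)."""
    n = len(rows)
    categories = {(f, r[f]) for r in rows for f in range(len(r))}
    between = 0.0
    for f, v in categories:
        col = [1.0 if r[f] == v else 0.0 for r in rows]
        grand = sum(col) / n
        for c in set(labels):
            idx = [i for i, lab in enumerate(labels) if lab == c]
            mean_c = sum(col[i] for i in idx) / len(idx)
            between += len(idx) / n * (mean_c - grand) ** 2
    return between


rows = [("red", "round"), ("red", "round"), ("blue", "long"), ("blue", "round")]
labels = [0, 0, 1, 1]
print(category_utility(rows, labels))             # 0.3125
print(explained_dummy_variance(rows, labels) / 2)  # 0.3125 (K = 2)
```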
Least-Squares Structuring, Clustering, and Data Processing Issues
Cited by 2 (0 self)
Abstract
Approximation structuring clustering extends what is usually called "square-error clustering" to various cluster structures and data formats. It appears to be not only a mathematical device for supporting, specifying, and extending many clustering techniques, but also a framework for mathematically analyzing the interrelations among these techniques and their relations to other concepts and problems in data analysis, statistics, machine learning, data compression and decompression, and the design and use of multiresolution hierarchies. Based on these results, a number of methods for solving data processing problems are described.
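For orientation, the square-error criterion that this framework extends can be written as follows, assuming entities $y_1, \dots, y_N \in \mathbb{R}^m$, a partition $S = \{S_1, \dots, S_K\}$ with $N_k = |S_k|$, cluster means $c_k$, and grand mean $\bar{y}$ (notation ours, not the paper's). The second line is the standard decomposition of the data scatter, valid when each $c_k$ is the mean of $S_k$:

```latex
W(S, c) = \sum_{k=1}^{K} \sum_{i \in S_k} \lVert y_i - c_k \rVert^2,
\qquad c_k = \frac{1}{N_k} \sum_{i \in S_k} y_i,

\sum_{i=1}^{N} \lVert y_i - \bar{y} \rVert^2
  = \underbrace{\sum_{k=1}^{K} N_k \lVert c_k - \bar{y} \rVert^2}_{\text{explained by } S}
  + W(S, c).
```

Because the left-hand side does not depend on the partition, minimizing $W$ (the K-Means criterion) is the same as maximizing the explained part.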
Combinatorial Optimization in Clustering
Abstract
Contents
1 Introduction
2 Types of Data
3 Cluster Structures
4 Clustering Criteria
5 Single Cluster Clustering
5.1 Clustering Approaches
5.1.1 Definition-based Clusters
5.1.2 Direct Algorithms
5.1.3 Optimal Clusters
5.2 Single and Monotone Linkage Clusters
5.2.1 MST and Single Linkage Clustering
5.2.2 Monotone Linkage Clusters
5.2.3 Modeling Skeletons in Digital Image Processing
5.2.4 Linkage-based Convex Criteria
5.3 Moving Center and Approximation Clusters
5.3.1 Criteria for Moving Center Methods
5.3.2 Principal Cluster
5.3.3 Additive Cluster
5.3.4 Seriation with Returns
6 Partitioning ...
Approximation Clustering: a Mine of Semidefinite Programming Problems
Abstract
Clustering is a discipline devoted to finding homogeneous groups of data entities. In contrast to conventional clustering, which involves processing data in terms of either entities or variables, approximation clustering aims to process data matrices as they are. Currently, approximation clustering is a set of clustering models and methods based on approximately decomposing the data table into scalar product matrices that represent weighted subsets, partitions, or hierarchies as the sought clustering structures. Some of the problems involved are semidefinite programming problems; the others seem quite similar.
1 Introduction
Clustering models may differ depending on the nature of the data. We distinguish here among three types of data: column-conditional, similarity, and aggregable. The first two are those usually considered in clustering: a column-conditional data set is represented by an entity-to-variable matrix so that the entries within any column (variable) can be c...
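As a small worked instance of "scalar product matrices representing weighted subsets", consider fitting a similarity matrix $A$ by a single weighted subset, $a_{ij} \approx \lambda s_i s_j$ with $s$ the 0/1 indicator of a subset $S$. Setting the derivative of $\sum_{i,j} (a_{ij} - \lambda s_i s_j)^2$ to zero gives $\lambda = s^\top A s / |S|^2$ (the mean of $a_{ij}$ over pairs in $S$), and the squared error drops by $(s^\top A s)^2 / |S|^2$. The sketch below assumes sums over all ordered pairs including the diagonal, and uses exhaustive search only for illustration; it is not the paper's method, which frames such problems via semidefinite programming. Function names and data are hypothetical.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def fit_weighted_subset(a: Sequence[Sequence[float]],
                        subset: Sequence[int]) -> Tuple[float, float]:
    """Least-squares fit of a ~ lam * s s^T for the 0/1 indicator s of subset.

    Sums run over all ordered pairs (i, j), diagonal included (an assumption).
    Returns (lam, explained): the optimal weight and the drop in squared error.
    """
    size = len(subset)
    inner = sum(a[i][j] for i in subset for j in subset)  # s^T A s
    lam = inner / size ** 2
    explained = inner ** 2 / size ** 2
    return lam, explained


def best_single_cluster(
        a: Sequence[Sequence[float]]) -> Optional[Tuple[float, Tuple[int, ...], float]]:
    """Exhaustive search over nonempty subsets (exponential; tiny inputs only)."""
    n = len(a)
    best: Optional[Tuple[float, Tuple[int, ...], float]] = None
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            lam, explained = fit_weighted_subset(a, subset)
            if best is None or explained > best[2]:
                best = (lam, subset, explained)
    return best


A = [[1.0, 0.9, 0.9, 0.1],
     [0.9, 1.0, 0.9, 0.1],
     [0.9, 0.9, 1.0, 0.1],
     [0.1, 0.1, 0.1, 1.0]]
print(best_single_cluster(A))  # subset (0, 1, 2), lam = 8.4 / 9
```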
Three Approaches to Aggregation of Interaction Tables
Abstract
An interaction table is a summable square matrix arising in the analysis of inter-citation, international trade, brand-switching, mobility, or input-output industrial data. Three approaches to aggregating interaction data are compared theoretically: (i) loglinear modeling, (ii) aggregation of Markov chains, and (iii) the principle of equivalence in correspondence analysis. In this way, an empirical clustering algorithm developed within framework (iii) is justified and amended by substantively modeling the interaction processes.
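As background for framework (iii), a minimal sketch under common assumptions: one partition aggregates both the rows and the columns of a square table, and Pearson's φ² (the total inertia in correspondence analysis) measures how much interaction structure survives. Merging categories never increases φ², and it leaves φ² unchanged when the merged rows (and columns) have proportional profiles, which is the distributional-equivalence property of correspondence analysis. The function names and data are ours; this is not the paper's clustering algorithm.

```python
from typing import List, Sequence

Table = Sequence[Sequence[float]]


def aggregate(table: Table, groups: Sequence[int]) -> List[List[float]]:
    """Sum a square interaction table over one partition applied to rows and columns.

    groups[i] is the group index (0..k-1) of row/column i.
    """
    k = max(groups) + 1
    out = [[0.0] * k for _ in range(k)]
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            out[groups[i]][groups[j]] += value
    return out


def phi_squared(table: Table) -> float:
    """Pearson's phi^2 (total inertia in correspondence analysis)."""
    total = sum(sum(row) for row in table)
    p = [[v / total for v in row] for row in table]
    r = [sum(row) for row in p]            # row margins
    c = [sum(col) for col in zip(*p)]      # column margins
    return sum(p[i][j] ** 2 / (r[i] * c[j])
               for i in range(len(p)) for j in range(len(p[i]))
               if p[i][j] > 0) - 1.0


groups = [0, 0, 1, 1]
# Rows (and columns) within each group have proportional profiles.
row_w, col_w, core = [1, 2, 1, 3], [1, 1, 2, 1], [[5, 1], [2, 4]]
product = [[row_w[i] * col_w[j] * core[groups[i]][groups[j]] for j in range(4)]
           for i in range(4)]
mixed = [[4, 0, 1, 0], [0, 4, 0, 1], [1, 0, 4, 0], [0, 1, 0, 4]]
print(phi_squared(product), phi_squared(aggregate(product, groups)))  # equal up to rounding
print(phi_squared(mixed), phi_squared(aggregate(mixed, groups)))      # about 1.72, then 0.36
```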
Architecture and Algorithms for Scalable Mobile ...
Rice University
Abstract
Supporting Quality of Service (QoS) is an important objective for future mobile systems, and achieving it requires resource reservation and admission control. In this thesis, we introduce a scalable admission control scheme termed Virtual Bottleneck Cell (VBC). Our approach is designed to scale to many users and handoffs while simultaneously controlling "hot spots". The key technique is to hierarchically control the virtual system, ensuring that QoS objectives are satisfied without requiring accurate predictions of users' future locations. We develop a simple analytical model to study the system and illustrate several key components of the approach. We formulate the problem of how to group cells to form the virtual system as an optimization problem and propose a heuristic adaptive clustering algorithm as its solution. Finally, we perform simulations in a two-dimensional network to compare the performance obtained with VBC and adaptive clustering against alternative schemes, including the optimal offline ...

