A Bi-clustering Framework for Categorical Data
- In Proc. 9th European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD’05), 2005
Cited by 7 (3 self)
Abstract. Bi-clustering is a promising conceptual clustering approach. Within categorical data, it provides a collection of (possibly overlapping) bi-clusters, i.e., linked clusters for both objects and attribute-value pairs. We propose a generic framework for bi-clustering that computes a bi-partition from collections of local patterns which capture locally strong associations between objects and properties. To validate this framework, we have studied in detail the instance CDK-Means. It is a K-Means-like clustering on collections of formal concepts, i.e., connected closed sets on both dimensions. It builds bi-partitions with user control over the overlap between bi-clusters. We provide an experimental validation on many benchmark datasets and discuss the interestingness of the computed bi-partitions.
J.F.: Towards constrained co-clustering in ordered 0/1 data sets
- In: Proceedings of International Symposium on Methodologies for Intelligent Systems (LNAI), 2006
Cited by 5 (3 self)
Abstract. Within 0/1 data, co-clustering provides a collection of bi-clusters, i.e., linked clusters for both objects and Boolean properties. Besides the classical need for grouping quality optimization, one can also use user-defined constraints to capture subjective interestingness aspects and thus to improve bi-cluster relevancy. We consider the case of 0/1 data where at least one dimension is ordered, e.g., objects denote time points, and we introduce co-clustering constrained by interval constraints. Exploiting such constraints during the intrinsically heuristic clustering process is challenging. We propose one major step in this direction where bi-clusters are computed from collections of local patterns. We provide an experimental validation on two temporal gene expression data sets.
Towards fault-tolerant formal concept analysis
- In AI*IA’05, volume 3673 of LNAI, 2005
Cited by 5 (0 self)
Abstract. Given Boolean data sets which record properties of objects, Formal Concept Analysis is a well-known approach for knowledge discovery. Recent application domains, e.g., for very large data sets, have motivated new algorithms which can perform constraint-based mining of formal concepts (i.e., closed sets on both dimensions which are associated by the Galois connection and satisfy some user-defined constraints). In this paper, we consider a major limitation of these approaches when considering noisy data sets. This is indeed the case of Boolean gene expression data analysis where objects denote biological experiments and attributes denote gene expression properties. In this type of intrinsically noisy data, the Galois association is so strong that the number of extracted formal concepts explodes. We formalize the computation of the so-called δ-bisets as an alternative for capturing strong associations between sets of objects and sets of properties. Based on previous work on approximate condensed representations of frequent sets by means of δ-free itemsets, we get an efficient technique which can be applied to large data sets. An experimental validation on both synthetic and real data is given. It confirms the added value of our approach w.r.t. formal concept discovery, i.e., the extraction of smaller collections of relevant associations.
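As a minimal illustration of the formal concepts mentioned in this abstract, the sketch below enumerates by brute force the pairs (object set, property set) that are closed under the Galois connection on a toy Boolean context. The context and the names `CONTEXT`, `extent`, `intent` and `formal_concepts` are assumptions made for illustration; this is not the constraint-based mining algorithm or the δ-biset computation described in the paper.

```python
from itertools import combinations

# Hypothetical toy Boolean context: object -> set of properties that hold for it.
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"c"},
}
PROPERTIES = sorted(set().union(*CONTEXT.values()))


def extent(props):
    """Objects that have every property in props."""
    wanted = set(props)
    return frozenset(o for o, ps in CONTEXT.items() if wanted <= ps)


def intent(objs):
    """Properties shared by every object in objs."""
    if not objs:
        # By convention, the empty object set shares all properties.
        return frozenset(PROPERTIES)
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))


def formal_concepts():
    """Brute force: close every property subset and collect the distinct pairs."""
    concepts = set()
    for size in range(len(PROPERTIES) + 1):
        for props in combinations(PROPERTIES, size):
            objs = extent(props)
            concepts.add((objs, intent(objs)))
    return concepts


if __name__ == "__main__":
    for objs, props in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(objs), sorted(props))
```

The enumeration is exponential in the number of properties, so it only serves to make the definition concrete on very small contexts.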
From local pattern mining to relevant bi-cluster characterization
- In Proceedings IDA’05, volume 3646 of LNCS, 2005
Cited by 4 (2 self)
Abstract. Clustering or bi-clustering techniques have proved quite useful in many application domains. A weakness of these techniques remains the poor support for grouping characterization. We consider possibly large Boolean data sets which record properties of objects and we assume that a bi-partition is available. We introduce a generic cluster characterization technique which is based on collections of bi-sets (i.e., sets of objects associated with sets of properties) which satisfy some user-defined constraints, and a measure of the accuracy of a given bi-set as a bi-cluster characterization pattern. The method is illustrated on both formal concepts (i.e., “maximal rectangles of true values”) and the new type of δ-bi-sets (i.e., “rectangles of true values with a bounded number of exceptions per column”). The added value is illustrated on benchmark data and two real data sets which are intrinsically noisy: medical data about meningitis and Plasmodium falciparum gene expression data.
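The phrase “rectangles of true values with a bounded number of exceptions per column” can be made concrete with a small check. The sketch below only tests that property for a given rectangle; the function name `has_bounded_exceptions`, the data layout and the toy data are assumptions, and any further conditions the paper imposes on δ-bi-sets (for example closure or maximality) are not modelled.

```python
def has_bounded_exceptions(data, objects, properties, delta):
    """Return True if, for each property (column) of the rectangle,
    at most `delta` of the selected objects lack that property.

    data: dict mapping each object to the set of properties that are true for it.
    """
    for prop in properties:
        exceptions = sum(1 for obj in objects if prop not in data[obj])
        if exceptions > delta:
            return False
    return True


# Hypothetical toy Boolean data set.
TOY_DATA = {
    "o1": {"a", "b"},
    "o2": {"a"},
    "o3": {"a", "b"},
}

if __name__ == "__main__":
    rect_objects = ["o1", "o2", "o3"]
    rect_properties = ["a", "b"]
    # Column "b" has one false value (for o2), so delta = 0 rejects the rectangle.
    print(has_bounded_exceptions(TOY_DATA, rect_objects, rect_properties, 0))
    print(has_bounded_exceptions(TOY_DATA, rect_objects, rect_properties, 1))
```

With `delta = 0` the check reduces to requiring a rectangle made only of true values, which is the situation of formal concepts before maximality is taken into account.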
Characterization of Unsupervised Clusters With the Simplest Association Rules: Application for Child's Meningitis
Cited by 3 (0 self)
We combine different recent data mining techniques to improve the symbolic description of unsupervised clusters. First, we use a clustering method that computes bi-partitions (a partition of examples and a related partition of attribute-value pairs). Then, we use an efficient association rule mining technique to describe the membership of examples within each cluster. We propose a technique for removing rules that are not relevant enough for the cluster characterization. An experimental validation on a real world medical data set is provided.
Constrained Co-clustering of Gene Expression Data
- 2008
Cited by 2 (2 self)
In many applications, the expert interpretation of co-clustering is easier than for mono-dimensional clustering. Co-clustering aims at computing a bi-partition that is a collection of co-clusters: each co-cluster is a group of objects associated with a group of attributes and these associations can support interpretations. Many constrained clustering algorithms have been proposed to exploit the domain knowledge and to improve partition relevancy in the mono-dimensional case (e.g., using the so-called must-link and cannot-link constraints). Here, we consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e., both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We propose an iterative co-clustering algorithm which exploits user-defined constraints while minimizing the sum-squared residues, i.e., an objective function introduced for gene expression data clustering by Cho et al. (2004). We illustrate the added value of our approach in two applications on gene expression data.
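To give a feel for a sum-squared residue objective, the sketch below scores a given co-clustering of a small numerical matrix. It assumes the simplest residue form, the deviation of each entry from the mean of its co-cluster; the abstract does not say which residue variant is used, and the function name and the toy matrix are hypothetical. The iterative, constraint-aware optimization itself is not shown.

```python
def sum_squared_residue(matrix, row_labels, col_labels):
    """Sum over all entries of (entry - mean of its co-cluster) ** 2.

    matrix: list of rows (lists of numbers).
    row_labels[i], col_labels[j]: cluster index of row i and of column j.
    """
    # Group the entries by the co-cluster (row cluster, column cluster) they fall in.
    blocks = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            blocks.setdefault((row_labels[i], col_labels[j]), []).append(value)

    total = 0.0
    for values in blocks.values():
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total


# Hypothetical toy matrix with an obvious 2 x 2 block structure.
TOY_MATRIX = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [9, 9, 2, 2],
]

if __name__ == "__main__":
    # Labels that follow the block structure give a zero objective.
    print(sum_squared_residue(TOY_MATRIX, [0, 0, 1], [0, 0, 1, 1]))
    # Mixing the column groups gives a strictly larger value.
    print(sum_squared_residue(TOY_MATRIX, [0, 0, 1], [0, 1, 0, 1]))
```

A lower value means the co-clusters are more homogeneous, which is what an iterative algorithm of this kind would try to achieve while respecting the user-defined constraints.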
Characterization of Unsupervised Clusters
We combine different recent data mining techniques to improve the symbolic description of unsupervised clusters. First, we use a clustering method that computes bi-partitions (a partition of examples and a related partition of attribute-value pairs). Then, we use an efficient association rule mining technique to describe the membership of examples within each cluster. We propose a technique for removing rules that are not relevant enough for the cluster characterization. An experimental validation on a real world medical data set is provided.
Chapter 1 Constraint-driven co-clustering of 0/1 data
Abstract. We investigate a co-clustering framework (i.e., a method that provides a partition of objects and a linked partition of features) for binary data sets. So far, constrained co-clustering has seldom been explored. First, we consider straightforward extensions of the classical instance-level constraints (Must-link, Cannot-link) to express relationships on both objects and features. Furthermore, we study constraints that exploit sequential orders on objects and/or features. The idea is that we can specify whether or not the extracted co-clusters should involve contiguous elements (Interval and non-Interval constraints). Instead of designing constraint processing integration within a co-clustering scheme, we propose a Local-to-Global (L2G) framework. It consists of post-processing a collection of (constrained) local patterns that have been computed beforehand (e.g., closed feature sets and their supporting sets of objects) to build a global pattern like a co-clustering. Roughly speaking, the algorithmic scheme is a K-Means-like approach that groups the local patterns. We show that it is possible to push local counterparts of the global constraints on the co-clusters during the local pattern mining phase itself. A large part of the chapter is dedicated to experiments that demonstrate the added value of our approach. Considering both synthetic data and real gene expression data sets, we discuss the use of constraints to get not only more stable but also more relevant co-clusters.
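The three kinds of constraints named in this abstract can be read as simple predicates over a partition of one dimension. The sketch below checks them on a list of cluster labels given in the order of the elements; the function names and the reading of the Interval constraint as "every cluster is a contiguous run" are assumptions for illustration, not the chapter's definitions or its L2G algorithm.

```python
def satisfies_must_link(labels, pairs):
    """Every pair (a, b) of element positions must share the same cluster."""
    return all(labels[a] == labels[b] for a, b in pairs)


def satisfies_cannot_link(labels, pairs):
    """Every pair (a, b) of element positions must fall in different clusters."""
    return all(labels[a] != labels[b] for a, b in pairs)


def satisfies_interval(labels):
    """Each cluster must cover a contiguous run of the ordered elements.

    This holds exactly when the number of runs of equal consecutive labels
    equals the number of distinct labels.
    """
    if not labels:
        return True
    runs = 1 + sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    return runs == len(set(labels))


if __name__ == "__main__":
    # Hypothetical cluster labels for six ordered objects (e.g., time points).
    contiguous = [0, 0, 1, 1, 2, 2]
    scattered = [0, 1, 0, 1, 2, 2]
    print(satisfies_interval(contiguous), satisfies_interval(scattered))
    print(satisfies_must_link(contiguous, [(0, 1)]))
    print(satisfies_cannot_link(contiguous, [(0, 1)]))
```

The same predicates apply unchanged to the feature dimension, which matches the abstract's point that constraints can involve both objects and features.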
Co-clustering Numerical Data under User-defined Constraints
- 2009
Abstract: In the generic setting of objects × attributes matrix data analysis, co-clustering appears as an interesting unsupervised data mining method. A co-clustering task provides a bi-partition made of co-clusters: each co-cluster is a group of objects associated to a group of attributes and these associations can support expert interpretations. Many constrained clustering algorithms have been proposed to exploit the domain knowledge and to improve partition relevancy in the mono-dimensional clustering case (e.g. using the must-link and cannot-link constraints on one of the two dimensions). Here, we consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e. both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We describe an iterative co-clustering algorithm which exploits user-defined constraints while minimizing a given objective function. Thanks to a generic setting, we emphasize that different objective functions can be used. The added value of our approach is demonstrated on both synthetic and real data. Among others, several experiments illustrate the practical impact of this original co-clustering setting in the context of gene expression data analysis, and in an original application to a protein motif discovery problem. © 2009 Wiley

