
## Duplessis: Mining gene expression data with pattern structures in formal concept analysis (2011)

Venue: | Information Sciences |

Citations: | 22 - 10 self |

### Citations

1218 | Formal concept analysis: mathematical foundations
- Ganter, Wille
- 1999
Citation Context ... to be more computationally efficient and to provide more readable results. Experiments with real-world gene expression data are discussed and give a practical basis for the comparison and evaluation of the methods. © 2011 Published by Elsevier Inc. 1. Introduction Numerous classification problems can be formalized by means of formal contexts. A context materializes a set of individuals (called objects), a set of properties (called attributes), and a binary relation usually represented by a binary table relating objects to attributes, where (g,m) = × (a cross) if the object g has the property m [1,9]. Considering a pair of mappings between sets of all subsets of objects and attributes, called Galois connection, it is possible to derive for each object g the set of all attributes that apply to g. Similarly, it is possible to derive for each attribute m the set of all objects to which m applies. As a consequence, one may classify within formal concepts a set of objects sharing the same maximal set of attributes, and vice versa. Concepts are ordered within a lattice structure called concept lattice within the formal concept analysis (FCA) framework [9]. This mathematical structure supports p... |
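
The two derivation mappings described in this excerpt can be sketched in a few lines of Python. This is only an illustration: the toy context (objects `g1` to `g3`, attributes `m1` to `m3`) and the function names `intent`, `extent`, and `is_concept` are assumptions made for the example and do not come from the paper.

```python
# Toy formal context: each object maps to the set of attributes it has.
# The objects, attributes, and crosses are hypothetical illustration data.
CONTEXT = {
    "g1": {"m1", "m2"},
    "g2": {"m1", "m2", "m3"},
    "g3": {"m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def intent(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return shared


def extent(attributes):
    """Objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def is_concept(objects, attributes):
    """(A, B) is a formal concept when A' = B and B' = A."""
    return intent(objects) == set(attributes) and extent(attributes) == set(objects)
```

In this toy context, `({"g1", "g2"}, {"m1", "m2"})` is a formal concept, whereas `({"g1"}, {"m1", "m2"})` is not, because `g2` also has both attributes.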

598 | Biclustering of expression data
- Cheng, Church
- 2000
Citation Context ...(b1, b2)] = [a1, b1] ⇔ a1 ≤ a2 and b1 ≥ b2 ⇔ [a1, b1] ⊇ [a2, b2]. The definition of ⊓ implies that smaller intervals subsume larger intervals that contain them. For example, with D = {[4, 4], [5, 5], [6, 6], [4, 5], [5, 6], [4, 6]}, the meet-semi-lattice (D,⊓) is given in Figure 2. The interval labeling a node is the meet of all intervals labeling its ascending nodes, e.g. [4, 5] = [4, 4] ⊓ [5, 5], and ... |
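
The interval meet and the induced order in this excerpt can be sketched as follows. The excerpt is truncated before the definition of ⊓, so the min/max form used here is an assumption; it is inferred from the quoted example [4, 5] = [4, 4] ⊓ [5, 5] and from the stated equivalence with a1 ≤ a2 and b1 ≥ b2. The function names are hypothetical.

```python
def meet(x, y):
    """Meet of two intervals: the smallest interval containing both.

    Assumed form: [a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)].
    """
    (a1, b1), (a2, b2) = x, y
    return (min(a1, a2), max(b1, b2))


def subsumed_by(x, y):
    """x ⊑ y holds exactly when x ⊓ y == x, i.e. when interval x contains y."""
    return meet(x, y) == x
```

With these definitions, `meet((4, 4), (5, 5))` gives `(4, 5)`, and the larger interval `(4, 6)` is subsumed by the smaller interval `(5, 5)` that it contains.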

481 | Biclustering algorithms for biological data analysis: A survey
- Madeira, Oliveira
- 2004
Citation Context ...d genes interact together within the same biological process [34]. GED analysis is an important task and an active area of research involving mainly data-mining methods: clustering [14], biclustering =-=[23, 29]-=-. FCA-based methods have been recently designed and applied in this domain [4, 15, 26]. For analysing GEDs by means of FCA, one needs to build a formal context from a GED, attribute values have to be ... |

398 | Introduction to Formal Concept Analysis
- Wille
- 1997
Citation Context ...he highest concept w.r.t. ≤, has an extent containing all objects, and an intent composed of the largest intervals subsumed by all respective intervals of the data. In the example, Top = (G, 〈[4, 6], [7, 9], [4, 8]〉). However, the main goal of GED analysis is extracting homogeneous groups of genes, i.e. groups of genes having similar expression values. Therefore, descriptions of homogeneous groups shoul... |

151 | A systematic comparison and evaluation of biclustering methods for gene expression data
- Prelic, Bleuer, et al.
- 2006
Citation Context ...d genes interact together within the same biological process [34]. GED analysis is an important task and an active area of research involving mainly data-mining methods: clustering [14], biclustering =-=[23, 29]-=-. FCA-based methods have been recently designed and applied in this domain [4, 15, 26]. For analysing GEDs by means of FCA, one needs to build a formal context from a GED, attribute values have to be ... |

149 | Cluster analysis for gene expression data: A survey
- Jiang, Tang, et al.
- 2004
Citation Context ...ed that co-expressed genes interact together within the same biological process [34]. GED analysis is an important task and an active area of research involving mainly data-mining methods: clustering =-=[14]-=-, biclustering [23, 29]. FCA-based methods have been recently designed and applied in this domain [4, 15, 26]. For analysing GEDs by means of FCA, one needs to build a formal context from a GED, attri... |

133 | Comparing performance of algorithms for generating concept lattices
- Kuznetsov, Obiedkov
- 2002
Citation Context ...mework [9]. This mathematical structure supports potential knowledge discovery in databases that benefits of an important set of techniques for building, visualizing and interpreting concept lattices =-=[9, 20]-=-. Concept lattices are represented by diagrams giving nice visualization of classes of objects of a domain. At the same time, the edges of these diagrams give essential knowledge about objects, by giv... |

125 | Fuzzy Relational Systems: Foundations and Principles
- Belohlávek
- 2002
Citation Context ...y concepts with similar objects, w.r.t. a distance on their values [16]. Furthermore, one can also be interested to use pattern structures in fuzzy settings, although FCA has already been extended in =-=[2, 3]-=- where an object is associated with an attribute with a truth degree. Finally, and most importantly, the use of interval pattern structures should be of great interest for mining association rules in ... |

101 | Ordre et classification, Algèbre et Combinatoire
- Barbut, Monjardet
- 1970
Citation Context ...cts), a set of properties (called attributes), and a binary relation usually represented by a binary table relating objects to attributes, where (g,m) = × (a cross) if the object g has the property m =-=[1, 9]-=-. Considering a pair of mappings between sets of all subsets of objects and attributes, called Galois connection, it is possible to derive for each object g the set of all attributes that apply to g. ... |

91 | Extracting conserved gene expression motifs from gene expression data
- Murali, Kasif
Citation Context ...-like”, then interesting patterns may be missed [4]. This is one of the reasons why only a few biclustering algorithms allow overlapping of biclusters [23]. Our approach is close to that described in =-=[27]-=- where descriptions of gene clusters are sought as interval vectors of gene expression values. The authors of [27], however, do not use the semi-lattice on intervals for systematically generating all ... |

85 | Efficient algorithms for mining closed itemsets and their lattice structure
- Zaki, Hsiao
- 2005
Citation Context ...ally scaled contexts (Section 3). We have implemented the Norris, NextClosure, and CloseByOne algorithms, for both processing formal contexts and pattern structures. We have added the Charm algorithm =-=[12]-=- that extracts closed itemsets, i.e. concept intents in a formal context. FCA algorithms have been implemented in original versions as described in [20]. These algorithms are run within the Coron Syst... |

66 | Pattern Structures and Their Projections
- Ganter, Kuznetsov
- 2001
Citation Context ... literature [9], however, they do not always suggest the most efficient implementation right away, and there are situations where one would choose original data representation rather than scaled data =-=[8]-=-. Although scaling allows one to apply FCA tools, it may dramatically increase the complexity of computation and representation, and make worse the visualization of results. Instead of scaling, one ma... |

47 | The genome of Laccaria bicolor provides insights into mycorrhizal symbiosis
- Martin, Aerts, et al.
- 2008
Citation Context ...terms of processing time. 5.1. A real-world GED Biologists at the UMR IAM (INRA) study interactions between fungi and trees. They published the complete genome sequence of the fungus Laccaria bicolor =-=[24]-=-. This fungus lives in symbiosis with many trees of boreal and temperate forests. The fungus forms a mixed organ on tree roots and is able to exchange nutrients with its host in a specific symbiotic s... |

47 | Applications of DNA microarrays in biology
- Stoughton
- 2005
Citation Context ...or cancerous tissues, etc.). Genes with similar expression profiles are said to be co-expressed. It is now widely accepted that co-expressed genes interact together within the same biological process =-=[34]-=-. GED analysis is an important task and an active area of research involving mainly data-mining methods: clustering [14], biclustering [23, 29]. FCA-based methods have been recently designed and appli... |

40 | On stability of a formal concept
- Kuznetsov
- 2007
Citation Context ...on generates too many patterns: some patterns and their sub-patterns w.r.t. ⊑ may describe almost the same set of genes, i.e. a few genes differs in their extents. Concept stability was introduced in =-=[19]-=- for measuring this phenomena. In this paper, we solved the problem of un-interesting patterns thanks to a monotone constraint. Recently, we embedded tolerance relations in pattern structures to produ... |

32 | Why can concept lattices support knowledge discovery in databases?
- Wille
- 2002
Citation Context ...ion rules between attributes describing the objects [21]. FCA can also be used for a number of purposes among which knowledge formalization and acquisition, ontology design, and information retrieval =-=[36, 38]-=-. In real-world applications, e.g. in biology or chemistry, one rarely obtains binary data directly, complex and heterogeneous data involving numbers, graphs, intervals, etc., are more typical. To app... |

30 | Learning of Simple Conceptual Graphs from Positive and Negative Examples
- Kuznetsov
- 1999
Citation Context ....e. complex object descriptions, defining so-called similarity operators which induce a semi-lattice on data descriptions. Several attempts were made for defining such semi-lattices on sets of graphs =-=[8, 17, 18, 22]-=- and logical formulas [5, 7] (see also [10, 37] for FCA extensions). Indeed, if one is able to order object descriptions in complex data, e.g. with graph morphism when objects are described by label... |

29 | A logical generalization of formal concept analysis
- Ferré, Ridoux
- 2000
Citation Context ...he highest concept w.r.t. ≤, has an extent containing all objects, and an intent composed of the largest intervals subsumed by all respective intervals of the data. In the example, Top = (G, 〈[4, 6], [7, 9], [4, 8]〉). However, the main goal of GED analysis is extracting homogeneous groups of genes, i.e. groups of genes having similar expression values. Therefore, descriptions of homogeneous groups shoul... |

29 | Formal concept analysis for knowledge discovery and data mining: The new challenges
- Valtchev, Missaoui, et al.
- 2004
Citation Context ...ion rules between attributes describing the objects [21]. FCA can also be used for a number of purposes among which knowledge formalization and acquisition, ontology design, and information retrieval =-=[36, 38]-=-. In real-world applications, e.g. in biology or chemistry, one rarely obtains binary data directly, complex and heterogeneous data involving numbers, graphs, intervals, etc., are more typical. To app... |

19 | Assessment of discretization techniques for relevant pattern discovery from gene expression data
- Pensa, Leschi, et al.
- 2004
Citation Context ...e and interpretation of the resulting concept lattice. Usually to apply FCA in GED analysis, an l-cut scaling is operated by using a single threshold l on expression values determined for each object =-=[26, 28, 29]-=-. Expression values greater than this threshold are said to be overexpressed and encoded by 1, otherwise by 0. Then formal concepts represent sets of genes simultaneously over-expressed. In [15], a ge... |
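
The l-cut scaling described in this excerpt amounts to a per-object thresholding step. The sketch below is illustrative only: the excerpt does not say how each threshold is determined, so thresholds are passed in as a parameter, and the function name, the dictionary layout, and the sample values are assumptions.

```python
def l_cut_scale(expression, thresholds):
    """Binarize expression values.

    `expression` maps each gene to its list of expression values and
    `thresholds` maps each gene to its threshold (both hypothetical layouts).
    A value strictly greater than the gene's threshold is encoded as 1
    (over-expressed); any other value is encoded as 0.
    """
    return {
        gene: [1 if value > thresholds[gene] else 0 for value in values]
        for gene, values in expression.items()
    }


# Hypothetical sample data, not taken from the paper.
sample = {"g1": [5.0, 7.0, 6.0], "g2": [1.0, 2.0, 3.0]}
binary = l_cut_scale(sample, {"g1": 5.5, "g2": 3.0})
```

Each row of `binary` can then be read as a row of a formal context whose attributes are the situations in which the gene is over-expressed.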

19 | Symbolic Data Mining Methods with the Coron Platform
- Szathmary
- 2006
Citation Context ...at extracts closed itemsets, i.e. concept intents in a formal context. FCA algorithms have been implemented in original versions as described in [20]. These algorithms are run within the Coron System =-=[35]-=-. All implementations are in Java: sets of objects and binary attributes are described with the BitSet class and interval descriptions with standard double arrays. The experiments were carried out on... |

16 | Using formal concept analysis for the extraction of groups of co-expressed genes
- Kaytoue-Uberall, Duplessis, et al.
Citation Context ...an important task and an active area of research involving mainly data-mining methods: clustering [14], biclustering [23, 29]. FCA-based methods have been recently designed and applied in this domain =-=[4, 15, 26]-=-. For analysing GEDs by means of FCA, one needs to build a formal context from a GED, attribute values have to be discretized and intervals of entry values have to be considered as binary attributes, ... |

16 | A fast algorithm for computing all intersections of objects in a finite semilattice. Automatic Documentation
- Kuznetsov
- 1993
Citation Context ...easily determine if a concept extent was already generated. Finally, recursiveness of the algorithm induces a tree structure on the set of all concepts. More details on this algorithm can be found in =-=[20, 33]-=-. To adapt this algorithm for pattern structures, one has to replace each call to a (.)′ operator by a call to the corresponding (.)□ operator. Then, computing A□ for a set A ⊆ G is realized by taking... |
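
The excerpt is cut off before it says how the description of an object set is computed. One plausible reading, assuming interval-vector descriptions with convexification as the meet, is sketched below. The table `DATA` is hypothetical: its values were chosen so that the results agree with the interval vectors quoted in other excerpts on this page, and the names `describe` and `extent` are illustrative.

```python
# Hypothetical expression table: five genes measured in three situations.
# The values are illustrative; they were picked so that the results match
# the interval vectors quoted in the excerpts on this page.
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
    "g5": (5, 8, 5),
}


def describe(objects):
    """Description of a non-empty object set: componentwise convex hull."""
    rows = [DATA[g] for g in objects]
    return tuple((min(col), max(col)) for col in zip(*rows))


def extent(pattern):
    """Objects whose value lies inside the pattern's interval in every dimension."""
    return {
        g
        for g, row in DATA.items()
        if all(lo <= v <= hi for v, (lo, hi) in zip(row, pattern))
    }
```

Under these assumptions, describing all five genes yields the top description, and `extent(describe({"g1", "g2", "g5"}))` returns the same three genes, so that set is closed.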

13 | Transcript patterns associated with ectomycorrhiza development in Eucalyptus Globulus and Pisolithus Microcarpus
- Duplessis, Courty, et al.
- 2005
Citation Context ...that these processes are essential not only to the fruit-body development but also to general cellular processes as previously described in expression studies of the tree-fungus symbiosis development =-=[31]-=-. 5.5. Performance and efficiency study Here we compare time performance of three algorithms for mining pattern structures of interval vectors (Section 4) and equivalent interordinally scaled conte... |

12 | Clustering formal concepts to discover biologically relevant knowledge from gene expression data
- Blachon, Pensa, et al.
- 2007
Citation Context ...an important task and an active area of research involving mainly data-mining methods: clustering [14], biclustering [23, 29]. FCA-based methods have been recently designed and applied in this domain =-=[4, 15, 26]-=-. For analysing GEDs by means of FCA, one needs to build a formal context from a GED, attribute values have to be discretized and intervals of entry values have to be considered as binary attributes, ... |

12 | Isolation and characterization of differentially expressed genes in the mycelium and fruit body of Tuber Borchii
- Lacourt, Duplessis, et al.
- 2002
Citation Context ...expression analyses of the fruit-body development conducted in the ectomycorrhizal fungus Tuber borchii also reported the strong induction of several genes involved in carbon and nitrogen metabolisms =-=[13]-=- as well as in lipid metabolism [32]. The present results are consistent with these observations and supports an important mobilization of nutrient sources from the mycelium to the fruit-body. It s... |

11 | Efficient mining of association rules based on formal concept analysis
- Lakhal, Stumme
- 2005
Citation Context ...nd vice versa. Concepts are ordered within a lattice structure called concept lattice within the formal concept analysis (FCA) framework [9]. This mathematical structure supports potential knowledge discovery in databases that benefits of an important set of techniques for building, visualizing and interpreting concept lattices [9,20]. Concept lattices are represented by diagrams giving nice visualization of classes of objects of a domain. At the same time, the edges of these diagrams give essential knowledge about objects, by giving association rules between attributes describing the objects [21]. FCA can also be used for a number of purposes among which knowledge formalization and acquisition, ontology design, and information retrieval [36,38]. In real-world applications, e.g. in biology or chemistry, one rarely obtains binary data directly, complex and heterogeneous data involving numbers, graphs, intervals, etc., are more typical. To apply FCA-based methods to such data, the latter have to be binarized, ... (M. Kaytoue et al. / Information Sciences 181 (2011) 1989–2001; contacts: skuznetsov@hse.ru (S.O. Kuznetsov), amedeo.napoli@loria.fr (A. Napoli), duplessi@nancy.inra.fr) |

9 | Generalized formal concept analysis
- Chaudron, Maille
- 2000
Citation Context ...A′′ \ A and g′ ⊇ A′. This means that for all i mi ≤ δi(g) ≤ mi. Therefore, g ∈ A□□ and A□□ ≠ A, a contradiction. The proof 2 → 1 is similar. Consider an example of pattern concept: ({g1, g2, g5}, 〈[5, 6], [7, 8], [4, 6]〉), the equivalent concept of the interordinally scaled context is ({g1, g2, g5}, {s1 ≤ 6, s1 ≥ 4, s1 ≥ 5, s2 ≥ 7, s2 ≤ 8, s2 ≤ 9, s3 ≤ 6, s3 ≤ 8, s3 ≥ 4}). Pattern intents are concise... |
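
The correspondence between a pattern intent and its interordinally scaled intent, as in the example of this excerpt, can be sketched as follows. The scale values per dimension (`SCALES`) are not given on this page; the ones below are assumptions chosen to be consistent with the nine binary attributes listed in the excerpt, and the function name is hypothetical.

```python
def interordinal_intent(pattern, scales):
    """Binary attributes (name, op, bound) implied by an interval vector.

    For an interval [a, b] on dimension `name` with scale values W, the
    implied attributes are `name <= w` for every w in W with w >= b, and
    `name >= w` for every w in W with w <= a.
    """
    attributes = set()
    for (a, b), (name, values) in zip(pattern, scales):
        attributes |= {(name, "<=", w) for w in values if w >= b}
        attributes |= {(name, ">=", w) for w in values if w <= a}
    return attributes


# Assumed scale values for the three dimensions (not stated in the excerpt).
SCALES = [("s1", [4, 5, 6]), ("s2", [7, 8, 9]), ("s3", [4, 5, 6, 8])]
scaled = interordinal_intent([(5, 6), (7, 8), (4, 6)], SCALES)
```

The three intervals expand into nine binary attributes here, which illustrates why the excerpt calls pattern intents concise.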

8 | Interpreting microarray expression data using text annotating the genes
- Molla, Andreae, et al.
- 2002
Citation Context ...ons (D,⊓) may be viewed as an attribute hierarchy, where domain knowledge may be encoded, e.g. in some dimensions of a pattern vector. Domain knowledge can be given by text annotations on genes, e.g. =-=[25]-=-, for which a similarity operation ⊓ can be defined. Considering the similarity operation ⊓ as interval convexification generates too many patterns: some patterns and their sub-patterns w.r.t. ⊑ may d... |

7 | Extending conceptualisation modes for generalised Formal Concept Analysis
- Valverde-Albacete, Pelaez-Moreno
- 2011
Citation Context ...milarity operators which induce a semi-lattice on data descriptions. Several attempts were made for defining such semi-lattices on sets of graphs [8, 17, 18, 22] and logical formulas [5, 7] (see also =-=[10, 37]-=- for FCA extensions). Indeed, if one is able to order object descriptions in complex data, e.g. with graph morphism when objects are described by labelled graphs, one may attempt to directly build a... |

7 | Evolving clusters in gene-expression data
- Hruschka, Campello, et al.
- 2006
Citation Context ... be divided into three categories: clustering, biclustering, and FCA-based methods. Units extracted by these methods, i.e. clusters, biclusters, and formal concepts, characterize biological processes. These methods are unsupervised or supervised by domain knowledge. We restrict our discussion to unsupervised methods, as there is generally a few knowledge units available when dealing with GEDs of species whose genome was very recently sequenced (this is the case of the plant species Laccaria bicolor considered in Section 5). We also do not discuss evolutionary computation of clusters, see e.g. [11]. Clustering methods group genes into clusters w.r.t. a global similarity, e.g. based on Euclidean distance, of their expression profiles. Here, “global” means that the similarity is computed for whole numerical vectors representing gene expression profiles. Then, clustering may fail to detect biological processes activated in some situations only [23]. To overcome this limitation, biclustering algorithms have been suggested [6,23]. Biclusters in a GED are defined as groups of genes having similar expression values in a same group of situations, but not necessarily all. However, it is known t... |

7 | Embedding tolerance relations in Formal Concept Analysis for classifying numerical data
- Kaytoue, Assaghir, et al.
- 2010
Citation Context ...g. [25], for which a similarity operation ⊓ can be defined. Considering the similarity operation ⊓ as interval convexification generates too many patterns: some patterns and their sub-patterns w.r.t. ⊑ may describe almost the same set of genes, i.e. a few genes differs in their extents. Concept stability was introduced in [19] for measuring this phenomena. In this paper, we solved the problem of un-interesting patterns thanks to a monotone constraint. Recently, we embedded tolerance relations in pattern structures to produce only concepts with similar objects, w.r.t. a distance on their values [16]. Furthermore, one can also be interested to use pattern structures in fuzzy settings, although FCA has already been extended in [2,3] where an object is associated with an attribute with a truth degree. Finally, and most importantly, the use of interval pattern structures should be of great interest for mining association rules in numerical data. Acknowledgements The second author was supported by the project of the Russian Foundation for Basic Research, Grant No. 08-07-92497- NTsNIL_a. This work was partially funded by the Contrat de Plan Etat – Région Lorraine: Modélisation, Information et ... |

5 | JSM-Method as a Machine Learning Method. Itogi Nauki i Tekhniki, ser
- Kuznetsov
- 1991
Citation Context ....e. complex object descriptions, defining so-called similarity operators which induce a semi-lattice on data descriptions. Several attempts were made for defining such semi-lattices on sets of graphs =-=[8, 17, 18, 22]-=- and logical formulas [5, 7] (see also [10, 37] for FCA extensions). Indeed, if one is able to order object descriptions in complex data, e.g. with graph morphism when objects are described by label... |

5 | Transcript profiling reveals novel marker genes involved in fruiting body formation in Tuber Borchii
- Gabella, Abba, et al.
- 2005
Citation Context ...Values are taken before the logarithmic transformation. ... essential for the fungus and reflect that the fruit-body is a highly active tissue. The fruit-body is a specific fungal organ that differentiate in order to produce spores and that further ensure spore dispersal in nature [30]. Previous gene expression analyses of the fruit-body development conducted in the ectomycorrhizal fungus Tuber borchii also reported the strong induction of several genes involved in carbon and nitrogen metabolisms [13] as well as in lipid metabolism [32]. The present results are consistent with these observations and supports an important mobilization of nutrient sources from the mycelium to the fruit-body. It seems obvious that the primary metabolism requires to be adapted to use these sources in order to properly build spores and provide spore-forming cells with nutrients [30]. The pattern on Fig. 4 (right) also contains seven genes, of which only three possess a putative biological function. Interestingly, one of these genes encodes one pseudouridylate synthase, an enzyme involved in nucleotide metabolism that might also be involved in rem... |

4 | Embedding Tolerance Relations
- Kaytoue, Kuznetsov, et al.
- 2010
Citation Context ...-interesting patterns thanks to a monotone constraint. Recently, we embedded tolerance relations in pattern structures to produce only concepts with similar objects, w.r.t. a distance on their values =-=[16]-=-. Furthermore, one can also be interested to use pattern structures in fuzzy settings, although FCA has already been extended in [2, 3] where an object is associated with an attribute with a truth deg... |

4 | Formal concept analysis for the identification of combinatorial biomarkers in breast cancer
- Motameny, Versmold, et al.
- 2008
Citation Context ...an important task and an active area of research involving mainly data-mining methods: clustering [14], biclustering [23, 29]. FCA-based methods have been recently designed and applied in this domain =-=[4, 15, 26]-=-. For analysing GEDs by means of FCA, one needs to build a formal context from a GED, attribute values have to be discretized and intervals of entry values have to be considered as binary attributes, ... |

2 | Evaluation of IPAQ questionnaires supported by formal concept analysis
- Belohlavek, Sigmund, et al.
- 2011
Citation Context ... di]〉i∈[1,p] ⇔ [ai, bi] ⊑ [ci, di], ∀i ∈ [1, p], meaning that each interval [ai, bi] of e is subsumed by the corresponding interval [ci, di] of f. For example, 〈[2, 4], [2, 6]〉 ⊑ 〈[4, 4], [3, 4]〉 as [2, 4] ⊑ [4, 4] and [2, 6] ⊑ [3, 4]. 4.4. Mining a GED as a pattern structure GED in Table 1 can be formalized as a pattern structure (G, (D,⊓), δ) where G = {g1, ..., g5} and D is a set of in... |
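
The componentwise subsumption test on interval vectors stated in this excerpt can be checked mechanically. The sketch below uses hypothetical function names and represents each interval as a `(low, high)` tuple; it follows the excerpt's convention that an interval is subsumed by any interval it contains.

```python
def interval_subsumed(x, y):
    """[a, b] ⊑ [c, d] holds when [a, b] contains [c, d]."""
    (a, b), (c, d) = x, y
    return a <= c and d <= b


def vector_subsumed(e, f):
    """e ⊑ f when every interval of e is subsumed by the matching interval of f."""
    return len(e) == len(f) and all(
        interval_subsumed(x, y) for x, y in zip(e, f)
    )
```

For the example in the excerpt, `vector_subsumed([(2, 4), (2, 6)], [(4, 4), (3, 4)])` evaluates to `True`, while swapping the two arguments gives `False`.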

2 | Representing lattices using many-valued relations
- Gély, Medina, et al.
- 2009
Citation Context ...milarity operators which induce a semi-lattice on data descriptions. Several attempts were made for defining such semi-lattices on sets of graphs [8, 17, 18, 22] and logical formulas [5, 7] (see also =-=[10, 37]-=- for FCA extensions). Indeed, if one is able to order object descriptions in complex data, e.g. with graph morphism when objects are described by labelled graphs, one may attempt to directly build a... |

2 | Some links between formal concept analysis and graph mining
- Liquiere
- 2006
Citation Context ....e. complex object descriptions, defining so-called similarity operators which induce a semi-lattice on data descriptions. Several attempts were made for defining such semi-lattices on sets of graphs =-=[8, 17, 18, 22]-=- and logical formulas [5, 7] (see also [10, 37] for FCA extensions). Indeed, if one is able to order object descriptions in complex data, e.g. with graph morphism when objects are described by label... |

2 | How to build a fungal fruit body: from uniform cells to specialized tissue
- Busch, Braus
- 2007
Citation Context ...us and reflect that the fruit-body is a highly active tissue. The fruit-body is a specific fungal organ that differentiate in order to produce spores and that further ensure spore dispersal in nature =-=[30]-=-. Previous gene expression analyses of the fruit-body development conducted in the ectomycorrhizal fungus Tuber borchii also reported the strong induction of several genes involved in carbon and nitro... |

1 | JSM-method as a machine learning method
- Kuznetsov
- 1991
Citation Context ...lways suggest the most efficient implementation right away, and there are situations where one would choose original data representation rather than scaled data [8]. Although scaling allows one to apply FCA tools, it may dramatically increase the complexity of computation and representation, and make worse the visualization of results. Instead of scaling, one may work directly with initial data, i.e. complex object descriptions, defining so-called similarity operators which induce a semi-lattice on data descriptions. Several attempts were made for defining such semi-lattices on sets of graphs [8,17,18,22] and logical formulas [5,7] (see also [10,37] for FCA extensions). Indeed, if one is able to order object descriptions in complex data, e.g. with graph morphism when objects are described by labelled graphs, one may attempt to directly build a concept lattice from such data. In [8], a general approach called pattern structures was proposed, which allows one to apply standard FCA to any partially ordered data descriptions. This paper addresses the problem of FCA-based classification of numerical data, where object descriptions are vectors of numbers, with pattern structures and a particular sim... |