Learning Concept Hierarchies from Text Corpora Using Formal Concept Analysis
- Journal of Artificial Intelligence Research
, 2005
Cited by 73 (4 self)
We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris' distributional hypothesis and model the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering algorithm. Furthermore, we investigate the impact of using different measures weighting the contribution of each attribute as well as of applying a particular smoothing technique to cope with data sparseness.
Understanding Behavioral Dependencies in Class Hierarchies using Concept Analysis
- LMO '03: LANGAGES ET MODÈLES À OBJETS (OBJECT ORIENTED LANGUAGES AND MODELS)
, 2003
Cited by 12 (4 self)
The functionalities of software artifacts are defined by structural and behavioral dependencies. During the evolution and maintenance phases of a system, the developer has to be able to understand how these dependencies were defined and how they influence the interaction of the artifacts. The developer must be sure that modifications made to the system will not break its behavior. In most cases, such breakage happens because the dependencies are not documented. We propose to tackle this problem in the context of object-oriented class hierarchies using Concept Analysis. We use different properties of method invocations to analyze the dependencies among the classes of a hierarchy in terms of class behavior. Based on these results, we present a set of patterns that describe recurring kinds of behavior in class hierarchies. We show the application of these patterns to the specific case of the Magnitude hierarchy in Smalltalk.
Simple crosscutting concerns are not so simple – analysing variability in large-scale idioms-based implementations
- In Proceedings of the Sixth International Conference on Aspect-Oriented Software Development (AOSD’07)
, 2007
Cited by 11 (0 self)
This paper describes a method for studying idioms-based implementations of crosscutting concerns, and our experiences with it in the context of a real-world, large-scale embedded software system. In particular, we analyse a seemingly simple concern, tracing, and show that it exhibits significant variability, despite the use of a prescribed idiom. We discuss the consequences of this variability in terms of how aspect-oriented software development techniques could help prevent it, how it paralyses (automated) migration efforts, and which aspect language features are required in order to obtain precise and concise aspects. Additionally, we elaborate on the representativeness of our results and on the usefulness of our
Understanding Classes using XRay Views
- In Proceedings of 2nd. MASPEGHI (ASE
, 2003
Cited by 6 (3 self)
Understanding the internal workings of classes is a key prerequisite to maintaining an object-oriented software system. Unfortunately, classical editing and browsing tools offer mainly linear and textual views of classes and their implementation. These views fail to expose the semantic relationships between the internal parts of a class. We propose XRay views, a technique based on Concept Analysis, which reveal the internal relationships between groups of methods and attributes of a class. XRay views are composed of elementary collaborations between attributes and methods and help the engineer build a mental model of how a class works internally. In this paper we present XRay views and illustrate the approach by applying it to three Smalltalk classes: OrderedCollection, Scanner, and UIBuilder.
Mining Roles with Semantic Meanings
Cited by 6 (1 self)
With the growing adoption of role-based access control (RBAC) in commercial security and identity management products, how to facilitate the process of migrating a non-RBAC system to an RBAC system has become a problem with significant business impact. Researchers have proposed using data mining techniques to discover roles, complementing the costly top-down approaches to RBAC system construction. A key problem that has not been adequately addressed by existing role mining approaches is how to discover roles with semantic meanings. In this paper, we study the problem in two settings with different information availability. When the only available information is the user-permission relation, we propose to discover roles whose semantic meaning is based on formal concept lattices. We argue that the theory of formal concept analysis provides a solid theoretical foundation for mining roles from the user-permission relation. When user-attribute information is also available, we propose to create roles that can be explained by expressions of user attributes. Since an expression of attributes describes a real-world concept, the corresponding role represents a real-world concept as well. Furthermore, the algorithms we propose balance the semantic guarantee of roles with system complexity. Our experimental results demonstrate the effectiveness of our approaches.
Using Blind Search and Formal Concepts for Binary Factor Analysis
, 2004
Cited by 2 (0 self)
Binary Factor Analysis (BFA, also known as Boolean Factor Analysis) may help with understanding collections of binary data. Since collections of text documents can also be treated as binary data, BFA can be used to analyse such collections. Unfortunately, solving BFA exactly is not easy. This article presents two BFA methods based on exact computation, Boolean algebra, and the theory of formal concepts.
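To make the link between Boolean factors and formal concepts concrete, the following is a minimal sketch, not the article's methods. It assumes a tiny hand-made document-by-term matrix and two hand-picked formal concepts used as factors; the names `boolean_product`, `data`, `doc_factor`, and `factor_term` are hypothetical. It only checks that the Boolean product of the two factor matrices reproduces the data.

```python
def boolean_product(a, b):
    """Boolean matrix product: entry (i, j) is 1 if some factor l has a[i][l] and b[l][j]."""
    k = len(b)          # number of factors
    m = len(b[0])       # number of attributes (terms)
    return [
        [int(any(row[l] and b[l][j] for l in range(k))) for j in range(m)]
        for row in a
    ]

# Assumed toy data: 3 documents (rows) x 3 terms (columns).
data = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]

# Two formal concepts used as factors (chosen by hand for this example):
#   factor 0: documents {0, 1} share terms {0, 1}
#   factor 1: documents {1, 2} share term  {2}
doc_factor = [      # documents x factors (concept extents as columns)
    [1, 0],
    [1, 1],
    [0, 1],
]
factor_term = [     # factors x terms (concept intents as rows)
    [1, 1, 0],
    [0, 0, 1],
]

reconstructed = boolean_product(doc_factor, factor_term)
print(reconstructed == data)
```

Each factor here is a full rectangle of ones in the data, which is why formal concepts are natural candidates for factors; finding a smallest such set of factors is the hard part and is not attempted in this sketch.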
Multi-level Rule Discovery from Propositional Knowledge Bases
- International Workshop on Knowledge Discovery in Multimedia and Complex Data (KDMCD’02)
, 2002
Cited by 2 (0 self)
This paper explores how knowledge in the form of propositions in an expert system can be used as input into data mining. The output is multi-level knowledge which can be used to provide structure, suggest interesting concepts, improve understanding and support querying of the original knowledge. Appropriate algorithms for mining knowledge must take into account the peculiar features of knowledge which distinguish it from data. The most obvious and problematic distinction is that only one of each rule exists. This paper introduces the possible benefits of mining knowledge and describes a technique for reorganizing knowledge and discovering higher-level concepts in the knowledge base. The rules input may have been acquired manually (we describe a simple technique known as Ripple Down Rules for this purpose) or automatically using an existing data mining technique. In either case, once the knowledge exists in propositional form, Formal Concept Analysis is applied to the rules to develop an abstraction hierarchy from which multi-level rules can be extracted. The user is able to explore the knowledge at and across any of the levels of abstraction to provide a much richer picture of the knowledge and understanding of the domain.
Clusters, Concepts, and Pseudometrics
Cited by 2 (0 self)
The fields of cluster analysis and concept analysis are both used to identify patterns in data. Concept analysis identifies similarities between sets of objects based on their attributes. Cluster analysis groups objects with related characteristics based on some notion of distance. In this paper, we investigate connections between these two approaches. The framework for concept analysis is a finite set of objects, a finite set of attributes, and a binary relation between the two sets that describes objects according to their attributes. A concept is a maximal rectangle in the relation. The goal of concept analysis is to compute concepts and to analyze them for significant groupings of objects and attributes. However, concept analysis is not a practical tool for identifying patterns in large data sets. First, it can be computationally expensive to compute all of the concepts. Secondly, the number of concepts may be too large to analyze in a reasonable amount of tim
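The framework described in this abstract (objects, attributes, a binary relation, and concepts as maximal rectangles) can be illustrated with a small sketch. This is a naive enumeration over all object subsets, written only for illustration and assuming a toy context; the function name `formal_concepts` and the example data are hypothetical and not taken from the paper. Its exponential cost also illustrates why the abstract calls the computation expensive.

```python
from itertools import combinations

def formal_concepts(context):
    """Return all formal concepts of a context given as {object: set_of_attributes}.

    Each concept is a pair (extent, intent) of frozensets. Naive approach:
    close every subset of objects, so the cost is exponential in the object count.
    """
    objects = list(context)
    all_attrs = set().union(*context.values()) if context else set()

    def intent(objs):
        # Attributes shared by every object in objs (all attributes if objs is empty).
        shared = set(all_attrs)
        for o in objs:
            shared &= context[o]
        return frozenset(shared)

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent(subset)
            concepts.add((extent(b), b))   # (extent, intent) is a maximal rectangle
    return concepts

# Assumed toy context: which animals have which properties.
animals = {
    "duck": {"flies", "swims"},
    "eagle": {"flies"},
    "trout": {"swims"},
}

for ext, itt in sorted(formal_concepts(animals), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(itt))
```

For this context the sketch yields four concepts, ranging from ({duck}, {flies, swims}) at the bottom to (all three animals, no shared attributes) at the top.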
A Fast Algorithm for Building the Hasse Diagram of a Galois Lattice
- In: Proceedings of the Colloque LaCIM 2000, Montréal (CA)
, 2000
Cited by 2 (0 self)
Formal concept analysis and Galois lattices in general are increasingly used for large contexts that are automatically generated. As the size of the resulting datasets may grow considerably, it becomes essential to keep the algorithmic complexity of the analysis procedures as low as possible. This paper presents an efficient algorithm that computes the Hasse diagram of a Galois lattice from the lattice ground set, i.e., the set of all concepts. The algorithm performs an element-wise completion of the lattice according to a linear extension of the lattice order. This requires only a limited number of comparisons between concepts and therefore makes the global algorithm very efficient. In fact, its asymptotic time complexity is almost linear in the number of concepts. Consequently, the joint use of our algorithm with an efficient procedure for concept generation yields a complete procedure for building the Galois lattice.
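As a point of comparison for the task described here (deriving the Hasse diagram from the set of all concepts), the sketch below is a naive baseline and not the paper's algorithm. It assumes concepts are represented by their extents only, and the name `hasse_edges` and the example extents are hypothetical. Sorting by extent size gives one linear extension of the lattice order; the cover test then compares each pair against all intermediate candidates, which is far more expensive than the near-linear behaviour the abstract reports.

```python
def hasse_edges(extents):
    """Return the cover relation of a family of extents ordered by set inclusion.

    extents: iterable of frozensets. Result: set of (lower, upper) pairs where
    lower is a proper subset of upper and no other extent lies strictly between them.
    """
    ordered = sorted(set(extents), key=len)   # a linear extension of the inclusion order
    edges = set()
    for i, lower in enumerate(ordered):
        # Proper supersets can only appear later in the size-sorted list.
        uppers = [u for u in ordered[i + 1:] if lower < u]
        for u in uppers:
            # u covers lower if no other candidate sits strictly between them.
            if not any(m < u for m in uppers):
                edges.add((lower, u))
    return edges

# Assumed example: extents of the four concepts of a small animal context.
extents = [
    frozenset({"duck"}),
    frozenset({"duck", "eagle"}),
    frozenset({"duck", "trout"}),
    frozenset({"duck", "eagle", "trout"}),
]

for lower, upper in sorted(hasse_edges(extents), key=lambda e: (len(e[0]), sorted(e[1]))):
    print(sorted(lower), "->", sorted(upper))
```

The result for this example is the familiar diamond shape: the bottom extent is covered by the two middle extents, and each of those is covered by the top.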
Discovering substantial distinctions among incremental bi-clusters
- in SDM
, 2009
Cited by 2 (2 self)
A fundamental task of data analysis is comprehending what distinguishes clusters found within the data. We present the problem of mining distinguishing sets, which seeks to find sets of objects or attributes that induce the most change among the incremental bi-clusters of a binary dataset. Unlike emerging patterns and contrast sets, which focus only on statistical differences between the support of itemsets, our approach considers distinctions in both the attribute space and the object space. Viewing the lattice of bi-clusters formed within a data set as a weighted directed graph, we mine the most significant distinguishing sets by growing a maximal cost spanning tree of the lattice. In this paper we present a weighting function for measuring distinction among bi-clusters in the lattice and the novel MIDS algorithm. MIDS simultaneously enumerates bi-clusters, constructs the bi-cluster lattice, and computes the distinguishing sets. The efficient computational performance of MIDS is exhibited in a performance test on real-world and benchmark data sets. The utility of distinguishing sets is also demonstrated with experiments on synthetic and real data.

