Results 11 - 20 of 63
A Unified Framework for Expressing Software Subsystem Classification Techniques
, 1996
Cited by 33 (0 self)
The architecture of a software system classifies its components into subsystems and describes the relationships between the subsystems. The information contained in such an abstraction is of immense significance in various software maintenance activities. There is considerable interest in extracting the architecture of a software system from its source code, and hence in techniques that classify the components of a program into subsystems. Techniques for classifying subsystems presented in the literature differ in the type of components they place in a subsystem and the information they use to identify related components. However, these techniques have been presented using different terminology and symbols, making it harder to perform comparative analyses. This paper presents a unified framework for expressing techniques of classifying subsystems of a software system. The framework consists of a consistent set of terminology, notation, and symbols that may be used to describe the input, output, and processing performed by these techniques. Using this framework several subsystem classification techniques have been reformulated. This reformulation makes it easier to compare these techniques, a first step towards evaluating their relative effectiveness.
Clustering ensembles: Models of consensus and weak partitions
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2005
Cited by 24 (1 self)
Clustering ensembles have emerged as a powerful method for improving both the robustness as well as the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial or statistical perspectives. This study extends previous research on clustering ensembles in several respects. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. Second, we propose a probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings. A combined partition is found as a solution to the corresponding maximum likelihood problem using the EM algorithm. Third, we define a new consensus function that is related to the classical intra-class variance criterion using the generalized mutual information definition. Finally, we demonstrate the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. Combination accuracy is analyzed as a function of several parameters that control the power and resolution of component partitions as well as the number of partitions. We also analyze clustering ensembles with incomplete information and the effect of missing cluster labels on the quality of overall consensus. Experimental results demonstrate the effectiveness of the proposed methods on several real-world datasets.
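The probabilistic consensus model described in this abstract can be illustrated with a small sketch. It treats each object as the vector of labels it received across the input partitions and fits a mixture of per-partition categorical distributions with EM. The function name, the smoothing constant, the initialization from the first partition, and the toy data are all assumptions made for illustration; this is not the authors' implementation.

```python
def em_consensus(label_vectors, n_clusters, init_resp, n_iter=50, smooth=1e-3):
    """EM for a mixture of products of categorical distributions.

    label_vectors[i][j] is the label of object i in input partition j.
    init_resp[i][m] is the starting responsibility of component m for object i.
    Returns one hard consensus label per object.
    """
    n_objects = len(label_vectors)
    n_partitions = len(label_vectors[0])
    # Distinct labels that occur in each input partition.
    label_sets = [sorted({v[j] for v in label_vectors}) for j in range(n_partitions)]
    resp = [list(r) for r in init_resp]

    for _ in range(n_iter):
        # M-step: mixing weights and smoothed label probabilities per component.
        mass = [sum(resp[i][m] for i in range(n_objects)) for m in range(n_clusters)]
        alpha = [mass[m] / n_objects for m in range(n_clusters)]
        theta = []
        for m in range(n_clusters):
            per_partition = []
            for j in range(n_partitions):
                denom = mass[m] + smooth * len(label_sets[j])
                probs = {}
                for k in label_sets[j]:
                    weight = sum(resp[i][m] for i in range(n_objects)
                                 if label_vectors[i][j] == k)
                    probs[k] = (weight + smooth) / denom
                per_partition.append(probs)
            theta.append(per_partition)

        # E-step: posterior probability of each component for each object.
        for i, vec in enumerate(label_vectors):
            joint = []
            for m in range(n_clusters):
                p = alpha[m]
                for j, k in enumerate(vec):
                    p *= theta[m][j][k]
                joint.append(p)
            total = sum(joint)
            resp[i] = [p / total for p in joint]

    return [max(range(n_clusters), key=lambda m: r[m]) for r in resp]


# Toy ensemble: three partitions of six objects (hypothetical data).
# The second uses permuted label names; the third disagrees on object 2.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
label_vectors = list(zip(*partitions))
# Assumed initialization: a softened copy of the first partition.
init = [[0.9, 0.1] if lab == 0 else [0.1, 0.9] for lab in partitions[0]]
consensus = em_consensus(label_vectors, 2, init)
print(consensus)
```

Because the model only looks at which objects share a label within each partition, the permuted label names in the second partition do not matter, which is one reason a mixture formulation is convenient for combining partitions.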
Knowledge Discovery In Databases: An Attribute-Oriented Rough Set Approach
, 1995
Cited by 23 (0 self)
Knowledge Discovery in Databases (KDD) is an active research area with the promise of a high payoff in many business and scientific applications. The grand challenge of knowledge discovery in databases is to automatically process large quantities of raw data, identify the most significant and meaningful patterns, and present this knowledge in an appropriate form for achieving the user's goal. Knowledge discovery systems face challenging problems from real-world databases, which tend to be very large, redundant, noisy and dynamic. Each of these problems has been addressed to some extent within machine learning, but few, if any, systems address them all. Collectively handling these problems while producing useful knowledge efficiently and effectively is the main focus of the thesis. In this thesis, we develop an attribute-oriented rough set approach for knowledge discovery in databases. The method adopts the artificial intelligence "learning from examples" paradigm combined with rough...
Student Modeling and Machine Learning
- INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION
, 1998
Cited by 22 (0 self)
After identifying essential student modeling issues and machine learning approaches, this paper examines how machine learning techniques have been used to automate the construction of student models as well as the background knowledge necessary for student modeling. In the process, the paper sheds light on the difficulty, suitability and potential of using machine learning for student modeling processes, and, to a lesser extent, the potential of using student modeling techniques in machine learning.
ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing
- IEEE Trans. on Multimedia
, 2004
Cited by 21 (4 self)
Recent advances in digital video compression and networks have made video more accessible than ever. However, the existing content-based video retrieval systems still suffer from the following problems. 1) Semantics-sensitive video classification problem because of the semantic gap between low-level visual features and high-level semantic visual concepts; 2) Integrated video access problem because of the lack of efficient video database indexing, automatic video annotation, and concept-oriented summary organization techniques. In this paper, we have proposed a novel framework, called ClassView, to make some advances toward more efficient video database indexing and access. 1) A hierarchical semantics-sensitive video classifier is proposed to shorten the semantic gap. The hierarchical tree structure of the semantics-sensitive video classifier is derived from the domain-dependent concept hierarchy of video contents in a database. Relevance analysis is used for selecting the discriminating visual features with suitable importances. The Expectation-Maximization (EM) algorithm is also used to determine the classification rule for each visual concept node in the classifier. 2) A hierarchical video database indexing and summary presentation technique is proposed to support more effective video access over a large-scale database. The hierarchical tree structure of our video database indexing scheme is determined by the domain-dependent concept hierarchy which is also used for video classification. The presentation of visual summary is also integrated with the inherent hierarchical video database indexing tree structure. Integrating video access with efficient database indexing tree structure has provided great opportunity for supporting more powerful video search engines.
Computer Vision Algorithms on Reconfigurable Logic Arrays
- IEEE TRANS. ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1999
Cited by 11 (1 self)
Computer vision algorithms are natural candidates for high performance computing due to their inherent parallelism and intense computational demands. For example, a simple 3 x 3 convolution on a 512 x 512 gray scale image at 30 frames per second requires 67.5 million multiplications and 60 million additions to be performed in one second. Computer vision tasks can be classified into three categories based on their computational complexity and communication complexity: low-level, intermediate-level and high-level. Special-purpose hardware provides better performance compared to a general-purpose hardware for all the three levels of vision tasks. With recent advances in very large scale integration (VLSI) technology, an application specific integrated circuit (ASIC) can provide the best performance in terms of total execution time. However, long design cycle time, high development cost and inflexibility of a dedicated hardware deter design of ASICs. In contrast, field programmable gate arrays (FPGAs) support lower design verification time and easier design adaptability at a lower cost. Hence, FPGAs with an array of reconfigurable logic blocks can be very useful compute elements. FPGA-based custom computing machines are
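The operation counts quoted in this abstract can be checked with a short calculation. The sketch below assumes 9 multiplications and 8 additions per output pixel for a 3 x 3 kernel, ignores border handling, and assumes that "million" is read as 2^20, since the quoted figures only match exactly under that reading. None of these assumptions is stated in the abstract.

```python
# Operation count for a 3 x 3 convolution on a 512 x 512 image at 30 frames/s.
WIDTH, HEIGHT, FPS = 512, 512, 30
KERNEL_TAPS = 3 * 3  # one multiplication per kernel coefficient

pixels_per_second = WIDTH * HEIGHT * FPS
mults_per_second = pixels_per_second * KERNEL_TAPS
# Summing 9 products takes 8 additions per output pixel.
adds_per_second = pixels_per_second * (KERNEL_TAPS - 1)

MEBI = 2 ** 20  # assumed unit behind the word "million" in the quoted figures
print(mults_per_second, mults_per_second / MEBI)  # 70778880 67.5
print(adds_per_second, adds_per_second / MEBI)    # 62914560 60.0
```

In decimal units the same workload is roughly 70.8 million multiplications and 62.9 million additions per second.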
Concept Hierarchy in Data Mining: Specification, Generation and Implementation
, 1997
Cited by 9 (0 self)
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. As one of the most important forms of background knowledge, concept hierarchy plays a fundamentally important role in data mining. It is the purpose of this thesis to study some aspects of concept hierarchy such as the automatic generation and encoding technique in the context of data mining. After the discussion on the basic terminology and categorization, automatic generation of concept hierarchies is studied for both nominal and numerical hierarchies. One algorithm is designed for determining the partial order on a given set of nominal attributes. The resulting partial order is a useful guide for users to finalize the concept hierarchy for their particular data mining tasks. Based on hierarchical and partitioning clustering methods, two algorithms are proposed for the automatic generation of numerical hierarchies. The quality and performance comparisons indicate that the ...
An alternative extension of the k-means algorithm for clustering categorical data
- Int. J. Appl. Math. Comput. Sci
, 2004
Cited by 9 (0 self)
Most of the earlier work on clustering has mainly been focused on numerical data whose inherent geometric properties can be exploited to naturally define distance functions between data points. Recently, the problem of clustering categorical data has started drawing interest. However, the computational cost makes most of the previous algorithms unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect. At the same time, working only on numerical data prohibits them from being used for clustering categorical data. The main contribution of this paper is to show how to apply the notion of “cluster centers” on a dataset of categorical objects and how to use this notion for formulating the clustering problem of categorical objects as a partitioning problem. Finally, a k-means-like algorithm for clustering categorical data is introduced. The clustering performance of the algorithm is demonstrated with two well-known data sets, namely, soybean disease and nursery databases.
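To make the idea of a k-means-like loop over categorical objects concrete, here is a minimal sketch. It uses the simpler mode-based variant (per-attribute most frequent category as the center, simple matching as the dissimilarity), which is not necessarily the definition of "cluster centers" used in the paper. The function names, the initial centers, and the toy data are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_center(rows):
    """Center of a cluster: the most frequent category of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*rows))


def k_modes(data, initial_centers, max_iter=100):
    """Alternate assignment and center update until assignments stop changing."""
    centers = list(initial_centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center under the matching dissimilarity.
        new_labels = [
            min(range(len(centers)), key=lambda j: mismatch(row, centers[j]))
            for row in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each non-empty cluster's center.
        for j in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = mode_center(members)
    return labels, centers


# Hypothetical toy data: (colour, size, shape).
DATA = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "large", "square"),
]
labels, centers = k_modes(DATA, initial_centers=[DATA[0], DATA[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)  # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

The structure mirrors numerical k-means: only the distance function and the center update change, which is what keeps the per-iteration cost linear in the number of objects.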
An Introduction to Symbolic Data Analysis and the Sodas Software
- Journal of Symbolic Data Analysis
, 2003
Comparing International Development Patterns Using Multi-Operator Learning and Discovery Tools
- Proceedings of AAAI-94 Workshop on Knowledge Discovery in Databases
, 1994
Cited by 8 (5 self)
The multistrategy knowledge discovery tool, INLEN, is applied to databases consisting of economic and demographic facts and statistics about the countries of the world. Preliminary experiments focus on discerning and comparing various patterns in the status and development of countries in different regions of the world. These experiments have provided some interesting and often unexpected results, but they are only a beginning in exploring such data. By discovering patterns and exceptions such as the ones presented, domain experts may have new insights into national development patterns, predict future developments in certain countries, or use these discoveries to influence national policies. Users who are not experts in the domain may also make interesting discoveries with INLEN. The results of these initial experiments are presented and future paths of research in this domain are proposed.

