An Introduction to Symbolic Data Analysis and the Sodas Software
- Journal of Symbolic Data Analysis, 2003
Probabilistic Models for Bacterial Taxonomy
- International Statistical Review, 2000
Cited by 7 (3 self)
We give a survey of different probabilistic partitioning methods that have been applied to bacterial taxonomy. We introduce a theoretical framework, which makes it possible to treat the various models in a unified way. The key concepts of our approach are prediction and storing of microbiological information in a Bayesian forecasting setting. We show that there is a close connection between classification and probabilistic identification and that, in fact, our approach ties these two concepts together in a coherent way.
Applying the EM-algorithm to Classification of Bacteria
- Proceedings of the International ICSC Congress on Intelligent Systems and Applications, 2000
Cited by 2 (1 self)
In the present paper we study the use of the expectation-maximization (EM) algorithm in classification. The EM algorithm is used to calculate the probability of each vector belonging to each class. If we assign each vector to the class of maximal probability, we get a classification minimizing a certain log-likelihood function. By analyzing these probabilities we get a clearer picture of how well the data fit the classification than with traditional classification methods. We define a vector to be well classified if its probability of belonging to some class is above a prescribed value 1 - ε. We then set up an experimental procedure to filter out elements that are not well classified in a large data set describing strains of bacteria belonging to the family Enterobacteriaceae. We compare classifications made with a subset of the data (containing only well-classified elements) to classifications made with randomly chosen subsets. We note that classifications done with w...
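The idea in this abstract (class-membership probabilities from EM, then a threshold of 1 - ε to decide which vectors are well classified) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes binary vectors modelled by a Bernoulli mixture, Laplace-smoothed parameter updates, and a deterministic initialization from evenly spaced data points. The function names `em_bernoulli_mixture` and `well_classified` are hypothetical.

```python
import math


def em_bernoulli_mixture(data, k, iters=50):
    """Fit a k-class Bernoulli mixture to binary vectors with EM.

    Returns (weights, thetas, resp), where resp[i][j] is the estimated
    probability that vector i belongs to class j.
    Assumption: class j is initialized from the data point at index
    (j * n) // k; if two of these points are identical, the corresponding
    classes start identical and cannot separate.
    """
    n, d = len(data), len(data[0])
    weights = [1.0 / k] * k
    # Softened copies of the chosen data points, so no parameter is 0 or 1.
    thetas = [[0.75 if bit else 0.25 for bit in data[(j * n) // k]]
              for j in range(k)]
    resp = [[1.0 / k] * k for _ in range(n)]

    for _ in range(iters):
        # E-step: posterior class probabilities, computed in log space.
        for i, x in enumerate(data):
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for t, bit in zip(thetas[j], x):
                    lp += math.log(t if bit else 1.0 - t)
                logp.append(lp)
            m = max(logp)
            unnorm = [math.exp(lp - m) for lp in logp]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]

        # M-step: re-estimate class weights and per-feature probabilities.
        # The +1 / +2 terms are Laplace smoothing (an assumption of this
        # sketch) that keeps every parameter strictly inside (0, 1).
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = (nj + 1.0) / (n + k)
            thetas[j] = [
                (sum(r[j] * x[t] for r, x in zip(resp, data)) + 1.0)
                / (nj + 2.0)
                for t in range(d)
            ]
    return weights, thetas, resp


def well_classified(resp, eps):
    """Indices of vectors whose largest class probability exceeds 1 - eps."""
    return [i for i, r in enumerate(resp) if max(r) > 1.0 - eps]
```

Filtering a data set then amounts to keeping only the rows whose indices are returned by `well_classified(resp, eps)` and re-running the classification on that subset.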
BinClass: A Software Package for Classifying Binary Vectors User's Guide
Cited by 2 (1 self)
In this document we introduce BinClass, a software package for the classification of binary vectors and the analysis of the classification results. First we give a brief introduction to the mathematical foundations and theory of clustering, cumulative classification and mixture classification. We also introduce methods for analysing the classifications, including trees (dendrograms), comparison of classifications and bootstrapping. A few pseudoalgorithms are presented. These methods are included in the software package. The third and fourth chapters are the user's guide to the actual software package. Finally, a short sample session is presented to give insight into how the software actually works and to illustrate the function of some of the many parameters. Apart from being a user's guide to the software package, this document can be seen as a review of and tutorial on the classification methodology of binary data, reflecting the extensive research done on the subject at our department.
Knowledge Discovery From Symbolic Data And The Sodas Software
- Conf. on Principles and Practice of Knowledge Discovery in Databases, PPKDD-2000, 2000
Cited by 1 (0 self)
The data descriptions of the units are called "symbolic" when they are more complex than the standard ones because they contain internal variation and are structured. Symbolic data arise from many sources, for instance when summarising huge relational databases by their underlying concepts. "Extracting knowledge" means getting explanatory results; that is why "symbolic objects" are introduced and studied in this paper. They model concepts and constitute an explanatory output for data analysis. Moreover, they can be used to define queries on a relational database and to propagate concepts between databases. We define "Symbolic Data Analysis" (SDA) as the extension of standard data analysis to symbolic data tables as input in order to find symbolic objects as output. In this paper we give an overview of recent developments in SDA. We present some tools and methods of SDA and introduce the SODAS software prototype (resulting from the work of 17 teams from nine countries involved in a European project of EUROSTAT).
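To make the notion of a symbolic description concrete, the sketch below shows one way individual records could be summarised by concept so that each cell keeps its internal variation. It is an illustration only and does not reproduce the SODAS software: the function name `summarise_by_concept`, the choice of (min, max) intervals for numeric columns, and the choice of value sets for categorical columns are all assumptions.

```python
from collections import defaultdict


def summarise_by_concept(rows, concept_key):
    """Aggregate flat records into one symbolic description per concept.

    Numeric columns become (min, max) intervals and all other columns
    become sets of observed values (both choices are assumptions made
    for this sketch).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[concept_key]].append(row)

    table = {}
    for concept, members in groups.items():
        description = {}
        for column in members[0]:
            if column == concept_key:
                continue
            values = [m[column] for m in members]
            if all(isinstance(v, (int, float)) for v in values):
                description[column] = (min(values), max(values))
            else:
                description[column] = set(values)
        table[concept] = description
    return table


# Hypothetical records: three individuals grouped by the concept "town".
rows = [
    {"town": "A", "age": 30, "job": "nurse"},
    {"town": "A", "age": 45, "job": "teacher"},
    {"town": "B", "age": 52, "job": "nurse"},
]
symbolic_table = summarise_by_concept(rows, "town")
```

Each entry of `symbolic_table` is then a row of a symbolic data table: town "A" is described by an age interval and a set of jobs rather than by a single value per variable.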

