An Introduction to Symbolic Data Analysis and the Sodas Software
 Journal of Symbolic Data Analysis
, 2003
Probabilistic Models for Bacterial Taxonomy
 INTERNATIONAL STATISTICAL REVIEW
, 2000
Cited by 8 (3 self)
We give a survey of different probabilistic partitioning methods that have been applied to bacterial taxonomy. We introduce a theoretical framework, which makes it possible to treat the various models in a unified way. The key concepts of our approach are prediction and storing of microbiological information in a Bayesian forecasting setting. We show that there is a close connection between classification and probabilistic identification and that, in fact, our approach ties these two concepts together in a coherent way.
Applying the EM-algorithm to Classification of Bacteria
 Proceedings of the International ICSC Congress on Intelligent Systems and Applications
, 2000
Cited by 2 (1 self)
In the present paper we study the use of the expectation maximization (EM) algorithm in classification. The EM-algorithm is used to calculate the probability of each vector belonging to each class. If we assign each vector to the class of maximal probability we get a classification minimizing a certain log-likelihood function. By analyzing these probabilities we get a clearer picture of how well the data fit the classification than by traditional classification methods. We define a vector to be well classified in the classification if its probability of belonging to some class is above a prescribed value 1 - ε. Then we set up the experimental procedure to filter out elements that are not well classified in a large data set describing strains of bacteria belonging to the family Enterobacteriaceae. We compare classifications with a subset of the data (containing only well classified elements) to classifications done with randomly chosen subsets. We note that classifications done with w...
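The procedure this abstract describes (class-membership probabilities from EM, assignment to the most probable class, and a "well classified" filter at a prescribed threshold, written 1 - eps here) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes binary vectors modelled by a mixture of independent Bernoulli distributions, and the names `em_bernoulli_mixture` and `well_classified`, the random initialisation, and the smoothing constants are all hypothetical choices.

```python
import math
import random


def em_bernoulli_mixture(data, n_classes, n_iter=50, seed=0):
    """Fit a mixture of independent Bernoulli distributions with EM.

    Assumed model (not taken from the paper): each class k has a weight
    and a probability thetas[k][j] that position j of a vector is 1.
    Returns (weights, thetas, resp); resp[i][k] is the estimated
    probability that vector i belongs to class k.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Hypothetical initialisation: random parameters away from 0 and 1.
    thetas = [[rng.uniform(0.25, 0.75) for _ in range(d)]
              for _ in range(n_classes)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities, computed in log space.
        resp = []
        for x in data:
            logp = []
            for k in range(n_classes):
                lp = math.log(weights[k])
                for j in range(d):
                    t = thetas[k][j]
                    lp += math.log(t if x[j] else 1.0 - t)
                logp.append(lp)
            m = max(logp)
            unnorm = [math.exp(v - m) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate class weights and Bernoulli parameters.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            # Clamp so that log(weights[k]) stays defined for empty classes.
            weights[k] = max(nk / n, 1e-12)
            for j in range(d):
                num = sum(r[k] * x[j] for r, x in zip(resp, data))
                # Light smoothing keeps parameters strictly inside (0, 1).
                thetas[k][j] = (num + 1e-3) / (nk + 2e-3)
    return weights, thetas, resp


def assign_classes(resp):
    """Assign each vector to its class of maximal probability."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


def well_classified(resp, eps):
    """Indices of vectors whose maximal class probability exceeds 1 - eps."""
    return [i for i, r in enumerate(resp) if max(r) > 1.0 - eps]


if __name__ == "__main__":
    # Toy binary vectors (made up for illustration, not bacterial data).
    toy = [[1, 1, 1, 0, 0, 0]] * 4 + [[0, 0, 0, 1, 1, 1]] * 4 + [[1, 0, 1, 0, 1, 0]]
    _, _, resp = em_bernoulli_mixture(toy, n_classes=2)
    print(assign_classes(resp))
    print(well_classified(resp, eps=0.05))
```

Filtering with `well_classified` before re-running the classification mirrors the experiment outlined in the abstract; how the result compares with randomly chosen subsets depends on the data.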
BinClass: A Software Package for Classifying Binary Vectors User's Guide
Cited by 2 (1 self)
In this document we introduce a software package BinClass for the classification of binary vectors and analysis of the classification results. First we will give a brief introduction to the mathematical foundations and theory of clustering, cumulative classification and mixture classification. We also introduce methods for analysis of the classifications including trees (dendrograms), comparison of the classifications and bootstrapping. A few pseudo-algorithms are presented. These methods are included in the software package. The third and fourth chapters are the user's guide to the actual software package. Finally a short sample session is presented to give insight into how the software actually works and to illustrate the function of some of the many parameters. Apart from being a user's guide to the software package, this document can be seen as a review of and tutorial on the classification methodology of binary data. This is due to extensive research done on the subject at our department.
Knowledge Discovery From Symbolic Data And The Sodas Software
 Conf. on Principles and Practice of Knowledge Discovery in Databases, PPKDD2000
, 2000
Cited by 1 (0 self)
The data descriptions of the units are called "symbolic" when they are more complex than the standard ones due to the fact that they contain internal variation and are structured. Symbolic data arise from many sources, for instance in order to summarise huge Relational Data Bases by their underlying concepts. "Extracting knowledge" means getting explanatory results; that is why "symbolic objects" are introduced and studied in this paper. They model concepts and constitute an explanatory output for data analysis. Moreover they can be used in order to define queries of a Relational Data Base and propagate concepts between Data Bases. We define "Symbolic Data Analysis" (SDA) as the extension of standard Data Analysis to symbolic data tables as input in order to find symbolic objects as output. In this paper we give an overview of recent developments in SDA. We present some tools and methods of SDA and introduce the SODAS software prototype (issued from the work of 17 teams from nine countries involved in a European project of EUROSTAT).
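To make the idea of summarising individual records by their underlying concepts concrete, here is a minimal sketch. It assumes, for illustration only, that internal variation is recorded as an interval for numeric variables and as a set of observed categories for categorical ones. The function `to_symbolic` and the toy records are hypothetical and have nothing to do with the SODAS software's actual interface.

```python
from collections import defaultdict


def to_symbolic(rows, key, numeric, categorical):
    """Summarise individual records into one description per concept.

    Assumed representation (an illustration, not SODAS's format):
    numeric variables become (min, max) intervals and categorical
    variables become the set of categories observed in the concept.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    table = {}
    for concept, members in groups.items():
        description = {}
        for var in numeric:
            values = [m[var] for m in members]
            description[var] = (min(values), max(values))
        for var in categorical:
            description[var] = {m[var] for m in members}
        table[concept] = description
    return table


if __name__ == "__main__":
    # Made-up records standing in for rows of a relational database.
    records = [
        {"town": "A", "age": 34, "job": "teacher"},
        {"town": "A", "age": 51, "job": "nurse"},
        {"town": "B", "age": 27, "job": "teacher"},
    ]
    for concept, desc in to_symbolic(records, "town", ["age"], ["job"]).items():
        print(concept, desc)
```

Each row of the resulting table describes a concept (here, a town) rather than an individual, which is the kind of input the abstract calls a symbolic data table.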
Five SEARCH STRATEGIES
So far very little has been said about the actual process by which the required information is located. In the case of document retrieval the information is the subset of documents which are deemed to be relevant to the query. In Chapter 4, occasional reference was made to search efficiency, and the appropriateness of a file structure for searching. The kind of
unknown title
TWINSPAN, based partly on an earlier program called ‘Indicator Species Analysis’ (Hill et al. 1975), was written in 1979, five years before the Apple Macintosh revolutionized personal computing, and two years before MS-DOS was launched. The first version of Windows did not appear till November 1985. In 1979, a personal computer was an expensive luxury possessed only by a few geeks; all serious calculations were made on mainframes. At Cornell University, newly available computer terminals had made programming much easier than in earlier years. Programs and problems could be submitted electronically (to another room in Cornell’s Langmuir Lab), though the output was still normally on paper. The time was therefore ripe for development of numerical methods in ecology to the point where they could become routine tools rather than interesting prospects for development. The early proponents of numerical methods, notably Goodall (1953a, 1953b), had seen themselves as champions of objectivity. They were uncomfortable about the Zürich-Montpellier tradition of continental Europe, which had sought to construct a comprehensive system of knowledge. In the eyes of many British and American ecologists, the Z-M system was subjective and therefore intellectually dubious, because field workers sampled in a way