Results 1 - 3 of 3
An Introduction to Symbolic Data Analysis and the Sodas Software
- Journal of Symbolic Data Analysis, 2003
Knowledge Discovery From Symbolic Data And The Sodas Software
- Conf. on Principles and Practice of Knowledge Discovery in Databases, PPKDD-2000, 2000
Abstract - Cited by 1 (0 self)
The data descriptions of the units are called "symbolic" when they are more complex than the standard ones because they contain internal variation and are structured. Symbolic data arise from many sources, for instance when huge relational databases are summarised by their underlying concepts. "Extracting knowledge" means getting explanatory results; that is why "symbolic objects" are introduced and studied in this paper. They model concepts and constitute an explanatory output for data analysis. Moreover, they can be used to define queries on a relational database and to propagate concepts between databases. We define "Symbolic Data Analysis" (SDA) as the extension of standard data analysis to symbolic data tables as input, in order to find symbolic objects as output. In this paper we give an overview of recent developments in SDA. We present some tools and methods of SDA and introduce the SODAS software prototype (resulting from the work of 17 teams from nine countries involved in a European project of EUROSTAT).
Generalization of the Principal Components Analysis to Histogram Data
- In Principles and Practice of Knowledge Discovery in Databases, 2000
Abstract - Cited by 1 (0 self)
In this article we propose an algorithm for Principal Components Analysis when the variables are of histogram type. This algorithm also works if the data table mixes variables of interval type and histogram type. If all the variables are of interval type, it produces the same output as the Centers Method algorithm proposed in [5, Cazes, Chouakria, Diday and Schektman (1997)].

1 The algorithm

In this algorithm we use the idea proposed in [9, Diday (1998)]. We represent each histogram-individual by a succession of k interval-individuals (the first one included in the second one, the second one included in the third one, and so on), where k is the maximum number of modalities taken by some variable in the input symbolic data table. Instead of representing the histograms in the factorial plane, we represent the empirical distribution function $F_Y$, defined in [3, Bock and Diday (2000)], associated with each histogram. In other words, if we have a histogram variable $Y$ on a set $E = \{a_1, a_2, \ldots\}$ of objects with domain $\mathcal{Y}$, represented by the mapping $Y(a) = (U(a), \pi_a)$ for $a \in E$, where $\pi_a$ is a frequency distribution, then the algorithm uses the function $F(x) = \sum_{i \,:\, \omega_i \le x} \pi_i$ instead of the histogram.

Definition 1. Let $X = (x_{ij})_{i=1,2,\ldots,m,\; j=1,2,\ldots,n}$ be a symbolic data table with variables of continuous, interval and histogram type, and let $k = \max\{s : s \text{ is the number of modalities of } Y_j,\ j = 1, 2, \ldots, n\}$, where $Y_j$ is a variable of histogram type (if all the variables are of interval type, then $k = 1$). We define the vector-succession of intervals associated with each cell of $X$ as:

1. if $x_{ij} = [a, b]$, then the associated vector-succession of intervals is $x^{*}_{ij}$ = ([a, b], [a...
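The two quantities this abstract relies on, the empirical distribution function of a histogram and the number k of Definition 1, can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes, hypothetically, that a histogram cell is stored as a dict mapping ordered numeric modalities to frequencies and that an interval cell is a `(low, high)` tuple; the function names `empirical_distribution` and `max_modalities` are made up for illustration. The construction of the nested intervals themselves is not shown, because the text above is cut off before it is fully stated.

```python
from bisect import bisect_right
from itertools import accumulate


def empirical_distribution(histogram):
    """Return F, where F(x) is the sum of the frequencies of all
    modalities less than or equal to x.

    `histogram` is an assumed representation: {modality: frequency}.
    """
    pairs = sorted(histogram.items())            # order modalities
    points = [m for m, _ in pairs]
    cumulative = list(accumulate(f for _, f in pairs))

    def F(x):
        # Number of modalities <= x; zero means x lies below all of them.
        idx = bisect_right(points, x)
        return cumulative[idx - 1] if idx else 0.0

    return F


def max_modalities(table):
    """Return k: the largest number of modalities of any histogram cell,
    or 1 when the table holds no histogram cell (all interval type)."""
    counts = [len(cell) for row in table for cell in row
              if isinstance(cell, dict)]
    return max(counts, default=1)


# Example: one interval variable and one histogram variable.
table = [
    [(1.0, 2.0), {1: 0.5, 2: 0.5}],
    [(0.0, 1.0), {1: 0.2, 2: 0.3, 3: 0.5}],
]
F = empirical_distribution(table[1][1])
k = max_modalities(table)
```

Here `F` is a right-continuous step function, and `k` is the length of the succession of intervals that every cell would be expanded to.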

