Results 1-8 of 8
Experimental Uncertainty Estimation and Statistics for Data Having Interval Uncertainty, 2007
Cited by 20 (14 self)
This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
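One of the statistics the report describes, the mean of interval data, can be computed exactly by averaging the endpoints separately: the tightest enclosure of the sample mean is the interval from the mean of the lower bounds to the mean of the upper bounds. A minimal sketch (the function name is illustrative, not from the report):

```python
def interval_mean(intervals):
    """Mean of interval data: the tightest enclosure of the sample mean
    is [mean of lower bounds, mean of upper bounds]."""
    lows = [lo for lo, hi in intervals]
    highs = [hi for lo, hi in intervals]
    n = len(intervals)
    return (sum(lows) / n, sum(highs) / n)

# Three measurements reported as intervals:
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 3.0)]
print(interval_mean(data))  # (2.0, 3.0)
```

Other statistics the report covers, such as the variance, are harder: their bounds depend on how the intervals overlap, which is exactly the computability tradeoff the abstract mentions.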
An Introduction to Symbolic Data Analysis and the Sodas Software, Journal of Symbolic Data Analysis, 2003
Basic statistical methods for interval data, Statistica Applicata [Italian Journal of Applied Statistics], 2005
Cited by 3 (0 self)
Real-world data analysis is often affected by different types of errors, such as measurement errors, computation errors, and imprecision related to the method adopted for estimating the data (parameters). The uncertainty in the data, which is strictly connected to the above errors, may be treated by considering, rather than a single value for each datum, the interval of values in which it may fall: the interval data. This kind of data representation imposes a new formulation of the classical statistical methods when interval-valued variables are considered. Accordingly, the purpose of the present work is to develop suitable statistical methods for obtaining a synthesis of the data and for analysing the variability in the data and the existing relations among interval-valued variables. The proposed solutions are based on the following assessments:
– The developed statistics for interval-valued variables are intervals.
– Statistical methods for interval-valued variables embrace classical statistical methods as special cases.
– The proposed interval solutions do not contain redundant elements with respect to a given criterion.
In the present work, particular interest is devoted to proving the properties of the proposed techniques and to comparing the obtained results with those already existing in the literature.
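To illustrate the first assessment, that statistics computed from interval-valued data are themselves intervals: the upper bound of the sample variance over interval data can be found by brute force, since variance is a convex function of the data vector and so attains its maximum at a corner of the box of possible values. The sketch below is ours, not the paper's algorithm, and its 2^n enumeration is only feasible for tiny samples:

```python
from itertools import product

def max_variance(intervals):
    """Upper bound of the sample variance over interval data.
    Variance is convex in (x1, ..., xn), so its maximum over the box of
    possible values is attained at a corner: enumerate all 2^n endpoint
    choices. Exponential -- only suitable for very small n."""
    def var(xs):
        n = len(xs)
        m = sum(xs) / n
        return sum((x - m) ** 2 for x in xs) / n
    return max(var(corner) for corner in product(*intervals))

print(max_variance([(0.0, 1.0), (0.0, 1.0)]))  # 0.25
```

Note that the corresponding lower bound is not a corner problem (e.g. if all intervals share a common point, the minimum variance is zero), which is why the literature treats the two bounds with different algorithms.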
Knowledge Discovery From Symbolic Data And The Sodas Software, Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD 2000), 2000
Cited by 1 (0 self)
The data descriptions of units are called "symbolic" when they are more complex than standard ones because they contain internal variation and are structured. Symbolic data arise from many sources, for instance when huge relational databases are summarised by their underlying concepts. "Extracting knowledge" means obtaining explanatory results; that is why "symbolic objects" are introduced and studied in this paper. They model concepts and constitute an explanatory output for data analysis. Moreover, they can be used to define queries of a relational database and to propagate concepts between databases. We define "Symbolic Data Analysis" (SDA) as the extension of standard data analysis to symbolic data tables as input, in order to find symbolic objects as output. In this paper we give an overview of recent developments in SDA. We present some tools and methods of SDA and introduce the SODAS software prototype (issued from the work of 17 teams from nine countries involved in a European project of EUROSTAT).
Generalization of the Principal Components Analysis to Histogram Data, in Principles and Practice of Knowledge Discovery in Databases, 2000
Cited by 1 (0 self)
In this article we propose an algorithm for Principal Components Analysis when the variables are of histogram type. The algorithm also works if the data table mixes variables of interval type and histogram type. If all the variables are of interval type, it produces the same output as the Centers Method algorithm proposed in [5, Cazes, Chouakria, Diday and Schektman (1997)]. 1 The algorithm. In this algorithm we use the idea proposed in [9, Diday (1998)]. We represent each histogram-individual by a succession of k interval-individuals (the first included in the second, the second in the third, and so on), where k is the maximum number of modalities taken by any variable in the input symbolic data table. Instead of representing the histograms in the factorial plane, we represent the Empirical Distribution Function F_Y, defined in [3, Bock and Diday (2000)], associated with each histogram. In other words, if we have a histogram variable Y on a set E = {a_1, a_2, ...} of objects, represented by the mapping Y(a) = (U(a), π_a) for a ∈ E, where π_a is a frequency distribution, then the algorithm uses the function F(x) = Σ_{i: ξ_i ≤ x} π_i (summing the frequencies π_i of the modalities ξ_i not exceeding x) instead of the histogram. Definition 1. Let X = (x_ij), i = 1, 2, ..., m, j = 1, 2, ..., n, be a symbolic data table with variables of continuous, interval, and histogram type, and let k = max{s : s is the number of modalities of Y_j, j = 1, 2, ..., n}, where Y_j is a variable of histogram type (if all the variables are of interval type, then k = 1). We define the vector-succession of intervals associated with each cell of X as: 1. if x_ij = [a, b] then the associated vector-succession of intervals is: x'_ij = ([a, b], [a...
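The empirical distribution function used in the abstract, F(x) as the total frequency of modalities at or below x, can be sketched as follows (the function and variable names are ours, for illustration only):

```python
def histogram_cdf(modalities, frequencies):
    """Return F(x) = sum of frequencies pi_i over modalities xi_i <= x,
    the empirical distribution function of a histogram variable."""
    pairs = sorted(zip(modalities, frequencies))
    def F(x):
        return sum(p for xi, p in pairs if xi <= x)
    return F

# A histogram variable with three modalities and their frequencies:
F = histogram_cdf([1, 2, 3], [0.25, 0.5, 0.25])
print(F(2))  # 0.75
print(F(3))  # 1.0
```

Representing each histogram by its distribution function in this way is what lets the algorithm reduce a histogram-individual to the nested succession of interval-individuals described above.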
Analysis
This paper aims to adapt clusterwise regression to interval-valued data. The proposed approach combines the dynamic clustering algorithm with the center and range regression method for interval-valued data in order to identify both the partition of the data and the relevant regression models, one for each cluster. Experiments with a car interval-valued data set show the usefulness of combining both approaches.
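A hedged sketch of the center and range idea for a single cluster (our paraphrase of the general technique, not the paper's code): fit one least-squares line to the interval midpoints and another to the half-ranges, then recombine the two predictions into an interval:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (simple regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_center_range(x_ivals, y_ivals):
    """Fit two simple regressions: one on midpoints, one on half-ranges."""
    xc = [(lo + hi) / 2 for lo, hi in x_ivals]
    xr = [(hi - lo) / 2 for lo, hi in x_ivals]
    yc = [(lo + hi) / 2 for lo, hi in y_ivals]
    yr = [(hi - lo) / 2 for lo, hi in y_ivals]
    return fit_line(xc, yc), fit_line(xr, yr)

def predict(model, x_ival):
    (ac, bc), (ar, br) = model
    lo, hi = x_ival
    c = ac + bc * (lo + hi) / 2            # predicted midpoint
    r = max(0.0, ar + br * (hi - lo) / 2)  # predicted half-range, clipped
    return (c - r, c + r)

X = [(0, 2), (1, 3), (2, 6)]
Y = [(0, 2), (1, 3), (2, 6)]  # identity relation, for illustration
model = fit_center_range(X, Y)
print(predict(model, (1, 3)))  # (1.0, 3.0)
```

The clusterwise version in the paper alternates this fitting step with a dynamic clustering reassignment step, producing one such model per cluster.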
Likelihood-based Imprecise Regression
We introduce a new approach to regression with imprecisely observed data, combining likelihood inference with ideas from imprecise probability theory, thereby taking different kinds of uncertainty into account. The approach is very general: it provides a uniform theoretical framework for regression analysis with imprecise data, where all kinds of relationships between the variables of interest may be considered and all types of imprecisely observed data are allowed. Furthermore, we propose a regression method based on this approach, in which no parametric distributional assumption is needed and likelihood-based interval estimates of quantiles of the residuals' distribution are used to identify a set of plausible descriptions of the relationship of interest. The proposed regression method is therefore very robust and yields a set-valued result, whose extent is determined by the amounts of both kinds of uncertainty involved in the regression problem with imprecise data: statistical uncertainty and indetermination. In addition, we apply our robust regression method to an interesting question in the social sciences by analyzing data from a social survey. As a result we obtain a large set of plausible relationships, reflecting the high uncertainty inherent in the analyzed data set.
Classification and Regression Trees on Aggregate Data Modeling: An Application in Acute (doi:10.1155/2011/523937)
Cardiologists are interested in determining whether the type of hospital pathway followed by a patient is predictive of survival. The study objective was to determine whether accounting for hospital pathways in the selection of prognostic factors of one-year survival after acute myocardial infarction (AMI) provided a more informative analysis than that obtained by the use of a standard regression tree analysis (CART method). Information on AMI was collected for 1095 hospitalized patients over an 18-month period. The construction of pathways followed by patients produced symbolic-valued observations requiring a symbolic regression tree analysis. This analysis was compared with the standard CART analysis using patients as statistical units described by standard data; both analyses selected TIMI score as the primary predictor variable. For the 1011 (84, resp.) patients with a lower (higher) TIMI score, the pathway variable did not appear as a diagnostic variable until the third (second) stage of the tree construction. For an ecological analysis, again TIMI score was the first predictor variable. However, in a symbolic regression tree