Results 1 - 10 of 85
Trade-Off Between Sample Size and Accuracy: Case of Measurements under Interval Uncertainty
2009
Cited by 4 (4 self)
In many practical situations, we are not satisfied with the accuracy of the existing measurements. There are two possible ways to improve the measurement accuracy: • first, instead of a single measurement, we can make repeated measurements; the additional information coming from these additional measurements can improve the accuracy of the result of this series of measurements; • second, we can replace the current measuring instrument with a more accurate one; correspondingly, we can use a more accurate (and more expensive) measurement procedure provided by a measuring lab – e.g., a procedure that includes the use of a higher quality reagent. In general, we can combine these two ways, and make repeated measurements with a more accurate measuring instrument. What is the appropriate trade-off between sample size and accuracy? This is the general problem that we address in this paper.
Measures of Deviation (and Dependence) for Heavy-Tailed Distributions and their Estimation under Interval and Fuzzy Uncertainty
Cited by 4 (3 self)
techniques are based on the assumption that the random variables are normally distributed. For such distributions, a natural characteristic of the “average” value is the mean, and a natural characteristic of the deviation from the average is the variance. However, in many practical situations, e.g., in economics and finance, we encounter probability distributions for which the variance is infinite; such distributions are called heavy-tailed. For such distributions, we describe which characteristics can be used to describe the average and the deviation from the average, and how to estimate these characteristics under interval and fuzzy uncertainty. We also discuss what the reasonable analogues of correlation are for such heavy-tailed distributions.
Interval Computations and Interval-Related Statistical Techniques: Tools for Estimating Uncertainty of the Results of Data Processing and Indirect Measurements
Cited by 4 (1 self)
In many practical situations, we only know the upper bound ∆ on the (absolute value of the) measurement error ∆x, i.e., we only know that the measurement error is located on the interval [−∆, ∆]. The traditional engineering approach to such situations is to assume that ∆x is uniformly distributed on [−∆, ∆], and to use the corresponding statistical techniques. In some situations, however, this approach underestimates the error of indirect measurements. It is therefore desirable to directly process this interval uncertainty. Such “interval computations” methods have been developed since the 1950s. In this chapter, we provide a brief overview of related algorithms, results, and remaining open problems.
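The claim that the uniform-distribution assumption can underestimate the error of an indirect measurement can be illustrated with a small sketch. It compares the guaranteed interval bound with a 2-sigma estimate for the simplest indirect measurement, a sum of n measured quantities. The choice of a sum, the 2-sigma cutoff, and the function names are assumptions made for illustration; they are not taken from the chapter.

```python
import math

def interval_bound_sum(deltas):
    """Guaranteed bound on the error of y = x1 + ... + xn when |dx_i| <= deltas[i]."""
    return sum(deltas)

def uniform_two_sigma_sum(deltas):
    """2-sigma estimate if each dx_i is assumed independent and uniform on [-d, d].

    A uniform distribution on [-d, d] has variance d**2 / 3.
    """
    variance = sum(d * d / 3.0 for d in deltas)
    return 2.0 * math.sqrt(variance)

# Hypothetical data: 100 measurements, each with error bound 0.1.
deltas = [0.1] * 100
guaranteed = interval_bound_sum(deltas)      # grows like n
statistical = uniform_two_sigma_sum(deltas)  # grows like sqrt(n)
print(guaranteed, statistical)
```

The interval bound grows linearly with n while the statistical estimate grows like the square root of n. If the actual errors all happen to lie near the same end of [−∆, ∆], for example because of a shared bias, the statistical estimate is too small.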
Model Fusion under Probabilistic and Interval Uncertainty, with Application to Earth Sciences
Cited by 4 (4 self)
One of the most important studies of the earth sciences is that of the Earth’s interior structure. There are many sources of data for Earth tomography models: first-arrival passive seismic data (from the actual earthquakes), first-arrival active seismic data (from the seismic experiments), gravity data, and surface waves. Currently, each of these datasets is processed separately, resulting in several different Earth models that have specific coverage areas, different spatial resolutions and varying degrees of accuracy. These models often provide complementary geophysical information on earth structure (P and S wave velocity structure). Combining the information derived from each requires a joint inversion approach. Designing such joint inversion techniques presents an important theoretical and practical challenge. While such joint inversion methods are being developed, as a first step, we propose a practical solution: to fuse the Earth models coming from different datasets. Since these Earth models have different areas of coverage, model fusion is especially important because some of the resulting models provide better accuracy and/or spatial resolution in some spatial areas and at some depths, while other models provide better accuracy and/or spatial resolution in other areas or depths. The models used in this paper contain measurements that have not only different accuracy and coverage, but also different spatial resolution. We describe how to fuse such models under interval and probabilistic uncertainty. The resulting techniques can be used in other situations when we need to merge models of different accuracy and spatial resolution.
Propagation and provenance of probabilistic and interval uncertainty in cyberinfrastructure-related data processing and data fusion
Proceedings of the International Workshop on Reliable Engineering Computing REC’08, 2008
Cited by 3 (2 self)
In the past, communications were much slower than computations. As a result, researchers and practitioners collected different data into huge databases located at a single location such as NASA and US Geological Survey. At present, communications are so much faster that it is possible to keep different databases at different locations, and automatically select, transform, and collect relevant data when necessary. The corresponding cyberinfrastructure is actively used in many applications. It drastically enhances scientists’ ability to discover, reuse and combine a large number of resources, e.g., data and services. Because of this importance, it is desirable to be able to gauge the uncertainty of the results obtained by using cyberinfrastructure. This problem is made more urgent by the fact that the level of uncertainty associated with cyberinfrastructure resources can vary greatly – and that scientists have much less control over the quality of different resources than in the centralized database. Thus, with the cyberinfrastructure promise comes the need to analyze how data uncertainty propagates via this cyberinfrastructure. When the resulting accuracy is too low, it is desirable to produce the provenance of this inaccuracy: to find out which data points contributed most to it, and how an improved accuracy of these data points will improve the accuracy of the result. In this paper, we describe algorithms for propagating uncertainty and for finding the provenance for this uncertainty.
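One common way to make "propagation" and "provenance" concrete is linearization: the error bound of the result is approximated as a sum of per-input terms, and the largest terms identify the inputs that contribute most. The sketch below follows that standard idea under interval uncertainty. It is an assumption that this matches the paper's setting; the function names, the finite-difference step, and the example function are hypothetical.

```python
def partial_derivatives(f, x, h=1e-6):
    """Central-difference estimates of df/dx_i at the point x."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grads.append((f(up) - f(down)) / (2.0 * h))
    return grads

def propagate_interval(f, x, deltas):
    """Linearized bound on |dy| plus each input's contribution to it.

    Assumes the error bounds deltas[i] are small enough that f is
    approximately linear on the box around x.
    """
    grads = partial_derivatives(f, x)
    contributions = [abs(c) * d for c, d in zip(grads, deltas)]
    return sum(contributions), contributions

def f(x):
    # Hypothetical data-processing function.
    return x[0] * x[1] + x[2]

total, contributions = propagate_interval(f, [2.0, 3.0, 1.0], [0.1, 0.01, 0.05])
# Provenance: index of the input responsible for the largest share of the bound.
largest = max(range(len(contributions)), key=contributions.__getitem__)
print(total, contributions, largest)
```

Because the bound is a sum of per-input terms, reducing the error bound of input i by some amount reduces the total by that amount times the absolute value of the corresponding derivative, which is one way to answer "how would improved accuracy of these data points improve the result".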
Statistical Hypothesis Testing Under Interval Uncertainty: An Overview
International Journal of Intelligent Technologies and Applied Statistics
Cited by 3 (3 self)
An important part of statistical data analysis is hypothesis testing. For example, we know the probability distribution of the characteristics corresponding to a certain disease, we have the values of the characteristics describing a patient, and we must make a conclusion whether this patient has this disease. Traditional hypothesis testing techniques are based on the assumption that we know the exact values of the characteristic(s) x describing a patient. In practice, the value x̃ comes from measurements and is, thus, only known with uncertainty: x̃ ≠ x. In many practical situations, we only know the upper bound ∆ on the (absolute value of the) measurement error ∆x ≝ x̃ − x. In such situations, after the measurement, the only information that we have about the (unknown) value x of this characteristic is that x belongs to the interval [x̃ − ∆, x̃ + ∆]. In this paper, we overview different approaches on how to test a hypothesis under such interval uncertainty. This overview is based on a general approach to decision making under interval uncertainty, an approach developed by the 2007 Nobelist L. Hurwicz.
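As a rough illustration of the Hurwicz approach mentioned in the abstract, the sketch below computes the range of a test statistic over the interval [x̃ − ∆, x̃ + ∆] and combines its endpoints with the Hurwicz optimism-pessimism criterion, α · max + (1 − α) · min. The z-score statistic, the numeric values, and the function names are assumptions chosen for illustration; the abstract does not specify how the paper applies the criterion.

```python
def hurwicz(lo, hi, alpha):
    """Hurwicz optimism-pessimism combination of an interval of possible values."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * hi + (1.0 - alpha) * lo

def statistic_range(x_measured, delta, mean, sigma):
    """Range of the z-score (x - mean) / sigma over x in [x_measured - delta, x_measured + delta].

    Assumes sigma > 0, so the z-score is increasing in x and the range
    is attained at the interval endpoints.
    """
    lo = (x_measured - delta - mean) / sigma
    hi = (x_measured + delta - mean) / sigma
    return lo, hi

# Hypothetical patient measurement and reference distribution.
lo, hi = statistic_range(x_measured=7.0, delta=0.5, mean=5.0, sigma=1.0)
pessimistic = hurwicz(lo, hi, alpha=0.0)
balanced = hurwicz(lo, hi, alpha=0.5)
optimistic = hurwicz(lo, hi, alpha=1.0)
print(lo, hi, pessimistic, balanced, optimistic)
```

With α = 0 the decision uses only the smallest possible value of the statistic, with α = 1 only the largest; intermediate values of α interpolate between these two extremes.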
Reducing Over-Conservative Expert Failure Rate Estimates in the Presence of Limited Data: A New Probabilistic/Fuzzy Approach
Cited by 3 (2 self)
Unique highly reliable components are typical for the aerospace industry. For such components, due to their high reliability and uniqueness, we do not have enough empirical data to make statistically reliable estimates about their failure rate. To overcome this limitation, the empirical data is usually supplemented with expert estimates for the failure rate. The problem is that experts tend to be – especially in the aerospace industry – over-cautious, over-conservative; their estimates for the failure rate are usually much higher than the actual observed failure rate. In this paper, we provide a new fuzzy-related statistically justified approach for reducing this over-estimation.
I. FORMULATION OF THE PROBLEM
Reliability: how it is usually described and evaluated. Failures are ubiquitous. As a result, reliability analysis is an important part of engineering design.
Towards Fast Algorithms for Processing Type-2 Fuzzy Data: Extending Mendel’s Algorithms From Interval-Valued to a More General Case
Proceedings of the 27th International Conference of the North American Fuzzy Information Processing Society NAFIPS’2008, 2008
Cited by 2 (1 self)
It is known that processing of data under general type-1 fuzzy uncertainty can be reduced to the simplest case – of interval uncertainty: namely, Zadeh's extension principle is equivalent to level-by-level interval computations applied to α-cuts of the corresponding fuzzy numbers. However, type-1 fuzzy numbers may not be the most adequate way of describing uncertainty, because they require that an expert can describe his or her degree of confidence in a statement by an exact value. In practice, it is more reasonable to expect that the expert estimates his or her degree by using imprecise words from natural language – which can be naturally formalized as fuzzy sets. The resulting type-2 fuzzy numbers more adequately represent the expert's opinions, but their practical use is limited by the seeming computational complexity of their use. In his recent research, J. Mendel has shown that for the practically
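The reduction of type-1 fuzzy processing to interval computations on α-cuts can be sketched for the simplest operation, addition. The example below assumes triangular fuzzy numbers written as (left, peak, right) and a small hand-picked set of α levels; these representation choices and the function names are illustrative assumptions, not part of the paper.

```python
def alpha_cut(tri, alpha):
    """Alpha-cut [lower, upper] of a triangular fuzzy number (left, peak, right)."""
    left, peak, right = tri
    return (left + alpha * (peak - left), right - alpha * (right - peak))

def interval_add(p, q):
    """Interval addition: [p_lo + q_lo, p_hi + q_hi]."""
    return (p[0] + q[0], p[1] + q[1])

def fuzzy_add(tri_x, tri_y, levels):
    """Level-by-level sum: for each alpha, add the alpha-cuts as intervals."""
    return {
        alpha: interval_add(alpha_cut(tri_x, alpha), alpha_cut(tri_y, alpha))
        for alpha in levels
    }

# Hypothetical fuzzy numbers "about 2" and "about 20".
result = fuzzy_add((1.0, 2.0, 3.0), (10.0, 20.0, 40.0), [0.0, 0.5, 1.0])
for alpha, cut in result.items():
    print(alpha, cut)
```

For a general function in place of addition, the same loop applies with `interval_add` replaced by an interval computation of that function's range on each pair of α-cuts.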
Estimating Covariance for Privacy Case under Interval (and Fuzzy) Uncertainty
Cited by 2 (2 self)
One of the main objectives of collecting data in statistical databases (medical databases, census databases) is to find important correlations between different quantities. To enable researchers to look for such correlations, we should allow them to ask queries testing different combinations of such quantities. However, when we receive answers to many such questions, we may inadvertently disclose information about individual patients, information that should be private. One way to preserve privacy in statistical databases is to store ranges instead of the original values. For example, instead of an exact age of a patient in a medical database, we only store the information that this age is, e.g., between 60 and 70. This idea solves the privacy problem, but it makes statistical analysis more complex. Different possible values from the corresponding ranges lead, in general, to different values of the corresponding statistical characteristic; it is therefore desirable to find the range of all such values. It is known that for mean and variance, there exist feasible algorithms for computing such ranges. In this paper, we show that similar algorithms are possible for another important statistical characteristic – covariance, whose value is important in computing correlations.
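To make "the range of all such values" concrete for covariance, the sketch below enumerates all endpoint combinations of the stored ranges. This brute-force enumeration is not the feasible algorithm the paper refers to: it is exponential in the number of records and only usable for tiny examples. It is exact because the population covariance is linear in each individual x_i and y_i, so its extremes over a box are attained at vertices. The function names and data are hypothetical.

```python
from itertools import product

def covariance(xs, ys):
    """Population covariance (1/n) * sum((x - mean_x) * (y - mean_y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def covariance_range(x_intervals, y_intervals):
    """Exact [min, max] of the covariance over all values inside the given ranges.

    Enumerates every combination of interval endpoints: 2**(2n) evaluations.
    """
    n = len(x_intervals)
    boxes = list(x_intervals) + list(y_intervals)
    values = [
        covariance(vertex[:n], vertex[n:])
        for vertex in product(*boxes)
    ]
    return min(values), max(values)

# Hypothetical records: x stored as ranges, y known exactly (degenerate ranges).
cov_min, cov_max = covariance_range(
    x_intervals=[(0.0, 1.0), (2.0, 3.0)],
    y_intervals=[(0.0, 0.0), (1.0, 1.0)],
)
print(cov_min, cov_max)
```

In this example the covariance simplifies to (x2 − x1) / 4, and x2 − x1 ranges over [1, 3], which gives the range [0.25, 0.75].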
1 Interval Approach to Preserving Privacy in Statistical Databases
Cited by 2 (0 self)
Need for statistical databases. In many practical situations, it is very useful to collect large amounts of data. For example, from the data that we collect during a census, we can extract a lot of information about health, mortality, and employment in different regions, for different age ranges, and for people of different genders and of different ethnic groups. By analyzing these statistics, we can reveal troubling spots and allocate (usually limited) resources so that the help goes first to social groups that need it most. Similarly, by gathering data about people's health in a large medical database, we can extract a lot of useful information on how the geographic location, age, and gender affect a person's health. Thus, we can make measures, which are aimed at improving public health, more focused. Finally, a large statistical database of purchases can help find out what people are looking for, make shopping easier for customers and, at the same time, decrease the stores' expenses related to storing unnecessary items.
Need for privacy. Privacy is an important issue in the statistical analysis of human-related data. For example, to check whether in a certain geographic area,

