Population Variance under Interval Uncertainty: A New Algorithm
- Reliable Computing
, 2006
Cited by 20 (17 self)
In statistical analysis of measurement results, it is often beneficial to compute the range $\mathbf{V}$ of the population variance $V = \frac{1}{n} \cdot \sum_{i=1}^{n} (x_i - E)^2$
Using expert knowledge in solving the seismic inverse problem
- In: Proceedings of the 24th International Conference of the North American Fuzzy Information Processing Society NAFIPS’2005, Ann Arbor
, 2005
Cited by 12 (11 self)
For many practical applications, it is important to solve the seismic inverse problem, i.e., to measure seismic travel times and reconstruct velocities at different depths from these data. The existing algorithms for solving the seismic inverse problem often take too long and/or produce unphysical results because they do not take into account the knowledge of expert geophysicists. In this paper, we analyze how expert knowledge can be used in solving the seismic inverse problem.
Key words: seismic inverse problem, expert knowledge
Detecting Outliers under Interval Uncertainty: A New Algorithm Based on Constraint Satisfaction
- Proceedings of the International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems IPMU’06
Cited by 10 (9 self)
In many application areas, it is important to detect outliers. The traditional engineering approach to outlier detection is that we start with some “normal” values $x_1, \ldots, x_n$, compute the sample average $E$ and the sample standard deviation $\sigma$, and then mark a value $x$ as an outlier if $x$ is outside the $k_0$-sigma interval $[E - k_0 \cdot \sigma, E + k_0 \cdot \sigma]$ (for some pre-selected parameter $k_0$). In real life, we often have only interval ranges $[\underline{x}_i, \overline{x}_i]$ for the normal values $x_1, \ldots, x_n$. In this case, we only have intervals of possible values for the bounds $L \stackrel{\text{def}}{=} E - k_0 \cdot \sigma$ and $U \stackrel{\text{def}}{=} E + k_0 \cdot \sigma$. We can therefore identify outliers as values that are outside all $k_0$-sigma intervals, i.e., values which are outside the interval $[\underline{L}, \overline{U}]$. In general, the problem of computing $\underline{L}$ and $\overline{U}$ is NP-hard; a polynomial-time algorithm is known for the case when the measurements are sufficiently accurate, i.e., when “narrowed” intervals ...
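For exactly known (non-interval) values, the traditional k0-sigma rule from this abstract can be sketched as follows. This is only an illustration of the point-valued case, not the interval algorithm the paper proposes; the function names, the default `k0 = 2.0`, and the choice of the n-1 sample standard deviation are assumptions made for the example.

```python
import statistics

def k_sigma_bounds(normal_values, k0=2.0):
    """Return (L, U) = (E - k0*sigma, E + k0*sigma) for exactly known values."""
    e = statistics.mean(normal_values)
    # Assumption: "sample standard deviation" means the n-1 version.
    sigma = statistics.stdev(normal_values)
    return e - k0 * sigma, e + k0 * sigma

def is_outlier(x, normal_values, k0=2.0):
    """Mark x as an outlier if it lies outside the k0-sigma interval."""
    lower, upper = k_sigma_bounds(normal_values, k0)
    return x < lower or x > upper
```

With interval-valued inputs, `lower` and `upper` themselves become ranges, which is where the computational difficulty described in the abstract arises.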
Computing Population Variance and Entropy under Interval Uncertainty: Linear-Time Algorithms
, 2006
Cited by 10 (7 self)
In statistical analysis of measurement results it is often necessary to compute the range $[\underline{V}, \overline{V}]$ of the population variance $V = \frac{1}{n} \cdot \sum_{i=1}^{n} (x_i - E)^2$, where $E = \frac{1}{n} \cdot \sum_{i=1}^{n} x_i$, when we only know the intervals $[\tilde{x}_i - \Delta_i, \tilde{x}_i + \Delta_i]$ of possible values of the $x_i$. While $\underline{V}$ can be computed efficiently, the problem of computing $\overline{V}$ is, in general, NP-hard. In our previous paper “Population Variance under Interval Uncertainty: A New Algorithm” (Reliable Computing, 2006, Vol. 12, No. 4, pp. 273–280) we showed that in
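The asymmetry between the two bounds can be illustrated with a small sketch. It is not the linear-time algorithm of the paper. The lower bound uses the fact that the variance is convex, so its minimum over a box is reached when every value equals the mean clipped to its own interval; a bisection finds that mean. The upper bound uses the fact that a convex function attains its maximum over a box at a vertex, and simply enumerates all 2^n vertices, which is practical only for small n. All function names are hypothetical.

```python
from itertools import product

def variance(xs):
    """Population variance V = (1/n) * sum((x_i - E)^2)."""
    n = len(xs)
    e = sum(xs) / n
    return sum((x - e) ** 2 for x in xs) / n

def clip(value, lo, hi):
    return max(lo, min(hi, value))

def variance_lower(intervals, iterations=100):
    """Lower endpoint of the variance range over intervals [(lo, hi), ...]."""
    n = len(intervals)
    left = min(lo for lo, _ in intervals)
    right = max(hi for _, hi in intervals)
    # g(E) = mean(clip(E, lo_i, hi_i)) - E is non-increasing in E,
    # so bisection finds a point where it crosses zero.
    for _ in range(iterations):
        mid = (left + right) / 2
        g = sum(clip(mid, lo, hi) for lo, hi in intervals) / n - mid
        if g > 0:
            left = mid
        else:
            right = mid
    e = (left + right) / 2
    return variance([clip(e, lo, hi) for lo, hi in intervals])

def variance_upper(intervals):
    """Upper endpoint by brute force over all 2^n interval endpoints."""
    return max(variance(vertex) for vertex in product(*intervals))
```

For example, for the intervals `[(1, 2), (3, 4)]` the smallest variance is reached at the values 2 and 3, and the largest at 1 and 4.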
Interval versions of statistical techniques, with applications to environmental analysis, bioinformatics, and privacy in statistical databases
- Journal of Computational and Applied Mathematics
Cited by 7 (7 self)
Typical situation: we observe a pollution level x(t) in a lake at different moments of time t, and we would like to estimate standard statistical characteristics such as mean, variance, covariance, etc. In environmental measurements, we often know the values with interval uncertainty. Example: if we did not detect any pollution, the pollution value can be anywhere between 0 and the detection limit DL. Another example: to study the effect of a pollutant on fish, we check on the fish daily; if a fish was alive on Day 5 but dead on Day 6, then the lifetime of this fish lies in the interval [5, 6]. We must modify the existing statistical algorithms to process such interval data. In general, the resulting problems are NP-hard; we overview cases when feasible algorithms exist: e.g., when measurements are very accurate, or when all the measurements are done with one (or few) instruments. Other applications:
• In bioinformatics, we must solve systems of linear equations in which coefficients come from experts and are only known with interval uncertainty.
• To maintain privacy, we only keep a salary range; we must perform statistical analysis based on such interval data.
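The detection-limit example can be made concrete for the simplest statistic, the mean. Because the mean is increasing in every input, its range under interval uncertainty runs from the mean of the lower endpoints to the mean of the upper endpoints. The sketch below assumes a hypothetical encoding in which `None` stands for "nothing detected" and detected readings are treated as exact.

```python
def as_interval(reading, detection_limit):
    """Map a reading to an interval; None means 'below the detection limit'."""
    if reading is None:
        return (0.0, detection_limit)
    # Assumption for this sketch: detected readings are exact.
    return (reading, reading)

def mean_range(intervals):
    """Range of the mean over intervals [(lo, hi), ...]."""
    n = len(intervals)
    lower = sum(lo for lo, _ in intervals) / n
    upper = sum(hi for _, hi in intervals) / n
    return lower, upper

readings = [None, 0.4, 0.6]
intervals = [as_interval(r, detection_limit=0.2) for r in readings]
mean_lo, mean_hi = mean_range(intervals)
```

Statistics that are not monotonic in each input, such as the variance, do not reduce to endpoint substitution this way, which is why the abstract speaks of modifying the existing algorithms.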
Fast Algorithms for Computing Statistics under Interval Uncertainty, with Applications to Computer Science and to Electrical and Computer Engineering
, 2007
Cited by 6 (3 self)
Computing statistics is important. In many engineering applications, we are interested in computing statistics. For example, in environmental analysis, we observe a pollution level x(t) in a lake at different moments of time t, and we would like to estimate standard statistical characteristics such as mean, variance, autocorrelation, and correlation with other measurements. For each of these characteristics $C$, there is an expression $C(x_1, \ldots, x_n)$ that enables us to provide an estimate for $C$ based on the observed values $x_1, \ldots, x_n$. For example: a reasonable statistic for estimating the mean value of a probability distribution is the population average $E(x_1, \ldots, x_n) = \frac{1}{n} \cdot (x_1 + \ldots + x_n)$; a reasonable statistic for estimating the variance $V$ is the population variance $V(x_1, \ldots, x_n) = \frac{1}{n} \cdot \sum_{i=1}^{n} \ldots$
Interval Computations and Interval-Related Statistical Techniques: Tools for Estimating Uncertainty of the Results of Data Processing and Indirect Measurements
Cited by 4 (1 self)
In many practical situations, we only know the upper bound ∆ on the (absolute value of the) measurement error ∆x, i.e., we only know that the measurement error is located on the interval [−∆, ∆]. The traditional engineering approach to such situations is to assume that ∆x is uniformly distributed on [−∆, ∆], and to use the corresponding statistical techniques. In some situations, however, this approach underestimates the error of indirect measurements. It is therefore desirable to directly process this interval uncertainty. Such “interval computations” methods have been developed since the 1950s. In this chapter, we provide a brief overview of related algorithms, results, and remaining open problems.
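One simple way to see how the uniform-distribution assumption can underestimate the error is to take the sum y = x1 + ... + xn as the indirect measurement. This choice of y, the assumption of independent errors, and the function names below are illustrative and do not come from the chapter. The guaranteed interval bound on the error of y is the sum of the individual bounds, while under independent uniform errors the standard deviation of the error grows only like the square root of n.

```python
import math

def worst_case_bound(deltas):
    """Guaranteed bound on |error of the sum| when |error_i| <= delta_i."""
    return sum(deltas)

def uniform_std(deltas):
    """Standard deviation of the sum's error if each error_i is independent
    and uniform on [-delta_i, delta_i] (variance delta_i**2 / 3 each)."""
    return math.sqrt(sum(d * d / 3 for d in deltas))

deltas = [0.1] * 100
guaranteed = worst_case_bound(deltas)   # 100 * 0.1
statistical = uniform_std(deltas)       # sqrt(100 * 0.01 / 3)
```

If the actual errors all happen to have the same sign, for instance because of a shared systematic component, the true error can approach `guaranteed`, which is many times larger than a few multiples of `statistical`.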
Statistical Data Processing under Interval Uncertainty: Algorithms and Computational Complexity
- Soft Methods for Integrated Uncertainty Modeling
, 2006
Cited by 3 (2 self)
Why indirect measurements? In many real-life situations, we are interested in the value of a physical quantity $y$ that is difficult or impossible to measure directly. Examples of such quantities are the distance to a star and the amount of oil in a given well. Since we cannot measure $y$ directly, a natural idea is to measure $y$ indirectly. Specifically, we find some easier-to-measure quantities $x_1, \ldots, x_n$ which are related to $y$ by a known relation $y = f(x_1, \ldots, x_n)$; this relation may be a simple functional transformation or a complex algorithm (e.g., for the amount of oil, a numerical solution to an inverse problem). Then, to estimate $y$, we first measure the values of the quantities $x_1, \ldots, x_n$, and then we use the results $\tilde{x}_1, \ldots, \tilde{x}_n$ of these measurements to compute an estimate $\tilde{y}$ for $y$ as $\tilde{y} = f(\tilde{x}_1, \ldots, \tilde{x}_n)$.
Estimating Covariance for Privacy Case under Interval (and Fuzzy) Uncertainty
Cited by 2 (2 self)
One of the main objectives of collecting data in statistical databases (medical databases, census databases) is to find important correlations between different quantities. To enable researchers to look for such correlations, we should allow them to ask queries testing different combinations of such quantities. However, when we receive answers to many such questions, we may inadvertently disclose information about individual patients, information that should be private. One way to preserve privacy in statistical databases is to store ranges instead of the original values. For example, instead of the exact age of a patient in a medical database, we only store the information that this age is, e.g., between 60 and 70. This idea solves the privacy problem, but it makes statistical analysis more complex. Different possible values from the corresponding ranges lead, in general, to different values of the corresponding statistical characteristic; it is therefore desirable to find the range of all such values. It is known that for mean and variance, there exist feasible algorithms for computing such ranges. In this paper, we show that similar algorithms are possible for another important statistical characteristic: covariance, whose value is important in computing correlations.
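To show what "the range of all such values" means for covariance, here is a brute-force sketch. It is not the feasible algorithm announced in the paper. It relies on the fact that the population covariance is linear in each individual x_i and y_i when the others are fixed, so its extreme values over a box are attained at interval endpoints; enumerating all endpoint combinations takes 4^n evaluations and is meant for tiny examples only. The function names are hypothetical.

```python
from itertools import product

def covariance(xs, ys):
    """Population covariance (1/n) * sum(x_i * y_i) - E_x * E_y."""
    n = len(xs)
    ex = sum(xs) / n
    ey = sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - ex * ey

def covariance_range(x_intervals, y_intervals):
    """Range of the covariance when x_i and y_i are only known to lie in
    the stored ranges [(lo, hi), ...]; exponential-time enumeration."""
    values = [
        covariance(xs, ys)
        for xs in product(*x_intervals)
        for ys in product(*y_intervals)
    ]
    return min(values), max(values)
```

For two records whose x and y values are each only known to lie in [0, 1], `covariance_range([(0, 1), (0, 1)], [(0, 1), (0, 1)])` spans from -0.25 to 0.25.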
Interval Approach to Preserving Privacy in Statistical Databases
Cited by 2 (0 self)
Need for statistical databases. In many practical situations, it is very useful to collect large amounts of data. For example, from the data that we collect during a census, we can extract a lot of information about health, mortality, and employment in different regions, for different age ranges, and for people of different genders and ethnic groups. By analyzing these statistics, we can reveal trouble spots and allocate (usually limited) resources so that the help goes first to the social groups that need it most. Similarly, by gathering data about people's health in a large medical database, we can extract a lot of useful information on how geographic location, age, and gender affect a person's health. Thus, we can make measures aimed at improving public health more focused. Finally, a large statistical database of purchases can help find out what people are looking for, make shopping easier for customers and, at the same time, decrease the stores' expenses related to storing unnecessary items.
Need for privacy. Privacy is an important issue in the statistical analysis of human-related data. For example, to check whether in a certain geographic area,

