Foundations of Statistical Processing of Set-valued Data: Towards Efficient Algorithms
- Proceedings of the Fifth International Conference on Intelligent Technologies InTech’04
, 2004
- Cited by 6 (5 self)
Due to measurement uncertainty, often, instead of the actual values x_i of the measured quantities, we only know the intervals [x̃_i − ∆_i, x̃_i + ∆_i], where x̃_i is the measured value and ∆_i is the upper bound on the measurement error (provided, e.g., by the manufacturer of the measuring instrument). These intervals can be viewed as random intervals, i.e., as samples from the interval-valued random variable. In such situations, instead of the exact value of a sample statistic such as covariance C_{x,y}, we can only have an interval of possible values of this statistic. In this paper, we extend the foundations of traditional statistics to statistics of such set-valued data, and describe how this foundation can lead to efficient algorithms for computing the corresponding set-valued statistics.
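As a small illustration of an interval-valued statistic, the sketch below computes the range of the sample mean when each value is only known to lie in [x̃_i − ∆_i, x̃_i + ∆_i]. The mean increases with each x_i, so its range is obtained from the endpoints alone. The covariance mentioned in the abstract is harder and is not attempted here. The function names and the sample readings are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def to_interval(measured: float, delta: float) -> Interval:
    """Interval [measured - delta, measured + delta] of possible actual values."""
    return (measured - delta, measured + delta)


def mean_range(intervals: List[Interval]) -> Interval:
    """Range of the sample mean over all values inside the given intervals.

    The mean is monotone in every argument, so the smallest mean uses all
    lower endpoints and the largest mean uses all upper endpoints.
    """
    n = len(intervals)
    lower = sum(lo for lo, _ in intervals) / n
    upper = sum(hi for _, hi in intervals) / n
    return (lower, upper)


# Hypothetical readings: (measured value, error bound)
readings = [(1.0, 0.5), (2.0, 0.5), (4.0, 1.0)]
boxes = [to_interval(x, d) for x, d in readings]
mean_lo, mean_hi = mean_range(boxes)
print(mean_lo, mean_hi)
```

The same endpoint argument works for any statistic that is monotone in each measurement; statistics that are not monotone need a more careful search.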
Hybrid metaheuristics for the vehicle routing problem with stochastic demands
- JOURNAL OF MATHEMATICAL
, 2006
An overview of similarity measures for clustering XML documents
- Chapter in Athena Vakali and George Pallis
, 2006
- Cited by 6 (2 self)
The large amount and heterogeneity of XML documents on the Web require the development of clustering techniques to group together similar documents. Documents can be grouped together according to their content, their structure, and links inside and among documents. For instance, grouping together documents with similar structures has interesting applications in the context of information extraction, of heterogeneous data integration, of personalized content delivery, of access control definition, of web site structural analysis, of comparison of RNA secondary structures. Many approaches have been proposed for evaluating the structural and content similarity between tree-based and vector-based representations of XML documents. Link-based similarity approaches developed for Web data clustering have been adapted for XML documents. This chapter discusses and compares the most relevant similarity measures and their employment for XML document clustering.
Trade-Off Between Sample Size and Accuracy: Case of Measurements under Interval Uncertainty
, 2009
- Cited by 4 (4 self)
In many practical situations, we are not satisfied with the accuracy of the existing measurements. There are two possible ways to improve the measurement accuracy:
• first, instead of a single measurement, we can make repeated measurements; the additional information coming from these additional measurements can improve the accuracy of the result of this series of measurements;
• second, we can replace the current measuring instrument with a more accurate one; correspondingly, we can use a more accurate (and more expensive) measurement procedure provided by a measuring lab, e.g., a procedure that includes the use of a higher quality reagent.
In general, we can combine these two ways, and make repeated measurements with a more accurate measuring instrument. What is the appropriate trade-off between sample size and accuracy? This is the general problem that we address in this paper.
Measures of Deviation (and Dependence) for Heavy-Tailed Distributions and their Estimation under Interval and Fuzzy Uncertainty
- Cited by 4 (3 self)
techniques are based on the assumption that the random variables are normally distributed. For such distributions, a natural characteristic of the “average” value is the mean, and a natural characteristic of the deviation from the average is the variance. However, in many practical situations, e.g., in economics and finance, we encounter probability distributions for which the variance is infinite; such distributions are called heavy-tailed. For such distributions, we describe which characteristics can be used to describe the average and the deviation from the average, and how to estimate these characteristics under interval and fuzzy uncertainty. We also discuss what are the reasonable analogues of correlation for such heavy-tailed distributions.
Interval Computations and Interval-Related Statistical Techniques: Tools for Estimating Uncertainty of the Results of Data Processing and Indirect Measurements
- Cited by 4 (1 self)
In many practical situations, we only know the upper bound ∆ on the (absolute value of the) measurement error ∆x, i.e., we only know that the measurement error is located on the interval [−∆, ∆]. The traditional engineering approach to such situations is to assume that ∆x is uniformly distributed on [−∆, ∆], and to use the corresponding statistical techniques. In some situations, however, this approach underestimates the error of indirect measurements. It is therefore desirable to directly process this interval uncertainty. Such “interval computations” methods have been developed since the 1950s. In this chapter, we provide a brief overview of related algorithms, results, and remaining open problems.
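To make the underestimation concrete, here is a minimal sketch under assumptions that are not taken from the chapter: the indirect measurement is simply the sum of n directly measured values, each with error bound ∆, and the uniform model additionally treats the errors as independent. The guaranteed (interval) bound on the error of the sum is the sum of the bounds, while the uniform model gives a standard deviation that grows only like the square root of n. The function names are hypothetical.

```python
import math
from typing import Sequence


def worst_case_sum_error(deltas: Sequence[float]) -> float:
    """Guaranteed bound: all errors may reach their bound with the same sign."""
    return sum(deltas)


def uniform_model_std(deltas: Sequence[float]) -> float:
    """Standard deviation of the sum if each error were independent and
    uniform on [-delta, delta] (the variance of such an error is delta**2 / 3)."""
    return math.sqrt(sum(d * d / 3.0 for d in deltas))


# Hypothetical setup: 100 measurements, each with error bound 0.1
deltas = [0.1] * 100
guaranteed = worst_case_sum_error(deltas)
statistical = uniform_model_std(deltas)
print(guaranteed, statistical)
```

In this example even three standard deviations of the uniform-model estimate stay well below the guaranteed bound, so a design based on the statistical estimate would not cover the case in which the errors are systematic and share the same sign.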
Proactive Communication in Multi-agent Teamwork
, 2005
- Cited by 3 (3 self)
Sharing common goals and acting cooperatively are critical issues in multi-agent teamwork. Traditionally, agents cooperate with each other by inferring others' actions implicitly or explicitly, based on established norms for behavior or on knowledge about the preferences or interests of others. This kind of cooperation either requires that agents share a large amount of knowledge about the teamwork, which is unrealistic in a distributed team, or requires high-frequency message exchange, which weakens teamwork efficiency, especially for a team that may involve human members. In this research, we designed and developed a new approach called Proactive Communication, which helps to produce realistic behavior and interactions for multi-agent teamwork. We emphasize that multi-agent teamwork is governed by the same principles that underlie human cooperation. Psychological studies of human teamwork have shown that members of an effective team often anticipate the needs of other members and choose to assist them proactively. Human team members are also naturally capable of observing the environment and others so they can establish certain
Data pre-processing in liquid chromatography-mass spectrometry-based proteomics
- Bioinformatics
, 2005
- Cited by 3 (0 self)
Motivation: In a liquid chromatography-mass spectrometry (LC-MS) based expressional proteomics, multiple samples from different groups are analyzed in parallel. It is necessary to develop a data mining system to perform peak quantification, peak alignment, and data quality assurance. Results: We have developed an algorithm for spectrum deconvolution. A two-step alignment algorithm is proposed for recognizing peaks generated by the same peptide but detected in different samples. The quality of LC-MS data is evaluated using statistical tests and alignment quality tests. Availability: Xalign software is available upon request from the author. Contact:
Statistical Hypothesis Testing Under Interval Uncertainty: An Overview
- International Journal of Intelligent Technologies and Applied Statistics
- Cited by 3 (3 self)
An important part of statistical data analysis is hypothesis testing. For example, we know the probability distribution of the characteristics corresponding to a certain disease, we have the values of the characteristics describing a patient, and we must decide whether this patient has this disease. Traditional hypothesis testing techniques are based on the assumption that we know the exact values of the characteristic(s) x describing a patient. In practice, the value x̃ comes from measurements and is, thus, only known with uncertainty: x̃ ≠ x. In many practical situations, we only know the upper bound ∆ on the (absolute value of the) measurement error ∆x := x̃ − x. In such situations, after the measurement, the only information that we have about the (unknown) value x of this characteristic is that x belongs to the interval [x̃ − ∆, x̃ + ∆]. In this paper, we overview different approaches to testing a hypothesis under such interval uncertainty. This overview is based on a general approach to decision making under interval uncertainty, an approach developed by the 2007 Nobelist L. Hurwicz.
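The abstract does not spell out Hurwicz's approach. As a rough sketch, the commonly stated form of his optimism-pessimism criterion scores an alternative whose gain is only known to lie in [lo, hi] by α·hi + (1 − α)·lo, where α in [0, 1] describes the decision maker's degree of optimism. The function names, action labels, and gain intervals below are hypothetical, and the paper's own formulation may differ.

```python
from typing import Dict, Tuple


def hurwicz_value(lo: float, hi: float, alpha: float) -> float:
    """Hurwicz optimism-pessimism score of an interval-valued gain [lo, hi]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * hi + (1.0 - alpha) * lo


def choose(actions: Dict[str, Tuple[float, float]], alpha: float) -> str:
    """Pick the action with the largest Hurwicz score."""
    return max(actions, key=lambda name: hurwicz_value(*actions[name], alpha))


# Hypothetical gain intervals for two possible decisions
actions = {"accept": (-2.0, 5.0), "reject": (0.0, 1.0)}
cautious = choose(actions, alpha=0.1)  # weights the worst case heavily
bold = choose(actions, alpha=0.9)      # weights the best case heavily
print(cautious, bold)
```

With α = 0 the rule reduces to the pessimistic (maximin) choice and with α = 1 to the optimistic one, so a single parameter covers the range of attitudes between them.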
Reducing Over-Conservative Expert Failure Rate Estimates in the Presence of Limited Data: A New Probabilistic/Fuzzy Approach
- Cited by 3 (2 self)
Unique, highly reliable components are typical of the aerospace industry. For such components, due to their high reliability and uniqueness, we do not have enough empirical data to make statistically reliable estimates of their failure rate. To overcome this limitation, the empirical data is usually supplemented with expert estimates of the failure rate. The problem is that experts tend to be over-cautious and over-conservative, especially in the aerospace industry; their estimates of the failure rate are usually much higher than the actual observed failure rate. In this paper, we provide a new fuzzy-related, statistically justified approach for reducing this over-estimation.
I. FORMULATION OF THE PROBLEM
Reliability: how it is usually described and evaluated. Failures are ubiquitous. As a result, reliability analysis is an important part of engineering design.

