Results 1–10 of 11
Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms
1998
Abstract

Cited by 531 (8 self)
This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paired-differences t test based on taking several random train/test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of type I error. A fourth test, McNemar’s test, is shown to have low type I error. The fifth test is a new test, 5 × 2 cv, based on five iterations of two-fold cross-validation. Experiments show that this test also has acceptable type I error. The article also measures the power (ability to detect algorithm differences when they do exist) of these tests. The cross-validated t test is the most powerful. The 5 × 2 cv test is shown to be slightly more powerful than McNemar’s test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar’s test is the only test with acceptable type I error. For algorithms that can be executed 10 times, the 5 × 2 cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
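The 5 × 2 cv statistic described above admits a compact implementation. The sketch below follows the construction in the abstract (five replications of two-fold cross-validation, with a per-replication variance estimate); the function name and the input layout are my own, not from the paper.

```python
import math

def five_by_two_cv_t(diffs):
    """5x2cv paired t statistic for comparing two learning algorithms.

    diffs: list of 5 pairs (p_i1, p_i2), where p_ij is the difference in
    error rates between the two algorithms on fold j of replication i.
    Under the null hypothesis the statistic is approximately t-distributed
    with 5 degrees of freedom.
    """
    assert len(diffs) == 5, "exactly five replications of two-fold CV"
    s2 = []
    for p1, p2 in diffs:
        pbar = (p1 + p2) / 2.0
        # variance estimate from the two fold differences of this replication
        s2.append((p1 - pbar) ** 2 + (p2 - pbar) ** 2)
    denom = math.sqrt(sum(s2) / 5.0)
    # the numerator is the single fold difference from the first replication
    return diffs[0][0] / denom
```

In use, one would reject the null of equal error rates when |t| exceeds the 97.5th percentile of the t distribution with 5 df (about 2.571) at the 5% level.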
Evaluation of Supra-Threshold Perceptual Metrics for 3D Models
Abstract

Cited by 7 (0 self)
Measures of dissimilarity of 3D models are necessary in a wide range of applications such as geometry compression, simplification, and 3D model retrieval. In many cases a metric that models perceptual dissimilarity is desirable. Recently, metrics for 3D models have been evaluated in that respect using concepts such as just noticeable differences, rankings, and others. We propose a simple experimental setup for evaluating supra-threshold perception of 3D models in which users select models at equal perceptual distance to given pairs of models. We discuss the advantages of our approach and report the results of a field study comparing six objective distance measures applied to palettes of simplified reference models. We found that the objective measures are biased, and generally image-based metrics perform better than metrics based on the original 3D geometry.
Random Scale Effects
Abstract

Cited by 4 (2 self)
Data sets with random location effects can have random scale effects as well and often do. This paper describes an approach to modeling data with both effects. We treat the case where there is a normal error distribution, but the methodology could be extended to other cases. The scale effects are modeled by mixing the normal error variable with a random scale variable. Tools are provided for model building: (1) methods for identifying the location distributions and the scale distributions; (2) methods for checking specifications that are fundamental to the model such as independence of the location and scale effects and the normality of the errors.
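The mixing construction described, a normal error multiplied by an independent positive random scale variable on top of a random location effect, can be illustrated with a small simulation. This is a sketch only: the log-normal choice for the scale distribution and all parameter values are my own assumptions, not taken from the paper.

```python
import math
import random

def simulate_location_scale(n_groups=50, n_per=20, mu=10.0, seed=1):
    """Simulate grouped data with both random location and random scale effects.

    Each group i receives a normal location effect b_i and an independent
    positive scale multiplier s_i (log-normal here, purely illustrative).
    Within-group errors are normal, matching the paper's base case, so the
    observed within-group spread varies from group to group via s_i.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_groups):
        b = rng.gauss(0.0, 2.0)            # random location effect
        s = math.exp(rng.gauss(0.0, 0.5))  # random scale effect, independent of b
        data.append([mu + b + s * rng.gauss(0.0, 1.0) for _ in range(n_per)])
    return data
```

Comparing the per-group sample standard deviations of such data against a constant-scale simulation is one simple way to see why a pure location-effects model would be misspecified here.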
Prediction of missing values in microarray and use of mixed models to evaluate the predictors
 Stat. Appl. Genet. Mol. Biol
2005
Cited by 3 (0 self)
Design Of Experiments In The Möbius Modeling Framework
 Master’s thesis, U. of Illinois at Urbana-Champaign
2002
Abstract

Cited by 2 (1 self)
Performance and dependability modeling in system design is a complicated process. Model solutions often require large amounts of computer time. As the models become more complex, the solution time required for the performance and dependability measures increases. Models that are more complex also contain more parameters that can vary and affect the reward measures. The techniques of Design of Experiments (DOE) can increase the efficiency of the modeling process and alleviate some of the problems introduced by sophisticated system models.
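As a concrete illustration of how DOE organizes the modeling process, a full factorial design enumerates one model solution per combination of factor levels; fractional designs then solve only a carefully chosen subset of these runs. The factor names and levels below are hypothetical and are not taken from the Möbius framework.

```python
from itertools import product

def full_factorial(factors):
    """Enumerate a full factorial design: one run per combination of levels.

    factors: dict mapping factor name -> list of levels.
    Returns a list of runs, each a dict of factor settings.
    """
    names = sorted(factors)  # fix an ordering so runs are reproducible
    return [dict(zip(names, combo))
            for combo in product(*(factors[n] for n in names))]

# Hypothetical model parameters for a performance/dependability model
design = full_factorial({
    "buffer_size": [64, 256],
    "failure_rate": [0.01, 0.1],
    "servers": [1, 2, 4],
})
# 2 * 2 * 3 = 12 runs in the full design; a fractional design would
# trade some estimable interactions for fewer model solutions
```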
On CASE tool usage at Nokia
Abstract

Cited by 1 (0 self)
We present the results of a study aimed at understanding CASE tool usage at Nokia. By means of a survey questionnaire, we collected data to identify which features are most useful and best implemented in current CASE tools according to senior developers and managers. With the aid of both descriptive and inferential statistical analysis methods, we found that the features rated most useful belong to the graphical editing, version management, and document generation categories. The statistical methods we use allow us to extend the results to the whole population with a certain degree of confidence. The analysis of the data indicates a general level of dissatisfaction with the quality of currently available CASE tools. There is also evidence that some of the most advanced features (reverse engineering, code generation) are not deemed as useful as others. Further research should extend the survey to other types of industries and attempt to generalize the results. This may provide valuable feedback to the software tools industry, helping it develop products that better correspond to industry needs.
An Empirical Study of Tracing Techniques
Abstract
Tracing is a dynamic analysis technique to continuously capture events of interest on a running program. The occurrence of a statement, the invocation of a function, and the trigger of a signal are examples of traced events. Software engineers employ traces to accomplish various tasks, ranging from performance monitoring to failure analysis. Despite its capabilities, tracing can negatively impact the performance and general behavior of an application. In order to minimize that impact, traces are normally buffered and transferred to (slower) permanent storage at specific intervals. This scenario presents a delicate balance. Increased buffering can minimize the impact on the target program, but it increases the risk of losing valuable collected data in the event of a failure. Frequent disk transfers can ensure traced data integrity, but it risks a high impact on the target program. We conducted an experiment involving six tracing schemes and various buffer sizes to address these tradeoffs. Our results highlight opportunities for tailored tracing schemes that would benefit failure analysis.
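The buffering trade-off described, fewer costly flushes versus more collected data at risk when the program fails, can be sketched with a toy tracer. The class and its interface are purely illustrative; they do not correspond to any tracing scheme used in the study.

```python
import io

class BufferedTracer:
    """Toy tracer illustrating the buffer-size trade-off.

    Events accumulate in memory and are flushed to the sink only when the
    buffer fills: a larger buffer means fewer (slow) writes to permanent
    storage, but more unflushed events are lost if the program crashes.
    """

    def __init__(self, sink, buffer_size=1024):
        self.sink = sink            # file-like object standing in for disk
        self.buffer_size = buffer_size
        self.buffer = []
        self.flushes = 0            # count of transfers to permanent storage

    def trace(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        for e in self.buffer:
            self.sink.write(e + "\n")
        self.buffer.clear()
        self.flushes += 1
```

With a buffer of 4 events, tracing 10 events triggers only 2 flushes, but the last 2 events sit unflushed and would be lost on a crash; shrinking the buffer inverts the trade-off.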
Short-term fitness benefits of physiological integration in the clonal herb Hydrocotyle peduncularis
Abstract
We test whether physiological integration enhances the short-term fitness of the clonal herb Hydrocotyle peduncularis (Apiaceae, R. Brown ex A. Richards) subjected to spatial variation in water availability. Our measures of fitness and costs and benefits are based on the relative growth rate of fragmented genets. Physiological integration over a gradient in soil moisture resulted in a highly significant net benefit to genet growth of 0.015 g g⁻¹ day⁻¹. This net benefit represents a significant enhancement of the average fitness of fragmented genets spanning the moisture gradient relative to the average of those growing in homogeneous moist or dry conditions. Sections of genet fragments growing in dry conditions in spatially heterogeneous treatments had significantly higher growth than the sections they were connected to that were growing in moist conditions. Within fragments, older (parent) sections growing in moist conditions experienced significant costs from connection to younger (offspring) sections growing in dry conditions. In contrast, offspring sections with ample water did not experience any costs when connected to parent sections growing in dry conditions. However, the net benefit of physiological integration was similar for parent and offspring sections, suggesting that parent and offspring sections contributed equally to the net benefit of physiological integration to genet growth and short-term fitness.
Food and Drug Administration
2002
Abstract
We develop a rank-based analysis of clinical trials with unbalanced repeated measures data. We assume that the errors within each subject are exchangeable random variables. This rank-based inference is valid when the unbalanced data are missing either completely at random or by design. A drop-in-dispersion test is developed for general linear hypotheses. A numerical example is given to illustrate the procedure.
t Testing the Immune System (Immunity Commentary)
Abstract
Amid the flurry of grant writing and experimentation, statistical analysis sometimes gets less attention than it requires. Here, we describe fully the considerations that should go into the employment of the statistical two-sample t test. The biological significance of immunological data is paramount in their interpretation. Nevertheless, immunological data are variable, and because statistical science aims to make sense of variability, statistical methods are not superfluous to immunology. The most informative interpretation of experimental data emerges from a combination of biological and statistical insight. Proper statistical analysis of results can only be achieved if the researcher has
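The two-sample t test discussed above can be sketched in a few lines. This is the pooled, equal-variance form of the statistic, one of the variants a researcher must choose between; the function name is my own, and a real analysis would use a vetted statistics library rather than this illustration.

```python
import math

def two_sample_t(x, y):
    """Pooled two-sample t statistic (equal-variance form).

    Returns (t, df); compare |t| against the t distribution with df
    degrees of freedom to obtain a p-value.
    """
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)   # sum of squared deviations, group x
    ssy = sum((v - my) ** 2 for v in y)   # sum of squared deviations, group y
    df = nx + ny - 2
    sp2 = (ssx + ssy) / df                        # pooled variance estimate
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))       # SE of the mean difference
    return (mx - my) / se, df
```

When group variances clearly differ, the unequal-variance (Welch) form with its adjusted degrees of freedom is the safer choice; the pooled form above assumes equal variances.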