Results 1-10 of 29
Decision Combination in Multiple Classifier Systems
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 16, NO. 1, JANUARY 1994
, 1994
Cited by 310 (5 self)
A multiple classifier system is a powerful solution to difficult pattern recognition problems involving large class sets and noisy input because it allows simultaneous use of arbitrary feature descriptors and classification procedures. Decisions by the classifiers can be represented as rankings of classes so that they are comparable across different types of classifiers and different instances of a problem. The rankings can be combined by methods that either reduce or rerank a given set of classes. An intersection method and a union method are proposed for class set reduction. Three methods based on the highest rank, the Borda count, and logistic regression are proposed for class set reranking. These methods have been tested in applications on degraded machine-printed characters and words from large lexicons, resulting in substantial improvement in overall correctness.
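The Borda count reranking named in this abstract can be illustrated with a minimal sketch. It uses the textbook form of the Borda count (each classifier awards a class one point for every class ranked below it); the function name `borda_count`, the alphabetical tie-break, and the toy rankings are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict


def borda_count(rankings):
    """Combine several rankings (best class first) into one ranking.

    Each ranking awards a class one point per class ranked below it.
    Ties in total score are broken alphabetically (an arbitrary choice).
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += n - 1 - position
    return sorted(scores, key=lambda label: (-scores[label], label))


# Three hypothetical classifiers ranking the same three classes.
rankings = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["b", "c", "a"],
]
combined = borda_count(rankings)
print(combined)
```

Because only ranks are used, the classifiers need not produce comparable scores, which is the property the abstract emphasizes.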
Distributional Information: A Powerful Cue for Acquiring Syntactic Categories
 COGNITIVE SCIENCE
, 1998
Cited by 130 (5 self)
Many theorists have dismissed a priori the idea that distributional information could play a significant role in syntactic category acquisition. We demonstrate empirically that such information provides a powerful cue to syntactic category membership, which can be exploited by a variety of simple, psychologically plausible mechanisms. We present a range of results using a large corpus of child-directed speech and explore their psychological implications. While our results show that a considerable amount of information concerning the syntactic categories can be obtained from distributional information alone, we stress that many other sources of information may also be potential contributors to the identification of syntactic classes.
The unicorn, the normal curve, and other improbable creatures
 Psychological Bulletin
, 1989
Cited by 34 (0 self)
An investigation of the distributional characteristics of 440 large-sample achievement and psychometric measures found all to be significantly nonnormal at the alpha = .01 significance level. Several classes of contamination were found, including tail weights from the uniform to the double exponential, exponential-level asymmetry, severe digit preferences, multimodalities, and modes external to the mean/median interval. Thus, the underlying tenets of normality-assuming statistics appear fallacious for these commonly used types of data. However, findings here also fail to support the types of distributions used in most prior robustness research suggesting the failure of such statistics under nonnormal conditions. A reevaluation of the statistical robustness literature appears appropriate in light of these findings. During recent years a considerable literature devoted to robust statistics has appeared. This research reflects a growing concern among statisticians regarding the robustness, or insensitivity, of parametric statistics to violations of their underlying assumptions. Recent findings suggest that the most commonly used of these statistics exhibit varying degrees of nonrobustness to certain violations of the normality assumption. Although the importance of such findings is underscored by numerous empirical studies documenting nonnormality in a variety of fields, a startling lack of such evidence exists for achievement
A Theory of Multiple Classifier Systems And Its Application to Visual Word Recognition
, 1992
Cited by 32 (8 self)
Despite the success of many pattern recognition systems in constrained domains, problems that involve noisy input and many classes remain difficult. A promising direction is to use several classifiers simultaneously, such that they can complement each other in correctness. This thesis is concerned with decision combination in a multiple classifier system that is critical to its success. A multiple classifier system consists of a set of classifiers and a decision combination function. It is a preferred solution to a complex recognition problem because it allows simultaneous use of feature descriptors of many types, corresponding measures of similarity, and many classification procedures. It also allows dynamic selection, so that classifiers adapted to inputs of a particular type may be applied only when those inputs are encountered. Decisions by the classifiers are represented as rankings of the class set that are derivable from the results of feature matching. Rank scores contain more ...
Machine learning methods for predicting failures in hard drives: A multiple-instance application
 Journal of Machine Learning Research
, 2005
Cited by 26 (1 self)
We compare machine learning methods applied to a difficult real-world problem: predicting computer hard-drive failure using attributes monitored internally by individual drives. The problem is one of detecting rare events in a time series of noisy and nonparametrically distributed data. We develop a new algorithm based on the multiple-instance learning framework and the naive Bayesian classifier (mi-NB) which is specifically designed for the low false-alarm case, and is shown to have promising performance. Other methods compared are support vector machines (SVMs), unsupervised clustering, and nonparametric statistical tests (rank-sum and reverse arrangements). The failure-prediction performance of the SVM, rank-sum and mi-NB algorithm is considerably better than the threshold method currently implemented in drives, while maintaining low false alarm rates. Our results suggest that nonparametric statistical tests should be considered for learning problems involving detecting rare events in time series data. An appendix details the calculation of rank-sum significance probabilities in the case of discrete, tied observations, and we give new recommendations about when the exact calculation should be used instead of the commonly used normal approximation. These normal approximations may be particularly inaccurate for rare event problems like hard drive failures.
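For orientation, the commonly used normal approximation to the rank-sum test mentioned in this abstract can be sketched as follows. This is the standard textbook Wilcoxon rank-sum statistic with midranks and the usual tie-corrected variance, without a continuity correction; it is not the exact calculation described in the paper's appendix, and the function names `midranks` and `rank_sum_z` are assumptions for illustration.

```python
import math


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_z(x, y):
    """Rank sum W of sample x, its z-score, and a two-sided normal p-value."""
    n, m = len(x), len(y)
    total = n + m
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[:n])
    mean = n * (total + 1) / 2
    # Tie correction: sum of t^3 - t over groups of t tied observations.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n * m / 12 * ((total + 1) - tie_term / (total * (total - 1)))
    if variance == 0:
        raise ValueError("all observations are tied; the statistic is undefined")
    z = (w - mean) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w, z, p_two_sided


w, z, p = rank_sum_z([1, 2, 3], [4, 5, 6])
print(w, round(z, 3), round(p, 3))
```

As the abstract cautions, this approximation may be inaccurate for small samples or heavily tied, rare-event data, where an exact calculation is preferable.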
Sieved empirical likelihood ratio tests for nonparametric functions
 Ann. Statist.
, 2004
Cited by 12 (1 self)
Generalized likelihood ratio statistics have been proposed in Fan, Zhang and Zhang [Ann. Statist. 29 (2001) 153–193] as a generally applicable method for testing nonparametric hypotheses about nonparametric functions. The likelihood ratio statistics are constructed based on the assumption that the distributions of stochastic errors are in a certain parametric family. We extend their work to the case where the error distribution is completely unspecified via newly proposed sieve empirical likelihood ratio (SELR) tests. The approach is also applied to test conditional estimating equations on the distributions of stochastic errors. It is shown that the proposed SELR statistics follow asymptotically rescaled χ²-distributions, with the scale constants and the degrees of freedom being independent of the nuisance parameters. This demonstrates that the Wilks phenomenon observed in Fan, Zhang and Zhang [Ann. Statist. 29 (2001) 153–193] continues to hold under more relaxed models and a larger class of techniques. The asymptotic power of the proposed test is also derived, which achieves the optimal rate for nonparametric hypothesis testing. The proposed approach has two advantages over the generalized likelihood ratio method: it requires one only to specify some conditional estimating equations rather than the entire distribution of the stochastic error, and the procedure adapts automatically to the unknown error distribution including heteroscedasticity. A simulation study is conducted to evaluate our proposed procedure empirically.
The Estimating Function Bootstrap
 SUBMITTED. FISHER LECTURE OF THE 1999 JOINT STATISTICAL MEETING
, 1999
Cited by 8 (0 self)
The authors propose a bootstrap procedure which estimates the distribution of an estimating function by resampling its terms using bootstrap techniques. Studentized versions of this so-called estimating function (EF) bootstrap yield methods which are invariant under reparametrizations. This approach often has substantial advantage, both in computation and accuracy, over more traditional bootstrap methods and it applies to a wide class of practical problems where the data are independent but not necessarily identically distributed. The methods allow for simultaneous estimation of vector parameters and their components. The authors use simulations to compare the EF bootstrap with competing methods in several examples including the common means problem and nonlinear regression. They also prove asymptotic results showing that the studentized EF bootstrap yields higher order approximations for the whole vector parameter in a wide class of problems.
More than a Dozen Alternative Ways of Spelling Gini
 Research in Economic Inequality
, 1998
AN EMPIRICAL ANALYSIS OF THE CANADIAN BUDGET PROCESS
Cited by 1 (0 self)
CIRANO is a private nonprofit organization incorporated under the Québec Companies Act. Its infrastructure and research activities are funded through fees paid by member organizations, an infrastructure grant from the Ministère de l’Industrie, du Commerce, de la Science et de la Technologie, and grants and research mandates obtained by its research teams. The Scientific Series fulfils one of the missions of CIRANO: to develop the scientific analysis of organizations and strategic behaviour. The Partner Organizations: Ministère de l’Industrie, du Commerce, de la Science et de la Technologie.
Describing Multivariate Distributions with Nonlinear Variation Using Data Depth
Cited by 1 (0 self)
Growth curves of plants and animals, human speech, gene expression signals, and medical images or 3-dimensional shapes of cancer tumors are all real-life examples of high-dimensional multivariate data referred to as functional data [80, 81, 26, 33]. Variability in these data sets could be representative of