Results 1–10 of 32
Mining high-speed data streams
, 2000
Cited by 391 (10 self)
Neural networks for classification: a survey
 IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews
, 2000
Abstract

Cited by 132 (0 self)
Classification is one of the most active research and application areas of neural networks. The literature is vast and growing. This paper summarizes some of the most important developments in neural network classification research. Specifically, the issues of posterior probability estimation, the link between neural and conventional classifiers, the learning and generalization tradeoff in classification, feature variable selection, and the effect of misclassification costs are examined. Our purpose is to provide a synthesis of the published research in this area and to stimulate further research interest and effort in the identified topics.
Index Terms: Bayesian classifier, classification, ensemble methods, feature variable selection, learning and generalization, misclassification costs, neural networks.
Multiple Comparisons in Induction Algorithms
 MACHINE LEARNING
, 1998
Abstract

Cited by 94 (10 self)
A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
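The multiple-comparison effect the abstract describes can be illustrated with a minimal sketch (not from the paper; all names and constants are illustrative): taking the maximum of many pure-noise scores yields a value that looks significant on its own, and a Bonferroni adjustment compensates by dividing the significance level by the number of comparisons.

```python
import random
import statistics

random.seed(0)

def score(n=30):
    """Evaluation score of one candidate item: mean of n pure-noise samples."""
    return statistics.mean(random.gauss(0, 1) for _ in range(n))

# Compare many items and keep the maximum score, as induction algorithms do.
k = 100
best = max(score() for _ in range(k))

# A single score has standard error 1/sqrt(30) ~ 0.18, so the maximum of k
# such scores looks "significant" -- yet every item here is noise. The
# Bonferroni adjustment divides alpha by k to counter this selection bias.
alpha = 0.05
per_comparison_alpha = alpha / k  # Bonferroni-adjusted per-comparison threshold
print(f"max of {k} noise scores: {best:.3f}")
print(f"adjusted alpha: {per_comparison_alpha:.4f}")
```

Because the maximum is biased upward, an unadjusted test on the selected item overstates its quality; this is the shared mechanism behind the three pathologies named in the abstract.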
Particle swarm model selection
 JMLR, Special Topic on Model Selection
, 2009
Abstract

Cited by 18 (7 self)
This paper proposes the application of particle swarm optimization (PSO) to the problem of full model selection (FMS) for classification tasks. FMS is defined as follows: given a pool of preprocessing methods, feature selection methods, and learning algorithms, select the combination of these that obtains the lowest classification error for a given data set; the task also includes the selection of hyperparameters for the considered methods. This problem generates a vast search space to be explored, well suited for stochastic optimization techniques. FMS can be applied to any classification domain, as it does not require domain knowledge. Different model types and a variety of algorithms can be considered under this formulation. Furthermore, competitive yet simple models can be obtained with FMS. We adopt PSO for the search because of its proven performance in different problems and because of its simplicity, since neither expensive computations nor complicated operations are needed. Interestingly, the way the search is guided allows PSO to avoid overfitting to some extent. Experimental results on benchmark data sets give evidence that the proposed approach is very effective, despite its simplicity. Furthermore, results obtained in the framework of a model selection challenge show the competitiveness of the models selected with PSO compared to models selected with other techniques that focus on a single algorithm and use domain knowledge.
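For readers unfamiliar with the search mechanism the abstract relies on, here is a minimal standard PSO loop (a generic sketch, not the paper's implementation; the objective `validation_error` is an invented toy stand-in for the cross-validation error a full model selection run would actually compute over encoded method choices and hyperparameters):

```python
import random

random.seed(1)

def validation_error(x):
    """Toy objective: in FMS this would be the cross-validation error of the
    model encoded by position x. Here it is a simple quadratic bowl with its
    minimum at (0.3, -0.5)."""
    return (x[0] - 0.3) ** 2 + (x[1] + 0.5) ** 2

dim, n_particles, iters = 2, 20, 60
w, c1, c2 = 0.72, 1.49, 1.49  # common inertia / acceleration constants

pos = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
vel = [[0.0] * dim for _ in range(n_particles)]
pbest = [p[:] for p in pos]                      # per-particle best positions
pbest_val = [validation_error(p) for p in pos]
gbest = min(pbest, key=validation_error)         # swarm-wide best position

for _ in range(iters):
    for i in range(n_particles):
        for d in range(dim):
            r1, r2 = random.random(), random.random()
            # Velocity blends inertia, pull toward the particle's own best,
            # and pull toward the swarm's best.
            vel[i][d] = (w * vel[i][d]
                         + c1 * r1 * (pbest[i][d] - pos[i][d])
                         + c2 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        val = validation_error(pos[i])
        if val < pbest_val[i]:
            pbest[i], pbest_val[i] = pos[i][:], val
            if val < validation_error(gbest):
                gbest = pos[i][:]

print("best position:", gbest, "error:", validation_error(gbest))
```

The swarm converges toward the minimum of the objective; in the FMS setting each position would additionally encode discrete method choices, which is where the paper's contribution lies.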
Automatic Bias Learning: An Inquiry into the Inductive Basis of Induction
, 1999
Abstract

Cited by 12 (5 self)
This thesis combines an epistemological concern about induction with a computational exploration of inductive mechanisms. It aims to investigate how inductive performance could be improved by using induction to select appropriate generalisation procedures. The thesis revolves around a metalearning system designed for this purpose; the system's performance is discussed against the background of epistemological issues concerning induction, such as the role of theoretical vocabularies and the value of simplicity.
On the use of multiobjective evolutionary algorithms for survival analysis
, 2006
Abstract

Cited by 11 (1 self)
This paper proposes and evaluates a multiobjective evolutionary algorithm for survival analysis. One aim of survival analysis is the extraction of models from data that approximate lifetime/failure time distributions. These models can be used to estimate the time that an event takes to happen to an object. The use of multiobjective evolutionary algorithms for survival analysis has several advantages: they can cope with feature interactions and noisy data, and are capable of optimising several objectives. This is important, as model extraction is a multiobjective problem. It has at least two objectives: the extraction of accurate models and the extraction of simple models. Accurate models are required to achieve good predictions. Simple models are important to prevent overfitting, to improve the transparency of the models, and to save computational resources. Although there is a plethora of evolutionary approaches to extracting models for classification and regression, the presented approach is one of the first applied to survival analysis. The approach is evaluated on several artificial datasets and one medical dataset. It is shown that the approach is capable of producing accurate models, even for problems that violate some of the assumptions made by classical approaches.
Feature selection filters based on the permutation test
 Pedreschi (Eds.), Machine Learning: ECML 2004, 15th European Conference on Machine Learning
, 2004
Abstract

Cited by 10 (0 self)
We investigate the problem of supervised feature selection within the filtering framework. In our approach, applicable to two-class problems, the feature strength is inversely proportional to the p-value of the null hypothesis that its class-conditional densities, p(X | Y = 0) and p(X | Y = 1), are identical. To estimate the p-values, we use Fisher's permutation test combined with four simple filtering criteria in the roles of test statistics: sample mean difference, symmetric Kullback-Leibler distance, information gain, and the chi-square statistic. The experimental results of our study, performed using a naive Bayes classifier and support vector machines, strongly indicate that the permutation test improves the above-mentioned filters and can be used effectively when the sample size is relatively small and the number of features relatively large.
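The filtering idea in the abstract can be sketched with a small permutation test using the sample mean difference as the test statistic (a generic illustration, not the authors' code; the function name `permutation_p_value` and the toy data are invented for this example):

```python
import random
import statistics

random.seed(0)

def permutation_p_value(x, y, n_perm=2000):
    """p-value of the null hypothesis that feature values x are identically
    distributed in both classes (labels y in {0, 1}), using the absolute
    sample mean difference as the test statistic."""
    def stat(values, labels):
        g0 = [v for v, l in zip(values, labels) if l == 0]
        g1 = [v for v, l in zip(values, labels) if l == 1]
        return abs(statistics.mean(g0) - statistics.mean(g1))

    observed = stat(x, y)
    labels = list(y)
    count = 0
    for _ in range(n_perm):
        random.shuffle(labels)            # permute class labels under the null
        if stat(x, labels) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)     # add-one smoothing avoids p = 0

y = [0] * 20 + [1] * 20
# Informative feature: the class-conditional means differ by 2 standard deviations.
informative = ([random.gauss(0, 1) for _ in range(20)]
               + [random.gauss(2, 1) for _ in range(20)])
# Uninformative feature: same distribution in both classes.
noise = [random.gauss(0, 1) for _ in range(40)]

print("informative p:", permutation_p_value(informative, y))
print("noise p:", permutation_p_value(noise, y))
```

A strong feature yields a tiny p-value (its observed mean difference is rarely matched under relabeling), while a noise feature yields a large one; ranking features by these p-values gives the filter the abstract describes.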
Evolving distributed algorithms with genetic programming
 IEEE Transactions on Evolutionary Computation
, 2012
Abstract

Cited by 10 (7 self)
In this article, we evaluate the applicability of Genetic Programming (GP) to the evolution of distributed algorithms. We carry out a large-scale experimental study in which we tackle three well-known problems from distributed computing with six different program representations. For this purpose, we first define a simulation environment in which phenomena such as asynchronous computation at changing speed and messages taking over each other, i.e., out-of-order message delivery, occur with high probability. Second, we define extensions and adaptations of established GP approaches (such as tree-based and Linear Genetic Programming) in order to make them suitable for representing distributed algorithms. Third, we introduce novel rule-based Genetic Programming methods designed especially with the characteristic difficulties of evolving algorithms (such as epistasis) in mind. Based on our extensive experimental study of these approaches, we conclude that GP is indeed a viable method for evolving non-trivial, deterministic, non-approximative distributed algorithms. Furthermore, one of the two rule-based approaches exhibits superior performance in most of the tasks and can thus be considered an interesting idea for other problem domains as well.
A Machine Learning Approach to Performance Prediction of Total Order Broadcast Protocols (∗)
Abstract

Cited by 10 (8 self)
Total Order Broadcast (TOB) is a fundamental building block at the core of a number of strongly consistent, fault-tolerant replication schemes. While it is widely known that the performance of existing TOB algorithms varies greatly depending on the workload and deployment scenario, the problem of how to forecast their performance in realistic settings remains, to date, largely unexplored. In this paper we address this problem by exploring the possibility of leveraging machine learning techniques to build, in a fully decentralized fashion, performance models of TOB protocols. Based on an extensive experimental study considering heterogeneous workloads and multiple TOB protocols, we assess the accuracy and efficiency of alternative machine learning methods, including neural networks, support vector machines, and decision-tree-based regression models. We propose two heuristics for the feature selection phase that reduce its execution time by up to two orders of magnitude while incurring a very limited loss of prediction accuracy.