Results 1-8 of 8
Segment and combine approach for biological sequence classification
In: Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2005
Cited by 3 (2 self)
This paper presents a new algorithm, based on the segment and combine paradigm, for the automatic classification of biological sequences. It classifies a sequence by aggregating the predictions made for its subsequences by a classifier derived by machine learning from a random sample of training subsequences. This generic approach is combined with decision-tree-based ensemble methods that scale with respect to both sample size and vocabulary size. The method is applied to three families of problems: DNA sequence recognition, splice junction detection, and gene regulon prediction. Compared with standard approaches based on n-grams, it appears competitive in terms of accuracy, flexibility, and scalability. The paper also highlights the possibility of exploiting the resulting models to identify interpretable patterns specific to a given class of biological sequences.
Segment and combine: a generic approach for supervised learning of invariant classifiers from topologically structured data
In: Machine Learning Conference of Belgium and The Netherlands (Benelearn), 2006
Cited by 3 (0 self)
A generic method for supervised classification of structured objects is presented. The approach induces a classifier by (i) deriving a surrogate dataset from a preclassified dataset of structured objects by segmenting them into pieces, (ii) learning a model relating pieces to object classes, and (iii) classifying structured objects by combining the predictions made for their pieces. The segmentation makes it possible to exploit local information and can be adapted to inject invariances into the resulting classifier. The framework is illustrated on practical sequence, time-series, and image classification problems.
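The three steps (i) to (iii) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the piece-level model here is a plain frequency table rather than the tree ensembles the abstracts mention, and the function names, toy sequences, and piece length are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def segments(seq, length):
    """Return every contiguous piece of the given length."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]

def fit_piece_model(sequences, labels, length, n_samples, rng):
    """Steps (i)-(ii): sample labelled pieces and count classes per piece."""
    counts = defaultdict(Counter)
    for _ in range(n_samples):
        idx = rng.randrange(len(sequences))
        piece = rng.choice(segments(sequences[idx], length))
        counts[piece][labels[idx]] += 1
    return counts

def classify(seq, counts, classes, length):
    """Step (iii): sum per-piece class frequencies, return the top class."""
    totals = dict.fromkeys(classes, 0.0)
    for piece in segments(seq, length):
        seen = counts.get(piece)
        if not seen:
            continue  # piece never sampled: contributes no vote
        n = sum(seen.values())
        for c in classes:
            totals[c] += seen[c] / n
    return max(classes, key=lambda c: totals[c])

# Toy data (hypothetical): class "a" is ACGT-like, class "b" is G/C only.
rng = random.Random(0)
train = ["ACGTACGTACGT", "ACGTTACGTACG", "GGGCCCGGGCCC", "GGCCCGGGCCCG"]
labels = ["a", "a", "b", "b"]
model = fit_piece_model(train, labels, length=4, n_samples=200, rng=rng)
prediction = classify("TACGTACGTA", model, ["a", "b"], length=4)
```

Because each piece inherits the class of its parent object, any learner that handles fixed-size inputs could replace the frequency table in this sketch.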
M.: The influence of global constraints on similarity measures for time-series databases
Knowl.-Based Syst., 2014
Cited by 2 (2 self)
A time series consists of a series of values or events obtained over repeated measurements in time. Analysis of time series is an important tool in many application areas, such as stock-market analysis, process and quality control, observation of natural phenomena, and medical diagnosis. A vital component in many types of time-series analysis is the choice of an appropriate distance/similarity measure. Numerous measures have been proposed to date, with the most successful ones based on dynamic programming. Because these measures have quadratic time complexity, however, global constraints are often employed to limit the search space in the matrix during the dynamic programming procedure, in order to speed up computation. Furthermore, it has been reported that such constrained measures can also achieve better accuracy. In this paper, we investigate four representative time-series distance/similarity measures based on dynamic programming, namely Dynamic Time Warping (DTW), Longest Common Subsequence (LCS), Edit distance with Real Penalty (ERP) and Edit Distance on Real sequence (EDR), and the effects of global constraints on them when applied via the Sakoe-Chiba band. To better understand the influence of global constraints and provide deeper insight into their advantages and limitations, we explore how the 1-nearest neighbor graph changes as the constraint size changes. We also examine how these changes are reflected in the classes of the nearest neighbors of time series, and evaluate the performance of the 1-nearest neighbor classifier with respect to different distance measures and constraints. Since we determine that
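To make the role of the global constraint concrete, the following is a minimal sketch of DTW restricted to a Sakoe-Chiba band. It assumes an absolute-difference local cost and a band given as an absolute index offset; the function name `dtw_band` is hypothetical, and the paper itself may parameterize the constraint differently (for example as a percentage of the series length).

```python
import math

def dtw_band(a, b, band):
    """DTW distance restricted to cells with |i - j| <= band."""
    n, m = len(a), len(b)
    band = max(band, abs(n - m))  # keep the final cell reachable
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay infinite.
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

x = [0, 0, 1, 2]
y = [0, 1, 2, 2]
no_warp = dtw_band(x, y, 0)   # band 0: only the diagonal is allowed
one_step = dtw_band(x, y, 1)  # band 1: one step of warping allowed
```

With a band of width w, the inner loop visits roughly n(2w + 1) cells instead of all n·m, which is where the speed-up over unconstrained DTW comes from.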
Automatic learning for advanced sensing, monitoring and control of electric power systems
In: Proceedings of the Second Carnegie Mellon Conference in Electric Power Systems, 2006
Cited by 2 (0 self)
The paper considers the possible uses of automatic learning for improving power system performance by software methodologies. Automatic learning per se is first reviewed, and recent developments in the field are highlighted. The authors' views of its main actual or potential applications related to power system operation and control are then described, and for each application the present status and the needs for further development are discussed.
Time Series Classification: Decision Forests and SVM on Interval and DTW Features
Cited by 1 (0 self)
This paper describes the methods used for our submission to the KDD 2007 Challenge on Time Series Classification. For each dataset, we selected from a pool of methods (individual classifiers or classifier ensembles) using cross-validation (CV). Three types of classifiers were considered: nearest neighbour (using DTW), support vector machines (linear and perceptron kernel), and decision forests (boosting, random forest, rotation forest, and random oracles). SVM and decision forests used extracted features of two types: similarity-based and interval-based. Two feature selection approaches were applied: FCBF or SVM-RFE. Where the minimum CV errors of several classifiers tied, the labels were assigned through majority vote. We report results with both the practice and the contest data.
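The selection rule described here (pick the method with the lowest CV error, and fall back to a majority vote when several methods tie) might look like the sketch below. The classifier names, error values, and predictions are invented for illustration, and how the authors broke vote ties is not stated in the abstract; this sketch simply keeps the first label encountered.

```python
from collections import Counter

def select_and_predict(cv_errors, predictions):
    """Pick the classifier(s) with the lowest CV error; majority-vote on ties."""
    best = min(cv_errors.values())
    winners = [name for name, err in cv_errors.items() if err == best]
    n_items = len(predictions[winners[0]])
    labels = []
    for k in range(n_items):
        votes = Counter(predictions[name][k] for name in winners)
        # most_common keeps first-seen order among equal counts (assumption:
        # the abstract does not say how vote ties were resolved).
        labels.append(votes.most_common(1)[0][0])
    return winners, labels

# Hypothetical CV errors and test-set predictions for four methods.
cv_errors = {"1nn_dtw": 0.10, "svm_linear": 0.10,
             "random_forest": 0.10, "boosting": 0.15}
predictions = {"1nn_dtw": [0, 1, 1], "svm_linear": [0, 0, 1],
               "random_forest": [1, 0, 1], "boosting": [1, 1, 0]}
winners, labels = select_and_predict(cv_errors, predictions)
```

When a single method has the lowest error, the vote degenerates to that method's own predictions.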
Decision Trees for Functional Variables
Classification problems with functionally structured input variables arise naturally in many applications. In a clinical domain, for example, input variables could include a time series of blood pressure measurements. In a financial setting, different time series of stock returns might serve as predictors. In an archeological application, the 2D profile of an artifact may serve as a key input variable. In such domains, accuracy of the classifier is not the only reasonable goal to strive for; classifiers that provide easily interpretable results are also of value. In this work, we present an intuitive scheme for extending decision trees to handle functional input variables. Our results show that such decision trees are both accurate and readily interpretable.
... immunosorbent assay (IgG), various interleukin measures (IL2, IL4, IL6), and a so-called "stimulation index" (SI), to name a few, with the number of measurements varying somewhat from animal to animal. The goal of the study is to understand the predictive value of the various assays with respect to survival.
Extremely randomized trees
Mach Learn, DOI 10.1007/s10994-006-6226-1, 2006
This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It essentially consists of strongly randomizing both the attribute and the cut-point choice while splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to problem specifics by the appropriate choice of a parameter. We evaluate the robustness of the default choice of this parameter, and we also provide insight into how to adjust it in particular situations. Besides accuracy, the main strength of the resulting algorithm is computational efficiency. A bias/variance analysis of the Extra-Trees algorithm is also provided, as well as a geometrical and a kernel characterization of the models induced.
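A rough sketch of the randomized node split may help. It assumes numeric attributes, uses Gini impurity reduction as a stand-in score (the abstract does not name the score), and calls the randomization-strength parameter `k`, which is a label chosen here for illustration; `random_split` and the toy data are hypothetical as well.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def random_split(rows, labels, k, rng):
    """Draw up to k random (attribute, cut-point) candidates; keep the best."""
    n_attrs = len(rows[0])
    best = None
    for attr in rng.sample(range(n_attrs), min(k, n_attrs)):
        values = [row[attr] for row in rows]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # constant attribute: no split possible
        cut = rng.uniform(lo, hi)  # cut-point drawn at random, not optimized
        left = [y for row, y in zip(rows, labels) if row[attr] < cut]
        right = [y for row, y in zip(rows, labels) if row[attr] >= cut]
        if not left or not right:
            continue  # degenerate draw (cut landed on the minimum)
        n = len(labels)
        score = gini(labels) - (len(left) * gini(left)
                                + len(right) * gini(right)) / n
        if best is None or score > best[0]:
            best = (score, attr, cut)
    return best  # None if no candidate produced a valid split

# Toy node sample (hypothetical): attribute 1 is constant, attribute 0 is not.
rows = [[0.0, 5.0], [1.0, 5.0], [10.0, 5.0], [11.0, 5.0]]
labels = [0, 0, 1, 1]
split = random_split(rows, labels, k=2, rng=random.Random(0))
```

With `k=1` the choice ignores the labels entirely, which corresponds to the totally randomized trees mentioned in the abstract; larger `k` filters the random candidates more strongly through the score.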
Ensembles of extremely randomized trees and some generic applications
In this paper we present a new tree-based ensemble method called "Extra-Trees". This algorithm averages the predictions of trees obtained by partitioning the input space with randomly generated splits, leading to significant improvements in precision and various algorithmic advantages, in particular reduced computational complexity and scalability. We also discuss two generic applications of this algorithm, namely time-series classification and the automatic inference of near-optimal sequential decision policies from experimental data.