Fast Time Series Classification Using Numerosity Reduction
- In ICML’06
, 2006
Cited by 17 (6 self)
Many algorithms have been proposed for the problem of time series classification. However, it is clear that one-nearest-neighbor with Dynamic Time Warping (DTW) distance is exceptionally difficult to beat. This approach has one weakness, however: it is computationally too demanding for many real-time applications. One way to mitigate this problem is to speed up the DTW calculations. Nonetheless, there is a limit to how much this can help. In this work, we propose an additional technique, numerosity reduction, to speed up one-nearest-neighbor DTW. While the idea of numerosity reduction for nearest-neighbor classifiers has a long history, we show here that we can leverage an original observation about the relationship between dataset size and DTW constraints to produce an extremely compact dataset with little or no loss in accuracy. We test our ideas with a comprehensive set of experiments, and show that the technique can efficiently produce extremely fast, accurate classifiers.
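As a rough illustration of the baseline this abstract refers to, the following is a minimal sketch of one-nearest-neighbor classification under DTW with a warping-window constraint. It is not the paper's numerosity reduction method. The function names `dtw_distance` and `classify_1nn`, the `window` parameter, and the squared-difference local cost are assumptions made for this example.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance between two numeric sequences.

    `window` is an assumed name for a band constraint on |i - j|;
    None means unconstrained warping.
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide so the path can reach the corner.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def classify_1nn(query, train, window=None):
    """Return the label of the training series closest to `query` under DTW.

    `train` is a list of (series, label) pairs.
    """
    best_label, best_dist = None, math.inf
    for series, label in train:
        d = dtw_distance(query, series, window)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

Every query costs one DTW computation per training series, which is why shrinking the training set (numerosity reduction) and narrowing the window both reduce classification time.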
Anytime classification using the nearest neighbor algorithm with applications to stream mining
- IEEE International Conference on Data Mining (ICDM)
, 2006
Cited by 11 (6 self)
For many real world problems we must perform classification under widely varying amounts of computational resources. For example, if asked to classify an instance taken from a bursty stream, we may have from milliseconds to minutes to return a class prediction. For such problems an anytime algorithm may be especially useful. In this work we show how we can convert the ubiquitous nearest neighbor classifier into an anytime algorithm that can produce an instant classification, or if given the luxury of additional time, can utilize the extra time to increase classification accuracy. We demonstrate the utility of our approach with a comprehensive set of experiments on data from diverse domains.
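The anytime idea described above can be sketched as a nearest-neighbor scan that may be interrupted and always has a best-so-far answer ready. This is only an illustration under stated assumptions: the ordering of the training set is taken as given (`ordered_train`), the interruption is modeled by a comparison budget (`max_comparisons`) instead of a clock, and all names are hypothetical and not taken from the paper.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def anytime_1nn(query, ordered_train, max_comparisons, default_label=None):
    """Scan `ordered_train` in order and stop when the budget runs out.

    `ordered_train` is a list of (point, label) pairs, assumed to be
    sorted so that the most useful examples come first.
    """
    best_label, best_dist = default_label, math.inf
    for checked, (point, label) in enumerate(ordered_train):
        if checked >= max_comparisons:
            break  # interrupted: return the best answer found so far
        d = euclidean(query, point)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

With a small budget the function answers from the first few examples only; with a larger budget the same call can revise its answer as more of the training set is examined.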
Content-based Retrieval of Medical Images by Combining Global Features
Cited by 3 (2 self)
A combination of several classifiers using global features for the content description of medical images is proposed. Besides well-known texture histogram features, downscaled representations of the original images are used, which preserve spatial information and utilize distance measures that are robust with regard to common variations in radiation dose, translation, and local deformation. These features were evaluated for the annotation task and the retrieval task in ImageCLEF 2005 without using additional textual information or query refinement mechanisms. For the annotation task, a categorization rate of 86.7% was obtained, which ranks second among all submissions. When applied in the retrieval task, the image content descriptors yielded a mean average precision (MAP) of 0.0751, which ranks 14th of 28 submitted runs. As the image deformation model is not fit for interactive retrieval tasks, two mechanisms are evaluated with regard to the trade-off between loss of accuracy and speed increase: hierarchical filtering and prototype selection.
Interactive search by direct manipulation of dissimilarity space
- IEEE Transactions on Multimedia. VOL. 9, NO
, 2007
Cited by 2 (0 self)
In this paper, we argue for learning dissimilarity for interactive search in content-based image retrieval. In the literature, dissimilarity is often learned via the feature space by feature selection, feature weighting, or by adjusting the parameters of a function of the features. Unlike existing techniques, we use feedback to adjust the dissimilarity space independently of the feature space. This has the great advantage that it manipulates dissimilarity directly. To create a dissimilarity space, we use the method proposed by Pekalska and Duin, selecting a set of images called prototypes and computing distances to those prototypes for all images in the collection. After the user gives feedback, we apply active learning with a one-class support vector machine to decide the movement of images such that relevant images stay close together while irrelevant ones are pushed away (the work of Guo et al.). The dissimilarity space is then adjusted accordingly. Results on a Corel dataset of 10000 images and a TrecVid collection of 43907 keyframes show that our proposed approach is not only intuitive but also significantly improves retrieval performance. Index Terms: Active learning, dissimilarity learning, interactive image search, visualization.
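The prototype step described above (represent each object by its distances to a chosen set of prototypes) can be sketched as follows. This covers only the construction of the dissimilarity space, not the feedback or active learning stages. The function names and the choice of Euclidean distance are assumptions for the example; any dissimilarity function could be passed as `dist`.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def dissimilarity_representation(objects, prototypes, dist=euclidean):
    """Map each object to the vector of its distances to the prototypes.

    Row i of the result is the position of objects[i] in the
    dissimilarity space; column k corresponds to prototypes[k].
    """
    return [[dist(obj, p) for p in prototypes] for obj in objects]
```

The resulting vectors have one dimension per prototype, regardless of how many features the original objects had, so later steps can operate on dissimilarities alone.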
Castellanos-Domínguez, “Generalizing dissimilarity representations using feature lines”
- the 12th Iberoamerican Congress on Pattern Recognition
, 2007
Cited by 2 (2 self)
A crucial issue in dissimilarity-based classification is the choice of the representation set. In the small sample case, classifiers capable of good generalization, together with the injection or addition of extra information, make it possible to overcome the representational limitations. In this paper, we present a new approach for enriching dissimilarity representations. It is based on the concept of feature lines and consists in deriving a generalized version of the original dissimilarity representation by using feature lines as prototypes. We use a linear normal density-based classifier and the nearest neighbor rule, as well as two different methods for selecting prototypes: random choice and a length-based selection of the feature lines. An important observation is that just a few long feature lines are needed to obtain a significant improvement in performance over the other representation sets and classifiers. In general, the experiments show that this alternative representation is especially profitable for some correlated datasets.
Hit Miss Networks with Applications to Instance Selection
Cited by 1 (1 self)
In supervised learning, a training set consisting of labeled instances is used by a learning algorithm for generating a model (classifier) that is subsequently employed for deciding the class label of new instances (for generalization). Characteristics of the training set, such as presence of noisy instances and size, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called hit miss network (HMN), which provides a compact description of the nearest neighbor relation between each pair of classes. We show that structural properties of HMN’s correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being a border or central point. This motivates us to use HMN’s for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new algorithms based on HMN for instance selection: HMN-C, which removes instances without affecting accuracy of 1-NN on the training set; HMN-E, which removes more instances than HMN-C; and HMN-EI, which applies HMN-E iteratively. Their performance is assessed on 22 artificial and real-life datasets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these datasets show that accuracy of the 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these datasets indicates best generalization performance of HMN-EI and no significant difference in storage requirements. In general, these results seem to show that HMN’s provide a powerful graph-based representation of a training set, which can be successfully applied for performing noise and redundancy reduction in instance-based learning.
Keywords: Graph-based training set representation, nearest neighbor, instance selection for instance-based learning.
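To make the graph idea concrete, here is a minimal sketch of one way such a network could be built. It assumes a construction that the abstract does not spell out: every training point gets one directed edge to its nearest neighbor in each class, and the edge counts as a "hit" when that class is the point's own and as a "miss" otherwise. The function names are hypothetical, Euclidean distance is an assumption, and none of the HMN-C, HMN-E, or HMN-EI removal rules are implemented.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def build_hit_miss_network(points, labels):
    """Return directed edges (source, target, kind), kind in {"hit", "miss"}.

    Assumed construction: one edge from each point to its nearest
    neighbor (other than itself) in every class.
    """
    classes = sorted(set(labels))
    edges = []
    for i, (x, lx) in enumerate(zip(points, labels)):
        for c in classes:
            candidates = [j for j, lj in enumerate(labels) if lj == c and j != i]
            if not candidates:
                continue  # no other point of this class exists
            j = min(candidates, key=lambda k: euclidean(x, points[k]))
            edges.append((i, j, "hit" if c == lx else "miss"))
    return edges


def in_degrees(edges, n):
    """Count incoming hit and miss edges for each of the n nodes."""
    hit, miss = [0] * n, [0] * n
    for _, target, kind in edges:
        if kind == "hit":
            hit[target] += 1
        else:
            miss[target] += 1
    return hit, miss
```

Under this construction, points that sit near a class boundary tend to collect incoming miss edges, which is the kind of structural property the abstract links to border and central points.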
Mining Massive Archives of Mice Sounds with Symbolized Representations
Cited by 1 (1 self)
Many animals produce long sequences of vocalizations best described as “songs.” In some animals, such as crickets and frogs, these songs are relatively simple and repetitive chirps or trills. However, animals as diverse as whales, bats, birds and even the humble mice considered here produce intricate and complex songs. These songs are worthy of study in their own right. For example, the study of bird songs has helped to cast light on various questions in the nature vs. nurture debate. However, there is a particular reason why the study of mice songs can benefit mankind. The house mouse (Mus musculus) has long been an important model organism in biology and medicine, and it is by far the most commonly used genetically altered laboratory mammal to address human diseases. While there have been significant recent efforts to analyze mice songs, advances in sensor technology have created a situation where our ability to collect data far outstrips our ability to analyze it. In this work we argue that the time is ripe for archives of mice songs to fall into the purview of data mining. We show a novel technique for mining mice vocalizations directly in the visual (spectrogram) space that practitioners currently use. Working in this space allows us to bring an arsenal of data mining tools to bear on this important domain, including similarity search, classification, motif discovery and contrast set mining.
Real-Time Classification of Streaming Sensor Data
The last decade has seen huge interest in classification of time series. Most of this work assumes that the data resides in main memory and is processed offline. However, recent advances in sensor technologies require resource-efficient algorithms that can be implemented directly on the sensors as real-time algorithms. We show how a recently introduced framework for time series classification, time series bitmaps, can be implemented as efficient classifiers which can be updated in constant time and space in the face of very high data arrival rates. We describe results from a case study of an important entomological problem, and further demonstrate the generality of our ideas with an example from robotics.
Trends in Nearest Feature Classification for Face Recognition – Achievements and Perspectives
Face recognition has become one of the most intensively investigated topics in biometrics. Recent and comprehensive surveys found in the literature, such as (Zhao et al., 2003; Ruiz-del-Solar & Navarrete, 2005; Delac & Grgic, 2007), provide a good indication of how active research in this area is. As in other fields of pattern recognition, the

