## Learning from Time Series in the Presence of Noise: Unsupervised and Semi-Supervised Approaches (2008)

### BibTeX

```bibtex
@MISC{Yankov08learningfrom,
  author = {Dragomir Dimitrov Yankov},
  title  = {Learning from Time Series in the Presence of Noise: Unsupervised and Semi-Supervised Approaches},
  year   = {2008}
}
```

### Citations

9021 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context: ...(Figure 4.14, bottom left). For clarity of the discussion we list this formulation, pointing for many of the details and how it is derived naturally from the Structural Risk Minimization principle to [50, 94]:

$$
\min_{w,\,\xi,\,\xi^*,\,b,\,y^*} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i + C^*\sum_{i=1}^{p}\xi_i^* \qquad (4.10)
$$

subject to

$$
\begin{aligned}
y_i\big((w \cdot x_i) + b\big) &\ge 1 - \xi_i, & i &= 1, \dots, m \\
y_i^*\big((w \cdot x_i^*) + b\big) &\ge 1 - \xi_i^*, & i &= 1, \dots, p \\
y_i^* &= \pm 1, \qquad \xi_i, \xi_j^* \ge 0
\end{aligned}
$$

2185 |
Density Estimation for Statistics and Data Analysis
- Silverman
- 1986
Citation Context: ...the distance based outliers [58]. The definition can be generalized further to compute the average distance to all k nearest neighbors, which is in fact the non-parametric density estimation approach [87]. The algorithm proposed in the next sections can easily be adapted to any of these outlier definitions. We use Definition 2.3.1 because of its intuitive interpretation. Our choice is further jus...
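The generalized definition mentioned in the excerpt, scoring each point by its average distance to the k nearest neighbors, can be sketched as follows. This is an illustrative implementation with invented names and toy data, not code from the thesis:

```python
def knn_outlier_scores(points, k):
    """Score each point by the average Euclidean distance to its
    k nearest neighbors (a larger score means more outlying)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:k]) / k)
    return scores

# A tight cluster plus one far-away point: the last point
# should receive by far the largest score.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(data, k=2)
```

Thresholding these scores (or taking the top-n) then recovers the distance-based outlier definitions the excerpt refers to.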

2044 | Online learning with kernels
- Kivinen, Smola, et al.
- 2004
Citation Context: ...significant fraction of randomness added to them, that are the focus of our discussion. Such examples are referred to in the literature, and also in the current text, as pattern noise or simply as noise [18, 83]. In many applications, the noisy examples can poison the learning process and obscure the fact that certain patterns of similarity exist in the data. We study instances of those in Chapter 3 and Chap...

1734 | Mapreduce: Simplified data processing on large clusters
- Dean, Ghemawat
- 2004
Citation Context: ...adoption of these methods across the global data mining community. Recently, however, an intuitive yet extremely scalable framework for parallel data mining has emerged, namely the MapReduce framework [34]. MapReduce is quickly turning into a parallel data mining standard and is already adopted by large companies, such as Google, Yahoo! and Microsoft. The framework operates in two steps. All examples i...
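The two-step flow the excerpt describes can be illustrated with a minimal in-process simulation, a sketch rather than the real distributed API: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step aggregates each group.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Tiny single-machine simulation of the two MapReduce steps."""
    # Map: every record may emit any number of (key, value) pairs.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    # Reduce: aggregate the values collected for each key.
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word-count example.
lines = ["time series data", "time series noise"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(w, 1) for w in line.split()],
    reducer=lambda key, values: sum(values),
)
```

In the real framework the map and reduce invocations run on different machines and the shuffle moves data over the network, but the contract between the two phases is exactly this one.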

1696 |
A global geometric framework for nonlinear dimensionality reduction, Science 290
- Tenenbaum, Silva, et al.
- 2000
Citation Context: ...accurate, object contours. The obscured examples are then seen as widely spread Gaussian noise distributed along the main trajectory of the manifold. Manifold reconstruction techniques, such as Isomap [93] and Locally Linear Embedding (LLE) [79], are known to be highly unstable for such noisy manifolds [9]. We thus derive an improved Isomap method capable of isolating the noisy examples and of follo...

1623 | Nonlinear dimensionality reduction by locally linear embedding, Science 290
- Roweis, Saul
- 2000
Citation Context: ...examples are then seen as widely spread Gaussian noise distributed along the main trajectory of the manifold. Manifold reconstruction techniques, such as Isomap [93] and Locally Linear Embedding (LLE) [79], are known to be highly unstable for such noisy manifolds [9]. We thus derive an improved Isomap method capable of isolating the noisy examples and of following closely the true trajectory of the ...

1255 | Shape matching and object recognition using shape contexts
- Belongie, Puzicha
- 2002
Citation Context: ...on the accurate identification of shapes. Features such as color, texture, positioning etc., though important, are insufficient to convey the information that could be obtained through shape analysis [13, 64, 81, 96]. In this chapter we propose an algorithm for clustering of 2D shapes. The method is invariant to basic geometric transformations, e.g. scale, shift, and most importantly, rotation. It is robust to no...

1107 | On Spectral Clustering: Analysis and an Algorithm
- Ng, Jordan, et al.
- 2001
Citation Context: ...sed approach in perspective with those directions. A number of clustering algorithms have been demonstrated to be particularly suitable for learning of non-convex formations, e.g. spectral clustering [70], spectral graph partitioning [39], or kernel K-means [84]. A close relation between all of these approaches has been pointed out before [20]. We focus on one of these algorithms - spectral cluste...

1059 | Nonlinear component analysis as a kernel eigenvalue problem - Scholkopf, Smola, et al. - 1998

740 | Laplacian eigenmaps for dimensionality reduction and data representation
- Belkin, Niyogi
- 2003
Citation Context: ...ng approaches try to infer the nonlinear structure of the data by considering small regions around each example. Some popular methods following this paradigm are, for example, the Laplacian eigenmaps [11] and Isomap [93]. The general idea behind these algorithms is to compute a neighborhood graph G, where each example xi is connected only to examples in its close proximity. The graph is then augme...
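The neighborhood graph G underlying Isomap and the Laplacian eigenmaps can be sketched as a plain k-nearest-neighbor adjacency list. A pure-Python illustration with helper names of our choosing:

```python
def knn_graph(points, k):
    """Connect each example only to its k nearest neighbors
    (Euclidean distance), returning an adjacency list."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    graph = {}
    for i, p in enumerate(points):
        # All other indices, ordered by distance from point i.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        graph[i] = neighbors[:k]
    return graph

# Three close-by points and one distant point: the distant point
# never appears among the neighbors of the cluster.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0)]
g = knn_graph(points, k=2)
```

Isomap then runs shortest paths on this graph to approximate geodesic distances, while the Laplacian eigenmaps build a graph Laplacian from it; both steps start from exactly this locality structure.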

684 | Transductive inference for text classification using support vector machines
- Joachims
- 1999
Citation Context: ...one of the most popular semi-supervised learning techniques - the transductive support vector machines. Generally considered impractical when more than a few thousand unlabeled examples are available [50], we demonstrate how our constraint support vector clustering can scale them to efficiently handle unlabeled datasets several orders of magnitude larger. 1.3 Contributions: We can summarize the main ...

510 | Estimating the support of a high-dimensional distribution. Neural Comput 2001;13(7):1443–71
- Schölkopf, Platt, et al.
Citation Context: ...e, we refer to one common density support that is met across the entire data space as illustrated in Figure 1.4. The density support is estimated using the so called one-class support vector machines [82, 85]. It computes a global decision function which outlines contours around the dense regions in the space. The density estimation method, however, cannot recognize whether the dense regions are part of ...

424 | Fast subsequence matching in time-series databases
- Faloutsos, Ranganathan, et al.
- 1994
Citation Context: ...d without additional human intervention, but solely based on the representation and the data placement in space, we then say that we have performed unsupervised learning from the time series examples [37, 51, 53, 54]. Often unsupervised time series methods are sufficient to infer the generating distributions. In the presence of high noise rates, however, additional supervision might still be required, despite of ...

407 |
Algebraic connectivity of graphs
- Fiedler
- 1975
Citation Context: ... directions. A number of clustering algorithms have been demonstrated to be particularly suitable for learning of non-convex formations, e.g. spectral clustering [70], spectral graph partitioning [39], or kernel K-means [84]. A close relation between all of these approaches has been pointed out before [20]. We focus on one of these algorithms - spectral clustering. Interestingly, the algorithm sha...

356 | Image retrieval: current techniques, promising directions, and open issues
- Rui, Huang, et al.
- 1999
Citation Context: ...on the accurate identification of shapes. Features such as color, texture, positioning etc., though important, are insufficient to convey the information that could be obtained through shape analysis [13, 64, 81, 96]. In this chapter we propose an algorithm for clustering of 2D shapes. The method is invariant to basic geometric transformations, e.g. scale, shift, and most importantly, rotation. It is robust to no...

305 | LOF: Identifying Density-Based Local Outliers
- Breunig, Kriegel, et al.
Citation Context: ...distance based algorithm is a parallel modification of Bay's randomized nested loop algorithm [10], and the density based version is a modification of the popular local outlier factor (LOF) algorithm [22]. Both parallel variants proceed in a similar fashion: First the data space is partitioned across different computers and outliers, local for each computer, are identified. Subsequently, the results f...

284 |
Semi-Supervised Learning
- Chapelle, Schölkopf, et al.
- 2006
Citation Context: ...m of semi-supervised learning where, apart from the labeled data, we are also given a set of unlabeled examples, whose position in space might be explored in order to improve the inductive learners alone [28, 50]. More formally, the observations $X = \{x_i\}_{i=1}^{n}$ now give rise to two separate sets: a set $L = \{(x_i, y_i)\}_{i=1}^{m}$ of labeled examples, where $y_i = \pm 1$; and a set of unlabeled examples $U = \{x_i^* = x_{m+i}\}_{i=1}^{p}$...

272 | Data structures and algorithms for nearest neighbor search in general metric spaces
- Yianilos
- 1993
Citation Context: ... dataset elements. A number of techniques that utilize the triangle inequality have been proposed over the years, e.g. [24, 41], as well as some popular indexing structures as the Vantage Point trees [115]. Here we show that, provided the inner distance d(·, ·) satisfies the triangle inequality, the rotation distance satisfies it too. Proposition 3.3.1. If the inner distance d(vi, vj) is a pseudo-metri...
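When the distance is a metric, the triangle inequality yields the lower bound d(q, x) ≥ |d(q, p) − d(p, x)| for any pivot p, so a candidate x can be skipped whenever that bound already exceeds the best distance found. A hedged sketch of this pruning rule (our own toy setup, not the thesis's index):

```python
def nn_search_with_pruning(query, pivot, data, dist):
    """1-NN search that prunes candidates via the triangle
    inequality: d(query, x) >= |d(query, pivot) - d(pivot, x)|."""
    dq_pivot = dist(query, pivot)
    best, best_d = None, float("inf")
    pruned = 0
    for x, dx_pivot in data:           # (point, precomputed d(pivot, point))
        if abs(dq_pivot - dx_pivot) >= best_d:
            pruned += 1                # lower bound already too large
            continue
        d = dist(query, x)             # only now pay for a real distance
        if d < best_d:
            best, best_d = x, d
    return best, best_d, pruned

dist = lambda a, b: abs(a - b)         # 1-D metric, for illustration only
pivot = 0.0
points = [1.0, 2.0, 50.0, 51.0]
data = [(p, dist(pivot, p)) for p in points]
best, best_d, pruned = nn_search_with_pruning(1.2, pivot, data, dist)
```

The pruning is exact: every skipped candidate provably cannot beat the current best, which is why establishing that the rotation distance is a (pseudo-)metric matters for search speed.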

259 | Algorithms for mining distance-based outliers in large datasets
- Knorr, Ng
- 1998
Citation Context: ...ast, we can handle a dataset of size two million objects with dimensionality 512 in less than an hour, most of which is I/O time. Distance based outliers are also the problem of study in Knorr et al. [58] and Tao et al. [91]. Both works discuss a quadratic (in the dataset size) nested loop algorithm for outlier detection and subsequently suggest ways for its improvement. Knorr et al. [58] propose an a...

248 |
eds., Time Series Prediction: Forecasting The Future And Understanding The Past
- Weigend, Gershenfeld
- 1993
Citation Context: ...onal embedding in the high dimensional space, and if so, what can this be attributed to? For a number of domains, such as time series produced by dynamical systems, the answer is known to be positive [104]. In Chapter 3, we surprisingly observe that manifold structures also appear in the space defined by a special type of pseudo time series - i.e. time series extracted from shapes of objects. Figure 1...

247 |
Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm
- Borgefors
- 1988
Citation Context: ...asets the (approximate) correct rotation was known. We removed this information by randomly rotating the images. The MixedBag dataset is small enough to run the more computationally expensive Chamfer [19] and Hausdorff [71] distance measures. They achieved an error rate of 6.0% and 7.0% respectively (see also [97]), slightly worse than Euclidean distance. Likewise the Chicken dataset allows us to comp...

226 | The EM algorithm for mixtures of factor analyzers
- Ghahramani, Hinton
- 1996
Citation Context: ...o facilitate this, a local view of the data comes into play. We build a mixture of factor analyzers - Gaussian-like models that can capture noise of different variance along individual dimensions [43]. The analyzers make local decisions of which points constitute noisy examples and do not conform with the rest of the data within a neighborhood. These decisions are then used to regularize the decis...

223 | On the need for time series data mining benchmarks: A survey and empirical demonstration. Data Mining and knowledge discovery - Keogh, Kasetty

194 | Shape distributions
- Osada, Funkhouser, et al.
Citation Context: ...truct a feature vector using only the points from the shape boundary. To obtain better efficiency, certain contour methods extract a very limited number of features that are either rotation invariant [72], or allow a corresponding alignment [17]. Both of the approaches, while suitable for particular settings, do not have good discriminative ability in the presence of noise and distortions [55, 116, 11...

184 | A cost model for nearest neighbor search in highdimensional data space
- Berchtold, Böhm, et al.
- 1997
Citation Context: ...ion (nndd) of the dataset, and more precisely the number of elements that fall in its tail. Computing the nndd, however, is hard, especially in high dimensional spaces as is the case with time series [16, 90]. The available methods require that random portions of the space be sampled and the nearest neighbor distances in those portions computed. Unfortunately, for a robust estimate, this requires s...

174 | Semi-supervised support vector machines
- Bennett, Demiriz
- 1999
Citation Context: ...ll possible ±1 label assignments, which is equivalent to checking $2^p$ combinations. Even for a modest number of unlabeled examples p, this is simply intractable. Mixed-integer programming solutions like [15] are also shown to be extremely inefficient [50]. An approximate solution that scales “up to 100,000 examples in reasonable time” has been suggested in [50]. The solution starts with the inductive sol...

164 | Support vector clustering
- Ben-Hur, Horn, et al.
- 2001
Citation Context: ...ed (see Figure 1.4 right). Manifold reconstruction is not the only unsupervised task that can benefit from the dual view of the data. One-class classification extends naturally into a clustering scheme [14], called support vector clustering (SVC). One-class SVM possesses all the nice theoretical properties attributed to kernel machines for supervised learning, such as provable bounds on the complexity of th...

150 | Clustering with instancelevel constraints
- WAGSTAFF, CARDIE
- 2000
Citation Context: ...nd the accuracy of the similarity measure. Partially labeled examples, for instance, have been shown to improve time series nearest neighbor learners [102], to allow the inference of better clusterings [98], or to help rebuild more accurately the nonlinear subspaces that are occupied by the high dimensional patterns [12]. Here we demonstrate that our unsupervised methods for time series learning in the ...

149 | Variational Inference for Bayesian Mixtures of Factor Analysers
- Ghahramani, Beal
- 2000
Citation Context: ...rch could study the effect of automatically inferring the suitable number of analyzers to be used with the model. Such non-parametric Bayesian extensions of the MFA model have been proposed before in [42], but whether they will be effective and efficient for the purpose of nonlinear manifold reconstruction needs to be further verified. Chapter 5 Conclusion: There is an amazing wealth of domains whe...

148 | Review of shape representation and description techniques
- Zhang, Lu
Citation Context: ... outline the possible shape representation techniques and point out some of their strengths and drawbacks. For more detailed information on the topic, we refer the reader to extensive surveys such as [25, 95, 116]. As outlined by Zhang et al. [116], the representation methods could roughly be divided into contour and region based. Region based methods extract features from the two dimensional image information...

129 |
Some approaches to best-match file searching
- Burkhard, Keller
- 1973
Citation Context: ...cantly decrease the searching time by excluding from consideration many of the dataset elements. A number of techniques that utilize the triangle inequality have been proposed over the years, e.g. [24, 41], as well as some popular indexing structures as the Vantage Point trees [115]. Here we show that, provided the inner distance d(·, ·) satisfies the triangle inequality, the rotation distance satisfie...

128 | Efficient and robust retrieval by shape content through curvature scale space
- Mokhtarian, Abbasi, et al.
- 1997
Citation Context: ...ase of noise and when there is no strong distinction between the existing classes. 3.5.2 Marine creatures dataset: We used the prototype database of marine creatures discussed by Mokhtarian et al. [67]. The images for four classes of different types of fish were selected, with each class containing 50 examples (Figure 3.15). Figure 3.15: Marine creatures dataset: fish shapes - top, their time serie...

122 | Semi-supervised classification by low density separation
- Chapelle, Zien
Citation Context: ...es causing the decision bound to pass through sparser regions of the data space has been observed before and is believed to be the main cause for semi-supervised approaches to achieve better accuracy [29]. The idea that we develop here is sparked by the question: are all unlabeled examples equally important for pushing the decision function away from the dense clusters, and if not, how can we select t...

119 | Probabilistic discovery of time series motifs
- Chiu, Keogh, et al.
- 2003
Citation Context: ...ead to a conclusion that the subsequence C is not a rare example in the database. In these cases, when p1 and p2 are not “significantly” different, the subsequences C and M are called trivial matches [31]. The positions p1 and p2 are significantly different with respect to a distance function Dist, if there exists a subsequence Q starting at position p3, such that p1 < p3 < p2 and Dist(C, M) < Dist(C,...
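The definition in the excerpt translates directly into a predicate: the match between C at p1 and M at p2 is trivial unless some subsequence Q strictly between them is farther from C than M is. A sketch with illustrative names and toy data of our own:

```python
def is_trivial_match(series, w, p1, p2, dist):
    """Check whether length-w subsequences at positions p1 < p2 form
    a trivial match: no in-between subsequence Q satisfies
    Dist(C, M) < Dist(C, Q)."""
    C = series[p1:p1 + w]
    M = series[p2:p2 + w]
    d_cm = dist(C, M)
    for p3 in range(p1 + 1, p2):
        Q = series[p3:p3 + w]
        if d_cm < dist(C, Q):
            return False   # positions are significantly different
    return True

euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Flat series with one spike at position 3: adjacent overlapping
# windows at 0 and 1 are a trivial match, while windows at 0 and 5
# are separated by the very different spike window.
series = [0, 0, 0, 5, 0, 0, 0]
trivial = is_trivial_match(series, 2, 0, 1, euclid)
nontrivial = is_trivial_match(series, 2, 0, 5, euclid)
```

Discovery algorithms use exactly this test to avoid counting a subsequence and its slightly shifted copies as independent matches.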

115 | Content-based image retrieval systems: A survey
- Veltkamp, Tanase
- 2000
Citation Context: ...on the accurate identification of shapes. Features such as color, texture, positioning etc., though important, are insufficient to convey the information that could be obtained through shape analysis [13, 64, 81, 96]. In this chapter we propose an algorithm for clustering of 2D shapes. The method is invariant to basic geometric transformations, e.g. scale, shift, and most importantly, rotation. It is robust to no...

113 | Training with noise is equivalent to Tikhonov regularization
- Bishop
- 1995
Citation Context: ...significant fraction of randomness added to them, that are the focus of our discussion. Such examples are referred to in the literature, and also in the current text, as pattern noise or simply as noise [18, 83]. In many applications, the noisy examples can poison the learning process and obscure the fact that certain patterns of similarity exist in the data. We study instances of those in Chapter 3 and Chap...

112 | Analysis of planar shapes using geodesic paths on shape spaces
- Klassen, Srivastava, et al.
Citation Context: ...101], which renders a computationally intensive method. A potential way to remedy the problem is to consider the spectral information of the extracted time series by applying a Fourier transformation [30, 57, 97]. Charalampidis [30] and Klassen et al. [57] further utilize the transformation in partitioning and hierarchical shape clustering schemes. They demonstrate accuracy in performance, for cases when all ...

105 | Mining distance-based outliers in near linear time with randomization and a simple pruning rule - Bay, Schwabacher - 2003

102 | Support vector method for novelty detection
- Schölkopf, Williamson, et al.
- 1999
Citation Context: ...e, we refer to one common density support that is met across the entire data space as illustrated in Figure 1.4. The density support is estimated using the so called one-class support vector machines [82, 85]. It computes a global decision function which outlines contours around the dense regions in the space. The density estimation method, however, cannot recognize whether the dense regions are part of ...

101 | Automatic Target Recognition by Matching Oriented Edge Pixels
- Olson, Huttenlocher
- 1997
Citation Context: ...ate) correct rotation was known. We removed this information by randomly rotating the images. The MixedBag dataset is small enough to run the more computationally expensive Chamfer [19] and Hausdorff [71] distance measures. They achieved an error rate of 6.0% and 7.0% respectively (see also [97]), slightly worse than Euclidean distance. Likewise the Chicken dataset allows us to compare directly to [68...

83 |
Support vector data description
- Tax, Duin
Citation Context: ...kernel to be provided as an input from the user. The radial basis function $k(x_i, x_j) = e^{-\gamma\|x_i - x_j\|^2}$ has been recognized as a preferred kernel function because of its ability to form closed contours [14, 92]. This means that the user needs to provide a suitable kernel width γ. Small values of γ (i.e. large kernel width) may disguise or merge some of the clusters, while very large γ may create a large num...

78 | Clustering of time-series subsequences is meaningless: Implications for previous and future research
- Keogh, Lin
- 2001
Citation Context: ...d without additional human intervention, but solely based on the representation and the data placement in space, we then say that we have performed unsupervised learning from the time series examples [37, 51, 53, 54]. Often unsupervised time series methods are sufficient to infer the generating distributions. In the presence of high noise rates, however, additional supervision might still be required, despite of ...

77 | Global coordination of local linear models
- Roweis, Saul, et al.
Citation Context: ...uctuations are observed in the data and yet an obvious clustering is available. In this sense, the proposed method is closest in spirit to the manifold reconstruction method proposed by Roweis et al. [80]. They use a mixture of factor analyzers to infer the local structure of the underlying manifold, but then a global constraint is imposed, so that all local models are aligned to follow a consistent t...

63 | Hot sax: Efficiently finding the most unusual time series subsequence
- Keogh, Lin, et al.
Citation Context: ...d without additional human intervention, but solely based on the representation and the data placement in space, we then say that we have performed unsupervised learning from the time series examples [37, 51, 53, 54]. Often unsupervised time series methods are sufficient to infer the generating distributions. In the presence of high noise rates, however, additional supervision might still be required, despite of ...

61 |
The isomap algorithm and topological stability
- Balasubramanian, Schwartz
- 2002
Citation Context: ...along the main trajectory of the manifold. Manifold reconstruction techniques, such as Isomap [93] and Locally Linear Embedding (LLE) [79], are known to be highly unstable for such noisy manifolds [9]. We thus derive an improved Isomap method capable of isolating the noisy examples and of following closely the true trajectory of the manifolds. Chapter 3 also demonstrates how the unsupervised shape...

53 | A unifying theorem for spectral embedding and clustering
- Brand, Huang
- 2003
Citation Context: ...ing of non-convex formations, e.g. spectral clustering [70], spectral graph partitioning [39], or kernel K-means [84]. A close relation between all of these approaches has been pointed out before [20]. We focus on one of these algorithms - spectral clustering. Interestingly, the algorithm shares a lot of commonalities with SVC. They both start by computing a Gaussian kernel matrix, emulating the h...

49 |
On two geometric problems related to the traveling salesman problem
- Papadimitriou, Vazirani
- 1984
Citation Context: ...ng trees. 3.4.3 Degree-bounded Isomap: The degree-k-bounded minimum spanning tree (k-MST) is an approximation of the MST of a connected graph, in which every vertex is allowed to have degree at most k [73]. The problem has emerged in the context of network modeling, where a network with minimum flow is needed but there is a limit imposed on the capacity of flow that can go through each node. In the cas...
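A simple greedy heuristic for a degree-bounded spanning tree is a Kruskal-style pass that skips any edge that would close a cycle or push a vertex past degree k. This is a sketch of the idea only, not the approximation algorithm of [73], and it may fail to span the graph for very small k:

```python
def degree_bounded_tree(n, edges, k):
    """Greedy Kruskal variant: accept cheapest edges that neither
    create a cycle nor push a vertex past degree k. Returns the
    accepted edges; a heuristic, not an optimal k-MST."""
    parent = list(range(n))

    def find(v):                       # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    degree = [0] * n
    tree = []
    for w, u, v in sorted(edges):      # edges given as (weight, u, v)
        ru, rv = find(u), find(v)
        if ru != rv and degree[u] < k and degree[v] < k:
            parent[ru] = rv            # union the two components
            degree[u] += 1
            degree[v] += 1
            tree.append((w, u, v))
    return tree

# Star-shaped costs: vertex 0 is cheapest to reach from everyone,
# but k = 2 forbids the pure star an unconstrained MST would pick.
edges = [(1, 0, 1), (1, 0, 2), (1, 0, 3), (5, 1, 2), (5, 2, 3)]
tree = degree_bounded_tree(4, edges, k=2)
```

On this toy graph the heuristic takes the two cheap edges at vertex 0, then must route vertex 3 through the more expensive edge (5, 2, 3), exactly the capacity trade-off the excerpt describes.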

49 | Statistical shape analysis: Clustering, learning, and testing
- Srivastava, Joshi, et al.
- 2005
Citation Context: ...eduction should be applied. Manifold approaches have been demonstrated to be particularly suitable for projecting image extracted data [21, 79, 88, 93]. In their clustering approach Srivastava et al. [89] also observe the manifold structure of the shape data. The authors implicitly assume a 2D structure for the embedding and build a Markov model to partition the reconstructed 2D surface. Instead, we ...

40 | LB_Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary
- Keogh, Li, et al.
Citation Context: ...“q”. Another challenge in the shape clustering task is introduced by the high dimensionality of the input space. Accurate shape representations generally require selecting a large number of features [55]. Additionally, there is a significant amount of noise in many of the features, which is either related to the complexity of the studied shapes or is accumulated during certain preprocessing steps as i...

40 | An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence Databases
- Kim, Park, et al.
- 2001
Citation Context: ...LB_D (i.e. LB_D(vi, vj) ≤ d(vi, vj), ∀vi, vj ∈ V) which is a metric. For example, for the dynamic time warping (equation (1.3)), such a metric bounding function has been demonstrated to be the LB_Kim [56] lower bound. In general, the tighter the lower bounding metric that we find, the better the pruning capability of any algorithm utilizing the triangle inequality. Next we demonstrate how the obtained...