Results 1 - 9 of 9
Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms
, 2003
Cited by 48 (2 self)
Many clustering and segmentation algorithms suffer from the limitation that the number of clusters/segments is specified by a human user. It is often impractical to expect a human with sufficient domain knowledge to be available to select the number of clusters/segments to return. In this paper, we investigate techniques to determine the number of clusters or segments to return from hierarchical clustering and segmentation algorithms. We propose an efficient algorithm, the L method, that finds the “knee” in a ‘# of clusters vs. clustering evaluation metric’ graph. Using the knee is well-known, but is not a particularly well-understood method to determine the number of clusters. We explore the feasibility of this method, and attempt to determine in which situations it will and will not work. We also compare the L method to existing methods based on the accuracy of the number of clusters that are determined and on efficiency. Our results show favorable performance for these criteria compared to the existing methods that were evaluated.
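The abstract above does not say how the knee is located. A minimal sketch of one common way to do it follows: try every split of the evaluation graph into a left and a right part, fit a straight line to each part, and keep the split whose two fits have the lowest combined error. The function names `fit_line_rmse` and `find_knee`, and the weighting of each side by its share of the points, are assumptions made for this illustration and should not be read as the authors' exact procedure.

```python
def fit_line_rmse(xs, ys):
    """Fit a least-squares line to (xs, ys) and return the RMSE of the fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return (sse / n) ** 0.5


def find_knee(xs, ys):
    """Return the x value that ends the left line of the best two-line fit.

    xs: numbers of clusters in increasing order (hypothetical input layout).
    ys: the evaluation metric measured at each number of clusters.
    """
    n = len(xs)
    if n < 4:
        raise ValueError("need at least four points to fit two lines")
    best_x, best_err = None, float("inf")
    # Each side keeps at least two points so that a line is defined.
    for c in range(2, n - 1):
        left = fit_line_rmse(xs[:c], ys[:c])
        right = fit_line_rmse(xs[c:], ys[c:])
        # Assumed weighting: each side counts in proportion to its size.
        err = (c / n) * left + ((n - c) / n) * right
        if err < best_err:
            best_x, best_err = xs[c - 1], err
    return best_x


if __name__ == "__main__":
    clusters = list(range(1, 11))
    metric = [100, 70, 40, 10, 5, 4, 3, 2, 1, 0]
    print(find_knee(clusters, metric))
```

On this toy graph the metric falls steeply for the first four points and slowly afterwards, so the split after the fourth point is the only one that both lines fit exactly.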
A better alternative to piecewise linear time series segmentation
, 2006
Cited by 9 (3 self)
Time series are difficult to monitor, summarize and predict. Segmentation organizes time series into a few intervals having uniform characteristics (flatness, linearity, modality, monotonicity and so on). For scalability, we require fast linear-time algorithms. The popular piecewise linear model can determine where the data goes up or down and at what rate. Unfortunately, when the data does not follow a linear model, the computation of the local slope creates overfitting. We propose an adaptive time series model where the polynomial degree of each interval varies (constant, linear and so on). Given a number of regressors, the cost of each interval is its polynomial degree: constant intervals cost 1 regressor, linear intervals cost 2 regressors, and so on. Our goal is to minimize the Euclidean (l2) error for a given model complexity. Experimentally, we investigate the model where intervals can be either constant or linear. Over synthetic random walks, historical stock market prices, and electrocardiograms, the adaptive model provides a more accurate segmentation than the piecewise linear model without increasing the cross-validation error or the running time, while providing a richer vocabulary to applications. Implementation issues, such as numerical stability and real-world performance, are discussed.
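The cost model in this abstract (1 regressor per constant interval, 2 per linear interval, minimize the squared error under a regressor budget) can be illustrated with a small dynamic program. This is only a sketch under stated assumptions: it is a brute-force search that is far slower than the fast algorithms the abstract calls for, and the names `sse_constant`, `sse_linear` and `adaptive_segmentation` are hypothetical.

```python
import math


def sse_constant(ys):
    """Squared error of the best constant fit (the mean)."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def sse_linear(ys):
    """Squared error of the best least-squares line over indices 0..n-1."""
    n = len(ys)
    if n < 3:
        return 0.0  # one or two points are fitted exactly by a line
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return sum((y - (mean_y + slope * (x - mean_x))) ** 2
               for x, y in enumerate(ys))


def adaptive_segmentation(ys, budget):
    """Return (error, segments) using at most `budget` regressors.

    Each segment is (start, end, degree) with end exclusive; degree 0 is a
    constant interval (cost 1) and degree 1 a linear interval (cost 2).
    """
    n = len(ys)
    if n == 0 or budget < 1:
        raise ValueError("need a non-empty series and a budget of at least 1")
    # best[b][j]: lowest error covering ys[:j] with at most b regressors.
    best = [[0.0] + [math.inf] * n for _ in range(budget + 1)]
    choice = [[None] * (n + 1) for _ in range(budget + 1)]
    for b in range(1, budget + 1):
        for j in range(1, n + 1):
            for i in range(j):
                seg = ys[i:j]
                options = [(best[b - 1][i] + sse_constant(seg), 0)]
                if b >= 2:
                    options.append((best[b - 2][i] + sse_linear(seg), 1))
                for err, degree in options:
                    if err < best[b][j]:
                        best[b][j] = err
                        choice[b][j] = (i, degree)
    # Walk the stored choices backwards to recover the intervals.
    segments = []
    b, j = budget, n
    while j > 0:
        i, degree = choice[b][j]
        segments.append((i, j, degree))
        b -= degree + 1
        j = i
    segments.reverse()
    return best[budget][n], segments


if __name__ == "__main__":
    series = [5, 5, 5, 5, 0, 1, 2, 3]
    print(adaptive_segmentation(series, budget=3))
```

With a budget of 3 regressors the toy series is covered exactly by one constant interval followed by one linear interval, which is the kind of mixed description a purely piecewise linear model cannot give at the same cost.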
Learning states and rules for detecting anomalies in time series
 Applied Intelligence
Cited by 7 (1 self)
The normal operation of a device can be characterized in different temporal states. To identify these states, we introduce a segmentation algorithm called Gecko that can determine a reasonable number of segments using our proposed L method. We then use the RIPPER classification algorithm to describe these states in logical rules. Finally, transitional logic between the states is added to create a finite state automaton. Our empirical results, on data obtained from the NASA shuttle program, indicate that the Gecko segmentation algorithm is comparable to a human expert in identifying states, and our L method performs better than the existing permutation tests method when determining the number of segments to return in segmentation algorithms. Empirical results have also shown that our overall system can track normal behavior and detect anomalies.
Fuzzy Clustering Based Segmentation of Time-Series
 Lecture Notes in Computer Science
, 2003
Cited by 6 (1 self)
The segmentation of time-series is a constrained clustering problem: the data points should be grouped by their similarity, but with the constraint that all points in a cluster must come from successive time points. The changes of the variables of a time-series are usually vague and are not focused on any particular time point. Therefore it is not practical to define crisp bounds of the segments. Although fuzzy clustering algorithms are widely used to group overlapping and vague objects, they cannot be directly applied to time-series segmentation. This paper proposes a clustering algorithm for the simultaneous identification of fuzzy sets which represent the segments in time and the local PCA models used to measure the homogeneity of the segments. The algorithm is applied to the monitoring of the production of high-density polyethylene.
Monitoring Process Transitions by Kalman Filtering and Time-Series Segmentation
 Computers and Chemical Engineering, Edited by Ana Paula Barbosa-Póvoa & Henrique A. Matos
Cited by 1 (0 self)
The analysis of historical process data of technological systems plays an important role in process monitoring, modelling and control. Time-series segmentation algorithms are often used to detect homogeneous periods of operation based on input-output process data. However, historical process data alone may not be sufficient for the monitoring of complex processes. This paper incorporates the first-principle model of the process into the segmentation algorithm. The key idea is to use a model-based nonlinear state-estimation algorithm to detect the changes in the correlation among the state-variables. The homogeneity of the time-series segments is measured using a PCA similarity factor calculated from the covariance matrices given by the state-estimation algorithm. The whole approach is applied to the monitoring of an industrial high-density polyethylene plant.
Time Series Change Detection using Segmentation: A Case Study for Land Cover Monitoring
Automatic identification of changes in land cover from remote sensing data is a critical aspect of monitoring the planet’s ecosystems. We use time series segmentation methodology for detecting land cover changes from a Moderate Resolution Imaging Spectroradiometer-based vegetation index. In this paper, we investigate segmentation scores based on the difference between models and propose two approaches for normalizing the difference-based score. The first approach uses permutation testing to assign a p-value to the model difference. The second approach builds on the bootstrapping methodology used in statistics, which estimates the null distribution of complex statistics whose standard errors are not analytically derivable by generating alternative versions of the data with a resampling strategy. More specifically, given a time series with either a single segment or two segments, we propose a method to estimate the distribution of the model difference statistic for each segment. The proposed approach allows normalizing the model difference statistic when complex models are being used in the segmentation algorithm. We study the strengths and weaknesses of the two normalizing approaches in the context of characteristics of land cover data such as seasonality and noise, using synthetic and real data sets. We show that the relative performance of the normalization approaches can vary significantly depending on the characteristics of the data. We illustrate the utility of these approaches for detection of deforestation in Mato Grosso (Brazil).
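The first normalizing approach, assigning a p-value to a model difference by permutation testing, can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not the paper's procedure: each segment's "model" is just its mean, the split point is given, and values are shuffled freely, which treats the observations as exchangeable and therefore ignores the seasonality the abstract mentions. The names `mean_difference` and `permutation_p_value` are hypothetical.

```python
import random


def mean_difference(series, split):
    """Absolute difference between the means of the two segments."""
    left, right = series[:split], series[split:]
    return abs(sum(left) / len(left) - sum(right) / len(right))


def permutation_p_value(series, split, n_permutations=999, seed=0):
    """Estimate how often a shuffled series shows a difference this large.

    Returns (1 + hits) / (1 + n_permutations), where hits counts shuffles
    whose statistic is at least the observed one.
    """
    if not 0 < split < len(series):
        raise ValueError("split must leave both segments non-empty")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = mean_difference(series, split)
    shuffled = list(series)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if mean_difference(shuffled, split) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_permutations)


if __name__ == "__main__":
    changed = [0.0] * 20 + [10.0] * 20   # abrupt level shift at index 20
    unchanged = [3.0] * 40               # no change at all
    print(permutation_p_value(changed, 20))
    print(permutation_p_value(unchanged, 20))
```

A small p-value means that few shuffles reproduce a difference as large as the observed one, so scores from different locations or models can be compared on a common scale.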
SIGNAL SEGMENTATION AND MODELLING BASED ON EQUIPARTITION PRINCIPLE
In this paper, we propose a method for time interval segmentation of signals based on an EquiPartition principle (EP). According to EP, the signal is segmented into segments that give equal errors in reconstruction, selecting the most suitable model to describe each segment. Moreover, the segments are equivalent in the content domain, since the signal is segmented into segments that are modelled by the same number of coefficients. The proposed method has been successfully applied to different types of signals, such as physiological, speech, human motion, and financial time series. Finally, the proposed methodology is very flexible with respect to changes in error criteria, signal modelling, and signal dimension, yielding a robust method for segmentation and modelling of signals. Index Terms: Signal segmentation, signal modelling, time-frequency analysis, equipartition.