Results 1–10 of 59
Model selection for probabilistic clustering using cross-validated likelihood
 Statistics and Computing
, 2000
Model-based clustering and visualization of navigation patterns on a web site
 Data Mining and Knowledge Discovery
, 2003
Cited by 68 (0 self)
We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data, and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com.
Keywords: model-based clustering, sequence clustering, data visualization, Internet, web
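The clustering step described above, a mixture of first-order Markov models fitted by EM, can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the WebCANVAS implementation: the function name `fit_markov_mixture`, the seeded random initialization, and the pseudocount `eps` are hypothetical choices. Each session is a list of integer URL-category codes `0..n_states-1`.

```python
import math
import random


def fit_markov_mixture(seqs, n_states, k, n_iter=50, seed=0):
    """EM for a mixture of k first-order Markov chains over symbols 0..n_states-1.

    Returns (weights, init, trans, resp): mixture weights, per-component
    initial-state distributions, per-component transition matrices, and
    resp[i][c], the posterior probability that sequence i belongs to cluster c.
    """
    rng = random.Random(seed)
    eps = 1e-3  # pseudocount that keeps every probability strictly positive

    def normalize(v):
        total = sum(v)
        return [x / total for x in v]

    # Seeded random initialization of each component's parameters.
    weights = [1.0 / k] * k
    init = [normalize([0.5 + rng.random() for _ in range(n_states)]) for _ in range(k)]
    trans = [[normalize([0.5 + rng.random() for _ in range(n_states)])
              for _ in range(n_states)] for _ in range(k)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior cluster membership of each whole sequence.
        resp = []
        for seq in seqs:
            logp = []
            for c in range(k):
                lp = math.log(weights[c]) + math.log(init[c][seq[0]])
                for a, b in zip(seq, seq[1:]):
                    lp += math.log(trans[c][a][b])
                logp.append(lp)
            top = max(logp)  # subtract the max before exp for numerical stability
            resp.append(normalize([math.exp(x - top) for x in logp]))
        # M-step: responsibility-weighted counts of first states and transitions.
        weights = normalize([sum(r[c] for r in resp) + eps for c in range(k)])
        for c in range(k):
            init_counts = [eps] * n_states
            trans_counts = [[eps] * n_states for _ in range(n_states)]
            for seq, r in zip(seqs, resp):
                init_counts[seq[0]] += r[c]
                for a, b in zip(seq, seq[1:]):
                    trans_counts[a][b] += r[c]
            init[c] = normalize(init_counts)
            trans[c] = [normalize(row) for row in trans_counts]
    return weights, init, trans, resp
```

Each EM iteration visits every transition once per component, so its cost grows linearly with the number of clusters and with total session length, which matches the scaling the abstract reports. A session's cluster is the index of the largest entry in its `resp` row.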
Discovery of Climate Indices Using Clustering
 In Proc. of the 9th ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining
, 2003
Cited by 33 (11 self)
To analyze the effect of the oceans and atmosphere on land climate, Earth scientists have developed climate indices, which are time series that summarize the behavior of selected regions of the Earth’s oceans and atmosphere. In the past, Earth scientists have used observation and, more recently, eigenvalue analysis techniques, such as principal components analysis (PCA) and singular value decomposition (SVD), to discover climate indices. However, eigenvalue techniques are only useful for finding a few of the strongest signals. Furthermore, they impose a condition that all discovered signals must be orthogonal to each other, making it difficult to attach a physical interpretation to them. This paper presents an alternative clustering-based methodology for the discovery of climate indices that overcomes these limitations and is based on clusters that represent regions with relatively homogeneous behavior. The centroids of these clusters are time series that summarize the behavior of the ocean or atmosphere in those regions. Some of these centroids correspond to known climate indices and provide a validation of our methodology; other centroids are variants of known indices that may provide better predictive power for some land areas; and still other indices may represent potentially new Earth science phenomena. Finally, we show that cluster-based indices generally outperform SVD-derived indices, both in terms of area-weighted correlation and direct correlation with the known indices.
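The idea of using cluster centroids as candidate indices can be illustrated with a small sketch. It substitutes plain k-means for the paper's clustering algorithm (which this abstract does not name), uses a deterministic farthest-point initialization for reproducibility, and the names `cluster_indices` and `correlation` are hypothetical. Series are standardized first, so the squared Euclidean distance between two series of length n equals 2n(1 - r), where r is their correlation; the clustering therefore effectively groups grid points by correlation.

```python
import math


def standardize(x):
    """Zero-mean, unit-variance copy of a time series (assumes it is not constant)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / sd for v in x]


def correlation(x, y):
    """Pearson correlation of two equal-length time series."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def sqdist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def cluster_indices(series, k, n_iter=20):
    """k-means on standardized grid-point time series.

    Returns (labels, centroids); each centroid is the mean standardized time
    series of one cluster, i.e. a candidate index for that region.
    """
    data = [standardize(s) for s in series]
    # Deterministic farthest-point initialization (keeps the sketch reproducible).
    centroids = [data[0]]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda s: min(sqdist(s, c) for c in centroids)))
    labels = []
    for _ in range(n_iter):
        # Assign each series to its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(s, centroids[c])) for s in data]
        # Recompute centroids; keep the old centroid if a cluster empties out.
        for c in range(k):
            members = [s for s, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids
```

A centroid can then be compared with a known index through `correlation(centroid, known_index)`; the area weighting used in the paper's comparison is omitted here.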
Multilevel Regression Modeling of Nonlinear Processes: Derivation and Applications to Climatic Variability
 Journal of Climate
, 2004
Cited by 24 (12 self)
Predictive models are constructed to best describe an observed field’s statistics within a given class of nonlinear dynamics driven by a spatially coherent noise that is white in time. For linear dynamics, such inverse stochastic models are obtained by multiple linear regression (MLR). Nonlinear dynamics, when more appropriate, is accommodated by applying multiple polynomial regression (MPR) instead; the resulting model uses polynomial predictors, but the dependence on the regression parameters is linear in both MPR and MLR. The basic concepts are illustrated using the Lorenz convection model, the classical double-well problem, and a three-well problem in two space dimensions. Given a sufficiently long data sample, MPR successfully reconstructs the model coefficients in the former two cases, while the resulting inverse model captures the three-regime structure of the system’s probability density function (PDF) in the latter case. A novel multilevel generalization of the classic regression procedure is introduced next. In this generalization, the residual stochastic forcing at a given level is subsequently modeled as a function of variables at this level and all the preceding ones. The number of levels is determined so that the lag-0 covariance of the residual forcing converges to a constant matrix, while its lag-1 covariance vanishes. This method has been applied to the output of a three-layer, quasi-geostrophic model and to the analysis of Northern Hemisphere wintertime geopotential height anomalies. In both cases, the inverse model simulations reproduce well the multiregime structure of the PDF constructed in the subspace spanned by the dataset’s leading empirical orthogonal functions, as well as the detailed spectrum of the dataset’s temporal evolution. These encouraging results are interpreted in terms of the modeled low-frequency flow’s feedback on the statistics of the subgrid-scale processes.
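Because MPR is linear in its regression parameters, fitting it reduces to ordinary least squares on polynomial features. A minimal sketch for one scalar variable, assuming the tendency dx/dt has already been estimated from the data (for example by finite differences); `fit_mpr` and `solve` are hypothetical helper names, and neither the stochastic forcing nor the multilevel residual step is modeled.

```python
def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mpr(xs, tendencies, degree):
    """Least-squares fit of dx/dt ~ c0 + c1*x + ... + c_degree*x**degree.

    The predictors are polynomial in x, but the model is linear in the
    coefficients, so the normal equations (A^T A) c = A^T y give the fit.
    """
    p = degree + 1
    feats = [[x ** q for q in range(p)] for x in xs]
    ata = [[sum(f[i] * f[j] for f in feats) for j in range(p)] for i in range(p)]
    aty = [sum(f[i] * y for f, y in zip(feats, tendencies)) for i in range(p)]
    return solve(ata, aty)
```

For the double-well drift dx/dt = x - x³ sampled on a grid of x values, `fit_mpr(xs, [x - x**3 for x in xs], degree=3)` should recover coefficients close to `[0, 1, 0, -1]` (constant term first).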
Weather Regimes and Preferred Transition Paths in a Three-Level . . .
, 2003
Cited by 21 (15 self)
Multiple flow regimes are re-examined in a global, three-level, quasi-geostrophic model with realistic topography in spherical geometry. This QG3 model, using a T21 triangular truncation in the horizontal, has a fairly realistic climatology for Northern Hemisphere winter and exhibits multiple regimes that resemble those found in atmospheric observations. Four regimes are robust to changes in the classification method (k-means vs. mixture modeling) and in its parameters. These regimes correspond roughly to opposite phases of the Arctic Oscillation (AO) and the North Atlantic Oscillation (NAO), respectively. The Markov ...
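The abstract breaks off where it turns to a Markov-chain description of regime transitions. A common way to estimate such a chain is to count transitions between successive regime labels; the sketch below assumes one regime label per day (for example from k-means) and is not taken from the paper. `transition_matrix` is a hypothetical name.

```python
from collections import Counter


def transition_matrix(labels, n_regimes):
    """Row-stochastic matrix P[i][j] = estimated P(regime j tomorrow | regime i today).

    Rows for regimes that never occur (except possibly on the last day) are all zeros.
    """
    counts = Counter(zip(labels, labels[1:]))  # counts of consecutive (today, tomorrow) pairs
    matrix = []
    for i in range(n_regimes):
        row_total = sum(counts[(i, j)] for j in range(n_regimes))
        matrix.append([counts[(i, j)] / row_total if row_total else 0.0
                       for j in range(n_regimes)])
    return matrix
```

Off-diagonal entries that are larger than expected by chance would indicate preferred transition paths; testing that significance (for example against shuffled label sequences) is not shown.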
A Boolean delay equation model of colliding cascades. Part II: Prediction of critical transitions
, 2003
Cited by 20 (9 self)
We consider a prominent feature of hierarchical nonlinear (“complex”) systems: persistent recurrence of abrupt overall changes, called here “critical transitions.” Motivated by the earthquake prediction problem, we formulate a model that uses heuristic constraints taken from the dynamics of seismicity. Our conclusions, though, may apply to hierarchical systems that arise in other areas. We use the Boolean delay equation (BDE) framework to model the dynamics of colliding cascades, in which a direct cascade of loading interacts with an inverse cascade of failures. The elementary interactions of elements in the system are replaced by their integral effect, represented by the delayed switching of an element’s state. The present paper is the first of two on the BDE approach to modeling seismicity. Its major results are the following: (i) a model that implements the approach; (ii) simulation of three basic types of seismic regime; and (iii) a study of regime switching in a parameter space of the loading and healing rates. The second paper focuses on the earthquake prediction problem.
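In a Boolean delay equation each Boolean variable is a function of other variables evaluated at fixed past times, x_i(t) = f_i(x_j(t - θ_ij), ...). The toy simulator below, on an integer time grid, only illustrates that update rule; it is not the colliding-cascades model, and `simulate_bde` and its rule format are assumptions.

```python
def simulate_bde(rules, history, steps):
    """Iterate Boolean delay equations on an integer time grid.

    rules[i] = (func, [(j, delay), ...]) defines x_i(t) = func(x_j(t - delay), ...).
    history[i] holds x_i at the initial times; all histories must have the same
    length, and every delay must lie between 1 and that length.
    """
    state = [list(h) for h in history]
    start = len(state[0])
    for t in range(start, start + steps):
        # Evaluate all rules from past values before appending, so updates are simultaneous.
        new_values = [func(*(state[j][t - d] for j, d in inputs)) for func, inputs in rules]
        for i, value in enumerate(new_values):
            state[i].append(value)
    return state
```

With x0(t) = x1(t - 2) and x1(t) = not x0(t - 3), the pair oscillates: x0(t) = not x0(t - 5), so x0 has period 10 once the initial history has worked through.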
Cluster analysis of typhoon tracks. Part I: General properties
 J. Climate
, 2007
Cited by 18 (5 self)
A new probabilistic clustering technique, based on a regression mixture model, is used to describe tropical cyclone trajectories in the western North Pacific. Each component of the mixture model consists of a quadratic regression curve of cyclone position against time. The best-track 1950–2002 dataset is described by seven distinct clusters. These clusters are then analyzed in terms of genesis location, trajectory, landfall, intensity, and seasonality. Both genesis location and trajectory play important roles in defining the clusters. Several distinct types of straight-moving, as well as recurving, trajectories are identified, thus enriching the main straight-moving/recurving distinction found in previous studies. Intensity and seasonality of cyclones, though not used by the clustering algorithm, are both highly stratified from cluster to cluster. Three straight-moving trajectory types have very small within-cluster spread, while the recurving types are more diffuse. Tropical cyclone landfalls over East and Southeast Asia are found to be strongly cluster dependent, both in terms of frequency and region of impact. The relationships of each cluster type with the large-scale circulation, sea surface temperatures, and the ...
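A regression mixture assigns each whole trajectory to one of k polynomial curves of position against time, with EM alternating between weighted least squares and posterior membership. The sketch below treats a single coordinate (real tracks have two), uses quadratic curves with Gaussian noise, and starts from a deterministic index-based assignment; `fit_regression_mixture` and these choices are assumptions, not the authors' code.

```python
import math


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poly(t, beta):
    return sum(b * t ** p for p, b in enumerate(beta))


def fit_regression_mixture(tracks, k, degree=2, n_iter=30):
    """EM for a mixture of polynomial regressions of position on time.

    tracks: list of trajectories, each a list of (t, x) pairs (one coordinate).
    Returns (weights, betas, variances, resp).
    """
    n, p = len(tracks), degree + 1
    # Hard initial assignment by index; real analyses would use several restarts.
    resp = [[1.0 if i % k == c else 0.0 for c in range(k)] for i in range(n)]
    for _ in range(n_iter):
        # M-step: weighted least squares and noise variance for each component.
        weights, betas, variances = [], [], []
        for c in range(k):
            ata = [[1e-9 if i == j else 0.0 for j in range(p)] for i in range(p)]  # tiny ridge
            aty = [0.0] * p
            for track, r in zip(tracks, resp):
                for t, x in track:
                    f = [t ** q for q in range(p)]
                    for i in range(p):
                        aty[i] += r[c] * f[i] * x
                        for j in range(p):
                            ata[i][j] += r[c] * f[i] * f[j]
            beta = solve(ata, aty)
            sse = sum(r[c] * (x - poly(t, beta)) ** 2
                      for track, r in zip(tracks, resp) for t, x in track)
            npts = sum(r[c] * len(track) for track, r in zip(tracks, resp))
            betas.append(beta)
            variances.append(max(sse / max(npts, 1e-12), 1e-6))
            weights.append(max(sum(r[c] for r in resp) / n, 1e-12))
        # E-step: each whole track gets one posterior over components.
        resp = []
        for track in tracks:
            logp = []
            for c in range(k):
                lp = math.log(weights[c])
                for t, x in track:
                    resid = x - poly(t, betas[c])
                    lp -= 0.5 * math.log(2 * math.pi * variances[c]) + resid ** 2 / (2 * variances[c])
                logp.append(lp)
            top = max(logp)
            e = [math.exp(v - top) for v in logp]
            resp.append([v / sum(e) for v in e])
    return weights, betas, variances, resp
```

In practice longitude and latitude would be modeled jointly, and EM would be restarted from several initializations, keeping the highest-likelihood fit.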
Multiple regimes and low-frequency oscillations in the Northern Hemisphere’s zonal-mean flow
 J. Atmos. Sci.
, 2006
Cited by 16 (11 self)
This paper studies multiple regimes and low-frequency oscillations in the Northern Hemisphere zonal-mean zonal flow in winter, using 55 years of daily observational data. The probability density function estimated in the phase space spanned by the two leading empirical orthogonal functions exhibits two distinct, statistically significant maxima. The two regimes associated with these maxima describe persistent zonal-flow states that are characterized by meridional displacements of the midlatitude jet, poleward and equatorward of its time-mean position. The geopotential height anomalies of either regime have a pronounced zonally symmetric component, but the largest-amplitude anomalies are located over the Atlantic and Pacific oceans. High-frequency synoptic transients participate in the maintenance of, and transitions between, these regimes. Significant oscillatory components with periods of 147 and 72 days are identified by spectral analysis of the zonal-flow time series. These oscillations are described by singular spectrum analysis and the multitaper method. The 147-day oscillation involves zonal-flow anomalies that propagate poleward, while the 72-day oscillation manifests northward propagation only in the Atlantic sector. Both modes mainly describe changes in the midlatitude-jet position and intensity. In the horizontal plane, though, the two modes exhibit synchronous centers of action located over the Atlantic and Pacific oceans. The two persistent flow regimes are associated with slow phases of either oscillation.
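The regime analysis above works in the phase space of the two leading empirical orthogonal functions (EOFs). One way to obtain EOFs and their principal-component (PC) time series is power iteration with deflation on the spatial covariance matrix, shown below as a stand-in for an SVD; `leading_eofs` is a hypothetical name, and the density estimate in the (PC1, PC2) plane that would follow is not shown.

```python
import math


def leading_eofs(snapshots, n_modes=2, n_iter=300):
    """Leading EOFs of a (time x space) data set via power iteration with deflation.

    snapshots: list of time samples, each a list of grid-point values.
    Returns (eofs, pcs): eofs[m] is a unit-length spatial pattern (sign arbitrary)
    and pcs[m] the corresponding principal-component time series.
    """
    n_t, n_x = len(snapshots), len(snapshots[0])
    means = [sum(s[j] for s in snapshots) / n_t for j in range(n_x)]
    anoms = [[s[j] - means[j] for j in range(n_x)] for s in snapshots]
    cov = [[sum(a[i] * a[j] for a in anoms) / (n_t - 1) for j in range(n_x)]
           for i in range(n_x)]
    eofs = []
    for _ in range(n_modes):
        v = [1.0 / (j + 1) for j in range(n_x)]  # arbitrary non-degenerate start vector
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(n_x)) for i in range(n_x)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(n_x) for j in range(n_x))
        eofs.append(v)
        # Remove the mode just found so the next power iteration finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(n_x)] for i in range(n_x)]
    pcs = [[sum(a[j] * e[j] for j in range(n_x)) for a in anoms] for e in eofs]
    return eofs, pcs
```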
Probabilistic Clustering of Extratropical Cyclones Using Regression Mixture Models
 Climate Dynamics
, 2006
Cited by 15 (5 self)
A probabilistic clustering technique is developed for classification of wintertime extratropical cyclone (ETC) tracks over the North Atlantic. We use a regression mixture model to describe the longitude–time and latitude–time propagation of the ETCs. A simple tracking algorithm is applied to 6-hourly mean sea-level pressure fields to obtain the tracks from either a general circulation model (GCM) or a reanalysis data set. Quadratic curves are found to provide the best description of the data. We select a three-cluster classification for both data sets, based on a mix of objective and subjective criteria. The track orientations in each of the clusters are broadly similar for the GCM and reanalyzed data; they are characterized by predominantly south-to-north (S–N), west-to-east (W–E), and southwest-to-northeast (SW–NE) tracking cyclones, respectively. The reanalysis cyclone tracks, however, are found to be much more tightly clustered geographically than those of the GCM. For the reanalysis data, a link is found between the occurrence of cyclones belonging to different clusters of trajectory shape and the phase of the North Atlantic Oscillation (NAO). The positive ...
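The abstract does not describe its tracking algorithm beyond calling it simple. A hypothetical nearest-neighbour version is sketched below: find pressure minima on each 6-hourly field and extend each active track to the closest minimum within `max_step` grid cells. The pressure threshold, the 8-neighbour minimum test, and all names are assumptions, not the paper's method.

```python
def local_minima(field, threshold):
    """Interior grid points below `threshold` and strictly lower than all 8 neighbours."""
    mins = []
    for i in range(1, len(field) - 1):
        for j in range(1, len(field[0]) - 1):
            v = field[i][j]
            neighbours = [field[i + di][j + dj]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1) if di or dj]
            if v < threshold and all(v < n for n in neighbours):
                mins.append((i, j))
    return mins


def track_minima(fields, threshold, max_step):
    """Link minima across successive fields into tracks (lists of (row, col))."""
    tracks, active = [], []
    for field in fields:
        unused = set(local_minima(field, threshold))
        still_active = []
        for track in active:
            last = track[-1]
            # Candidates within max_step cells (Chebyshev distance), in a fixed order.
            candidates = [c for c in sorted(unused)
                          if max(abs(c[0] - last[0]), abs(c[1] - last[1])) <= max_step]
            if candidates:
                nearest = min(candidates,
                              key=lambda c: (c[0] - last[0]) ** 2 + (c[1] - last[1]) ** 2)
                track.append(nearest)
                unused.remove(nearest)
                still_active.append(track)
        # Minima not claimed by an existing track start new tracks.
        for c in sorted(unused):
            new_track = [c]
            tracks.append(new_track)
            still_active.append(new_track)
        active = still_active
    return tracks
```

Tracks produced this way, converted to (time, longitude) and (time, latitude) series, are the kind of input a regression mixture model expects.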
Subseasonal-to-interdecadal variability of the Australian monsoon over north Queensland
 Quart. J. Royal Meteor. Soc.
, 2006
Cited by 14 (10 self)
Daily rainfall occurrence and amount at 11 stations over North Queensland are examined during summer 1958–1997, using a Hidden Markov Model (HMM). Daily rainfall variability is described in terms of the occurrence of five discrete “weather states,” identified by the HMM. Three states are characterized respectively by very wet, moderately wet, and dry conditions at most stations; two states have enhanced rainfall along the coast and dry conditions inland. Each HMM rainfall state is associated with a distinct atmospheric circulation regime. The two wet states are accompanied by monsoonal circulation patterns, with large-scale ascent, low-level inflow from the northwest, and a phase reversal with height; the dry state is characterized by circulation anomalies of the opposite sense. Two of the states show significant associations with midlatitude synoptic waves. Variability of the monsoon on time scales from subseasonal to interdecadal is interpreted in terms of changes in the frequency of occurrence of the five HMM rainfall states. Large subseasonal variability is identified in terms of active and break phases, and a highly variable monsoon onset date. The occurrence of the very-wet and dry states is somewhat modulated by the Madden–Julian oscillation. On interannual time scales, there are clear relationships with the El Niño–Southern Oscillation and Indian Ocean sea surface temperatures. Interdecadal monsoonal variability is characterized by stronger monsoons during the 1970s, and weaker monsoons plus an increased prevalence of drier states since then.
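Once an HMM has been fitted, the most likely sequence of daily weather states can be decoded with the Viterbi algorithm. This sketch uses rainfall occurrence only, with stations assumed conditionally independent given the state and all parameters supplied by hand; the paper's model also covers rainfall amounts, and `viterbi` and its argument layout are assumptions. Probabilities must lie strictly between 0 and 1 because logarithms are taken.

```python
import math


def viterbi(obs, init, trans, rain_prob):
    """Most likely hidden-state path for daily multi-station rain occurrence.

    obs: list of days, each a tuple of 0/1 rain occurrence per station.
    init[s], trans[s][s2]: initial and transition probabilities.
    rain_prob[s][k]: P(rain at station k | state s); stations are treated as
    conditionally independent given the state.
    """
    n_states = len(init)

    def log_emit(s, day):
        return sum(math.log(p if r else 1.0 - p) for p, r in zip(rain_prob[s], day))

    score = [math.log(init[s]) + log_emit(s, obs[0]) for s in range(n_states)]
    back = []  # back[t][s2]: best predecessor of state s2 on day t + 1
    for day in obs[1:]:
        ptr, new = [], []
        for s2 in range(n_states):
            best = max(range(n_states), key=lambda s: score[s] + math.log(trans[s][s2]))
            ptr.append(best)
            new.append(score[best] + math.log(trans[best][s2]) + log_emit(s2, day))
        back.append(ptr)
        score = new
    # Trace the best final state back through the stored pointers.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```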