Mixtures of ARMA Models for Model-Based Time Series Clustering
In Proceedings of the IEEE International Conference on Data Mining, 2002
Cited by 16 (1 self)
Clustering problems are central to many knowledge discovery and data mining tasks. However, most existing clustering methods can only work with fixed-dimensional representations of data patterns. In this paper, we study the clustering of data patterns that are represented as sequences or time series possibly of different lengths. We propose a model-based approach to this problem using mixtures of autoregressive moving average (ARMA) models. We derive an expectation-maximization (EM) algorithm for learning the mixing coefficients as well as the parameters of the component models. The algorithm can determine the number of clusters in the data automatically. Experiments were conducted on a number of simulated and real datasets. Results from the experiments show that our method compares favorably with another method recently proposed by others for similar time series clustering problems.
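The EM idea in the abstract above can be illustrated with a much smaller model. The following is a minimal sketch, not the paper's algorithm: it assumes AR(1) components instead of full ARMA models, a conditional Gaussian likelihood, and a fixed number of clusters given by the length of `phi_init` (the automatic choice of the number of clusters is not attempted). The names `simulate_ar1` and `em_ar1_mixture` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def simulate_ar1(phi, n, rng):
    """Draw x_t = phi * x_{t-1} + e_t with unit-variance Gaussian noise."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def em_ar1_mixture(series, phi_init, n_iter=30):
    """EM for a mixture of AR(1) models (assumed simplification of ARMA).

    `series` is a list of sequences that may differ in length.
    Returns mixing weights, AR coefficients, noise variances, and the
    per-series responsibilities from the last E-step.
    """
    K = len(phi_init)
    phi = list(phi_init)
    sigma2 = [1.0] * K
    weights = [1.0 / K] * K
    # Per-series sufficient statistics: sum x_t*x_{t-1}, sum x_{t-1}^2,
    # sum x_t^2, and the number of transitions.
    stats = []
    for x in series:
        sxy = sum(a * b for a, b in zip(x[1:], x[:-1]))
        sxx = sum(b * b for b in x[:-1])
        syy = sum(a * a for a in x[1:])
        stats.append((sxy, sxx, syy, len(x) - 1))
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each series.
        resp = []
        for sxy, sxx, syy, n in stats:
            logp = []
            for k in range(K):
                sse = syy - 2.0 * phi[k] * sxy + phi[k] ** 2 * sxx
                logp.append(math.log(weights[k])
                            - 0.5 * n * math.log(2.0 * math.pi * sigma2[k])
                            - 0.5 * sse / sigma2[k])
            top = max(logp)  # subtract the max for numerical stability
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: responsibility-weighted least squares per component.
        for k in range(K):
            r = [row[k] for row in resp]
            weights[k] = max(sum(r) / len(series), 1e-12)
            rxy = sum(ri * s[0] for ri, s in zip(r, stats))
            rxx = sum(ri * s[1] for ri, s in zip(r, stats))
            if rxx > 0.0:
                phi[k] = rxy / rxx
            sse = sum(ri * (s[2] - 2.0 * phi[k] * s[0] + phi[k] ** 2 * s[1])
                      for ri, s in zip(r, stats))
            rn = sum(ri * s[3] for ri, s in zip(r, stats))
            if rn > 0.0:
                sigma2[k] = max(sse / rn, 1e-8)
    return weights, phi, sigma2, resp
```

Because each series contributes only through its sufficient statistics, sequences of different lengths are handled without padding or truncation, which is the property the abstract emphasizes.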
Data Mining for Web Personalization
The Adaptive Web: Methods and Strategies of Web Personalization. Lecture, 2006
Cited by 13 (0 self)
In this chapter we present an overview of the Web personalization process viewed as an application of data mining requiring support for all the phases of a typical data mining cycle. These phases include data collection and preprocessing, pattern discovery and evaluation, and finally applying the discovered knowledge in real time to mediate between the user and the Web. This view of the personalization process provides added flexibility in leveraging multiple data sources and in effectively using the discovered models in an automatic personalization system. The chapter provides a detailed discussion of a host of activities and techniques used at different stages of this cycle, including the preprocessing and integration of data from multiple sources, as well as pattern discovery techniques that are typically applied to this data. We consider a number of classes of data mining algorithms used particularly for Web personalization, including techniques based on clustering, association rule discovery, sequential pattern mining, Markov models, and probabilistic mixture and hidden (latent) variable models. Finally, we discuss hybrid data mining frameworks that leverage data from a variety
An Overview of Web Data Clustering Practices
Proceedings of the 9th International Conference on Extending Database Technology - EDBT’04, Springer-LNCS 3268, 2004
Cited by 9 (0 self)
Clustering is a challenging topic in the area of Web data management. Various
Model-Based Cluster Analysis for Web Users Sessions
In Proceedings of the 15th International Symposium on Methodologies for Intelligent Systems (ISMIS 2005), 2005
Cited by 5 (2 self)
One of the main issues in Web usage mining is the discovery of patterns in the navigational behavior of Web users. Standard approaches, such as clustering of users' sessions and discovering association rules or frequent navigational paths, do not generally make it possible to characterize or quantify the unobservable factors that lead to common navigational patterns. Therefore, it is necessary to develop techniques that can discover hidden and useful relationships among users as well as between users and Web objects. Correspondence Analysis (CO-AN) is particularly useful in this context, since it can uncover meaningful associations among users and pages. We present
Scalable model-based cluster analysis using clustering features
Pattern Recognition, 2005
Cited by 4 (4 self)
We present two scalable model-based clustering systems based on a Gaussian mixture model with independent attributes within clusters. They first summarize data into sub-clusters, and then generate Gaussian mixtures from their clustering features using a new algorithm, EMACF. EMACF approximates the aggregate behavior of each sub-cluster of data items in the Gaussian mixture model. It provably converges. The experiments show that our clustering systems run one or two orders of magnitude faster than the traditional EM algorithm with little loss of accuracy.
A probabilistic validation algorithm for Web users’ clusters
In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC), 2004
Cited by 3 (1 self)
Cluster analysis is one of the most important aspects in the data mining process for discovering groups and identifying interesting distributions or patterns over the considered data sets. In the context of Web data mining, model-based clustering algorithms are often used to cluster similar users’ sessions in order to determine Website access behaviors. An important issue in cluster analysis is the evaluation of clustering results to find the partitioning that best fits the underlying data. In this paper, we present a novel validation technique for model-based clustering approaches.
Quicklink Selection for Navigational Query Results
Cited by 3 (0 self)
Quicklinks for a website are navigational shortcuts displayed below the website homepage on a search results page that let users jump directly to selected points inside the website. Since the real estate on a search results page is constrained and valuable, picking the best set of quicklinks to maximize the benefits for a majority of the users becomes an important problem for search engines. Using user browsing trails obtained from browser toolbars, and a simple probabilistic model, we formulate the quicklink selection problem as a combinatorial optimization problem. We first demonstrate the hardness of the objective, and then propose an algorithm that is provably within a factor of (1 − 1/e) of the optimal. We also propose a different algorithm that works on trees and that can find the optimal solution; unlike the previous algorithm, this algorithm can incorporate natural constraints on the set of chosen quicklinks. The efficacy of our methods is demonstrated via empirical results on both a manually labeled set of websites and a set for which quicklink click-through rates for several webpages were obtained from a real-world search engine.
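A (1 − 1/e) guarantee is the well-known bound for greedy maximization of a monotone submodular function under a cardinality constraint. The abstract does not say that the paper's algorithm is this greedy procedure, so the following is only a sketch under assumptions: each browsing trail is modeled as a set of page identifiers, a trail counts as "served" if at least one chosen quicklink appears in it, and the function names `greedy_quicklinks` and `served_trails` are hypothetical.

```python
from typing import List, Set


def served_trails(trails: List[Set[str]], chosen: List[str]) -> int:
    """Count trails that contain at least one chosen quicklink."""
    picked = set(chosen)
    return sum(1 for trail in trails if trail & picked)


def greedy_quicklinks(trails: List[Set[str]], k: int) -> List[str]:
    """Pick up to k pages, each time taking the page with the largest
    marginal gain in served trails (assumed coverage objective)."""
    candidates = sorted(set().union(*trails))  # sorted for deterministic ties
    uncovered = list(range(len(trails)))
    chosen: List[str] = []
    for _ in range(k):
        best_page, best_gain = None, 0
        for page in candidates:
            if page in chosen:
                continue
            gain = sum(1 for i in uncovered if page in trails[i])
            if gain > best_gain:
                best_page, best_gain = page, gain
        if best_page is None:
            break  # no remaining page serves any uncovered trail
        chosen.append(best_page)
        uncovered = [i for i in uncovered if best_page not in trails[i]]
    return chosen
```

For example, with trails `[{"a", "b"}, {"a", "c"}, {"b"}, {"d"}]` and `k = 2`, the first pick is a page that appears in two trails, and the second pick adds one more served trail. The coverage objective is monotone and submodular, which is what makes the greedy bound apply to this simplified setting.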
Robust Process Discovery with Artificial Negative Events
The Journal of Machine Learning Research
Cited by 3 (0 self)
Process discovery is the automated construction of structured process models from information system event logs. Such event logs often contain positive examples only. Without negative examples, it is a challenge to strike the right balance between recall and specificity, and to deal with problems such as expressiveness, noise, incomplete event logs, or the inclusion of prior knowledge. In this paper, we present a configurable technique that deals with these challenges by representing process discovery as a multi-relational classification problem on event logs supplemented with Artificially Generated Negative Events (AGNEs). This problem formulation allows using learning algorithms and evaluation techniques that are well known in the machine learning community. Moreover, it allows users to have declarative control over the inductive bias and language bias.
Validation and interpretation of Web users’ sessions clusters
2007
Cited by 3 (0 self)
Understanding users’ navigation on the Web is important for improving the quality of information and the speed of accessing large-scale Web data sources. Clustering of users’ navigation into sessions has been proposed in order to identify patterns and similarities which are then managed in the context of Web-user-oriented applications (searching, e-commerce, etc.). This paper deals with the problem of assessing the quality of user session clusters in order to make inferences regarding the users’ navigation behavior. A common model-based clustering algorithm is used to produce clusters of Web users’ sessions. These clusters are validated by using a statistical test, which measures the distances between the clusters’ distributions to infer their dissimilarity and level of distinction. Furthermore, a visualization method is proposed in order to interpret the relation between clusters. Using real data sets, we illustrate how the proposed analysis can be applied in popular application scenarios to reveal valuable associations among Web users’ navigation sessions.
Web usage mining. Structuring semantically enriched clickstream data
2004
Cited by 2 (1 self)
Web servers worldwide generate a vast amount of information on web users’ browsing activities. Several researchers have studied these so-called clickstream or web access log data to better understand and characterize web users. Clickstream data can be enriched with information about the content of visited pages and the origin (e.g., geographic, organizational) of the requests. The goal of this project is to analyse user behaviour by mining enriched web access log data. We discuss techniques and processes required for preparing, structuring and enriching web access logs. Furthermore, we present several web usage mining methods for extracting useful features. Finally, we employ all these techniques to cluster the users of the domain www.cs.vu.nl and to study their behaviours comprehensively. The contributions of this thesis are a data enrichment that is content and origin based and a tree-like visualization of frequent navigational sequences. This visualization allows for an easily interpretable tree-like view of patterns with highlighted relevant information. The results of this project can be applied to diverse purposes, including marketing, web content

