Results 1 - 8 of 8
Hybrid Genetic Algorithms are Better for Spatial Clustering
- PRICAI 2000 Topics in Artificial Intelligence. 6th Pacific Rim International Conference on Artificial Intelligence
, 1998
Cited by 6 (3 self)
Iterative methods and genetic algorithms have been used separately to minimise the loss function of many representative-based clustering formulations. Neither of them alone seems to be significantly better. Moreover, the trade-off of effort vs quality slightly favours gradient descent. We present a unifying view for the three most popular loss functions: least sum of squares, its fuzzy version and the log likelihood function. We identify commonalities in gradient descent algorithms for the three loss functions and the evaluation of the loss function itself. We can then construct hybrids (genetic algorithms with a mutation operation that performs a few gradient descent steps) for all three clustering approaches. We demonstrate that these hybrids are much more efficient and effective (they deliver significantly better performance when normalised by the number of function evaluations). Keywords. Evolutionary computation, knowledge discovery in databases, statistical approaches, clustering...
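The hybrid idea in this abstract (a genetic algorithm whose mutation runs a few local descent steps) can be illustrated with a minimal sketch for the least-sum-of-squares loss only. This is not the authors' implementation: the function names, the uniform crossover, the elitist selection, the use of k-means-style refinement steps as the local descent, and all parameter defaults are assumptions made for illustration.

```python
import random

def sse(points, centers):
    """Least-sum-of-squares loss: squared distance of each point to its nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centers)
        for pt in points
    )

def lloyd_step(points, centers):
    """One local refinement step: assign points to the nearest center, then recompute means."""
    groups = [[] for _ in centers]
    for pt in points:
        nearest = min(
            range(len(centers)),
            key=lambda i: sum((p - c) ** 2 for p, c in zip(pt, centers[i])),
        )
        groups[nearest].append(pt)
    new_centers = []
    for ctr, grp in zip(centers, groups):
        if grp:
            new_centers.append(tuple(sum(col) / len(grp) for col in zip(*grp)))
        else:
            new_centers.append(ctr)  # an empty cluster keeps its old center
    return new_centers

def hybrid_ga(points, k, pop_size=8, generations=10, local_steps=2, seed=0):
    """Hypothetical hybrid: elitist GA whose mutation is a few local refinement steps."""
    rng = random.Random(seed)
    # Each individual is a list of k centers, initialised from distinct data points.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: sse(points, c))
        parents = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: take each center from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Hybrid mutation: a few descent steps instead of random perturbation.
            for _ in range(local_steps):
                child = lloyd_step(points, child)
            children.append(child)
        population = parents + children
    return min(population, key=lambda c: sse(points, c))
```

Calling `hybrid_ga(points, k)` on a list of coordinate tuples returns the best set of `k` centers found. Because each refinement step cannot increase the loss, every child is at least as good as the raw crossover result it started from.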
Fast Randomized Algorithms for Robust Estimation of Location
Cited by 2 (2 self)
A fundamental procedure appearing within such clustering methods as k-Means, Expectation Maximization, Fuzzy-C-Means and Minimum Message Length is that of computing estimators of location. Most estimators of location exhibiting useful robustness properties require at least quadratic time to compute, far too slow for large data mining applications. In this paper, we propose O(Dn√n)-time randomized algorithms for computing robust estimators of location, where n is the size of the data set, and D is the dimension. Keywords: clustering, spatial data mining, robust statistics, location. 1 Introduction When analyzing large sets of spatial information (both 2-dimensional and higher-dimensional), classical multivariate statistical procedures such as variable standardization, multivariate studentizing, outlier detection, discriminant analysis, principal components, factor analysis, structural models and canonical correlations all require that the center and scatter of a cloud of...
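To make the cost concern in this abstract concrete, the sketch below contrasts the mean with one familiar robust estimator of location, the medoid, computed by brute force in time quadratic in n. It does not reproduce the randomized algorithms the paper proposes; the choice of the medoid and the function names are assumptions made for illustration.

```python
import math

def mean(points):
    """Coordinate-wise mean: linear time, but a single outlier can drag it far away."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def medoid(points):
    """Data point minimising the total Euclidean distance to all points.

    Brute force: n candidates times n distances of dimension D each,
    which is the quadratic cost that becomes prohibitive for large data sets.
    """
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))
```

On a small cloud with one distant outlier, the medoid stays inside the cloud while the mean is pulled toward the outlier, at the price of comparing every pair of points.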
Clustering Items in Different Data Sources Induced by Stability
, 2007
Cited by 2 (2 self)
Abstract: Many multi-branch companies transact from different branches. Each branch of such a company maintains a separate database over time. The variation of sales of an item over time is an important issue. Thus, we introduce the notion of stability of an item. Stable items are useful in making many strategic decisions for a company. Based on the degree of stability of an item, we design an algorithm for clustering items in different data sources. We propose the notion of a best cluster by considering the average degree of variation of a class. We also design an alternative algorithm to find the best cluster among items in different data sources. Experimental results are provided on three transactional databases.
Web Sessions Clustering With Artificial Ants Colonies
, 2003
Cited by 2 (0 self)
In this paper, we present AntClust, an ant-based clustering algorithm, and its application to the Web usage mining problem. We define a Web session as a weighted multi-modal vector and we also develop a similarity measure between two sessions. We show that the partitions found by AntClust are stable on a data set made of real sessions extracted from a Web site of the University of Tours. Contrary to some other studies, we do not only consider the transactions model to describe the sessions. We show that our algorithm performs well and is able to find non-noisy clusters when dealing with sessions defined by a vector containing the number of hits recorded for each of the Web pages.
"... Abstract: We present an algorithm for Evolutionary Clustering with Self Adaptive Genetic Operators (ECSAGO). This ..."
Chapter 15 CLUSTERING METHODS
This chapter presents a tutorial overview of the main clustering methods used in Data Mining. The goal is to provide a self-contained review of the concepts and the mathematics underlying clustering techniques. The chapter begins by providing measures and criteria that are used for determining whether two objects are similar or dissimilar. Then the clustering methods are presented, divided into: hierarchical, partitioning, density-based, model-based, grid-based, and soft-computing methods. Following the methods, the challenges of performing clustering in large data sets are discussed. Finally, the chapter presents how to determine the number of clusters. Keywords: Clustering, K-means, Intra-cluster homogeneity, Inter-cluster separability.
Comparing Language Similarity across Genetic and Typologically-Based Groupings
Recent studies have shown the potential benefits of leveraging resources for resource-rich languages to build tools for similar, but resource-poor languages. We examine what constitutes “similarity” by comparing traditional phylogenetic language groups, which are motivated largely by genetic relationships, with language groupings formed by clustering methods using typological features only. Using data from the World Atlas of Language Structures (WALS), our preliminary experiments show that typologically-based clusters look quite different from genetic groups, but perform as well or better when used to predict feature values of member languages.
Mining and Analysis of Clickstream Patterns
Abstract. The explosive growth of the web has drastically changed the way in which information is managed and accessed. The large scale of web data sources and the wide availability of services over the internet have increased the need for effective web data mining techniques and mechanisms. A sophisticated method to organize the layout of the information and assist user navigation is therefore particularly important. In this work, we focus on web usage mining, applying data mining techniques to web server logs. Web usage mining is the non-trivial process of distinguishing implicit, previously unknown but potentially useful clickstream patterns that may exist in any collection of web access logs. The required abstraction can be generated by clustering the web access logs based on some sort of similarity measure. Clustering is done such that the web access logs within the same group or cluster are more similar to each other than to data points from different clusters. In this chapter, we propose a partitional algorithm, namely the Multi Pass Combined Standard Deviation (CSD) Means algorithm, which automatically generates the optimum number of clusters from the web clickstream patterns. The quality of the clusters obtained is compared with that of the K-Means algorithm, the Rough K-Means algorithm and the model-based algorithms ANTCLUST and ACCANTCLUST. The experimental analysis of mined clickstream patterns shows the effectiveness of the proposed algorithm.

