CLICKS: An Effective Algorithm for Mining Subspace Clusters in Categorical Datasets
, 2005
Cited by 17 (4 self)
We present a novel algorithm called Clicks that finds clusters in categorical datasets based on a search for k-partite maximal cliques. Unlike previous methods, Clicks mines subspace clusters. It uses a selective vertical method to guarantee complete search. Clicks outperforms previous approaches by over an order of magnitude and scales better than any existing method for high-dimensional datasets. These results are demonstrated in a comprehensive performance study on real and synthetic datasets.
CLICKS: Mining Subspace Clusters in Categorical Data Via k-Partite Maximal Cliques
Cited by 11 (0 self)
We present a novel algorithm called CLICKS that finds clusters in categorical datasets based on a search for k-partite maximal cliques. Unlike previous methods, CLICKS mines subspace clusters. It uses a selective vertical method to guarantee complete search. CLICKS outperforms previous approaches by over an order of magnitude and scales better than any existing method for high-dimensional datasets. We demonstrate this improvement in an excerpt from our comprehensive performance studies.
Research Paper Recommender Systems: A Subspace Clustering Approach
 IN INTERNATIONAL CONFERENCE ON WEB-AGE INFORMATION MANAGEMENT (WAIM
, 2005
Cited by 9 (0 self)
Researchers from the same lab often spend a considerable amount of time searching for published articles relevant to their current project. Despite having similar interests, they conduct independent, time-consuming searches. While they may share the results afterwards, they are unable to leverage previous search results during the search process. We propose a research paper recommender system that avoids such time-consuming searches by augmenting existing search engines with recommendations based on previous searches performed by others in the lab. Most existing recommender systems were developed for commercial domains with millions of users. The research paper domain has relatively few users compared to the large number of online research papers. The two major challenges with this type of data are the large number of dimensions and the sparseness of the data. The novel contribution of the paper is a scalable subspace clustering algorithm (SCuBA) that tackles these problems. Both synthetic and benchmark datasets are used to evaluate the clustering algorithm and to demonstrate that it performs better than traditional collaborative filtering approaches when recommending research papers.
A subspace clustering framework for research group collaboration
 International Journal of Information Technology and Web Engineering
, 2006
Cited by 8 (1 self)
Researchers spend considerable time searching for relevant papers on the topic in which they are currently interested. Often, despite having similar interests, researchers in the same lab do not find it convenient to share the results of bibliographic searches and thus conduct independent, time-consuming searches. Research paper recommender systems can help researchers avoid such time-consuming searches by allowing each of them to automatically take advantage of previous searches performed by others in the lab. Existing recommender systems were developed for commercial domains, where they steer users toward products of interest. Unlike those domains, the research paper domain has relatively few users compared with the huge number of research papers. In this paper we present a novel system that recommends relevant research papers to a user based on the user’s recent querying and browsing habits. The core of the system is a scalable subspace clustering algorithm (SCuBA) that performs well on the sparse, high-dimensional data collected in this domain. Both synthetic and benchmark datasets are used to evaluate the recommendation system and to demonstrate that it performs better than traditional collaborative filtering approaches when recommending research papers.
Hierarchical density-based clustering of categorical data and a simplification
 In: Proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2007), Springer LNCS 4426/2007
, 2007
Cited by 4 (1 self)
A challenge involved in applying density-based clustering to categorical datasets is that the ‘cube’ of attribute values has no ordering defined. We propose the HIERDENC algorithm for hierarchical density-based clustering of categorical data. HIERDENC offers a basis for designing simpler clustering algorithms that balance the trade-off between accuracy and speed. The characteristics of HIERDENC include: (i) it builds a hierarchy representing the underlying cluster structure of the categorical dataset, (ii) it minimizes the user-specified input parameters, (iii) it is insensitive to the order of object input, and (iv) it can handle outliers. We evaluate HIERDENC on small-dimensional standard categorical datasets, on which it produces more accurate results than other algorithms. We present a faster simplification of HIERDENC called the MULIC algorithm. MULIC performs better than subspace clustering algorithms in terms of finding the multi-layered structure of special datasets.
HACS: Heuristic Algorithm for Clustering Subsets
The term consideration set is used in marketing to refer to the set of items a customer thought about purchasing before making a choice. While consideration sets are not directly observable, finding common ones is useful for market segmentation and choice prediction. We approach the problem of inducing common consideration sets as a clustering problem on the space of possible item subsets. Our algorithm combines ideas from binary clustering and itemset mining, and differs from other clustering methods by reflecting the inherent structure of subset clusters. Experiments on both real and simulated datasets show that our algorithm clusters effectively and efficiently even for sparse datasets. In addition, a novel evaluation method is developed to compare clusters found by our algorithm with known ones.
REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE BY RESEARCH (COMPUTER SCIENCES) Under the Guidance of
, 2006
The process of grouping similar objects in a given dataset is known as clustering. A large variety of clustering algorithms have been proposed to find clusters in a given dataset. Not many real-life datasets are available for testing the proposed algorithms. Moreover, the existing datasets do not come with the actual clustering result. This leads to the idea of generating benchmark datasets with high dimensionality and noise, which can be used to evaluate clustering algorithms on aspects such as scalability, accuracy, and robustness to noise. We first propose a few algorithms and methodologies that generate high-dimensional cluster datasets in R^d space along with the original clustering results. We developed a toolkit called SynDECA [1] that generates synthetic datasets based on the proposed algorithms. Given user inputs such as the number of clusters, the dimensionality, the maximum value of a dimension, and the size of the dataset, SynDECA generates the clustering dataset. The proposed methods ensure that the dataset contains exactly the requested number of clusters. Traditional clustering algorithms try to find clusters in all dimensions of the dataset. When the dimensionality of the dataset increases, some dimensions could be irrelevant for a few data points. There could be clusters that are spread over a subset of the dimensions of the dataset; these clusters may not be visible when seen in all the dimensions of
On finding k-cliques in k-partite graphs
In this paper, a branch-and-bound algorithm for finding all cliques of size k in a k-partite graph is proposed that improves upon the method of Grunert et al. (2002). The new algorithm uses bit-vectors, or bitsets, as the main data structure in bit-parallel operations. Bitsets enable a new form of data representation that improves branching and backtracking of the branch-and-bound procedure. Numerical studies on randomly generated instances of k-partite graphs demonstrate the competitiveness of the developed method.
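For readers unfamiliar with the problem, the following is a minimal sketch of enumerating k-cliques in a k-partite graph with bitsets. It is not the algorithm from the paper above or from Grunert et al.; it is a plain backtracking search with one simple bound, and it assumes vertices are small non-negative integers so that a Python int can serve as a bitset. The function name `k_cliques_kpartite` and its parameters are hypothetical, chosen only for this illustration.

```python
from typing import Dict, List, Set, Tuple


def k_cliques_kpartite(
    parts: List[List[int]], edges: Set[Tuple[int, int]]
) -> List[Tuple[int, ...]]:
    """Enumerate all cliques that take exactly one vertex from each part.

    Assumes the parts are disjoint and every edge endpoint appears in a part.
    """
    # Adjacency bitsets: bit v of adj[u] is set iff u and v are adjacent.
    adj: Dict[int, int] = {v: 0 for part in parts for v in part}
    for u, v in edges:
        adj[u] |= 1 << v
        adj[v] |= 1 << u

    # One bitmask per part, plus the mask of all vertices.
    part_mask: List[int] = []
    all_mask = 0
    for part in parts:
        mask = 0
        for v in part:
            mask |= 1 << v
        part_mask.append(mask)
        all_mask |= mask

    results: List[Tuple[int, ...]] = []

    def extend(level: int, chosen: List[int], candidates: int) -> None:
        if level == len(parts):
            results.append(tuple(chosen))
            return
        # Bound: prune if some remaining part has no surviving candidate.
        for j in range(level, len(parts)):
            if candidates & part_mask[j] == 0:
                return
        avail = candidates & part_mask[level]
        while avail:
            low = avail & -avail          # isolate the lowest set bit
            v = low.bit_length() - 1      # index of that bit = vertex id
            avail ^= low
            chosen.append(v)
            # Keep only candidates adjacent to every chosen vertex.
            extend(level + 1, chosen, candidates & adj[v])
            chosen.pop()

    extend(0, [], all_mask)
    return results
```

Intersecting the candidate set with an adjacency bitset is a single integer AND, which is the kind of bit-parallel operation the abstract refers to; the pruning rule used here is only one simple choice of bound.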