Genres In Formation? An Exploratory Study of Web Pages using Cluster Analysis
 Proceedings of the 8th Annual Colloquium for the UK Special Interest Group for Computational Linguistics (CLUK)
, 2005
Cited by 11 (3 self)
The Web is a new, large and heterogeneous community where interaction among users and the possibilities offered by technology may modify existing genres or create new ones. In fact, most genres borrowed from the paper world have undergone adjustments when moving on to the Web (for instance, online newspapers and online manuals). There is also a family of genres that have been created specifically for the Web, e.g. home pages, splash screens, newsletters, hotlists. Besides these, are there other emerging genres on the Web for which a genre label has not been coined yet? Is it possible to capture genres in formation in an automated way? An experiment using cluster analysis has been set up to provide initial answers to these questions. Results show that the main clusters have a shape which is quite well-defined and show a number of regularities. Interestingly, Web pages appear to have been clustered according to their rhetorical/discoursal types (informational, instructional, argumentative, etc.), rather than genre classes (e.g. sermons and editorials, both argumentative, belong to the same cluster). The perception of rhetorical/discoursal types in Web pages has been confirmed by a small-scale Web user study.
Evolving Fuzzy Decision Trees with Genetic Programming and Clustering
 In Lecture Notes in Computer Science
, 2002
Cited by 10 (4 self)
In this paper we present a new fuzzy decision tree representation for data classification using genetic programming. The new fuzzy representation utilizes fuzzy clusters for handling continuous attributes.
A genetic algorithm using hyper-quadtrees for low-dimensional k-means clustering
 IEEE Trans. on Pattern Analysis and Machine Intelligence
, 2006
Cited by 9 (1 self)
The k-means algorithm is widely used for clustering because of its computational efficiency. Given n points in d-dimensional space and the number of desired clusters k, k-means seeks a set of k cluster centers so as to minimize the sum of the squared Euclidean distances between each point and its nearest cluster center. However, the algorithm is very sensitive to the initial selection of centers and is likely to converge to partitions that are significantly inferior to the global optimum. We present a genetic algorithm (GA) for evolving centers in the k-means algorithm that simultaneously identifies good partitions for a range of values around a specified k. The set of centers is represented using a hyper-quadtree constructed on the data. This representation is exploited in our GA to generate an initial population of good centers and to support a novel crossover operation that selectively passes good subsets of neighboring centers from parents to offspring by swapping subtrees. Experimental results indicate that our GA finds the global optimum for data sets with known optima and finds good solutions for large simulated data sets. Index Terms: k-means algorithm, clustering, genetic algorithms, quadtrees, optimal partition, center selection.
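As a rough illustration of the objective and the initialization sensitivity described in this abstract (it is not the authors' genetic algorithm), the sketch below computes the sum of squared Euclidean distances to the nearest center and runs plain Lloyd-style k-means iterations from two different starting points. The function names `kmeans_cost` and `lloyd` and the toy data are assumptions made for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_cost(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def lloyd(points, centers, iters=100):
    """Plain k-means iterations starting from the given initial centers."""
    centers = [tuple(map(float, c)) for c in centers]
    for _ in range(iters):
        # Assignment step: each point joins the group of its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its old center).
        new_centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

# Hypothetical toy data: two tight pairs of points, far apart horizontally.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = lloyd(points, [(0, 0.5), (10, 0.5)])  # one center per pair
bad = lloyd(points, [(5, 0), (5, 1)])        # stuck splitting top vs. bottom
```

On this toy data the second run converges immediately to a partition whose cost is much higher than the first, which is the kind of inferior local optimum the abstract refers to.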
Kboost: A Scalable Algorithm for High Quality Clustering of Microarray Gene Expression Data
 TR IIT2007015, Istituto di Informatica e Telematica del CNR
, 2007
Cited by 9 (4 self)
We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment the well-known furthest-point-first algorithm for k-center clustering in metric spaces with a filtering scheme based on the triangular inequality. We apply this algorithm to Web snippet clustering, comparing it against strong baselines consisting of recent, fast variants of the classical k-means iterative algorithm. Our main conclusion is that our method attains solutions of better or comparable accuracy, and does this within a fraction of the time required by the baselines. Our algorithm is thus valuable when, as in Web snippet clustering, either the real-time nature of the task or the large amount of data makes the poorly scalable, traditional clustering methods unsuitable.
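The furthest-point-first heuristic mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the skip test shown is one common way to use the triangle inequality (if the new center is at least twice as far from a point's current center as the point is, the point cannot be closer to the new center), and whether it matches the paper's filtering scheme is an assumption. The function name and toy data are hypothetical.

```python
import math

def furthest_point_first(points, k, dist=math.dist):
    """Greedy k-center heuristic with a triangle-inequality skip test.

    Returns (centers, assign): indices of the chosen center points, and for
    each point the position in `centers` of the center it is assigned to.
    """
    centers = [0]  # arbitrary first center: the point at index 0
    # nearest[i] = distance from point i to its currently assigned center
    nearest = [dist(p, points[0]) for p in points]
    assign = [0] * len(points)
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(range(len(points)), key=nearest.__getitem__)
        new_idx = len(centers)
        # Distances from the new center to every existing center.
        to_old = [dist(points[far], points[c]) for c in centers]
        centers.append(far)
        for i, p in enumerate(points):
            # d(p, new) >= d(old, new) - d(p, old), so if
            # d(old, new) >= 2 * d(p, old) the new center cannot be closer.
            if to_old[assign[i]] >= 2 * nearest[i]:
                continue
            d = dist(p, points[far])
            if d < nearest[i]:
                nearest[i], assign[i] = d, new_idx
    return centers, assign

# Hypothetical toy data on a line.
pts = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
centers, assign = furthest_point_first(pts, k=3)
```

The skip test only avoids distance computations; the clustering it returns is the same as that of the unfiltered greedy procedure.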
Real-time compression for dynamic 3D environments
 in MULTIMEDIA ’03: Proceedings of the eleventh ACM international conference on Multimedia
, 2003
Cited by 8 (2 self)
The goal of tele-immersion has long been to enable people at remote locations to share a sense of presence. A tele-immersion system acquires the 3D representation of a collaborator’s environment remotely and sends it over the network where it is rendered in the user’s environment. Acquisition, reconstruction, transmission, and rendering all have to be done in real-time to create a sense of presence. With added commodity hardware resources, parallelism can increase the acquisition volume and reconstruction data quality while maintaining real-time performance. However, this is not as easy for rendering since all of the data need to be combined into a single display. In this paper we present an algorithm to compress data from such 3D environments in real-time to solve this imbalance. We expect the compression algorithm to scale comparably to the acquisition and reconstruction, reduce network transmission bandwidth, and reduce the rendering requirement for real-time performance. We have tested the algorithm using a synthetic office data set and have achieved a 5 to 1 compression for 22 depth streams.
Improvements to the scalability of multiobjective clustering
 In Proceedings of the 2005 IEEE Congress on Evolutionary Computation
, 2005
Cited by 8 (4 self)
In previous work, we have proposed a novel approach to data clustering based on the explicit optimization of a partitioning with respect to two complementary clustering objectives [4, 5, 6]. In a comparison to alternative clustering techniques, the approach showed a high performance in terms of its capability to deal with a range of difficult data properties, including overlapping clusters, elongated cluster shapes and unequally sized clusters. In this paper, we make three modifications to the algorithm that improve its scalability to large data sets with high dimensionality and large numbers of clusters. Specifically, we introduce new initialization and mutation schemes that enable a more efficient exploration of the search space, and modify the null data model that is used as a basis for selecting the most significant solution from the Pareto front. The high performance of the resulting algorithm is demonstrated on a newly developed clustering test suite.
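For readers unfamiliar with the term, a Pareto front under two objectives can be illustrated with a small filter. This sketch assumes both objectives are minimized and uses made-up score pairs; the abstract does not name the two objectives here, and the null-model step that selects one solution from the front is not shown.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the solutions that no other solution dominates (minimization)."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]

# Hypothetical (objective_1, objective_2) scores for five candidate partitionings.
candidates = [(1.0, 9.0), (2.0, 4.0), (3.0, 5.0), (6.0, 1.0), (7.0, 2.0)]
front = pareto_front(candidates)
```

Here (3.0, 5.0) and (7.0, 2.0) are dropped because another candidate is at least as good on both scores; the remaining three represent different trade-offs between the two objectives.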
Cluster generation and labeling for web snippets: A fast, accurate hierarchical solution
 Journal of Internet Mathematics
, 2006
Cited by 5 (0 self)
This paper describes Armil, a meta-search engine that groups the web snippets returned by auxiliary search engines into disjoint labeled clusters. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to his/her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labeling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and they use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labeling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted “external” metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labeling algorithms. On a standard desktop PC (AMD Athlon 1 GHz clock with 750 Mbytes RAM), Armil performs clustering and labeling altogether of up to 200 snippets in less than one second.
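The labeling step relies on a variant of information gain that the abstract does not spell out. As background only, the textbook information gain of a term with respect to membership in one cluster can be sketched as below; the function names, the four-count interface and the example counts are assumptions for illustration, not Armil's actual measure.

```python
import math

def entropy(pos, neg):
    """Binary entropy (in bits) of a set with `pos` and `neg` members."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

def information_gain(in_with, in_without, out_with, out_without):
    """Information gain of a term for 'snippet is in the cluster' vs. 'is not'.

    in_with / in_without:   snippets inside the cluster with / without the term
    out_with / out_without: snippets outside the cluster with / without the term
    Assumes at least one snippet overall.
    """
    n = in_with + in_without + out_with + out_without
    before = entropy(in_with + in_without, out_with + out_without)
    n_with = in_with + out_with
    n_without = in_without + out_without
    after = ((n_with / n) * entropy(in_with, out_with)
             + (n_without / n) * entropy(in_without, out_without))
    return before - after

# A term found in every cluster snippet and nowhere else is maximally informative;
# a term spread evenly inside and outside the cluster carries no information.
perfect = information_gain(5, 0, 0, 5)
useless = information_gain(2, 2, 2, 2)
```

A term that scores high under such a measure separates one cluster from the rest, which is why measures of this family are natural candidates for choosing cluster labels.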
A Fuzzy Clustering and Fuzzy Merging Algorithm
, 1999
Cited by 5 (0 self)
Some major problems in clustering are: i) find the optimal number K of clusters; ii) assess the validity of a given clustering; iii) permit the classes to form natural shapes rather than forcing them into normed balls of the distance function; iv) prevent the order in which the feature vectors are read in from affecting the clustering; and v) prevent the order of merging from affecting the clustering. The k-means algorithm is the most efficient and easiest to implement, and has known convergence, but it suffers from all of the above deficiencies. We employ a relatively large number K of uniformly randomly distributed initial prototypes and then thin them by deleting any prototype that is too close to another, so as to leave fewer uniformly distributed prototypes. We then employ the k-means algorithm, eliminate empty and very small clusters and iterate a process of computing a new type of fuzzy prototypes and reassigning the feature vectors until the prototypes become fixed. At that poi...
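The initial thinning step described in this abstract (many random prototypes, then removal of those too close to another) might look like the sketch below. The greedy keep-or-drop rule, the `min_dist` threshold, the unit-square sampling and the function name are all assumptions; the abstract does not give the actual thinning criterion, and this greedy version depends on the order in which prototypes are visited.

```python
import math
import random

def thin_prototypes(prototypes, min_dist):
    """Keep a prototype only if it is at least min_dist away from all kept so far."""
    kept = []
    for p in prototypes:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept

# Hypothetical setup: 200 prototypes drawn uniformly from the unit square.
rng = random.Random(0)
initial = [(rng.random(), rng.random()) for _ in range(200)]
thinned = thin_prototypes(initial, min_dist=0.15)
```

The thinned set would then serve as the starting centers for the k-means pass that the abstract describes next.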