Results 1 - 6 of 6
An overview of similarity measures for clustering XML documents
- Chapter in Athena Vakali and George Pallis, 2006
Cited by 6 (2 self)
The large amount and heterogeneity of XML documents on the Web require the development of clustering techniques to group together similar documents. Documents can be grouped together according to their content, their structure, and links inside and among documents. For instance, grouping together documents with similar structures has interesting applications in the context of information extraction, of heterogeneous data integration, of personalized content delivery, of access control definition, of web site structural analysis, and of comparison of RNA secondary structures. Many approaches have been proposed for evaluating the structural and content similarity between tree-based and vector-based representations of XML documents. Link-based similarity approaches developed for Web data clustering have been adapted for XML documents. This chapter discusses and compares the most relevant similarity measures and their employment for XML document clustering.
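As a rough illustration of what a structural similarity measure between XML documents can look like, the sketch below compares two documents by the Jaccard overlap of their root-to-element tag paths. This is only one simple measure chosen for illustration and is not claimed to be one the chapter covers; the function names `tag_paths` and `path_similarity` and the sample documents are assumptions.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-element tag paths in an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def path_similarity(xml_a, xml_b):
    """Jaccard similarity between the tag-path sets of two documents.

    Illustrative measure only: it ignores text content, attributes,
    element order, and repetition.
    """
    a, b = tag_paths(xml_a), tag_paths(xml_b)
    return len(a & b) / len(a | b)


# Hypothetical sample documents.
doc1 = "<book><title>A</title><author>X</author></book>"
doc2 = "<book><title>B</title><year>2006</year></book>"
doc3 = "<song><artist>Z</artist></song>"

print(path_similarity(doc1, doc2))  # shares book and book/title
print(path_similarity(doc1, doc3))  # no shared paths
```

A path-set measure like this is cheap to compute but purely structural; tree-edit-distance or vector-based measures trade more computation for finer distinctions.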
Using Element Clustering to Increase the Efficiency of XML Schema Matching
- Ontology-Driven Semantic Matches between Database Schemas,” Proceedings of the twenty-second International Conference on Data Engineering, 2006
Cited by 4 (0 self)
Schema matching attempts to discover semantic mappings between elements of two schemas. Elements are cross-compared using various heuristics (e.g., name, data-type, and structure similarity). Seen from a broader perspective, the schema matching problem is a combinatorial problem with exponential complexity. This makes naive matching algorithms prohibitively inefficient for large schemas. In this paper we propose a clustering-based technique for improving the efficiency of large-scale schema matching. The technique inserts clustering as an intermediate step into existing schema matching algorithms. Clustering partitions schemas and reduces the overall matching load, and creates the possibility of trading off efficiency against effectiveness. The technique can be used in addition to other optimization techniques. In the paper we describe the technique, validate the performance of one implementation of the technique, and open directions for future research.
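The idea of inserting clustering as an intermediate step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: element names stand in for schema elements, name similarity comes from Python's `difflib.SequenceMatcher`, and the clustering criterion `cluster_key` (first letter of the name) is a hypothetical placeholder for a real clustering method.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def name_sim(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def naive_match(src, tgt, threshold=0.8):
    """Compare every source element with every target element."""
    matches, comparisons = [], 0
    for s in src:
        for t in tgt:
            comparisons += 1
            if name_sim(s, t) >= threshold:
                matches.append((s, t))
    return matches, comparisons


def cluster_key(name):
    """Hypothetical cheap clustering criterion: first letter of the name."""
    return name[0].lower()


def clustered_match(src, tgt, threshold=0.8):
    """Partition both schemas first, then match only within each cluster."""
    clusters = defaultdict(lambda: ([], []))
    for s in src:
        clusters[cluster_key(s)][0].append(s)
    for t in tgt:
        clusters[cluster_key(t)][1].append(t)
    matches, comparisons = [], 0
    for s_part, t_part in clusters.values():
        m, c = naive_match(s_part, t_part, threshold)
        matches += m
        comparisons += c
    return matches, comparisons


# Hypothetical element names from two schemas.
src = ["customerName", "customerId", "orderDate", "price"]
tgt = ["CustomerName", "custId", "orderDate", "productPrice"]

naive_matches, naive_cost = naive_match(src, tgt)
clustered_matches, clustered_cost = clustered_match(src, tgt)
print(naive_cost, clustered_cost)
```

The clustered variant performs fewer pairwise comparisons, but any true match whose two elements fall into different clusters is missed, which is the efficiency versus effectiveness trade-off the abstract mentions.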
Finding Significant Web Pages with Lower Ranks by Pseudo-Clique Search
- Proceedings of the 8th International Conference on Discovery Science - DS’05, Springer-LNAI 3735, 2005
Cited by 1 (1 self)
In this paper, we discuss a method of finding useful clusters of web pages which are significant in the sense that their contents are similar or closely related to those of higher-ranked pages. Since we usually pay little attention to pages with lower ranks, they are unconditionally discarded even if their contents are similar to some pages with high ranks. We try to extract such hidden pages together with significant higher-ranked pages as a cluster. In order to obtain such clusters, we first extract semantic correlations among terms by applying Singular Value Decomposition (SVD) to the term-document matrix generated from a corpus w.r.t. a specific topic. Based on the correlations, we can evaluate potential similarities among web pages from which we try to obtain clusters. The set of web pages is represented as a weighted graph G based on the similarities and their ranks. Our clusters can be found as pseudo-cliques in G. We present an algorithm for finding Top-N weighted pseudo-cliques. Our experimental result shows that quite valuable clusters can actually be extracted according to our method.
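The graph-building step described above can be sketched in a few lines. Several assumptions apply: the page vectors are hypothetical and taken as already reduced (for example by SVD, which is not performed here), page ranks are left out of the edge weights, and a pseudo-clique is taken to mean a node set whose edge density reaches a threshold, which is one common definition and may differ from the one used in the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_graph(vectors, min_sim):
    """Return {frozenset({i, j}): similarity} for pairs at or above min_sim."""
    edges = {}
    for i, j in combinations(range(len(vectors)), 2):
        s = cosine(vectors[i], vectors[j])
        if s >= min_sim:
            edges[frozenset((i, j))] = s
    return edges


def is_pseudo_clique(nodes, edges, min_density):
    """Density-based check (assumed definition): the fraction of node
    pairs that are connected must be at least min_density."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return False
    possible = len(nodes) * (len(nodes) - 1) // 2
    present = sum(1 for p in combinations(nodes, 2) if frozenset(p) in edges)
    return present / possible >= min_density


# Hypothetical page vectors in a reduced term space.
pages = [[2, 1, 0], [1, 2, 0], [1, 1, 0], [0, 0, 1]]
graph = similarity_graph(pages, min_sim=0.9)

print(sorted(tuple(sorted(e)) for e in graph))
print(is_pseudo_clique({0, 1, 2}, graph, min_density=0.6))
```

Here pages 0, 1, and 2 are not a full clique, since pages 0 and 1 fall below the similarity cutoff, yet they still pass the relaxed density test, which is the kind of near-complete cluster a pseudo-clique search looks for.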
A Novel Approach for clustering web user sessions using RST
Web usage mining has assumed importance in learning about web users' behavior and their interactions with a website. It uses data mining techniques to discover non-trivial user behavior patterns. These patterns can then be used to predict the next page to be accessed by the user. Web usage mining consists of steps such as web log preprocessing, pattern discovery, and pattern analysis. This paper proposes a novel approach for preprocessing wherein rough set clustering is applied to form clusters of sessions. These sessions could later be used to form a knowledge base of rules, on the basis of which the next page to be accessed could be prefetched.
XML Data Clustering: An Overview
In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. The presence of such a large number of approaches is due to the different applications requiring the clustering of XML data. These applications need data in the form of similar contents, tags, paths, structures, and semantics. In this article, we first outline the application representation of data (instances or schema), on the identified similarity measure, and on the clustering algorithm. In this presentation, we aim to draw a taxonomy in which the current approaches can be classified and compared. We aim at introducing an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering component. Finally, the article moves into the description of future trends and research issues that still need to be faced.
Innovations in Web Personalization
The diffusion of the Web and the huge amount of information available online have given rise to the urgent need for systems able to intelligently assist users when they browse the network. Web personalization offers this invaluable opportunity, representing one of the most important technologies required by an ever-increasing number of real-world applications. This chapter presents an overview of Web personalization in the endeavor of intelligent systems.

