A Method for Monolingual Thesauri Merging, 1997
Cited by 20 (2 self)
Thesauri merging is the activity of consolidating a set of thesauri into a single thesaurus that accommodates the vocabularies and the structure of all thesauri being merged. In this paper, we introduce a general framework for monolingual thesauri merging. We also present a domain-independent set-theoretic model for the representation of terms, relationships, and integrity constraints. Finally, we present a method for merging monolingual thesauri, focusing on its mechanisms for detecting equivalent terms among the thesauri being merged. Our method expands previous work on the problem: we introduce equivalence assumptions that express similarity between terms, and we propose a term distance model that can be used to guide the confirmation or rejection of equivalence assumptions.
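The abstract does not define the term distance model, so the following is only a minimal sketch of the general idea of confirming or rejecting an equivalence assumption by a distance between terms. Everything in it is an assumption made for illustration: the `Term` structure, the use of Jaccard distance over a term's related terms, and the `threshold` value are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A thesaurus term with its related terms (hypothetical structure)."""
    name: str
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)

    def context(self):
        # All terms linked to this one by any relationship.
        return self.broader | self.narrower | self.related


def term_distance(a, b):
    """Jaccard distance between the relationship contexts of two terms.

    0.0 means identical contexts, 1.0 means no shared neighbors.
    This is an illustrative stand-in, not the paper's distance model.
    """
    union = a.context() | b.context()
    if not union:
        return 1.0  # no evidence either way; treat as maximally distant
    return 1.0 - len(a.context() & b.context()) / len(union)


def confirm_equivalence(a, b, threshold=0.5):
    # Accept the equivalence assumption when the terms are close enough.
    # The threshold is an arbitrary illustrative value.
    return term_distance(a, b) <= threshold


# Terms drawn from two hypothetical thesauri being merged.
car = Term("car", broader={"vehicle"}, narrower={"sedan", "coupe"})
auto = Term("automobile", broader={"vehicle"}, narrower={"sedan"})
boat = Term("boat", broader={"vessel"}, narrower={"canoe"})
```

Here `car` and `automobile` share two of three context terms, so the assumption that they are equivalent would be confirmed under this threshold, while `car` and `boat` share none and the assumption would be rejected.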
Hierarchical Taxonomies using Divisive Partitioning, 1998
Cited by 8 (3 self)
We propose an unsupervised divisive partitioning algorithm for document data sets which enjoys many favorable properties. In particular, the algorithm shows excellent scalability to large data collections and produces high-quality clusters which are competitive with other clustering methods. The algorithm yields information on the significant and distinctive words within each cluster, and these words can be inserted into the naturally occurring hierarchical structure produced by the algorithm. The result is an automatically generated hierarchical topical taxonomy of a document set. In this paper, we show how the algorithm's cost scales up linearly with the size of the data, illustrate experimentally the quality of the clusters produced, and show how the algorithm can produce a hierarchical topical taxonomy.
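The abstract does not state how a cluster is split, so the sketch below only illustrates the general shape of a divisive (top-down) partitioning of documents. It assumes, purely for illustration, that each document is a term-count vector, that the cluster with the largest scatter is split next, and that a split follows the sign of each document's projection onto the leading principal direction of the cluster. All function names are hypothetical, and the code makes no attempt to match the scalability reported in the paper.

```python
import math


def centroid(docs):
    # Component-wise mean of a list of equal-length vectors.
    n = len(docs)
    return [sum(col) / n for col in zip(*docs)]


def principal_direction(docs, iters=50):
    """Leading principal direction of the centered rows, by power iteration."""
    mean = centroid(docs)
    centered = [[x - m for x, m in zip(row, mean)] for row in docs]
    dim = len(mean)
    # Arbitrary non-uniform start vector, to avoid starting orthogonal
    # to the direction being sought on symmetric data.
    v = [float(j + 1) for j in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # w = A^T (A v), where A is the centered document matrix.
        proj = [sum(x * y for x, y in zip(row, v)) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered))
             for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return centered, v


def split(indices, data):
    # Divide one cluster in two by the sign of the projection.
    centered, v = principal_direction([data[i] for i in indices])
    left, right = [], []
    for i, row in zip(indices, centered):
        score = sum(x * y for x, y in zip(row, v))
        (left if score <= 0 else right).append(i)
    return left, right


def scatter(indices, data):
    # Total squared distance of a cluster's documents from its centroid.
    docs = [data[i] for i in indices]
    mean = centroid(docs)
    return sum((x - m) ** 2 for row in docs for x, m in zip(row, mean))


def divisive_partition(data, n_clusters):
    clusters = [list(range(len(data)))]
    while len(clusters) < n_clusters:
        target = max(clusters, key=lambda c: scatter(c, data))
        left, right = split(target, data)
        if not left or not right:
            break  # cluster cannot be divided further
        clusters.remove(target)
        clusters += [left, right]
    return clusters


# Toy term-count vectors: documents 0-2 use the first two words,
# documents 3-5 use the last two.
data = [
    [3, 2, 0, 0], [2, 3, 0, 0], [4, 3, 0, 1],
    [0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 4, 3],
]
clusters = divisive_partition(data, 2)
```

Recording which cluster each split came from would give the binary tree that a hierarchical taxonomy could be built on; the sketch keeps only the final leaves for brevity.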

