Iterative Optimization and Simplification of Hierarchical Clusterings
Journal of Artificial Intelligence Research, 1995
Cited by 96 (1 self)
Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search strategy should consistently construct clusterings of high quality, but be computationally inexpensive as well. In general, we cannot have it both ways, but we can partition the search so that a system inexpensively constructs a 'tentative' clustering for initial examination, followed by iterative optimization, which continues to search in background for improved clusterings. Given this motivation, we evaluate an inexpensive strategy for creating initial clusterings, coupled with several control strategies for iterative optimization, each of which repeatedly modifies an initial clustering in search of a better one. One of these methods appears novel as an iterative optimization strategy in clustering contexts. Once a clustering has been construct...
Fast and Intuitive Clustering of Web Documents
In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, 1997
Cited by 87 (2 self)
Conventional document retrieval systems (e.g., Alta Vista) return long lists of ranked documents in response to user queries. Recently, document clustering has been put forth as an alternative method of organizing retrieval results (Cutting et al. 1992). A person browsing the clusters can discover patterns that could be overlooked in the traditional presentation. This paper describes two novel clustering methods that intersect the documents in a cluster to determine the set of words (or phrases) shared by all the documents in the cluster. We report on experiments that evaluate these intersection-based clustering methods on collections of snippets returned from Web search engines. First, we show that word-intersection clustering produces superior clusters and does so faster than standard techniques. Second, we show that our O(n log n) time phrase-intersection clustering method produces comparable clusters and does so more than two orders of magnitude faster than all methods tested. I...
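The intersection step mentioned in this abstract (finding the words shared by all documents in a cluster) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `shared_words`, the lowercase whitespace tokenization, and the sample snippets are all assumptions made for illustration, and the sketch does not cover how clusters are formed or scored.

```python
def shared_words(cluster):
    """Return the set of words that occur in every document of the cluster.

    `cluster` is a list of strings. Tokenization here is a plain lowercase
    whitespace split, which is an assumption of this sketch.
    """
    word_sets = [set(doc.lower().split()) for doc in cluster]
    if not word_sets:
        # An empty cluster has no shared words.
        return set()
    return set.intersection(*word_sets)


# Hypothetical search-result snippets grouped into one cluster.
snippets = [
    "fast clustering of web documents",
    "clustering web search results",
    "web document clustering methods",
]
print(sorted(shared_words(snippets)))  # ['clustering', 'web']
```

The shared-word set gives a cluster a compact description that a person browsing the results can read at a glance.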
Unsupervised Learning of Probabilistic Concept Hierarchies
Machine Learning and Its Applications, volume 2049 of Lecture Notes in Computer Science, 2001
Cited by 5 (0 self)
Fisher's Cobweb provided a well-defined framework for research on the unsupervised induction of probabilistic concept hierarchies. The system also sparked the development of many successors that extended this framework along various dimensions. In this paper, we summarize the assumptions that Cobweb embodies about the representation, organization, use, and formation of probabilistic concepts, along with experimental studies that examine its sources of power. After this, we consider three systems -- Arachne, Twilix, and Oxbow -- that incorporate significant extensions and present empirical evidence that these improve behavior. In closing, we discuss other paradigms for the unsupervised learning of probabilistic knowledge and their relation to the Cobweb framework. We thank our collaborators, including John Gennari, Kathleen McKusick, Kevin Thompson, and John Allen, for their contributions to the research described in this paper. Grant MDA 903-85-C0324 from the Army Research Institute sup...
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
1991
REPORT DOCUMENTATION PAGE OMB No. 0704-0188 Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson

