Hierarchical Taxonomies using Divisive Partitioning (1998)

by Daniel Boley

Results 1 - 5 of 5

Co-clustering documents and words using Bipartite Spectral Graph Partitioning

by Inderjit S. Dhillon, 2001
Cited by 220 (6 self)
Abstract not found

Partitioning-based clustering for web document categorization. Decision Support Systems

by Daniel Boley, Maria Gini, Robert Gross, Eui-Hong (Sam) Han, Kyle Hastings, George Karypis, Vipin Kumar, Bamshad Mobasher, Jerome Moore, 1999
Cited by 56 (12 self)
Clustering techniques have been used by many intelligent software agents in order to retrieve, filter, and categorize documents available on the World Wide Web. Clustering is also useful in extracting salient features of related web documents to automatically formulate queries and search for other similar documents on the Web. Traditional clustering algorithms either use a priori knowledge of document structures to define a distance or similarity among these documents, or use probabilistic techniques such as Bayesian classification. Many of these traditional algorithms, however, falter when the dimensionality of the feature space becomes high relative to the size of the document space. In this paper, we introduce two new clustering algorithms that can effectively cluster documents, even in the presence of a very high dimensional feature space. These clustering techniques, which are based on generalizations of graph partitioning, do not require pre-specified ad hoc distance functions, and are capable of automatically discovering document similarities or associations. We conduct several experiments on real Web data using various feature selection heuristics, and compare our clustering schemes to standard distance-based techniques, such as hierarchical agglomeration clustering, and Bayesian classification methods, such as AutoClass.

A client-side web agent for document categorization

by D. Boley, M. Gini, K. Hastings, B. Mobasher, J. Moore - J. Internet Research, 1998
Cited by 3 (1 self)
We propose a client-side agent for exploring and categorizing documents on the World Wide Web. As the user browses the Web using a standard web browser, this agent is designed to aid the user by classifying the documents the user finds most interesting into clusters. The agent carries out the task completely automatically and autonomously, with as little user intervention as the user desires. The principal novel components in this agent that make this possible are (i) a scalable hierarchical clustering algorithm and (ii) a taxonomic label generator. In this paper, we describe the overall architecture of this agent and discuss the details of the algorithms within its key components.

Computer Assisted Processing of Large Unstructured Document Sets: A Case Study in the Construction Industry

by John Mckechnie, Sameh Shaaban
Cited by 1 (1 self)
Construction is one of the most information-intensive industries; professionals typically access the industry's information resources on a daily basis. The major constraints on the future development of a formally encoded knowledge base are fragmented information sources and a lack of comprehensive classification schemes. In agreement with earlier research and over twenty years of practical experience, we have found that manually categorising a large collection of documents is error-prone, time-consuming, and expensive, and produces inconsistent results.

Inter-document similarity in web searches

by Bruno Emanuel da Graça Martins, Mário Gaspar da Silva, José Luís Cabral de Moura Borges, André Osório e Cruz de Azevedo Falcão, Thibault Nicolas Langlois, 2004
Cited by 1 (1 self)
are stored in PDF, with the report number as filename. Alternatively, reports are available by post from the above address. Supervisor: Jury: