Sufficient dimensionality reduction - a novel analysis method (2002)

by A Globerson, N Tishby
Venue: ICML
Results 1 - 2 of 2

Exploration of dimensionality reduction for text visualization

by Shiping Huang, Matthew O. Ward, Elke A. Rundensteiner - In Proc. IEEE Third Intl. Conf. on Coordinated and Multiple Views in Exploratory Visualization , 2005
Abstract - Cited by 17 (1 self)
In the text document visualization community, statistical analysis tools (e.g., principal component analysis and multidimensional scaling) and neurocomputation models (e.g., self-organizing feature maps) have been widely used for dimensionality reduction. Often the resulting dimensionality is set to two, as this facilitates plotting the results. The validity and effectiveness of these approaches largely depend on the specific data sets used and semantics of the targeted applications. To date, there has been little evaluation to assess and compare dimensionality reduction methods and dimensionality reduction processes, either numerically or empirically. The focus of this paper is to propose a mechanism for comparing and evaluating the effectiveness of dimensionality reduction techniques in the visual exploration of text document archives. We use multivariate visualization techniques and interactive visual exploration to study three problems: (a) Which dimensionality reduction technique best preserves the interrelationships within a set of text documents; (b) What is the sensitivity of the results to the number of output dimensions; (c) Can we automatically remove redundant or unimportant words from the vector extracted from the documents while still preserving the majority of information, and thus make dimensionality reduction more efficient. To study each problem, we generate supplemental dimensions based on several dimensionality reduction algorithms and parameters controlling these algorithms. We then visually analyze and explore the characteristics of the reduced dimensional spaces as implemented within a linked, multi-view multi-dimensional visual exploration tool, XmdvTool. We compare the derived dimensions to features known to be present in the original data. Quantitative measures are also used in identifying the quality of results using different numbers of output dimensions.
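The abstract above compares linear techniques such as principal component analysis for projecting document vectors down to two dimensions for plotting. As a minimal sketch (not the paper's implementation), PCA on a small document-by-term count matrix can be written with a centered SVD; the matrix values here are purely illustrative:

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto the top two principal components.

    X is a document-by-term count matrix; the result gives one
    2-D point per document, suitable for a scatter plot.
    """
    Xc = X - X.mean(axis=0)              # center each term dimension
    # SVD of the centered matrix; rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                 # 2-D coordinates per document

# Toy document-term counts (4 documents, 5 terms), illustrative only
X = np.array([[3, 0, 1, 0, 2],
              [2, 1, 0, 0, 3],
              [0, 4, 0, 3, 0],
              [0, 3, 1, 4, 0]], dtype=float)
Y = pca_2d(X)
print(Y.shape)  # (4, 2)
```

Setting the output dimensionality to two is exactly the convention the abstract questions; the same function with `Vt[:k].T` yields a k-dimensional reduction for the sensitivity experiments described in problem (b).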

Citation Context

...g PCA will deviate substantially from the optimal. A nonlinear dimension reduction method with minimal loss of (mutual) information contained in the original data was proposed for text classification [13]. In addition, dimension reduction by random mapping was also reported [26, 20]. 5 CONCLUSION AND FUTURE WORK In this paper, several existing dimension reduction techniques were explored and evaluated...
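The "random mapping" the snippet mentions is random projection: multiplying the data by a random matrix whose columns are nearly orthogonal in high dimensions, which approximately preserves pairwise distances (the Johnson-Lindenstrauss lemma). A quick sketch with assumed toy dimensions (1000 input terms, 50 output dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)

d, k = 1000, 50                        # input and reduced dimensionality
R = rng.normal(size=(d, k)) / np.sqrt(k)   # random projection matrix
x = rng.normal(size=(2, d))            # two stand-in document vectors

proj = x @ R                           # reduced representations
orig_dist = np.linalg.norm(x[0] - x[1])
proj_dist = np.linalg.norm(proj[0] - proj[1])
print(orig_dist, proj_dist)            # distances agree up to small error
```

Unlike PCA, the mapping is data-independent and costs only a matrix multiply, which is why it was attractive for large text collections.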

Clustering the Space of Phrases Identified by an Ensemble of Supervised Shallow Parsers

by Yuval Krymolowski, Zvika Marx , 2002
Abstract
We present a novel method for clustering syntactic structures in raw text. The method involves first generating an ensemble of classifiers, by training a statistical parser on samples from the training data. Then, the classifier outputs for each instance are used as input for clustering. The resulting clusters group together instances whose internal representations in the statistical model are similar, and therefore depend on the feature set it uses. We apply our method to the simple task of clustering noun phrases. Experiments show that the most encouraging results are obtained when the training samples are small. Possible applications of the method include speeding up error analysis of a supervised system, and adapting a supervised system to a new domain.
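The core idea above is to describe each instance by the vector of outputs it receives from an ensemble of separately trained classifiers, then cluster those vectors. The following is a toy numpy analogue under assumed data (it does not use the authors' parser or features): three hypothetical classifiers each emit a score per instance, and plain k-means groups instances whose score vectors agree across the ensemble:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for ensemble outputs: 8 instances x 3 classifiers.
# Instances 0-3 come from one latent group, 4-7 from another,
# so their score vectors differ systematically across classifiers.
scores = np.vstack([
    rng.normal(loc=c, scale=0.1, size=(4, 3)) for c in (0.0, 1.0)
])

def kmeans(X, k, iters=20):
    """Plain k-means on ensemble-output vectors (illustrative only)."""
    centers = X[[0, len(X) // 2]].copy()   # deterministic init for the sketch
    for _ in range(iters):
        # assign each instance to its nearest center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # recompute each center as the mean of its assigned instances
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(scores, 2)
print(labels)
```

Because the clusters reflect agreement among the trained models rather than surface features, inspecting them supports the error-analysis and domain-adaptation uses the abstract suggests.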