Exploration of dimensionality reduction for text visualization
- In Proc. IEEE Third Intl. Conf. on Coordinated and Multiple Views in Exploratory Visualization, 2005
Abstract - Cited by 17 (1 self)
In the text document visualization community, statistical analysis tools (e.g., principal component analysis and multidimensional scaling) and neurocomputation models (e.g., self-organizing feature maps) have been widely used for dimensionality reduction. Often the resulting dimensionality is set to two, as this facilitates plotting the results. The validity and effectiveness of these approaches largely depend on the specific data sets used and the semantics of the targeted applications. To date, there has been little evaluation to assess and compare dimensionality reduction methods and dimensionality reduction processes, either numerically or empirically. The focus of this paper is to propose a mechanism for comparing and evaluating the effectiveness of dimensionality reduction techniques in the visual exploration of text document archives. We use multivariate visualization techniques and interactive visual exploration to study three problems: (a) which dimensionality reduction technique best preserves the interrelationships within a set of text documents; (b) how sensitive the results are to the number of output dimensions; and (c) whether redundant or unimportant words can be automatically removed from the vector extracted from the documents while still preserving the majority of the information, thus making dimensionality reduction more efficient. To study each problem, we generate supplemental dimensions based on several dimensionality reduction algorithms and the parameters controlling these algorithms. We then visually analyze and explore the characteristics of the reduced dimensional spaces as implemented within a linked, multi-view, multi-dimensional visual exploration tool, XmdvTool. We compare the derived dimensions to features known to be present in the original data. Quantitative measures are also used to assess the quality of results obtained with different numbers of output dimensions.
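As a rough illustration of problem (c), the sketch below drops low-variance term columns from a toy document-term matrix and checks how well pairwise document distances survive. Everything here is an assumption for illustration: the variance-based filter, the Pearson-correlation score, the function names, and the toy data are not taken from the paper, which does not specify its measures in this abstract.

```python
from itertools import combinations
from math import dist, sqrt
from statistics import mean, pvariance


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of document vectors."""
    return [dist(a, b) for a, b in combinations(vectors, 2)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm else 0.0


def keep_top_variance_terms(vectors, k):
    """Keep the k term columns with the highest variance across documents.

    Hypothetical filter: stands in for "removing unimportant words".
    """
    n_terms = len(vectors[0])
    variances = [pvariance([v[j] for v in vectors]) for j in range(n_terms)]
    ranked = sorted(range(n_terms), key=lambda j: variances[j], reverse=True)
    kept = sorted(ranked[:k])  # restore original column order
    return [[v[j] for j in kept] for v in vectors]


def distance_preservation(original, reduced):
    """Correlation between pairwise distances before and after reduction."""
    return pearson(pairwise_distances(original), pairwise_distances(reduced))


# Invented term counts (rows: documents, columns: terms).
DOCS = [
    [5, 0, 1, 1],
    [4, 1, 1, 0],
    [0, 6, 1, 1],
    [1, 5, 1, 0],
]

if __name__ == "__main__":
    for k in range(1, 5):
        score = distance_preservation(DOCS, keep_top_variance_terms(DOCS, k))
        print(f"terms kept: {k}  distance correlation: {score:.3f}")
```

A score near 1 suggests the retained terms carry most of the inter-document structure; repeating the loop over different values of k gives a crude view of sensitivity to the number of retained dimensions.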
Clustering the Space of Phrases Identified by an Ensemble of Supervised Shallow Parsers
- 2002
Abstract
We present a novel method for clustering syntactic structures in raw text. The method involves first generating an ensemble of classifiers by training a statistical parser on samples from the training data. Then, the classifier outputs for each instance are used as input for clustering. The resulting clusters group together instances whose internal representations in the statistical model are similar, and therefore depend on the feature set it uses. We apply our method to the simple task of clustering noun phrases. Experiments show that the most encouraging results are obtained when the training samples are small. Possible applications of the method include speeding up error analysis of a supervised system and adapting a supervised system to a new domain.
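The general idea of clustering on ensemble outputs can be sketched as follows. This is only a toy under stated assumptions: a nearest-centroid classifier stands in for the statistical parser, instances that receive identical output vectors are simply grouped together (the abstract does not say which clustering algorithm is used), and all names, labels, and data are hypothetical.

```python
import random
from collections import defaultdict
from math import dist


def train_centroid_classifier(sample):
    """Fit a nearest-centroid classifier on (features, label) pairs.

    Stand-in for the supervised model; not the parser used in the paper.
    """
    by_label = defaultdict(list)
    for features, label in sample:
        by_label[label].append(features)
    centroids = {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in by_label.items()
    }
    return lambda features: min(
        centroids, key=lambda label: dist(centroids[label], features)
    )


def cluster_by_ensemble_outputs(train, instances, n_members=5, sample_size=3, seed=0):
    """Group instances by the vector of labels the ensemble assigns to them."""
    rng = random.Random(seed)
    # Each ensemble member is trained on a small random sample of the training data.
    ensemble = [
        train_centroid_classifier(rng.sample(train, sample_size))
        for _ in range(n_members)
    ]
    clusters = defaultdict(list)
    for features in instances:
        signature = tuple(member(features) for member in ensemble)
        clusters[signature].append(features)  # identical outputs share a cluster
    return dict(clusters)


# Invented 2-D feature vectors with hypothetical labels "A" and "B".
TRAIN = [
    ((0.0, 0.1), "A"), ((0.2, 0.0), "A"), ((0.1, 0.3), "A"),
    ((1.0, 0.9), "B"), ((0.8, 1.1), "B"), ((1.2, 1.0), "B"),
]
INSTANCES = [(0.1, 0.1), (0.5, 0.5), (0.6, 0.4), (1.0, 1.0)]

if __name__ == "__main__":
    for signature, members in cluster_by_ensemble_outputs(TRAIN, INSTANCES).items():
        print(signature, members)
```

Because each member sees only a small sample, members can disagree on borderline instances, and those disagreements are what separate the clusters. A distance-based clustering over the output vectors would be a natural refinement of the exact-match grouping used here.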