Results 1-10 of 31
A Survey of Dimension Reduction Techniques
, 2002
Abstract

Cited by 88 (0 self)
... this paper, we assume that we have $n$ observations, each being a realization of the $p$-dimensional random variable $x = (x_1, \ldots, x_p)^T$ with mean $E(x) = \mu = (\mu_1, \ldots, \mu_p)^T$ and covariance matrix $E\{(x - \mu)(x - \mu)^T\} = \Sigma_{p \times p}$. We denote such an observation matrix by $X = \{x_{i,j} : 1 \le i \le p,\ 1 \le j \le n\}$. If $\mu_i$ and $\sigma_i = \sqrt{\Sigma_{(i,i)}}$ denote the mean and the standard deviation of the $i$th random variable, respectively, then we will often standardize the observations $x_{i,j}$ by $(x_{i,j} - \hat{\mu}_i)/\hat{\sigma}_i$, where $\hat{\mu}_i = \bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{i,j}$, and $\hat{\sigma}_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n} (x_{i,j} - \bar{x}_i)^2}$.
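The standardization described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the observation matrix is stored as a list of p rows (one row per variable, n observations each), and the function name `standardize` and the sample data are hypothetical, not taken from the paper.

```python
import math


def standardize(X):
    """Standardize a p x n observation matrix given as a list of p rows.

    Row i holds the n observations of the i-th variable (assumed layout).
    """
    Z = []
    for row in X:
        n = len(row)
        mean = sum(row) / n
        # Divide by n (not n - 1), matching the 1/n factor in the abstract.
        std = math.sqrt(sum((v - mean) ** 2 for v in row) / n)
        if std == 0.0:
            raise ValueError("a constant variable cannot be standardized")
        Z.append([(v - mean) / std for v in row])
    return Z


# Hypothetical data: p = 2 variables, n = 4 observations each.
X = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 30.0, 30.0]]
Z = standardize(X)
for row in Z:
    print(row)
```

After standardization each row has sample mean 0 and standard deviation 1, so variables measured on different scales contribute comparably to later steps.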
Concept-Oriented Indexing of Video Databases: Toward Semantic Sensitive Retrieval and Browsing
 IEEE TRANS. ON IMAGE PROCESSING
, 2004
Abstract

Cited by 25 (5 self)
Digital video now plays an important role in medical education, health care, telemedicine and other medical applications. Several content-based video retrieval (CBVR) systems have been proposed in the past, but they still suffer from the following challenging problems: semantic gap, semantic video concept modeling, semantic video classification, and concept-oriented video database indexing and access. In this paper, we propose a novel framework to make some advances toward the final goal to solve these problems. Specifically, the framework includes: 1) a semantic-sensitive video content representation framework by using principal video shots to enhance the quality of features; 2) semantic video concept interpretation by using flexible mixture model to bridge the semantic gap; 3) a novel semantic video-classifier training framework by integrating feature selection, parameter estimation, and model selection seamlessly in a single algorithm; and 4) a concept-oriented video database organization technique through a certain domain-dependent concept hierarchy to enable semantic-sensitive video retrieval and browsing.
Learning Nonlinear Image Manifolds by Global Alignment of Local Linear Models
 IEEE Trans. Pattern Analysis and Machine Intelligence
, 2006
Exploration of dimensionality reduction for text visualization
 In Proc. IEEE Third Intl. Conf. on Coordinated and Multiple Views in Exploratory Visualization
, 2005
Abstract

Cited by 10 (1 self)
In the text document visualization community, statistical analysis tools (e.g., principal component analysis and multidimensional scaling) and neurocomputation models (e.g., self-organizing feature maps) have been widely used for dimensionality reduction. Often the resulting dimensionality is set to two, as this facilitates plotting the results. The validity and effectiveness of these approaches largely depend on the specific data sets used and semantics of the targeted applications. To date, there has been little evaluation to assess and compare dimensionality reduction methods and dimensionality reduction processes, either numerically or empirically. The focus of this paper is to propose a mechanism for comparing and evaluating the effectiveness of dimensionality reduction techniques in the visual exploration of text document archives. We use multivariate visualization techniques and interactive visual exploration to study three problems: (a) Which dimensionality reduction technique best preserves the interrelationships within a set of text documents; (b) What is the sensitivity of the results to the number of output dimensions; (c) Can we automatically remove redundant or unimportant words from the vector extracted from the documents while still preserving the majority of information, and thus make dimensionality reduction more efficient. To study each problem, we generate supplemental dimensions based on several dimensionality reduction algorithms and parameters controlling these algorithms. We then visually analyze and explore the characteristics of the reduced dimensional spaces as implemented within a linked, multi-view multidimensional visual exploration tool, XmdvTool. We compare the derived dimensions to features known to be present in the original data. Quantitative measures are also used in identifying the quality of results using different numbers of output dimensions.
On the Efficiency of Nearest Neighbor Searching with Data Clustered in Lower Dimensions
, 2001
Abstract

Cited by 6 (0 self)
In nearest neighbor searching we are given a set of n data points in real d-dimensional space, $\mathbb{R}^d$, and the problem is to preprocess these points into a data structure, so that given a query point, the nearest data point to the query point can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage. Given the limitation of linear storage, the best known data structures suffer from expected-case query times that grow exponentially in d. However, it is widely regarded in practice that data sets in high dimensional spaces tend to consist of clusters residing in much lower dimensional subspaces. This raises the question of whether data structures for nearest neighbor searching adapt to the presence of lower dimensional clustering, and further how performance varies when the clusters are aligned with the coordinate axes. We analyze the popular kd-tree data structure in the form of two variants based on a modification of the splitting method, which produces cells that satisfy the basic packing properties needed for efficiency without producing empty cells. We show that when data points are uniformly distributed on a k-dimensional hyperplane for $k \le d$, then the expected number of leaves visited in such a kd-tree grows exponentially in k, but not in d. We show that the growth rate is even smaller still if the hyperplane is aligned with the coordinate axes. We present empirical studies to support our theoretical results. Keywords: Nearest neighbor searching, kd-trees, splitting methods, expected-case analysis, clustering.
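To make the quantity "leaves visited" concrete, here is a minimal kd-tree sketch in Python. It is not the splitting method analyzed in the paper: as a stand-in, it assumes a split at the median point along the axis of largest spread, and the names `build`, `nearest`, and `sample` are hypothetical. The data are drawn uniformly on an axis-aligned 2-dimensional flat inside 8-dimensional space, and the result is checked against a brute-force scan.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build(points, leaf_size=4):
    """Build a kd-tree as nested dicts; each leaf holds at most leaf_size points."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    dim = len(points[0])
    # Assumption: split on the axis of largest spread, at the median point.
    axis = max(range(dim),
               key=lambda a: max(p[a] for p in points) - min(p[a] for p in points))
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "axis": axis,
        "split": points[mid][axis],
        "left": build(points[:mid], leaf_size),   # coordinates <= split
        "right": build(points[mid:], leaf_size),  # coordinates >= split
    }


def nearest(node, q, best=None, stats=None):
    """Return (squared distance, point) for the nearest neighbor of q."""
    if "leaf" in node:
        if stats is not None:
            stats["leaves"] += 1  # count every leaf the search touches
        for p in node["leaf"]:
            d = sq_dist(p, q)
            if best is None or d < best[0]:
                best = (d, p)
        return best
    diff = q[node["axis"]] - node["split"]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, q, best, stats)
    # Visit the far side only if the splitting plane is within the best distance.
    if best is None or diff * diff <= best[0]:
        best = nearest(far, q, best, stats)
    return best


rng = random.Random(0)
d, k, n = 8, 2, 500  # hypothetical sizes: a k-dimensional flat in d dimensions


def sample():
    """Uniform point on an axis-aligned k-flat: the last d - k coordinates are 0."""
    return tuple(rng.random() if a < k else 0.0 for a in range(d))


points = [sample() for _ in range(n)]
tree = build(points)
query = sample()
stats = {"leaves": 0}
best = nearest(tree, query, stats=stats)
brute = min(sq_dist(p, query) for p in points)
print("leaves visited:", stats["leaves"])
print("matches brute force:", abs(best[0] - brute) < 1e-12)
```

Varying `k` and `d` in this sketch gives a rough feel for how the leaf count behaves, but it does not reproduce the paper's variants or its expected-case analysis.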
Principal Manifold Learning by Sparse Grids
, 2008
Abstract

Cited by 2 (2 self)
In this paper we deal with the construction of lower-dimensional manifolds from high-dimensional data, which is an important task in data mining, machine learning and statistics. Here, we consider principal manifolds as the minimum of a regularized, nonlinear empirical quantization error functional. For the discretization we use a sparse grid method in latent parameter space. This approach avoids, to some extent, the curse of dimension of conventional grids like in the GTM approach. The arising nonlinear problem is solved by a descent method which resembles the expectation maximization algorithm. We present our sparse grid principal manifold approach, discuss its properties and report on the results of numerical experiments for one-, two- and three-dimensional model problems.
Intuitive Visualization of Pareto Frontier for Multi-Objective Optimization in n-Dimensional Performance Space
Abstract

Cited by 2 (0 self)
A visualization methodology is presented in which a Pareto Frontier can be visualized in an intuitive and straightforward manner for an n-dimensional performance space. Based on this visualization, it is possible to quickly identify ‘good’ regions of the performance and optimal design spaces for a multi-objective optimization application, regardless of space complexity. Visualizing Pareto solutions for more than three objectives has long been a significant challenge to the multi-objective optimization community. The Hyperspace Diagonal Counting (HSDC) method described here enables the lossless visualization to be implemented. The proposed method requires no dimension fixing. In this paper, we demonstrate the usefulness of visualizing n-f space (i.e. for more than three objective functions in a multi-objective optimization problem). The visualization is shown to aid in the final decision of what potential optimal design point should be chosen amongst all possible Pareto solutions.
Unsupervised Classification of High Dimensional Data by means of Self-Organizing Neural Networks
, 1998
Abstract

Cited by 2 (0 self)
Contents
Introduction
1 Unsupervised classification of high-dimensional data
  1.1 Introduction
  1.2 High dimensional data
    1.2.1 Properties of high dimensional spaces
    1.2.2 Intrinsic dimension
  1.3 Unsupervised classification
    1.3.1 Definition
    1.3.2 Dimension reduction
    1.3.3 Available techniques
  1.4 Application: the Philips project
    1.4.1 Presentation
    1.4.2 Objectives
    1.4.3 Data preprocessing
    1.4.4 Intrinsic dimension
PCA in Autocorrelation Space
 in International Conference on Pattern Recognition
, 2002
Abstract

Cited by 2 (1 self)
The use of higher-order autocorrelations as features for pattern classification has usually been restricted to second or third orders due to high computational costs. Since the autocorrelation space is a high-dimensional space, we are interested in reducing the dimensionality of feature vectors for the benefit of the pattern classification task.
On the visualization of high-dimensional data
, 2013
Abstract

Cited by 1 (1 self)
Computing distances in high-dimensional spaces is affected by the empty space phenomenon, which may harm distance-based algorithms for data visualization. We focus on transforming high-dimensional numeric data for their visualization using the kernel PCA 2D projection. Gaussian and p-Gaussian kernels are often advocated when confronted with such data; we propose to give some insight into their properties and behaviour in the context of a 2D projection for visualization. An alternative approach, that directly impacts the distribution of distances, is proposed. It also allows the indirect control of the distribution of the eventual kernel values as generated by the Gaussian kernel function. Finally, such projections induce some artifacts, which, even if not handled, should not be ignored.

1 Distribution of distances in high-dimensional spaces

In high-dimensional spaces, normalized pairwise Euclidean distances tend to become all equal to 1 (see [3, section 1.4] for a justification). This is a corollary of the well-known curse of dimensionality, or empty space phenomenon. To illustrate this, we consider an artificial dataset of 3000 elements and 500 dimensions, each value being drawn independently from a uniform law in [0,1]. The dataset thus lies in the 500-dimensional unit hypercube. The histogram of pairwise distances between elements in the dataset (figure 1) clearly illustrates the claim that they are excessively biased towards 1. This means that distance-based visualization methods (e.g. graph embedding that would use distances to discover a topology) would complicate the interpretation of the data by a user, all elements being equally dissimilar.
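The concentration of distances described in this excerpt can be checked with a small simulation. The sketch below rests on assumptions that are not in the source: it uses far fewer points than the 3000 in the text so that it runs quickly in pure Python, and it normalizes distances by their mean, since the excerpt does not say which normalization is used. The function name `relative_spread` is hypothetical.

```python
import math
import random


def relative_spread(n, d, seed=0):
    """Standard deviation of the pairwise distances divided by their mean.

    Points are drawn independently and uniformly from the unit hypercube [0, 1]^d.
    Dividing by the mean is an assumed normalization.
    """
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]
    dists = [math.dist(pts[i], pts[j])
             for i in range(n) for j in range(i + 1, n)]
    mean = sum(dists) / len(dists)
    var = sum((x - mean) ** 2 for x in dists) / len(dists)
    return math.sqrt(var) / mean


# Compare a low-dimensional and a high-dimensional hypercube.
for d in (2, 500):
    print(d, round(relative_spread(100, d), 3))
```

A small relative spread means that, after normalization, nearly all pairwise distances sit close to 1, which is the effect the excerpt attributes to the empty space phenomenon.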