Results 1 - 10
of
68
Locality Preserving Projections
, 2002
"... Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data s ..."
Abstract
-
Cited by 142 (15 self)
- Add to MetaCart
Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA) -- a classical linear technique that projects the data along the directions of maximal variance. When the high dimensional data lies on a low dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the manifold. As a result, LPP shares many of the data representation properties of nonlinear techniques such as Laplacian Eigenmaps or Locally Linear Embedding. Yet LPP is linear and more crucially is defined everywhere in ambient space rather than just on the training data points. This is borne out by illustrative examples on some high dimensional data sets.
Many birds with one stone: Multi-objective approximation algorithms
, 1992
"... We study network-design problems with multiple design objectives. In particular, we look at two cost measures to be minimized simultaneously: the total cost of the network and the maximum degree of any node in the network. Our main result can be roughly stated as follows: given an integer b, we p ..."
Abstract
-
Cited by 46 (13 self)
- Add to MetaCart
We study network-design problems with multiple design objectives. In particular, we look at two cost measures to be minimized simultaneously: the total cost of the network and the maximum degree of any node in the network. Our main result can be roughly stated as follows: given an integer b, we present approximation algorithms for a variety of network-design problems on an n- node graph in which the degree of the output network is O(b log( n b )) and the cost of this network is O(log n) times that of the minimum-cost degree-b-bounded network. These algorithms can handle costs on nodes as well as edges. Moreover, we can construct such networks so as to satisfy a variety of connectivity specifications including spanning trees, Steiner trees and generalized Steiner forests. The performance guarantee on the cost of the output network is nearly best-possible unless NP = ~ P . We also address the special case in which the costs obey the triangle inequality. In this case, we obtai...
Laplacian score for feature selection
- In NIPS
, 2005
"... In supervised learning scenarios, feature selection has been studied widely in the literature. Selecting features in unsupervised learning scenarios is a much harder problem, due to the absence of class labels that would guide the search for relevant information. And, almost all of previous unsuperv ..."
Abstract
-
Cited by 32 (3 self)
- Add to MetaCart
In supervised learning scenarios, feature selection has been studied widely in the literature. Selecting features in unsupervised learning scenarios is a much harder problem, due to the absence of class labels that would guide the search for relevant information. And, almost all of previous unsupervised feature selection methods are “wrapper ” techniques that require a learning algorithm to evaluate the candidate feature subsets. In this paper, we propose a “filter ” method for feature selection which is independent of any learning algorithm. Our method can be performed in either supervised or unsupervised fashion. The proposed method is based on the observation that, in many real world classification problems, data from the same class are often close to each other. The importance of a feature is evaluated by its power of locality preserving, or, Laplacian Score. We compare our method with data variance (unsupervised) and Fisher score (supervised) on two data sets. Experimental results demonstrate the effectiveness and efficiency of our algorithm. 1
Document clustering using locality preserving indexing
- IEEE Transactions on Knowledge and Data Engineering
, 2005
"... Abstract—We propose a novel document clustering method which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality and clustering in such a high dimensional space is often infeasible due to the curse of dimensionality. By using Locality ..."
Abstract
-
Cited by 27 (14 self)
- Add to MetaCart
Abstract—We propose a novel document clustering method which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality and clustering in such a high dimensional space is often infeasible due to the curse of dimensionality. By using Locality Preserving Indexing (LPI), the documents can be projected into a lower-dimensional semantic space in which the documents related to the same semantics are close to each other. Different from previous document clustering methods based on Latent Semantic Indexing (LSI) or Nonnegative Matrix Factorization (NMF), our method tries to discover both the geometric and discriminating structures of the document space. Theoretical analysis of our method shows that LPI is an unsupervised approximation of the supervised Linear Discriminant Analysis (LDA) method, which gives the intuitive motivation of our method. Extensive experimental evaluations are performed on the Reuters-21578 and TDT2 data sets. Index Terms—Document clustering, locality preserving indexing, dimensionality reduction, semantics. æ 1
Short Shop Schedules
, 1995
"... We consider the open shop, job shop, and flow shop scheduling problems with integral processing times. We give polynomial-time algorithms to determine if an instance has a schedule of length at most 3, and show that deciding if there is a schedule of length at most 4 is NP- complete. The latter res ..."
Abstract
-
Cited by 22 (0 self)
- Add to MetaCart
We consider the open shop, job shop, and flow shop scheduling problems with integral processing times. We give polynomial-time algorithms to determine if an instance has a schedule of length at most 3, and show that deciding if there is a schedule of length at most 4 is NP- complete. The latter result implies that, unless P = NP, there does not exist a polynomial-time approximation algorithm for any of these problems that constructs a schedule with length guaranteed to be strictly less than 5/4 times the optimal length. This work constitutes the first nontrivial theoretical evidence that shop scheduling problems are hard to solve even approximately.
Incremental spectral clustering with application to monitoring of evolving blog communities
- In SDM
, 2007
"... In recent years, spectral clustering method has gained attentions because of its superior performance compared to other traditional clustering algorithms such as K-means algorithm. The existing spectral clustering algorithms are all off-line algorithms, i.e., they can not incrementally update the cl ..."
Abstract
-
Cited by 13 (5 self)
- Add to MetaCart
In recent years, spectral clustering method has gained attentions because of its superior performance compared to other traditional clustering algorithms such as K-means algorithm. The existing spectral clustering algorithms are all off-line algorithms, i.e., they can not incrementally update the clustering result given a small change of the data set. However, the capability of incrementally updating is essential to some applications such as real time monitoring of the evolving communities of websphere or blogsphere. Unlike traditional stream data, these applications require incremental algorithms to handle not only insertion/deletion of data points but also similarity changes between existing items. This paper extends the standard spectral clustering to such evolving data by introducing the incidence vector/matrix to represent two kinds of dynamics in the same framework and by incrementally updating the eigenvalue system. Our incremental algorithm, initialized by a standard spectral clustering, continuously and efficiently updates the eigenvalue system and generates instant cluster labels, as the data set is evolving. The algorithm is applied to a blog data set. Compared with recomputation of the solution by standard spectral clustering, it achieves similar accuracy but with much lower computational cost. Close inspection into the blog content shows that the incremental approach can discover not only the stable blog communities but also the evolution of the individual multi-topic blogs.
Lower bounds in a parallel model without bit operations
- TO APPEAR IN THE SIAM JOURNAL ON COMPUTING
, 1997
"... ..."
An Improved Approximation Algorithm for Minimum Size 2-Edge Connected Spanning Subgraphs
, 1999
"... We give a 17/12-approximation algorithm for the following NP-hard problem: Given an undirected graph, find a 2-edge connected spanning subgraph that has the minimum number of edges. The best previous approximation guarantee was 3/2. We conjecture that there is a 4/3-approximation algorithm. Thus ..."
Abstract
-
Cited by 10 (1 self)
- Add to MetaCart
We give a 17/12-approximation algorithm for the following NP-hard problem: Given an undirected graph, find a 2-edge connected spanning subgraph that has the minimum number of edges. The best previous approximation guarantee was 3/2. We conjecture that there is a 4/3-approximation algorithm. Thus our main result gets half-way to this target.

