Results 1-10 of 188
Locality Preserving Projections
2002
Cited by 404 (16 self)
Many problems in information processing involve some form of dimensionality reduction. In this paper, we introduce Locality Preserving Projections (LPP). These are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set. LPP should be seen as an alternative to Principal Component Analysis (PCA), a classical linear technique that projects the data along the directions of maximal variance. When the high-dimensional data lies on a low-dimensional manifold embedded in the ambient space, the Locality Preserving Projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace-Beltrami operator on the manifold. As a result, LPP shares many of the data representation properties of nonlinear techniques such as Laplacian Eigenmaps or Locally Linear Embedding. Yet LPP is linear and, more crucially, is defined everywhere in ambient space rather than just on the training data points. This is borne out by illustrative examples on some high-dimensional data sets.
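The variational problem mentioned in this abstract can be written out as a sketch in the form in which LPP is usually stated. The notation is an assumption, since the abstract gives no symbols: X is the data matrix whose columns are the points x_i, W is a symmetric neighborhood weight matrix, D is the diagonal matrix with D_ii = Σ_j W_ij, and L = D − W is the graph Laplacian.

```latex
\min_{a}\ \sum_{i,j} \left(a^{\top}x_i - a^{\top}x_j\right)^2 W_{ij}
\quad \text{subject to} \quad a^{\top} X D X^{\top} a = 1
\qquad\Longrightarrow\qquad
X L X^{\top} a = \lambda\, X D X^{\top} a
```

Under these assumptions, the projection directions are the generalized eigenvectors belonging to the smallest eigenvalues λ. The map x ↦ aᵀx is linear, which is why it is defined for any point in the ambient space and not only for the training points.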
Statistical properties of community structure in large social and information networks
Cited by 242 (14 self)
A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community (according to the conductance measure) over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
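The conductance measure named in this abstract can be illustrated with a minimal sketch. It assumes the common definition, namely the number of edges leaving a node set divided by the smaller of the two sides' total degree. The function name `conductance`, the adjacency-dict input format, and the toy graph are all hypothetical and are not taken from the paper.

```python
def conductance(adj, nodes):
    """Conductance of the node set `nodes` in an undirected graph.

    adj maps each node to the set of its neighbours (an assumed input
    format chosen for this sketch).
    """
    s = set(nodes)
    # Edges with exactly one endpoint inside the set.
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    # Volume = sum of degrees on each side of the cut.
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has no edges")
    return cut / denom


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
phi_triangle = conductance(adj, {0, 1, 2})  # one cut edge, volume 7 on each side
phi_single = conductance(adj, {0})          # both edges of node 0 leave the set
```

Lower values indicate a more community-like set. A network community profile plot in the sense of the abstract would record, for each size k, the lowest conductance found among sets of k nodes. This sketch only scores one given set and does not search for the best one.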
Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
2008
Cited by 198 (17 self)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as “real” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best” possible community (according to the conductance measure) over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and online social networks, to technological and information networks and
Empirical comparison of algorithms for network community detection
In Proc. WWW’10, 2010
Cited by 164 (5 self)
Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as a set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that “look like” good communities for the application of interest. In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have nonobvious size-dependent behavior.
Semi-supervised discriminant analysis
In Proc. of the IEEE Int’l Conf. on Comp. Vision (ICCV), Rio de Janeiro, 2007
Cited by 99 (2 self)
Linear Discriminant Analysis (LDA) has been a popular method for extracting features which preserve class separability. The projection vectors are commonly obtained by maximizing the between-class covariance and simultaneously minimizing the within-class covariance. In practice, when there are not enough training samples, the covariance matrix of each class may not be accurately estimated. In this paper, we propose a novel method, called Semi-supervised Discriminant Analysis (SDA), which makes use of both labeled and unlabeled samples. The labeled data points are used to maximize the separability between different classes and the unlabeled data points are used to estimate the intrinsic geometric structure of the data. Specifically, we aim to learn a discriminant function which is as smooth as possible on the data manifold. Experimental results on single training image face recognition and relevance feedback image retrieval demonstrate the effectiveness of our algorithm.
Graph Laplacians and Stabilization of Vehicle Formations
2001
Cited by 95 (7 self)
Control of vehicle formations has emerged as a topic of significant interest to the controls community. In this paper, we merge tools from graph theory and control theory to derive stability criteria for formation stabilization. The interconnection between vehicles (i.e., which vehicles are sensed by other vehicles) is modeled as a graph, and the eigenvalues of the Laplacian matrix of the graph are used in stating a Nyquist-like stability criterion for vehicle formations. The location of the Laplacian eigenvalues can be correlated to the graph structure, and therefore used to identify desirable and undesirable formation interconnection topologies.
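A minimal sketch of the Laplacian matrix referred to here is given below. It assumes the simplest variant, the unnormalized Laplacian L = D − A of an undirected graph. The paper may use a different definition, for example a directed or normalized one, and the helper names and the four-vehicle example are hypothetical.

```python
def laplacian(n, edges):
    """Unnormalized Laplacian L = D - A of an undirected graph on nodes 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1  # off-diagonal entries: minus the adjacency matrix
        L[v][u] -= 1
        L[u][u] += 1  # diagonal entries: node degrees
        L[v][v] += 1
    return L


def matvec(M, x):
    """Plain matrix-vector product, kept dependency-free for the sketch."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Hypothetical 4-vehicle formation in which every vehicle senses every other.
n = 4
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
L = laplacian(n, edges)

L_ones = matvec(L, [1] * n)     # rows sum to zero, so 0 is an eigenvalue
L_v = matvec(L, [1, -1, 0, 0])  # for the complete graph this is n times the vector
```

For this all-to-all topology the nonzero eigenvalues all equal n, while sparser interconnections give different spectra. This is the sense in which eigenvalue locations can be related to the graph structure. The stability criterion itself is not reproduced in this sketch.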
Variational Restoration Of Nonflat Image Features: Models And Algorithms
2000
Cited by 94 (14 self)
We develop both mathematical models and computational algorithms for variational denoising and restoration of nonflat image features. Nonflat image features are those that live on Riemannian manifolds, instead of on Euclidean spaces. Familiar examples include the orientation feature (from optical flows or gradient flows) that lives on the unit circle S¹, the alignment feature (from fingerprint waves or certain texture images) that lives on the real projective line RP¹, and the chromaticity feature (from color images) that lives on the unit sphere S². In this paper, we apply the variational method to denoise and restore general nonflat image features. Mathematical models for both continuous image domains and discrete domains (or graphs) are constructed. Riemannian objects such as metric, distance and Levi-Civita connection play important roles in the models. Computational algorithms are also developed for the resulting nonlinear equations. The mathematical framework can be applied to restoring general nonflat data outside the scope of image processing and computer vision.
Systematic topology analysis and generation using degree correlations
In SIGCOMM
Cited by 93 (7 self)
Researchers have proposed a variety of metrics to measure important graph properties, for instance, in social, biological, and computer networks. Values for a particular graph metric may capture a graph’s resilience to failure or its routing efficiency. Knowledge of appropriate metric values may influence the engineering of future topologies, repair strategies in the face of failure, and understanding of fundamental properties of existing networks. Unfortunately, there are typically no algorithms to generate graphs matching one or more proposed metrics and there is little understanding of the relationships among individual metrics or their applicability to different settings. We present a new, systematic approach for analyzing network topologies. We first introduce the dK-series of probability distributions specifying all degree correlations within d-sized subgraphs of a given graph G. Increasing values of d capture progressively more properties of G at the cost of more complex representation of the probability distribution. Using this series, we can quantitatively measure the distance between two graphs and construct random graphs that accurately reproduce virtually all metrics proposed in the literature. The nature of the dK-series implies that it will also capture any future metrics that may be proposed. Using our approach, we construct graphs for d = 0, 1, 2, 3 and demonstrate that these graphs reproduce, with increasing accuracy, important properties of measured and modeled Internet topologies. We find that the d = 2 case is sufficient for most practical purposes, while d = 3 essentially reconstructs the Internet AS- and router-level topologies exactly. We hope that a systematic method to analyze and synthesize topologies offers a significant improvement to the set of tools available to network topology and protocol researchers.
Graph regularized nonnegative matrix factorization for data representation
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011
Cited by 87 (4 self)
Matrix factorization techniques have been frequently applied in information retrieval, computer vision, and pattern recognition. Among them, Nonnegative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in the human brain. On the other hand, from the geometric perspective, the data is usually sampled from a low-dimensional manifold embedded in a high-dimensional ambient space. One then hopes to find a compact representation that uncovers the hidden semantics and simultaneously respects the intrinsic geometric structure. In this paper, we propose a novel algorithm, called Graph Regularized Nonnegative Matrix Factorization (GNMF), for this purpose. In GNMF, an affinity graph is constructed to encode the geometrical information, and we seek a matrix factorization that respects the graph structure. Our empirical study shows encouraging results of the proposed algorithm in comparison to the state-of-the-art algorithms on real-world problems.
Document clustering using locality preserving indexing
IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 78 (19 self)
We propose a novel document clustering method which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality and clustering in such a high-dimensional space is often infeasible due to the curse of dimensionality. By using Locality Preserving Indexing (LPI), the documents can be projected into a lower-dimensional semantic space in which the documents related to the same semantics are close to each other. Different from previous document clustering methods based on Latent Semantic Indexing (LSI) or Nonnegative Matrix Factorization (NMF), our method tries to discover both the geometric and discriminating structures of the document space. Theoretical analysis of our method shows that LPI is an unsupervised approximation of the supervised Linear Discriminant Analysis (LDA) method, which gives the intuitive motivation of our method. Extensive experimental evaluations are performed on the Reuters-21578 and TDT2 data sets. Index Terms: Document clustering, locality preserving indexing, dimensionality reduction, semantics.