Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
- IEEE Transactions on Knowledge and Data Engineering
, 2006
Cited by 55 (12 self)
This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the “length” of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation of the “Fiedler vector,” widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called “statistical relational learning” framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database. Index Terms: Graph analysis, graph and database mining, collaborative recommendation, graph kernels, spectral clustering, Fiedler vector, proximity measures, statistical relational learning.
The Principal Components Analysis of a Graph, and its Relationships to Spectral Clustering
- Proceedings of the 15th European Conference on Machine Learning (ECML 2004). Lecture Notes in Artificial Intelligence
, 2004
Cited by 50 (13 self)
This work presents a novel procedure for computing (1) distances between nodes of a weighted, undirected, graph, called the Euclidean Commute Time Distance (ECTD), and (2) a subspace projection of the nodes of the graph that preserves as much variance as possible, in terms of the ECTD -- a principal components analysis of the graph. It is based on a Markov-chain model of random walk through the graph. The model assigns transition probabilities to the links between nodes, so that a random walker can jump from node to node. A quantity, called the average commute time, computes the average time taken by a random walker for reaching node j when starting from node i, and coming back to node i. The square root of this quantity, the ECTD, is a distance measure between any two nodes, and has the nice property of decreasing when the number of paths connecting two nodes increases and when the "length" of any path decreases. The ECTD can be computed from the pseudoinverse of the Laplacian matrix of the graph, which is a kernel. We finally define the Principal Components Analysis (PCA) of a graph as the subspace projection that preserves as much variance as possible, in terms of the ECTD. This graph PCA has some interesting links with spectral graph theory, in particular spectral clustering.
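The abstract above states that the ECTD can be computed from the pseudoinverse of the Laplacian. A minimal sketch of that computation follows, assuming NumPy and the standard identity n(i, j) = V_G (l+_ii + l+_jj - 2 l+_ij), where V_G is the sum of all node degrees. The function name and the example graph are illustrative assumptions, not the authors' code.

```python
import numpy as np

def commute_time_distances(adjacency):
    """Return (average commute times, ECTD) for a connected, undirected, weighted graph."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    laplacian = np.diag(degrees) - A
    # Moore-Penrose pseudoinverse of the Laplacian (the kernel matrix)
    L_plus = np.linalg.pinv(laplacian)
    volume = degrees.sum()  # V_G: sum of all node degrees
    diag = np.diag(L_plus)
    # n(i, j) = V_G * (l+_ii + l+_jj - 2 * l+_ij)
    commute = volume * (diag[:, None] + diag[None, :] - 2.0 * L_plus)
    commute = np.maximum(commute, 0.0)  # clip tiny negative round-off
    return commute, np.sqrt(commute)

# Hypothetical example: path graph 0 - 1 - 2 with unit edge weights
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
commute, ectd = commute_time_distances(A)
```

On this path graph the average commute time is 4 steps between adjacent nodes and 8 steps between the two end nodes, so the ECTD between the end nodes is the square root of 8.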
Efficient MATLAB computations with sparse and factored tensors
- SIAM JOURNAL ON SCIENTIFIC COMPUTING
, 2007
Cited by 33 (12 self)
In this paper, the term tensor refers simply to a multidimensional or $N$-way array, and we consider how specially structured tensors allow for efficient storage and computation. First, we study sparse tensors, which have the property that the vast majority of the elements are zero. We propose storing sparse tensors using coordinate format and describe the computational efficiency of this scheme for various mathematical operations, including those typical to tensor decomposition algorithms. Second, we study factored tensors, which have the property that they can be assembled from more basic components. We consider two specific types: A Tucker tensor can be expressed as the product of a core tensor (which itself may be dense, sparse, or factored) and a matrix along each mode, and a Kruskal tensor can be expressed as the sum of rank-1 tensors. We are interested in the case where the storage of the components is less than the storage of the full tensor, and we demonstrate that many elementary operations can be computed using only the components. All of the efficiencies described in this paper are implemented in the Tensor Toolbox for MATLAB.
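The coordinate format mentioned above stores one index tuple and one value per nonzero element. The following is a small pure-Python sketch of that idea with a tensor-times-vector product that touches only the nonzeros. The class and method names are hypothetical and do not reproduce the Tensor Toolbox API; the sketch also assumes the index tuples passed in are unique.

```python
from collections import defaultdict

class SparseTensor:
    """Sparse N-way array in coordinate format: one index tuple and one value per nonzero."""

    def __init__(self, shape, subs, vals):
        self.shape = tuple(shape)
        self.subs = [tuple(s) for s in subs]  # assumed unique
        self.vals = list(vals)

    def ttv(self, vector, mode):
        """Multiply by a vector along `mode`; the result has that mode removed."""
        acc = defaultdict(float)
        for sub, val in zip(self.subs, self.vals):
            reduced = sub[:mode] + sub[mode + 1:]
            acc[reduced] += val * vector[sub[mode]]
        new_shape = self.shape[:mode] + self.shape[mode + 1:]
        # Keep only entries that remain nonzero, in sorted index order
        items = sorted((s, v) for s, v in acc.items() if v != 0.0)
        return SparseTensor(new_shape, [s for s, _ in items], [v for _, v in items])

    def norm(self):
        """Frobenius norm, computed from the stored nonzeros only."""
        return sum(v * v for v in self.vals) ** 0.5

# Hypothetical 2 x 3 x 2 tensor with three nonzeros
X = SparseTensor((2, 3, 2), [(0, 0, 0), (1, 2, 1), (1, 2, 0)], [1.0, 2.0, 3.0])
Y = X.ttv([10.0, 1.0], mode=2)
```

Both operations cost time proportional to the number of stored nonzeros rather than to the full size of the tensor, which is the efficiency argument the abstract makes for this storage scheme.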
Estimating Normalizing Constants and Reweighting Mixtures in Markov Chain Monte Carlo
, 1994
Cited by 32 (0 self)
Markov chain Monte Carlo (the Metropolis-Hastings algorithm and the Gibbs sampler) is a general multivariate simulation method that permits sampling from any stochastic process whose density is known up to a constant of proportionality. It has recently received much attention as a method of carrying out Bayesian, likelihood, and frequentist inference in analytically intractable problems. Although many applications of Markov chain Monte Carlo do not need estimation of normalizing constants, three do: calculation of Bayes factors, calculation of likelihoods in the presence of missing data, and importance sampling from mixtures. Here reverse logistic regression is proposed as a solution to the problem of estimating normalizing constants, and convergence and asymptotic normality of the estimates are proved under very weak regularity conditions. Markov chain Monte Carlo is most useful when combined with importance reweighting so that a Monte Carlo sample from one distribution can be used fo...
Multilinear operators for higher-order decompositions
, 2006
Cited by 22 (8 self)
We propose two new multilinear operators for expressing the matrix compositions that are needed in the Tucker and PARAFAC (CANDECOMP) decompositions. The first operator, which we call the Tucker operator, is shorthand for performing an n-mode matrix multiplication for every mode of a given tensor and can be employed to concisely express the Tucker decomposition. The second operator, which we call the Kruskal operator, is shorthand for the sum of the outer-products of the columns of N matrices and allows a divorce from a matricized representation and a very concise expression of the PARAFAC decomposition. We explore the properties of the Tucker and Kruskal operators independently of the related decompositions. Additionally, we provide a review of the matrix and tensor operations that are frequently used in the context of tensor decompositions.
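The Kruskal operator described above, the sum of the outer products of the columns of N matrices, can be sketched directly in pure Python. The function name, the dictionary output format, and the example matrices are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product

def kruskal(*factors):
    """Sum over r of the outer product of the r-th columns of the factor matrices.

    Each factor is a list of rows, and all factors must have the same number
    of columns R. Returns a dict mapping each index tuple to its entry.
    """
    rank = len(factors[0][0])
    if any(len(row) != rank for factor in factors for row in factor):
        raise ValueError("all factor matrices must have the same number of columns")
    dims = [len(factor) for factor in factors]
    tensor = {}
    for index in product(*(range(d) for d in dims)):
        total = 0.0
        for r in range(rank):
            term = 1.0
            # Entry of the r-th rank-1 tensor: product of one element per mode
            for mode, i in enumerate(index):
                term *= factors[mode][i][r]
            total += term
        tensor[index] = total
    return tensor

# Hypothetical factors for a 2 x 2 x 1 tensor of rank at most 2
A = [[1.0, 0.0], [2.0, 1.0]]
B = [[1.0, 1.0], [0.0, 2.0]]
C = [[1.0, 3.0]]
T = kruskal(A, B, C)
```

This builds the full tensor element by element for clarity; it makes no attempt at the efficiency that a factored representation would normally be used for.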
A Hybrid Graphical Model for Rhythmic Parsing
- In Proc. of 17th Conf. on Uncertainty in Artif. Int
, 2001
Cited by 19 (2 self)
A method is presented for the rhythmic parsing problem: Given a sequence of observed musical note onset times, we simultaneously estimate the corresponding notated rhythm and tempo process. A graphical model is developed that represents the evolution of tempo and rhythm and relates these hidden quantities to an observable performance. The rhythm variables are discrete and the tempo and observation variables are continuous. We show how to compute the globally most likely configuration of the tempo and rhythm variables given an observation of note onset times. Experiments are presented on both MIDI data and a data set derived from an audio signal. A generalization to computing MAP estimates for arbitrary conditional Gaussian distributions is outlined.
Extreme learning machine: Theory and applications
, 2006
Cited by 16 (4 self)
It is clear that the learning speed of feedforward neural networks is in general far slower than required and that it has been a major bottleneck in their applications for the past decades. Two key reasons behind this may be: (1) slow gradient-based learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by using such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden layer feedforward neural networks (SLFNs) which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. The experimental results, based on a few artificial and real benchmark function approximation and classification problems including very large complex applications, show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.
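The two steps named in the abstract, randomly chosen hidden nodes and analytically determined output weights, can be sketched as follows, assuming NumPy. The tanh activation, the Gaussian initialization, the hidden-layer size, and the toy regression data are all assumptions chosen for illustration, not details taken from the paper.

```python
import numpy as np

def elm_fit(X, y, n_hidden=20, seed=0):
    """Fit a single-hidden-layer network with random, untuned hidden nodes."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (assumed Gaussian)
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    # Output weights in closed form: minimum-norm least-squares solution
    beta = np.linalg.pinv(H) @ y
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the fitted output weights."""
    return np.tanh(X @ W + b) @ beta

# Hypothetical toy problem: approximate sin(3x) on [-1, 1]
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])
W, b, beta = elm_fit(X, y)
mse = float(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
```

No iterative training loop appears anywhere: the only fitting step is a single pseudoinverse, which is where the speed advantage claimed in the abstract would come from.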
Ordering Monte Carlo Markov Chains
- School of Statistics, University of Minnesota
, 1999
Cited by 16 (3 self)
Markov chains having the same stationary distribution π can be partially ordered by performance in the central limit theorem. We say that one chain is at least as good as another in the efficiency partial ordering if the variance in the central limit theorem is at least as small for every L²(π) functional of the chain. Peskun partial ordering implies efficiency partial ordering [25, 30]. Here we show that Peskun partial ordering implies, for finite state spaces, ordering of all the eigenvalues of the transition matrices, and, for general state spaces, ordering of the suprema of the spectra of the transition operators. We also define a covariance partial ordering based on lag one autocovariances and show that it is equivalent to the efficiency partial ordering when restricted to reversible Markov chains. Similar but weaker results are provided for non-reversible Markov chains. Keywords: Peskun ordering, Eigenvalues, Spectral decomposition, Non-reversible kernels.
Extreme learning machine: RBF network case
- in Proc. 8th Int. Conf. Control, Autom., Robot., Vis. (ICARCV 2004)
Cited by 13 (9 self)
A new learning algorithm called extreme learning machine (ELM) has recently been proposed for single-hidden layer feedforward neural networks (SLFNs) to easily achieve good generalization performance at extremely fast learning speed. ELM randomly chooses the input weights and analytically determines the output weights of SLFNs. This paper shows that ELM can be extended to the radial basis function (RBF) network case, which allows the centers and impact widths of RBF kernels to be randomly generated and the output weights to be simply analytically calculated instead of iteratively tuned. Interestingly, the experimental results show that the ELM algorithm for RBF networks can complete learning at extremely fast speed and produce generalization performance very close to that of SVM in many artificial and real benchmarking function approximation and classification problems. Since ELM does not require validation and human-intervened parameters for given network architectures, ELM can be easily used. Index terms: Radial basis function network, feedforward neural networks, SLFN, real time learning, extreme learning machine, ELM.
A Novel Way of Computing Dissimilarities between Nodes of a Graph, with Application to Collaborative Filtering
, 2004
Cited by 12 (0 self)
This work presents some general procedures for computing dissimilarities between elements of a database or, more generally, nodes of a weighted, undirected, graph. It is based on a Markov-chain model of random walk through the database. The model assigns transition probabilities to the links between elements, so that a random walker can jump from element to element. A quantity, called the average first-passage cost, computes the average cost incurred by a random walker for reaching element k for the first time when starting from element i.

