Results 1–10 of 87
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
IEEE Transactions on Knowledge and Data Engineering, 2006
Cited by 116 (14 self)
Abstract—This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation to the "Fiedler vector," widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database. Index Terms—Graph analysis, graph and database mining, collaborative recommendation, graph kernels, spectral clustering, Fiedler vector, proximity measures, statistical relational learning.
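As a rough numerical companion to the quantities this abstract describes, the sketch below computes the average commute time and the commute-time distance from the pseudoinverse of the Laplacian on a hypothetical four-node toy graph; the adjacency matrix and function names are illustrative assumptions, not from the paper.

```python
import numpy as np

# Toy weighted, undirected graph (hypothetical example, not from the paper)
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

D = np.diag(A.sum(axis=1))      # degree matrix
L = D - A                       # graph Laplacian
Lp = np.linalg.pinv(L)          # Moore-Penrose pseudoinverse of the Laplacian
V = A.sum()                     # volume of the graph (total weight)

def avg_commute_time(i, j):
    # n(i, j) = V * (l+_ii + l+_jj - 2 l+_ij): larger when fewer/longer paths
    return V * (Lp[i, i] + Lp[j, j] - 2.0 * Lp[i, j])

def ectd(i, j):
    # the square root of the average commute time is a Euclidean distance
    return np.sqrt(avg_commute_time(i, j))
```

For the bridge edge between nodes 2 and 3, the commute time reduces to the graph volume times the effective resistance of that edge, a standard sanity check on the formula.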
The Principal Components Analysis of a Graph, and its Relationships to Spectral Clustering
Proceedings of the 15th European Conference on Machine Learning (ECML 2004), Lecture Notes in Artificial Intelligence, 2004
Cited by 66 (15 self)
This work presents a novel procedure for computing (1) distances between nodes of a weighted, undirected graph, called the Euclidean Commute Time Distance (ECTD), and (2) a subspace projection of the nodes of the graph that preserves as much variance as possible, in terms of the ECTD (a principal components analysis of the graph). It is based on a Markov-chain model of random walk through the graph. The model assigns transition probabilities to the links between nodes, so that a random walker can jump from node to node. A quantity, called the average commute time, computes the average time taken by a random walker to reach node j when starting from node i, and to come back to node i. The square root of this quantity, the ECTD, is a distance measure between any two nodes, and has the nice property of decreasing when the number of paths connecting two nodes increases and when the "length" of any path decreases. The ECTD can be computed from the pseudoinverse of the Laplacian matrix of the graph, which is a kernel. We finally define the Principal Components Analysis (PCA) of a graph as the subspace projection that preserves as much variance as possible, in terms of the ECTD. This graph PCA has some interesting links with spectral graph theory, in particular spectral clustering.
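The graph PCA step can be sketched as an eigendecomposition of the kernel V·L+: the resulting node coordinates reproduce the ECTD exactly, and truncating to the top eigenvectors gives the variance-preserving projection. The toy graph and variable names below are illustrative assumptions, not from the paper.

```python
import numpy as np

# Hypothetical four-node graph (illustrative example)
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A
V = A.sum()

# Eigendecomposition of the kernel V * L+; its rows give node coordinates
w, U = np.linalg.eigh(V * np.linalg.pinv(L))
order = np.argsort(w)[::-1]                 # sort eigenvalues descending
w, U = w[order], U[:, order]

X = U * np.sqrt(np.clip(w, 0.0, None))      # ||x_i - x_j||^2 = commute time
X2 = X[:, :2]                               # 2-D "graph PCA" projection
```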
Efficient MATLAB computations with sparse and factored tensors
SIAM Journal on Scientific Computing, 2007
Cited by 45 (13 self)
In this paper, the term tensor refers simply to a multidimensional or $N$-way array, and we consider how specially structured tensors allow for efficient storage and computation. First, we study sparse tensors, which have the property that the vast majority of the elements are zero. We propose storing sparse tensors using coordinate format and describe the computational efficiency of this scheme for various mathematical operations, including those typical to tensor decomposition algorithms. Second, we study factored tensors, which have the property that they can be assembled from more basic components. We consider two specific types: a Tucker tensor can be expressed as the product of a core tensor (which itself may be dense, sparse, or factored) and a matrix along each mode, and a Kruskal tensor can be expressed as the sum of rank-1 tensors. We are interested in the case where the storage of the components is less than the storage of the full tensor, and we demonstrate that many elementary operations can be computed using only the components. All of the efficiencies described in this paper are implemented in the Tensor Toolbox for MATLAB.
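A minimal sketch of the coordinate storage scheme on a tiny hypothetical tensor; the class and method names are illustrative, not the Tensor Toolbox API (which is MATLAB), and only two operations are shown.

```python
import numpy as np

class SparseTensor:
    """Coordinate (COO) storage: only nonzero entries and their subscripts."""
    def __init__(self, subs, vals, shape):
        self.subs = np.asarray(subs)             # nnz x N subscript array
        self.vals = np.asarray(vals, float)      # nnz nonzero values
        self.shape = tuple(shape)

    def norm(self):
        # Frobenius norm needs only the stored nonzeros
        return float(np.sqrt(np.sum(self.vals ** 2)))

    def ttv(self, v, mode):
        # tensor-times-vector along `mode`: scale each nonzero by v at its
        # index in that mode, then accumulate over the remaining modes
        scaled = self.vals * v[self.subs[:, mode]]
        rem = [m for m in range(len(self.shape)) if m != mode]
        out = np.zeros(tuple(self.shape[m] for m in rem))
        np.add.at(out, tuple(self.subs[:, m] for m in rem), scaled)
        return out

# toy 2x2x2 tensor with two nonzero entries
T = SparseTensor([[0, 0, 0], [1, 1, 0]], [2.0, 3.0], (2, 2, 2))
```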
Estimating Normalizing Constants and Reweighting Mixtures in Markov Chain Monte Carlo
1994
Cited by 40 (0 self)
Markov chain Monte Carlo (the Metropolis-Hastings algorithm and the Gibbs sampler) is a general multivariate simulation method that permits sampling from any stochastic process whose density is known up to a constant of proportionality. It has recently received much attention as a method of carrying out Bayesian, likelihood, and frequentist inference in analytically intractable problems. Although many applications of Markov chain Monte Carlo do not need estimation of normalizing constants, three do: calculation of Bayes factors, calculation of likelihoods in the presence of missing data, and importance sampling from mixtures. Here reverse logistic regression is proposed as a solution to the problem of estimating normalizing constants, and convergence and asymptotic normality of the estimates are proved under very weak regularity conditions. Markov chain Monte Carlo is most useful when combined with importance reweighting so that a Monte Carlo sample from one distribution can be used fo...
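To make the normalizing-constant problem concrete, here is a sketch of the simplest approach, plain importance sampling, rather than the reverse logistic regression this abstract proposes: given draws from one normalized density, the mean importance ratio estimates the ratio of normalizing constants. The densities and the random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

q1 = lambda x: np.exp(-x ** 2 / 2.0)   # unnormalized N(0, 1); Z1 = sqrt(2*pi)
q2 = lambda x: np.exp(-x ** 2 / 4.0)   # unnormalized N(0, 2); Z2 = sqrt(4*pi)

# Draws from the normalized version of q2; the mean importance weight
# estimates Z1 / Z2 = sqrt(2*pi) / sqrt(4*pi) = 1/sqrt(2)
x = rng.normal(0.0, np.sqrt(2.0), size=200_000)
ratio_hat = float(np.mean(q1(x) / q2(x)))
```

Sampling from the wider density q2 keeps the weight function bounded and the estimator's variance finite; sampling the other way around would not.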
Extreme learning machine: Theory and applications
2006
Cited by 35 (5 self)
The learning speed of feedforward neural networks is in general far slower than required, and this has been a major bottleneck in their applications for past decades. Two key reasons may be: (1) slow gradient-based learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden-layer feedforward neural networks (SLFNs) which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. The experimental results based on a few artificial and real benchmark function approximation and classification problems, including very large complex applications, show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.
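A minimal sketch of the ELM recipe this abstract describes, on a hypothetical 1-D regression problem: hidden-layer weights are drawn at random and never trained, and the output weights come from a single pseudoinverse solve. The toy data and layer sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy regression task: learn y = sin(x) on [-3, 3]
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y = np.sin(X).ravel()

n_hidden = 50
W = rng.normal(size=(1, n_hidden))   # random input weights (never tuned)
b = rng.normal(size=n_hidden)        # random biases (never tuned)

H = np.tanh(X @ W + b)               # hidden-layer output matrix
beta = np.linalg.pinv(H) @ y         # output weights, solved analytically

mse = float(np.mean((H @ beta - y) ** 2))
```

There is no iterative training loop at all: the only "learning" is the least-squares solve for beta, which is why ELM training is fast.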
Multilinear operators for higher-order decompositions
2006
Cited by 32 (9 self)
We propose two new multilinear operators for expressing the matrix compositions that are needed in the Tucker and PARAFAC (CANDECOMP) decompositions. The first operator, which we call the Tucker operator, is shorthand for performing an n-mode matrix multiplication for every mode of a given tensor and can be employed to concisely express the Tucker decomposition. The second operator, which we call the Kruskal operator, is shorthand for the sum of the outer products of the columns of N matrices and allows a divorce from a matricized representation and a very concise expression of the PARAFAC decomposition. We explore the properties of the Tucker and Kruskal operators independently of the related decompositions. Additionally, we provide a review of the matrix and tensor operations that are frequently used in the context of tensor decompositions.
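The Kruskal operator admits a direct sketch: sum the outer products of corresponding columns of the factor matrices. The function below is a hypothetical NumPy rendering of that definition, not the authors' notation.

```python
import numpy as np

def kruskal(*mats):
    """Sum of outer products of corresponding columns of N factor matrices."""
    R = mats[0].shape[1]                    # shared number of columns (rank)
    T = np.zeros(tuple(M.shape[0] for M in mats))
    for r in range(R):
        outer = mats[0][:, r]
        for M in mats[1:]:                  # build the r-th rank-1 term
            outer = np.multiply.outer(outer, M[:, r])
        T += outer
    return T
```

For three rank-1 factors the result satisfies T[i, j, k] = a_i * b_j * c_k, the elementwise form of the PARAFAC model.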
Tests for unit roots and the initial condition
Econometrica, 2003
Cited by 24 (0 self)
The paper analyzes the impact of the initial condition on the problem of testing for unit roots. To this end, we derive a family of optimal tests that maximize a weighted average power criterion with respect to the initial condition. We then investigate the relationship of this optimal family to popular tests. We find that many unit root tests are closely related to specific members of the optimal family, but the corresponding members employ very different weightings for the initial condition. The popular Dickey-Fuller tests, for instance, put a large weight on extreme deviations of the initial observation from the deterministic component, whereas other popular tests put more weight on moderate deviations. Since the power of unit root tests varies dramatically with the initial condition, this explains the results of comparative power studies of unit root tests. The results allow a much deeper understanding of the merits of particular tests in specific circumstances and provide a guide to choosing which statistics to use in practice.
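For orientation, the simplest test in this family can be sketched as the no-constant Dickey-Fuller regression, diff(y)_t = gamma * y_{t-1} + e_t, where a strongly negative t-statistic on gamma argues against a unit root. The simulated series are illustrative; this is not the paper's optimal test family.

```python
import numpy as np

def dickey_fuller_t(y):
    # OLS t-statistic for gamma in: diff(y)_t = gamma * y_{t-1} + e_t
    dy, ylag = np.diff(y), y[:-1]
    gamma = (ylag @ dy) / (ylag @ ylag)
    resid = dy - gamma * ylag
    s2 = (resid @ resid) / (len(dy) - 1)
    return gamma / np.sqrt(s2 / (ylag @ ylag))

rng = np.random.default_rng(0)
e = rng.normal(size=500)
random_walk = np.cumsum(e)              # unit-root process
stationary = np.empty(500)              # AR(1) with rho = 0.5
stationary[0] = e[0]
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]
```

The stationary series yields a t-statistic far in the negative tail, while the random walk does not; the statistic's non-standard null distribution is why Dickey-Fuller critical values, not normal ones, are used in practice.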
A Hybrid Graphical Model for Rhythmic Parsing
In Proc. of the 17th Conference on Uncertainty in Artificial Intelligence, 2001
Cited by 23 (3 self)
A method is presented for the rhythmic parsing problem: given a sequence of observed musical note onset times, we simultaneously estimate the corresponding notated rhythm and tempo process. A graphical model is developed that represents the evolution of tempo and rhythm and relates these hidden quantities to an observable performance. The rhythm variables are discrete and the tempo and observation variables are continuous. We show how to compute the globally most likely configuration of the tempo and rhythm variables given an observation of note onset times. Experiments are presented on both MIDI data and a data set derived from an audio signal. A generalization to computing MAP estimates for arbitrary conditional Gaussian distributions is outlined.
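As a toy companion to the inference described above, the sketch below runs Viterbi decoding for the purely discrete part of such a model, ignoring the continuous tempo variable that makes the paper's graphical model hybrid. State spaces and probabilities are illustrative assumptions.

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Most likely discrete state sequence given per-step observation scores."""
    T, S = log_obs.shape
    score = log_init + log_obs[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans        # S x S predecessor scores
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_obs[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):                # backtrack the best path
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```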
Ordering Monte Carlo Markov Chains
School of Statistics, University of Minnesota, 1999
Cited by 21 (5 self)
Markov chains having the same stationary distribution π can be partially ordered by performance in the central limit theorem. We say that one chain is at least as good as another in the efficiency partial ordering if the variance in the central limit theorem is at least as small for every L^2(π) functional of the chain. Peskun partial ordering implies efficiency partial ordering [25, 30]. Here we show that Peskun partial ordering implies, for finite state spaces, ordering of all the eigenvalues of the transition matrices, and, for general state spaces, ordering of the suprema of the spectra of the transition operators. We also define a covariance partial ordering based on lag-one autocovariances and show that it is equivalent to the efficiency partial ordering when restricted to reversible Markov chains. Similar but weaker results are provided for nonreversible Markov chains. Keywords: Peskun ordering, eigenvalues, spectral decomposition, nonreversible kernels.
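The finite-state eigenvalue ordering can be checked numerically on a toy pair of reversible chains: a "lazy" chain P2 = 0.5*I + 0.5*P1 has uniformly smaller off-diagonal entries, so P1 dominates it in the Peskun ordering, and each eigenvalue of P1 should be no larger than the matching eigenvalue of P2. The matrices below are illustrative assumptions.

```python
import numpy as np

# Symmetric row-stochastic matrix: reversible w.r.t. the uniform distribution
P1 = np.array([[0.2, 0.5, 0.3],
               [0.5, 0.1, 0.4],
               [0.3, 0.4, 0.3]])
P2 = 0.5 * np.eye(3) + 0.5 * P1     # lazy chain, same stationary distribution

ev1 = np.sort(np.linalg.eigvalsh(P1))   # eigenvalues, ascending
ev2 = np.sort(np.linalg.eigvalsh(P2))
```

Here the ordering is immediate, since each eigenvalue of P2 is 0.5 + 0.5 times the corresponding eigenvalue of P1, and all eigenvalues are at most 1.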
Electrical neuroimaging based on biophysical constraints
NeuroImage, 2004
Cited by 20 (2 self)
This paper proposes and implements biophysical constraints to select a unique solution to the bioelectromagnetic inverse problem. It first shows that the brain's electric fields and potentials are predominantly due to ohmic currents. This serves to reformulate the inverse problem in terms of a restricted source model permitting noninvasive estimations of Local Field Potentials (LFPs) in depth from scalp-recorded data. Uniqueness in the solution is achieved by a physically derived regularization strategy that imposes a spatial structure on the solution based upon the physical laws that describe electromagnetic fields in biological media. The regularization strategy and the source model emulate the properties of brain activity's actual generators. This added information is independent of both the recorded data and head model and suffices for obtaining a unique solution compatible with and aimed at analyzing experimental data. The inverse solution's features are evaluated with event-related potentials (ERPs) from a healthy ...
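Abstracting away the biophysics, the underlying estimation step is a regularized linear inverse problem. The sketch below uses generic Tikhonov (minimum-norm) regularization on a hypothetical lead-field matrix; it is not the paper's physically derived regularization operator, and all sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources = 8, 20
K = rng.normal(size=(n_sensors, n_sources))          # toy lead-field matrix
j_true = np.zeros(n_sources)
j_true[3] = 1.0                                      # a single active source
d = K @ j_true + 0.01 * rng.normal(size=n_sensors)   # noisy sensor data

lam = 0.1                                            # regularization strength
# minimum-norm estimate: j_hat = K^T (K K^T + lam I)^{-1} d
j_hat = K.T @ np.linalg.solve(K @ K.T + lam * np.eye(n_sensors), d)
```

Because there are far more sources than sensors, the problem is underdetermined; the regularizer is what selects one solution among the infinitely many that fit the data, which is exactly the role the paper's biophysical constraints play.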