Results 1  10
of
234
Hidden Markov processes
 IEEE Trans. Inform. Theory
, 2002
"... Abstract—An overview of statistical and informationtheoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discretetime finitestate homogeneous Markov chain observed through a discretetime memoryless invariant channel. In recent years, the work of Baum and Petrie on finite ..."
Abstract

Cited by 170 (3 self)
 Add to MetaCart
Abstract—An overview of statistical and informationtheoretic aspects of hidden Markov processes (HMPs) is presented. An HMP is a discretetime finitestate homogeneous Markov chain observed through a discretetime memoryless invariant channel. In recent years, the work of Baum and Petrie on finitestate finitealphabet HMPs was expanded to HMPs with finite as well as continuous state spaces and a general alphabet. In particular, statistical properties and ergodic theorems for relative entropy densities of HMPs were developed. Consistency and asymptotic normality of the maximumlikelihood (ML) parameter estimator were proved under some mild conditions. Similar results were established for switching autoregressive processes. These processes generalize HMPs. New algorithms were developed for estimating the state, parameter, and order of an HMP, for universal coding and classification of HMPs, and for universal decoding of hidden Markov channels. These and other related topics are reviewed in this paper. Index Terms—Baum–Petrie algorithm, entropy ergodic theorems, finitestate channels, hidden Markov models, identifiability, Kalman filter, maximumlikelihood (ML) estimation, order estimation, recursive parameter estimation, switching autoregressive processes, Ziv inequality. I.
Extrapolation Methods for Accelerating PageRank Computations
 In Proceedings of the Twelfth International World Wide Web Conference
, 2003
"... We present a novel algorithm for the fast computation of PageRank, a hyperlinkbased estimate of the "importance" of Web pages. The original PageRank algorithm uses the Power Method to compute successive iterates that converge to the principal eigenvector of the Markov matrix representing the Web li ..."
Abstract

Cited by 134 (13 self)
 Add to MetaCart
We present a novel algorithm for the fast computation of PageRank, a hyperlinkbased estimate of the "importance" of Web pages. The original PageRank algorithm uses the Power Method to compute successive iterates that converge to the principal eigenvector of the Markov matrix representing the Web link graph. The algorithm presented here, called Quadratic Extrapolation, accelerates the convergence of the Power Method by periodically subtracting off estimates of the nonprincipal eigenvectors from the current iterate of the Power Method. In Quadratic Extrapolation, we take advantage of the fact that the first eigenvalueof a Markov matrix is known to be 1 to compute the nonprincipal eigenvectorsusing successiveiterates of the Power Method. Empirically, we show that using Quadratic Extrapolation speeds up PageRank computation by 50300% on a Web graph of 80 million nodes, with minimal overhead.
Exploiting the Block Structure of the Web for Computing PageRank
, 2003
"... The web link graph has a nested block structure: the vast majority of hyperlinks link pages on a host to other pages on the same host, and many of those that do not link pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank by a 3stage alg ..."
Abstract

Cited by 129 (5 self)
 Add to MetaCart
The web link graph has a nested block structure: the vast majority of hyperlinks link pages on a host to other pages on the same host, and many of those that do not link pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank by a 3stage algorithm whereby (1) the local PageRanks of pages for each host are computed independently using the link structure of that host, (2) these local PageRanks are then weighted by the "importance" of the corresponding host, and (3) the standard PageRank algorithm is then run using as its starting vector the weighted concatenation of the local PageRanks. Empirically, this algorithm speeds up the computation of PageRank by a factor of 2 in realistic scenarios. Further, we develop a variant of this algorithm that efficiently computes many different "personalized" PageRanks, and a variant that efficiently recomputes PageRank after node updates.
Gaussian processes for machine learning
 International Journal of Neural Systems
, 2004
"... Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. ..."
Abstract

Cited by 66 (15 self)
 Add to MetaCart
Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible nonparametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations [13, 78, 31]. The mathematical literature on GPs is large and often uses deep
Face Recognition with Image Sets Using Manifold Density Divergence
, 2005
"... In many automatic face recognition applications, a set of a person's face images is available rather than a single image. In this paper, we describe a novel method for face recognition using image sets. We propose a flexible, semiparametric model for learning probability densities confined to highly ..."
Abstract

Cited by 65 (14 self)
 Add to MetaCart
In many automatic face recognition applications, a set of a person's face images is available rather than a single image. In this paper, we describe a novel method for face recognition using image sets. We propose a flexible, semiparametric model for learning probability densities confined to highly nonlinear but intrinsically lowdimensional manifolds. The model leads to a statistical formulation of the recognition problem in terms of minimizing the divergence between densities estimated on these manifolds. The proposed method is evaluated on a large data set, acquired in realistic imaging conditions with severe illumination variation. Our algorithm is shown to match the best and outperform other stateoftheart algorithms in the literature, achieving 94% recognition rate on average.
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
 Journal of Machine Learning Research
, 2001
"... We consider the use of two additive control variate methods to reduce the variance of performance gradient estimates in reinforcement learning problems. The first approach we consider is the baseline method, in which a function of the current state is added to the discounted value estimate. We relat ..."
Abstract

Cited by 41 (1 self)
 Add to MetaCart
We consider the use of two additive control variate methods to reduce the variance of performance gradient estimates in reinforcement learning problems. The first approach we consider is the baseline method, in which a function of the current state is added to the discounted value estimate. We relate the performance of these methods, which use sample paths, to the variance of estimates based on iid data. We derive the baseline function that minimizes this variance, and we show that the variance for any baseline is the sum of the optimal variance and a weighted squared distance to the optimal baseline. We show that the widely used average discounted value baseline (where the reward is replaced by the difference between the reward and its expectation) is suboptimal. The second approach we consider is the actorcritic method, which uses an approximate value function. We give bounds on the expected squared error of its estimates. We show that minimizing distance to the true value function is suboptimal in general; we provide an example for which the true value function gives an estimate with positive variance, but the optimal value function gives an unbiased estimate with zero variance. Our bounds suggest algorithms to estimate the gradient of the performance of parameterized baseline or value functions. We present preliminary experiments that illustrate the performance improvements on a simple control problem.
Stochastic Geometry and Random Graphs for the Analysis and Design of Wireless Networks
"... Wireless networks are fundamentally limited by the intensity of the received signals and by their interference. Since both of these quantities depend on the spatial location of the nodes, mathematical techniques have been developed in the last decade to provide communicationtheoretic results accoun ..."
Abstract

Cited by 39 (6 self)
 Add to MetaCart
Wireless networks are fundamentally limited by the intensity of the received signals and by their interference. Since both of these quantities depend on the spatial location of the nodes, mathematical techniques have been developed in the last decade to provide communicationtheoretic results accounting for the network’s geometrical configuration. Often, the location of the nodes in the network can be modeled as random, following for example a Poisson point process. In this case, different techniques based on stochastic geometry and the theory of random geometric graphs – including point process theory, percolation theory, and probabilistic combinatorics – have led to results on the connectivity, the capacity, the outage probability, and other fundamental limits of wireless networks. This tutorial article surveys some of these techniques, discusses their application to model wireless networks, and presents some of the main results that have appeared in the literature. It also serves as an introduction to the field for the other papers in this special issue.
An Operational Semantics for Probabilistic Concurrent Constraint Programming
, 1998
"... This paper investigates a probabilistic version of the concurrent constraint programming paradigm (CCP). The aim is to introduce the possibility to formulate so called "randomised algorithms" within the CCP framework. Differently from common approaches in (imperative) highlevel programming language ..."
Abstract

Cited by 31 (13 self)
 Add to MetaCart
This paper investigates a probabilistic version of the concurrent constraint programming paradigm (CCP). The aim is to introduce the possibility to formulate so called "randomised algorithms" within the CCP framework. Differently from common approaches in (imperative) highlevel programming languages, which rely on some kind of random() function, we introduce randomness in the very definition of the language by means of a probabilistic choice construct. This allows a program to make stochastic moves during its execution. We call the resulting language Probabilistic Concurrent Constraint Programming (PCCP). We present an operational semantics for PCCP by means of a probabilistic transition system such that the execution of a PCCP program may be seen as a stochastic process, i.e. as a random walk on the transition graph. The transition probabilities are given explicitly. This semantics captures a notion of observables which combines results of computations and the probability of those re...
High dimensional analysis of semidefinite relaxations for sparse principal component analysis
, 2008
"... Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the “large p, small n ” setting, in which the problem dimension p is comparable to or la ..."
Abstract

Cited by 29 (2 self)
 Add to MetaCart
Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the “large p, small n ” setting, in which the problem dimension p is comparable to or larger than the sample size n. This paper studies PCA in this highdimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say with at most k nonzero components. We analyze two computationally tractable methods for recovering the support of this maximal eigenvector: (a) a simple diagonal cutoff method, which transitions from success to failure as a function of the order parameter θdia(n, p, k) = n/[k 2 log(p − k)]; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the order parameter θsdp(n, p, k) = n/[k log(p − k)] is larger than a critical threshold. Our results thus highlight an interesting tradeoff between computational and statistical efficiency in highdimensional inference.