Results 11–20 of 68
Statistical eigeninference from large Wishart matrices
 Annals of Statistics, 2008
Abstract

Cited by 34 (5 self)
We consider settings where the observations are drawn from a zero-mean multivariate (real or complex) normal distribution with the population covariance matrix having eigenvalues of arbitrary multiplicity. We assume that the eigenvectors of the population covariance matrix are unknown and focus on inferential procedures that are based on the sample eigenvalues alone (i.e., “eigeninference”). Results found in the literature establish the asymptotic normality of the fluctuation in the trace of powers of the sample covariance matrix. We develop concrete algorithms for analytically computing the limiting quantities and the covariance of the fluctuations. We exploit the asymptotic normality of the trace of powers of the sample covariance matrix to develop eigenvalue-based procedures for testing and estimation. Specifically, we formulate a simple test of hypotheses for the population eigenvalues and a technique for estimating the population eigenvalues in settings where the cumulative distribution function of the (nonrandom) population eigenvalues has a staircase structure. Monte Carlo simulations are used to demonstrate the superiority of the proposed …
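A minimal numpy sketch of the kind of summary statistic this abstract builds on — traces of powers of the sample covariance matrix, computed from the sample eigenvalues alone. Dimensions and seed are illustrative; the paper's actual tests and estimators are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 100                     # samples, dimension (illustrative)
X = rng.standard_normal((n, p))      # identity population covariance
S = X.T @ X / n                      # sample covariance matrix
eig = np.linalg.eigvalsh(S)          # sample eigenvalues only

# Normalized traces of powers of S, the "eigeninference" statistics:
t1 = eig.sum() / p                   # (1/p) tr S, approx. 1 here
t2 = (eig ** 2).sum() / p            # (1/p) tr S^2, approx. 1 + p/n here
print(t1, t2)
```

For identity population covariance the Marčenko–Pastur moments give (1/p) tr S ≈ 1 and (1/p) tr S² ≈ 1 + p/n, so these statistics are informative about the population spectrum without access to eigenvectors.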
Vast Volatility Matrix Estimation for High-Frequency Financial Data
 Annals of Statistics, 2010
Abstract

Cited by 20 (1 self)
High-frequency data observed on the prices of financial assets are commonly modeled by diffusion processes with microstructure noise, and realized-volatility-based methods are often used to estimate integrated volatility. For problems involving a large number of assets, the estimation objects we face are volatility matrices of large size. The existing volatility estimators work well for a small number of assets but perform poorly when the number of assets is very large. In fact, they are inconsistent when both the number, p, of the assets and the average sample size, n, of the price data on the p assets go to infinity. This paper proposes a new type of estimator for the integrated volatility matrix and establishes asymptotic theory for the proposed estimators in the framework that allows both n and p to approach infinity. The theory shows that the proposed estimators achieve high convergence rates under a sparsity assumption on the integrated volatility matrix. The numerical studies demonstrate that the proposed estimators perform well for large p and complex price and volatility models. The proposed method is applied to real high-frequency financial data.
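A hedged single-asset sketch of the realized-volatility idea mentioned in the abstract: sum squared intraday log-returns to estimate volatility. This toy assumes constant volatility and no microstructure noise; the paper's contribution is the much harder p-asset matrix case.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 23400                            # one trading day of 1-second returns (illustrative)
sigma = 0.2                          # true volatility in this toy model (assumed)
# Simulated log-returns of a constant-volatility diffusion, no noise:
returns = rng.standard_normal(n) * sigma / np.sqrt(n)
# Realized volatility = sqrt of the sum of squared returns:
realized_vol = np.sqrt(np.sum(returns ** 2))
print(realized_vol)                  # close to sigma = 0.2
```

With microstructure noise this naive estimator is badly biased, which is part of what motivates the noise-robust, sparsity-aware matrix estimators the paper proposes.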
Concentration of measure and spectra of random matrices: with applications to correlation matrices, elliptical distributions and beyond
 Annals of Applied Probability (to appear), 2009
Abstract

Cited by 19 (8 self)
We place ourselves in the setting of high-dimensional statistical inference, where the number of variables p in a dataset of interest is of the same order of magnitude as the number of observations n. More formally, we study the asymptotic properties of correlation and covariance matrices under the setting that p/n → ρ ∈ (0, ∞), for general population covariance. We show that spectral properties of large-dimensional correlation matrices are similar to those of large-dimensional covariance matrices, for a large class of models studied in random matrix theory. We also derive a Marčenko–Pastur-type system of equations for the limiting spectral distribution of covariance matrices computed from data with elliptical distributions and generalizations of this family. The motivation for this study comes partly from the possible relevance of such distributional assumptions to problems in econometrics and portfolio optimization, as well as robustness questions for certain classical random matrix results. A mathematical theme of the paper is the important use we make of concentration inequalities.
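A hedged simulation of the abstract's headline phenomenon for standard Gaussian data: in high dimensions the sample correlation and sample covariance matrices have nearly the same spectrum. This illustrates the claim for one simple model; it is not the paper's proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 1000, 500                     # p/n -> rho = 0.5 (illustrative)
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)          # sample covariance matrix
R = np.corrcoef(X, rowvar=False)     # sample correlation matrix
ev_S = np.sort(np.linalg.eigvalsh(S))
ev_R = np.sort(np.linalg.eigvalsh(R))
# The sorted spectra nearly coincide, small relative to the
# Marchenko-Pastur bulk edge (1 + sqrt(0.5))^2 = 2.91:
max_gap = np.max(np.abs(ev_S - ev_R))
print(max_gap)
```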
Subsampling Algorithms for Semidefinite Programming
, 2009
Abstract

Cited by 17 (1 self)
We derive a stochastic gradient algorithm for semidefinite optimization using randomization techniques. The algorithm uses subsampling to reduce the computational cost of each iteration, and the subsampling ratio explicitly controls the algorithm’s granularity, i.e., the tradeoff between cost per iteration and total number of iterations. Furthermore, the total computational cost is directly proportional to the complexity (i.e., rank) of the solution. We study numerical performance on some large-scale problems arising in statistical learning.
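A hedged sketch of the subsampling ingredient only, not the paper's algorithm: replace a symmetric matrix by a rescaled random subsample of its entries, so each iteration touches only a fraction of them while the subsample remains an unbiased estimate; for a structured (here rank-one) matrix the top eigenvalue is approximately preserved. All sizes and the keep ratio are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n, keep = 300, 0.3                   # keep ~30% of the entries (assumed ratio)
u = np.ones(n)
A = np.outer(u, u)                   # rank-one PSD test matrix; top eigenvalue = n
# Symmetric sampling pattern, drawn once over the upper triangle:
upper = np.triu(rng.random((n, n)) < keep)
mask = upper | upper.T
# Rescaling by 1/keep makes the subsample unbiased: E[A_sub] = A.
A_sub = np.where(mask, A / keep, 0.0)
top_full = np.linalg.eigvalsh(A)[-1]
top_sub = np.linalg.eigvalsh(A_sub)[-1]
print(top_full, top_sub)
```

The subsampling ratio plays the granularity role described in the abstract: a smaller `keep` makes each (sub)gradient evaluation cheaper but noisier, so more iterations are needed.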
The Tracy–Widom limit for the largest eigenvalues of singular complex Wishart matrices
 Annals of Applied Probability
, 2008
Limits of spiked random matrices
, 2013
Abstract

Cited by 17 (2 self)
Given a large, high-dimensional sample from a spiked population, the top sample covariance eigenvalue is known to exhibit a phase transition. We show that the largest eigenvalues have asymptotic distributions near the phase transition in the rank-one spiked real Wishart setting and its general β analogue, proving a conjecture of Baik, Ben Arous and Péché (2005). We also treat shifted-mean Gaussian orthogonal and β ensembles. Such results are entirely new in the real case; in the complex case we strengthen existing results by providing optimal scaling assumptions. One obtains the known limiting random Schrödinger operator on the half-line, but the boundary condition now depends on the perturbation. We derive several characterizations of the limit laws in which β appears as a parameter, including a simple linear boundary value problem. This PDE description recovers known explicit formulas at β = 2, 4, yielding in particular a new and simple proof of the Painlevé representations for these …
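A hedged simulation of the BBP-type phase transition the abstract refers to: with a rank-one spike ℓ in the population covariance and aspect ratio c = p/n, the top sample eigenvalue detaches from the Marčenko–Pastur bulk edge (1 + √c)² only when ℓ > 1 + √c, and then concentrates near ℓ(1 + c/(ℓ − 1)). Sizes, seed, and the spike value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, spike = 4000, 1000, 4.0        # c = 0.25, threshold 1 + sqrt(c) = 1.5
X = rng.standard_normal((n, p))
X[:, 0] *= np.sqrt(spike)            # inject a rank-one spike in the population covariance
S = X.T @ X / n                      # sample covariance
top = np.linalg.eigvalsh(S)[-1]
bulk_edge = (1 + np.sqrt(p / n)) ** 2          # MP upper edge = 2.25
predicted = spike * (1 + (p / n) / (spike - 1))  # supercritical limit = 13/3
print(top, bulk_edge, predicted)
```

The paper's results describe the fluctuations of `top` around such limits near the transition, in the real and general-β settings.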
The spectrum of kernel random matrices
, 2007
Abstract

Cited by 14 (3 self)
We place ourselves in the setting of high-dimensional statistical inference, where the number of variables p in a dataset of interest is of the same order of magnitude as the number of observations n. We consider the spectrum of certain kernel random matrices, in particular n × n matrices whose (i, j)th entry is f(X_i′X_j/p) or f(‖X_i − X_j‖²/p), where p is the dimension of the data and the X_i are independent data vectors. Here f is assumed to be a locally smooth function. The study is motivated by questions arising in statistics and computer science, where these matrices are used to perform, among other things, nonlinear versions of principal component analysis. Surprisingly, we show that in high dimensions, and for the models we analyze, the problem becomes essentially linear, which is at odds with heuristics sometimes used to justify the usage of these methods. The analysis also highlights certain peculiarities of models widely studied in random matrix theory and raises some questions about their relevance as tools to model high-dimensional data encountered in practice.
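A hedged construction of the object the abstract studies: an n × n kernel matrix with entries f(X_i′X_j/p) for high-dimensional Gaussian data, here with f = exp as an example smooth function. This only builds the matrix and its spectrum; the paper's linearization analysis is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 400                      # n and p of the same order (illustrative)
X = rng.standard_normal((n, p))      # rows X_i are independent data vectors
G = X @ X.T / p                      # inner products X_i'X_j / p
K = np.exp(G)                        # kernel matrix, entries f(X_i'X_j/p), f = exp
eig = np.linalg.eigvalsh(K)
print(eig[0], eig[-1])               # K is positive definite for this f
```

Off-diagonal entries of G are O(p^{-1/2}), which is why a Taylor expansion of f around 0 (the "essentially linear" behavior the abstract describes) dominates the spectrum in high dimensions.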
Tail bounds for all eigenvalues of a sum of random matrices
, 2011
Abstract

Cited by 12 (2 self)
This work introduces the minimax Laplace transform method, a modification of the cumulant-based matrix Laplace transform method developed in [Tro11c] that yields both upper and lower bounds on each eigenvalue of a sum of random self-adjoint matrices. This machinery is used to derive eigenvalue analogs of the classical Chernoff, Bennett, and Bernstein bounds. Two examples demonstrate the efficacy of the minimax Laplace transform. The first concerns the effects of column sparsification on the spectrum of a matrix with orthonormal rows. Here, the behavior of the singular values can be described in terms of coherence-like quantities. The second example addresses the question of relative accuracy in the estimation of eigenvalues of the covariance matrix of a random process. Standard results on the convergence of sample covariance matrices provide bounds on the number of samples needed to obtain relative accuracy in the spectral norm, but these results only guarantee relative accuracy in the estimate of the maximum eigenvalue. The minimax Laplace transform argument establishes that if the lowest eigenvalues decay sufficiently fast, Ω(ε^{-2} κ_ℓ² ℓ log p) samples, where κ_ℓ = λ_1(C)/λ_ℓ(C), are sufficient to ensure that the dominant ℓ eigenvalues of the covariance matrix of an N(0, C) random vector are estimated to within a factor of 1 ± ε with high probability.
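A hedged numerical check of the flavor of that final claim: for an N(0, C) vector with fast-decaying spectrum, a moderate number of samples already estimates the dominant ℓ eigenvalues to small relative error. The sample count below is illustrative only; the Ω(ε^{-2} κ_ℓ² ℓ log p) bound itself is not verified here.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, ell = 50, 20000, 5             # dimension, samples, leading eigenvalues (illustrative)
lam = 2.0 ** -np.arange(p)           # geometrically decaying spectrum of C = diag(lam)
X = rng.standard_normal((n, p)) * np.sqrt(lam)   # N(0, C) samples, row by row
S = X.T @ X / n                      # sample covariance matrix
est = np.sort(np.linalg.eigvalsh(S))[::-1][:ell]
# Relative (not just spectral-norm) accuracy on the dominant ell eigenvalues:
rel_err = np.max(np.abs(est - lam[:ell]) / lam[:ell])
print(rel_err)
```

Spectral-norm guarantees alone would only control absolute errors at the scale of λ₁(C); the point of the paper's bound is relative control further down the spectrum.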