Improving Text Classification by Shrinkage in a Hierarchy of Classes
, 1998
"... When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples. ..."
Abstract

Cited by 238 (5 self)
When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples.
A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics
, 2005
"... ..."
Relative Loss Bounds for Online Density Estimation with the Exponential Family of Distributions
 Machine Learning
, 2000
"... We consider online density estimation with a parameterized density from the exponential family. The online algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example the algorithm incurs a loss, which is the n ..."
Abstract

Cited by 115 (10 self)
We consider online density estimation with a parameterized density from the exponential family. The online algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example, the algorithm incurs a loss, which is the negative log-likelihood of the example with respect to the past parameter of the algorithm. An offline algorithm can choose the best parameter based on all the examples. We prove bounds on the additional total loss of the online algorithm over the total loss of the best offline parameter. These relative loss bounds hold for an arbitrary sequence of examples. The goal is to design algorithms with the best possible relative loss bounds. We use a Bregman divergence to derive and analyze each algorithm. These divergences are relative entropies between two exponential family distributions. We also use our methods to prove relative loss bounds for linear regression.
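A minimal sketch of this online/offline comparison for the unit-variance Gaussian member of the exponential family, where the maintained parameter is a running mean of past examples; the synthetic stream, the zero initial parameter, and the printed regret are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)   # synthetic example stream

def nll(xt, mu):
    # negative log-likelihood of an example under N(mu, 1)
    return 0.5 * (xt - mu) ** 2 + 0.5 * np.log(2 * np.pi)

online_loss, mu, n = 0.0, 0.0, 0
for xt in x:
    online_loss += nll(xt, mu)   # loss is charged against the *past* parameter
    n += 1
    mu += (xt - mu) / n          # update: parameter stays the running average

offline_loss = nll(x, x.mean()).sum()   # best fixed parameter in hindsight
print(online_loss - offline_loss)       # additional total loss the bounds control
```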
Adaptive wavelet estimation: A block thresholding and oracle inequality approach
 Ann. Statist
, 1999
"... We study wavelet function estimation via the approach of block thresholding and ideal adaptation with oracle. Oracle inequalities are derived and serve as guides for the selection of smoothing parameters. Based on an oracle inequality and motivated by the data compression and localization properties ..."
Abstract

Cited by 99 (14 self)
We study wavelet function estimation via the approach of block thresholding and ideal adaptation with oracle. Oracle inequalities are derived and serve as guides for the selection of smoothing parameters. Based on an oracle inequality and motivated by the data compression and localization properties of wavelets, an adaptive wavelet estimator for nonparametric regression is proposed and the optimality of the procedure is investigated. We show that the estimator achieves three objectives simultaneously: adaptivity, spatial adaptivity, and computational efficiency. Specifically, it is proved that the estimator attains the exact optimal rates of convergence over a range of Besov classes and achieves the adaptive local minimax rate for estimating functions at a point. The estimator is easy to implement, at a computational cost of O(n). Simulation shows that the estimator has excellent numerical performance relative to more traditional wavelet estimators.
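A hedged sketch of the block-shrinkage step on one resolution level of noisy wavelet coefficients. The James-Stein-style block rule and a threshold constant near 4.505 follow the block-thresholding literature; the block length, noise level, and synthetic coefficients below are illustrative assumptions.

```python
import numpy as np

def block_shrink(coeffs, sigma, block_len, lam=4.505):
    # Shrink each block by the nonnegative factor (1 - lam*L*sigma^2 / S2),
    # where S2 is the block's energy; weak blocks are zeroed wholesale.
    out = np.zeros_like(coeffs)
    for start in range(0, len(coeffs), block_len):
        b = coeffs[start:start + block_len]
        s2 = float(np.sum(b ** 2))
        factor = max(0.0, 1.0 - lam * len(b) * sigma ** 2 / s2) if s2 > 0 else 0.0
        out[start:start + block_len] = factor * b
    return out

rng = np.random.default_rng(0)
true = np.concatenate([3.0 * np.ones(16), np.zeros(112)])  # sparse level
noisy = true + rng.normal(size=128)
est = block_shrink(noisy, sigma=1.0, block_len=8)          # block length ~ log n
print(np.sum((est - true) ** 2), np.sum((noisy - true) ** 2))
```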
Extracting social networks and contact information from email and the web
 In Proceedings of CEAS-1
, 2004
"... Abstract. We present an endtoend system that extracts a user’s social network and its members’ contact information given the user’s email inbox. The system identifies unique people in email, finds their Web presence, and automatically fills the fields of a contact address book using conditional ra ..."
Abstract

Cited by 80 (3 self)
We present an end-to-end system that extracts a user’s social network and its members’ contact information given the user’s email inbox. The system identifies unique people in email, finds their Web presence, and automatically fills the fields of a contact address book using conditional random fields, a type of probabilistic model well-suited for such information extraction tasks. By recursively calling itself on new people discovered on the Web, the system builds a social network with multiple degrees of separation from the user. Additionally, a set of expertise-describing keywords is extracted and associated with each person. We outline the collection of statistical and learning components that enable this system, and present experimental results on the real email of two users; we also present results with a simple method of learning transfer, and discuss the capabilities of the system for address-book population, expert finding, and social network analysis.
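A toy sketch of the token-labeling step using the sklearn-crfsuite package, which is an assumption here (the paper describes its own conditional random field components); the signature tokens, features, and field labels are hypothetical.

```python
import sklearn_crfsuite

def features(tokens, i):
    # Simple per-token features; real systems use much richer feature sets.
    t = tokens[i]
    return {
        "lower": t.lower(),
        "is_digit": t.isdigit(),
        "is_title": t.istitle(),
        "has_at": "@" in t,
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
    }

train_tokens = [["Jane", "Doe", "jdoe@example.com", "555", "1234"]]
train_labels = [["FIRSTNAME", "LASTNAME", "EMAIL", "PHONE", "PHONE"]]

X = [[features(seq, i) for i in range(len(seq))] for seq in train_tokens]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_labels)
print(crf.predict(X))   # predicted contact-record fields per token
```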
Portfolio Selection with Parameter and Model Uncertainty: A Multi-Prior Approach
, 2006
"... We develop a model for an investor with multiple priors and aversion to ambiguity. We characterize the multiple priors by a "confidence interval" around the estimated expected returns and we model ambiguity aversion via a minimization over the priors. Our model has several attractive features: (1) i ..."
Abstract

Cited by 48 (2 self)
We develop a model for an investor with multiple priors and aversion to ambiguity. We characterize the multiple priors by a "confidence interval" around the estimated expected returns and we model ambiguity aversion via a minimization over the priors. Our model has several attractive features: (1) it has a solid axiomatic foundation; (2) it is flexible enough to allow for different degrees of uncertainty about expected returns for various subsets of assets and also about the return-generating model; and (3) it delivers closed-form expressions for the optimal portfolio. Our empirical analysis suggests that, compared with portfolios from classical and Bayesian models, ambiguity-averse portfolios are more stable over time and deliver a higher out-of-sample Sharpe ratio.
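A one-asset illustration of the max-min rule, a simplification for intuition rather than the paper's general closed form: when the mean return is only known to lie in an interval, the investor acts on the least favorable endpoint, which shrinks the position toward zero and closes it once the ambiguity radius swamps the estimate.

```python
def ambiguity_averse_weight(mu_hat, delta, gamma, sigma2):
    # Worst-case mean over [mu_hat - delta, mu_hat + delta] for a
    # mean-variance investor with risk aversion gamma and variance sigma2.
    if mu_hat > delta:
        return (mu_hat - delta) / (gamma * sigma2)
    if mu_hat < -delta:
        return (mu_hat + delta) / (gamma * sigma2)
    return 0.0  # |mu_hat| <= delta: no position survives the worst case

# Plug-in (classical) weight vs. the ambiguity-averse weight:
print(0.08 / (3.0 * 0.04))                              # ~0.667
print(ambiguity_averse_weight(0.08, 0.03, 3.0, 0.04))   # ~0.417
```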
Information extraction from Wikipedia: Moving down the long tail
 Proceedings of KDD-08
, 2008
"... Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable selfsupervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall ..."
Abstract

Cited by 36 (9 self)
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper presents three novel techniques for increasing recall from Wikipedia’s long tail of sparse classes: (1) shrinkage over an automatically learned subsumption taxonomy, (2) a retraining technique for improving the training data, and (3) supplementing results by extracting from the broader Web. Our experiments compare design variations and show that, used in concert, these techniques increase recall by a factor of 1.76 to 8.71 while maintaining or increasing precision.
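A hedged sketch of shrinkage over a class taxonomy with a simplified count-based weighting (the paper's scheme is learned rather than raw counts): a sparse class's attribute statistic is blended with estimates along the path to the root, so better-populated ancestors steady the estimate.

```python
def shrink(path):
    # path: (n_training_examples, estimate) pairs from the leaf class to the root.
    total = sum(n for n, _ in path)
    return sum((n / total) * est for n, est in path)

# Hypothetical attribute frequencies: the leaf class has 3 labeled infoboxes,
# its parent 40, the root class 500.
path = [(3, 0.9), (40, 0.6), (500, 0.4)]
print(shrink(path))   # pulled toward the better-estimated ancestors
```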
Query-Relevant Summarization using FAQs
 In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics
, 2000
"... This paper introduces a statistical model for queryrelevant summarization: succinctly characterizing the relevance of a document to a query. Learning parameter values for the proposed model requires a large collection of summarized documents, which we do not have, but as a proxy, we use a col ..."
Abstract

Cited by 31 (0 self)
This paper introduces a statistical model for query-relevant summarization: succinctly characterizing the relevance of a document to a query. Learning parameter values for the proposed model requires a large collection of summarized documents, which we do not have, but as a proxy, we use a collection of FAQ (frequently asked question) documents. Taking a learning approach enables a principled, quantitative evaluation of the proposed system, and the results of some initial experiments, on a collection of Usenet FAQs and on a FAQ-like set of customer-submitted questions to several large retail companies, suggest the plausibility of learning for summarization.
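A minimal sketch of the query-relevance idea, simplified to a smoothed unigram language model that scores each candidate sentence against the query; the paper's model is richer and is trained on FAQ question/answer pairs, and the vocabulary size and smoothing constant here are assumptions.

```python
import math
from collections import Counter

def query_log_likelihood(query, sentence, vocab_size=50_000, alpha=0.5):
    # log P(query | sentence) under an additively smoothed unigram model
    counts = Counter(sentence.lower().split())
    n = sum(counts.values())
    return sum(
        math.log((counts[w] + alpha) / (n + alpha * vocab_size))
        for w in query.lower().split()
    )

doc = ["The return policy allows refunds within 30 days.",
       "Our stores are open on weekends."]
print(max(doc, key=lambda s: query_log_likelihood("refund policy", s)))
```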
Covariance shaping least-squares estimation
 IEEE Trans. Signal Process
, 2003
"... Abstract—A new linear estimator is proposed, which we refer to as the covariance shaping leastsquares (CSLS) estimator, for estimating a set of unknown deterministic parameters x observed through a known linear transformation H and corrupted by additive noise. The CSLS estimator is a biased estimat ..."
Abstract

Cited by 30 (19 self)
A new linear estimator is proposed, which we refer to as the covariance shaping least-squares (CSLS) estimator, for estimating a set of unknown deterministic parameters x observed through a known linear transformation H and corrupted by additive noise. The CSLS estimator is a biased estimator directed at improving the performance of the traditional least-squares (LS) estimator. It chooses the estimate of x to minimize the (weighted) total error variance in the observations subject to a constraint on the covariance of the estimation error, so that we control the dynamic range and spectral shape of that covariance. The CSLS estimator presented in this paper is shown to achieve the Cramér-Rao lower bound for biased estimators. Furthermore, analysis of the mean-squared error (MSE) of both the CSLS estimator and the LS estimator demonstrates that the covariance of the estimation error can be chosen such that there is a threshold SNR below which the CSLS estimator yields a lower MSE than the LS estimator for all values of x. As we show, some of the well-known modifications of the LS estimator can be formulated as CSLS estimators. This allows us to interpret these estimators as the estimators that minimize the total error variance in the observations, among all linear estimators with the same covariance. Index Terms: biased estimation, covariance shaping, estimation, least squares, MMSE.
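A hedged numerical illustration of the threshold-SNR behavior described above, using a scaled (shrunken) LS estimate as a stand-in, since the abstract notes that well-known LS modifications can be cast as CSLS estimators; the matrix, the fixed scale factor, and the noise levels are arbitrary choices, and the comparison holds for this particular x rather than all x.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(20, 5))     # known linear transformation
x = rng.normal(size=5)           # unknown deterministic parameters

for sigma in (0.1, 1.0, 3.0):    # low to high noise, i.e. high to low SNR
    mse_ls = mse_sh = 0.0
    for _ in range(2000):
        y = H @ x + sigma * rng.normal(size=20)
        x_ls = np.linalg.lstsq(H, y, rcond=None)[0]   # unbiased LS estimate
        x_sh = 0.7 * x_ls                             # biased, lower-variance
        mse_ls += np.sum((x_ls - x) ** 2)
        mse_sh += np.sum((x_sh - x) ** 2)
    print(sigma, mse_ls / 2000, mse_sh / 2000)  # shrinkage wins at high noise
```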
Modulation Estimators and Confidence Sets
 Ann. Statist
, 1999
"... An unknown signal plus white noise is observed at n discrete time points. Within a large convex class of linear estimators of , we choose the estimator b that minimizes estimated quadratic risk. By construction, b is nonlinear. This estimation is done after orthogonal transformation of the data to ..."
Abstract

Cited by 28 (5 self)
An unknown signal plus white noise is observed at n discrete time points. Within a large convex class of linear estimators of the signal, we choose the estimator that minimizes estimated quadratic risk. By construction, the chosen estimator is nonlinear. This estimation is done after orthogonal transformation of the data to a reasonable coordinate system. The procedure adaptively tapers the coefficients of the transformed data. If the class of candidate estimators satisfies a uniform entropy condition, then the chosen estimator is asymptotically minimax in Pinsker's sense over certain ellipsoids in the parameter space and shares one such asymptotic minimax property with the James-Stein estimator. We describe computational algorithms for the estimator and construct confidence sets for the unknown signal. These confidence sets are centered at the estimator, have correct asymptotic coverage probability, and have relatively small risk as set-valued estimators of the signal.
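A hedged sketch of the coefficient tapering, simplified to a per-coordinate rule (the paper minimizes estimated risk over a convex class of tapers, such as monotone ones, rather than coordinatewise). For z_i = theta_i + N(0, sigma^2), an unbiased estimate of the quadratic risk of taper w_i is (1 - w_i)^2 (z_i^2 - sigma^2) + w_i^2 sigma^2, minimized over [0, 1] at w_i = max(0, 1 - sigma^2 / z_i^2).

```python
import numpy as np

def taper(z, sigma):
    # Risk-minimizing per-coordinate taper, clipped to [0, 1]
    w = np.clip(1.0 - sigma ** 2 / np.maximum(z ** 2, 1e-12), 0.0, 1.0)
    return w * z

rng = np.random.default_rng(2)
theta = np.concatenate([5.0 * np.ones(10), np.zeros(90)])  # unknown signal
z = theta + rng.normal(size=100)                           # observed data
print(np.sum((taper(z, 1.0) - theta) ** 2))   # tapered squared error
print(np.sum((z - theta) ** 2))               # raw-data error, typically larger
```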