Results 1 - 10
of
101
Improving Text Classification by Shrinkage in a Hierarchy of Classes
, 1998
"... When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples. ..."
Abstract
-
Cited by 203 (5 self)
- Add to MetaCart
When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples.
Relative Loss Bounds for On-line Density Estimation with the Exponential Family of Distributions
- MACHINE LEARNING
, 2000
"... We consider on-line density estimation with a parameterized density from the exponential family. The on-line algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example the algorithm incurs a loss, which is the n ..."
Abstract
-
Cited by 83 (10 self)
- Add to MetaCart
We consider on-line density estimation with a parameterized density from the exponential family. The on-line algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example the algorithm incurs a loss, which is the negative loglikelihood of the example with respect to the past parameter of the algorithm. An o-line algorithm can choose the best parameter based on all the examples. We prove bounds on the additional total loss of the on-line algorithm over the total loss of the best o-line parameter. These relative loss bounds hold for an arbitrary sequence of examples. The goal is to design algorithms with the best possible relative loss bounds. We use a Bregman divergence to derive and analyze each algorithm. These divergences are relative entropies between two exponential distributions. We also use our methods to prove relative loss bounds for linear regression.
A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics
, 2005
"... ..."
Extracting social networks and contact information from email and the web
- In Proceedings of CEAS-1
, 2004
"... Abstract. We present an end-to-end system that extracts a user’s social network and its members’ contact information given the user’s email inbox. The system identifies unique people in email, finds their Web presence, and automatically fills the fields of a contact address book using conditional ra ..."
Abstract
-
Cited by 61 (2 self)
- Add to MetaCart
Abstract. We present an end-to-end system that extracts a user’s social network and its members’ contact information given the user’s email inbox. The system identifies unique people in email, finds their Web presence, and automatically fills the fields of a contact address book using conditional random fields—a type of probabilistic model well-suited for such information extraction tasks. By recursively calling itself on new people discovered on the Web, the system builds a social network with multiple degrees of separation from the user. Additionally, a set of expertise-describing keywords are extracted and associated with each person. We outline the collection of statistical and learning components that enable this system, and present experimental results on the real email of two users; we also present results with a simple method of learning transfer, and discuss the capabilities of the system for addressbook population, expert-finding, and social network analysis. 1
Query-Relevant Summarization using FAQs
- IN PROCEEDINGS OF THE 38TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
, 2000
"... This paper introduces a statistical model for query-relevant summarization: succinctly characterizing the relevance of a document to a query. Learning parameter values for the proposed model requires a large collection of summarized documents, which we do not have, but as a proxy, we use a col ..."
Abstract
-
Cited by 25 (0 self)
- Add to MetaCart
This paper introduces a statistical model for query-relevant summarization: succinctly characterizing the relevance of a document to a query. Learning parameter values for the proposed model requires a large collection of summarized documents, which we do not have, but as a proxy, we use a collection of FAQ (frequently-asked question) documents. Taking a learning approach enables a principled, quantitative evaluation of the proposed system, and the results of some initial experiments---on a collection of Usenet FAQs and on a FAQ-like set of customer-submitted questions to several large retail companies---suggest the plausibility of learning for summarization.
Modulation Estimators and Confidence Sets
- ANN. STATIST
, 1999
"... An unknown signal plus white noise is observed at n discrete time points. Within a large convex class of linear estimators of , we choose the estimator b that minimizes estimated quadratic risk. By construction, b is nonlinear. This estimation is done after orthogonal transformation of the data to ..."
Abstract
-
Cited by 24 (4 self)
- Add to MetaCart
An unknown signal plus white noise is observed at n discrete time points. Within a large convex class of linear estimators of , we choose the estimator b that minimizes estimated quadratic risk. By construction, b is nonlinear. This estimation is done after orthogonal transformation of the data to a reasonable coordinate system. The procedure adaptively tapers the coefficients of the transformed data. If the class of candidate estimators satisfies a uniform entropy condition, then b is asymptotically minimax in Pinsker's sense over certain ellipsoids in the parameter space and shares one such asymptotic minimax property with the James-Stein estimator. We describe computational algorithms for b and construct confidence sets for the unknown signal. These confidence sets are centered at b , have correct asymptotic coverage probability, and have relatively small risk as set-valued estimators of .
Information extraction from Wikipedia: Moving down the long tail
- Proceedings of KDD08
, 2008
"... Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall ..."
Abstract
-
Cited by 21 (7 self)
- Add to MetaCart
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as infoboxes), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper presents three novel techniques for increasing recall from Wikipedia’s long tail of sparse classes: (1) shrinkage over an automatically-learned subsumption taxonomy, (2) a retraining technique for improving the training data, and (3) supplementing results by extracting from the broader Web. Our experiments compare design variations and show that, used in concert, these techniques increase recall by a factor of 1.76 to 8.71 while maintaining or increasing precision.
Monte Carlo Hidden Markov Models: Learning Non-Parametric Models of Partially Observable Stochastic Processes
- In Proc. of the International Conference on Machine Learning (ICML
, 1999
"... We present a learning algorithm for non-parametric hidden Markov models with continuous state and observation spaces. All necessary probability densities are approximated using samples, along with density trees generated from such samples. AMonte Carlo version of Baum-Welch (EM) is employed to learn ..."
Abstract
-
Cited by 21 (6 self)
- Add to MetaCart
We present a learning algorithm for non-parametric hidden Markov models with continuous state and observation spaces. All necessary probability densities are approximated using samples, along with density trees generated from such samples. AMonte Carlo version of Baum-Welch (EM) is employed to learn models from data. Regularization during learning is achieved using an exponential shrinking technique. The shrinkage factor, which determines the effective capacity of the learning algorithm, is annealed down over multiple iterations of BaumWelch, and early stopping is applied to select the right model. Once trained, Monte Carlo HMMs can be run in an any-time fashion. We prove that under mild assumptions, Monte Carlo Hidden Markov Models converge to a local maximum in likelihood space, just like conventional HMMs. In addition, we provide empirical results obtained in a gesture recognition domain. 1 Introduction Hidden Markov models (HMMs) [27] have been applied successfully to a large rang...
REACT Scatterplot Smoothers: Superefficiency through Basis Economy
- J. AMER. STATIST. ASSOC
, 1999
"... ..."

