Results 1 - 7 of 7
Learning From Measurements in Exponential Families
Abstract

Cited by 54 (1 self)
Given a model family and a set of unlabeled examples, one could either label specific examples or state general constraints—both provide information about the desired model. In general, what is the most cost-effective way to learn? To address this question, we introduce measurements, a general class of mechanisms for providing information about a target model. We present a Bayesian decision-theoretic framework, which allows us to both integrate diverse measurements and choose new measurements to make. We use a variational inference algorithm, which exploits exponential family duality. The merits of our approach are demonstrated on two sequence labeling tasks.
Unsupervised ontological induction from text
In Proc. of ACL, 2010
Abstract

Cited by 43 (2 self)
Extracting knowledge from unstructured text is a longstanding goal of NLP. Although learning approaches to many of its subtasks have been developed (e.g., parsing, taxonomy induction, information extraction), all end-to-end solutions to date require heavy supervision and/or manual engineering, limiting their scope and scalability. We present OntoUSP, a system that induces and populates a probabilistic ontology using only dependency-parsed text as input. OntoUSP builds on the USP unsupervised semantic parser by jointly forming IS-A and IS-PART hierarchies of lambda-form clusters. The IS-A hierarchy allows more general knowledge to be learned and enables the use of smoothing for parameter estimation. We evaluate OntoUSP by using it to extract a knowledge base from biomedical abstracts and answer questions. OntoUSP improves on the recall of USP by 47% and greatly outperforms previous state-of-the-art approaches.
Machine Reading: A “Killer App” for Statistical Relational AI
Abstract

Cited by 3 (0 self)
Machine reading aims to automatically extract knowledge from text. It is a longstanding goal of AI and holds the promise of revolutionizing Web search and other fields. In this paper, we analyze the core challenges of machine reading and show that statistical relational AI is particularly well suited to address these challenges. We then propose a unifying approach to machine reading in which statistical relational AI plays a central role. Finally, we demonstrate the promise of this approach by presenting OntoUSP, an end-to-end machine reading system that builds on recent advances in statistical relational AI and greatly outperforms state-of-the-art systems in a task of extracting knowledge from biomedical abstracts and answering questions.
Collaborative Filtering via Rating Concentration
Abstract

Cited by 3 (2 self)
While most popular collaborative filtering methods use low-rank matrix factorization and parametric density assumptions, this article proposes an approach based on distribution-free concentration inequalities. Using agnostic hierarchical sampling assumptions, functions of observed ratings are provably close to their expectations over query ratings, on average. A joint probability distribution over queries of interest is estimated using maximum entropy regularization. The distribution resides in a convex hull of allowable candidate distributions which satisfy concentration inequalities that stem from the sampling assumptions. The method accurately estimates rating distributions on synthetic and real data and is competitive with low-rank and parametric methods which make more aggressive assumptions about the problem.
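The core mechanism of this abstract can be sketched with a toy example: choose the maximum-entropy distribution over discrete rating values subject to an interval constraint on the mean, standing in for one concentration inequality. The function name, the single mean constraint, and the bisection solver below are illustrative assumptions, not the paper's algorithm, which handles many such constraints jointly.

```python
import math

def maxent_ratings(values, mean_lo, mean_hi, tol=1e-9):
    """Max-entropy distribution over rating values whose mean lies in
    [mean_lo, mean_hi].  Uses the exponential-family form p_i ∝ exp(lam*v_i)
    and bisection on lam; the distribution is uniform when the interval
    already contains the uniform mean."""
    def dist(lam):
        w = [math.exp(lam * v) for v in values]
        z = sum(w)
        return [x / z for x in w]

    def mean(lam):
        return sum(pi * v for pi, v in zip(dist(lam), values))

    u = mean(0.0)                    # mean of the uniform distribution
    if mean_lo <= u <= mean_hi:
        return dist(0.0)             # uniform is the global max-ent solution
    target = mean_lo if u < mean_lo else mean_hi  # project onto interval
    lo, hi = -50.0, 50.0             # mean(lam) is monotone increasing in lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))

# Constrain the mean rating to lie in [4.0, 4.5] over a 1-5 scale
p = maxent_ratings([1, 2, 3, 4, 5], 4.0, 4.5)
```

Because the uniform mean (3.0) lies below the interval, the solver tilts the distribution toward high ratings just enough to reach a mean of 4.0, the maxent projection onto the constraint set.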
Maxent grammars for the metrics of Shakespeare and Milton. Hayes, Bruce and Colin Wilson. Paper presented at the 2010 Annual Meeting of the Linguistic Society of America, Baltimore. www.linguistics.ucla.edu/people/hayes/papers/HayesShiskoWilsonSlides.pdf
Linguistic Inquiry, 2010
Abstract

Cited by 2 (1 self)
We propose a new approach to metrics based on maxent grammars, which employ weighted constraints and assign a well-formedness value to every metrically distinct line type. We claim two advantages for our approach. First, it offers an explicit account of metricality and metrical complexity, an account that has a principled mathematical basis and integrates information from all aspects of metrical scansion. Second, our approach permits statistical evaluation of proposed constraints. This makes it possible to determine when constraints are vacuous, their work being already done by simpler, independently needed constraints. We begin by setting up a system built on earlier work that defines the set of possible constraints following principles of stress matching, bracket matching, and contextual salience. Our analyses of two data corpora—Shakespeare’s Sonnets and Books VIII and IX of Milton’s Paradise Lost—show that the basic concepts of this system work well in describing the data. However, one well-known type of constraint, based on the principle of the stress maximum (Halle and Keyser 1966 et seq.), turns out to be vacuous; our testing indicates that the work of stress maximum constraints is better done by other constraints of the grammar.
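The scoring scheme a maxent grammar uses can be sketched in a few lines: each line type receives a well-formedness value of the form exp(-Σ wᵢ·cᵢ), where cᵢ counts violations of constraint i and wᵢ is its fitted weight. The constraint names and weights below are hypothetical stand-ins, not values from the paper.

```python
import math

# Hypothetical constraint weights; real weights come from fitting the
# maxent grammar to a scanned corpus.
weights = {"stress_mismatch": 2.0, "bracket_mismatch": 1.5}

def wellformedness(violations, weights):
    """Maxent well-formedness: exp of the negated weighted violation sum."""
    penalty = sum(weights[c] * n for c, n in violations.items())
    return math.exp(-penalty)

# A line violating no constraints scores 1.0 (maximally well-formed);
# each weighted violation multiplies the score down toward 0.
perfect = wellformedness({"stress_mismatch": 0, "bracket_mismatch": 0}, weights)
complex_line = wellformedness({"stress_mismatch": 1, "bracket_mismatch": 0}, weights)
```

This makes "metrical complexity" a graded quantity: the score decreases smoothly as violations accumulate, rather than flipping between metrical and unmetrical.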
Language modeling for . . .
2009
Abstract
With the increasing focus of speech recognition and natural language processing applications on domains with a limited amount of in-domain training data, enhanced system performance often relies on approaches involving model adaptation and combination. In such domains, language models are often constructed by interpolating component models trained from partially matched corpora. Instead of simple linear interpolation, we introduce a generalized linear interpolation technique that computes context-dependent mixture weights from features that correlate with the component confidence and relevance for each n-gram context. Since the n-grams from partially matched corpora may not be of equal relevance to the target domain, we propose an n-gram weighting scheme to adjust the component n-gram probabilities based on features derived from readily available corpus segmentation and metadata to de-emphasize out-of-domain n-grams. In scenarios without any matched data for a development set, we examine unsupervised and active learning techniques for tuning the interpolation and weighting parameters. Results on a lecture transcription task using the proposed generalized linear interpolation and n-gram weighting techniques yield up to a 1.4% absolute word error rate reduction.
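The contrast between plain linear interpolation and a context-dependent mixture can be sketched as follows. The feature names, the logistic form of the mixture weight, and the parameter values are illustrative assumptions for a minimal two-component case, not the paper's actual model.

```python
import math

def linear_interp(p1, p2, lam):
    """Plain linear interpolation of two component n-gram probabilities
    with a single global mixture weight."""
    return lam * p1 + (1 - lam) * p2

def generalized_interp(p1, p2, features, theta):
    """Context-dependent mixture weight: a logistic model over features of
    the n-gram context scores confidence in component 1 (feature names and
    theta are illustrative)."""
    score = sum(theta[f] * v for f, v in features.items())
    lam = 1.0 / (1.0 + math.exp(-score))   # per-context weight in (0, 1)
    return lam * p1 + (1 - lam) * p2

theta = {"in_domain_count": 0.8, "bias": -0.5}   # hypothetical tuned weights
# A context seen often in the in-domain corpus shifts weight to component 1
p = generalized_interp(0.02, 0.005, {"in_domain_count": 3.0, "bias": 1.0}, theta)
```

The key difference is that the mixture weight is recomputed per n-gram context, so a component trained on well-matched data dominates exactly where its evidence is strong.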
Learning with Degree-Based Subgraph Estimation
2011
Abstract
Networks and their topologies are critical to nearly every aspect of modern life, with social networks governing human interactions and computer networks governing global information flow. Network behavior is inherently structural, and thus modeling data from networks benefits from explicitly modeling structure. This thesis covers methods for and analysis of machine learning from network data while explicitly modeling one important measure of structure: degree. Central to this work is a procedure for exact maximum likelihood estimation of a distribution over graph structure, where the distribution factorizes into edge likelihoods for each pair of nodes and degree likelihoods for each node. This thesis provides a novel method for exact estimation of the maximum likelihood edge structure under the distribution. The algorithm solves the optimization by constructing an augmented graph containing, in addition to the original nodes, auxiliary nodes whose edges encode the degree potentials. The exact solution is then recoverable by finding the maximum weight b-matching on the augmented graph, a well-studied combinatorial optimization. To solve the combinatorial optimization, this thesis focuses in particular on a belief propagation-based approach to finding the optimal b-matching and provides a novel proof of convergence for belief propagation on the loopy graphical model representing the b-matching objective. Additionally,
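The factorization this abstract describes, edge likelihoods per node pair plus degree likelihoods per node, can be illustrated with a brute-force MAP search on a toy graph. The node names and potential values are invented for illustration; the thesis's contribution is solving this exactly at scale via b-matching on an augmented graph, which this sketch does not attempt.

```python
from itertools import combinations

def map_edge_structure(nodes, edge_logpot, degree_logpot):
    """Exact MAP edge set under a distribution that factorizes into an
    edge log-potential per pair and a degree log-potential per node.
    Brute force over all edge subsets -- feasible only for toy graphs."""
    pairs = list(edge_logpot)
    best, best_score = frozenset(), float("-inf")
    for r in range(len(pairs) + 1):
        for subset in combinations(pairs, r):
            deg = {v: 0 for v in nodes}
            for (u, v) in subset:
                deg[u] += 1
                deg[v] += 1
            score = sum(edge_logpot[e] for e in subset)
            score += sum(degree_logpot[v][deg[v]] for v in nodes)
            if score > best_score:
                best, best_score = frozenset(subset), score
    return best, best_score

nodes = ["a", "b", "c"]
edge_logpot = {("a", "b"): 1.0, ("b", "c"): 0.5, ("a", "c"): -0.2}
degree_logpot = {v: {0: -1.0, 1: 0.0, 2: -2.0} for v in nodes}  # prefers degree 1
best, best_score = map_edge_structure(nodes, edge_logpot, degree_logpot)
```

The degree potentials change the answer: with degree 1 strongly preferred, adding both edges incident to "b" is penalized, so only the highest-scoring edge ("a", "b") survives in the MAP structure.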