Results 1 - 6 of 6
A framework for learning predictive structures from multiple tasks and unlabeled data
Journal of Machine Learning Research, 2005
Abstract

Cited by 318 (3 self)
One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semi-supervised learning. Although a number of such methods have been proposed, at the current stage we still do not have a complete understanding of their effectiveness. This paper investigates a closely related problem, which leads to a novel approach to semi-supervised learning. Specifically, we consider learning predictive structures on hypothesis spaces (that is, what kind of classifiers have good predictive power) from multiple learning tasks. We present a general framework in which the structural learning problem can be formulated and analyzed theoretically, and relate it to learning with unlabeled data. Under this framework, we propose algorithms for structural learning, investigate computational issues, and present experiments demonstrating the effectiveness of the proposed algorithms in the semi-supervised learning setting.
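The idea of learning a shared predictive structure across tasks can be illustrated with a minimal sketch: train a linear predictor per task, stack the weight vectors, and take the leading singular directions as the common low-dimensional structure. This is only a schematic of the framework described above; the function name and the single SVD step are illustrative assumptions, not the paper's actual algorithm (which alternates structure extraction with re-training of the per-task predictors).

```python
import numpy as np

def shared_structure(task_weight_vectors, h):
    """Sketch: extract a shared h-dimensional predictive structure from
    the linear predictors of several related tasks.

    task_weight_vectors: list of per-task weight vectors (each length d)
    h:                   dimension of the shared structure

    Names and the single-SVD simplification are illustrative assumptions.
    """
    # Stack per-task weights into a d x (num tasks) matrix
    W = np.column_stack(task_weight_vectors)
    # Leading left singular vectors span the directions shared by the tasks
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :h].T  # h x d projection onto the shared structure
```

A new task can then be learned in the h-dimensional projected space, which is where the benefit from unlabeled data and auxiliary tasks enters the framework.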
Boosted sampling: Approximation algorithms for stochastic optimization problems
In: 36th STOC, 2004
Abstract

Cited by 90 (22 self)
Several combinatorial optimization problems choose elements to minimize the total cost of constructing a feasible solution that satisfies requirements of clients. In the STEINER TREE problem, for example, edges must be chosen to connect terminals (clients); in VERTEX COVER, vertices must be chosen to cover edges (clients); in FACILITY LOCATION, facilities must be chosen and demand vertices (clients) connected to these chosen facilities. We consider a stochastic version of such a problem where the solution is constructed in two stages: before the actual requirements materialize, we can choose elements in a first stage. The actual requirements are then revealed, drawn from a prespecified probability distribution π; thereupon, some more elements may be chosen to obtain a feasible solution for the actual requirements. However, in this second (recourse) stage, choosing an element is costlier by a factor of σ > 1. The goal is to minimize the first-stage cost plus the expected second-stage cost. We give a general yet simple technique to adapt approximation algorithms for several deterministic problems to their stochastic versions via the following method.
• First stage: Draw σ independent sets of clients from the distribution π and apply the approximation algorithm to construct a feasible solution for the union of these sets.
• Second stage: Since the actual requirements have now been revealed, augment the first-stage solution to be feasible for these requirements.
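The two-stage method above admits a direct schematic. Everything here (the function names, treating σ as an integer sample count) is an illustrative assumption; the approximation algorithm and the augmentation step are problem-specific and are passed in as black boxes.

```python
def boosted_sampling(sample_clients, approx_solve, augment, sigma, actual_clients):
    """Schematic of the two-stage boosted-sampling technique.

    sample_clients: callable drawing one client set from the distribution pi
    approx_solve:   approximation algorithm for the deterministic problem
    augment:        extends a first-stage solution to serve extra clients
    sigma:          second-stage cost inflation factor (assumed integer here)
    actual_clients: the requirements revealed in the second stage
    """
    # First stage: solve for the union of sigma independent samples from pi
    union = set()
    for _ in range(sigma):
        union |= set(sample_clients())
    first_stage = approx_solve(union)
    # Second stage: requirements revealed; augment to a feasible solution
    second_stage = augment(first_stage, actual_clients)
    return first_stage, second_stage
```

For VERTEX COVER, say, `approx_solve` would be any standard approximation for the deterministic problem and `augment` would cover any newly revealed edges not already covered.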
Arbitrary Source Models and Bayesian Codebooks in Rate-Distortion Theory
IEEE Trans. Inform. Theory, 2002
Abstract

Cited by 13 (6 self)
We characterize the best achievable performance of lossy compression algorithms operating on arbitrary random sources, and with respect to general distortion measures. Direct and converse coding theorems are given for variable-rate codes operating at a fixed distortion level, emphasizing: (a) non-asymptotic results, (b) optimal or near-optimal redundancy bounds, and (c) results with probability one. This development is based in part on the observation that there is a precise correspondence between compression algorithms and probability measures on the reproduction alphabet. This is analogous to the Kraft inequality in lossless data compression. In the case of stationary ergodic sources our results reduce to the classical coding theorems. As an application of these general results, we examine the performance of codes based on mixture codebooks for discrete memoryless sources. A mixture codebook (or Bayesian codebook) is a random codebook generated from a mixture over some class of reproduction distributions. We demonstrate the existence of universal mixture codebooks, and show that it is possible to universally encode memoryless sources with redundancy of approximately (d/2) log n bits, where d is the dimension of the simplex of probability distributions on the reproduction alphabet.
Exact retrospective simulation of diffusion processes and optimal filtering
Abstract
Review of some of our recent work on simulation, inference and filtering for diffusion processes. The ideas are useful in more general contexts. Work in collaboration; references will be given at the end of the talk.

Context: Diffusion processes
◮ SDE (microscopic motion): dX_s = b(X_s; θ) ds + σ(X_s; θ) dB_s, X_s ∈ R^d, θ ∈ Θ (1)
◮ Discrete-time skeleton of the process: x = {X_{t_0}, X_{t_1}, ..., X_{t_n}}, 0 = t_0 < t_1 < · · · < t_n, Δt_i = t_i − t_{i−1}. The discrete-time/macroscopic dynamics are given by the transition density
  p_t(x, y; θ) = P[X_t ∈ dy | X_0 = x; θ] / dy,  t > 0,  x, y ∈ R^d.
In all but a few special cases the transition density is analytically intractable.

Challenges due to unavailability of the transition density:
1. Exact simulation of the diffusion at given times. ◮ Aim: simulate X_t | X_0 = x ∼ p_t(x, ·; θ).
2. Likelihood-based inference for discretely observed diffusions. ◮ Given observed data x and an SDE model (1), the Markov property gives
  ℓ(θ | x) = Σ_{i=1}^n ℓ_i(θ) := Σ_{i=1}^n log p_{Δt_i}(X_{t_{i−1}}, X_{t_i}; θ).
3. Filtering for unobserved diffusions. ◮ The data provide only partial information about X. Example 1 (low/medium frequency): Y_{t_i} = X_{t_i} + ε_i. Example 2 (high/arbitrary frequency): X = (Z, Y), observe only Y.

Previous approaches to these problems have mostly relied on numerical/model approximations, typically discretizing the SDE.

Plan for today:
◮ Brief description of a novel algorithm for simulation of diffusion sample paths without approximation error; construction for the simplest setting.
◮ Mention crucial generalizations: Wiener-Poisson factorization, gradient-type multivariate diffusions, conditional simulation.
◮ Show how to derive a fully adapted optimal filter for an unobserved component under Poisson subsampling of the data.
◮ Extensions: importance sampling, inference for partially observed processes. References.

Exact simulation of diffusion sample paths by rejection sampling. Setup (1): d = 1, σ = 1,
  dX_s = α(X_s) ds + dB_s,  s ∈ [0, t].
We wish to simulate X = (X_s, s ∈ [0, t]), given X_0 = x. Think of X as a path-valued random object. Let Q^(t,x) be the distribution induced by X, W^(t,x) the distribution induced by B, and ω = (ω_s, 0 ≤ s ≤ t) a typical path with ω_0 = x. Girsanov's formula:
  (dQ^(t,x) / dW^(t,x))(ω) = exp{ ∫_0^t α(ω_s) dω_s − (1/2) ∫_0^t α²(ω_s) ds }.
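The Markov-property likelihood decomposition from challenge 2 above can be sketched in code. Since the exact transition density p_t is analytically intractable (the talk's central point), a Gaussian Euler-type approximation stands in for it here purely for illustration; the function names are assumptions, not from the talk.

```python
import math

def euler_log_transition(x_prev, x_next, dt, drift, theta):
    """Gaussian (Euler) stand-in for the intractable p_dt(x_prev, x_next; theta),
    for a scalar unit-diffusion SDE dX = b(X; theta) ds + dB (sigma = 1)."""
    mean = x_prev + drift(x_prev, theta) * dt
    var = dt  # unit diffusion coefficient
    return -0.5 * math.log(2 * math.pi * var) - (x_next - mean) ** 2 / (2 * var)

def log_likelihood(skeleton, times, drift, theta):
    """Markov property: l(theta | x) = sum_i log p_{dt_i}(x_{i-1}, x_i; theta)."""
    total = 0.0
    for i in range(1, len(skeleton)):
        dt = times[i] - times[i - 1]
        total += euler_log_transition(skeleton[i - 1], skeleton[i], dt, drift, theta)
    return total
```

Replacing the Euler stand-in with an unbiased evaluation of the true transition density is precisely what the exact-simulation machinery of the talk enables.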
P ≠ NP
Abstract
We demonstrate the separation of the complexity class NP from its subclass P. Throughout our proof, we observe that the ability to compute a property on structures in polynomial time is intimately related to the statistical notions of conditional independence and sufficient statistics. The presence of conditional independencies manifests in the form of economical parametrizations of the joint distribution of covariates. In order to apply this analysis to the space of solutions of random constraint satisfaction problems, we utilize and expand upon ideas from several fields spanning logic, statistics, graphical models, random ensembles, and statistical physics. We begin by introducing the requisite framework of graphical models for a set of interacting variables. We focus on the correspondence between Markov and Gibbs properties for directed and undirected models as reflected in the factorization of their joint distribution, and the number of independent parameters required to specify the distribution. Next, we build the central contribution of this work. We show that there are fundamental conceptual relationships between polynomial-time computation, which is completely captured by the logic FO(LFP) on some classes of structures, and certain directed Markov properties.
Working Paper presented at the Workshop "Global Logistics", EIASM,
Please do not quote without permission.