Results 1–10 of 29
Spectral Experts for Estimating Mixtures of Linear Regressions
Abstract

Cited by 15 (1 self)
Discriminative latent-variable models are typically learned using EM or gradient-based optimization, which suffer from local optima. In this paper, we develop a new computationally efficient and provably consistent estimator for a mixture of linear regressions, a simple instance of a discriminative latent-variable model. Our approach relies on a low-rank linear regression to recover a symmetric tensor, which can be factorized into the parameters using a tensor power method. We prove rates of convergence for our estimator and provide an empirical evaluation illustrating its strengths relative to local optimization (EM).
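The abstract's final step, factorizing a symmetric tensor with the tensor power method, can be sketched as follows. This is a minimal, hedged illustration of the generic power iteration, not the paper's implementation; the function name and the toy rank-one tensor are hypothetical.

```python
import numpy as np

def tensor_power_method(T, n_iters=100, seed=0):
    """Power iteration on a symmetric 3rd-order tensor T (a sketch, not the
    paper's code). Returns one (eigenvalue, eigenvector) pair of T."""
    d = T.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        # Contract T along two modes: (T(I, v, v))_i = sum_{j,k} T[i,j,k] v_j v_k
        w = np.einsum('ijk,j,k->i', T, v, v)
        v = w / np.linalg.norm(w)
    # Eigenvalue: lam = T(v, v, v)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)
    return lam, v

# Toy rank-one example: T = 2 * e1^{(x)3} has the eigenpair (2, e1)
e1 = np.array([1.0, 0.0, 0.0])
T = 2.0 * np.einsum('i,j,k->ijk', e1, e1, e1)
lam, v = tensor_power_method(T)
```

On a higher-rank symmetric tensor one would deflate (subtract `lam * v^{(x)3}`) and repeat to recover the remaining components.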
Supplementary Material for Spectral Experts for Estimating Mixtures of Linear Regressions (Journal of Machine Learning Research, 1–11)
Abstract
Let [n] = {1, ..., n} denote the first n positive integers. We use x⊗p to represent the pth-order tensor formed by taking the outer product of x ∈ R^d; i.e. (x⊗p)_{i1···ip} = x_{i1} ··· x_{ip}. We will use 〈·, ·〉 to denote the generalized dot product between two pth-order tensors: 〈X, Y〉 = ∑_{i1,...,ip} X_{i1···ip} Y_{i1···ip}. A tensor X is symmetric if X_{i1···ip} = X_{j1···jp} for all index tuples (i1, ..., ip), (j1, ..., jp) ∈ [d]^p that are permutations of each other (all tensors in this paper are symmetric). For a pth-order tensor X ∈ (R^d)⊗p, the mode-i unfolding of X is a matrix X_(i) ∈ R^{d×d^{p−1}} whose jth row contains all the elements of X whose ith index equals j. For a vector x, ‖x‖_op denotes the 2-norm. For a matrix X, ‖X‖_* denotes the nuclear (trace) norm (sum of singular values), ‖X‖_F the Frobenius norm (square root of the sum of squared singular values), ‖X‖_max the max norm (element-wise maximum), ‖X‖_op the operator norm (largest singular value), and σ_min(X) the smallest singular value of X. For a tensor X, let ‖X‖_* = (1/p) ∑_{i=1}^p ‖X_(i)‖_* denote the average nuclear norm over all p unfoldings, and let ‖X‖_op = (1/p) ∑_{i=1}^p ‖X_(i)‖_op denote the average operator norm over all p unfoldings. For a symmetric tensor X ∈ (R^d)⊗p, let cvec(X) ∈ R^{C_{d,p}}, with C_{d,p} = C(d+p−1, p), be the collapsed vectorization of the distinct elements of X; for example, for X ∈ R^{d×d}, cvec(X) = (X_ii : i ∈ [d]; X_ij + X_ji : i, j ∈ [d], i < j). In general, each component of cvec(X) is indexed by a vector of counts (c1, ..., cd) with total sum ∑_i ci = p. The value of that component is ∑_{k ∈ K(c)} X_{k1···kp}, where K(c) = {k ∈ [d]^p : ∀i ∈ [d], ci = |{j ∈ [p] : kj = i}|} is the set of index vectors k with that count profile.
Learning in a Small World
Abstract
Understanding how we are able to perform a diverse set of complex tasks is a central question for the Artificial Intelligence community. A popular approach is to use temporal abstraction as a framework to capture the notion of subtasks. However, this transfers the problem to finding the right subtasks, which is still an open problem. Existing approaches for subtask generation require too much knowledge of the environment, and the abstractions they create can overwhelm the agent. We propose a simple algorithm inspired by small-world networks to learn subtasks while solving a task, requiring virtually no information about the environment. Additionally, we show that the subtasks we learn can be easily composed by the agent to solve any other task; more formally, we prove that any task can be solved using only a logarithmic combination of these subtasks and primitive actions. Experimental results show that the subtasks we generate outperform other popular subtask generation schemes on standard domains.
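The small-world intuition behind the abstract's logarithmic-composition claim can be illustrated on a toy graph. This is our own hedged sketch, not the paper's algorithm: on a ring of n nodes where each node also gets one random long-range shortcut, greedy routing toward a target typically needs far fewer hops than the ~n/2 needed with local steps alone, just as subtasks act as shortcuts between distant states.

```python
import numpy as np

def greedy_hops(n, source, target, seed=0):
    """Greedy routing on a ring of n nodes, each with one random shortcut
    (an illustrative toy model, not the paper's subtask-learning algorithm)."""
    rng = np.random.default_rng(seed)
    shortcuts = rng.integers(0, n, size=n)  # one random long-range edge per node

    def dist(a, b):
        d = abs(a - b) % n
        return min(d, n - d)  # ring distance

    hops, cur = 0, source
    while cur != target:
        # candidate moves: the two ring neighbours plus this node's shortcut
        candidates = [(cur + 1) % n, (cur - 1) % n, shortcuts[cur]]
        cur = min(candidates, key=lambda c: dist(c, target))
        hops += 1
    return hops

# With local steps only, 0 -> 500 on a 1000-node ring takes exactly 500 hops;
# shortcuts can only reduce this.
hops = greedy_hops(1000, 0, 500)
```

Since a ring neighbour always reduces the distance by one, the loop is guaranteed to terminate; shortcuts are taken only when they help.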
Estimating Latent-Variable Graphical Models using Moments and Likelihoods
Abstract

Cited by 1 (0 self)
Recent work on the method of moments enables consistent parameter estimation, but only for certain types of latent-variable models. On the other hand, pure likelihood objectives, though more universally applicable, are difficult to optimize. In this work, we show that using the method of moments in conjunction with composite likelihood yields consistent parameter estimates for a much broader class of discrete directed and undirected graphical models, including loopy graphs with high treewidth. Specifically, we use tensor factorization to reveal information about the hidden variables. This allows us to construct convex likelihoods which can be globally optimized to recover the parameters.
Estimating Mixture Models via Mixtures of Polynomials
Abstract
Mixture modeling is a general technique for making any simple model more expressive through weighted combination. This generality and simplicity in part explains the success of the Expectation Maximization (EM) algorithm, in which updates are easy to derive for a wide class of mixture models. However, the likelihood of a mixture model is nonconvex, so EM has no known global convergence guarantees. Recently, method-of-moments approaches have offered global guarantees for some mixture models, but they do not extend easily to the range of mixture models that exist. In this work, we present Polymom, a unifying framework based on the method of moments in which estimation procedures are easily derivable, just as in EM. Polymom is applicable when the moments of a single mixture component are polynomials of the parameters. Our key observation is that the moments of the mixture model are a mixture of these polynomials, which allows us to cast estimation as a Generalized Moment Problem. We solve its relaxations using semidefinite optimization, and then extract parameters using ideas from computer algebra. This framework allows us to draw insights and apply tools from convex optimization, computer algebra and the theory of moments to study problems in statistical estimation. Simulations show good empirical performance on several models.
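The key observation, that mixture moments are mixtures of component-wise polynomials in the parameters, can be seen on the smallest possible example. This hedged sketch is not the Polymom machinery (no semidefinite relaxation or Generalized Moment Problem); it just inverts the polynomial moment equations by hand for an equal-weight mixture of two point masses, where m1 = (a + b)/2 and m2 = (a² + b²)/2.

```python
import numpy as np

def recover_components(m1, m2):
    """Recover the two locations {a, b} of an equal-weight mixture of point
    masses from its first two moments (toy illustration, not Polymom itself)."""
    s = 2.0 * m1            # a + b, since m1 = (a + b)/2
    p = 2.0 * m1**2 - m2    # ab, since (a + b)^2 = a^2 + 2ab + b^2
    # a and b are the roots of t^2 - (a + b) t + ab
    roots = np.roots([1.0, -s, p])
    return np.sort(roots)

# Moments of the mixture 0.5*delta(1) + 0.5*delta(3): m1 = 2, m2 = 5
a, b = recover_components(2.0, 5.0)
```

The same pattern, moments as known polynomials in unknown parameters, is what the Generalized Moment Problem formulation generalizes beyond this hand-solvable case.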
An Experimental Study of the Learnability of Congestion Control
Abstract

Cited by 4 (0 self)
When designing a distributed network protocol, typically it is infeasible to fully define the target network where the protocol is intended to be used. It is therefore natural to ask: How faithfully do protocol designers really need to understand the networks they design for? What are the important signals that endpoints should listen to? How can researchers gain confidence that systems that work well on well-characterized test networks during development will also perform adequately on real networks that are inevitably more complex, or future networks yet to be developed? Is there a tradeoff between the performance of a protocol and the breadth of its intended operating range of networks? What is the cost of playing fairly with cross-traffic that is governed by another protocol? We examine these questions quantitatively in the context of congestion control, by using an automated protocol-design tool to approximate the best possible congestion-control scheme given imperfect prior knowledge about the network. We found only weak evidence of a tradeoff between operating range in link speeds and performance, even when the operating range was extended to cover a thousandfold range of link speeds. We found that it may be acceptable to simplify some characteristics of the network, such as its topology, when modeling for design purposes. Some other features, such as the degree of multiplexing and the aggressiveness of contending endpoints, are important to capture in a model.
International Foundation for Autonomous Agents and Multiagent Systems
, 2012
Session 1A – Innovative Applications
, 2012
Acknowledgments
, 2015
Abstract
First, I wish to thank my advisors Gal Chechik and Daphna Weinshall. Without their guidance, support, knowledge and generosity this journey would not have been possible. I will miss the excitement of approaching them with a new idea, knowing they will always be interested, thoughtful, and encouraging. Daphna has been a constant positive force, sharing her knowledge and curiosity, always teaching me new ways to look at the problems that confronted me. I always left our meetings uplifted with new ideas and possibilities. Gal has guided me through these years, always showing me that I have more in myself than I believed. He showed me how to better myself and my work in ways I never thought possible. I want to thank him for sharing his knowledge, excitement, and seemingly endless capacity for turning things around from bad to good. I wish to thank my colleagues and lab mates for making this time so much more fun and interesting. Thank you to Noa, Lior, Ossnat, Hadas, Ronnie and Yuval from Gal's lab. Oftentimes they were the best reason to go into the office in the morning, not to mention a source of support, inspiration, ideas, papers, and much-needed distractions. I also want to thank my friends from the learning corridor in Jerusalem, always there, in good times and bad: Daniel, Roi, Alon, Elad and Elad made parts of this journey with me and made it much better that way.