Results 1-10 of 15
A discriminative matching approach to word alignment
 In Proceedings of HLT-EMNLP
, 2005
Abstract

Cited by 100 (8 self)
We present a discriminative, large-margin approach to feature-based matching for word alignment. In this framework, pairs of word tokens receive a matching score, which is based on features of that pair, including measures of association between the words, distortion between their positions, similarity of the orthographic form, and so on. Even with only 100 labeled training examples and simple features which incorporate counts from a large unlabeled corpus, we achieve AER performance close to IBM Model 4, in much less time. Including Model 4 predictions as features, we achieve a relative AER reduction of 22% over intersected Model 4 alignments.
Multi-task feature selection
 In the Workshop on Structural Knowledge Transfer for Machine Learning at the 23rd International Conference on Machine Learning (ICML 2006)
, 2006
Abstract

Cited by 76 (1 self)
We address joint feature selection across a group of classification or regression tasks. In many multi-task learning scenarios, different but related tasks share a large proportion of relevant features. We propose a novel type of joint regularization for the parameters of support vector machines in order to couple feature selection across tasks. Intuitively, we extend the ℓ1 regularization for single-task estimation to the multi-task setting. By penalizing the sum of ℓ2-norms of the blocks of coefficients associated with each feature across different tasks, we encourage multiple predictors to have similar parameter sparsity patterns. This approach yields convex, non-differentiable optimization problems that can be solved efficiently using a simple and scalable extragradient algorithm. We show empirically that our approach outperforms independent ℓ1-based feature selection on several datasets.
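The ℓ1/ℓ2 joint penalty described in this abstract can be sketched in a few lines; the matrix layout and function name below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def l1_l2_penalty(W):
    """Sum of l2-norms of per-feature coefficient blocks across tasks.

    W has shape (n_features, n_tasks); row j holds feature j's weight in
    every task. Penalizing row norms (an l1 norm over l2 block norms)
    pushes entire rows to zero, so all tasks share a sparsity pattern.
    """
    return float(np.sum(np.linalg.norm(W, axis=1)))

# Two features, two tasks: the second feature is unused by every task,
# so only the first row contributes to the penalty.
W = np.array([[3.0, 4.0],
              [0.0, 0.0]])
print(l1_l2_penalty(W))  # norm of row 1 is 5.0, row 2 contributes 0
```

Because the penalty is an ℓ1 norm over whole rows rather than over individual entries, it zeroes out a feature for all tasks at once, which is exactly the shared-sparsity effect the abstract describes.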
Structured prediction, dual extragradient and Bregman projections
 Journal of Machine Learning Research
, 2006
Abstract

Cited by 60 (3 self)
We present a simple and scalable algorithm for maximum-margin estimation of structured output models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem that allows us to use simple projection methods based on the dual extragradient algorithm (Nesterov, 2003). The projection step can be solved using dynamic programming or combinatorial algorithms for min-cost convex flow, depending on the structure of the problem. We show that this approach provides a memory-efficient alternative to formulations based on reductions to a quadratic program (QP). We analyze the convergence of the method and present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
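On a toy bilinear problem, the extragradient iteration this abstract relies on looks as follows; the box constraints, step size, and problem instance are illustrative stand-ins for the structured-prediction saddle-point problem, not the paper's formulation.

```python
import numpy as np

def project_box(z, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n."""
    return np.clip(z, lo, hi)

def extragradient_saddle(A, steps=500, eta=0.3):
    """Extragradient method for the bilinear saddle point
    min_x max_y  x^T A y  over box constraints: each iteration takes a
    predictor gradient/projection step, then a corrector step using
    gradients evaluated at the predicted point."""
    x = np.ones(A.shape[0])
    y = np.ones(A.shape[1])
    for _ in range(steps):
        # predictor step
        x_half = project_box(x - eta * (A @ y))
        y_half = project_box(y + eta * (A.T @ x))
        # corrector step: gradients at the predicted point
        x = project_box(x - eta * (A @ y_half))
        y = project_box(y + eta * (A.T @ x_half))
    return x, y

x, y = extragradient_saddle(np.eye(2))
# For A = I the unique saddle point is (0, 0); the iterates converge to it,
# whereas plain simultaneous gradient ascent/descent would cycle or diverge.
```

The only nontrivial computation per iteration is the projection, which is what lets the paper substitute dynamic programming or min-cost flow for it in structured settings.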
Structured prediction via the extragradient method
 In Advances in
, 2006
Abstract

Cited by 29 (2 self)
We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem and apply the extragradient method, yielding an algorithm with linear convergence using simple gradient and projection calculations. The projection step can be solved using combinatorial algorithms for min-cost quadratic flow. This makes the approach an efficient alternative to formulations based on reductions to a quadratic program (QP). We present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
Large margin transformation learning
, 2009
Abstract

Cited by 3 (2 self)
With the current explosion of data coming from many scientific fields and industry, machine learning algorithms are more important than ever to help make sense of this data in an automated manner. Support vector machines (SVMs) have been a very successful learning algorithm for many applied settings. However, the support vector machine only finds linear classifiers, so data often needs to be preprocessed with appropriately chosen nonlinear mappings in order to find a model with good predictive properties. These mappings can either take the form of an explicit transformation or be defined implicitly with a kernel function. Automatically choosing these mappings has been studied under the name of kernel learning. These methods typically optimize a cost function to find a kernel made up of a combination of base kernels, thus implicitly learning mappings. This dissertation investigates methods for choosing explicit transformations automatically. This setting differs from the kernel learning framework by learning a combination of base transformations rather than base kernels. This allows prior knowledge to be exploited in the functional form of the transformations which may not be easily encoded as kernels, such as when learning monotonic ...
Structured Prediction via the Extragradient
Abstract
We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem and apply the extragradient method, yielding an algorithm with linear convergence using simple gradient and projection calculations. The projection step can be solved using combinatorial algorithms for min-cost quadratic flow. This makes the approach an efficient alternative to formulations based on reductions to a quadratic program (QP). We present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
Graduate Group Chairperson COPYRIGHT
, 2007
Abstract
 Add to MetaCart
To my wife Ping, and my son Lucas. Acknowledgments. First and foremost I would like to thank my advisor Dr. Lawrence K. Saul. I was very fortunate to have Lawrence as my mentor. I have benefited greatly from Lawrence’s high standards on the quality and elegance of scientific work. Always friendly, patient, and understanding, he has been a wonderful source of knowledge and encouragement. I am grateful to all the members of my thesis committee: Fernando Pereira, Daniel D. Lee, Mitch Marcus and Sam Roweis. They provided valuable feedback on my thesis and helped guide it to completion. I am especially indebted to Fernando, who mentored me during my first year at Penn. My early work with Fernando exposed me to the theoretical and algorithmic aspects of optimization, which I continue to find fascinating. I would also like to thank Dan for his generous support of my computational needs. Many experiments were performed on his cluster. While the thesis is largely based on my close interaction with Lawrence, discussions with many people have helped me to look at the problems being studied from different perspectives. Among them, I would especially like to thank Yasemin Altun, Koby Crammer,
A Feedback Neural Network for Solving Nonlinear Programming Problems with Hybrid Constraints
Abstract
This paper proposes a high-performance feedback neural network model for solving nonlinear convex programming problems with hybrid constraints in real time by means of the projection method. In contrast to existing neural networks, this general model can operate not only on bound constraints, but also on hybrid constraints comprised of inequality and equality constraints. It is shown that the proposed neural network is stable in the sense of Lyapunov and can be globally convergent to an exact optimal solution of the original problem under some weaker conditions. Moreover, it has a simpler structure and a lower complexity. The advanced performance of the proposed neural network is demonstrated by simulation of several numerical examples.
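As a minimal illustration of the projection method this abstract invokes, the sketch below runs a discrete-time projected-gradient iteration on a toy bound-constrained quadratic program; the problem instance, step size, and function name are assumptions for illustration, not the paper's continuous-time network dynamics.

```python
import numpy as np

def projected_gradient_qp(Q, b, lo, hi, eta=0.5, steps=200):
    """Minimize 0.5 x^T Q x - b^T x subject to lo <= x <= hi by
    alternating a gradient step with a projection onto the box -- a
    discrete-time analogue of projection-based network dynamics."""
    x = np.zeros_like(b)
    for _ in range(steps):
        grad = Q @ x - b
        x = np.clip(x - eta * grad, lo, hi)  # gradient step, then project
    return x

# Unconstrained minimizer of 0.5||x||^2 - b^T x is x = b = (2, -3);
# for this separable objective the constrained solution is b clipped
# to the box [-1, 1]^2, i.e. (1, -1), and the iteration finds it.
x = projected_gradient_qp(np.eye(2), np.array([2.0, -3.0]), -1.0, 1.0)
print(x)
```

The fixed points of this iteration are exactly the points satisfying the projection equation x = P(x - eta * grad f(x)), which is the same optimality characterization such projection networks are built around.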