### Stochastic Structured Prediction under Bandit Feedback

Stochastic structured prediction under bandit feedback follows a learning protocol where, on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in the form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analyze them as stochastic first-order methods. We present an experimental evaluation on problems of natural language processing over exponential output spaces, and compare convergence speed across different objectives under the practical criterion of optimal task performance on development data and the optimization-theoretic criterion of minimal squared gradient norm. Best results under both criteria are obtained for a non-convex objective for pairwise preference learning under bandit feedback.
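
The learning protocol in this abstract (receive an input, predict an output structure, observe only the task loss of that prediction) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a toy setting in which the output space is small enough to enumerate, the model is a softmax over one weight per (input, output) pair, and the update is the standard score-function stochastic gradient of the expected loss. The names `train_bandit`, `predict`, and `loss_fn` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn a list of scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train_bandit(inputs, num_outputs, loss_fn, steps=6000, lr=0.5, seed=0):
    """Minimize expected task loss from bandit feedback (toy sketch).

    loss_fn(x, y) plays the role of the environment: it returns a loss in
    [0, 1] for the predicted output only. The learner never sees the gold
    output or the loss of any other candidate.
    """
    rng = random.Random(seed)
    # One weight per (input, output) pair; an assumption made for this toy.
    w = {x: [0.0] * num_outputs for x in inputs}
    for _ in range(steps):
        x = rng.choice(inputs)                      # 1. receive an input
        probs = softmax(w[x])
        y = rng.choices(range(num_outputs), weights=probs)[0]  # 2. predict by sampling
        loss = loss_fn(x, y)                        # 3. partial (bandit) feedback
        # 4. stochastic gradient step: loss * d/dw log p_w(y | x)
        for k in range(num_outputs):
            grad_log_prob = (1.0 if k == y else 0.0) - probs[k]
            w[x][k] -= lr * loss * grad_log_prob
    return w


def predict(w, x):
    """Return the highest-scoring output for input x."""
    return max(range(len(w[x])), key=lambda k: w[x][k])


if __name__ == "__main__":
    # Hypothetical environment: 0/1 loss against a hidden gold mapping.
    gold = {"a": 2, "b": 0, "c": 3}
    weights = train_bandit(list(gold), 4, lambda x, y: 0.0 if y == gold[x] else 1.0)
    print({x: predict(weights, x) for x in gold})
```

Sampling the prediction, rather than always taking the argmax, is what makes the single observed loss an unbiased estimate of the gradient of the expected loss. In a realistic structured setting the output space is exponential, so the explicit softmax over all outputs would have to be replaced by sampling from the model.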

### Response-Based Learning for Patent Translation

In response-based structured prediction, instead of a gold-standard structure, the learner is given a response to a predicted structure from which a supervision signal for structured learning is extracted. Applied to statistical machine translation (SMT), different types of environments, such as a downstream application, a professional translator, or an SMT user, may respond to predicted translations with a ranking, a correction, or an acceptance/rejection decision, respectively. We present algorithms and experiments that show that learning from responses alleviates the supervision problem and allows a direct optimization of SMT for tasks such as cross-lingual patent prior art retrieval, or translation of technical patent documents.