## Multiview discriminative sequential learning (2005)

Venue: Proceedings of the European Conference on Machine Learning

Citations: 7 (1 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Brefeld05multiviewdiscriminative,
  author    = {Ulf Brefeld and Tobias Scheffer},
  title     = {Multiview discriminative sequential learning},
  booktitle = {Proceedings of the European Conference on Machine Learning},
  year      = {2005},
  pages     = {60--71},
  publisher = {Springer}
}
```

### Abstract

1 Introduction: The problem of labeling observation sequences has applications that range from language processing tasks such as named entity recognition, part-of-speech tagging, and information extraction to biological tasks in which the instances are often DNA strings. Traditionally, sequence models such as the hidden Markov model and variants thereof have been applied to the label sequence learning problem. Learning procedures for generative models adjust the parameters such that the joint likelihood of training observations and label sequences is maximized. By contrast, from the application point of view the true benefit of a label sequence predictor corresponds to its ability to find the correct label sequence given an observation sequence.

### Citations

2310 | Conditional random fields: probabilistic models for segmenting and labeling sequence data
- Lafferty, McCallum, et al.
- 2001
Citation Context ... point of view the true benefit of a label sequence predictor corresponds to its ability to find the correct label sequence given an observation sequence. In the last years, conditional random fields [14,15], hidden Markov support vector machines [4] and their variants have become popular; their discriminative learning procedures minimize criteria that are directly linked to their accuracy of retrieving ...

1245 | Combining labeled and unlabeled data with co-training
- Blum, Mitchell
- 1998
Citation Context ...decoding (Equation 1) can be performed by a Viterbi algorithm in time $O(T|\Sigma|^2)$, with transition matrix $A = \{a_{\sigma,\tau}\}$ and observation matrix $B_x = \{b_{s,\sigma}(x)\}$ given by $a_{\sigma,\tau} = \sum_{i,\bar{y}} \alpha_i(\bar{y}) \sum_t [\![\bar{y}_{t-1} = \sigma \wedge \bar{y}_t = \tau]\!]$ (5) and $b_{s,\sigma}(x) = \sum_{i,t,\bar{y}} [\![\bar{y}_t = \sigma]\!]\,\alpha_i(\bar{y})\,k(x_s, x_{i,t})$ (6). We utilize a kernel function $K((x,y),(\bar{x},\bar{y})) = \langle \Phi(x,y), \Phi(\bar{x},\bar{y}) \rangle$ to compute the inner product of two observation and label sequences ...
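The decoding step described in this excerpt runs in $O(T|\Sigma|^2)$ time. As an illustrative sketch only (not the paper's implementation), the following pure-Python Viterbi decoder takes generic additive score matrices standing in for the transition scores $a_{\sigma,\tau}$ and observation scores $b_{s,\sigma}(x)$; all names here are hypothetical:

```python
def viterbi(A, B, start):
    """Viterbi decoding over additive scores.

    A[p][s]  : transition score from label p to label s (|Sigma| x |Sigma|)
    B[t][s]  : observation score of label s at position t (T x |Sigma|)
    start[s] : score of starting in label s
    Returns the highest-scoring label sequence as a list of label indices.
    Runs in O(T * |Sigma|^2), matching the complexity quoted above.
    """
    T, S = len(B), len(start)
    # delta[s]: best score of any path ending in label s at the current step
    delta = [start[s] + B[0][s] for s in range(S)]
    psi = []  # backpointers, one list of best-predecessor indices per step
    for t in range(1, T):
        new_delta, back = [], []
        for s in range(S):
            best_prev = max(range(S), key=lambda p: delta[p] + A[p][s])
            back.append(best_prev)
            new_delta.append(delta[best_prev] + A[best_prev][s] + B[t][s])
        delta, psi = new_delta, psi + [back]
    # trace back the best path from the final position
    path = [max(range(S), key=lambda s: delta[s])]
    for back in reversed(psi):
        path.append(back[path[-1]])
    return path[::-1]
```

With two labels, zero transition scores, and observation scores that favor label 0, 1, 0 at the three positions, the decoder returns `[0, 1, 0]`.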

492 | Unsupervised word sense disambiguation rivaling supervised methods
- Yarowsky
- 1995
Citation Context ...ication. A multi-view approach to word sense disambiguation combines a classifier that refers to the local context of a word with a second classifier that utilizes the document in which words cooccur [23]. Blum and Mitchell [5] introduce the co-training algorithm for semi-supervised learning that greedily augments the training set of two classifiers. A version of the AdaBoost algorithm boosts the agree...
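The greedy augmentation loop that co-training performs can be sketched as follows. This is a minimal illustration under assumed toy interfaces (`train`, `predict`, and the two-view example format are hypothetical stand-ins), not the algorithm as published:

```python
def co_train(labeled, unlabeled, train, predict, rounds=10, k=1):
    """Greedy co-training sketch (after Blum & Mitchell, 1998).

    labeled   : list of ((x1, x2), y) -- each example seen in two views
    unlabeled : list of (x1, x2) pairs
    train     : builds a classifier from (view, label) pairs
    predict   : maps (classifier, view) -> (label, confidence)
    Each round, each view's classifier labels its k most confidently
    predicted unlabeled examples and adds them to the shared training
    set, so the two views teach each other.
    """
    pool, data = list(unlabeled), list(labeled)
    for _ in range(rounds):
        if not pool:
            break
        c1 = train([(x[0], y) for x, y in data])
        c2 = train([(x[1], y) for x, y in data])
        for clf, view in ((c1, 0), (c2, 1)):
            # take the most confident unlabeled examples first
            for x in sorted(pool, key=lambda x: -predict(clf, x[view])[1])[:k]:
                label, _ = predict(clf, x[view])
                data.append((x, label))
                pool.remove(x)
    return data
```

The key design point, as the excerpt notes, is that the augmentation is greedy: once a self-labeled example enters the training set, it is never revisited.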

439 | Maximum entropy markov models for information extraction and segmentation
- McCallum, Freitag, et al.
- 2000
Citation Context ...concludes. 2 Related Work In a rapidly developing line of research, many variants of discriminative sequence models are being explored. Recently studied variants include maximum entropy Markov models [17], conditional random fields [14], perceptron re-ranking [7], hidden Markov support vector machines [4], label sequence boosting [3], max-margin Markov models [21], case-factor diagrams [16], sequential...

436 | Max-margin Markov networks
- Taskar, Guestrin, et al.
- 2003
Citation Context ...include maximum entropy Markov models [17], conditional random fields [14], perceptron re-ranking [7], hidden Markov support vector machines [4], label sequence boosting [3], max-margin Markov models [21], case-factor diagrams [16], sequential Gaussian process models [2], kernel conditional random fields [15] and support vector machines for structured output spaces [22]. De Sa [11] observes a relation...

433 | Unsupervised models for named entity classification
- Collins, Singer
- 1999
Citation Context ...training algorithm for semi-supervised learning that greedily augments the training set of two classifiers. A version of the AdaBoost algorithm boosts the agreement between two views on unlabeled data [9]. Dasgupta et al. [10] and Abney [1] give PAC bounds on the error of co-training in terms of the disagreement rate of hypotheses on unlabeled data in two independent views. This justifies the direct mi...

312 | Support vector machine learning for interdependent and structured output spaces
- Tsochantaridis, Hofmann, et al.
- 2004
Citation Context ...ng [3], max-margin Markov models [21], case-factor diagrams [16], sequential Gaussian process models [2], kernel conditional random fields [15] and support vector machines for structured output spaces [22]. De Sa [11] observes a relationship between consensus of multiple hypotheses and their error rate and devises a semi-supervised learning method by cascading multi-view vector quantization and linear ...

254 | Convolution kernels for natural language
- Collins, Duffy
- 2002
Citation Context ...w Hidden Markov Perceptrons In this section we present the dual multi-view hidden Markov perceptron algorithm. For the reader's convenience, we briefly review the single-view hidden Markov perceptron [8,4] and extend it to semi-supervised learning. The Hidden Markov Perceptron The goal is to learn a linear discriminant function $f: \mathcal{X} \times \mathcal{Y} \rightarrow \mathbb{R}$ given by $f(x,y) = \langle w, \Phi(x,y) \rangle$ (9) that correctly decodes any...
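The hidden Markov perceptron reviewed in this excerpt has a simple additive update: decode the best label sequence under the current weights and, on a mistake, move the weights toward the features of the true sequence and away from those of the prediction. The following is a minimal sketch; the `phi` and `decode` interfaces are assumptions for illustration, not the paper's code:

```python
def perceptron_epoch(examples, phi, decode, w):
    """One epoch of the structured (hidden Markov) perceptron.

    examples : list of (x, y) observation/label-sequence pairs
    phi      : joint feature map Phi(x, y) -> dict of feature counts
    decode   : returns argmax_y <w, Phi(x, y)>, e.g. a Viterbi decoder
    w        : weight vector as a feature->weight dict, updated in place
    """
    for x, y in examples:
        y_hat = decode(x, w)
        if y_hat != y:
            # additive update: reward the true sequence's features,
            # penalize the wrongly predicted sequence's features
            for f, v in phi(x, y).items():
                w[f] = w.get(f, 0.0) + v
            for f, v in phi(x, y_hat).items():
                w[f] = w.get(f, 0.0) - v
    return w
```

Because the update touches only the features of the two sequences involved, it can be maintained in dual form as well, which is the representation the paper's multi-view extension builds on.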

191 | Analyzing the effectiveness and applicability of co-training
- Nigam, Ghani
- 2000
Citation Context ... the direct minimization of the disagreement. The co-EM algorithm for semi-supervised learning probabilistically labels all unlabeled examples and iteratively exchanges those labels between two views [20,12]. Muslea et al. [19] extend co-EM for active learning and Brefeld and Scheffer [6] study a co-EM wrapper for the support vector machine. 3 Background In this section we review “input output spaces” [2...

189 | Hidden Markov support vector machines
- Altun, Tsochantaridis, et al.
Citation Context ...ence predictor corresponds to its ability to find the correct label sequence given an observation sequence. In the last years, conditional random fields [14,15], hidden Markov support vector machines [4] and their variants have become popular; their discriminative learning procedures minimize criteria that are directly linked to their accuracy of retrieving the correct label sequence. In addition, ke...

88 | Active+ semisupervised learning= robust multi-view learning
- Muslea, Minton, et al.
- 2002
Citation Context ...n of the disagreement. The co-EM algorithm for semi-supervised learning probabilistically labels all unlabeled examples and iteratively exchanges those labels between two views [20,12]. Muslea et al. [19] extend co-EM for active learning and Brefeld and Scheffer [6] study a co-EM wrapper for the support vector machine. 3 Background In this section we review “input output spaces” [2] and the consensus ...

75 | PAC generalization bounds for co-training
- Dasgupta, Littman, et al.
- 2002
Citation Context ...itially independent hypotheses, and then minimize the disagreement of these hypotheses regarding the correct labels of the unlabeled data [11]. Thereby, they minimize an upper bound on the error rate [10]. The rest of our paper is structured as follows. Section 2 reports on related work and Section 3 reviews input output spaces and provides some background on multi-view learning. In Section 4 and 5 we...

74 | Ranking algorithms for named-entity extraction: Boosting and the voted perceptron
- Collins
Citation Context ...search, many variants of discriminative sequence models are being explored. Recently studied variants include maximum entropy Markov models [17], conditional random fields [14], perceptron re-ranking [7], hidden Markov support vector machines [4], label sequence boosting [3], max-margin Markov models [21], case-factor diagrams [16], sequential Gaussian process models [2], kernel conditional random fie...

74 | Kernel conditional random fields: representation and clique selection
- Lafferty
- 2004
Citation Context ... point of view the true benefit of a label sequence predictor corresponds to its ability to find the correct label sequence given an observation sequence. In the last years, conditional random fields [14,15], hidden Markov support vector machines [4] and their variants have become popular; their discriminative learning procedures minimize criteria that are directly linked to their accuracy of retrieving ...

71 | Identifying gene and protein mentions in text using conditional random fields
- McDonald, Pereira
- 2005
Citation Context ...tasks; for instance, some of the top scoring systems in the BioCreative named entity recognition challenge used conditional random fields [18]. In the training process of generative sequence models, additional inexpensive and readily available unlabeled sequences can easily be utilized by employing Baum-Welch, a variant of the EM algorithm. ...

42 | Case-factor diagrams for structured probabilistic modeling
- McAllester, Collins, et al.
- 2004
Citation Context ...rkov models [17], conditional random fields [14], perceptron re-ranking [7], hidden Markov support vector machines [4], label sequence boosting [3], max-margin Markov models [21], case-factor diagrams [16], sequential Gaussian process models [2], kernel conditional random fields [15] and support vector machines for structured output spaces [22]. De Sa [11] observes a relationship between consensus of m...

40 | Combining labeled and unlabeled data for multiclass text categorization
- Ghani
- 2002
Citation Context ... the direct minimization of the disagreement. The co-EM algorithm for semi-supervised learning probabilistically labels all unlabeled examples and iteratively exchanges those labels between two views [20,12]. Muslea et al. [19] extend co-EM for active learning and Brefeld and Scheffer [6] study a co-EM wrapper for the support vector machine. 3 Background In this section we review “input output spaces” [2...

36 | Learning classification with unlabeled data

- de Sa

- 1993
Citation Context ...ed. Multi-view algorithms such as co-training [5] learn two initially independent hypotheses, and then minimize the disagreement of these hypotheses regarding the correct labels of the unlabeled data [11]. Thereby, they minimize an upper bound on the error rate [10]. The rest of our paper is structured as follows. Section 2 reports on related work and Section 3 reviews input output spaces and provides...

34 | Gaussian process classification for segmenting and annotating sequences
- Altun, Hofmann, et al.
- 2004
Citation Context ...ds [14], perceptron re-ranking [7], hidden Markov support vector machines [4], label sequence boosting [3], max-margin Markov models [21], case-factor diagrams [16], sequential Gaussian process models [2], kernel conditional random fields [15] and support vector machines for structured output spaces [22]. De Sa [11] observes a relationship between consensus of multiple hypotheses and their error rate ...

34 | Co-em support vector learning
- Brefeld, Scheffer
- 2004
Citation Context ...learning probabilistically labels all unlabeled examples and iteratively exchanges those labels between two views [20,12]. Muslea et al. [19] extend co-EM for active learning and Brefeld and Scheffer [6] study a co-EM wrapper for the support vector machine. 3 Background In this section we review “input output spaces” [2] and the consensus maximization principle that underlies multi-view algorithms fo...

33 | Discriminative learning for label sequences via boosting, in Advances in Neural Information Processing Systems 15

- Altun, Hofmann, et al.

- 2002
Citation Context ...ed. Recently studied variants include maximum entropy Markov models [17], conditional random fields [14], perceptron re-ranking [7], hidden Markov support vector machines [4], label sequence boosting [3], max-margin Markov models [21], case-factor diagrams [16], sequential Gaussian process models [2], kernel conditional random fields [15] and support vector machines for structured output spaces [22]. ...

19 | Systematic Feature Evaluation for Gene Name Recognition
- Hakenberg, Bickel, et al.
- 2005
Citation Context ...ognized. View 1 consists of the token itself together with letter 2, 3 and 4-grams; view 2 contains surface clues like capitalization, inclusion of Greek symbols, numbers, and others as documented in [13]. The CoNLL-2002 data contains 9 label types which distinguish person, organization, location, and other names. We use 3100 sentences of between 10 and 40 tokens which we represent by a token view and ...

5 | Bootstrapping, in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

- Abney

- 2002