## Constraint classification for multiclass classification and ranking (2003)

Venue: | In Proceedings of the 16th Annual Conference on Neural Information Processing Systems, NIPS-02 |

Citations: | 47 - 5 self |

### BibTeX

@INPROCEEDINGS{Har-peled03constraintclassification,

author = {Sariel Har-Peled and Dan Roth and Dav Zimak},

title = {Constraint classification for multiclass classification and ranking},

booktitle = {In Proceedings of the 16th Annual Conference on Neural Information Processing Systems, NIPS-02},

year = {2003},

pages = {785--792},

publisher = {MIT Press}

}

### Abstract

The constraint classification framework captures many flavors of multiclass classification, including winner-take-all multiclass classification, multilabel classification, and ranking. We present a meta-algorithm for learning in this framework that learns via a single linear classifier in high dimension. We discuss distribution-independent as well as margin-based generalization bounds and present empirical and theoretical evidence that constraint classification offers benefits over existing methods of multiclass classification.
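The idea of learning "via a single linear classifier in high dimension" can be illustrated with a Kesler-style embedding, which the citation contexts on this page attribute to Nilsson and to Duda and Hart. The sketch below is an illustration under assumptions, not the authors' code: the helper names `embed`, `expand`, and `dot`, the zero-based class indices, and the toy weights are all hypothetical. It shows that a constraint "class i should outscore class j" on an example x becomes a single positive example in a space of dimension k times d.

```python
# Minimal sketch of a Kesler-style embedding (all names are hypothetical).

def embed(x, i, j, k):
    """Place x in chunk i and -x in chunk j of a vector of length k * len(x).

    Assumes i != j and 0 <= i, j < k.
    """
    d = len(x)
    z = [0.0] * (k * d)
    for t, v in enumerate(x):
        z[i * d + t] = v
        z[j * d + t] = -v
    return z

def expand(x, label, k):
    """Turn one multiclass example into k - 1 positive binary examples."""
    return [embed(x, label, j, k) for j in range(k) if j != label]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Three classes in two dimensions; w concatenates the per-class vectors.
w_chunks = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
w = [v for chunk in w_chunks for v in chunk]

x = [2.0, 0.5]
z = embed(x, 0, 1, k=3)

# dot(w, z) equals dot(w_0, x) - dot(w_1, x), so it is positive exactly
# when class 0 outscores class 1 on x.
margin = dot(w, z)
```

Any binary linear learner that finds a vector `w` with a positive dot product on all embedded examples then yields one weight vector per class by splitting `w` back into chunks.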

### Citations

8980 | Statistical Learning Theory - Vapnik - 1998 |

3921 | Pattern Classification and Scene Analysis - Duda, Hart - 1973
Citation Context ...ns, each […] in […] dimensional space. 3.1 Kesler's Construction. Kesler's construction for multiclass classification was first introduced by Nilsson in 1965 [Nil65, 75–77] and can also be found more recently [DH73]. This subsection extends the Kesler construction for constraint classification. Definition 3.1 (Chunk) A vector […] is broken... |

2868 | UCI Repository of Machine Learning Databases - Merz, Murphy - 1996
Citation Context ...develop generalization bounds. 5 Experiments. As in previous multiclass classification work [DB95, ASS00], we tested our algorithm on a suite of problems from the Irvine Repository of machine learning [BM98] (see Table 2). In addition, we created a simple experiment using synthetic data. The data was generated according to a WTA function over randomly generated linear functions in […], each wi... |

739 | Statistical Methods for Speech Recognition - Jelinek - 1998
Citation Context ... a discrimination among several classes are ubiquitous. In machine learning, these include handwritten character recognition [LS97, LBD+89], part-of-speech tagging [Bri94, EZR01], speech recognition [Jel98] and text categorization [ADW94, DKR97]. While binary classification is well understood, relatively little is known about multiclass classification. Indeed, the most common approach to multiclass clas... |

569 | Solving Multiclass Learning Problems via Error-Correcting Output Codes - Dietterich, Bakiri - 1995
Citation Context ...ve online algorithms, and traditional one-versus-all approaches can be cast in this framework. It would be interesting to see if it could be combined with the error-correcting output coding method in [DB95] that provides another way to extend the OvA approach. Furthermore, this view allows for a very natural extension of multiclass classification to constraint classification – capturing within it comple... |

419 | Reducing multiclass to binary: A unifying approach for margin classifiers - Allwein, Schapire, et al.
Citation Context ... functions that uses an extended notion of VC-dimension for the multiclass case [BCHL95] provides poor bounds on generalization for WTA, and the current best bounds rely on a generalized notion of margin [ASS00]. In this section, we prove tighter bounds using the new framework. We seek generalization bounds for learning with […], the class of linear sorting functions (Definition 2.6). Although both VC-dimensi... |

310 | Neural Network Learning: Theoretical Foundations - Anthony, Bartlett - 1999
Citation Context ...o Networks of Linear Threshold Gates (Perceptron). It is possible to implement the algorithm in Section […] using a network of linear classifiers such as multi-output Perceptron [AB99], SNoW [CCRR99, Rot98], and multiclass SVM [CS00, WW99]. Such a network has […] as input and […] outputs, each represented by a weight vector […], where the […]-th output computes […] (see Figure 1(b)). Typically, a ... |

304 | Backpropagation applied to handwritten zip code recognition - LeCun, Boser, et al. - 1989 |

275 | Classification by pairwise coupling - Hastie, Tibshirani - 1996
Citation Context ...rain the output labels. The OvA scheme assumes that for each class there exists a single (simple) separator between that class and all the other classes. Another common approach, all-versus-all (AvA) [HT98], is a more expressive alternative which assumes the existence of a separator between any two classes. OvA classifiers are usually implemented using a winner-take-all (WTA) strategy that associates a ... |

255 | Automated learning of decision rules for text categorization - Apté, Damerau, et al. - 1994 |

255 | Some advances in transformation-based part of speech tagging - Brill - 1994 |

249 | Ultraconservative online algorithms for multiclass problems - Crammer, Singer - 2003
Citation Context ... , for every constraint […], if […], […] is 'promoted' and […] is 'demoted'. Using a network in this [way] results in an ultraconservative online algorithm for multiclass classification [CS01]. This subtle change enables the commonly used network of linear threshold gates to learn every hypothesis it is capable of representing. Figure 2: A 3-class classification example in ... |
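The promote/demote rule quoted in the citation context above can be sketched as an online update over a network of k weight vectors. This is a hedged illustration only: the mathematical details of the quoted passage were lost in extraction, so the violation test (`<=`), the learning rate, the function names `update`, `train`, and `predict`, and the toy data are all assumptions rather than the authors' specification.

```python
# Minimal sketch of a constraint-driven promote/demote update.
# All names and the exact violation test are assumptions.

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def update(weights, x, constraints, rate=1.0):
    """Apply one online step and return the number of violated constraints.

    Each constraint (i, j) means class i should score higher than class j on x.
    """
    mistakes = 0
    for i, j in constraints:
        if dot(weights[i], x) <= dot(weights[j], x):
            for t, v in enumerate(x):
                weights[i][t] += rate * v  # promote the preferred class
                weights[j][t] -= rate * v  # demote the competing class
            mistakes += 1
    return mistakes

def train(data, k, d, epochs=10):
    """Train on (x, label) pairs; the label should beat every other class."""
    weights = [[0.0] * d for _ in range(k)]
    for _ in range(epochs):
        total = 0
        for x, y in data:
            constraints = [(y, j) for j in range(k) if j != y]
            total += update(weights, x, constraints)
        if total == 0:  # every constraint satisfied on a full pass
            break
    return weights

def predict(weights, x):
    """Winner-take-all prediction: index of the highest-scoring class."""
    scores = [dot(w, x) for w in weights]
    return scores.index(max(scores))

data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([-1.0, -1.0], 2)]
weights = train(data, k=3, d=2)
predictions = [predict(weights, x) for x, _ in data]
```

Only the two weight vectors named in a violated constraint are touched, which is the sense in which such an update resembles the conservative style the quoted passage describes. For ranking or multilabel data, the same `update` function accepts any list of pairwise constraints.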

167 | Learning to resolve natural language ambiguities: A unified approach - Roth - 1998 |

163 | On the learnability and design of output codes for multiclass problems - Crammer, Singer - 2000 |

146 | Support vector machines for multi-class pattern recognition - Weston, Watkins - 1999 |

98 | Mistake-driven learning in text categorization - Dagan, Karov, et al. - 1997 |

75 | Learning Machines: Foundations of Trainable Pattern Classifying Systems - Nilsson - 1965 |

69 | The SNoW learning architecture - Carlson, Cumby, et al. - 1999 |

69 | Constraint classification: A new approach to multiclass classification and ranking - Har-Peled, Roth, et al. - 2003
Citation Context ...change to the standard (via OvA) approach to training a network of linear threshold gates. In Section 4, we discuss both VC-dimension and margin-based generalization bounds presented in a companion paper [HPRZ02]. Our generalization bounds apply to WTA classifiers over linear functions, for which VC-style bounds were not known. In addition to multiclass classification, constraint classification generalizes mu... |

33 | A sequential model for multi-class classification. arXiv preprint cs/0106044 - Even-Zohar, Roth - 2001
Citation Context ..., preliminary experiments using various natural language data sets, such as part-of-speech tagging, do not yield any significant difference between the two approaches. We used a common transformation [EZR01] to convert raw data to approximately three million examples in a one-hundred-thousand-dimensional boolean feature space. There were about 50 different part-of-speech tags. Because the constraint approac... |

33 | Unsupervised learning by convex and conic coding. Advances in neural information processing systems - Lee, Seung - 1997 |

33 | On the computational power of winner-take-all - Maass - 2000
Citation Context ...rmine class membership. Specifically, an example belongs to the class which assigns it the highest value (i.e., the "winner") among all classes. While it is known that WTA is an expressive classifier [Maa00], it has limited expressivity when trained using the OvA assumption since OvA assumes that each class can be easily separated from the rest. In addition, little is known about the generalization prope... |

17 | Characterizations of learnability for classes of {0...n}-valued functions - Ben-David, Cesa-Bianchi, et al. - 1995
Citation Context ...ld gates to learn every hypothesis it is capable of representing. 4 Generalization Bounds. A PAC-style analysis of multiclass functions that uses an extended notion of VC-dimension for the multiclass case [BCHL95] provides poor bounds on generalization for WTA, and the current best bounds rely on a generalized notion of margin [ASS00]. In this section, we prove tighter bounds using the new framework. We seek g... |

1 | Characterizations of learnability for classes of {0...n}-valued functions - Ben-David, Cesa-Bianchi, et al. - 1995
Citation Context ...is sampled from a random linear sorting function (see Section 5). 4 Generalization Bounds. A PAC-style analysis of multiclass functions that uses an extended notion of VC-dimension for the multiclass case [BCHL95] provides poor bounds on generalization for WTA, and the current best bounds rely on a generalized notion of margin [ASS00]. In this section, we prove tighter bounds using the new framework. We seek g... |