## Lattice Kernels for Spoken Dialog Classification (2003)

Venue: Proceedings of ICASSP'03, Hong Kong

Citations: 6 (1 self)

### BibTeX

@INPROCEEDINGS{Cortes03latticekernels,

author = {Corinna Cortes and Patrick Haffner and Mehryar Mohri},

title = {Lattice Kernels for Spoken Dialog Classification},

booktitle = {Proceedings of ICASSP'03, Hong Kong},

year = {2003},

pages = {628--631}

}

### Abstract

Classification is a key task in spoken-dialog systems. The response of a spoken-dialog system is often guided by the category assigned to the speaker’s utterance. Unfortunately, classifiers based on the one-best transcription of the speech utterances are not satisfactory because of the high word error rate of conversational speech recognition systems. Since the correct transcription may not be the highest ranking one but often will be represented in the word lattices output by the recognizer, the classification accuracy can be much higher if the full lattice is exploited both during training and classification. In this paper, we present the first principled approach for classification based on full lattices. For this purpose, we use the Support Vector Machine (SVM) framework with kernels for lattices. The lattice kernel we define belongs to the general class of rational kernels. We give efficient algorithms for computing kernels for arbitrary lattices and report experiments using the algorithm in a difficult call-classification task with 38 categories. Our experiments with a trigram lattice kernel show a 15% reduction in error rate at a 30% rejection level.
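To make the idea of a kernel over lattices concrete, here is a minimal Python sketch of an n-gram kernel. It rests on strong simplifying assumptions that are not taken from the paper: each lattice is written out as an explicit list of (word sequence, probability) pairs instead of a weighted automaton, and the kernel is taken to be the dot product of expected n-gram counts. The names `Lattice`, `expected_ngram_counts`, and `ngram_kernel` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: a lattice is an explicit list of (word sequence, probability)
# pairs. Real recognizer lattices are weighted automata, not path lists.
Lattice = List[Tuple[List[str], float]]
NGram = Tuple[str, ...]


def expected_ngram_counts(lattice: Lattice, n: int) -> Dict[NGram, float]:
    """Sum, over all paths, the path probability times each n-gram's count."""
    counts: Dict[NGram, float] = defaultdict(float)
    for words, prob in lattice:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += prob
    return dict(counts)


def ngram_kernel(a: Lattice, b: Lattice, n: int) -> float:
    """Dot product of the expected n-gram count vectors of two lattices."""
    ca = expected_ngram_counts(a, n)
    cb = expected_ngram_counts(b, n)
    if len(ca) > len(cb):  # iterate over the smaller count table
        ca, cb = cb, ca
    return sum(weight * cb.get(gram, 0.0) for gram, weight in ca.items())


# Two competing transcriptions in one lattice, a single path in the other.
lattice_a: Lattice = [
    (["i", "want", "my", "bill"], 0.6),
    (["i", "want", "a", "bill"], 0.4),
]
lattice_b: Lattice = [(["want", "my", "bill"], 1.0)]
print(ngram_kernel(lattice_a, lattice_b, 2))
```

Enumerating paths is only feasible for tiny examples, since a real lattice can encode exponentially many paths. A Gram matrix built from such a function could in principle be passed to any SVM implementation that accepts precomputed kernels.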

### Citations

8980 | Statistical Learning Theory - Vapnik - 1998
Citation Context ...rnels was introduced [3]. These kernels apply to lattices or arbitrary weighted automata. Kernel methods [15] are widely used in statistical learning techniques such as Support Vector Machines (SVMs) [2, 4, 16] due to their computational efficiency in high-dimensional feature spaces. Combining rational kernels for lattices with SVMs constitutes the first principled technique for efficient classification alg... |

2171 | Support-vector networks - Cortes, Vapnik - 1995 |

2028 | Learning with Kernels - Scholkopf, Smola - 2002
Citation Context ...y, a general family of kernels based on weighted transducers or rational relations, rational kernels was introduced [3]. These kernels apply to lattices or arbitrary weighted automata. Kernel methods [15] are widely used in statistical learning techniques such as Support Vector Machines (SVMs) [2, 4, 16] due to their computational efficiency in high-dimensional feature spaces. Combining rational kerne... |

1291 | A training algorithm for optimal margin classifiers - Boser, Guyon, et al.
Citation Context ...rnels was introduced [3]. These kernels apply to lattices or arbitrary weighted automata. Kernel methods [15] are widely used in statistical learning techniques such as Support Vector Machines (SVMs) [2, 4, 16] due to their computational efficiency in high-dimensional feature spaces. Combining rational kernels for lattices with SVMs constitutes the first principled technique for efficient classification alg... |

532 | Efficient string matching: an aid to bibliographic search - Aho, Corasick - 1975
Citation Context ...ithm for computing […] is […]. The use of the tree can be made more efficient by using the notion of failure function as in many efficient string-matching algorithms [1, 9]. Using failure functions, the computation of the kernel can be done in […] for lattices reduced to just one path. To each node […] of that tree, we associate its failure node... |

303 | Finite-State Transducers in Language and Speech Processing - Mohri - 1997
Citation Context ...s to the HMIHY system. […] and […] which are typically larger than any of the support lattices […]. But […] and […] can be optimized off-line using determinization and minimization of weighted automata [8]. In practice, this can significantly speed up classification when the linear kernel is used directly. Linear lattice classifiers also offer an efficient alternative to a common approach taken in spo... |

124 | Speech Recognition by Composition of Weighted Finite Automata - Pereira, Riley - 1997
Citation Context ...n Algorithms. The definition of the kernel […] suggests a simple algorithm for its computation based on general weighted automata and graph algorithms: composition of weighted transducers to compute […] [11, 13], and a general shortest-distance algorithm to compute the […]-sum of the weights of all the paths of this machine [10]. The general utilities provided by the FSM library can be used to compute […] for lattices […] and […] [12]. Note that the size of the transducer […] is in […]. In prac... |

80 | Weighted automata in text and speech processing - Mohri, Pereira, et al. - 1996
Citation Context ...els. Then, for any n-gram sequence […], […], and similarly […]. Thus, by definition of composition [11]: […]... |

73 | Semiring Frameworks and Algorithms for Shortest-Distance Problems - Mohri
Citation Context ...d graph algorithms: composition of weighted transducers to compute […] [11, 13], and a general shortest-distance algorithm to compute the […]-sum of the weights of all the paths of this machine [10]. The general utilities provided by the FSM library can be used to compute […] for lattices […] and […] [12]. Note that the size of the transducer […] is in […]. In prac... |

54 | Incorporating prior knowledge into boosting - Schapire, Rochery, et al. - 2002
Citation Context ...the result of the use of the trigram kernel enhanced with a […]-degree polynomial. In comparison to Boostexter, that otherwise has demonstrated superior performance on similar tasks to the HMIHY system [14], the lattice kernel brings a significant reduction of the one-error rate from […] to […] at a rejection level of 30%, a typical operation point for the task. The plot also demonstrates ... |

39 | Optimizing SVMs for complex call classification - Haffner, Tur, et al. - 2003
Citation Context ...ore than 15% at the 30% rejection level. This rejection level is the standard operation point for the task. Note that this task is different from the easier […]-class task described in the paper [6]. 2. PRELIMINARIES. In this section, we present the definition and notation necessary to introduce lattice kernels. A weighted automaton is a finite automaton in which each transition carries some weig... |

28 | Rational kernels - Cortes, Haffner, et al. - 2003
Citation Context ...d these weights must be used to guide appropriately the classification task. Recently, a general family of kernels based on weighted transducers or rational relations, rational kernels was introduced [3]. These kernels apply to lattices or arbitrary weighted automata. Kernel methods [15] are widely used in statistical learning techniques such as Support Vector Machines (SVMs) [2, 4, 16] due to their ... |

27 | Automated natural spoken dialog - Gorin, Abella, et al. - 2002
Citation Context ...anscription, is 91.7%. Within the SLU component, the objective is to classify the input telephone call into one of 38 classes (call-types and named entities), such as Billing Credit, or Calling Plans [5]. Each utterance may be assigned to several classes and it is considered to be an error if the highest scoring class is not one of these labels. In our experiments, we used 7,449 utterances as our tra... |

8 | String-matching with automata - Mohri
Citation Context ...ithm for computing […] is […]. The use of the tree can be made more efficient by using the notion of failure function as in many efficient string-matching algorithms [1, 9]. Using failure functions, the computation of the kernel can be done in […] for lattices reduced to just one path. To each node […] of that tree, we associate its failure node... |
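The failure-function idea mentioned in the last citation context can be illustrated with the classic Aho-Corasick construction. The Python sketch below is not the paper's kernel algorithm. It only shows, for a single word sequence (a lattice reduced to one path), how a tree of n-grams with failure links lets all matches be counted in one left-to-right pass. The function names `build_automaton` and `count_matches` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def build_automaton(
    patterns: Sequence[Sequence[str]],
) -> Tuple[List[Dict[str, int]], List[int], List[List[int]]]:
    """Build a tree of patterns with Aho-Corasick failure links."""
    goto: List[Dict[str, int]] = [{}]  # goto[node][symbol] -> child node
    fail: List[int] = [0]              # failure node of each node
    out: List[List[int]] = [[]]        # pattern indices recognized at node
    for idx, pattern in enumerate(patterns):
        node = 0
        for sym in pattern:
            if sym not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][sym] = len(goto) - 1
            node = goto[node][sym]
        out[node].append(idx)
    # Breadth-first traversal: shallower nodes get their failure link first.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for sym, child in goto[node].items():
            f = fail[node]
            while f and sym not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(sym, 0)
            # Inherit the patterns that end at the failure node.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def count_matches(text: Sequence[str], patterns: Sequence[Sequence[str]]) -> List[int]:
    """Count occurrences of every pattern in text in a single pass."""
    goto, fail, out = build_automaton(patterns)
    counts = [0] * len(patterns)
    node = 0
    for sym in text:
        while node and sym not in goto[node]:
            node = fail[node]  # fall back instead of rescanning the text
        node = goto[node].get(sym, 0)
        for idx in out[node]:
            counts[idx] += 1
    return counts


words = "a b a b c".split()
ngrams = [("a", "b"), ("b", "c"), ("a", "b", "c"), ("b",)]
print(count_matches(words, ngrams))
```

The failure links let the scan fall back to the longest matching suffix rather than restarting, so the text is read once regardless of how many patterns are stored in the tree.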