## CarpeDiem: Optimizing the Viterbi Algorithm and Applications to Supervised Sequential Learning

Citations: 2 (0 self)

### BibTeX

```bibtex
@MISC{Esposito_carpediem:optimizing,
  author = {Roberto Esposito and Daniele P. Radicioni},
  title  = {CarpeDiem: Optimizing the Viterbi Algorithm and Applications to Supervised Sequential Learning},
  year   = {}
}
```

### Abstract

The growth of information available to learning systems and the increasing complexity of learning tasks call for algorithms that scale well with respect to all learning parameters. In the context of supervised sequential learning, the Viterbi algorithm plays a fundamental role by allowing the evaluation of the best (most probable) sequence of labels in time linear in the number of time events and quadratic in the number of labels. In this paper we propose CarpeDiem, a novel algorithm that evaluates the best possible sequence of labels with sub-quadratic time complexity. We provide theoretical grounding together with solid empirical results supporting two chief facts: CarpeDiem always finds the optimal solution, requiring in most cases only a small fraction of the time taken by the Viterbi algorithm; meanwhile, CarpeDiem is never asymptotically worse than the Viterbi algorithm, confirming it as a sound replacement.
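The complexity claims in the abstract can be made concrete with a small sketch. Below is a minimal, self-contained Viterbi implementation in log-space (a generic illustration, not the authors' code): the inner scan over all K predecessors for each of K labels at each of T time steps gives the Θ(TK²) cost that CarpeDiem improves on, and backpointers are stored alongside the scores so the optimal path is recovered without changing the asymptotic complexity.

```python
def viterbi(emit, trans):
    """Most-likely label sequence under additive (log-space) scores.

    emit[t][y]   : score of label y at time t       (T x K)
    trans[y][y2] : score of transition y -> y2      (K x K)
    Runs in Theta(T * K^2) time, matching the standard formulation.
    """
    T, K = len(emit), len(emit[0])
    best = [list(emit[0])]   # best[t][y]: score of the best path ending in y at t
    back = [[0] * K]         # backpointers, for optimal-path recovery
    for t in range(1, T):
        row, ptr = [], []
        for y in range(K):
            # Scan all K predecessors: the K^2 inner work per time step.
            prev = max(range(K), key=lambda yp: best[t - 1][yp] + trans[yp][y])
            ptr.append(prev)
            row.append(best[t - 1][prev] + trans[prev][y] + emit[t][y])
        best.append(row)
        back.append(ptr)
    # Trace back the optimal path using the stored backpointers
    # (the standard technique, without affecting the complexity).
    y = max(range(K), key=lambda yy: best[-1][yy])
    path = [y]
    for t in range(T - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    path.reverse()
    return path, best[-1][path[-1]]
```

With a zero transition matrix the best path simply follows the per-step emission maxima, which makes the sketch easy to check by hand.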

### Citations

8530 |
Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1990
Citation Context ...by Viterbi is then Θ(K²T). The standard formulation of the Viterbi algorithm would also store the optimal path information as it becomes available. Since this can be done using standard techniques (Cormen et al., 1990, page 520) without affecting the complexity of the algorithm, we do not explicitly report that in the pseudo-code. Let us now consider how the above definitions instantiate in the context of a learni...

4273 | A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition
- Rabiner
- 1989
Citation Context ...e classification time can grow prohibitively high. Some recent works propose techniques that under precise assumptions allow faster execution time of classifiers based on hidden Markov models (HMMs) (Rabiner, 1989). One feature shared by these approaches is the assumption that the transition matrix has a specific form allowing one to rule out most transitions. Such approaches are highly valuable when the probl...

2309 | Conditional random fields: probabilistic models for segmenting and labeling sequence data
- Lafferty, McCallum, et al.
- 2001
Citation Context ...fferent techniques. Among others, we recall Sliding Windows (Dietterich, 2002), hidden Markov models (Rabiner, 1989), Maximum Entropy Markov Models (McCallum et al., 2000), Conditional Random Fields (Lafferty et al., 2001), Dynamic Conditional Random Fields (Sutton et al., 2007), and the voted perceptron algorithm (Collins, 2002). The voted perceptron uses the Viterbi algorithm at both learning and classification time...

1164 |
Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
- Viterbi
- 1967
Citation Context ...ible combinations of labels are to be considered by SSL classifiers. Most systems deal with such complexity by assuming that relations may span only over nearby objects and use the Viterbi algorithm (Viterbi, 1967) to find the globally optimal sequence of labels in Θ(TK²) time. In the last few years it has become increasingly important for supervised sequential learning algorithms to handle problems with la...

975 |
A Formal Basis for the Heuristic Determination of Minimum Cost Paths
- Hart, Nilsson, et al.
- 1968
Citation Context ...i beam search approach. 6.6.1 RELATIONS WITH A∗: It could be argued that the CarpeDiem algorithm looks interestingly similar to the A∗ algorithm (Hart et al., 1968). In order to investigate this similarity let us consider the following heuristic, based on the same ideas underlying CarpeDiem: h(y_t) = ∑_{t'>t} [ max_{y_{t'}} (S⁰_{y_{t'}}) + S¹* ] = (T−t)S¹* + ∑...
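The heuristic quoted in this context bounds the reward obtainable after time t by granting every remaining step its best vertical score plus the best possible transition score. A hypothetical sketch of that bound (the names `emit` and `S1_star` are illustrative, standing in for the vertical scores S⁰ and the largest horizontal score S¹*):

```python
def carpediem_style_bound(emit, S1_star, t):
    """Upper bound h(y_t) on the score attainable after time t.

    Illustrative reading of the heuristic quoted above: for every
    remaining time step t' > t, add the best vertical score at t'
    plus the largest possible transition score S1_star. No real path
    can exceed this sum, so the bound is admissible in A* terms.
    emit[t'][y] is the vertical score of label y at time t'.
    """
    T = len(emit)
    return sum(max(emit[tp]) + S1_star for tp in range(t + 1, T))
```

Because the bound over-approximates every completion uniformly, a search guided by it can discard a label as soon as its score plus the bound falls below the best known path.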

488 | Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10
- Collins
- 2002
Citation Context ... Maximum Entropy Markov Models (McCallum et al., 2000), Conditional Random Fields (Lafferty et al., 2001), Dynamic Conditional Random Fields (Sutton et al., 2007), and the voted perceptron algorithm (Collins, 2002). The voted perceptron uses the Viterbi algorithm at both learning and classification time. It is then particularly appropriate for the application of our technique. Moreover, it relies on the boolea...

439 | Maximum entropy Markov models for information extraction and segmentation
- McCallum, Freitag, et al.
- 2000
Citation Context ...oes not. Moreover, they assume the transition matrix to be known beforehand and fixed over time. While this is a natural assumption in HMMs, recent algorithms based on the boolean features framework (McCallum et al., 2000) allow for more general settings where the transition matrix is itself a function of the observations around the object to be labelled. In such cases it is hard to figure out how the aforementioned a...

158 |
The cognition of basic musical structures
- Temperley
- 2001
Citation Context ...lem. In these last two cases, we have set S¹* using Formula 7. 6.1 Tonal Harmony Analysis: Given a musical flow, the task of music harmony analysis consists in associating a label to each time point (Temperley, 2001; Pardo and Birmingham, 2002). Such labels reveal the underlying harmony by indicating a fundamental note (root) and a mode, using chord names such as 'C minor'. Music anal...

140 | Incremental parsing with the perceptron algorithm - Collins, Roark - 2004 |

83 | Machine learning for sequential data: A review
- Dietterich
- 2002
Citation Context ...e in particular, in a system based on the voted perceptron algorithm. 5. Grounding the Voted Perceptron Algorithm on CarpeDiem: The supervised sequential learning problem can be formulated as follows (Dietterich, 2002). Let {(x⃗_i, y⃗_i)}^N_{i=1} be a set of N training examples. Each example is a pair of sequences (x⃗_i, y⃗_i), where x⃗_i = ⟨x_{i,1}, x_{i,2}, ..., x_{i,T_i}⟩ and y⃗_i = ⟨y_{i,1}, y_{i,2}, ..., y_{i,T_i}⟩. The goal is to construct a...
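The formulation quoted in this context can be rendered as a small data-layout sketch. All names below are illustrative (a toy stand-in, not any cited system): a training set is N pairs of equal-length sequences, and the learned object is a function h mapping an observation sequence to a label sequence of the same length.

```python
from typing import Callable, List, Sequence, Tuple

# Hedged sketch of the SSL formulation (Dietterich, 2002): each
# training example pairs an observation sequence x_i with a label
# sequence y_i of the same length T_i.
Obs, Label = str, str
Example = Tuple[Sequence[Obs], Sequence[Label]]

training_set: List[Example] = [
    (("the", "dog", "barks"), ("DET", "NOUN", "VERB")),  # T_1 = 3
    (("it", "rains"), ("PRON", "VERB")),                 # T_2 = 2
]

# Within every pair, x_i and y_i share the same length T_i.
assert all(len(xs) == len(ys) for xs, ys in training_set)

def majority_baseline(xs: Sequence[Obs]) -> List[Label]:
    """Trivial classifier standing in for a real sequence model:
    predicts the overall most frequent training label at every step."""
    labels = [y for _, ys in training_set for y in ys]
    top = max(set(labels), key=labels.count)
    return [top] * len(xs)  # one label per time step, as required

# The goal of learning is a function h: observations -> labels.
h: Callable[[Sequence[Obs]], List[Label]] = majority_baseline
```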

78 |
Letter Recognition Using Holland-Style Adaptive Classifiers
- Frey, Slate
- 1991
Citation Context ...tly asked questions (FAQs) segmentation problem (McCallum et al., 2000), and a text recognition problem built starting from the "letter recognition" data set from the UCI machine learning repository (Frey and Slate, 1991). The running time of an execution of CarpeDiem depends on how the weights of vertical and horizontal features compare: the more discriminative are vertical features with respect to horizontal featur...

73 |
A heuristic discussion of probabilistic decoding
- Fano

- 1963

Citation Context ...) there exist ad hoc solutions that allow one to tame the complexity of the Viterbi algorithm by means of hardware implementations (Austin et al., 1990) or methods for approximating the optimum path (Fano, 1963). For instance, in the research field of speech recognition, the Viterbi algorithm is routinely applied to huge problems. This is a typical case where approximate solutions really pay off: suboptimal...

57 |
Improvements in beam search for 10000-word continuous speech recognition
- Ney, Haeb-Umbach, et al.
- 1992
Citation Context ...most promising solutions are retained at each step. Many improvements over this basic strategy have been proposed to refine either the computational performance or the accuracy of the solution (e.g., Ney et al., 1992). In most cases domain-based knowledge (such as language constraints) is used to restrict the search efforts to some relevant regions of the search space (Ney et al., 1987). Also, in recent years, se...

28 | Algorithms for chordal analysis - Pardo, Birmingham - 2002 |

26 | Fast algorithms for large-state-space HMMs with applications to web usage analysis
- Felzenszwalb, Huttenlocher, et al.
- 2003
Citation Context ... al., 2008). Unfortunately, even the drastic reduction in complexity achieved by the Viterbi algorithm may be not sufficient in such domains. For instance, this is the case of web-logs related tasks (Felzenszwalb et al., 2003), music analysis (Radicioni and Esposito, 2007), and activity monitoring through body sensors (Siddiqi and Moore, 2005), where the number of possible labels is so large that the classification time c...

20 |
An algorithm for connected word recognition
- Bridle, Brown, et al.
- 1982
Citation Context ...d be tolerated (to some extent) and tight time constraints prevent exhaustive search. A popular approach in this field is the Viterbi beam search (VBS) (Lowerre and Reddy, 1980; Spohrer et al., 1980; Bridle et al., 1982): essentially, VBS performs a breadth-first suboptimal search in which only the most promising solutions are retained at each step. Many improvements over this basic strategy have been proposed to re...
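The Viterbi beam search described in this context can be sketched in a few lines (an illustrative toy, not any cited recognizer): at each time step only the `beam_width` highest-scoring path endpoints survive as predecessors, trading optimality for speed.

```python
def viterbi_beam(emit, trans, beam_width):
    """Viterbi beam search (VBS) sketch: breadth-first, suboptimal.

    At each time step only the beam_width highest-scoring labels are
    kept as candidate path endpoints; all other hypotheses are pruned.
    Same inputs as full Viterbi: emit is T x K, trans is K x K.
    Real systems add score-threshold pruning and domain constraints.
    """
    T, K = len(emit), len(emit[0])
    # hypotheses: label -> (score, path ending in that label)
    hyps = {y: (emit[0][y], [y]) for y in range(K)}
    for t in range(1, T):
        # Retain only the beam_width most promising endpoints.
        kept = sorted(hyps, key=lambda y: hyps[y][0], reverse=True)[:beam_width]
        new = {}
        for y2 in range(K):
            s, p = max(
                ((hyps[y][0] + trans[y][y2] + emit[t][y2], hyps[y][1]) for y in kept),
                key=lambda sp: sp[0],
            )
            new[y2] = (s, p + [y2])
        hyps = new
    return max(hyps.values(), key=lambda sp: sp[0])
```

With `beam_width = K` the sketch degenerates to full Viterbi; smaller beams may miss the optimum, which is exactly the suboptimality the text describes.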

13 | Fast inference and learning in large-state-space HMMs
- Siddiqi, Moore
- 2005
Citation Context ...ent in such domains. For instance, this is the case of web-logs related tasks (Felzenszwalb et al., 2003), music analysis (Radicioni and Esposito, 2007), and activity monitoring through body sensors (Siddiqi and Moore, 2005), where the number of possible labels is so large that the classification time can grow prohibitively high. Some recent works propose techniques that under precise assumptions allow faster execution...

11 | Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions - Lifshits, Mozes, et al. - 2009 |

10 | Structured machine learning: the next ten years - Dietterich, Domingos, et al. - 2008 |

7 | Towards a Real-Time Spoken Language System Using Commercial Hardware
- Austin, Peterson, et al.
- 1990
Citation Context ...ntial learning techniques. In other fields (e.g., telecommunications) there exist ad hoc solutions that allow one to tame the complexity of the Viterbi algorithm by means of hardware implementations (Austin et al., 1990) or methods for approximating the optimum path (Fano, 1963). For instance, in the research field of speech recognition, the Viterbi algorithm is routinely applied to huge problems. This is a typical...

6 | Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data - Sutton, McCallum, Rohanimanesh - 2007 |

4 |
Data driven search organization for continuous speech recognition
- Ney, Mergel, et al.
- 1992
Citation Context ...racy of the solution (e.g., Ney et al., 1992). In most cases domain-based knowledge (such as language constraints) is used to restrict the search efforts to some relevant regions of the search space (Ney et al., 1987). Also, in recent years, several algorithms have been proposed that overcome the difficulties inherent in heuristic ranking strategies by learning ranking functions specifically optimized for the pro...

3 |
Partial traceback in continuous speech recognition
- Spohrer, Brown, et al.
- 1980
Citation Context ... suboptimal paths could be tolerated (to some extent) and tight time constraints prevent exhaustive search. A popular approach in this field is the Viterbi beam search (VBS) (Lowerre and Reddy, 1980; Spohrer et al., 1980; Bridle et al., 1982): essentially, VBS performs a breadth-first suboptimal search in which only the most promising solutions are retained at each step. Many improvements over this basic strategy hav...

2 | Tonal Harmony Analysis: a Supervised Sequential Learning Approach
- Radicioni, Esposito
- 2007
Citation Context ... reduction in complexity achieved by the Viterbi algorithm may be not sufficient in such domains. For instance, this is the case of web-logs related tasks (Felzenszwalb et al., 2003), music analysis (Radicioni and Esposito, 2007), and activity monitoring through body sensors (Siddiqi and Moore, 2005), where the number of possible labels is so large that the classification time can grow prohibitively high. Some recent works p...

1 |
Trends in Speech Recognition, chapter The Harpy Speech Understanding System
- Lowerre, Reddy
- 1980
Citation Context ...solutions really pay off: suboptimal paths could be tolerated (to some extent) and tight time constraints prevent exhaustive search. A popular approach in this field is the Viterbi beam search (VBS) (Lowerre and Reddy, 1980; Spohrer et al., 1980; Bridle et al., 1982): essentially, VBS performs a breadth-first suboptimal search in which only the most promising solutions are retained at each step. Many improvements over t...