Identifying Hierarchical Structure in Sequences: A linear-time algorithm (1997)

by Craig G. Nevill-Manning, Ian H. Witten

Abstract:

SEQUITUR is an algorithm that infers a hierarchical structure from a sequence of discrete symbols by replacing each repeated phrase with a grammatical rule that generates the phrase, and continuing this process recursively. The result is a hierarchical representation of the original sequence, which offers insights into its lexical structure. The algorithm is driven by two constraints that reduce the size of the grammar and produce structure as a by-product. SEQUITUR breaks new ground by operating incrementally. Moreover, the method's simple structure permits a proof that it operates in space and time that are linear in the size of the input. Our implementation can process 50,000 symbols per second and has been applied to an extensive range of real-world sequences.

1. Introduction

Many sequences of discrete symbols exhibit natural hierarchical structure. Text is made up of paragraphs, sentences, phrases, and words. Music is composed from major sections, motifs, bars, and notes. Records of ...
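The abstract's central idea, replacing repeated phrases with rules and repeating the process on the result, can be illustrated with a small sketch. This is not the incremental, linear-time SEQUITUR algorithm: it is a simplified offline procedure that repeatedly replaces the most frequent repeated pair of adjacent symbols, runs in quadratic time or worse, and only looks for repeats in the top-level sequence. The names `infer_grammar`, `expand`, and the helper functions are hypothetical, and terminals are assumed to be strings while rule names are integers.

```python
from collections import Counter


def count_nonoverlapping(seq, pair):
    """Count left-to-right, non-overlapping occurrences of a symbol pair."""
    n = i = 0
    while i < len(seq) - 1:
        if (seq[i], seq[i + 1]) == pair:
            n += 1
            i += 2
        else:
            i += 1
    return n


def replace_pair(seq, pair, name):
    """Return a copy of seq with each occurrence of pair replaced by name."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(name)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def inline_single_use(seq, rules):
    """Inline any rule referenced only once, so every rule is reused."""
    changed = True
    while changed:
        changed = False
        uses = Counter(s for body in [seq, *rules.values()]
                       for s in body if isinstance(s, int))
        for name in list(rules):
            if uses[name] == 1:
                body = rules.pop(name)
                for target in [seq, *rules.values()]:
                    if name in target:
                        i = target.index(name)
                        target[i:i + 1] = body
                changed = True
                break
    return seq, rules


def infer_grammar(sequence):
    """Offline pair replacement (assumption: terminals are strings)."""
    seq, rules, next_name = list(sequence), {}, 1
    while True:
        # Distinct adjacent pairs, in order of first appearance.
        pairs = list(dict.fromkeys(zip(seq, seq[1:])))
        best = max(pairs, key=lambda p: count_nonoverlapping(seq, p),
                   default=None)
        if best is None or count_nonoverlapping(seq, best) < 2:
            break
        rules[next_name] = list(best)
        seq = replace_pair(seq, best, next_name)
        next_name += 1
    return inline_single_use(seq, rules)


def expand(symbols, rules):
    """Expand rule names recursively to recover the original symbols."""
    out = []
    for s in symbols:
        if isinstance(s, int):
            out.extend(expand(rules[s], rules))
        else:
            out.append(s)
    return out


start, rules = infer_grammar("abcdbcabcd")
assert "".join(expand(start, rules)) == "abcdbcabcd"
```

Expanding the inferred grammar reproduces the input exactly, so the grammar is a lossless, hierarchical re-description of the sequence. Unlike this sketch, the algorithm described in the abstract processes symbols one at a time and maintains its constraints after each symbol.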
