Results 1 - 6 of 6
A Generalized Suffix Tree and Its (Un)Expected Asymptotic Behaviors
SIAM J. Computing, 1996
Cited by 55 (29 self)
Suffix trees find several applications in computer science and telecommunications, most notably in algorithms on strings, data compression and codes. Despite this, very little is known about their typical behaviors. In a probabilistic framework, we consider a family of suffix trees, further called b-suffix trees, built from the first n suffixes of a random word. In this family a noncompact suffix tree (i.e., such that every edge is labeled by a single symbol) is represented by b = 1, and a compact suffix tree (i.e., without unary nodes) is asymptotically equivalent to b → ∞ as n → ∞. We study several parameters of b-suffix trees, namely: the depth of a given suffix, the depth of insertion, the height and the shortest feasible path. Some new results concerning typical (i.e., almost sure) behaviors of these parameters are established. These findings are used to obtain several insights into certain algorithms on words, molecular biology and universal data compression schemes.
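The parameters named in this abstract can be illustrated on a small example. The following is a minimal sketch, not the paper's construction: it assumes the noncompact case (b = 1), assumes the word ends in a unique sentinel so that no suffix is a prefix of another, and uses the hypothetical helper names `lcp`, `suffix_depths` and `height`. The depth of a suffix is taken to be the length of its shortest prefix that distinguishes it from the other n - 1 suffixes, and the height is the maximum depth.

```python
def lcp(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def suffix_depths(word, n):
    """Depth of each of the first n suffixes in a noncompact suffix trie.

    Assumption: `word` ends with a unique sentinel, so every suffix is
    distinguishable from the others by some finite prefix.
    """
    suffixes = [word[i:] for i in range(n)]
    depths = []
    for i, s in enumerate(suffixes):
        # Longest prefix shared with any other suffix; one more symbol
        # is needed to separate s from that closest suffix.
        longest = max(
            (lcp(s, t) for j, t in enumerate(suffixes) if j != i),
            default=0,
        )
        depths.append(longest + 1)
    return depths


def height(word, n):
    """Height of the trie: the largest suffix depth."""
    return max(suffix_depths(word, n))


if __name__ == "__main__":
    print(suffix_depths("abab$", 4))  # depths of "abab$", "bab$", "ab$", "b$"
    print(height("abab$", 4))
```

This brute-force version takes quadratic time in n and is meant only to make the definitions concrete; it says nothing about the asymptotic results the paper establishes.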
A New Efficient Radix Sort
1994
Cited by 36 (7 self)
We present new improved algorithms for the sorting problem. The algorithms are not only efficient but also clear and simple. First, we introduce Forward Radix Sort, which combines the advantages of traditional left-to-right and right-to-left radix sort in a simple manner. We argue that this algorithm will work very well in practice. Adding a preprocessing step, we obtain an algorithm with attractive theoretical properties. For example, n binary strings can be sorted in Θ(n log(B / (n log n) + 2)) time, where B is the minimum number of bits that have to be inspected to distinguish the strings. This is an improvement over the previously best known result by Paige and Tarjan. The complexity may also be expressed in terms of H, the entropy of the input: n strings from a stationary ergodic process can be sorted in Θ(n log(1/H + 1)) time, an improvement over the result recently presented by Chen and Reif.
Structure from statistics - unsupervised activity analysis using suffix trees
In IEEE ICCV, 2007
Cited by 19 (2 self)
Models of activity structure for unconstrained environments are generally not available a priori. Recent representational approaches to this end are limited by their computational complexity, and their ability to capture activity structure only up to some fixed temporal scale. In this work, we propose Suffix Trees as an activity representation to efficiently extract structure of activities by analyzing their constituent event-subsequences over multiple temporal scales. We empirically compare Suffix Trees with some of the previous approaches in terms of feature cardinality, discriminative prowess, noise sensitivity and activity-class discovery. Finally, exploiting properties of Suffix Trees, we present a novel perspective on anomalous subsequences of activities, and propose an algorithm to detect them in linear time. We present comparative results over experimental data, collected from a kitchen environment, to demonstrate the competence of our proposed framework.
Using Difficulty of Prediction to Decrease Computation: Fast Sort, Priority Queue and Convex Hull on Entropy Bounded Inputs
Cited by 17 (4 self)
There has recently been an upsurge of interest in the Markov model, and in more general stationary ergodic stochastic distributions, in the theoretical computer science community (e.g. see [Vitter,KrishnanSl], [Karlin,Philips,Raghavan92], [Raghavan9 for use of Markov models for online algorithms, e.g., caching and prefetching). Their results used the fact that compressible sources are predictable (and vice versa), and showed that online algorithms can improve their performance by prediction. Actual page access sequences are in fact somewhat compressible, so their predictive methods can be of benefit. This paper investigates the interesting idea of decreasing computation by using learning in the opposite way, namely to determine the difficulty of prediction. That is, we will approximately learn the input distribution, and then improve the performance of the computation when the input is not too predictable, rather than the reverse. To our knowledge,
Optimal Lossless Compression of a Class of Dynamic Sources
Proc. Data Compression Conference, edited by J.A. Storer and J.H. Reif. IEEE Computer Society Press, Los Alamitos, CA, 1997
Cited by 4 (0 self)
The usual assumption for proofs of the optimality of lossless encoding is a stationary ergodic source. Dynamic sources with nonstationary probability distributions occur in many practical situations where the data source is constructed by a composition of distinct sources, for example, a document with multiple authors, a multimedia document, or the composition of distinct packets sent over a communication channel. There is a vast literature of adaptive methods used to tailor the compression to dynamic sources. However, little is known about optimal or near optimal methods for lossless compression of strings generated by sources that are not stationary ergodic. Here we do not assume the source is stationary. Instead we assume that the source produces an infinite sequence of concatenated finite strings s_1, s_2, ... where (i) each finite string s_i is generated by a sampling of a (possibly distinct) stationary ergodic source S_i, and (ii) the length of each of the s_i is lower b...
Unsupervised Analysis of Everyday Human Activities Using Suffix Trees
2008
Traditional approaches for activity modeling assume prior knowledge about the structure of activities, based on which explicitly defined models are learned in a supervised manner. However, such activity structure is generally not completely known a priori. It is therefore imperative to find representations that facilitate learning of this structure with minimal supervision. Recent representational approaches to this end are limited by their computational complexity, and their ability to capture activity structure only up to some fixed temporal scale. In this work, we propose Suffix Trees as an activity representation to efficiently extract structure of activities by analyzing their constituent event-subsequences over multiple temporal scales. We prove how the feature space induced by Suffix Trees is representationally superior to some of the previous approaches, and compare such approaches with Suffix Trees in terms of their discriminative power, noise sensitivity and unsupervised activity-class discovery. Moreover, exploiting certain properties of Suffix Trees, we present a novel perspective on anomalous subsequences of activities, and propose a linear-time algorithm for their automatic detection. We present comparative results using an extensive dataset of activities for cooking different recipes, collected from multiple subjects in a household kitchen environment, to demonstrate the competence of our framework.
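The idea of describing an activity by its event-subsequences at several temporal scales can be sketched in a few lines. This is an illustration only and makes several assumptions: it uses a plain dictionary count of contiguous subsequences rather than a suffix tree, it caps the scale at a hypothetical parameter `max_len`, and the event names are invented. It does not reproduce the linear-time construction or the anomaly detection algorithm the abstract refers to.

```python
from collections import Counter


def subsequence_counts(events, max_len):
    """Count every contiguous event-subsequence of length 1..max_len.

    Assumption: a simple stand-in for the feature space a suffix tree
    would expose; running time is O(len(events) * max_len) windows, not
    linear.
    """
    counts = Counter()
    n = len(events)
    for length in range(1, max_len + 1):
        for start in range(n - length + 1):
            counts[tuple(events[start:start + length])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical kitchen activity encoded as a sequence of event labels.
    activity = ["wash", "chop", "wash", "chop", "fry"]
    for pattern, count in sorted(subsequence_counts(activity, 2).items()):
        print(pattern, count)
```

Each distinct subsequence acts as one feature, and its count is the feature value. A suffix tree stores the same information for all lengths at once, which is what allows the multi-scale analysis described above without fixing `max_len` in advance.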