An efficient, probabilistically sound algorithm for segmentation and word discovery
Machine Learning, 1999
Cited by 146 (2 self)

Abstract:
This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstractly so that the detailed component models of phonology, word order, and word frequency can be replaced in a modular fashion. The model yields a language-independent, prior probability distribution on all possible sequences of all possible words over a given alphabet, based on the assumption that the input was generated by concatenating words from a fixed but unknown lexicon. The model is unusual in that it treats the generation of a complete corpus, regardless of length, as a single event in the probability space. Accordingly, the algorithm does not estimate a probability distribution on words; instead, it attempts to calculate the prior probabilities of various word sequences that could underlie the observed text. Experiments on phonemic transcripts of spontaneous speech by parents to young children suggest that our algorithm is more effective than other proposed algorithms, at least when utterance boundaries are given and the text includes a substantial number of short utterances.
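The boundary-recovery search behind this kind of approach can be illustrated with a small dynamic-programming sketch. This is not the paper's full source model: it assumes a fixed unigram lexicon with known relative frequencies (toy values below) rather than inferring the lexicon from the corpus, but it shows how log-probabilities of candidate word sequences drive the choice of word boundaries.

```python
import math

def segment(text, lexicon):
    """Find the most probable segmentation of `text` under a toy unigram
    lexicon model: best[i] holds the best (log-probability, word list)
    for the prefix text[:i]."""
    n = len(text)
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(1, n + 1):
        # consider candidate words of up to 10 characters ending at i
        for j in range(max(0, i - 10), i):
            word = text[j:i]
            if word in lexicon and best[j][0] > -math.inf:
                score = best[j][0] + math.log(lexicon[word])
                if score > best[i][0]:
                    best[i] = (score, best[j][1] + [word])
    return best[n][1]

# Hypothetical lexicon with made-up relative frequencies
lexicon = {"the": 0.4, "dog": 0.2, "do": 0.1, "g": 0.05, "ran": 0.25}
print(segment("thedogran", lexicon))  # → ['the', 'dog', 'ran']
```

Note how the competing segmentation "the do g ran" loses because the product of its word probabilities is smaller, even though every piece is in the lexicon.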
Toward Real-Time Path Planning in Changing Environments
, 2000
Cited by 55 (3 self)

Abstract:
We present a new method for generating collision-free paths for robots operating in changing environments. Our approach is closely related to recent probabilistic roadmap approaches. These planners use preprocessing and query stages, and are aimed at planning many times in the same environment. In contrast, our preprocessing stage creates a representation of the configuration space that can be easily modified in real time to account for changes in the environment. As with previous approaches, we begin by constructing a graph that represents a roadmap in the configuration space, but we do not construct this graph for a specific workspace. Instead, we construct the graph for an obstacle-free workspace, and encode the mapping from workspace cells to nodes and arcs in the graph. When the environment changes, this mapping is used to make the appropriate modifications to the graph, and plans can be generated by searching the modified graph. After presenting the approach, we address a number of performance issues via extensive simulation results for robots with as many as twenty degrees of freedom. We evaluate memory requirements, preprocessing time, and the time to dynamically modify the graph and replan, all as a function of the number of degrees of freedom of the robot.
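The cell-to-graph mapping described above can be sketched in miniature. The toy below assumes a 2-D point robot on a grid, so that configuration space and workspace coincide; the paper's mapping is what makes the idea work for articulated robots with many degrees of freedom, where one workspace cell maps to many configuration-space edges.

```python
from collections import defaultdict, deque

class DynamicRoadmap:
    """Precompute a roadmap for an obstacle-free workspace plus a mapping
    from workspace cells to roadmap edges; invalidate mapped edges when a
    cell becomes occupied, then replan by searching the modified graph."""
    def __init__(self, width, height):
        self.adj = defaultdict(set)
        self.cell_to_edges = defaultdict(set)  # workspace cell -> edges through it
        for x in range(width):
            for y in range(height):
                for nx, ny in ((x + 1, y), (x, y + 1)):
                    if nx < width and ny < height:
                        e = ((x, y), (nx, ny))
                        self.adj[(x, y)].add((nx, ny))
                        self.adj[(nx, ny)].add((x, y))
                        # a grid edge sweeps through both endpoint cells
                        self.cell_to_edges[(x, y)].add(e)
                        self.cell_to_edges[(nx, ny)].add(e)

    def block_cell(self, cell):
        """Environment changed: remove every roadmap edge mapped to `cell`."""
        for a, b in self.cell_to_edges[cell]:
            self.adj[a].discard(b)
            self.adj[b].discard(a)

    def plan(self, start, goal):
        """Breadth-first search over the (possibly modified) roadmap."""
        prev, frontier = {start: None}, deque([start])
        while frontier:
            u = frontier.popleft()
            if u == goal:
                path = []
                while u is not None:
                    path.append(u)
                    u = prev[u]
                return path[::-1]
            for v in self.adj[u]:
                if v not in prev:
                    prev[v] = u
                    frontier.append(v)
        return None

rm = DynamicRoadmap(3, 3)
rm.block_cell((1, 1))          # an obstacle appears in the centre cell
path = rm.plan((0, 0), (2, 2))
```

The key point is that `block_cell` touches only the edges mapped to the changed cell, so the cost of updating the roadmap depends on the change, not on the size of the environment.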
An Implementable Lossy Version of the Lempel-Ziv Algorithm, Part I: Optimality for Memoryless Sources
, 1998
Cited by 27 (8 self)

Abstract:
A new lossy variant of the Fixed-Database Lempel-Ziv coding algorithm for encoding at a fixed distortion level is proposed, and its asymptotic optimality and universality for memoryless sources (with respect to bounded single-letter distortion measures) is demonstrated: as the database size m increases to infinity, the expected compression ratio approaches the rate-distortion function. The complexity and redundancy characteristics of the algorithm are comparable to those of its lossless counterpart. A heuristic argument suggests that the redundancy is of order (log log m)/log m, and this is confirmed experimentally; simulation results are presented that agree well with this rate. The complexity of the algorithm is likewise seen to be comparable to that of the corresponding lossless scheme. We show that there is a tradeoff between compression performance and encoding complexity, and we discuss how the relevant parameters can be chosen to balance this tradeoff in practice. We also d...
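The flavour of fixed-database lossy matching can be conveyed with a greedy sketch: repeatedly find the longest database substring whose per-symbol Hamming distortion against the next source prefix stays within the allowed level, and emit a (position, length) pointer into the shared database. This is only an illustration of the matching step, not the paper's algorithm or its analysis.

```python
def lossy_fdlz_encode(source, database, distortion=0.25):
    """Greedily parse `source` into pointers (pos, length) into `database`,
    keeping the running per-symbol Hamming distortion within `distortion`."""
    pointers, i = [], 0
    while i < len(source):
        # fall back to one (possibly distorted) symbol to guarantee progress
        best_len, best_pos = 1, 0
        for pos in range(len(database)):
            length, errs = 0, 0
            while i + length < len(source) and pos + length < len(database):
                errs += source[i + length] != database[pos + length]
                length += 1
                if errs > distortion * length:
                    length -= 1  # drop the symbol that broke the constraint
                    break
            if length > best_len:
                best_len, best_pos = length, pos
        pointers.append((best_pos, best_len))
        i += best_len
    return pointers

def lossy_fdlz_decode(pointers, database):
    """Reconstruction is just concatenation of database substrings."""
    out = []
    for pos, length in pointers:
        out.extend(database[pos:pos + length])
    return out

database = [0, 1, 0, 1, 1, 0, 1, 0, 0]   # shared between encoder and decoder
source = [0, 1, 0, 1, 1, 0]
pointers = lossy_fdlz_encode(source, database, distortion=0.0)
```

At distortion level 0 the scheme degenerates to lossless fixed-database matching, which is why the complexity profile resembles the lossless counterpart.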
An Efficient Format for Nearly Constant-Time Access to Arbitrary Time Intervals in Large Trace
, 2007
Cited by 12 (1 self)

Abstract:
A powerful method to aid in understanding the performance of parallel applications uses log or trace files containing timestamped events and states (pairs of events). These trace files can be very large, often hundreds or even thousands of megabytes. Because of the cost of accessing and displaying such files, other methods are often used that reduce the size of the trace files at the cost of sacrificing detail or other information. This paper describes a hierarchical trace file format that provides for display of an arbitrary time window in a time independent of the total size of the file and roughly proportional to the number of events within the time window. This format eliminates the need to sacrifice data to achieve a smaller trace file size (since storage is inexpensive, it is necessary only to make efficient use of bandwidth to that storage). The format can be used to organize a trace file or to create a separate file of annotations that may be used with conventional trace files. We present an analysis of the time to access all of the events relevant to an interval of time, and we describe experiments demonstrating the performance of this file format.
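The target access pattern, locating a time window in logarithmic time and then paying only for the events inside it, can be shown with a flat in-memory analogue. The paper's hierarchical format additionally bounds the I/O needed for on-disk traces; this sketch captures only the cost profile of the query.

```python
from bisect import bisect_left, bisect_right

class TraceIndex:
    """Keep events sorted by timestamp so that fetching an arbitrary time
    window costs O(log n) to locate the window boundaries plus time
    proportional to the number of events inside the window."""
    def __init__(self, events):
        # events: iterable of (timestamp, payload), not necessarily sorted
        self.events = sorted(events)
        self.times = [t for t, _ in self.events]

    def window(self, t0, t1):
        """All events with t0 <= timestamp <= t1."""
        lo = bisect_left(self.times, t0)
        hi = bisect_right(self.times, t1)
        return self.events[lo:hi]

trace = TraceIndex([(5, "recv"), (1, "send"), (3, "compute"), (9, "barrier")])
print(trace.window(2, 6))  # → [(3, 'compute'), (5, 'recv')]
```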
An information-theoretic framework for visualization
 IEEE Transactions on Visualization and Computer Graphics
Cited by 8 (0 self)

Abstract:
In this paper, we examine whether or not information theory can be one of the theoretic frameworks for visualization. We formulate concepts and measurements for qualifying visual information. We illustrate these concepts with examples that manifest the intrinsic and implicit use of information theory in many existing visualization techniques. We outline the broad correlation between visualization and the major applications of information theory, while pointing out the difference in emphasis and some technical gaps. Our study provides compelling evidence that information theory can explain a significant number of phenomena or events in visualization, while no example has been found which is fundamentally in conflict with information theory. We also notice that the emphasis of some traditional applications of information theory, such as data compression or data communication, may not always suit visualization, as the former typically focuses on the efficient throughput of a communication channel, whilst the latter focuses on the effectiveness in aiding the perceptual and cognitive process for data understanding and knowledge discovery. These findings suggest that further theoretic developments are necessary for adopting and adapting information theory for visualization. Index Terms—Information theory, theory of visualization, quantitative evaluation.
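The most basic measurement such a framework builds on is Shannon entropy, H(X) = -Σ p(x) log₂ p(x), here applied to a discrete data set as a crude proxy for how much information a visualization of it could convey. This is a textbook computation, not a measure proposed by the paper.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A uniform 4-symbol data set carries 2 bits per sample,
# while a constant data set carries none.
print(shannon_entropy([0, 1, 2, 3]))  # → 2.0
```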
Lossy Compression in Near-Linear Time via Efficient Random Codebooks and Databases
, 904
Cited by 3 (0 self)

Abstract:
The compression-complexity tradeoff of lossy compression algorithms that are based on a random codebook or a random database is examined. Motivated, in part, by recent results of Gupta-Verdú-Weissman (GVW) and their underlying connections with the pattern-matching scheme of Kontoyiannis' lossy Lempel-Ziv algorithm, we introduce a non-universal version of the lossy Lempel-Ziv method (termed LLZ). The optimality of LLZ for memoryless sources is established, and its performance is compared to that of the GVW divide-and-conquer approach. Experimental results indicate that the GVW approach often yields better compression than LLZ, but at the price of much higher memory requirements. To combine the advantages of both, we introduce a hybrid algorithm (HYB) that utilizes both the divide-and-conquer idea of GVW and the single-database structure of LLZ. It is proved that HYB shares with GVW the exact same rate-distortion performance and implementation complexity, while, like LLZ, requiring less memory, by a factor which may become unbounded, depending on the choice of the relevant design parameters. Experimental results are also presented, illustrating the performance of all three methods on data generated by simple discrete memoryless sources. In particular, the HYB algorithm is shown to outperform existing schemes for the compression of some simple discrete sources with respect to the Hamming distortion criterion.
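Random-codebook lossy coding, the common ingredient of the schemes compared above, reduces in miniature to the following: the encoder transmits only the index of the codeword closest to the source block under the distortion measure, and the decoder looks that codeword up in the same pseudo-randomly generated codebook. A GVW-style divide-and-conquer scheme would split the block and encode the pieces against smaller codebooks; this sketch encodes whole blocks.

```python
import random

def hamming(a, b):
    """Hamming distortion between two equal-length blocks."""
    return sum(x != y for x, y in zip(a, b))

def nearest_codeword(block, codebook):
    """Encode `block` as the index of its closest codeword."""
    return min(range(len(codebook)), key=lambda i: hamming(block, codebook[i]))

rng = random.Random(0)  # shared seed: encoder and decoder draw the same codebook
codebook = [[rng.randint(0, 1) for _ in range(8)] for _ in range(16)]
block = [1, 0, 1, 1, 0, 0, 1, 0]
index = nearest_codeword(block, codebook)
reconstruction = codebook[index]
```

Sending one of 16 indices costs 4 bits for an 8-bit block (rate 1/2); the achievable distortion at that rate is exactly what the rate-distortion analyses of LLZ, GVW, and HYB quantify.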
A Fast and Efficient Nearly-Optimal Adaptive Fano Coding Scheme
Cited by 2 (0 self)

Abstract:
Adaptive coding techniques have been increasingly used in lossless data compression. They are suitable for a wide range of applications in which online compression is required, including communications, the internet, e-mail, and e-commerce. In this paper, we present an adaptive Fano coding method applicable to binary and multi-symbol code alphabets. We introduce the corresponding partitioning procedure, which deals with consecutive partitionings and possesses what we have called the nearly-equal-probability property, i.e., it satisfies the principles of Fano coding. To determine the optimal partitioning, we propose a brute-force algorithm that searches the entire space of all possible partitionings. We show that this algorithm operates in time polynomial in the size of the input alphabet, where the degree of the polynomial is given by the size of the output alphabet. In contrast, we also propose a greedy algorithm that quickly finds a suboptimal, but accurate, consecutive partitioning. Empirical results on real-life benchmark data files demonstrate that our scheme compresses and decompresses faster than adaptive Huffman coding, while consuming less memory.
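The classic binary Fano scheme that this work generalizes can be sketched directly: sort symbols by probability, split the list where the two halves' total probabilities are closest (the nearly-equal-probability property), assign 0 and 1, and recurse. The paper's contribution, adapting this to changing statistics and to multi-symbol output alphabets, is not shown here.

```python
def fano_codes(probs):
    """Static binary Fano coding: returns a prefix-free code dict
    mapping each symbol to a bit string."""
    def build(syms, prefix, codes):
        if len(syms) == 1:
            codes[syms[0]] = prefix or "0"
            return
        total = sum(probs[s] for s in syms)
        # choose the split point where the first part's mass is closest to half
        acc, best_k, best_gap = 0.0, 1, float("inf")
        for k in range(1, len(syms)):
            acc += probs[syms[k - 1]]
            gap = abs(2 * acc - total)
            if gap < best_gap:
                best_k, best_gap = k, gap
        build(syms[:best_k], prefix + "0", codes)
        build(syms[best_k:], prefix + "1", codes)

    codes = {}
    build(sorted(probs, key=probs.get, reverse=True), "", codes)
    return codes

codes = fano_codes({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1})
print(codes)  # → {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```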
Mitigating I/O latency in SSD-based Graph Traversal
, 2012
Cited by 1 (0 self)

Abstract:
Mining large graphs has now become an important aspect of many applications. Recent interest in low-cost graph traversal on single machines has led to the construction of systems that use solid state drives (SSDs) to store the graph. An SSD can be accessed with far lower latency than magnetic media, while remaining cheaper than main memory. Unfortunately, SSDs are slower than main memory, and algorithms running on such systems are hampered by large I/O latencies when accessing the SSD. In this paper we present two novel techniques to reduce the impact of SSD I/O latency on semi-external memory graph traversal. We introduce a variant of the Compressed Sparse Row (CSR) format that we call Compressed Enumerated Encoded Sparse Offset Row (CEESOR). CEESOR is particularly efficient for graphs with hierarchical structure and can reduce the space required to represent connectivity information by amounts varying from 5% to as much as 76%. CEESOR allows a larger number of edges to be moved for each unit of I/O transfer from the SSD to main memory, and more effective use of operating system caches. Our second contribution is a runtime prefetching technique that exploits the ability of solid state drives to service multiple random access requests in parallel. We present a novel Run Along SSD Prefetcher (RASP). RASP is capable of hiding the effect of I/O latency in single-threaded graph traversal in breadth-first and shortest-path order, to the extent that it improves iteration time for large graphs by amounts varying from 2.6X to 6X.
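For reference, the baseline CSR layout that CEESOR improves on stores each vertex's neighbour list contiguously, with an offsets array indexing into a single targets array; CEESOR's enumeration and offset encoding of the target lists is not reproduced here.

```python
def to_csr(edges, num_nodes):
    """Standard Compressed Sparse Row layout: the neighbours of vertex v
    occupy targets[offsets[v]:offsets[v + 1]]."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
    offsets, targets = [0], []
    for nbrs in adj:
        targets.extend(sorted(nbrs))
        offsets.append(len(targets))
    return offsets, targets

def neighbours(offsets, targets, v):
    return targets[offsets[v]:offsets[v + 1]]

offsets, targets = to_csr([(0, 1), (0, 2), (1, 2), (2, 0)], 3)
print(neighbours(offsets, targets, 0))  # → [1, 2]
```

Because a vertex's edges are contiguous, each unit of I/O from the SSD brings in a dense run of useful edge data, which is exactly the property CEESOR's extra compression amplifies.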
Information-based Feature Enhancement in Scientific
Cited by 1 (1 self)

Abstract:
Scientific visualization is a research area which gives insight into volumetric data acquired through measurement or simulation. The visualization allows a faster and more intuitive exploration of the data. Due to the rapid development of hardware for the measurement and simulation of scientific data, the size and complexity of data are constantly increasing. This has the benefit that it is possible to get a more accurate insight into the measured or simulated phenomena. A drawback of the increasing data size and complexity is the problem of generating an expressive representation of the data. Since only certain parts of the data are necessary to make a decision, it is possible to mask parts of the data along the visualization pipeline to enhance only those parts which are important in the visualization. For this masking, various properties are extracted from the data and used to classify a part as important or not. In general, a transfer function is used for this classification process, which has to be designed by the user. In this thesis three novel approaches are presented which use methods from
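The user-designed transfer function mentioned above can be as simple as a 1-D mapping from data value to opacity. The ramp parameters below are hypothetical; designing such mappings well is precisely the burden the thesis seeks to reduce.

```python
def opacity_transfer(value, ramp_start=0.25, ramp_end=0.75):
    """Minimal 1-D transfer function: values below `ramp_start` are fully
    transparent, values above `ramp_end` fully opaque, with a linear ramp
    in between."""
    if value <= ramp_start:
        return 0.0
    if value >= ramp_end:
        return 1.0
    return (value - ramp_start) / (ramp_end - ramp_start)

print(opacity_transfer(0.5))  # → 0.5
```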
High Performance Lossless Multimedia Data Compression through Improved Dictionary
Cited by 1 (0 self)

Abstract:
The advent of the modern electronic world has opened up various fronts in multimedia interaction. Multimedia content is used in many fields, for purposes including education, entertainment, and research, which has led to its regular storage and retrieval. But due to the limitations of current technology, disk space and transmission bandwidth lag behind the demands of multimedia content. This imposes a need to compress multimedia content so that it can be stored in less space and easily transferred from one point to another. An online dictionary-based compression technique can be applied to reduce the data packet size. When the repetition rate of symbols within the data is high, such compression techniques work very well. During encoding and decoding, building the online dictionary in primary memory ensures a single pass over the data, and the dictionary need not be transmitted over the network. Our proposed Improved Dictionary technique scans the data byte-wise, so that the chances of repetition of individual symbols are higher for text messages. Fixed-length coding transmits fixed-length codes for all dictionary entries. For larger messages, better size reduction can be achieved through variable-length coding with the LZ technique, where the transmitted code length corresponding to individual dictionary entries varies dynamically according to requirements.
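The single-pass online-dictionary property described above is the classic LZW idea: the dictionary is built in memory during encoding and can be rebuilt identically during decoding, so it never has to be transmitted. The sketch below shows the encoder side only; it is a textbook baseline, not the paper's Improved Dictionary scheme.

```python
def lzw_compress(data):
    """Single-pass LZW-style encoding of a bytes object: grow the current
    phrase while it is in the dictionary; on a miss, emit the code for the
    known prefix and add the extended phrase as a new dictionary entry."""
    dictionary = {bytes([i]): i for i in range(256)}
    result, current = [], b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate
        else:
            result.append(dictionary[current])
            dictionary[candidate] = len(dictionary)
            current = bytes([byte])
    if current:
        result.append(dictionary[current])
    return result

codes = lzw_compress(b"abababab")
print(codes)  # → [97, 98, 256, 258, 98]
```

Eight input bytes become five codes because repeated phrases ("ab", "aba") are replaced by dictionary indices, which illustrates why repetitive data compresses well under this family of techniques.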