Query evaluation techniques for large databases
- ACM Computing Surveys, 1993
Cited by 592 (7 self)
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate it: In order to manipulate large sets of complex objects as efficiently as today’s database systems manipulate simple records, query processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and post-relational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
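The duality of sort- and hash-based set matching mentioned in the abstract above can be illustrated with a toy equi-join. This is only a minimal sketch, not code from the survey: the function names `hash_join` and `sort_merge_join` and the (key, value) row layout are assumptions chosen for illustration, and everything fits in memory.

```python
from collections import defaultdict


def hash_join(left, right):
    """Equi-join on the first tuple field by building a hash table on `left`."""
    table = defaultdict(list)
    for key, value in left:
        table[key].append(value)
    # Probe the table with each row of `right`.
    return sorted((key, lv, rv) for key, rv in right for lv in table.get(key, []))


def sort_merge_join(left, right):
    """Equi-join on the first tuple field by sorting both inputs and merging."""
    left, right = sorted(left), sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit all pairs.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return sorted(out)


left_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right_rows = [(2, "x"), (3, "y"), (4, "z"), (2, "w")]
print(hash_join(left_rows, right_rows) == sort_merge_join(left_rows, right_rows))
```

Both functions return the same matches; they differ only in whether equal keys are brought together by hashing or by sorting.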
Self Organization of a Massive Document Collection
- IEEE Transactions on Neural Networks
Cited by 183 (14 self)
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the Self-Organizing Map (SOM) algorithm. As the feature vectors for the documents we use statistical representations of their vocabularies. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
Keywords: Data mining, exploratory data analysis, knowledge discovery, large databases, parallel implementation, random projection, Self-Organizing Map (SOM), textual documents.
I. Introduction. A. From simple searches to browsing of self-organized data collections. Locating documents on the basis of keywords and simple search expressions is a c...
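The random projection of word histograms named in the abstract above can be sketched in a few lines. This is an illustrative sketch only: the Gaussian matrix entries, the toy dimensions, and the names `random_projection_matrix` and `project` are assumptions, since the abstract does not say how the projection matrix was built.

```python
import random


def random_projection_matrix(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix of Gaussian entries (an assumed choice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def project(histogram, matrix):
    """Map a (weighted) word histogram to a lower-dimensional feature vector."""
    return [sum(w * x for w, x in zip(row, histogram)) for row in matrix]


# Toy example: a 12-word vocabulary reduced to 4 dimensions.
matrix = random_projection_matrix(in_dim=12, out_dim=4)
histogram = [3, 0, 0, 1, 0, 2, 0, 0, 0, 5, 0, 1]
features = project(histogram, matrix)
print(len(features))
```

The projection is linear, so the reduced vector of a combined histogram is the sum of the reduced vectors of its parts.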
Statistical model of lossy links in wireless sensor networks
- In IPSN, 2005
Cited by 42 (4 self)
Recently, several landmark wireless sensor network deployment studies clearly demonstrated a large discrepancy between experimentally observed communication properties and properties produced by widely used simulation models. Our first goal is to provide sound foundations for conclusions drawn from these studies by extracting the relationship between pairs of location (e.g., distance) and communication properties (e.g., reception rate) using non-parametric statistical techniques and by calculating confidence intervals for all claims. The objective is to determine not only the most likely value of one feature for an alternate given feature value, but also to establish a complete characterization of the relationship by providing a probability density function (PDF). The PDF provides the likelihood that any particular value of one feature is associated with a given value of another feature. Furthermore, we study not only individual link properties, but also their correlation with respect to common transmitters and receivers and their geometrical location. The second objective is to develop a series of wireless network simulation environments that produce networks of an arbitrary size and under arbitrary deployment rules with realistic communication properties. For this task we use an iterative improvement-based optimization procedure to generate instances of the network that are statistically similar to empirically observed networks. We evaluate the accuracy of the conclusions drawn using the proposed model, and therefore the comprehensiveness of the considered properties, on a set of standard communication tasks, such as connectivity maintenance and routing.
Index terms: sensor networks, wireless channel modeling, simulations, network measurements, experimentation with real networks/testbeds, statistics.
A toolkit for interactive sonification
- IEEE Multimedia, 2004
Cited by 27 (2 self)
This paper argues for a special focus on the use of dynamic human interaction to explore datasets while they are being transformed into sound. We describe why this is a special case of both human-computer interaction (HCI) techniques and sonification methods. Humans are adapted for interacting with their physical environment and making continuous use of all their senses. When this exploratory interaction is applied to a dataset (by continuously controlling its transformation into sound) new insights are gained into the data’s macro- and micro-structure, which are not obvious in a visual rendering. This paper reviews the importance of interaction in sonification, describes how a certain quality of interaction is required, provides examples of the techniques being applied interactively, and outlines a plan of future work to develop interaction techniques to aid sonification.
An Overview of Median and Stack Filtering
- Circuits, Systems, and Signal Processing, Special issue on Median and Morphological Filtering, 1992
Cited by 7 (2 self)
Within the last two decades a small group of researchers has built a useful, nontrivial theory of nonlinear signal processing around the median-related filters known as rank-order filters, order-statistic filters, weighted median filters, and stack filters. This required significant effort to overcome the bias, both in education and research, toward linear theory, which has been dominant since the days of Fourier, Laplace, and "Convolute." We trace the development of this theory of nonlinear filtering from its beginnings in the study of noise-removal properties and structural behavior of the median filter to the recently developed theory of optimal stack filtering. The theory of stack filtering provides a point of view which unifies many different filter classes, including morphological filters, so it is discussed in detail. Of particular importance is the way this theory has brought together, in a single analytical framework, both the estimation-based and the structural-based approaches to the design of these filters. Some recent applications of median and stack filters are provided to demonstrate the effectiveness of this approach to nonlinear filtering. They include: the design of an optimal stack filter for image restoration; the use of vector median filters to attenuate impulsive noise in color images and to eliminate cross luminance and cross color in TV images; and the use of median-based filters for image sequence coding, reconstruction, and scan rate conversion in normal TV and HDTV systems.
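The noise-removal and edge-preserving behavior of the basic median filter referred to in the abstract above can be seen on a one-dimensional signal. This is a minimal sketch of a plain running median only, not of the stack filters the paper covers; the name `median_filter` and the edge-replication padding are assumptions made for illustration.

```python
from statistics import median


def median_filter(signal, window=3):
    """Running median over a 1-D signal; edges are padded by repeating end values."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(signal))]


impulse = [1, 1, 1, 9, 1, 1, 1]   # a single impulsive outlier
step = [0, 0, 0, 5, 5, 5]         # a sharp edge
print(median_filter(impulse))
print(median_filter(step))
```

With a window of three samples the isolated impulse is removed while the step edge passes through unchanged, which a linear moving average would instead blur.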
Discovering Coherent Structures in Nonlinear Spatial Systems
- Brandt A., Ramberg S., Shlesinger M., Eds., Nonlinear Dynamics of Ocean Waves, 1992
Cited by 4 (0 self)
A synthesis of elementary computation and dynamical system theories leads to a constructive approach to discovering coherent structures in spatial systems and to quantifying a pattern's complexity. The basic technique reviewed here builds probabilistic automata from temporal and spatial data series generated by a simple nonlinear spatial system. In this way, a given pattern's unpredictability and structure are measured by the entropy rate and complexity, respectively, of the "machine" reconstructed from the pattern data. Ancillary remarks indicate how the analysis gives a global view of the high-dimensional state space structures associated with spatial systems and, in particular, the geometry of coherent structure interactions. The bulk of the review, though, emphasizes practical results on inferring coherent space-time structures and on building detectors to track particle-like objects.
Also in Complexity in Physics and Technology, R. Vilela-Mendes, editor, World Scientific, Singapo...
On the Distribution of Performance from Multiple Neural-Network Trials
- 1997
Cited by 4 (0 self)
The performance of neural-network simulations is often reported in terms of the mean and standard deviation of a number of simulations performed with different starting conditions. However, in many cases, the distribution of the individual results does not approximate a Gaussian distribution, may not be symmetric, and may be multimodal. We present the distribution of results for practical problems and show that assuming Gaussian distributions can significantly affect the interpretation of results, especially those of comparison studies. For a controlled task which we consider, we find that the distribution of performance is skewed toward better performance for smoother target functions and skewed toward worse performance for more complex target functions. We propose new guidelines for reporting performance which provide more information about the actual distribution.
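The gap between a mean-and-standard-deviation report and the actual shape of a skewed set of trial results can be shown with a small summary function. This is a sketch under stated assumptions: the error values are made up for illustration, and the choice of median, quartiles, and extremes is only one plausible summary, not necessarily the guidelines the paper proposes.

```python
from statistics import mean, median, quantiles, stdev


def summarize(errors):
    """Report moment-based and rank-based summaries of per-trial errors."""
    q1, _, q3 = quantiles(errors, n=4)
    return {
        "mean": mean(errors),
        "stdev": stdev(errors),
        "median": median(errors),
        "q1": q1,
        "q3": q3,
        "min": min(errors),
        "max": max(errors),
    }


# Hypothetical test errors from eight training runs with different starting
# conditions; two runs got stuck at much worse solutions.
trial_errors = [0.10, 0.11, 0.12, 0.12, 0.13, 0.14, 0.45, 0.60]
for name, value in summarize(trial_errors).items():
    print(f"{name}: {value:.3f}")
```

Here the two poor runs pull the mean well above the median, so the mean alone would misrepresent what a typical run achieves.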
Crystallization sonification of high-dimensional datasets
- Nucleic Acids Research, 2002
Cited by 4 (1 self)
This paper introduces Crystallization Sonification, a sonification model for exploratory analysis of high-dimensional datasets. The model is designed to provide information about the intrinsic data dimensionality (which is a local feature) and the global data dimensionality, as well as the transitions between a local and global view on a dataset. Furthermore, the sound makes it possible to display the clustering in high-dimensional datasets. The model defines a crystal growth process in the high-dimensional data-space which starts at a user-selected “condensation nucleus” and incrementally includes neighboring data according to some growth criterion. The sound summarizes the temporal evolution of this crystal growth process. For introducing the model, a simple growth law is used. Other growth laws which are used in the context of hierarchical clustering are also suited, and their application in crystallization sonification offers new ways to inspect the results of data clustering as an alternative to dendrogram plots. In this paper, the sonification model is described and example sonifications are presented for some synthetic high-dimensional datasets.
EDGE-PRESERVING PREFILTERING FOR DOCUMENT IMAGE BINARIZATION
Cited by 2 (0 self)
We propose a novel coplanar filter, which exploits the coplanarity of gray-level distribution of neighboring pixels, to pre-filter the document images. Experiments show that the proposed filter exhibits the following desired properties for document image binarization: (1) impulsive noise removal, (2) piecewise smoothing, and (3) sharp edge preservation.
Lessons about likelihood functions from nuclear physics
Cited by 1 (0 self)
Least-squares data analysis is based on the assumption that the normal (Gaussian) distribution appropriately characterizes the likelihood, that is, the conditional probability of each measurement d, given a measured quantity y, p(d | y). On the other hand, there is ample evidence in nuclear physics of significant disagreements among measurements, which are inconsistent with the normal distribution, given their stated uncertainties. In this study the histories of 99 measurements of the lifetimes of five elementary particles are examined to determine what can be inferred about the distribution of their values relative to their stated uncertainties. Taken as a whole, the variations in the data are somewhat larger than their quoted uncertainties would indicate. These data strongly support using a Student t distribution for the likelihood function instead of a normal. The most probable value for the order of the t distribution is 2.6 ± 0.9. It is shown that analyses based on long-tailed t-distribution likelihoods gracefully cope with outlying data.
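Why a long-tailed Student t likelihood copes with outliers can be seen by comparing negative log-likelihoods of a standardized residual z = (d - y) / sigma. This is a minimal sketch using the standard density formulas; the function names are assumptions, the scale is fixed at one, and only the order 2.6 is taken from the abstract above.

```python
import math


def gaussian_nll(z):
    """Negative log-likelihood of a standardized residual under a unit normal."""
    return 0.5 * z * z + 0.5 * math.log(2.0 * math.pi)


def student_t_nll(z, nu=2.6):
    """Negative log-likelihood under a standard Student t distribution of order nu."""
    log_norm = (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
    )
    return -log_norm + 0.5 * (nu + 1.0) * math.log1p(z * z / nu)


for z in (0.0, 1.0, 5.0, 10.0):
    print(f"z={z:4.1f}  gaussian={gaussian_nll(z):7.3f}  student_t={student_t_nll(z):7.3f}")
```

The Gaussian penalty grows quadratically in z, while the t penalty grows only logarithmically, so a single measurement many standard deviations away cannot dominate a fit based on the t likelihood.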

