Results 1 – 10 of 24
Information Preserving XML Schema Embedding
, 2005
Abstract

Cited by 29 (4 self)
A fundamental concern of information integration in an XML context is the ability to embed one or more source documents in a target document so that (a) the target document conforms to a target schema and (b) the information in the source document(s) is preserved. In this paper, information preservation for XML is formally studied, and the results of this study guide the definition of a novel notion of schema embedding between two XML DTD schemas represented as graphs. Schema embedding generalizes the conventional notion of graph similarity by allowing an edge in a source DTD schema to be mapped to a path in the target DTD. Instance-level embeddings can be defined from the schema embedding in a straightforward manner, such that conformance to a target schema and information preservation are guaranteed. We show that it is NP-complete to find an embedding between two DTD schemas. We also provide efficient heuristic algorithms to find candidate embeddings, along with experimental results to evaluate and compare the algorithms. These yield the first systematic and effective approach to finding information preserving XML mappings.
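The edge-to-path relaxation at the heart of schema embedding can be illustrated with a small sketch. The snippet below is a hypothetical illustration, not the paper's algorithm; the names `src_edges`, `tgt_adj`, and `sigma` are invented. It checks that a candidate node mapping sends every source edge to some path in the target DTD graph:

```python
from collections import deque

def is_valid_embedding(src_edges, tgt_adj, sigma):
    """Check that a candidate schema embedding `sigma` (source node ->
    target node) maps every source edge to *some* path in the target
    graph -- the relaxation of plain graph similarity described above."""
    def reachable(s, t):
        # Breadth-first search from s until t is found (or not).
        seen, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            if u == t:
                return True
            for v in tgt_adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return False
    return all(reachable(sigma[u], sigma[v]) for u, v in src_edges)
```

A heuristic search for embeddings would enumerate candidate mappings and keep only those passing a check like this one.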
Nearest Neighbor Retrieval Using Distance-Based Hashing
Abstract

Cited by 25 (1 self)
Abstract — A method is proposed for indexing spaces with arbitrary distance measures, so as to achieve efficient approximate nearest neighbor retrieval. Hashing methods, such as Locality Sensitive Hashing (LSH), have been successfully applied for similarity indexing in vector spaces and string spaces under the Hamming distance. The key novelty of the hashing technique proposed here is that it can be applied to spaces with arbitrary distance measures, including non-metric distance measures. First, we describe a domain-independent method for constructing a family of binary hash functions. Then, we use these functions to construct multiple multi-bit hash tables. We show that the LSH formalism is not applicable for analyzing the behavior of these tables as index structures. We present a novel formulation that uses statistical observations from sample data to analyze retrieval accuracy and efficiency for the proposed indexing method. Experiments on several real-world data sets demonstrate that our method produces good trade-offs between accuracy and efficiency, and significantly outperforms VP-trees, which are a well-known method for distance-based indexing.
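A minimal sketch of how such a family of binary hash functions could be built from pairwise distances alone. This is an illustrative assumption, not the paper's implementation: it uses the FastMap-style pseudo-projection onto a random pivot pair, thresholded at the sample median, and all function names are invented:

```python
import random

def line_projection(D, a, b, x):
    """Pseudo-projection of x onto the 'line' through pivots a and b,
    computed purely from pairwise distances (works for any D)."""
    dab = D(a, b)
    return (D(x, a) ** 2 + dab ** 2 - D(x, b) ** 2) / (2.0 * dab)

def make_binary_hash(D, sample, rng=random):
    """Build one binary hash function from a random pivot pair and a
    median threshold estimated on a data sample."""
    a, b = rng.sample(sample, 2)
    values = sorted(line_projection(D, a, b, x) for x in sample)
    t = values[len(values) // 2]  # median -> roughly balanced bits
    return lambda x: 1 if line_projection(D, a, b, x) <= t else 0
```

Concatenating several such functions yields the multi-bit hash tables mentioned in the abstract.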
BoostMap: An Embedding Method for Efficient Nearest Neighbor Retrieval
, 2008
Abstract

Cited by 22 (5 self)
This paper describes BoostMap, a method for efficient nearest neighbor retrieval under computationally expensive distance measures. Database and query objects are embedded into a vector space in which distances can be measured efficiently. Each embedding is treated as a classifier that predicts for any three objects X, A, B whether X is closer to A or to B. It is shown that a linear combination of such embedding-based classifiers naturally corresponds to an embedding and a distance measure. Based on this property, the BoostMap method reduces the problem of embedding construction to the classical boosting problem of combining many weak classifiers into an optimized strong classifier. The classification accuracy of the resulting strong classifier is a direct measure of the amount of nearest neighbor structure preserved by the embedding. An important property of BoostMap is that the embedding optimization criterion is equally valid in both metric and non-metric spaces. Performance is evaluated in databases of hand images, handwritten digits, and time series. In all cases, BoostMap significantly improves retrieval efficiency with small losses in accuracy compared to brute-force search. Moreover, BoostMap significantly outperforms existing nearest neighbor retrieval methods such as Lipschitz embeddings, FastMap, and VP-trees.
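To make the "embedding as classifier" idea concrete, here is a small illustrative sketch (not the paper's implementation) using the simplest 1D embedding, F_r(x) = D(x, r) for a reference object r, as a weak classifier over triples; the function names are hypothetical:

```python
def embed_1d(D, r):
    """Reference-object embedding F_r(x) = D(x, r): the simplest 1D
    embedding that can serve as a weak classifier in BoostMap-style training."""
    return lambda x: D(x, r)

def triple_classifier(F, q, a, b):
    """Predict +1 if q looks closer to a than to b in the embedded
    space, -1 otherwise."""
    return 1 if abs(F(q) - F(a)) < abs(F(q) - F(b)) else -1

def weak_accuracy(D, F, triples):
    """Fraction of triples (q, a, b) for which the embedding agrees with
    the true relation D(q, a) vs. D(q, b)."""
    correct = sum(
        1 for q, a, b in triples
        if triple_classifier(F, q, a, b) == (1 if D(q, a) < D(q, b) else -1))
    return correct / len(triples)
```

Boosting would then combine many such weak classifiers, with the combined weights defining the coordinates' importance in the final embedding.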
Putting Context into Schema Matching
, 2006
Abstract

Cited by 22 (2 self)
Attribute-level schema matching has proven to be an important first step in developing mappings for data exchange, integration, restructuring and schema evolution. In this paper we investigate contextual schema matching, in which selection conditions are associated with matches by the schema matching process in order to improve overall match quality. We define a general space of matching techniques, and within this framework we identify a variety of novel, concrete algorithms for contextual schema matching. Furthermore, we show how common schema mapping techniques can be generalized to take more effective advantage of contextual matches, enabling automatic construction of mappings across certain forms of schema heterogeneity. An experimental study examines a wide variety of quality and performance issues. In addition, it demonstrates that contextual schema matching is an effective and practical technique to further automate the definition of complex data transformations.
Approximate embeddingbased subsequence matching of time series
 In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
, 2008
Abstract

Cited by 20 (6 self)
A method for approximate subsequence matching is introduced that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for Embedding-Based Subsequence Matching. The key idea is to convert subsequence matching to vector matching using an embedding. This embedding maps each database time series into a sequence of vectors, so that every step of every time series in the database is mapped to a vector. The embedding is computed by applying full dynamic time warping between reference objects and each database time series. At runtime, given a query object, an embedding of that object is computed in the same manner, by running dynamic time warping between the reference objects and the query. Comparing the embedding of the query with the database vectors is used to efficiently identify relatively few areas of interest in the database sequences. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. Experiments on a large, public time series data set produce speedups of over one order of magnitude compared to brute-force search, with very small losses (< 1%) in retrieval accuracy.
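The per-step embedding can be sketched with a subsequence variant of DTW in which the reference is allowed to start matching anywhere; the last row of the cost matrix then gives one embedding coordinate per database position. A minimal illustration, assuming absolute difference as the local cost (the function name is hypothetical):

```python
def subsequence_dtw_last_row(ref, series):
    """Unconstrained-start DTW: cost of the best warping of `ref`
    ending at each position j of `series`.  Entry [j] is the coordinate
    that reference object `ref` contributes at step j of the embedding."""
    n, m = len(ref), len(series)
    # First row: matching may start at any position of the series.
    prev = [abs(ref[0] - series[j]) for j in range(m)]
    for i in range(1, n):
        cur = [0.0] * m
        cur[0] = prev[0] + abs(ref[i] - series[0])
        for j in range(1, m):
            cur[j] = abs(ref[i] - series[j]) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev  # prev[j]: best cost of a match of ref ending at j
```

Low values in this row flag positions where the query region resembles the reference; those positions become the "areas of interest" refined by exact DTW.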
Global distance-based segmentation of trajectories
 In KDD
, 2006
Abstract

Cited by 18 (1 self)
This work introduces distance-based criteria for segmentation of object trajectories. Segmentation leads to simplification of the original objects into smaller, less complex primitives that are better suited for storage and retrieval purposes. Previous work on trajectory segmentation attacked the problem locally, segmenting each trajectory of the database separately; it therefore did not directly optimize inter-object separability, which is necessary for mining operations such as searching, clustering, and classification on large databases. In this paper we analyze the trajectory segmentation problem from a global perspective, utilizing data-aware distance-based optimization techniques, which optimize pairwise distance estimates and hence lead to more efficient object pruning. We first derive exact solutions of the distance-based formulation. Due to the intractable complexity of the exact solution, we present an approximate, greedy solution that exploits forward searching of locally optimal solutions. Since the greedy solution also imposes a prohibitive computational cost, we also put forward more lightweight variance-based segmentation techniques, which intelligently “relax” the pairwise distances only in the areas that least affect the mining operations.
Unified Framework for Fast Exact and Approximate Search in Dissimilarity Spaces
, 2007
Abstract

Cited by 13 (2 self)
In multimedia systems we usually need to retrieve DB objects based on their similarity to a query object, while the similarity assessment is provided by a measure which defines a (dis)similarity score for every pair of DB objects. In most existing applications, the similarity measure is required to be a metric, where the triangle inequality is utilized to speed up the search for relevant objects by use of metric access methods (MAMs), e.g. the M-tree. Recent research has shown, however, that non-metric measures are more appropriate for similarity modeling due to their robustness and the ease of modeling a made-to-measure similarity. Unfortunately, due to the lack of the triangle inequality, non-metric measures cannot be directly utilized by MAMs. From another point of view, some sophisticated similarity measures may be available only in a black-box, non-analytic form (e.g. as an algorithm or even a hardware device), where no information about their topological properties is provided, so we have to treat them as non-metric measures as well. From yet another point of view, the concept of similarity measuring is itself inherently imprecise, and we often prefer fast but approximate retrieval over exact but slower retrieval. To date, these aspects of similarity retrieval have been solved separately, i.e. exact vs. approximate search or metric vs. non-metric search. In this paper we introduce a similarity retrieval framework which incorporates both aspects into a single unified model. Based on the framework, we show that for any dissimilarity measure (either metric or non-metric) we are able to change the “amount” of triangle inequality, and so obtain an approximate or full metric which can be used for MAM-based retrieval. Due to the varying “amount” of triangle inequality, the measure is modified in a way suitable for either exact but slower or approximate but faster retrieval.

Additionally, we introduce the TriGen algorithm, aimed at constructing the desired modification of any black-box distance automatically, using just a small fraction of the database.
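The idea of tuning the "amount" of triangle inequality can be illustrated with a concave modifier applied to a non-metric distance. The toy sketch below uses a simple power modifier as a stand-in for a TriGen-style modifier base; the modifier choice and all names are illustrative assumptions, not the paper's method:

```python
import itertools

def modified(D, w):
    """Concave power modifier d -> d**(1/(1+w)), w >= 0: larger w makes
    the measure 'more metric' (a hypothetical stand-in for a TriGen base)."""
    return lambda x, y: D(x, y) ** (1.0 / (1.0 + w))

def triangle_error(D, sample):
    """Fraction of triples from `sample` violating the triangle inequality;
    TriGen-style tuning would search for the smallest w driving this
    (approximately) to zero."""
    triples = list(itertools.combinations(sample, 3))
    bad = 0
    for a, b, c in triples:
        dab, dbc, dac = D(a, b), D(b, c), D(a, c)
        if dab + dbc < dac or dab + dac < dbc or dbc + dac < dab:
            bad += 1
    return bad / len(triples)
```

For example, squared Euclidean distance violates the triangle inequality, but the w = 1 modifier turns it back into plain Euclidean distance, which is a metric.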
On Non-metric Similarity Search Problems in Complex Domains
, 2010
Abstract

Cited by 10 (3 self)
The task of similarity search is widely used in various areas of computing, including multimedia databases, data mining, bioinformatics, social networks, etc. In fact, retrieval of semantically unstructured data entities requires a form of aggregated qualification that selects entities relevant to a query. A popular type of such a mechanism is similarity querying. For a long time, the database-oriented applications of similarity search employed a definition of similarity restricted to metric distances. Due to its topological properties, metric similarity can be effectively used to index a database, which can then be queried efficiently by so-called metric access methods. However, together with the increasing complexity of data entities across various domains, in recent years many similarities have appeared that are not metrics – we call them non-metric similarity functions. In this paper we survey domains employing non-metric functions for effective similarity search, and methods for efficient non-metric similarity search. First, we show that the ongoing research in many of these domains requires complex representations of data entities. Simultaneously, such complex representations also allow us to model complex and computationally expensive similarity functions (often represented by various matching algorithms). However, the more complex a similarity function one develops, the more likely it will be non-metric. Second, we review the state-of-the-art techniques for efficient (fast) non-metric similarity search, concerning both exact and approximate search. Finally, we discuss some open problems and possible future research trends.
A Database-Based Framework for Gesture Recognition
 PERSONAL AND UBIQUITOUS COMPUTING
Abstract

Cited by 8 (0 self)
Gestures are an important modality for human-machine communication. Computer vision modules performing gesture recognition can be important components of intelligent homes, assistive environments, and human-computer interfaces. A key problem in recognizing gestures is that the appearance of a gesture can vary widely depending on variables such as the person performing the gesture, or the position and orientation of the camera. This paper presents a database-based approach for addressing this problem. The large variability in appearance among different examples of the same gesture is addressed by creating large gesture databases that store enough exemplars from each gesture to capture the variability within that gesture. This database-based approach is applied to two gesture recognition problems: handshape categorization and motion-based recognition of American Sign Language (ASL) signs. A key aspect of our approach is the use of database indexing methods, in order to address the challenge of searching large databases without violating the time constraints of an online interactive system, where system response times of over a few seconds are oftentimes considered unacceptable. Our experiments demonstrate the benefits of the proposed database-based framework, and the feasibility of integrating large gesture databases into online interactive systems.
Nearest Neighbor Search Methods for Handshape Recognition
Abstract

Cited by 7 (1 self)
Gestures are an important modality for human-machine communication, and robust gesture recognition can be an important component of intelligent homes and assistive environments in general. An important aspect of gestures is handshape. Handshapes can hold important information about the meaning of a gesture, for example in sign languages, or about the intent of an action, for example in manipulative gestures or in virtual reality interfaces. At the same time, recognizing handshape can be a very challenging task, because the same handshape can look very different in different images, depending on the 3D orientation of the hand and the viewpoint of the camera. In this paper we examine a database approach for handshape classification, whereby a large database of tens of thousands of images is used to represent the wide variability of handshape appearance. Efficient and accurate indexing methods are important in such a database approach, to ensure that the system can match every incoming image to the large number of database images at interactive times. In this paper we examine the use of embedding-based and hash-table-based indexing methods for handshape recognition, and we experimentally compare these two approaches on the task of recognizing 20 handshapes commonly used in American Sign Language (ASL).