Similarity-based operators and query optimization for multimedia database systems (2001)

by S Atnafu, L Brunie, H Kosch
Venue: IDEAS

Results 1 - 8 of 8

Learning similarity measures in non-orthogonal space

by Ning Liu, Benyu Zhang, Jun Yan, Qiang Yang, Shuicheng Yan, Zheng Chen, Fengshan Bai, Wei-ying Ma - In CIKM ’04: Proceedings of the thirteenth ACM conference on Information and knowledge management, 2004
Cited by 10 (0 self)
Many machine learning and data mining algorithms crucially rely on the similarity metrics. The Cosine similarity, which calculates the inner product of two normalized feature vectors, is one of the most commonly used similarity measures. However, in many practical tasks such as text categorization and document clustering, the Cosine similarity is calculated under the assumption that the input space is an orthogonal space which usually could not be satisfied due to synonymy and polysemy. Various algorithms such as Latent Semantic Indexing (LSI) were used to solve this problem by projecting the original data into an orthogonal space. However LSI also suffered from the high computational cost and data sparseness. These shortcomings led to increases in computation time and storage requirements for large scale realistic data. In this paper, we propose a novel and effective similarity
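
As a rough illustration of the measure this abstract describes, the sketch below computes the cosine similarity as the inner product of two vectors after each is scaled to unit length. It is not the metric proposed in the paper. The function name `cosine_similarity` and the three-term vocabulary are assumptions made up for this example.

```python
import math


def cosine_similarity(u, v):
    """Inner product of u and v after scaling each to unit length."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


# Hypothetical term-count vectors over the vocabulary
# ("car", "automobile", "road"); the data is invented for illustration.
doc_a = [1, 0, 1]  # mentions "car" and "road"
doc_b = [0, 1, 1]  # mentions "automobile" and "road"

# Only the shared term "road" contributes to the score, because the
# synonyms "car" and "automobile" sit on separate, orthogonal axes.
similarity = cosine_similarity(doc_a, doc_b)
```

The example treats every term as its own orthogonal axis, so the two synonyms add nothing to the score. That is the orthogonality assumption the abstract says synonymy and polysemy tend to violate.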

Similarity Group-by Operators for Multi-dimensional Relational Data

by Mingjie Tang, Ruby Y. Tahboub, Walid G. Aref, Mikhail J. Atallah, Qutaibah M. Malluhi, Mourad Ouzzani, Yasin N. Silva
The SQL group-by operator plays an important role in summarizing and aggregating large datasets in a data analytics stack. While the standard group-by operator, which is based on equality, is useful in several applications, allowing similarity-aware grouping provides a more realistic view on real-world data that could lead to better insights. The Similarity SQL-based Group-By operator (SGB, for short) extends the semantics of the standard SQL Group-by by grouping data with similar but not necessarily equal values. While existing similarity-based grouping operators efficiently materialize this "approximate" semantics, they primarily focus on one-dimensional attributes and treat multi-dimensional attributes independently. However, correlated attributes, such as in spatial data, are processed independently, and hence, groups in the multi-dimensional space are not detected properly. To address this problem, we introduce two new SGB operators for multi-dimensional data. The first operator is the clique (or distance-to-all) SGB, where all the tuples in a group are within some distance from each other. The second operator is the distance-to-any SGB, where a tuple belongs to a group if the tuple is within some distance from any other tuple in the group. Since a tuple may satisfy the membership criterion of multiple groups, we introduce three different semantics to deal with such a case: (i) eliminate the tuple, (ii) put the tuple in any one group, and (iii) create a new group for this tuple. We implement and test the new SGB operators and their algorithms inside PostgreSQL. The overhead introduced by these operators proves to be minimal and the execution times are comparable to those of the standard Group-by. The experimental study, based on TPC-H and social check-in data, demonstrates that the proposed algorithms can achieve up to three orders of magnitude enhancement in performance over baseline methods developed to solve the same problem.

Citation Context

...w SGB operators. Section 5 presents application scenarios that demonstrate the use and practicality of the various proposed semantics for SGB operators. Sections 6 and 7 introduce the algorithmic frameworks for SGB-All and SGB-Any operators, respectively. Section 8 describes the in-database extensions to support the two operators and their performance evaluation from within PostgreSQL. Section 9 concludes the paper. 2 RELATED WORK Previous work on similarity-aware query processing addressed the theoretical foundation and query optimization issues for similarity-aware query operators [2]. [3], [4] introduce similarity algebra that extends relational algebra operations, e.g., joins and set operations, with similarity semantics. Similarity queries and their optimizations include algorithms for similarity range search and K-Nearest Neighbor (KNN) [5], similarity join [6], and similarity aggregates [7]. Most of work focus on semantic and transformation rules for query optimization purpose independently from actual algorithms to realize similarity-aware operators. In contrast, our focus is on the latter. Clustering forms groups of similar data for the purpose of learning hidden knowledge. C...
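
The distance-to-any semantics described in the abstract above can be sketched as a naive grouping of points. This is not the algorithm from the paper, and not the PostgreSQL extension: it is a quadratic-time illustration that assumes Euclidean distance, an inclusive threshold, and that groups linked by a shared tuple are merged. The names `sgb_any`, `eps`, and `checkins` are hypothetical.

```python
import math


def sgb_any(points, eps):
    """Group points so that each point is within eps of at least one
    other point in its group (singletons form their own group)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; link the two groups when the pair is close.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups = {}
    for i, point in enumerate(points):
        groups.setdefault(find(i), []).append(point)
    return list(groups.values())


# Invented two-dimensional data; the coordinates carry no real meaning.
checkins = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
groups = sgb_any(checkins, eps=1.0)
```

In this sketch, (0.0, 0.0) and (2.0, 0.0) land in the same group even though they are 2.0 apart, because each is within `eps` of (1.0, 0.0). A clique (distance-to-all) grouping would not allow that, since it requires every pair in a group to be within the threshold.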

Efficient Content-Based and Metadata Retrieval in Image Database

by Solomon Atnafu, Richard Chbeir, Lionel Brunie
Managing image data in a database system using metadata has been practiced for the last two decades. However, describing an image fully and adequately with metadata is practically not possible. The other alternative is describing image content by its low-level features such as color, texture, shape, etc. and using the same for similarity-based image retrieval. However, practice has shown that using only the low-level features cannot be complete either. Hence, systems need to integrate both low-level and metadata descriptions for efficient image data management. However, due to the lack of an adequate image data model, the absence of a formal algebra for content-based image operations, and the lack of precision of the existing image processing and retrieval techniques, not much work has been done to integrate the use of low-level and metadata description and retrieval methods. In this paper, we first present a global image data model that supports both metadata and low-level descriptions of images and their salient objects. This allows multi-criteria image retrieval (context-, semantic-, and content-based queries). Furthermore, we present an image data repository model that captures all data described in the model and permits integrating heterogeneous operations in a DBMS. In particular, similarity-based operations (similarity-based join and selection) in combination with traditional ones can be carried out. Finally, we present an image DBMS architecture that we use to develop a prototype in order to support both content-based and metadata retrieval.

A Similarity Reinforcement Algorithm for Heterogeneous Web Pages

by Ning Liu, Jun Yan, Fengshan Bai, Benyu Zhang, Wensi Xi, Weiguo Fan, Zheng Chen, Lei Ji, Chenyong Hu, Wei-ying Ma
Many machine learning and data mining algorithms crucially rely on the similarity metrics. However, most early research works such as Vector Space Model or Latent Semantic Index only used a single relationship to measure the similarity of data objects. In this paper, we first use an Intra- and Inter-Type Relationship Matrix (IITRM) to represent a set of heterogeneous data objects and their inter-relationships. Then, we propose a novel similarity-calculating algorithm over the Inter- and Intra-Type Relationship Matrix. It tries to integrate information from heterogeneous sources to serve their purposes by iteratively computing. This algorithm can help detect latent relationships among heterogeneous data objects. Our new algorithm is based on the intuition that the intra-relationship should affect the inter-relationship, and vice versa. Experimental results on the MSN logs dataset show that our algorithm outperforms the traditional Cosine similarity.

Citation Context

...mance of many data mining algorithms such as document clustering and text categorization critically depends on a good metric that reflects the relationship between the data objects in the input space [1] [30]. It is therefore important to calculate the similarity as effectively as possible [28]. Most early research works only used single relationship to measure the similarity of data objects. In the ...

unknown title

by G. Unel, M. E. Dönderler, O. Ulusoy, 2003
An efficient query optimization strategy for spatio-temporal queries in video databases

An Efficient Query Optimization Strategy for Spatio-Temporal Queries In Video

by Gulay Unel, Mehmet Emin Donderler, Ozgur Ulusoy, Ugur Gudukbay - Journal of Systems and Software, 2002
The interest for multimedia database management systems has grown rapidly due to the need for the storage of huge volumes of multimedia data in computer systems. An important building block of a multimedia database system is the query processor, and a query optimizer embedded to the query processor is needed to answer user queries efficiently. Query optimization problem has been widely studied for conventional database systems, however it is a new research area for multimedia database systems. Due to the differences in query processing strategies, query optimization techniques used in multimedia database systems are different from those used in traditional databases. In this paper, a query optimization strategy is proposed for processing spatio-temporal queries in video database systems. The proposed strategy includes reordering algorithms to be applied on query execution tree. The performance results obtained by testing the reordering algorithms on different query sets are also presented.

LEARNING SIMILARITY MEASURES IN NON-ORTHOGONAL SPACES

by Ning Liu, Benyu Zhang, Jun Yan, Qiang Yang, Shuicheng Yan, Zheng Chen, Fengshan Bai, Wei-ying M
Many machine learning and data mining algorithms crucially rely on the similarity metrics. The Cosine similarity, which calculates the inner product of two normalized feature vectors, is one of the most commonly used similarity measures. However, in many practical tasks such as text categorization and document clustering, the Cosine similarity is calculated under the assumption that the input space is an orthogonal space which usually could not be satisfied due to synonymy and polysemy. Various algorithms such as Latent Semantic Indexing (LSI) were used to solve this problem by projecting the original data into an orthogonal space. However LSI also suffered from the high computational cost and data sparseness. These shortcomings led to increases in computation time and storage requirements for large scale realistic data. In this paper, we propose a novel and effective similarity metric in the non-orthogonal input space.

Salient-Object-Based Image Query By Visual Content

by Dawit Bulcha, Solomon Atnafu, Lionel Brunie
Content-Based Image Retrieval (CBIR) has attracted much attention of the research community. As exact matching is not possible with image retrieval, the approach is to use similarity-based matching using the global features of the entire image to compute a similarity score between two images. Equally important is the use of salient-objects: objects in an image that are of particular interest, as the basis of similarity-based computation. However, the current works on CBIR do not address very well the issues related to salient-objects. In this work, we propose a data repository model so that spatial features of salient objects are captured. Moreover, we propose an extension to the similarity-based selection operator defined earlier to allow salient object based selection. We also propose spatial operators that can be used to compute spatial relations between an image and its contained salient objects. To demonstrate the viability of our proposals, we extend a previous system named EMIMS, to develop EMIMS-S (Extended Medical Image Management System to support Salient objects). We also experimentally evaluate the retrieval effectiveness of salient-objects-based image queries. Keywords: Salient-object-based image retrieval, image database, image data model, similarity-based algebra, spatial relation of salient-objects.

Citation Context

... query processing have been studied by the database community [4, 3, 1]. Currently, several CBIR systems are in use. Most of these systems use low-level features (color, texture, shape). As stated in [2, 4], each system has made some contribution to this research field. However, these works mostly concentrated on retrieval using the entire image. Moreover, not much attention was given to the modeling an...
