Results 1 - 10
of
87
Support vector machine active learning for image retrieval
, 2001
"... Relevance feedback is often a critical component when designing image databases. With these databases it is difficult to specify queries directly and explicitly. Relevance feedback interactively determinines a user’s desired output or query concept by asking the user whether certain proposed images ..."
Abstract
-
Cited by 456 (28 self)
- Add to MetaCart
(Show Context)
Relevance feedback is often a critical component when designing image databases. With these databases it is difficult to specify queries directly and explicitly. Relevance feedback interactively determinines a user’s desired output or query concept by asking the user whether certain proposed images are relevant or not. For a relevance feedback algorithm to be effective, it must grasp a user’s query concept accurately and quickly, while also only asking the user to label a small number of images. We propose the use of a support vector machine active learning algorithm for conducting effective relevance feedback for image retrieval. The algorithm selects the most informative images to query a user and quickly learns a boundary that separates the images that satisfy the user’s query concept from the rest of the dataset. Experimental results show that our algorithm achieves significantly higher search accuracy than traditional query refinement schemes after just three to four rounds of relevance feedback.
Evaluating Top-k Queries over Web-Accessible Databases
- ACM TRANS. ON DATABASE SYSTEMS
, 2004
"... ... In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but ..."
Abstract
-
Cited by 253 (15 self)
- Add to MetaCart
(Show Context)
... In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes sourceaccess parallelism to minimize query response time, while satisfying source-access constraints.
Efficient IR-Style Keyword Search over Relational Databases
- In VLDB
, 2003
"... Applications in which plain text coexists with structured data are pervasive. Commercial relational database management systems (RDBMSs) generally provide querying capabilities for text attributes that incorporate state-of-the-art information retrieval (IR) relevance ranking strategies, but this sea ..."
Abstract
-
Cited by 211 (10 self)
- Add to MetaCart
Applications in which plain text coexists with structured data are pervasive. Commercial relational database management systems (RDBMSs) generally provide querying capabilities for text attributes that incorporate state-of-the-art information retrieval (IR) relevance ranking strategies, but this search functionality requires that queries specify the exact column or columns against which a given list of keywords is to be matched.
The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces
- In Proceedings of ICDE’99
, 1999
"... Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search ..."
Abstract
-
Cited by 119 (13 self)
- Add to MetaCart
(Show Context)
Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing, none of them is known to scale beyond 10-15 dimensional spaces. This paper introduces the hybrid tree – a multidimensional data structure for indexing high dimensional feature spaces. Unlike other multidimensional data structures, the hybrid tree cannot be classified as either a pure data partitioning (DP) index structure (e.g., R-tree, SS-tree, SRtree) or a pure space partitioning (SP) one (e.g., KDB-tree, hBtree); rather, it “combines ” positive aspects of the two types of index structures a single data structure to achieve search performance more scalable to high dimensionalities than either of the above techniques (hence, the name “hybrid”). Furthermore, unlike many data structures (e.g., distance based index structures like SS-tree, SR-tree), the hybrid tree can support queries based on arbitrary distance functions. Our experiments on “real” high dimensional large size feature databases demonstrate that the hybrid tree scales well to high dimensionality and large database sizes. It significantly outperforms both purely DPbased and SP-based index mechanisms as well as linear scan at all dimensionalities for large sized databases. 1.
J.S.: Supporting incremental join queries on ranked inputs
- In: Proc. 27th Intl. Conference on Very Large Databases (VLDB ’01
, 2001
"... This paper investigates the problem of incremental joins of multiple ranked data sets when the join condition is a list of arbitrary user-defined predicates on the input tuples. This problem arises in many important applications dealing with ordered inputs and multiple ranked data sets, and requirin ..."
Abstract
-
Cited by 85 (7 self)
- Add to MetaCart
(Show Context)
This paper investigates the problem of incremental joins of multiple ranked data sets when the join condition is a list of arbitrary user-defined predicates on the input tuples. This problem arises in many important applications dealing with ordered inputs and multiple ranked data sets, and requiring the top k solutions. We use multimedia applications as the motivating examples but the problem is equally applicable to traditional database applications involving optimal resource allocation, scheduling, decision making, ranking, etc. We propose an algorithm J that enables querying of ordered data sets by imposing arbitrary userdefined join predicates. The basic version of the algorithm does not use any random access but a JPA variation can exploit available indexes for efficient random access based on the join predicates. A special case includes the join scenario considered by Fagin [1] for joins based on identical keys, and in that case, our algorithms perform as efficiently as Fagin’s. Our main contribution, however, is the generalization to join scenarios that were previously unsupported, including cases where random access in the algorithm is not possible due to lack of unique keys. In addition, J can support multiple join levels, or nested join hierarchies, which are the norm for modeling multimedia data. We also give-approximation versions of both of the above algorithms. Finally, we give strong optimality results for some of the proposed algorithms, and we study their performance empirically.
The truth about Corel -- evaluation in image retrieval
- IN PROCEEDINGS OF THE CHALLENGE OF IMAGE AND VIDEO RETRIEVAL (CIVR2002
, 2002
"... To demonstrate the performance of content-based image retrieval systems (CBIRSs), there is not yet any standard data set that is widely used. The only dataset used by a large number of research groups are the Corel Photo CDs. There are more than 800 of those CDs, each containing 100 pictures roughl ..."
Abstract
-
Cited by 52 (1 self)
- Add to MetaCart
To demonstrate the performance of content-based image retrieval systems (CBIRSs), there is not yet any standard data set that is widely used. The only dataset used by a large number of research groups are the Corel Photo CDs. There are more than 800 of those CDs, each containing 100 pictures roughly similar in theme. Unfortunately, basically every evaluation is done on a different subset of the image sets thus making comparison impossible. In this article, we compare different ways of evaluating the performance using a subset of the Corel images with the same CBIRS and the same set of evaluation measures. The aim is to show how easy it is to get differing results, even when using the same image collection, the same CBIRS and the same performance measures. This pinpoints the fact that we need a standard database of images with a query set and corresponding relevance judgments (RJs) to really compare systems. The techniques used in this article to “enhance ” the apparent performance of a CBIRS are commonly used, sometimes described, sometimes not. They all have a justification and seem to change the performance of a CBIRS but they do actually not. With a larger subset of images it is of course much easier to generate even bigger differences in performance. The goal of this article is not to be a guide of how to make the “apparent ” performance of systems look good, but rather to make readers aware of CBIRS evaluations and the importance of standardized image databases, queries and RJ.
Optimizing top-k selection queries over multimedia repositories
, 2003
"... Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated wi ..."
Abstract
-
Cited by 46 (2 self)
- Add to MetaCart
Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, which indicates how well the object matches the selection condition (ranking). Further- more, unlike in the relational model, users may just want the k top-ranked objects for their selection queries, for a relatively small k. In addition to the differences in the query model, another peculiarity of multimedia repositories is that they may allow access to the attributes of each object only through indexes. In this paper, we investigate how to optimize the processing of top-k selection queries over multimedia repositories. The access characteristics of the repositories and the above query model lead to novel issues in query optimization. In particular, the choice of the indexes used to search the repos- itory strongly influences the cost of processing the filtering condition. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we present an efficient algorithm that solves the problem optimally with respect to our cost model and execution space when the predicates in the query are independent. We also show that the problem of optimizing top-k selection queries can be viewed, in many cases, as that of evaluating more traditional selection conditions. Thus,
Support vector machine concept-dependent active learning for image retrieval
- IEEE Transactions on Multimedia
, 2005
"... Relevance feedback is a critical component when designing image databases. With these databases it is difficult to specify queries directly and explicitly. Relevance feedback interactively learns a user’s desired output or query concept by asking the user whether certain proposed images are relevant ..."
Abstract
-
Cited by 34 (2 self)
- Add to MetaCart
(Show Context)
Relevance feedback is a critical component when designing image databases. With these databases it is difficult to specify queries directly and explicitly. Relevance feedback interactively learns a user’s desired output or query concept by asking the user whether certain proposed images are relevant or not. For a learning algorithm to be effective, it must learn a user’s query concept accurately and quickly, while also asking the user to label only a small number of images. In addition, the concept-learning algorithm should consider the complexity a concept in determining its learning strategies. In this paper, we present the use of support vector machines active learning in a concept-dependent way (SVM CD Active) for conducting relevance feedback. We characterize a concept’s complexity using three measures: hit-rate, isolation and diversity. To reduce concept complexity so as to improve concept learnability, we propose a multimodal learning approach that uses images ’ semantic labels to intelligently adjust the sampling strategy and the sampling pool of SVM CD Active. Our empirical study on several datasets shows that active learning outperforms traditional passive learning, and concept-dependent learning is superior to the traditional conceptindependent relevance-feedback schemes.