Support vector machine active learning for image retrieval
2001
Cited by 248 (22 self)
Relevance feedback is often a critical component when designing image databases. With these databases it is difficult to specify queries directly and explicitly. Relevance feedback interactively determines a user’s desired output or query concept by asking the user whether certain proposed images are relevant or not. For a relevance feedback algorithm to be effective, it must grasp a user’s query concept accurately and quickly, while also only asking the user to label a small number of images. We propose the use of a support vector machine active learning algorithm for conducting effective relevance feedback for image retrieval. The algorithm selects the most informative images to query a user and quickly learns a boundary that separates the images that satisfy the user’s query concept from the rest of the dataset. Experimental results show that our algorithm achieves significantly higher search accuracy than traditional query refinement schemes after just three to four rounds of relevance feedback.
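The selection step in this abstract can be illustrated with a small sketch. The abstract does not state the exact selection rule, so the heuristic here (query the unlabeled images whose classifier score is closest to the decision boundary) is an assumption, and the function name `select_queries` and the toy scores are hypothetical.

```python
from typing import List, Sequence, Set


def select_queries(scores: Sequence[float], labeled: Set[int], k: int) -> List[int]:
    """Return indices of the k unlabeled items closest to the decision boundary.

    scores[i] is the signed classifier output for image i (assumed here:
    positive means "relevant"). A small absolute value means the classifier
    is unsure, so a label for that image is assumed to be most informative.
    """
    candidates = [i for i in range(len(scores)) if i not in labeled]
    # Smallest |score| first: these are the most ambiguous images.
    candidates.sort(key=lambda i: abs(scores[i]))
    return candidates[:k]


# Toy feedback round: image 4 is already labeled, so ask about two others.
scores = [2.0, -0.1, 0.5, -3.0, 0.05]
to_ask = select_queries(scores, labeled={4}, k=2)
print(to_ask)
```

In a full loop, the scores would come from a classifier retrained after each round on all labels collected so far.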
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
SIGKDD'02, 2002
Cited by 169 (41 self)
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real-world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point
Automated ranking of database query results
In CIDR, 2003
Cited by 67 (8 self)
We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our proposed solution is domain independent. It leverages data and workload statistics and correlations. Our ranking functions can be further customized for different applications. We present results of preliminary experiments which demonstrate the efficiency as well as the quality of our ranking system.
A novel log-based relevance feedback technique in content-based image retrieval
2004
Cited by 29 (10 self)
Relevance feedback has been proposed as an important technique to boost the retrieval performance in content-based image retrieval (CBIR). However, since there exists a semantic gap between low-level features and high-level semantic concepts in CBIR, typical relevance feedback techniques need many rounds of feedback to achieve satisfactory results. These procedures are time-consuming and may bore users during retrieval tasks. For the purpose of long-term study in CBIR, we notice that the users’ feedback logs can be made available and employed to help the retrieval tasks in CBIR systems. In this paper, we propose a novel scheme to study the log-based relevance feedback (LRF) technique for improving retrieval performance and reducing the semantic gap in CBIR. In order to effectively incorporate the users’ feedback logs, we propose a modified support vector machine (SVM) technique called soft label support vector machine (SLSVM) to construct the LRF algorithm in CBIR. We conduct extensive experiments to evaluate the performance of our proposed algorithm. Compared with the typical approach using the query expansion (QEX) technique, detailed experiments demonstrate that our proposed scheme can significantly improve the performance of semantic image retrieval.
Qcluster: Relevance Feedback Using Adaptive Clustering for Content-Based Image Retrieval
In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, 2003
Cited by 20 (0 self)
The learning-enhanced relevance feedback has been one of the most active research areas in content-based image retrieval in recent years. However, few methods using the relevance feedback are currently available to process relatively complex queries on large image databases. In the case of complex image queries, the feature space and the distance function of the user's perception are usually different from those of the system. This difference leads to the representation of a query with multiple clusters (i.e., regions) in the feature space. Therefore, it is necessary to handle disjunctive queries in the feature space.
Probabilistic information retrieval approach for ranking of database query results
ACM Transactions on Database Systems (TODS), 2006
Cited by 18 (4 self)
We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries, by adapting and applying principles of probabilistic models from Information Retrieval for structured data. Our solution is domain independent and leverages data and workload statistics and correlations. We evaluate the quality of our approach with a user survey on a real database. Furthermore, we present and experimentally evaluate algorithms to efficiently retrieve the top ranked results, which demonstrate the feasibility of our ranking system.
MEGA --- The Maximizing Expected Generalization Algorithm for Learning Complex Query Concepts
ACM Transactions on Information Systems, 2000
Cited by 14 (7 self)
Specifying exact query concepts has become increasingly challenging to end-users. This is because many query concepts (e.g., those for looking up a multimedia object) can be hard to articulate, and articulation can be subjective. In this study, we propose a query-concept learner that learns query criteria through an intelligent sampling process. Our concept learner aims to fulfill two primary design objectives: 1) it has to be expressive in order to model most practical query concepts, and 2) it must learn a concept quickly and with a small number of labeled data since online users tend to be too impatient to provide much feedback. To fulfill the first goal, we model query concepts in k-CNF, which can express almost all practical query concepts. To fulfill the second design goal, we propose our maximizing expected generalization algorithm (MEGA), which converges to target concepts quickly by its two complementary steps: sample selection and concept refinement. We also propose a divide-and-conquer method that divides the concept-learning task into G subtasks to achieve speedup. We notice that a task must be divided carefully, or search accuracy may suffer. We thus employ a genetic-based mining algorithm to discover good feature groupings. Through analysis and mining results, we observe that organizing image features in a multi-resolution manner, and minimizing intragroup feature correlation, can speed up query-concept learning substantially while maintaining high search accuracy. Through examples, analysis, experiments, and a prototype implementation, we show that MEGA converges to query concepts significantly faster than traditional methods. Keywords: query concept, relevance feedback, active learning, data mining.
An Adaptive Recommendation System without Explicit Acquisition of User Relevance Feedback
Distributed and Parallel Databases, 2003
Cited by 13 (2 self)
Recommendation systems are widely adopted in e-commerce businesses for helping customers locate products they would like to purchase. In an earlier work, we introduced a recommendation system, termed Yoda, which employs a hybrid approach that combines collaborative filtering (CF) and content-based querying to achieve higher accuracy for large-scale Web-based applications. To reduce the complexity of the hybrid approach, Yoda is structured as a tunable model that is trained off-line and employed for real-time recommendation on-line. The on-line process benefits from an optimized aggregation function with low complexity that allows the real-time aggregation based on confidence values of an active user to pre-defined sets of recommendations. In this paper, we extend Yoda to include more recommendation sets. The recommendation sets can be obtained from different sources, such as human experts, web navigation patterns, and clusters of user evaluations. Moreover, the extended Yoda can learn the confidence values automatically by utilizing implicit users' relevance feedback through web navigations using genetic algorithms (GA). Our end-to-end experiments show that while Yoda's complexity is low and remains constant as the number of users and/or items grows, its accuracy surpasses that of the basic nearest-neighbor method by a wide margin (in most cases more than 100%). The experimental results also indicate that the retrieval accuracy is significantly increased by using the GA-based learning mechanism.
Evaluating Refined Queries in Top-k Retrieval Systems
IEEE Transactions on Knowledge and Data Engineering, 2003
Cited by 12 (2 self)
In many applications, users specify target values for certain attributes/features without requiring exact matches to these values in return. Instead, the result is typically a ranked list of "top k" objects that best match the specified feature values. User subjectivity is an important aspect of such queries, i.e., which objects are relevant to the user and which are not depends on the perception of the user. Due to the subjective nature of top-k queries, the answers returned by the system to a user query often do not satisfy the user's need right away, either because the weights and the distance functions associated with the features do not accurately capture the user's perception or because the specified target values do not fully capture her information need, or both. In such cases, the user would like to refine the query and resubmit it in order to get back a better set of answers. While there has been a lot of research on query refinement models, there is no work that we are aware of on supporting refinement of top-k queries efficiently in a database system. Done naively, each "refined" query can be treated as a "starting" query and evaluated from scratch. This paper explores alternative approaches that significantly reduce the cost of evaluating refined queries by exploiting the observation that the refined queries are not modified drastically from one iteration to another. Our experiments over a real-life multimedia data set show that the proposed techniques save more than 80 percent of the execution cost of refined queries over the naive approach and are more than an order of magnitude faster than a simple sequential scan.
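The naive baseline this abstract describes, in which each refined query is evaluated from scratch, might look like the sketch below. It is only an illustration: the weighted Euclidean distance, the function name `top_k`, and the toy data are assumptions, and the paper's incremental techniques are not reproduced here.

```python
import heapq
import math
from typing import List, Sequence


def top_k(objects: Sequence[Sequence[float]], target: Sequence[float],
          weights: Sequence[float], k: int) -> List[int]:
    """Return indices of the k objects nearest to target by a full scan."""
    def distance(obj: Sequence[float]) -> float:
        # Weighted Euclidean distance (an assumed choice of distance function).
        return math.sqrt(sum(w * (o - t) ** 2
                             for o, t, w in zip(obj, target, weights)))

    return heapq.nsmallest(k, range(len(objects)),
                           key=lambda i: distance(objects[i]))


objects = [(0.0, 1.0), (1.0, 0.0), (3.0, 3.0)]
target = (0.0, 0.0)

# Starting query: the first feature dominates the distance.
first = top_k(objects, target, weights=(1.0, 0.1), k=1)
# Refined query: the user shifts the emphasis to the second feature,
# and the whole collection is scanned again.
refined = top_k(objects, target, weights=(0.1, 1.0), k=1)
print(first, refined)
```

Here a small change in the weights changes the top answer, yet the naive approach still rescans every object on each refinement.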
Relevance Feedback in Multimedia Databases
In Handbook of Video Databases: Design and Applications, 2003
Cited by 9 (0 self)
INTRODUCTION The popularity of web search engines has familiarized countless users with the similarity search paradigm. In this paradigm a user provides an example or simple sketch of desired information to a system and receives a list of items that "best" match the information provided. These results are typically sorted by a system-generated estimate of how closely they match the sketch/requirement provided by users. Consider a typical web search engine. The users' sketch takes the form of keywords and the search engine finds the web pages that best match those keywords. User expectations have grown to demand powerful and flexible search capabilities for multimedia data such as images and video in addition to the traditional unstructured web pages. Consider a user searching for pictures depicting a "sunset by the sea" in an image database. One possibility is to attach a text description to each image and use standard text search engine techniques to find the results. The proble

