Results 1 - 10
of
84
Combining Fuzzy Information From Multiple Systems
, 1996
"... : In a traditional database system, the result of a query is a set of values (those values that satisfy the query). In other data servers, such as a system with queries based on image content, or many text retrieval systems, the result of a query is a sorted list. For example, in the case of a syste ..."
Abstract
-
Cited by 296 (6 self)
- Add to MetaCart
: In a traditional database system, the result of a query is a set of values (those values that satisfy the query). In other data servers, such as a system with queries based on image content, or many text retrieval systems, the result of a query is a sorted list. For example, in the case of a system with queries based on image content, the query might ask for objects that are a particular shade of red, and the result of the query would be a sorted list of objects in the database, sorted by how well the color of the object matches that given in the query. A multimedia system must somehow synthesize both types of queries (those whose result is a set, and those whose result is a sorted list) in a consistent manner. In this paper we discuss the solution adopted by Garlic, a multimedia information system being developed at the IBM Almaden Research Center. This solution is based on "graded" (or "fuzzy") sets. Issues of efficient query evaluation in a multimedia system are very different fro...
Evaluating Top-k Selection Queries
- In VLDB
, 1999
"... In many applications, users specify target values for certain attributes, without requiring exact matches to these values in return. Instead, the result to such queries is typically a rank of the "top k" tuples that best match the given attribute values. In this paper, we study the advantages and li ..."
Abstract
-
Cited by 117 (4 self)
- Add to MetaCart
In many applications, users specify target values for certain attributes, without requiring exact matches to these values in return. Instead, the result to such queries is typically a rank of the "top k" tuples that best match the given attribute values. In this paper, we study the advantages and limitations of processing a top-k query by translating it into a single range query that traditional relational DBMSs can process e#ciently. In particular, we study how to determine a range query to evaluate a top-k query by exploiting the statistics available to a relational DBMS, and the impact of the quality of these statistics on the retrieval e#ciency of the resulting scheme. 1 Introduction Internet Search engines rank the objects in the results of selection queries according to how well these objects match the original selection condition. For such engines, query results are not flat sets of objects that match a given condition. Instead, query results are ranked starting ...
COMBINING APPROACHES TO INFORMATION RETRIEVAL
"... The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “meta-search” engines used on the W ..."
Abstract
-
Cited by 76 (1 self)
- Add to MetaCart
The combination of different text representations and search strategies has become a standard technique for improving the effectiveness of information retrieval. Combination, for example, has been studied extensively in the TREC evaluations and is the basis of the “meta-search” engines used on the Web. This paper examines the development of this technique, including both experimental results and the retrieval models that have been proposed as formal frameworks for combination. We show that combining approaches for information retrieval can be modeled as combining the outputs of multiple classifiers based on one or more representations, and that this simple model can provide explanations for many of the experimental results. We also show that this view of combination is very similar to the inference net model, and that a new approach to retrieval based on language models supports combination and can be integrated with the inference net model.
Modeling Score Distributions for Combining the Outputs of Search Engines
, 2001
"... In this paper the score distributions of a number of text search engines are modeled. It is shown empirically that the score distributions on a per query basis may be fitted using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant docu ..."
Abstract
-
Cited by 72 (4 self)
- Add to MetaCart
In this paper the score distributions of a number of text search engines are modeled. It is shown empirically that the score distributions on a per query basis may be fitted using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. Experiments show that this model fits TREC-3 and TREC-4 data for not only probabilistic search engines like INQUERY but also vector space search engines like SMART for English. We have also used this model to fit the output of other search engines like LSI search engines and search engines indexing other languages like Chinese. It is then shown that given a query for which relevance information is not available, a mixture model consisting of an exponential and a normal distribution can be fitted to the score distribution. These distributions can be used to map the scores of a search engine to probabilities. We also discuss how the shape of the score distributions arise given certain assumptions about word distributions in documents. We hypothesize that all 'good' text search engines operating on any language have similar characteristics. This model has many possible applications. For example, the outputs of different search engines can be combined by averaging the probabilities (optimal if the search engines are independent) or by using the probabilities to select the best engine for each query. Results show that the technique performs as well as the best current combination techniques. This material is based on work supported in part by the National Science Foundation, Library of Congress and Department of Commerce under cooperative agreement number EEC-9209623, in part by the National Science Foundation under grant numbers IRI-9619117 and IIS-9909073, in part by N...
Data Integration Using Similarity Joins and a Word-Based Information Representation Language
- ACM TRANSACTIONS ON INFORMATION SYSTEMS
, 2000
"... ..."
Automated ranking of database query results
- In CIDR
, 2003
"... We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our proposed solution is domain independent. It leverages data and workload statistics and correlatio ..."
Abstract
-
Cited by 67 (8 self)
- Add to MetaCart
We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our proposed solution is domain independent. It leverages data and workload statistics and correlations. Our ranking functions can be further customized for different applications. We present results of preliminary experiments which demonstrate the efficiency as well as the quality of our ranking system. 1.
Multi-dimensional Selectivity Estimation Using Compressed Histogram
- In SIGMOD
, 1999
"... The database query optimizer requires the estimation of the query selectivity to find the most efficient access plan. For queries referencing multiple attributes from the same relation, we need a multi-dimensional selectivity estimation technique when the attributes are dependent each other because ..."
Abstract
-
Cited by 51 (1 self)
- Add to MetaCart
The database query optimizer requires the estimation of the query selectivity to find the most efficient access plan. For queries referencing multiple attributes from the same relation, we need a multi-dimensional selectivity estimation technique when the attributes are dependent each other because the selectivity is determined by the joint data distribution of the attributes. Additionally, for multimedia databases, there are intrinsic requirements for the multi-dimensional selectivity estimation because feature vectors are stored in multi-dimensional indexing trees. In the 1-dimensional case, a histogram is practically the most preferable. In the multi-dimensional case, however, a histogram is not adequate because of high storage overhead and high error rates. In this paper, we propose a novel approach for the multidimensional selectivity estimation. Compressed information from a large number of small-sized histogram buckets is maintained using the discrete cosine transform. This ena...
Lower bounds for high dimensional nearest neighbor search and related problems
, 1999
"... In spite of extensive and continuing research, for various geometric search problems (such as nearest neighbor search), the best algorithms known have performance that degrades exponentially in the dimension. This phenomenon is sometimes called the curse of dimensionality. Recent results [38, 37, 40 ..."
Abstract
-
Cited by 47 (2 self)
- Add to MetaCart
In spite of extensive and continuing research, for various geometric search problems (such as nearest neighbor search), the best algorithms known have performance that degrades exponentially in the dimension. This phenomenon is sometimes called the curse of dimensionality. Recent results [38, 37, 40] show that in some sense it is possible to avoid the curse of dimensionality for the approximate nearest neighbor search problem. But must the exact nearest neighbor search problem suffer this curse? We provide some evidence in support of the curse. Specifically we investigate the exact nearest neighbor search problem and the related problem of exact partial match within the asymmetric communication model first used by Miltersen [43] to study data structure problems. We derive non-trivial asymptotic lower bounds for the exact problem that stand in contrast to known algorithms for approximate nearest neighbor search. 1
A flexible image database system for content-based retrieval
- IN 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION
, 1999
"... There is a growing need for the ability to query image databases based on similarity of image content rather than strict keyword search. As distance computations can be expensive, there is a need for indexing systems and algorithms that can eliminate candidate images without performing distance calc ..."
Abstract
-
Cited by 33 (4 self)
- Add to MetaCart
There is a growing need for the ability to query image databases based on similarity of image content rather than strict keyword search. As distance computations can be expensive, there is a need for indexing systems and algorithms that can eliminate candidate images without performing distance calculations. As user needs may change from session to session, there is also a need for runtime creation of distance measures. In this paper, we present FIDS, “flexible image database system. ” FIDS allows the user to query the database based on complex combinations of dozens of predefined distance measures. Using an indexing scheme and algorithms based on the triangle inequality, FIDS can often return matches to the query image without directly comparing the query image to more than a small percentage of the database. This paper describes the technical contributions of the FIDS approach to content-based image retrieval.
Reasoning with very expressive fuzzy description logics
- Journal of Artificial Intelligence Research
"... It is widely recognized today that the management of imprecision and vagueness will yield more intelligent and realistic knowledge-based applications. Description Logics (DLs) are a family of knowledge representation languages that have gained considerable attention the last decade, mainly due to th ..."
Abstract
-
Cited by 32 (16 self)
- Add to MetaCart
It is widely recognized today that the management of imprecision and vagueness will yield more intelligent and realistic knowledge-based applications. Description Logics (DLs) are a family of knowledge representation languages that have gained considerable attention the last decade, mainly due to their decidability and the existence of empirically high performance of reasoning algorithms. In this paper, we extend the well known fuzzy ALC DL to the fuzzy SHIN DL, which extends the fuzzy ALC DL with transitive role axioms (S), inverse roles (I), role hierarchies (H) and number restrictions (N). We illustrate why transitive role axioms are difficult to handle in the presence of fuzzy interpretations and how to handle them properly. Then we extend these results by adding role hierarchies and finally number restrictions. The main contributions of the paper are the decidability proof of the fuzzy DL languages fuzzy-SI and fuzzy-SHIN, as well as decision procedures for the knowledge base satisfiability problem of the fuzzy-SI and fuzzy-SHIN. 1.

