Optimised Phrase Querying and Browsing of Large Text Databases
- Proc. Australasian Computer Science Conference
, 2001
Cited by 3 (2 self)
Most search systems for querying large document collections---for example, web search engines---are based on well-understood information retrieval principles. These systems are both efficient and effective in finding answers to many user information needs, expressed through informal ranked or structured Boolean queries. Phrase querying and browsing are additional techniques that can augment or replace conventional querying tools. In this paper, we propose optimisations for phrase querying with a nextword index, an efficient structure for phrase-based searching. We show that careful consideration of which search terms are evaluated in a query plan and optimisation of the order of evaluation of the plan can reduce query evaluation costs by more than a factor of five. We conclude that, for phrase querying and browsing with nextword indexes, an ordered query plan should be used for all browsing and querying. Moreover, we show that optimised phrase querying is practical on large text collections.
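The ordered query plan described above can be illustrated with a minimal Python sketch. It assumes a nextword index is a mapping from each first word to its following words, each with a list of positions where that pair occurs. The names `build_nextword_index` and `phrase_query` are hypothetical, and the sketch only orders pair evaluation by posting-list length; it does not reproduce the paper's selection of which terms to evaluate.

```python
from collections import defaultdict


def build_nextword_index(docs):
    """Map firstword -> nextword -> list of (doc_id, position of firstword)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for pos in range(len(words) - 1):
            index[words[pos]][words[pos + 1]].append((doc_id, pos))
    return index


def phrase_query(index, phrase):
    """Return sorted (doc_id, start_position) matches for a phrase of 2+ words."""
    words = phrase.lower().split()
    if len(words) < 2:
        raise ValueError("this sketch handles phrases of two or more words")
    # One (offset, postings) entry per adjacent word pair in the phrase.
    pairs = []
    for offset in range(len(words) - 1):
        postings = index.get(words[offset], {}).get(words[offset + 1], [])
        pairs.append((offset, postings))
    # Ordered plan (assumption): evaluate the shortest posting list first,
    # so the candidate set shrinks as early as possible.
    pairs.sort(key=lambda pair: len(pair[1]))
    candidates = None
    for offset, postings in pairs:
        # Normalise each pair position to the position where the phrase starts.
        starts = {(doc_id, pos - offset) for doc_id, pos in postings}
        candidates = starts if candidates is None else candidates & starts
        if not candidates:
            break
    return sorted(candidates)
```

Evaluating the rarest pair first means later, longer posting lists are only intersected against a small candidate set, which is the intuition behind ordering the plan.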
Partial match queries in random k-d trees
- SIAM Journal on Computing
, 2005
Cited by 3 (0 self)
Abstract. We solve the open problem of characterizing the leading constant in the asymptotic approximation to the expected cost used for random partial match queries in random k-d trees. Our approach is new and of some generality; in particular, it is applicable to many problems involving differential equations (or difference equations) with polynomial coefficients. Key words. k-d trees, partial-match queries, differential equations, average-case analysis of algorithms, method of linear operators, asymptotic analysis. AMS subject classifications. 68W40 68P05 68P10 68U05 1. Introduction. Multidimensional
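For readers unfamiliar with the operation being analysed, the following Python sketch shows a partial match query on a k-d tree built by successive insertion. It is an illustration only, not the paper's analysis: the names `Node`, `insert`, and `partial_match` are hypothetical, unspecified coordinates are written as `None`, and the number of visited nodes is used as a stand-in for the query cost.

```python
class Node:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(root, point, depth=0):
    """Insert a point, discriminating on coordinate (depth mod k)."""
    if root is None:
        return Node(point)
    axis = depth % len(point)
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root


def partial_match(root, query, depth=0):
    """Return (matches, visited_nodes); None in query means 'any value'."""
    if root is None:
        return [], 0
    matches = []
    if all(q is None or q == p for q, p in zip(query, root.point)):
        matches.append(root.point)
    visited = 1
    axis = depth % len(query)
    q = query[axis]
    if q is None:
        # Unspecified coordinate: both subtrees may contain matches.
        children = [root.left, root.right]
    elif q < root.point[axis]:
        children = [root.left]
    else:
        children = [root.right]
    for child in children:
        sub_matches, sub_visited = partial_match(child, query, depth + 1)
        matches += sub_matches
        visited += sub_visited
    return matches, visited
```

The cost depends on how often the search must follow both subtrees, which happens exactly at the levels whose discriminating coordinate is unspecified.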
Inverting the Database
- Final Report for NSF Grant IRI
, 2001
Cited by 2 (1 self)
We wish to propose a database architecture combining a general view of bioinformatics data as a graph of nodes (data objects) and edges (data relationships), with the efficiency and robustness of data management and query provided by indexing and generic programming techniques. We refer to this architecture as "inverting the database" because it replaces tabular schema---the primary interface of relational databases ---with the ability to query relationships via indexes, which are ordinarily hidden in relational query languages. In most database systems the index has second-class status: an index cannot be explicitly referenced in a query. This treatment of indexes has been adopted for decades, apparently as a reaction to the complexities introduced by explicit access paths and navigational queries. In many applications today, however, the existence of an index is vital for retrieving information. The second-class status of indexing stands in contrast to its increasing importance. In this paper we invert the role of the index, and make it a first-class citizen in the query language. It is possible to do this in a structured way, allowing users to mention indexes explicitly without yielding to a procedural query model, by converting functional relations into indexes (explicit functions). In the limit, the database becomes a graph, in which the edges are these indexes. Function composition can be specified either explicitly or implicitly as path queries. The net effect of the inversion is to convert the database into a hyperdatabase: a database of databases, connected by indexes or functions. The inversion approach was motivated by our work in biological databases, for which hyperdatabases are a good model. The need for a good model has slowed progress in bioinformatics.
A Semantic Caching Method Based on Linear Constraints
- In Proc. of International Symposium on Database Applications in Non-Traditional Environments (DANTE’99)
, 1999
Cited by 2 (0 self)
Because performance is a crucial issue in database systems, data caching techniques have been studied in the database research field, especially for client-server and distributed databases. Recently, the idea of semantic caching has been proposed. This approach uses semantic information to describe cached data items, exploiting not only temporal locality but also semantic locality to improve query response time. In this paper, we propose linear constraint-based semantic caching as a new approach to semantic caching. Based on the idea of constraint databases, we describe the semantic information about the cached relational tuples as compact constraint tuples. The main focus of this paper is the representation of cache information and the cache examination algorithm. 1. Introduction Data caching has been investigated in various fields of database research such as client-server databases [5, 18], data warehouses [6], and distributed and heterogeneous database...
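A heavily simplified Python sketch of cache examination may help make the idea concrete. It assumes each cached region and each query is described only by per-attribute bounds, which are a special case of linear constraints; general linear constraints would need a proper containment test and are not handled here. The names `contains` and `examine_cache` and the example attributes are hypothetical and are not taken from the paper.

```python
def contains(cached, query):
    """True if every tuple satisfying `query` also satisfies `cached`.

    Both arguments map an attribute name to an inclusive (low, high) bound.
    An attribute missing from a description is treated as unconstrained.
    """
    unbounded = (float("-inf"), float("inf"))
    for attr, (c_lo, c_hi) in cached.items():
        q_lo, q_hi = query.get(attr, unbounded)
        if q_lo < c_lo or q_hi > c_hi:
            return False
    return True


def examine_cache(cache, query):
    """Return the key of a cached region that fully answers the query, or None."""
    for key, constraint in cache.items():
        if contains(constraint, query):
            return key
    return None
```

For example, a cache entry described by `{"age": (20, 40)}` can answer a query for `{"age": (25, 30)}` locally, whereas a query for `{"age": (25, 45)}` would have to go to the server in this sketch.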
Research Challenges In Information Access And Dissemination in a Mobile Environment
- In Proceedings of the PanYellow-Sea International Workshop on Information Technologies for Network Era
, 2002
Cited by 1 (0 self)
A wireless environment has both benefits and limitations when it is used as a medium for information access and dissemination. In a wireless environment, we can access not only information anytime and anywhere, as if we were connected to the Internet, but also information tailored to our particular needs depending on where we are located. This leads to location-dependent information services (e.g., querying for local traffic, restaurants, etc.). The capability of the system is also enhanced by the fact that genuine broadcast (as opposed to multicast) can be realized in a wireless environment, making massive broadcast of commonly requested data a simple and inexpensive task. On the other hand, we have to overcome many limitations, such as limited bandwidth and the high cost of data transmission (both in terms of power consumption on the client devices and system bandwidth). In this paper, we give an overview of our research in this area. We then discuss our recent work on query processing in location-dependent information services and outline our future research.
Fast High-Dimensional Data Search in Incomplete Databases
, 1998
Cited by 1 (0 self)
We propose and evaluate two indexing schemes for improving the efficiency of data retrieval in high-dimensional databases that are incomplete. These schemes are novel in that the search keys may contain missing attribute values. The first is a multi-dimensional index structure, called the Bitstring-augmented R-tree (BR-tree), whereas the second comprises a family of multiple one-dimensional one-attribute (MOSAIC) indexes. Our results show that both schemes can be superior to exhaustive search. Experimental results suggest that BR-trees have lower update and storage costs and are able to support range queries more efficiently under most circumstances, when compared to the MOSAIC indexing scheme. However, contrary to conventional wisdom, the MOSAIC structure outperforms the BR-tree in retrieval time for point queries, as well as in range queries over incomplete databases for dimension-unrestricted data distributions. 1 Introduction We examine the problem of high-dimensional data sear...
UB-Tree Indexing for Semantic Query Optimization of Range Queries
Cited by 1 (1 self)
Abstract—Semantic query optimization consists in restricting the search space in order to reduce the set of objects of interest for a query. This paper presents an indexing method based on UB-trees and a static analysis of the constraints associated to the views of the database and to any constraint expressed on attributes. The result of the static analysis is a partitioning of the object space into disjoint blocks. Through Space Filling Curve (SFC) techniques, each fragment (block) of the partition is assigned a unique identifier, enabling the efficient indexing of fragments by UB-trees. The search space corresponding to a range query is restricted to a subset of the blocks of the partition. This approach has been developed in the context of a KB-DBMS but it can be applied to any relational system.
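The role of a space-filling curve here can be sketched in Python with the Z-order (Morton) curve, which is the curve UB-trees are built on. The sketch assumes two-dimensional non-negative integer coordinates and uses a sorted list with `bisect` as a stand-in for the B-tree underneath a UB-tree. The names `z_address` and `range_query` are hypothetical, and the paper's constraint-based partitioning into blocks is not reproduced.

```python
import bisect


def z_address(x, y, bits=16):
    """Interleave the bits of x and y into a single Z-order address."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return z


def range_query(points, lo, hi):
    """Return points inside the box [lo, hi], in Z-order.

    Every point in the box has an address between z_address(lo) and
    z_address(hi), so one interval scan plus a filter is sufficient.
    """
    entries = sorted((z_address(x, y), (x, y)) for x, y in points)
    keys = [z for z, _ in entries]
    start = bisect.bisect_left(keys, z_address(*lo))
    end = bisect.bisect_right(keys, z_address(*hi))
    return [p for _, p in entries[start:end]
            if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]
```

This version scans the whole address interval and filters afterwards; a real UB-tree avoids reading the parts of the interval that lie outside the query box.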
Other Requirements
ate students must work independently
2.2 Output of build
The output consists of 2n files, where n is the number of attributes in the schema. The bitmap index is defined above in Section 1, and should be named as described in Section 2.1. The statistics file should include the following information: number of tuples in the relation, number of distinct domain values and the selectivity (percentage) of each, and percentage of unused space (zero values) in the bitmap. The statistics file should be named as described in Section 2.1. Note that you will design the format of each of these files, and thus you will have to document the format you choose in BNF.
2.3 Input to query
The input to query consists of a sequence of simplified queries from standard input. The format of the query file is a series of queries and is described below. All predicates are assumed to be equality predicates. <file format> ::= <q
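A small Python sketch can illustrate the bitmap index and the statistics listed above. It assumes rows are dictionaries, stores each bitmap as a Python integer used as a bit set, and treats a query as a conjunction of equality predicates. The conjunction, the function names, and the in-memory representation are all assumptions made for illustration; the assignment's file formats and naming rules are not modelled.

```python
def build_bitmap_index(rows, attribute):
    """Map each value of `attribute` to a bitmap; bit i is set iff row i has it."""
    bitmaps = {}
    for i, row in enumerate(rows):
        value = row[attribute]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps


def statistics(bitmaps, n):
    """Statistics for one attribute's index over n tuples (n must be positive)."""
    ones = sum(bin(b).count("1") for b in bitmaps.values())
    total_bits = len(bitmaps) * n
    return {
        "tuples": n,
        "distinct_values": len(bitmaps),
        # Selectivity: percentage of tuples carrying each value.
        "selectivity_pct": {v: 100.0 * bin(b).count("1") / n
                            for v, b in bitmaps.items()},
        # Unused space: percentage of zero bits across all bitmaps.
        "unused_pct": 100.0 * (total_bits - ones) / total_bits,
    }


def query(indexes, predicates, n):
    """Row numbers satisfying every attribute == value predicate (ANDed)."""
    result = (1 << n) - 1  # start with all rows selected
    for attribute, value in predicates.items():
        result &= indexes[attribute].get(value, 0)
    return [i for i in range(n) if (result >> i) & 1]
```

With one index per attribute, `query(indexes, {"color": "red", "size": "S"}, n)` reduces to a bitwise AND of two bitmaps.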
Open Constraint Programming
- In Principles and Practice of Constraint Programming – CP’98, 4th International Conference
, 1998
Implementation Model
A central issue behind an implementation of an OCP system is clearly the management of efficient wakeups of blocked reactors. In analogy with active databases, there is a dependency on some primitive triggering mechanism on the part of the constraint system. However, such a mechanism works by considering only the internal state of the system, and not at all on the environment of reactors currently blocked by the constraint system. This gap motivates this section, which is an informal discussion on where the major implementation considerations lie. The underlying philosophy throughout this section is that there are in general a large number of reactors, and each of the frequent updates of the constraint store is of no consequence to most of the reactors.
3.1 Trigger Table
In what follows, we informally say that a reactor r is blocked at Δ if no progress is possible for r using the constraint system Δ. (This can be formalized using the transition rules i...
Indexing Reduced Dimensionality Spaces Using Single Dimensional Indexes
The dimensionality curse has greatly affected the scalability of high-dimensional indexes. A well known approach to improving the indexing performance is dimensionality reduction before indexing the data in the reduced-dimensionality space. However, the reduction may cause loss of distance information when the data set is not globally correlated. To reduce loss of information and degradation of search quality, cluster based dimensionality reduction should be used instead. In this paper, we present an adaptive local dimensionality reduction (LDR) technique which first identifies effective clusters based on Mahalanobis distance, and for each cluster, performs local dimensionality reduction. The data points in each cluster of the reduced-dimensionality space are then transformed into single distance values with reference to the centroid of the cluster, and indexed using a single dimensional index for nearest neighbor search. Unlike an existing LDR technique which uses an index for each cluster, we use one single B+-tree for the whole data set. Extensive performance studies using both real and synthetic data show that the method achieves higher precision compared to existing global dimensionality reduction and local dimensionality reduction methods, and is more efficient in terms of query performance.
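The step of mapping each point to a single distance value and indexing all clusters in one ordered structure can be sketched in Python as follows. The sketch makes several assumptions that are not from the paper: centroids are given in advance, Euclidean distance replaces Mahalanobis distance, no dimensionality reduction is performed, a sorted list stands in for the B+-tree, and the key layout `cluster_id * stride + distance` is one common way to keep clusters in disjoint key ranges (it requires every distance to be smaller than `stride`). It also shows a radius search rather than a full nearest neighbor search.

```python
import bisect
import math


def build_index(points, centroids, stride=1000.0):
    """Sorted list of (key, point), key = cluster_id * stride + distance to centroid."""
    entries = []
    for p in points:
        cid = min(range(len(centroids)),
                  key=lambda i: math.dist(p, centroids[i]))
        entries.append((cid * stride + math.dist(p, centroids[cid]), p))
    entries.sort()
    return entries


def range_search(entries, centroids, q, r, stride=1000.0):
    """Return all indexed points within distance r of q, sorted."""
    keys = [k for k, _ in entries]
    hits = set()
    for cid, c in enumerate(centroids):
        dq = math.dist(q, c)
        # Triangle inequality: a point within r of q has a centroid
        # distance in [dq - r, dq + r], so only that key range is scanned.
        lo = cid * stride + max(0.0, dq - r)
        hi = cid * stride + dq + r
        start = bisect.bisect_left(keys, lo)
        end = bisect.bisect_right(keys, hi)
        for _, p in entries[start:end]:
            if math.dist(p, q) <= r:
                hits.add(p)
    return sorted(hits)
```

Points are expected to be tuples so they can be collected in a set; a nearest neighbor search could be built on top of this by growing `r` until enough candidates are found.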

