Parallel crawlers
- In Proceedings of the 11th International Conference on World Wide Web, 2002
Cited by 71 (3 self)
In this paper we study how we can design an effective parallel crawler. As the size of the Web grows, it becomes imperative to parallelize the crawling process in order to finish downloading pages in a reasonable amount of time. We first propose multiple architectures for a parallel crawler and identify fundamental issues related to parallel crawling. Based on this understanding, we then propose metrics to evaluate a parallel crawler, and compare the proposed architectures using 40 million pages collected from the Web. Our results clarify the relative merits of each architecture and provide a good guideline on when to adopt which architecture.
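The abstract does not say how work is divided among the crawler processes. A minimal sketch, assuming a static assignment by hashing each URL's host name (one common choice, not necessarily any of the architectures the paper proposes); `assign_crawler` and `route_links` are hypothetical names. Links a crawler discovers that belong to another process are returned separately, because what to do with them (drop, crawl anyway, or forward) is one of the design questions a parallel crawler has to settle.

```python
import hashlib
from typing import Dict, List, Tuple
from urllib.parse import urlsplit


def assign_crawler(url: str, num_crawlers: int) -> int:
    """Return the index of the crawler process responsible for `url`.

    Hashing the host name (not the full URL) keeps all pages of one site
    on the same process, so most intra-site links stay local.
    """
    host = urlsplit(url).hostname or ""  # .hostname is already lower-cased
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


def route_links(
    me: int, links: List[str], num_crawlers: int
) -> Tuple[List[str], Dict[int, List[str]]]:
    """Split links found by crawler `me` into the ones it owns and the
    ones owned by other crawlers (grouped by owner index)."""
    local: List[str] = []
    foreign: Dict[int, List[str]] = {}
    for link in links:
        owner = assign_crawler(link, num_crawlers)
        if owner == me:
            local.append(link)
        else:
            foreign.setdefault(owner, []).append(link)
    return local, foreign
```

A stable digest (SHA-256) is used instead of Python's built-in `hash()`, because string hashing is randomized per process by default and separate crawler processes would otherwise disagree about who owns a URL.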
Object orientation in multidatabase systems
- ACM Computing Surveys, 1995
Cited by 56 (1 self)
A multidatabase system (MDBS) is a confederation of pre-existing distributed, heterogeneous, and autonomous database systems. There has been a recent proliferation of research suggesting the application of object-oriented techniques to facilitate the complex task of designing and implementing MDBSs. Although this approach seems promising, the lack of a general framework impedes any further development. The goal of this paper is to provide a concrete analysis and categorization of the various ways in which object orientation has affected the task of designing and implementing MDBSs.
A Mobile Transaction Model That Captures Both The Data And Movement Behavior
- Mobile Networks and Applications, 1997
Cited by 55 (3 self)
Unlike distributed transactions, mobile transactions do not originate and end at the same site. The implication of the movement of such transactions is that classical atomicity, concurrency and recovery solutions must be revisited to capture the movement behavior. As an effort in this direction, we define a model of mobile transactions by building on the concepts of split transactions and global transactions in a multidatabase environment. Our view of mobile transactions, called Kangaroo Transactions, incorporates the property that transactions in a mobile computing system hop from one base station to another as the mobile unit moves through cells. Our model is the first to capture this movement behavior as well as the data behavior, which reflects the access to data located in databases throughout the static network. The mobile behavior is dynamic and is realized in our model via the use of split operations. The data access behavior is captured by using the idea of global and local transactions in a multidatabase system. This research was partially supported by the National Science Foundation under Grant Number INT-9417907 and by a Massive Digital Data System (MDDS) effort sponsored by the Advanced Research and Development Committee of the Community Management Staff. Part of this research was performed while Margaret Dunham (then Margaret Eich) was on sabbatical at the University of Queensland in Brisbane, Australia. This research was partially supported by Hughes Research Laboratories under Grant Number 906356.
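As a rough illustration of the movement behavior only, the sketch below models a mobile transaction as a sequence of per-base-station parts that is split on every hand-off. The class and method names are hypothetical, marking a split part as simply "closed" glosses over how the model commits or compensates split parts, and the data behavior (global and local transactions in the multidatabase) is not modelled at all.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubTransaction:
    """Work done while the mobile unit is attached to one base station."""
    base_station: str
    operations: List[str] = field(default_factory=list)
    closed: bool = False


@dataclass
class MobileTransaction:
    """A transaction that hops between base stations as the unit moves."""
    tid: str
    parts: List[SubTransaction] = field(default_factory=list)

    def begin(self, base_station: str) -> None:
        self.parts.append(SubTransaction(base_station))

    @property
    def current(self) -> SubTransaction:
        return self.parts[-1]

    def execute(self, operation: str) -> None:
        self.current.operations.append(operation)

    def hand_off(self, new_station: str) -> None:
        # Split: close the part running at the old base station and continue
        # the remaining work as a new part at the new base station.
        self.current.closed = True
        self.parts.append(SubTransaction(new_station))

    def path(self) -> List[str]:
        """Base stations visited, in order."""
        return [p.base_station for p in self.parts]
```

Usage: `kt = MobileTransaction("T1"); kt.begin("BS1"); kt.execute("read x"); kt.hand_off("BS2")` leaves one closed part at `BS1` and an open part at `BS2`.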
Two can keep a secret: A distributed architecture for secure database services
- In Proc. CIDR, 2005
Cited by 33 (2 self)
Recent trends towards database outsourcing, as well as concerns and laws governing data privacy, have led to great interest in enabling secure database services. Previous approaches to enabling such a service have been based on data encryption, causing a large overhead in query processing. We propose a new, distributed architecture that allows an organization to outsource its data management to two untrusted servers while preserving data privacy. We show how the presence of two servers enables efficient partitioning of data so that the contents at any one server are guaranteed not to breach data privacy. We show how to optimize and execute queries in this architecture, and discuss new challenges that emerge in designing the database schema.
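The abstract does not give the exact decomposition (which columns go to which server, or which encodings are used). A minimal sketch, assuming one simple encoding with the "no single server learns the value" property: a sensitive integer is split into two XOR shares, one per server, and a shared tuple ID lets the client join the fragments back together. All function names are hypothetical, and sending non-sensitive columns to server 1 only is an arbitrary choice made for the sketch.

```python
import secrets
from typing import Dict, Iterable, Tuple

Row = Dict[str, int]


def split_value(value: int, bits: int = 64) -> Tuple[int, int]:
    """Split a non-negative integer into two XOR shares.

    Each share on its own is uniformly random and reveals nothing about value.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value out of range")
    pad = secrets.randbits(bits)
    return pad, value ^ pad


def split_row(row: Row, sensitive: Iterable[str]) -> Tuple[Row, Row]:
    """Produce the fragments of one tuple stored at server 1 and server 2."""
    sensitive = set(sensitive)
    s1: Row = {"tid": row["tid"]}
    s2: Row = {"tid": row["tid"]}
    for col, val in row.items():
        if col == "tid":
            continue
        if col in sensitive:
            s1[col], s2[col] = split_value(val)
        else:
            s1[col] = val  # non-sensitive: stored in the clear on server 1
    return s1, s2


def reconstruct(s1: Row, s2: Row) -> Row:
    """Client side: join the fragments on tid and XOR the shares back."""
    if s1["tid"] != s2["tid"]:
        raise ValueError("fragments belong to different tuples")
    row = dict(s1)
    for col, share in s2.items():
        if col != "tid":
            row[col] = s1[col] ^ share
    return row
```

Equality predicates on a split column cannot be evaluated by either server alone under this encoding, which hints at why query optimization and schema design become non-trivial in such an architecture.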
Dynamically Distributed Query Evaluation
- In PODS, 2001
Cited by 20 (1 self)
Distributed query evaluation usually assumes a fixed topology, where the set of servers and the partitioning of data on the servers is known in advance. Given a query expression, an optimizer will first produce a global plan, then assign ...
LH*lh: A Scalable High Performance Data Structure for Switched Multicomputers
- 1995
Cited by 19 (6 self)
LH*lh is a new data structure for scalable high-performance hash files on the increasingly popular switched multicomputers, i.e., MIMD multiprocessor machines with distributed RAM and without shared memory. An LH*lh file scales up gracefully over available processors and the distributed memory, easily reaching Gbytes. Address calculus does not require any centralized component that could lead to a hot spot. Access times to the file can be under a millisecond, and the file can be used in parallel by several client processors. We show the LH*lh design and report on the performance analysis. This includes experiments on the Parsytec GC/PowerPlus multicomputer with up to 128 PowerPCs and 32 MB of distributed RAM per node. We prove the efficiency of the method and justify various algorithmic choices that were made. LH*lh opens a new perspective for high-performance applications, especially for the database management of new types of data and in real-time environments.
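LH* schemes build on classic linear hashing. The sketch below shows only that single-node addressing: a key is mapped to a bucket using a level `i` and a split pointer, with h_i(key) = key mod (n0 * 2^i), so no directory or central lookup is needed. The distributed parts (client images that may be out of date, server-side forwarding, and whatever per-node structure LH*lh adds) are not modelled; the class name and the split-on-overflow policy are assumptions for illustration.

```python
from typing import List


class LinearHashFile:
    """Single-node linear hashing: buckets split one at a time, in order."""

    def __init__(self, n0: int = 1, capacity: int = 4) -> None:
        self.n0 = n0              # initial number of buckets
        self.capacity = capacity  # keys per bucket before a split is triggered
        self.level = 0            # i in h_i(key) = key mod (n0 * 2**i)
        self.split = 0            # index of the next bucket to split
        self.buckets: List[List[int]] = [[] for _ in range(n0)]

    def _h(self, i: int, key: int) -> int:
        return key % (self.n0 * 2 ** i)

    def address(self, key: int) -> int:
        a = self._h(self.level, key)
        if a < self.split:  # bucket a was already split at this level
            a = self._h(self.level + 1, key)
        return a

    def insert(self, key: int) -> None:
        b = self.address(key)
        self.buckets[b].append(key)
        if len(self.buckets[b]) > self.capacity:
            self._split_next()

    def _split_next(self) -> None:
        # Split the bucket under the split pointer (not necessarily the one
        # that overflowed) and rehash its keys with h_{level+1}; each key
        # stays put or moves to the newly appended bucket.
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])
        for key in old:
            self.buckets[self._h(self.level + 1, key)].append(key)
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:
            self.split = 0
            self.level += 1

    def contains(self, key: int) -> bool:
        return key in self.buckets[self.address(key)]
```

Because the address depends only on `(level, split)`, any process holding those two numbers can compute where a key lives; in LH* a client's copy may lag behind, which is why servers must be able to redirect misaddressed requests.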
Peer-to-Peer Grid Databases for Web service Discovery
- CERN IT Division, 2002
Cited by 16 (0 self)
Grids are collaborative distributed Internet systems characterized by large scale, heterogeneity, lack of central control, multiple autonomous administrative domains, unreliable components and frequent dynamic change. In such systems, it is desirable to maintain and query dynamic and timely information about active participants such as services, resources and user communities. The web services vision promises that programs are made more flexible, adaptive and powerful by querying Internet databases (registries) at runtime in order to discover information and network attached building blocks, enabling the assembly of distributed higher-level components. In support of this vision, we introduce the Web Service Discovery Architecture (WSDA), which subsumes an array of disparate concepts, interfaces and protocols under a single semi-transparent umbrella. WSDA specifies a small set of orthogonal multi-purpose communication primitives (building blocks) for discovery, covering service identification, service description retrieval, data publication as well as minimal and powerful query support. The individual primitives can be combined and plugged together by specific clients and services to yield a wide range of behaviors and emerging synergies. Based ...
Profile driven data management for pervasive environments
- In 13th International Conference on Database and Expert Systems Applications (DEXA 2002), Aix-en-Provence, 2002
Cited by 15 (8 self)
The past few years have seen significant work in mobile data management, typically based on the client/proxy/server model. Mobile/wireless devices are treated as clients that are data consumers only, while data sources are on servers that typically reside on the wired network. With the advent of “pervasive computing” environments, an alternative scenario arises where mobile devices gather and exchange data not just from wired sources, but also from their ethereal environment and one another. This is accomplished using ad-hoc connectivity engendered by Bluetooth-like systems. In this new scenario, mobile devices become both data consumers and producers. We describe the new data management challenges which this scenario introduces. We describe the design and present an implementation prototype of our framework, MoGATU, which addresses these challenges. An important component of our approach is to treat each device as an autonomous entity with its “goals” and “beliefs”, expressed using a semantically rich language. We have implemented this framework over a combined Bluetooth and Ad-Hoc 802.11 network with clients running on a variety of mobile devices. We present experimental results validating our approach and measuring system performance.
Customizable Parallel Execution of Scientific Stream
- 2005
Cited by 11 (5 self)
Scientific applications require processing high-volume on-line streams of numerical data from instruments and simulations. We present an extensible stream database system that allows scalable and flexible continuous queries on such streams. Application-dependent streams and query functions are defined through an object-relational model. Distributed execution plans for continuous queries are described as high-level data flow distribution templates.
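The abstract does not describe what the distribution templates contain. As a hypothetical illustration, one generic pattern for parallelizing a continuous query is partition/compute/combine: deal each window of the stream across `n` partitions, run the same query function on every partition in parallel, and merge the partial results. Threads stand in for separate nodes here, and all names are illustrative rather than the system's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def round_robin_partition(window: Iterable[T], n: int) -> List[List[T]]:
    """Deal the items of one stream window across n partitions."""
    parts: List[List[T]] = [[] for _ in range(n)]
    for i, item in enumerate(window):
        parts[i % n].append(item)
    return parts


def partition_compute_combine(
    window: Iterable[T],
    n: int,
    compute: Callable[[List[T]], R],
    combine: Callable[[List[R]], R],
) -> R:
    """Partition a window, run `compute` on each part in parallel, combine."""
    parts = round_robin_partition(window, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map returns results in partition order, not completion order.
        partials = list(pool.map(compute, parts))
    return combine(partials)
```

For example, `partition_compute_combine(range(10), 3, lambda p: sum(x * x for x in p), sum)` computes the sum of squares of one 10-element window as 285. This only works when `combine` can merge partial results correctly (as with sums); a query such as a median would need a different template.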
CloudTPS: Scalable transactions for Web applications in the cloud
- 2010
Cited by 8 (3 self)
NoSQL cloud data services provide scalability and high availability for web applications, but at the same time they sacrifice data consistency. However, many applications cannot afford any data inconsistency. CloudTPS is a scalable transaction manager that allows cloud database services to execute the ACID transactions of web applications, even in the presence of server failures and network partitions. We implement this approach on top of the two main families of scalable data layers: Bigtable and SimpleDB. Performance evaluation on top of HBase (an open-source version of Bigtable) in our local cluster and Amazon SimpleDB in the Amazon cloud shows that our system scales linearly at least up to 40 nodes in our local cluster and 80 nodes in the ...

