Analysis of Task Assignment Policies in Scalable Distributed Web-server Systems
- IEEE Transactions on Parallel and Distributed Systems
, 1998
Cited by 61 (7 self)
A distributed multi-server Web site can provide the scalability necessary to keep up with growing client demand at popular sites. Load balancing of these distributed Web-server systems, consisting of multiple Web servers for document retrieval and a Domain name server (DNS) for address resolution, opens interesting new problems. In this paper, we investigate the effects of using a more active DNS which, as an atypical centralized scheduler, applies some scheduling strategy in routing the requests to the most suitable Web server. Unlike traditional parallel/distributed systems in which a centralized scheduler has full control of the system, the DNS controls only a very small fraction of the requests reaching the multi-server Web site. This peculiarity, especially in the presence of highly skewed load, makes it very difficult to achieve acceptable load balancing and avoid overloading some Web server. This paper adapts traditional scheduling algorithms to the DNS, proposes new policies, a...
The complex dynamics of collaborative tagging
- In Proc. of ACM WWW
, 2007
Cited by 60 (1 self)
The debate within the Web community over the optimal means by which to organize information often pits formalized classifications against distributed collaborative tagging systems. A number of questions remain unanswered, however, regarding the nature of collaborative tagging systems including whether coherent categorization schemes can emerge from unsupervised tagging by users. This paper uses data from the social bookmarking site del.icio.us to examine the dynamics of collaborative tagging systems. In particular, we examine whether the distribution of the frequency of use of tags for “popular” sites with a long history (many tags and many users) can be described by a power law distribution, often characteristic of what are considered complex systems. We produce a generative model of collaborative tagging in order to understand the basic dynamics behind tagging, including how a power law distribution of tags could arise. We empirically examine the tagging history of sites in order to determine how this distribution arises over time and to determine the patterns prior to a stable distribution. Lastly, by focusing on the high-frequency tags of a site where the distribution of tags is a stabilized power law, we show how tag co-occurrence networks for a sample domain of tags can be used to analyze the meaning of particular tags given their relationship to other tags.
Origin Authentication in Interdomain Routing
, 2003
Cited by 49 (9 self)
Attacks against Internet routing are increasing in number and severity. Contributing greatly to these attacks is the absence of origin authentication: there is no way to validate claims of address ownership or location. The lack of such services not only enables attacks by malicious entities, but also indirectly allows seemingly inconsequential misconfigurations to disrupt large portions of the Internet. This paper considers the semantics, design, and costs of origin authentication in interdomain routing. We formalize the semantics of address delegation and use on the Internet, and develop and characterize broad classes of origin authentication proof systems. We estimate the address delegation graph representing the current use of IPv4 address space using available routing data. This effort reveals that current address delegation is dense and relatively static: as few as 16 entities perform 80% of the delegation on the Internet. We conclude by evaluating the proposed services via trace-based simulation. Our simulation shows that the enhanced proof systems can significantly reduce resource costs associated with origin authentication.
Characterizing User Access To Videos On The World Wide Web
- In Proceedings of MMCN
, 2000
Cited by 49 (0 self)
Despite evidence of the rising popularity of video on the web (or VOW), little is known about how users access video. However, such a characterization can greatly benefit the design of multimedia systems such as web video proxies and VOW servers. Hence, this paper presents an analysis of trace data obtained from an ongoing VOW experiment at Luleå University of Technology, Sweden. This experiment is unique as video material is distributed over a high bandwidth network, allowing users to make access decisions without the network being a major factor. Our analysis revealed a number of interesting discoveries regarding user VOW access. For example, accesses display high temporal locality: several requests for the same video title often occur within a short time span. Accesses also exhibited spatial locality of reference whereby a small number of machines accounted for a large number of overall requests. Another finding was a browsing pattern where users preview the initial portion of a video to find out if they are interested. If they like it, they continue watching; otherwise they halt it. This pattern suggests that caching the first several minutes of video data should prove effective. Lastly, the analysis shows that, contrary to previous studies, rankings of
Authorship Attribution with Support Vector Machines
- APPLIED INTELLIGENCE
, 2000
Cited by 45 (0 self)
In this paper we explore the use of text-mining methods for the identification of the author of a text. For the first time we apply the support vector machine (SVM) to this problem. As it is able to cope with half a million inputs, it requires no feature selection and can process the frequency vector of all words of a text. We performed a number of experiments with texts from a German newspaper. With nearly perfect reliability the SVM was able to reject other authors, and it detected the target author in 60-80% of the cases. In a second experiment we ignored nouns, verbs and adjectives and replaced them by grammatical tags and bigrams. This resulted in slightly reduced performance. Author detection with SVM on full word forms was remarkably robust even if the author wrote about different topics.
Continuous Media Sharing in Multimedia Database Systems
, 1995
Cited by 44 (4 self)
The timeliness and synchronization requirements of multimedia data demand efficient buffer management and disk access schemes for multimedia database systems (MMDBS). The data rates involved are also very high and, despite the development of efficient storage and retrieval strategies, disk I/O is likely to be a bottleneck, thereby limiting the number of concurrent sessions supported by a system. This calls for better use of data that has already been brought into the buffer by exploiting sharing whenever possible, using advance knowledge of the multimedia stream to be accessed. This paper introduces the notion of continuous media caching, which is a simple and novel technique where buffers that have been played back by a user are preserved in a controlled fashion for use by subsequent users requesting the same data. This is shown to have considerable impact on the performance of buffer management schemes. When continuous media sharing is used in conjunction with batching of user requests...
Traffic Analysis of a Web Proxy Caching Hierarchy
- IEEE Network, special issue on Web performance
, 2000
Cited by 41 (13 self)
The World-Wide Web (WWW or Web) has experienced phenomenal growth in recent years. This growth of the Web has contributed significantly to the network traffic on the Internet, and motivated much research into improving the performance and scalability of the Web. In recent years, Web proxy caches have been deployed to reduce network traffic and provide better response time for Web accesses. A Web proxy consists of application level software that accepts document retrieval requests from a set of clients, forwards these requests to appropriate servers if the requested documents are not already present in the proxy's cache, and sends documents back to the clients. Originally, proxies were designed to allow network administrators to control Internet access from within an Intranet [1]. It was recognized, however, that proxies may also serve as repositories for frequently requested documents. This role of proxies has made them very popular. Caching documents at the proxy can save network bandwidth and reduce network latency for document accesses [2].
Block Addressing Indices for Approximate Text Retrieval
- Journal of the American Society for Information Science (JASIS)
, 1997
Cited by 36 (22 self)
Although the issue of approximate text retrieval has been gaining importance in recent years, it is currently addressed by only a few indexing schemes. To reduce space requirements, the indices may point to text blocks instead of exact word positions. This is called "block addressing". The most notorious index of this kind is Glimpse. However, block addressing has not been well studied yet, especially regarding approximate searching. Our main contribution is an analytical study of the space-time trade-offs related to the block size. We find that, under reasonable assumptions, it is possible to build an index which is simultaneously sublinear in space overhead and in query time. We validate the analysis with extensive experiments, obtaining typical performance figures. These results are valid not only for approximate searching queries but also for classical ones. Finally, we propose a new strategy for approximate searching on block addressing indices, which we experimentally find 4-5 times f...
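The block addressing idea described in this abstract can be illustrated with a minimal sketch. It is not the paper's index or Glimpse itself: it assumes blocks of a fixed number of words, handles exact word queries only, and the names `build_block_index`, `find_positions`, and `block_size` are hypothetical. The index stores block numbers rather than word positions, so a query first selects candidate blocks and then scans them sequentially.

```python
from collections import defaultdict


def build_block_index(words, block_size):
    """Map each word to the sorted list of block numbers in which it occurs."""
    index = defaultdict(set)
    for position, word in enumerate(words):
        index[word].add(position // block_size)
    return {word: sorted(blocks) for word, blocks in index.items()}


def find_positions(words, index, block_size, query):
    """Return exact word positions of `query`, scanning only candidate blocks."""
    positions = []
    for block in index.get(query, []):
        start = block * block_size
        # The index only says "somewhere in this block", so scan the block.
        for offset, word in enumerate(words[start:start + block_size]):
            if word == query:
                positions.append(start + offset)
    return positions


words = "the cat sat on the mat and the dog sat too".split()
index = build_block_index(words, block_size=4)
print(index["sat"])                              # blocks containing "sat"
print(find_positions(words, index, 4, "sat"))    # exact word positions
```

A larger `block_size` shrinks the index (fewer distinct block numbers per word) but lengthens the sequential scan per candidate block, which is the space-time trade-off the abstract refers to.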
Large Text Searching Allowing Errors
, 1997
Cited by 35 (17 self)
We present a full inverted index for exact and approximate string matching in large texts. The index is composed of a table containing the vocabulary of words of the text and a list of positions in the text corresponding to each word. The size of the table of words is usually much less than 1% of the text size and hence can be kept in main memory, where most query processing takes place. The text, on the other hand, is not accessed at all. The algorithm permits a large number of variations of the exact and approximate string search problem, such as phrases, string matching with sets of characters (range and arbitrary set of characters, complements, wild cards), approximate search with nonuniform costs and arbitrary regular expressions. The whole index can be built in linear time, in a single sequential pass over the text, takes near 1/3 the space of the text, and retrieval times are near O(√n) for typical cases. Experimental results show that the algorithm works well in practice...
Broadcast on Demand: Efficient and Timely Dissemination . . .
- IN PROCEEDINGS OF 3RD IEEE REAL-TIME TECHNOLOGY APPLICATION SYMPOSIUM
, 1997
Cited by 31 (2 self)
The demand for efficient, scalable and cost effective mobile information access systems is rapidly growing. Radio frequency broadcast plays a major role in mobile computing, and there is a need to provide service models for broadcasting information according to mobile users' needs. In this paper we present a model called Broadcast on Demand (BoD), which provides timely broadcasts according to requests from users. Compared to static broadcast, this approach has a different emphasis: it is based on a demand driven framework, aimed at satisfying the temporal constraints of the requests, and uses scheduling techniques at the server side to utilize the limited bandwidth dynamically and efficiently. In this paper, several broadcast transmission scheduling policies for BoD are examined. The study indicates that EDF-based policies combined with batching of requests achieve good performance. The results show that BoD is successful in satisfying the temporal constraints of the requests and is a viable service model for wireless broadcast stations.
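The combination of EDF (earliest deadline first) scheduling with batching mentioned in this abstract can be sketched as follows. This is an illustration under simplifying assumptions, not one of the paper's policies: all requests are known up front, every broadcast occupies one time slot of length `slot_length`, and a broadcast that cannot finish by its deadline is skipped. The function name `edf_batched_schedule` and the sample items are hypothetical.

```python
import heapq


def edf_batched_schedule(requests, slot_length=1):
    """Schedule broadcasts by earliest deadline, one broadcast per item.

    requests: iterable of (item, deadline) pairs.
    Returns (order, missed): items broadcast in order, and items skipped
    because the broadcast could not complete by the batch deadline.
    """
    # Batching: one broadcast serves every request for the same item,
    # so the batch inherits the earliest deadline among those requests.
    earliest = {}
    for item, deadline in requests:
        earliest[item] = min(deadline, earliest.get(item, deadline))

    heap = [(deadline, item) for item, deadline in earliest.items()]
    heapq.heapify(heap)

    time = 0
    order, missed = [], []
    while heap:
        deadline, item = heapq.heappop(heap)  # EDF: smallest deadline first
        if time + slot_length > deadline:
            missed.append(item)               # cannot finish in time; skip it
            continue
        time += slot_length
        order.append(item)
    return order, missed


requests = [("news", 3), ("stock", 1), ("news", 2), ("weather", 2)]
order, missed = edf_batched_schedule(requests)
print(order, missed)
```

In this example the two requests for "news" are served by a single broadcast, which is where batching saves bandwidth; "weather" is skipped because only two slots are available before time 2.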

