Modeling and caching of peer-to-peer traffic
- in Proc. of IEEE ICNP, 2006
- Cited by 18 (4 self)
Peer-to-peer (P2P) file sharing systems generate a major portion of Internet traffic, and this portion is expected to increase in the future. We explore the potential of deploying proxy caches in different Autonomous Systems (ASes) with the goal of reducing the cost incurred by Internet service providers and alleviating the load on the Internet backbone. We conduct a measurement study to model the popularity of P2P objects in different ASes. Our study shows that the popularity of P2P objects can be modeled by a Mandelbrot-Zipf distribution, regardless of the AS. Guided by our findings, we develop a novel caching algorithm for P2P traffic that is based on object segmentation and on partial admission and eviction of objects. Our trace-based simulations show that with a relatively small cache size, less than 10% of the total traffic, a byte hit rate of up to 35% can be achieved by our algorithm, which is close to the byte hit rate achieved by an off-line optimal algorithm with complete knowledge of future requests. Our results also show that our algorithm achieves a byte hit rate that is at least 40% more, and at most triple, the byte hit rate of the common web caching algorithms. Furthermore, our algorithm is robust in the face of aborted downloads, which are common in P2P systems.
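For readers unfamiliar with the Mandelbrot-Zipf distribution named in this abstract, the following is a minimal sketch of its probability mass function. The abstract does not state the formula; the form p(i) = K / (i + q)^alpha, with skewness `alpha` and plateau factor `q`, is the commonly cited definition and is an assumption here, as are the function name and the parameter values.

```python
def mandelbrot_zipf_pmf(n_objects, alpha, q):
    """Return p(i) = K / (i + q)**alpha for ranks i = 1..n_objects.

    Assumed form of the Mandelbrot-Zipf law: alpha is the skewness,
    q >= 0 is the plateau factor, and K normalizes the sum to 1.
    """
    weights = [1.0 / (rank + q) ** alpha for rank in range(1, n_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative parameters only (not taken from the paper).
zipf_like = mandelbrot_zipf_pmf(1000, alpha=1.0, q=0.0)   # q = 0 reduces to plain Zipf
flattened = mandelbrot_zipf_pmf(1000, alpha=1.0, q=20.0)  # larger q flattens the head
```

With `q = 0` the function reduces to a plain Zipf distribution; increasing `q` lowers the share of requests going to the most popular objects, which is the regime where caching only the top-ranked objects pays off less.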
Extension of Zipf's Law to Words and Phrases
- Proceedings of the 19th International Conference on Computational Linguistics (COLING), 2002
- Cited by 15 (1 self)
Zipf's law states that the frequency of word tokens in a large corpus of natural language is inversely proportional to the rank. The law is investigated for two languages, English and Mandarin, and for n-gram word phrases as well as for single words. The law for single words is shown to be valid only for high-frequency words.
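The inverse proportionality between frequency and rank can be checked on any token list by fitting a line to log(frequency) against log(rank); a slope near -1 is what Zipf's law predicts. The sketch below is illustrative only: the helper names are hypothetical, and the ordinary least-squares fit is an assumption rather than the procedure used in the paper.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return token counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Expects at least two ranks; Zipf's law corresponds to a slope near -1.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic frequencies that follow f(r) = 1000 / r exactly.
ideal = [1000.0 / rank for rank in range(1, 101)]
slope = loglog_slope(ideal)
```

Fitting the slope separately over high-frequency and low-frequency rank ranges is one way to see where the law stops holding, which is the kind of deviation the abstract reports.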
Traffic modeling and proportional partial caching for peer-to-peer systems
- IEEE/ACM Transactions on Networking
- Cited by 13 (3 self)
Peer-to-peer (P2P) file sharing systems generate a major portion of Internet traffic, and this portion is expected to increase in the future. We explore the potential of deploying proxy caches in different Autonomous Systems (ASes) with the goal of reducing the cost incurred by Internet service providers and alleviating the load on the Internet backbone. We conduct an eight-month measurement study to analyze the P2P traffic characteristics that are relevant to caching, such as object popularity, popularity dynamics, and object size. Our study shows that the popularity of P2P objects can be modeled by a Mandelbrot–Zipf distribution, and that several workloads exist in P2P traffic. Guided by our findings, we develop a novel caching algorithm for P2P traffic that is based on object segmentation and on proportional partial admission and eviction of objects. Our trace-based simulations show that with a relatively small cache size, a byte hit rate of up to 35% can be achieved by our algorithm, which is close to the byte hit rate achieved by an off-line optimal algorithm with complete knowledge of future requests. Our results also show that our algorithm achieves a byte hit rate that is at least 40% more, and at most triple, the byte hit rate of the common web caching algorithms. Furthermore, our algorithm is robust in the face of aborted downloads, which are common in P2P systems. Index Terms: Internet measurement, network protocols, peer-to-peer systems, traffic modeling, traffic analysis.
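The byte hit rate quoted in this abstract is conventionally the fraction of requested bytes that the cache serves itself. A minimal sketch of that metric follows, assuming each request is recorded as a pair of requested bytes and bytes served from cache; the record layout and function name are hypothetical and not taken from the paper.

```python
def byte_hit_rate(requests):
    """Fraction of requested bytes that were served from the cache.

    requests: iterable of (bytes_requested, bytes_from_cache) pairs.
    A partially cached object contributes only its cached bytes.
    """
    total_bytes = 0
    cached_bytes = 0
    for bytes_requested, bytes_from_cache in requests:
        total_bytes += bytes_requested
        # Guard against malformed records that claim more hits than bytes.
        cached_bytes += min(bytes_from_cache, bytes_requested)
    return cached_bytes / total_bytes if total_bytes else 0.0


# Toy trace: one full hit, one miss, one half-cached object.
trace = [(100, 100), (200, 0), (100, 50)]
rate = byte_hit_rate(trace)
```

Counting bytes rather than requests matters for partial caching: an object whose first segments are cached still contributes to the hit rate even though the request as a whole is not a full hit.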
Zipf and Type-Token rules for the English and Irish languages
- 2004
- Cited by 4 (1 self)
The Zipf curve of log frequency against log rank for a large English corpus of 500 million word tokens and 689,000 word types is shown to have the usual slope close to –1 for ranks below 5,000, but at higher ranks it turns to give a slope close to –2. This is apparently mainly due to foreign words and place names. The Zipf curve for a highly inflected language (the Indo-European Celtic language, Irish) is also given. Because of the larger number of word types per lemma, it remains flatter than the English curve, maintaining a slope of –1 until a turning point at about rank 30,000. A formula which calculates the number of tokens given the number of types is derived in terms of the rank at the turning point, 5,000 for English and 30,000 for Irish.
Log-Linear Interpolation of Language Models
- 2000
- Cited by 1 (1 self)
Building probabilistic models of language is a central task in natural language and speech processing, allowing the syntactic and/or semantic (and recently pragmatic) constraints of the language to be integrated into the systems. Probabilistic language models are an attractive alternative to the more traditional rule-based systems, such as context-free grammars, because of the recent availability of massive amounts of text corpora which can be used to efficiently train the models, and because, instead of the binary grammaticality judgement offered by the rule-based systems, the likelihood of any sequence of lexical units can be obtained, which is a crucial factor in such tasks as speech recognition. Probabilistic language models also find application in part-of-speech tagging, machine translation, semantic disambiguation and numerous other fields.
Assessment of a modern Farsi corpus
- Proceedings of The 2nd Workshop on Information Technology & its Disciplines (WITID), ITRC, 2004
- Cited by 1 (0 self)
The development of Language Engineering (LE) and Information Retrieval (IR) applications requires the availability of sizeable, reliable and representative corpora. This paper describes how we have constructed a well-structured 345 MB tagged corpus of news, and presents some useful statistics of this corpus based upon the characteristics of the Farsi language. It also examines in detail how well the frequency and rank of Farsi words fit the Zipf-Mandelbrot law. We then present our measurement of the entropy of Farsi for this corpus.
On the Benefits of Cooperative Proxy Caching for Peer-to-Peer Traffic
This paper analyzes the potential of cooperative proxy caching for P2P traffic as a means to ease the burden imposed by P2P traffic on ISPs. In particular, we propose two models for cooperative caching of P2P traffic. The first model enables cooperation among caches that belong to different autonomous systems (ASes), while the second considers cooperation among caches deployed within the same AS. We analyze the potential gain of cooperative caching in these two models. To perform this analysis, we conduct an eight-month measurement study on a popular P2P system to collect traffic traces for multiple caches. Then, we perform extensive trace-based simulations to analyze different aspects of cooperative caching schemes. Our results demonstrate that: (i) significant improvement in byte hit rate can be achieved using cooperative caching, (ii) simple object replacement policies are sufficient to achieve that gain, and (iii) the overhead imposed by cooperative caching is negligible. In addition, we develop an analytic model to assess the gain from cooperative caching in different settings. The model accounts for the number of caches, salient P2P traffic features, and network characteristics. Our model confirms that substantial gains from cooperative caching are attainable under wide ranges of traffic and network characteristics.
Towards Robust and Scalable Trust Metrics
"It is an equal failing to trust everybody and to trust nobody." (English proverb)
We describe a distributed and scalable trust metric for networks where transactions occur under a model of preferential attachment. Our trust metric algorithm, which we call expert voting, is very simple. For a network over nodes, the algorithm always considers only the opinions of the first nodes to join the network; we call these nodes experts. For any node, the algorithm evaluates the trustworthiness of based on the opinions of those experts which have had transactions with. Empirical results suggest that this simple algorithm is surprisingly robust for large-scale networks where transactions occur under a model of preferential attachment. To the best of our knowledge, this is the first algorithm that exploits a model of preferential attachment.

