Results 1 - 10
of
52
A large-scale study of the evolution of web pages
- In Proceedings of the 12th International World Wide Web Conference
, 2003
"... How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who m ..."
Abstract
-
Cited by 144 (5 self)
- Add to MetaCart
How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them. One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40 % of all web pages in their set changed within a week, and 23 % of those pages that fell into the.com domain changed daily. This paper expands on Cho and Garcia-Molina’s study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page, and a feature vector of the words on the page, plus various other data such as the page length, the HTTP status code, etc. Moreover, we pseudo-randomly selected 0.1 % of all of our URLs, and saved the full text of each download of the corresponding pages. After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with change intensity. We found that the average degree of change varies widely across top-level domains, and that larger pages change more often and more severely than smaller ones. This paper describes the crawl and the data transformations we performed on the logs, and presents some statistical observations on the degree of change of different classes of pages.
Separating agreement from execution for byzantine fault tolerant services
- IN PROC. SOSP
, 2003
"... We describe a new architecture for Byzantine fault tolerant state machine replication that separates agreement that orders requests from execution that processes requests. This separation yields two fundamental and practically significant advantages over previous architectures. First, it reduces rep ..."
Abstract
-
Cited by 110 (15 self)
- Add to MetaCart
We describe a new architecture for Byzantine fault tolerant state machine replication that separates agreement that orders requests from execution that processes requests. This separation yields two fundamental and practically significant advantages over previous architectures. First, it reduces replication costs because the new architecture can tolerate faults in up to half of the state machine replicas that execute requests. Previous systems can tolerate faults in at most a third of the combined agreement/state machine replicas. Second, separating agreement from execution allows a general privacy firewall architecture to protect confidentiality through replication. In contrast, replication in previous systems hurts confidentiality because exploiting the weakest replica can be su#cient to compromise the system. We have constructed a prototype and evaluated it running both microbenchmarks and an NFS server. Overall, we find that the architecture adds modest latencies to unreplicated systems and that its performance is competitive with existing Byzantine fault tolerant systems.
A High-level Programming Environment for Packet Trace Anonymization and Transformation
- In Proceedings of the ACM SIGCOMM Conference
, 2003
"... Packet traces of operational Internet traffic are invaluable to network research, but public sharing of such traces is severely limited by the need to first remove all sensitive information. Current trace anonymization technology leaves only the packet headers intact, completely stripping the conten ..."
Abstract
-
Cited by 71 (2 self)
- Add to MetaCart
Packet traces of operational Internet traffic are invaluable to network research, but public sharing of such traces is severely limited by the need to first remove all sensitive information. Current trace anonymization technology leaves only the packet headers intact, completely stripping the contents; to our knowledge, there are no publicly available traces of any significant size that contain packet payloads. We describe a new approach to transform and anonymize packet traces. Our tool provides high-level language support for packet transformation, allowing the user to write short policy scripts to express sophisticated trace transformations. The resulting scripts can anonymize both packet headers and payloads, and can perform application-level transformations such as editing HTTP or SMTP headers, replacing the content of Web items with MD5 hashes, or altering filenames or reply codes that match given patterns. We discuss the critical issue of verifying that anonymizations are both correctly applied and correctly specified, and experiences with anonymizing FTP traces from the Lawrence Berkeley National Laboratory for public release.
The Devil and Packet Trace Anonymization
, 2006
"... Releasing network measurement data---including packet traces--- to the research community is a virtuous activity that promotes solid research. However, in practice, releasing anonymized packet traces for public use entails many more vexing considerations than just the usual notion of how to scramble ..."
Abstract
-
Cited by 64 (3 self)
- Add to MetaCart
Releasing network measurement data---including packet traces--- to the research community is a virtuous activity that promotes solid research. However, in practice, releasing anonymized packet traces for public use entails many more vexing considerations than just the usual notion of how to scramble IP addresses to preserve privacy. Publishing traces requires carefully balancing the security needs of the organization providing the trace with the research usefulness of the anonymized trace. In this paper we recount our experiences in (i) securing permission from a large site to release packet header traces of the site's internal traffic, (ii) implementing the corresponding anonymization policy, and (iii) validating its correctness. We present a general tool, tcpmkpub, for anonymizing traces, discuss the process used to determine the particular anonymization policy, and describe the use of meta-data accompanying the traces to provide insight into features that have been obfuscated by anonymization.
On flow correlation attacks and countermeasures in mix networks
- in Proceedings of Privacy Enhancing Technologies workshop
, 2004
"... Abstract. In this paper, we address issues related to flow correlation attacks and the corresponding countermeasures in mix networks. Mixes have been used in many anonymous communication systems and are supposed to provide countermeasures that can defeat various traffic analysis attacks. In this pap ..."
Abstract
-
Cited by 36 (7 self)
- Add to MetaCart
Abstract. In this paper, we address issues related to flow correlation attacks and the corresponding countermeasures in mix networks. Mixes have been used in many anonymous communication systems and are supposed to provide countermeasures that can defeat various traffic analysis attacks. In this paper, we focus on a particular class of traffic analysis attack, flow correlation attacks, by which an adversary attempts to analyze the network traffic and correlate the traffic of a flow over an input link at a mix with that over an output link of the same mix. Two classes of correlation methods are considered, namely time-domain methods and frequency-domain methods. Based on our threat model and known strategies in existing mix networks, we perform extensive experiments to analyze the performance of mixes. We find that a mix with any known batching strategy may fail against flow correlation attacks in the sense that for a given flow over an input link, the adversary can correctly determine which output link is used by the same flow. We also investigated methods that can effectively counter the flow correlation attack and other timing attacks. The empirical results provided in this paper give an indication to designers of Mix networks about appropriate configurations and alternative mechanisms to be used to counter flow correlation attacks. 1
Improving wireless privacy with an identifier-free link layer protocol
- In MobiSys ’08: 6th International Conference on Mobile Systems, Applications, and Services
, 2008
"... We present the design and evaluation of an 802.11-like wireless link layer protocol that obfuscates all transmitted bits to increase privacy. This includes explicit identifiers such as MAC addresses, the contents of management messages, and other protocol fields that the existing 802.11 protocol rel ..."
Abstract
-
Cited by 30 (10 self)
- Add to MetaCart
We present the design and evaluation of an 802.11-like wireless link layer protocol that obfuscates all transmitted bits to increase privacy. This includes explicit identifiers such as MAC addresses, the contents of management messages, and other protocol fields that the existing 802.11 protocol relies on to be sent in the clear. By obscuring these fields, we greatly increase the difficulty of identifying or profiling users from their transmissions in ways that are otherwise straightforward. Our design, called SlyFi, is nearly as efficient as existing schemes such as WPA for discovery, link setup, and data delivery despite its heightened protections; transmission requires only symmetric key encryption and reception requires a table lookup followed by symmetric key decryption. Experiments using our implementation on Atheros 802.11 drivers show that SlyFi can discover and associate with networks faster than 802.11 using WPA-PSK. The overhead SlyFi introduces in packet delivery is only slightly higher than that added by WPA-CCMP encryption (10 % vs. 3 % decrease in throughput).
802.11 user fingerprinting
- In MobiCom ’07: Proceedings of the 13th Annual ACM International Conference on Mobile Computing and Networking
, 2007
"... The ubiquity of 802.11 devices and networks enables anyone to track our every move with alarming ease. Each 802.11 device transmits a globally unique and persistent MAC address and thus is trivially identifiable. In response, recent research has proposed replacing such identifiers with pseudonyms (i ..."
Abstract
-
Cited by 29 (8 self)
- Add to MetaCart
The ubiquity of 802.11 devices and networks enables anyone to track our every move with alarming ease. Each 802.11 device transmits a globally unique and persistent MAC address and thus is trivially identifiable. In response, recent research has proposed replacing such identifiers with pseudonyms (i.e., temporary, unlinkable names). In this paper, we demonstrate that pseudonyms are insufficient to prevent tracking of 802.11 devices because implicit identifiers, or identifying characteristics of 802.11 traffic, can identify many users with high accuracy. For example, even without unique names and addresses, we estimate that an adversary can identify 64 % of users with 90 % accuracy when they spend a day at a busy hot spot. We present an automated procedure based on four previously unrecognized implicit identifiers that can identify users in three real 802.11 traces even when pseudonyms and encryption are employed. We find that the majority of users can be identified using our techniques, but our ability to identify users is not uniform; some users are not easily identifiable. Nonetheless, we show that even a single implicit identifier is sufficient to distinguish many users. Therefore, we argue that design considerations beyond eliminating explicit identifiers (i.e., unique names and addresses), must be addressed in order to prevent user tracking in wireless networks. Categories and Subject Descriptors:
Privacy vulnerabilities in encrypted HTTP streams
- In Proceedings of Privacy Enhancing Technologies Workshop (PET 2005
, 2005
"... Abstract. Encrypting traffic does not prevent an attacker from performing some types of traffic analysis. We present a straightforward traffic analysis attack against encrypted HTTP streams that is surprisingly effective in identifying the source of the traffic. An attacker starts by creating a prof ..."
Abstract
-
Cited by 22 (1 self)
- Add to MetaCart
Abstract. Encrypting traffic does not prevent an attacker from performing some types of traffic analysis. We present a straightforward traffic analysis attack against encrypted HTTP streams that is surprisingly effective in identifying the source of the traffic. An attacker starts by creating a profile of the statistical characteristics of web requests from interesting sites, including distributions of packet sizes and inter-arrival times. Later, candidate encrypted streams are compared against these profiles. In our evaluations using real traffic, we find that many web sites are subject to this attack. With a training period of 24 hours and a 1 hour delay afterwards, the attack achieves only 23 % accuracy. However, an attacker can easily pre-determine which of trained sites are easily identifiable. Accordingly, against 25 such sites, the attack achieves 40% accuracy; with three guesses, the attack achieves 100 % accuracy for our data. Longer delays after training decrease accuracy, but not substantially. We also propose some countermeasures and improvements to our current method. Previous work analyzed SSL traffic to a proxy, taking advantage of a known flaw in SSL that reveals the length of each web object. In contrast, we exploit the statistical characteristics of web streams that are encrypted as a single flow, which is the case with WEP/WPA, IPsec, and SSH tunnels. 1
Language Identification of Encrypted VoIP Traffic: Alejandra y Roberto or Alice and Bob
- In Proceedings of the USENIX Security Symposium
, 2007
"... Voice over IP (VoIP) has become a popular protocol for making phone calls over the Internet. Due to the potential transit of sensitive conversations over untrusted network infrastructure, it is well understood that the contents of a VoIP session should be encrypted. However, we demonstrate that curr ..."
Abstract
-
Cited by 20 (2 self)
- Add to MetaCart
Voice over IP (VoIP) has become a popular protocol for making phone calls over the Internet. Due to the potential transit of sensitive conversations over untrusted network infrastructure, it is well understood that the contents of a VoIP session should be encrypted. However, we demonstrate that current cryptographic techniques do not provide adequate protection when the underlying audio is encoded using bandwidth-saving Variable Bit Rate (VBR) coders. Explicitly, we use the length of encrypted VoIP packets to tackle the challenging task of identifying the language of the conversation. Our empirical analysis of 2,066 native speakers of 21 different languages shows that a substantial amount of information can be discerned from encrypted VoIP traffic. For instance, our 21-way classifier achieves 66 % accuracy, almost a 14-fold improvement over random guessing. For 14 of the 21 languages, the accuracy is greater than 90%. We achieve an overall binary classification (e.g., “Is this a Spanish or English conversation?”) rate of 86.6%. Our analysis highlights what we believe to be interesting new privacy issues in VoIP. 1
Network Flow Watermarking Attack on Low-Latency Anonymous Communication Systems
"... Many proposed low-latency anonymous communication systems have used various flow transformations such as traffic padding, adding cover traffic (or bogus packets), packet dropping, flow mixing, flow splitting, and flow merging to achieve anonymity. It has long been believed that these flow transforma ..."
Abstract
-
Cited by 19 (4 self)
- Add to MetaCart
Many proposed low-latency anonymous communication systems have used various flow transformations such as traffic padding, adding cover traffic (or bogus packets), packet dropping, flow mixing, flow splitting, and flow merging to achieve anonymity. It has long been believed that these flow transformations would effectively disguise network flows, thus achieve good anonymity. In this paper, we investigate the fundamental limitations of flow transformations in achieving anonymity, and we show that flow transformations do not necessarily provide the level of anonymity people have expected or believed. By injecting unique watermark into the inter-packet timing domain of a packet flow, we are able to make any sufficiently long flow uniquely identifiable even if 1) it is disguised by substantial amount of

