Polyglot: automatic extraction of protocol message format using dynamic binary analysis
- In: CCS ’07: Proceedings of the 14th ACM Conference on Computer and Communications Security, 2007
Cited by 46 (8 self)
Protocol reverse engineering, the process of extracting the application-level protocol used by an implementation, without access to the protocol specification, is important for many network security applications. Recent work [17] has proposed protocol reverse engineering by using clustering on network traces. That kind of approach is limited by the lack of semantic information on network traces. In this paper we propose a new approach using program binaries. Our approach, shadowing, uses dynamic analysis and is based on a unique intuition: the way that an implementation of the protocol processes the received application data reveals a wealth of information about the protocol message format. We have implemented our approach in a system called Polyglot and evaluated it extensively using real-world implementations of five different protocols: DNS, HTTP, IRC, Samba and ICQ. We compare our results with the manually crafted message format, included in Wireshark, one of the state-of-the-art protocol analyzers. The differences we find are small and usually due to different implementations handling fields in different ways. Finding such differences between implementations is an added benefit, as they are important for problems such as fingerprint generation, fuzzing, and error detection.
Discoverer: Automatic protocol reverse engineering from network traces
- In Proceedings of the 16th USENIX Security Symposium (Security’07), 2007
Cited by 33 (1 self)
Application-level protocol specifications are useful for many security applications, including intrusion prevention and detection that performs deep packet inspection and traffic normalization, and penetration testing that generates network inputs to an application to uncover potential vulnerabilities. However, current practice in deriving protocol specifications is mostly manual. In this paper, we present Discoverer, a tool for automatically reverse engineering the protocol message formats of an application from its network trace. A key property of Discoverer is that it operates in a protocol-independent fashion by inferring protocol idioms commonly seen in message formats of many application-level protocols. We evaluated the efficacy of Discoverer over one text protocol (HTTP) and two binary protocols (RPC and CIFS/SMB) by comparing our inferred formats with true formats obtained from Ethereal [5]. For all three protocols, more than 90% of our inferred formats correspond to exactly one true format; one true format is reflected in five inferred formats on average; our inferred formats cover over 95% of messages, which belong to 30-40% of true formats observed in the trace.
Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution
- In 15th Symposium on Network and Distributed System Security (NDSS), 2008
Cited by 33 (5 self)
Protocol reverse engineering has often been a manual process that is considered time-consuming, tedious and error-prone. To address this limitation, a number of solutions have recently been proposed to allow for automatic protocol reverse engineering. Unfortunately, they are either limited in extracting protocol fields due to lack of program semantics in network traces or primitive in only revealing the flat structure of protocol format. In this paper, we present a system called AutoFormat that aims at not only extracting protocol fields with high accuracy, but also revealing the inherently “non-flat”, hierarchical structures of protocol messages. AutoFormat is based on the key insight that different protocol fields in the same message are typically handled in different execution contexts (e.g., the runtime call stack). As such, by monitoring the program execution, we can collect the execution context information for every message byte (annotated with its offset in the entire message) and cluster them to derive the protocol format. We have evaluated our system with more than 30 protocol messages from seven protocols, including two text-based protocols (HTTP and SIP), three binary-based protocols (DHCP, RIP, and OSPF), one hybrid protocol (CIFS/SMB), as well as one unknown protocol used by a real-world malware. Our results show that AutoFormat can not only identify individual message fields automatically and with high accuracy (an average 93.4% match ratio compared with Wireshark), but also unveil the structure of the protocol format by revealing possible relations (e.g., sequential, parallel, and hierarchical) among the message fields.
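The clustering idea in the AutoFormat abstract can be illustrated with a minimal sketch. It is not AutoFormat's implementation: the trace format (a list of `(offset, call_stack)` pairs), the function name `cluster_fields`, and the parser function names in the sample trace are all hypothetical, and the sketch only groups consecutive bytes that share an identical call stack, without recovering hierarchy.

```python
from itertools import groupby


def cluster_fields(byte_contexts):
    """Group consecutive message offsets that share one execution context.

    byte_contexts: list of (offset, call_stack) pairs, where call_stack is a
    tuple of function names (a hypothetical trace format, assumed here).
    Returns a list of (start_offset, end_offset, call_stack), end inclusive.
    """
    # Order records by message offset so that runs are contiguous.
    ordered = sorted(byte_contexts, key=lambda rec: rec[0])
    fields = []
    # groupby merges only adjacent records with an equal call stack.
    for stack, run in groupby(ordered, key=lambda rec: rec[1]):
        offsets = [off for off, _ in run]
        fields.append((offsets[0], offsets[-1], stack))
    return fields


# Hypothetical per-byte contexts for a 6-byte message.
trace = [
    (0, ("main", "parse_request", "parse_method")),
    (1, ("main", "parse_request", "parse_method")),
    (2, ("main", "parse_request", "parse_method")),
    (3, ("main", "parse_request", "parse_uri")),
    (4, ("main", "parse_request", "parse_uri")),
    (5, ("main", "parse_request", "parse_version")),
]
fields = cluster_fields(trace)
for start, end, stack in fields:
    print(f"bytes {start}-{end}: handled in {stack[-1]}")
```

In this toy trace the shared stack prefix (`main`, `parse_request`) hints at how a common parent context could be read as a enclosing field, which is the kind of hierarchical relation the abstract mentions.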
Automatic Network Protocol Analysis
- Proceedings of the 15th Annual Network and Distributed System Security Symposium (NDSS’08), 2008
Tupni: Automatic Reverse Engineering of Input Formats
- In Proceedings of the 15th ACM Conference on Computer and Communications Security (CCS), 2008
Cited by 28 (2 self)
Recent work has established the importance of automatic reverse engineering of protocol or file format specifications. However, the formats reverse engineered by previous tools have missed important information that is critical for security applications. In this paper, we present Tupni, a tool that can reverse engineer an input format with a rich set of information, including record sequences, record types, and input constraints. Tupni can generalize the format specification over multiple inputs. We have implemented a prototype of Tupni and evaluated it on 10 different formats: five file formats (WMF, BMP, JPG, PNG and TIF) and five network protocols (DNS, RPC, TFTP, HTTP and FTP). Tupni identified all record sequences in the test inputs. We also show that, by aggregating over multiple WMF files, Tupni can derive a more complete format specification for WMF. Furthermore, we demonstrate the utility of Tupni by using the rich information it provides for zero-day vulnerability signature generation, which was not possible with previous reverse engineering tools.
Dispatcher: Enabling active botnet infiltration using automatic protocol reverse-engineering
- In CCS’09: Proceedings of the 16th ACM Conference on Computer and Communications Security, 2009
Cited by 19 (3 self)
Automatic protocol reverse-engineering is important for many security applications, including the analysis and defense against botnets. Understanding the command-and-control (C&C) protocol used by a botnet is crucial for anticipating its repertoire of nefarious activity and to enable active botnet infiltration. Frequently, security analysts need to rewrite messages sent and received by a bot in order to contain malicious activity and to provide the botmaster with an illusion of successful and unhampered operation. To enable such rewriting, we need detailed information about the intent and structure of the messages in both directions of the communication despite the fact that we generally only have access to the implementation of one endpoint, namely the bot binary. Current techniques cannot enable such rewriting. In this paper, we propose techniques to extract the format of protocol messages sent by an
GQ: Realizing a System to Catch Worms in a Quarter Million Places
- 2006
Cited by 17 (5 self)
A key tool for detecting new worm outbreaks in their early stages is the honeyfarm, a large collection of honeypots fed Internet traffic by a “network telescope”. However, actual operation of a honeyfarm in a large-scale environment presents difficult scaling challenges. We discuss the structure and implementation of GQ, a honeyfarm system we built to analyze in real time the scanning probes seen on a quarter million Internet addresses. GQ’s architecture emphasizes high fidelity, scalability, isolation, stringent control, and wide coverage. We examine how the implementation endeavors to achieve each of these, evaluate its effectiveness in doing so, and report on preliminary experiences with operating the system at scale, during which we captured 66 distinct worms over the course of four months.
ReFormat: Automatic Reverse Engineering of Encrypted Messages
- 2008
Cited by 11 (0 self)
Automatic protocol reverse engineering has recently received significant attention due to its importance to many security applications. However, previous methods are all limited in analyzing only plain-text communications wherein the exchanged messages are not encrypted. In this paper, we propose ReFormat, a system that aims at deriving the message format even when the message is encrypted. Our approach is based on the observation that an encrypted input message will typically go through two phases: message decryption and normal protocol processing. These two phases can be differentiated because the corresponding instructions are significantly different. Further, with the help of data lifetime analysis of run-time buffers, we can pinpoint the memory locations that contain the decrypted message generated from the first phase and are later accessed in the second phase. We have developed a prototype and evaluated it with several real-world protocols. Our experiments show that ReFormat can accurately identify decrypted message buffers and then reveal the associated message structure.
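The two-phase observation in the ReFormat abstract can be sketched as a search for the point in an instruction trace where the instruction mix changes most sharply. The abstract says only that the instructions in the two phases are "significantly different"; using the share of arithmetic and bitwise instructions as the distinguishing feature is an assumption made for this illustration, as are the mnemonic set `ARITH_BITWISE`, the function name `find_phase_boundary`, and the sample trace.

```python
# Assumed feature: decryption code is dominated by arithmetic/bitwise
# instructions, while protocol parsing is dominated by moves, compares
# and branches. This mnemonic set is illustrative, not exhaustive.
ARITH_BITWISE = {"add", "sub", "mul", "xor", "and", "or", "shl", "shr", "rol", "ror"}


def find_phase_boundary(mnemonics):
    """Return the index that best splits the trace into two phases.

    The split maximizes (arith/bitwise share before) minus
    (arith/bitwise share after). Returns None for traces shorter than 2.
    """
    flags = [1 if m in ARITH_BITWISE else 0 for m in mnemonics]
    total, n = sum(flags), len(flags)
    best_split, best_gap = None, float("-inf")
    seen = 0
    for i in range(1, n):
        seen += flags[i - 1]           # arith/bitwise count in trace[:i]
        before = seen / i
        after = (total - seen) / (n - i)
        if before - after > best_gap:
            best_split, best_gap = i, before - after
    return best_split


# Hypothetical trace: a decryption-like prefix, then parsing-like code.
trace = ["xor", "shl", "add", "xor", "mov", "ror", "xor", "add",
         "cmp", "jne", "mov", "call", "cmp", "mov", "jmp", "add"]
boundary = find_phase_boundary(trace)
print("decryption-like phase:", trace[:boundary])
print("processing-like phase:", trace[boundary:])
```

A real system would work on far longer traces and would still need the buffer lifetime analysis the abstract describes to locate the decrypted message; this sketch covers only the phase split.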
Traffic to Protocol Reverse Engineering
Cited by 4 (0 self)
has played an increasing role in honeypot operations. It allows the automatic generation of state models and scripts that can act as a realistic counterpart for capturing unknown malware. This work proposes a novel approach in the field of NPRE. By passively listening to network traces, our system automatically derives the protocol state machines of the peers involved, allowing the analyst to understand the protocol's intrinsic logic. We present a new methodology to extract the relevant fields from arbitrary binary protocols to construct a state model. We demonstrate our methodology by deriving the state machines of the documented protocols ARP, DHCP and TCP. We then apply it to Kademlia; the results show its usefulness in supporting binary reverse engineering processes and in detecting a new undocumented feature.
Rosetta: Extracting Protocol Semantics using Binary Analysis with Applications to Protocol Replay and NAT Rewriting
- 2007
Cited by 4 (2 self)
Rewriting a previously seen dialog between two entities, so that it is accepted by another entity, is important for many applications, including the protocol replay problem and the NAT rewriting problem. Both problems are instances of a larger problem that we call the dialog rewriting problem. The challenge in dialog rewriting is that the dynamic fields, e.g., hostnames, IP addresses, session identifiers or timestamps, in the original dialog need to be rewritten for the modified dialog to succeed. This is particularly difficult because the protocol used in the original dialog might be unknown. In this paper, our goal is to generate a transformation function that can be used to rewrite the values of the dynamic fields. For this, we propose binary analysis techniques to solve the two main challenges: 1) how to automatically identify the dynamic fields, and 2) how to automatically rewrite the values in the dynamic fields. We have implemented Rosetta, a system that creates the transformation function using our proposed techniques. Our results show that we are able to identify different types of dynamic fields present in commonly used protocols such as FTP, DNS and ICQ, and that we are able to rewrite the values in the dynamic fields, even when those fields use complex encodings to represent the data, thus enabling the protocol replay and NAT rewriting problems.
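As a rough illustration of what a transformation function over dynamic fields might look like, the sketch below rewrites fields whose positions are already known. It does not reflect Rosetta's techniques: the abstract's first challenge (identifying the fields through binary analysis) is skipped entirely, fields are assumed to sit at fixed offsets with fixed lengths and a plain binary encoding, and the names `make_transform`, `session_id` and `ip_addr` and the message layout are hypothetical.

```python
import socket


def make_transform(dynamic_fields):
    """Build a rewriting function from a field map.

    dynamic_fields: dict mapping a field name to (offset, length) in the
    message. Fixed offsets and lengths are an assumption of this sketch.
    """
    def transform(message, new_values):
        out = bytearray(message)
        for name, value in new_values.items():
            offset, length = dynamic_fields[name]
            if len(value) != length:
                raise ValueError(f"{name}: expected {length} bytes, got {len(value)}")
            # Overwrite the field in place; all other bytes are preserved.
            out[offset:offset + length] = value
        return bytes(out)
    return transform


# Hypothetical layout: 2-byte session ID, 4-byte IPv4 address, then payload.
fields = {"session_id": (0, 2), "ip_addr": (2, 4)}
rewrite = make_transform(fields)

original = bytes.fromhex("1234") + socket.inet_aton("10.0.0.5") + b"PAYLOAD"
replayed = rewrite(original, {
    "session_id": bytes.fromhex("beef"),
    "ip_addr": socket.inet_aton("192.168.1.9"),
})
print(replayed.hex())
```

Fields with variable length or a non-trivial encoding, which the abstract says Rosetta handles, would need a per-field encoder and length fix-ups rather than a simple in-place overwrite.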

