Results 1 - 10
of
27
Automatic reverse engineering of malware emulators
- In Proceedings of the IEEE Symposium on Security and Privacy
, 2009
"... Malware authors have recently begun using emulation technology to obfuscate their code. They convert native malware binaries into bytecode programs written in a randomly generated instruction set and paired with a native binary emulator that interprets the bytecode. No existing malware analysis can ..."
Abstract
-
Cited by 11 (1 self)
- Add to MetaCart
Malware authors have recently begun using emulation technology to obfuscate their code. They convert native malware binaries into bytecode programs written in a randomly generated instruction set and paired with a native binary emulator that interprets the bytecode. No existing malware analysis can reliably reverse this obfuscation technique. In this paper, we present the first work in automatic reverse engineering of malware emulators. Our algorithms are based on dynamic analysis. We execute the emulated malware in a protected environment and record the entire x86 instruction trace generated by the emulator. We then use dynamic data-flow and taint analysis over the trace to identify data buffers containing the bytecode program and extract the syntactic and semantic information about the bytecode instruction set. With these analysis outputs, we are able to generate data structures, such as control-flow graphs, that provide the foundation for subsequent malware analysis. We implemented a proof-of-concept system called Rotalumé and evaluated it using both legitimate programs and malware emulated by VMProtect and Code Virtualizer. The results show that Rotalumé accurately reveals the syntax and semantics of emulated instruction sets and reconstructs execution paths of original programs from their bytecode representations. 1.
Reformat: Automatic Reverse Engineering of Encrypted Messages
, 2008
"... Automatic protocol reverse engineering has recently received significant attention due to its importance to many security applications. However, previous methods are all limited in analyzing only plain-text communications wherein the exchanged messages are not encrypted. In this paper, we propose Re ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
Automatic protocol reverse engineering has recently received significant attention due to its importance to many security applications. However, previous methods are all limited in analyzing only plain-text communications wherein the exchanged messages are not encrypted. In this paper, we propose ReFormat, a system that aims at deriving the message format even when the message is encrypted. Our approach is based on the observation that an encrypted input message will typically go through two phases: message decryption and normal protocol processing. These two phases can be differentiated because the corresponding instructions are significantly different. Further, with the help of data lifetime analysis of run-time buffers, we can pinpoint the memory locations that contain the decrypted message generated from the first phase and are later accessed in the second phase. We have developed a prototype and evaluated it with several real-world protocols. Our experiments show that ReFormat can accurately identify decrypted message buffers and then reveal the associated message structure.
Deriving Input Syntactic Structure from Execution
, 2008
"... Program input syntactic structure is essential for a wide range of applications such as test case generation, software debugging and network security. However, such important information is often not available (e.g., most malware programs make use of secret protocols to communicate) or not directly ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Program input syntactic structure is essential for a wide range of applications such as test case generation, software debugging and network security. However, such important information is often not available (e.g., most malware programs make use of secret protocols to communicate) or not directly usable by machines (e.g., many programs specify their inputs in plain text or other random formats). Furthermore, many programs claim they accept inputs with a published format, but their implementations actually support a subset or a variant. Based on the observations that input structure is manifested by the way input symbols are used during execution and most programs take input with top-down or bottom-up grammars, we devise two dynamic analyses, one for each grammar category. Our evaluation on a set of real-world programs shows that our technique is able to precisely reverse engineer input syntactic structure from execution.
TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection
"... Abstract—Fuzz testing has proven successful in finding security vulnerabilities in large programs. However, traditional fuzz testing tools have a well-known common drawback: they are ineffective if most generated malformed inputs are rejected in the early stage of program running, especially when ta ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Abstract—Fuzz testing has proven successful in finding security vulnerabilities in large programs. However, traditional fuzz testing tools have a well-known common drawback: they are ineffective if most generated malformed inputs are rejected in the early stage of program running, especially when target programs employ checksum mechanisms to verify the integrity of inputs. In this paper, we present TaintScope, an automatic fuzzing system using dynamic taint analysis and symbolic execution techniques, to tackle the above problem. TaintScope has several novel contributions: 1) TaintScope is the first checksum-aware fuzzing tool to the best of our knowledge. It can identify checksum fields in input instances, accurately locate checksum-based integrity checks by using branch profiling techniques, and bypass such checks via control flow alteration. 2) TaintScope is a directed fuzzing tool working at X86 binary
Automatic Reverse Engineering of Data Structures from Binary Execution
"... With only the binary executable of a program, it is useful to discover the program’s data structures and infer their syntactic and semantic definitions. Such knowledge is highly valuable in a variety of security and forensic applications. Although there exist efforts in program data structure infere ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
With only the binary executable of a program, it is useful to discover the program’s data structures and infer their syntactic and semantic definitions. Such knowledge is highly valuable in a variety of security and forensic applications. Although there exist efforts in program data structure inference, the existing solutions are not suitable for our targeted application scenarios. In this paper, we propose a reverse engineering technique to automatically reveal program data structures from binaries. Our technique, called REWARDS, is based on dynamic analysis. More specifically, each memory location accessed by the program is tagged with a timestamped type attribute. Following the program’s runtime data flow, this attribute is propagated to other memory locations and registers that share the same type. During the propagation, a variable’s type gets resolved if it is involved in a type-revealing execution point or “type sink”. More importantly, besides the forward type propagation, REWARDS involves a backward type resolution procedure where the types of some previously accessed variables get recursively resolved starting from a type sink. This procedure is constrained by the timestamps of relevant memory locations to disambiguate variables reusing the same memory location. In addition, REWARDS is able to reconstruct in-memory data structure layout based on the type information derived. We demonstrate that REWARDS provides unique benefits to two applications: memory image forensics and binary fuzzing for vulnerability discovery. 1
Glasnost: Enabling End Users to Detect Traffic Differentiation
"... Holding residential ISPs to their contractual or legal obligations of “unlimited service ” or “network neutrality” is hard because their traffic management policies are opaque to end users and governmental regulatory agencies. We have built and deployed Glasnost, a system that improves network trans ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Holding residential ISPs to their contractual or legal obligations of “unlimited service ” or “network neutrality” is hard because their traffic management policies are opaque to end users and governmental regulatory agencies. We have built and deployed Glasnost, a system that improves network transparency by enabling ordinary Internet users to detect whether their ISPs are differentiating between flows of specific applications. We identify three key challenges in designing such a system: (a) to attract many users, the system must have low barrier of use and generate results in a timely manner, (b) the results must be robust to measurement noise and avoid false accusations of differentiation, which can adversely affect ISPs ’ reputation and business, (c) the system must include mechanisms to keep it up-to-date with the continuously changing differentiation policies of ISPs worldwide. We describe how Glasnost addresses each of these challenges. Glasnost has been operational for over a year. More than 350,000 users from over 5,800 ISPs worldwide have used Glasnost to detect differentiation, validating many of our design choices. We show how data from individual Glasnost users can be aggregated to provide regulators and monitors with useful information on ISP-wide deployment of various differentiation policies. 1
Automatically Identifying Critical Input Regions and Code in Applications
"... Applications that process complex inputs often react in different ways to changes in different regions of the input. Small changes to forgiving regions induce correspondingly small changes in the behavior and output. Small changes to critical regions, on the other hand, can induce disproportionally ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Applications that process complex inputs often react in different ways to changes in different regions of the input. Small changes to forgiving regions induce correspondingly small changes in the behavior and output. Small changes to critical regions, on the other hand, can induce disproportionally large changes in the behavior or output. Identifying the critical and forgiving regions in the input and the corresponding critical and forgiving regions of code is directly relevant to many software engineering tasks. We present a system, Snap, for automatically grouping related input bytes into fields and classifying each field and corresponding regions of code as critical or forgiving. Given an application and one or more inputs, Snap uses targeted input fuzzing in combination with dynamic execution and influence tracing to classify regions of input fields and code as critical or forgiving. Our experimental evaluation shows that Snap makes classifications with close to perfect precision (99%) and very good recall (between 99 % and 73%, depending on the application).
Forensic Triage for Mobile Phones with DEC0DE
"... We present DEC0DE, a system for recovering information from phones with unknown storage formats, a critical problem for forensic triage. Because phones have myriad custom hardware and software, we examine only the stored data. Via flexible descriptions of typical data structures, and using a classic ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
We present DEC0DE, a system for recovering information from phones with unknown storage formats, a critical problem for forensic triage. Because phones have myriad custom hardware and software, we examine only the stored data. Via flexible descriptions of typical data structures, and using a classic dynamic programming algorithm, we are able to identify call logs and address book entries in phones across varied models and manufacturers. We designed DEC0DE by examining the formats of one set of phone models, and we evaluate its performance on other models. Overall, we are able to obtain high performance for these unexamined models: an average recall of 97 % and precision of 80 % for call logs; and average recall of 93 % and precision of 52 % for address books. Moreover, at the expense of recall dropping to 14%, we can increase precision of address book recovery to 94 % by culling results that don’t match between call logs and address book entries on the same phone. 1
Reverse Engineering of Protocols from Network Traces
"... Abstract—Communication protocols determine how network components interact with each other. Therefore, the ability to derive a specification of a protocol can be useful in various contexts, such as to support deeper black-box testing or effective defense mechanisms. Unfortunately, it is often hard t ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
Abstract—Communication protocols determine how network components interact with each other. Therefore, the ability to derive a specification of a protocol can be useful in various contexts, such as to support deeper black-box testing or effective defense mechanisms. Unfortunately, it is often hard to obtain the specification because systems implement closed (i.e., undocumented) protocols, or because a time consuming translation has to be performed, from the textual description of the protocol to a format readable by the tools. To address these issues, we propose a new methodology to automatically infer a specification of a protocol from network traces, which generates automata for the protocol language and state machine. Since our solution only resorts to interaction samples of the protocol, it is well-suited to uncover the message formats and protocol states of closed protocols and also to automate most of the process of specifying open protocols. The approach was implemented in a tool and experimentally evaluated with publicly available FTP traces. Our results show that the inferred specification is a good approximation of the reference specification, exhibiting a high level of precision and recall. I.
Polymorphing Software by Randomizing Data Structure Layout
"... Abstract. This paper introduces a new software polymorphism technique that randomizes program data structure layout. This technique will generate different data structure layouts for a program and thus diversify the binary code compiled from the same program source code. This technique can mitigate ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Abstract. This paper introduces a new software polymorphism technique that randomizes program data structure layout. This technique will generate different data structure layouts for a program and thus diversify the binary code compiled from the same program source code. This technique can mitigate attacks (e.g., kernel rootkit attacks) that require knowledge about data structure definitions. It is also able to disrupt the generation of data structure-based program signatures. We have implemented our data structure layout randomization technique in the open source compiler collection gcc-4.2.4 and applied it to a number of programs. Our evaluation results show that our technique is able to achieve software binary diversity. We also apply the technique to one operating system data structure in order to foil a number of kernel rootkit attacks. Meanwhile, programs produced by the technique were analyzed by a state-of-the-art data structure inference system and it was demonstrated that reliance on data structure signatures alone may lead to false negatives in malware detection. 1

