Results 1 - 10
of
55
Out of control: Overcoming control-flow integrity
- In IEEE Symposium on Security and Privacy
, 2014
"... Abstract—As existing defenses like ALSR, DEP, and stack cookies are not sufficient to stop determined attackers from exploiting our software, interest in Control Flow Integrity (CFI) is growing. In its ideal form, CFI prevents any flow of control that was not intended by the original program, effect ..."
Abstract
-
Cited by 42 (4 self)
- Add to MetaCart
Abstract—As existing defenses like ALSR, DEP, and stack cookies are not sufficient to stop determined attackers from exploiting our software, interest in Control Flow Integrity (CFI) is growing. In its ideal form, CFI prevents any flow of control that was not intended by the original program, effectively putting a stop to exploitation based on return oriented programming (and many other attacks besides). Two main problems have prevented CFI from being deployed in practice. First, many CFI implementations require source code or debug information that is typically not available for commercial software. Second, in its ideal form, the technique is very expensive. It is for this reason that current research efforts focus on making CFI fast and practical. Specifically, much of the work on practical CFI is applicable to binaries, and improves performance by enforcing a looser notion of control flow integrity. In this paper, we examine the security implications of such looser notions of CFI: are they still able to prevent code reuse attacks, and if not, how hard is it to bypass its protection? Specifically, we show that with two new types of gadgets, return oriented programming is still possible. We assess the availability of our gadget sets, and demonstrate the practicality of these results with a practical exploit against Internet Explorer that bypasses modern CFI implementations. Keywords—Control-flow integrity evaluation, code-reuse attack I.
Enforcing Forward-Edge Control-Flow Integrity
- in GCC & LLVM. In 23rd USENIX Security Symposium (USENIX Security 14) (Aug. 2014), USENIX Association
"... Constraining dynamic control transfers is a common tech-nique for mitigating software vulnerabilities. This de-fense has been widely and successfully used to protect return addresses and stack data; hence, current attacks instead typically corrupt vtable and function pointers to subvert a forward ed ..."
Abstract
-
Cited by 22 (0 self)
- Add to MetaCart
(Show Context)
Constraining dynamic control transfers is a common tech-nique for mitigating software vulnerabilities. This de-fense has been widely and successfully used to protect return addresses and stack data; hence, current attacks instead typically corrupt vtable and function pointers to subvert a forward edge (an indirect jump or call) in the control-flow graph. Forward edges can be protected us-ing Control-Flow Integrity (CFI) but, to date, CFI im-plementations have been research prototypes, based on impractical assumptions or ad hoc, heuristic techniques. To be widely adoptable, CFI mechanisms must be inte-grated into production compilers and be compatible with software-engineering aspects such as incremental compi-lation and dynamic libraries. This paper presents implementations of fine-grained, forward-edge CFI enforcement and analysis for GCC and LLVM that meet the above requirements. An analysis and evaluation of the security, performance, and resource consumption of these mechanisms applied to the SPEC CPU2006 benchmarks and common benchmarks for the Chromium web browser show the practicality of our ap-proach: these fine-grained CFI mechanisms have signif-icantly lower overhead than recent academic CFI proto-types. Implementing CFI in industrial compiler frame-works has also led to insights into design tradeoffs and practical challenges, such as dynamic loading. 1
SoK: Automated Software Diversity
"... Abstract—The idea of automatic software diversity is at least two decades old. The deficiencies of currently deployed defenses and the transition to online software distribution (the “App store ” model) for traditional and mobile computers has revived the interest in automatic software diversity. Co ..."
Abstract
-
Cited by 21 (3 self)
- Add to MetaCart
(Show Context)
Abstract—The idea of automatic software diversity is at least two decades old. The deficiencies of currently deployed defenses and the transition to online software distribution (the “App store ” model) for traditional and mobile computers has revived the interest in automatic software diversity. Consequently, the literature on diversity grew by more than two dozen papers since 2008. Diversity offers several unique properties. Unlike other de-fenses, it introduces uncertainty in the target. Precise knowledge of the target software provides the underpinning for a wide range of attacks. This makes diversity a broad rather than narrowly focused defense mechanism. Second, diversity offers probabilistic protection similar to cryptography—attacks may succeed by chance so implementations must offer high entropy. Finally, the design space of diversifying program transformations is large. As a result, researchers have proposed multiple approaches to software diversity that vary with respect to threat models, security, performance, and practicality. In this paper, we systematically study the state-of-the-art in software diversity and highlight fundamental trade-offs between fully automated approaches. We also point to open areas and unresolved challenges. These include “hybrid solutions”, error reporting, patching, and implementation disclosure attacks on diversified software. I.
Modular Control-Flow Integrity
- In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (2014), PLDI ’14, ACM
"... Control-Flow Integrity (CFI) is a software-hardening technique. It inlines checks into a program so that its execution always follows a predetermined Control-Flow Graph (CFG). As a result, CFI is ef-fective at preventing control-flow hijacking attacks. However, past fine-grained CFI implementations ..."
Abstract
-
Cited by 20 (2 self)
- Add to MetaCart
(Show Context)
Control-Flow Integrity (CFI) is a software-hardening technique. It inlines checks into a program so that its execution always follows a predetermined Control-Flow Graph (CFG). As a result, CFI is ef-fective at preventing control-flow hijacking attacks. However, past fine-grained CFI implementations do not support separate compi-lation, which hinders its adoption. We present Modular Control-Flow Integrity (MCFI), a new CFI technique that supports separate compilation. MCFI allows mod-ules to be independently instrumented and linked statically or dy-namically. The combined module enforces a CFG that is a combi-nation of the individual modules ’ CFGs. One challenge in support-ing dynamic linking in multithreaded code is how to ensure a safe transition from the old CFG to the new CFG when libraries are dy-namically linked. The key technique we use is to have the CFG rep-resented in a runtime data structure and have reads and updates of the data structure wrapped in transactions to ensure thread safety. Our evaluation on SPECCPU2006 benchmarks shows that MCFI supports separate compilation, incurs low overhead of around 5%, and enhances security.
SAFEDISPATCH: Securing C++ Virtual Calls from Memory Corruption Attacks
"... Abstract—Several defenses have increased the cost of tradi-tional, low-level attacks that corrupt control data, e.g. return addresses saved on the stack, to compromise program execution. In response, creative adversaries have begun circumventing these defenses by exploiting programming errors to man ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
(Show Context)
Abstract—Several defenses have increased the cost of tradi-tional, low-level attacks that corrupt control data, e.g. return addresses saved on the stack, to compromise program execution. In response, creative adversaries have begun circumventing these defenses by exploiting programming errors to manipulate pointers to virtual tables, or vtables, of C++ objects. These attacks can hijack program control flow whenever a virtual method of a corrupted object is called, potentially allowing the attacker to gain complete control of the underlying system. In this paper we present SAFEDISPATCH, a novel defense to prevent such vtable hijacking by statically analyzing C++ programs and inserting sufficient runtime checks to ensure that control flow at virtual method call sites cannot be arbitrarily influenced by an attacker. We implemented SAFEDISPATCH as a Clang++/LLVM extension, used our enhanced compiler to build a vtable-safe version of the Google Chromium browser, and measured the performance overhead of our approach on popular browser benchmark suites. By carefully crafting a handful of optimizations, we were able to reduce average runtime overhead to just 2.1%. I.
Code-Pointer Integrity
, 2014
"... Systems code is often written in low-level languages like C/C++, which offer many benefits but also dele-gate memory management to programmers. This invites memory safety bugs that attackers can exploit to divert control flow and compromise the system. Deployed de-fense mechanisms (e.g., ASLR, DEP) ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
Systems code is often written in low-level languages like C/C++, which offer many benefits but also dele-gate memory management to programmers. This invites memory safety bugs that attackers can exploit to divert control flow and compromise the system. Deployed de-fense mechanisms (e.g., ASLR, DEP) are incomplete, and stronger defense mechanisms (e.g., CFI) often have high overhead and limited guarantees [19, 15, 9]. We introduce code-pointer integrity (CPI), a new de-sign point that guarantees the integrity of all code point-ers in a program (e.g., function pointers, saved return ad-dresses) and thereby prevents all control-flow hijack at-tacks, including return-oriented programming. We also introduce code-pointer separation (CPS), a relaxation of CPI with better performance properties. CPI and CPS offer substantially better security-to-overhead ratios than the state of the art, they are practical (we protect a complete FreeBSD system and over 100 packages like apache and postgresql), effective (prevent all attacks in the RIPE benchmark), and efficient: on SPEC CPU2006, CPS averages 1.2 % overhead for C and 1.9 % for C/C++, while CPI’s overhead is 2.9 % for C and 8.4 % for C/C++. A prototype implementation of CPI and CPS can be obtained from
KCoFI: Complete control-flow integrity for commodity operating system kernels
- IEEE S&P
, 2014
"... We present a new system, KCoFI, that is the first we know of to provide complete Control-Flow Integrity protection for commodity operating systems without using heavyweight complete memory safety. Unlike previous systems, KCoFI protects commodity operating systems from classical control-flow hijack ..."
Abstract
-
Cited by 14 (1 self)
- Add to MetaCart
(Show Context)
We present a new system, KCoFI, that is the first we know of to provide complete Control-Flow Integrity protection for commodity operating systems without using heavyweight complete memory safety. Unlike previous systems, KCoFI protects commodity operating systems from classical control-flow hijack attacks, return-to-user attacks, and code segment modification attacks. We formally verify a subset of KCoFI’s design by modeling several features in small-step semantics and providing a partial proof that the semantics maintain control-flow integrity. The model and proof account for oper-ations such as page table management, trap handlers, context switching, and signal delivery. Our evaluation shows that KCoFI prevents all the gadgets found by an open-source Return Oriented Programming (ROP) gadget-finding tool in the FreeBSD kernel from being used; it also reduces the number of indirect control-flow targets by 98.18%. Our evaluation also shows that the performance impact of KCoFI on web server bandwidth is negligible while file transfer bandwidth using OpenSSH is reduced by an average of 13%, and at worst 27%, across a wide range of file sizes. PostMark, an extremely file-system intensive benchmark, shows 2x overhead. Where comparable numbers are available, the overheads of KCoFI are far lower than heavyweight memory-safety techniques. I.
Monitor Integrity Protection with Space Efficiency and Separate Compilation ABSTRACT
"... Low-level inlined reference monitors weave monitor code into a program for security. To ensure that monitor code cannot be bypassed by branching instructions, some form of control-flow integrity must be guaranteed. Past approaches to protecting monitor code either have high space overhead or do not ..."
Abstract
-
Cited by 12 (3 self)
- Add to MetaCart
(Show Context)
Low-level inlined reference monitors weave monitor code into a program for security. To ensure that monitor code cannot be bypassed by branching instructions, some form of control-flow integrity must be guaranteed. Past approaches to protecting monitor code either have high space overhead or do not support separate compilation. We present Monitor Integrity Protection (MIP), a form of coarse-grained control-flow integrity. The key idea of MIP is to arrange instructions in variable-sized chunks and dynamically restrict indirect branches to target only chunk beginnings. We show that this simple idea is effective in protecting monitor code integrity, enjoys low space and execution-time overhead, supports separate compilation, and is largely compatible with an existing compiler toolchain. We also show that MIP enables a separate verifier that completely disassembles a binary and verifies its security. MIP is designed to support inlined reference monitors. As a case study, we have implemented MIP-based Software-based Fault Isolation (SFI) on both x86-32 and x86-64. The evaluation shows that MIP-based SFI has competitive performance with other SFI implementations, while enjoying low space overhead.
Readactor: Practical code randomization resilient to memory disclosure
- In IEEE Symposium on Security and Privacy, S&P ’15
, 2015
"... Abstract-Code-reuse attacks such as return-oriented programming (ROP) pose a severe threat to modern software. Designing practical and effective defenses against code-reuse attacks is highly challenging. One line of defense builds upon fine-grained code diversification to prevent the adversary from ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
(Show Context)
Abstract-Code-reuse attacks such as return-oriented programming (ROP) pose a severe threat to modern software. Designing practical and effective defenses against code-reuse attacks is highly challenging. One line of defense builds upon fine-grained code diversification to prevent the adversary from constructing a reliable code-reuse attack. However, all solutions proposed so far are either vulnerable to memory disclosure or are impractical for deployment on commodity systems. In this paper, we address the deficiencies of existing solutions and present the first practical, fine-grained code randomization defense, called Readactor, resilient to both static and dynamic ROP attacks. We distinguish between direct memory disclosure, where the attacker reads code pages, and indirect memory disclosure, where attackers use code pointers on data pages to infer the code layout without reading code pages. Unlike previous work, Readactor resists both types of memory disclosure. Moreover, our technique protects both statically and dynamically generated code. We use a new compiler-based code generation paradigm that uses hardware features provided by modern CPUs to enable execute-only memory and hide code pointers from leakage to the adversary. Finally, our extensive evaluation shows that our approach is practical-we protect the entire Google Chromium browser and its V8 JIT compiler-and efficient with an average SPEC CPU2006 performance overhead of only 6.4%.
Opaque control-flow integrity.
- In 22nd Annual Network and Distributed System Security Symposium, NDSS,
, 2015
"... Abstract-A new binary software randomization and ControlFlow Integrity (CFI) enforcement system is presented, which is the first to efficiently resist code-reuse attacks launched by informed adversaries who possess full knowledge of the inmemory code layout of victim programs. The defense mitigates ..."
Abstract
-
Cited by 10 (1 self)
- Add to MetaCart
(Show Context)
Abstract-A new binary software randomization and ControlFlow Integrity (CFI) enforcement system is presented, which is the first to efficiently resist code-reuse attacks launched by informed adversaries who possess full knowledge of the inmemory code layout of victim programs. The defense mitigates a recent wave of implementation disclosure attacks, by which adversaries can exfiltrate in-memory code details in order to prepare code-reuse attacks (e.g., Return-Oriented Programming (ROP) attacks) that bypass fine-grained randomization defenses. Such implementation-aware attacks defeat traditional fine-grained randomization by undermining its assumption that the randomized locations of abusable code gadgets remain secret. Opaque CFI (O-CFI) overcomes this weakness through a novel combination of fine-grained code-randomization and coarsegrained control-flow integrity checking. It conceals the graph of hijackable control-flow edges even from attackers who can view the complete stack, heap, and binary code of the victim process. For maximal efficiency, the integrity checks are implemented using instructions that will soon be hardware-accelerated on commodity x86-x64 processors. The approach is highly practical since it does not require a modified compiler and can protect legacy binaries without access to source code. Experiments using our fully functional prototype implementation show that O-CFI provides significant probabilistic protection against ROP attacks launched by adversaries with complete code layout knowledge, and exhibits only 4.7% mean performance overhead on current hardware (with further overhead reductions to follow on forthcoming Intel processors). I. MOTIVATION Code-reuse attacks (cf., Permission to freely reproduce all or part of this paper for noncommercial purposes is granted provided that copies bear this notice and the full citation on the first page. Reproduction for commercial purposes is strictly prohibited without the prior written consent of the Internet Society, the first-named author (for reproduction of an entire paper only), and the author's employer if the paper was prepared within the scope of employment. This has motivated copious work on defenses against codereuse threats. Prior defenses can generally be categorized into: CFI [1] and artificial software diversity CFI restricts all of a program's runtime control-flows to a graph of whitelisted control-flow edges. Usually the graph is derived from the semantics of the program source code or a conservative disassembly of its binary code. As a result, CFIprotected programs reject control-flow hijacks that attempt to traverse edges not supported by the original program's semantics. Fine-grained CFI monitors indirect control-flows precisely; for example, function callees must return to their exact callers. Although such precision provides the highest security, it also tends to incur high performance overheads (e.g., 21% for precise caller-callee return-matching [1]). Because this overhead is often too high for industry adoption, researchers have proposed many optimized, coarser-grained variants of CFI. Coarse-grained CFI trades some security for better performance by reducing the precision of the checks. For example, functions must return to valid call sites (but not necessarily to the particular site that invoked the callee). Unfortunately, such relaxations have proved dangerous-a number of recent proof-of-concept exploits have shown how even minor relaxations of the control-flow policy can be exploited to effect attacks Artificial software diversity offers a different but complementary approach that randomizes programs in such a way that attacks succeeding against one program instance have a very low probability of success against other (independently randomized) instances of the same program. Probabilistic defenses rely on memory secrecy-i.e., the effects of randomization must remain hidden from attackers. One of the simplest and most widely adopted forms of artificial diversity is Address Space Layout Randomization (ASLR), which randomizes the base addresses of program segments at loadtime. Unfortunately, merely randomizing the base addresses does not yield sufficient entropy to preserve memory secrecy in many cases; there are numerous successful derandomization attacks against ASLR Recently, a new wave of implementation disclosure attacks Experiments show that O-CFI enjoys performance overheads comparable to standard fine-grained diversity and non-opaque, coarse-grained CFI. Moreover, O-CFI's control-flow checking logic is implemented using Intel x86/x64 memory-protection extensions (MPX) that are expected to be hardware-accelerated in commodity CPUs from 2015 onwards. We therefore expect even better performance for O-CFI in the near future. Our contributions are as follows: • We introduce O-CFI, the first low-overhead code-reuse defense that tolerates implementation disclosures. • We describe our implementation of a fully functional prototype that protects stripped, x86 legacy binaries without source code. II. THREAT MODEL Our work is motivated by the emergence of attacks against fine-grained diversity and coarse-grained control-flow integrity. We therefore introduce these attacks and distill them into a single, unified threat model. A. Bypassing Coarse-Grained CFI Ideally, CFI permits only programmer-intended control-flow transfers during a program's execution. The typical approach is to assign a unique ID to each permissible indirect controlflow target, and check the IDs at runtime. Unfortunately, this introduces performance overhead proportional to the degree of the graph-the more overlaps between valid target sets of indirect branch instructions, the more IDs must be stored and checked at each branch. Moreover, perfect CFI cannot be realized with a purely static control-flow graph; for example, the permissible destinations of function returns depend on the calling context, which is only known at runtime. Fine-grained CFI therefore implements a dynamically computed shadow stack, incurring high overheads To avoid this, coarse-grained CFI implementations resort to a reduced-degree, static approximation of the control-flow graph, and merge identifiers at the cost of reduced security. Typically, attackers need more than a single 4K page worth of code to find enough gadgets to mount a code-reuse attack. To discourage brute-force searches for more code pages, artificial diversity defenses routinely mine the address space with unmapped pages that abort the process if accessed B. Assumptions Given these sobering realities, we adopt a conservative threat model that assumes that attackers will eventually find and disassemble all code pages in victim processes. Our threat model therefore assumes that the adversary knows the complete in-memory code layout-including the locations of any gadgets required to launch a ROP attack. We also assume that the attacker can read and write the full contents of the heap and stack, as well as any data structures used by the dynamic loader. In keeping with common practice, we assume that data execution protection is activated, so that code page permissions can be maintained as either writable or executable but not both. However, we assume that attackers cannot safely perform a comprehensive, linear scan of virtual memory, since defenders may place unmapped guard pages at random locations. Instead, attackers must follow references from one disclosed memory page to another III. O-CFI OVERVIEW O-CFI combines insights from CFI and automated software diversity. It extends CFI with a new, coarse-grained CFI enforcement strategy inspired by bounds-checking, that validates control-flow transfers without divulging the bounds against which their destinations are checked. Bounds-checking is fast, the bounds are easier to conceal than arbitrary gadget locations, and the bounds are randomizable. This imbues CFI and fine-grained software diversity with an additional layer of protection against code-reuse attacks aided by implementation disclosures. As a result, O-CFI enjoys performance similar to coarse-grained CFI, with probabilistic security guarantees similar to fine-grained artificial diversity in the absence of implementation disclosures. Following traditional CFI, an O-CFI policy assigns to each indirect branch site a destination set that captures its set of permissible destination addresses. Such a graph can be derived from the program's source code or (with lesser precision) a conservative disassembly of its object code. We next reformulate this policy as a bounds-checking problem by reducing each destination set to only its minimal and maximal members. This policy approximation can be efficiently enforced by confining each branch to the memory-aligned addresses within its destination set range. All intended destination addresses are aligned within these bounds, so the enforcement conservatively preserves intended control-flows. Code layout is optimized to tighten the bounds, so that the set of unintended, aligned destinations within the bounds remains minimal. These few remaining unintended but reachable destinations are protected by the artificial diversity half of our approach. Our artificial diversity approach probabilistically protects the aligned, in-bounds, but policy-violating control-flows by applying fine-grained randomization to the binary code at load-time. While the overall strategy for implementing this randomization step is based on prior works Reformulating CFI in this way forces attackers to change their plan of attack. The recent attacks against coarse-grained CFI succeed by finding exploitable code that is reachable due to policy-relaxations needed for acceptable performance. These relaxations admit an alarming array of false-positives: instead of identifying the actual caller, all call-preceded instructions are incorrectly identified as permitted branch destinations. Such instructions saturate a typical address space, giving attackers too much wiggle room to build attacks. O-CFI counters this by changing the approximation approach: each branch destination is restricted to a relatively short span of aligned addresses, with all the bounds chosen pseudo-randomly at load-time. This greatly narrows the field of possible hijacks, and it removes the opportunity for attackers to analyze programs ahead of time for viable ROP gadget chains. In O-CFI, no two program instances admit the same set of ROP payloads, since the bounds are all randomized every time the program is loaded. Since the security of coarse-grained CFI depends in part on the precision of its policy approximation, it is worthwhile to improve the precision by tightening the bounds imposed upon each branch. This effectively reduces the space of attacker guesses that might succeed in hijacking any given branch. To reduce this space as much as possible, we introduce a novel binary code optimization, called portals, that minimizes the distance covered by the lowest and greatest element of each indirect branch's destination set. Our fine-grained artificial diversity implementation is an adaptation and extension of binary stirring To protect against information leaks that might disclose bounds information, our implementation is carefully designed to keep all bounds opaque to external threats. They are randomly chosen at load-time (as a side-effect of binary stirring) and stored in a bounds lookup table (BLT) located at a randomly chosen base address. The table size is very small relative to the virtual address space, and attackers cannot safely perform bruteforce scans of the full address space (see §II-B), so guessing the BLT's location is probabilistically infeasible for attackers. No code or data sections contain any pointer references to BLT addresses; all references are computed dynamically at load-time and stored henceforth exclusively in protected registers. A. Bounding the Control Flow For each indirect branch site with (non-empty) destination set D, O-CFI guards the branch instruction with a bounds-check that continues execution only if the impending target t satisfies t ∈ [min D, max D]. Indirect branch instructions include all control-flow transfer instructions that target computed destinations, including return instructions. Failure of the boundscheck solicits immediate process termination with an error code (for easier debugging). Termination could be replaced with a different intervention if desired, such as an automated attack analysis or alarm, followed by restart and re-randomization. The bounds-check implementation first loads the pair (min D, max D) from the BLT into registers via an indirect, indexed memory reference. The load instruction's arguments and syntax are independent of the BLT's location, concealing its address from attackers who can read the checking code. The impending branch target t is then checked against the loaded bounds. If the check succeeds, execution continues; otherwise the process immediately terminates with a bounds range (#BR) exception. The #BR exception helps distinguish between crashes and guessing attacks. To resist guessing attacks (e.g., BROP), web servers and other services should use this exception to trigger re-randomization as they restart. Following the approaches of PittSFIeld To bypass these checks, an attacker must craft a payload whose every gadget is properly aligned and falls within the bounds of the preceding gadget's conclusory indirect branch. The odds of guessing a reachable series of such gadgets decrease exponentially with the number of gadgets in the desired payload. B. Opacifying Control-flow Bounds Diversifying bounds. The bounds introduced by O-CFI constitute a coarse-grained CFI policy. Section II warns that such coarse granularity can lead to vulnerabilities. However, to exploit such vulnerabilities, attackers must discover which control-flows adhere to the CFI policy and which do not. To make the impermissible flows opaque to attackers, we use diversity. Our prototype uses a modified version of the technique outlined by Wartell et al. Performing fine-grain code randomization at load-time indirectly randomizes the ranges used to bound the control-flow. In contrast to other CFI techniques, attackers therefore do not have a priori knowledge of the control-flow bounds. Preventing Information Leaks. Attackers bypass fine grained diversity using information leaks, such as those described in §II-A. Were O-CFI's control-flow bounds expressed as constants in the instruction stream, attackers could bypass our defense via information leaks. To avoid this, we instead confine this sensitive information to an isolated data page, the BLT. The BLT is initialized at a random virtual memory address at load-time, and there are no pointer references (obfuscated or otherwise) to any BLT address in any code or data page in the process. This keeps its location hidden from attackers. Furthermore, we take additional steps to prevent accidental BLT disclosure via pointer leaks. Our prototype stores BLT base addresses in segment selectors-a legacy feature of all x86 processors. In particular, each load from the BLT uses the gs segment selector and a unique index to read the correct bounds. We only use the gs selector for instructions that implement bounds checks, so there are no other instructions that adversaries can reuse to learn its value. Attackers are also prevented from executing instructions that reveal the contents of the segment registers, since such instructions are privileged. To succeed, attackers must therefore (i) guess branch ranges, or (ii) guess the base address of the BLT. The odds of correctly guessing the location of the BLT are low enough to provide probabilistic protection. On 32-bit Windows Systems, for instance, the chances of guessing the base address are or less than one in two billion. Incorrect guesses alert defenders and trigger re-randomization with high probability (by accessing an unallocated memory page). The likelihood of successfully guessing a reachable gadget chain is a function of the length of the chain and the span of the bounds. The next section therefore focuses on reducing the average bounds span. C. Tightening Control-flow Check Bounds The distance between the lowest and highest intended destinations of any given indirect control-flow transfer instruction depends on the code layout. Placing indirect branches close to their targets both reduces bounds and improves locality, elevating both security and efficiency. Therefore we organize the code segment into clusters-one per indirect branch-each containing the basic blocks targeted by a particular branch. To accommodate blocks that are destinations of multiple distinct branch instructions, we consider three options: (i) put the block in one cluster and expand the bounds of other branches to include its address, (ii) create duplicate copies of the block in multiple clusters, or (iii) add a portal block to each cluster, which unconditionally jumps to the block. Each solution incurs a trade-off: expanding bounds reduces security, creating duplicates increases code size, and portals introduce runtime overhead. The options are not mutually exclusive, affording optimizers a range of strategies. Our experiments indicate that portals are often the best choice (see below). The capacity of the portal system limits the number of portals per nexus. Varying nexus capacity allows O-CFI to be tuned to different requirements. Setting it to zero prevents the creation of any portals, forcing the optimizer to choose alternative options. At the other extreme, setting no upper limit allows a portal to be created for every target, reducing all bounds ranges to wt, where w is the alignment width (usually 16 bytes; see §V-A) and t is the number of targets of the branch. At this setting, all indirect branches can only branch into a nexus, and through them, only to exactly those addresses that have been statically identified as targets. Thus, O-CFI with unbounded nexus capacity enforces fine-grained, static CFI. The extra layer of indirection imposed by a portal has a minor impact on runtime; there is thus a trade-off between security and performance. Users may opt for full CFI enforcement with O-CFI for security-critical components, and lower the nexus capacity to a desired performance level for less critical software. In our experiments, we found that a nexus capacity of 12 results in a significant reduction in bounds sizes with imperceptible performance effects. All of our experiments in §V use this nexus capacity. Section V-D details how different nexus capacities affect bound ranges. D. Example Defense against JIT-ROP The following example illustrates how O-CFI secures binaries against disclosure attacks. Consider a binary whose code segment contains five useful gadgets g 1 , . . . , g 5 . Each gadget terminates in an indirect branch protected by a bounds check. Under appropriate conditions, a disclosure attack such as JIT-ROP is able to recover a large portion of the runtime layout of the binary In our example, if g 1 is selected to be part of the payload, it can only be chained with gadget g 4 or g 5 . Attempting to jump from g 1 to any other gadget triggers a bounds violation that stops the attack. Similarly, an attack that hijacks a controlflow to c 1 can only redirect it to gadgets g 1 , g 2 , or g 3 ; all other gadgets are outside cluster c 1 and are therefore detected as impermissible destinations of the hijacked branch. Broadly speaking, all links in a payload's chain must traverse edges in the Cartesian product of the (aligned) gadget sets within the corresponding clusters. A successful attack must therefore limit itself to an extremely sparse graph of available edges. Our experiments (see §V-C) indicate that in practice the probability of successfully chaining gadgets in such a sparse graph is very low-just 0.01% for a four-gadget payload. The entropy of our procedure is further analyzed in §VI-A. IV. O-CFI IMPLEMENTATION We have implemented a fully functional prototype of O-CFI for the Intel x86 architecture. Our implementation uses a binary rewriting framework that secures COTS x86 binaries without source, debug, or relocation information. Like traditional CFI, however, we emphasize that O-CFI is equally suitable for inclusion in a compiler. Our rewriter generates a transformed version of the binary that leverages 1) a coarse-grained CFI policy that bounds control-flows, 2) fine-grained randomization to thwart traditional ROP attacks and diversify control-flow bounds so they become unknown and unreliable for attackers, 3) x86 segmentation registers to prevent accidental leakage of the bounds lookup table (BLT), and 4) an SFI framework similar to PittSFIeld [29] to enforce instruction alignment, denying attackers access to misaligned instructions that bypass bounds checks. The architecture of O-CFI is shown in A. Static Binary Rewriting 1) Conservative Disassembly: We first disassemble the code section using a conservative disassembler. Similar to the approach outlined by Wartell et al. [45], the code section is duplicated, with the old copy (renamed to .told) serving as a read-only data segment and the new copy (called .tnew) containing the rewritten executable code. The .told section is set non-executable, and all code blocks identified as possible targets of indirect jumps are overwritten with a five-byte tagged pointer. The tagged pointer consists of a tag byte (0xF4) followed by the four-byte address of that block in the .tnew section. The tag byte facilitates efficient runtime redirection of stale pointers to their correct targets, as explained below. Since the .told section preserves all static data at its original addresses, data pointers in the rewritten code section continue to behave correctly. This makes the rewriting system resilient to disassembly errors that misclassify data as code. Disassembly errors that misclassify code as data could omit such code from the .tnew section, resulting in a crash at runtime. To avoid this, we use settings that encourage the disassembler to interpret all bytes with valid instruction encodings as code. In our experiments, these settings suffice to avoid all disassembly errors that affect proper code translation. 2) SFI and Randomization Framework: To prevent attacks from jumping over the guards that constrain branch ranges, the new code segment is split into power-of-two sized basicblocks called chunks 6 Direct branches are statically rewritten to reference their new target addresses. Indirect branches require extra effort, since their exact targets are only known at runtime. At runtime, there are two common cases: (a) the impending target is already within the .tnew section (e.g., it was pushed by a call), or (b) the impending target is a stale pointer that points into the .told section (e.g., it was loaded from a method dispatch table in the heap, which the static rewriter does not modify). The first case requires no special treatment; the second solicits an efficient dynamic lookup and redirection of the stale pointer to its new location The stale pointer redirection mechanism is not relied upon for security. Like all indirect branch targets, redirected pointers undergo a mask and bounds-check before becoming controlflow destinations. Thus, corrupting or defeating the redirection mechanism does not circumvent the security policy. The ability to redirect code pointers lays the foundation for load-time randomization. Once the new randomized locations for basic blocks have been finalized, updating the values in the .told section allows our redirection mechanism to correctly redirect all indirect branches to the new, randomized block locations. Direct branches are simply modified in-place. 3) Branch Instrumentation: The above techniques enforce SFI and fine-grained randomization. This protects against traditional ROP attacks, but not against implementation-aware attacks, which require the additional hardening implemented by O-CFI's bounds-checking. Bounds-checking is applied after stale pointer redirection alongside masking, to further limit the set of accessible gadgets. SFI enforcement prevents attack payloads from circumventing these bounds checks. Algorithm 1 CreateClusters(S): Cluster basic blocks to place the targets of indirect branches as close together as possible. Input: S {the set of the basic blocks in the code segment} Output: C {a set of clusters, one per indirect branch. Each c ∈ C is a block set containing all targets of a specific branch, plus an empty nexus for later portal insertion.} C ← ∅ for all b ∈ Branches(S) do c ← ∅ for all t ∈ Targets(b) do b ← GetBasicBlock (t) if b / ∈ C then c ← c ∪ {b } end for {The nexus is an empty basic block to hold portals.} C ← C ∪ {(c ∪ CreateNexus())} end for {Add unclaimed basic-blocks into a single final cluster.} C ← C ∪ {(S − C)} Furthermore, due to randomization, the bounds remain unknown to implementation-aware attackers, and vary from program instance to program instance. Attacks cannot statically pre-compute bounds ranges because the runtime randomization phase changes bounds values on each execution. They also cannot dynamically leak the bounds, all of which are stored securely in the BLT and never leaked to the stack or heap. Attackers must therefore hazard guesses as to which gadget chains are safely accessible for any given program instance. Our bounds-checking logic is detailed in 4) Accurate Target Identification: To ensure that we identify all intended targets of indirect branches, we employ disassembly heuristics that identify a superset of potential targets. As an example, we follow the following sequence of steps to identify the set of potential targets for a return instruction: 1) Identify all code references to the function that contains the return. This includes direct and indirect branches to the function entry point, as well as to any basic block within the function. 2) For each identified branch that is not a call, find all code references that flow into it. 3) Recursively traverse all non-call references until a fixed point is reached (i.e., a set with only calls). 4) The instruction immediately after each call forms the target set for that return. Our heuristics are tuned to prefer false positives (non-targets treated as possibly valid destinations), since such errors do not significantly affect the operation of our system. In particular, each such error only marginally weakens the system's security (by admitting an unnecessary control-flow link that remains guarded by randomization) and slightly increases generated code size. A compiler-side solution could be more precise, at the cost of requiring source code and recompilation of programs. 5) Bounds Range Minimization: As discussed in §III-C, we use a combination of clustering and portals to reduce bounds ranges. While the portals themselves are created only at binary load-time, it is in the static phase that branch targets are clustered together and empty nexuses created. Algorithm 1 gives 7