BPF+: Exploiting Global Data-flow Optimization in a Generalized Packet Filter Architecture
In SIGCOMM, 1999
Cited by 53 (0 self)
A packet filter is a programmable selection criterion for classifying or selecting packets from a packet stream in a generic, reusable fashion. Previous work on packet filters falls roughly into two categories, namely those efforts that investigate flexible and extensible filter abstractions but sacrifice performance, and those that focus on low-level, optimized filtering representations but sacrifice flexibility. Applications like network monitoring and intrusion detection, however, require both high-level expressiveness and raw performance. In this paper, we propose a fully general packet filter framework that affords both a high degree of flexibility and good performance. In our framework, a packet filter is expressed in a high-level language that is compiled into a highly efficient native implementation. The optimization phase of the compiler uses a flowgraph set relation called edge dominators and the novel application of an optimization technique that we call "redundant predicate...
Systematic Compilation For Predicated Execution
2000
Cited by 14 (2 self)
... synergistically to realize the potential of predication. The Partial Reverse If-Conversion Framework provides the first compilation framework to accurately balance control and predication, while providing other compiler components with complete access to the predicated code for further optimization. Though the full potential of the Partial Reverse If-Conversion Framework remains unexplored, current compiler technology justifies its worth. To operate on predicated code, the optimizer, scheduler, and register allocator require accurate information regarding the relationships among predicates. The Predicate Analysis System is the first efficient predicate relationship database to provide an approximation-free representation. Optimization, scheduling, and register allocation also require accurate knowledge of the flow of information in the predicated code. Using the Predicate Analysis System, the Predicate Dataflow Graph is built to provide dataflow information. The Predicate Dataflow Graph
The Program Decision Logic Approach to Predicated Execution
In Proceedings of the 26th International Symposium on Computer Architecture, 1999
Cited by 14 (4 self)
Modern compilers must expose sufficient amounts of Instruction-Level Parallelism (ILP) to achieve the promised performance increases of superscalar and VLIW processors. One of the major impediments to achieving this goal has been inefficient programmatic control flow. Historically, the compiler has translated the programmer's original control structure directly into assembly code with conditional branch instructions. Eliminating inefficiencies in handling branch instructions and exploiting ILP has been the subject of much research. However, traditional branch handling techniques cannot significantly alter the program's inherent control structure. The advent of predication as a program control representation has enabled compilers to manipulate control in a form more closely related to the underlying program logic. This work takes full advantage of the predication paradigm by abstracting the program control flow into a logical form referred to as a program decision logic network. This network is modeled as a Boolean equation and minimized using modified versions of logic synthesis techniques. After minimization, the more efficient version of the program's original control flow is re-expressed in predicated code. Furthermore, this paper proposes extensions to the HPL PlayDoh predication model in support of more effective predicate decision logic network minimization. Finally, this paper shows the ability of the mechanisms presented to overcome limits on ILP previously imposed by rigid program control structure.
Efficient and effective branch reordering using profile data
ACM Transactions on Programming Languages and Systems (TOPLAS), 2002
Cited by 11 (1 self)
The conditional branch has long been considered an expensive operation. The relative cost of conditional branches has increased as recently designed machines rely on deeper pipelines and higher multiple issue. Reducing the number of conditional branches executed often results in a substantial performance benefit. This paper describes a code-improving transformation to reorder sequences of conditional branches that compare a common variable to constants. The goal is to obtain an ordering in which the lowest average number of branches in the sequence will be executed. First, sequences of branches that can be reordered are detected in the control flow. Second, profiling information is collected to predict the probability that each branch will transfer control out of the sequence. Third, the cost of performing each conditional branch is estimated. Fourth, the most beneficial ordering of the branches based on the estimated probability and cost is selected. The most beneficial ordering often includes the insertion of additional conditional branches that did not previously exist in the sequence. Finally, the control flow is restructured to reflect the new ordering. Applying the transformation yields average reductions of about 8% in instructions executed and 13% in branches performed, as well as about a 4% decrease in execution time.
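As a rough illustration of the fourth step, the sketch below orders a sequence of mutually exclusive tests by exit probability per unit cost, a standard greedy rule for this kind of expected-cost problem. It is not taken from the paper: the function names, the profile numbers, and the cost model are hypothetical, and the insertion of additional branches mentioned in the abstract is not modelled.

```python
from itertools import permutations


def expected_cost(order):
    """Expected cost of running the tests in the given order.

    Each entry is (label, exit_probability, cost). The tests are assumed to be
    mutually exclusive, so the probabilities sum to at most 1; the remainder is
    the chance that no test exits and every test is paid for.
    """
    total = 0.0
    paid = 0.0        # cumulative cost of the tests executed so far
    remaining = 1.0   # probability mass that has not exited yet
    for _label, prob, cost in order:
        paid += cost
        total += prob * paid
        remaining -= prob
    return total + remaining * paid


def reorder(branches):
    """Greedy ordering: highest exit probability per unit cost first."""
    return sorted(branches, key=lambda b: b[1] / b[2], reverse=True)


def brute_force_best(branches):
    """Exhaustive search, only practical for short sequences."""
    return min(permutations(branches), key=expected_cost)


# Hypothetical profile for tests on one variable c: (label, probability, cost).
profile = [
    ("c == '\\n'", 0.05, 1.0),
    ("c == ' '", 0.20, 1.0),
    ("'a' <= c <= 'z'", 0.60, 2.0),
]

best = reorder(profile)
```

With these made-up numbers the range test moves to the front even though it is the most expensive single test, because it exits the sequence most often per unit of cost.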
Engelen, R.: Branch elimination by condition merging
2003
Cited by 1 (1 self)
Conditional branches are expensive. Branches require a significant percentage of execution cycles since they occur frequently and cause pipeline flushes when mispredicted. In addition, branches result in forks in the control flow, which can prevent other code-improving transformations from being applied. In this paper we describe profile-based techniques for replacing the execution of a set of two or more branches with a single branch on a conventional scalar processor. These sets of branches can include tests of multiple variables. For instance, the test if (p1 != 0 && p2 != 0), which is testing for NULL pointers, can be replaced with if (p1 & p2 != 0). Program profiling is performed to target condition merging along frequently executed paths. The results show that eliminating branches by merging conditions can significantly reduce the number of conditional branches executed in non-numerical applications. Key words: compiler, condition merging, profiling, code duplication
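One way to read the NULL-pointer example is sketched below, with pointers modelled as plain integer addresses. A non-zero bitwise AND implies that both operands are non-zero, but not the reverse (1 and 2 are both non-zero yet share no bits), so the merged test can only confirm the common path. The abstract does not say what happens when the merged test fails; falling back to the original two tests is an assumption of this sketch, and the function names are hypothetical.

```python
def original_test(p1, p2):
    """Two conditional branches, one per pointer."""
    if p1 != 0:
        if p2 != 0:
            return True
    return False


def merged_test(p1, p2):
    """One merged branch on the common path, original tests as fallback."""
    if (p1 & p2) != 0:
        # A shared set bit proves that both values are non-zero.
        return True
    # Less common path: either a value is zero or the two share no bits,
    # so the original tests are needed to tell these cases apart.
    return original_test(p1, p2)
```

The merged version only pays off when profiling shows that the single-branch path is taken most of the time, which matches the abstract's emphasis on frequently executed paths.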
Branch Elimination via Multi-Variable Condition Merging
Cited by 1 (0 self)
Conditional branches are expensive. Branches require a significant percentage of execution cycles since they occur frequently and cause pipeline flushes when mispredicted. In addition, branches result in forks in the control flow, which can prevent other code-improving transformations from being applied. In this paper we describe profile-based techniques for replacing the execution of a set of two or more branches with a single branch on a conventional scalar processor. First, we gather profile information to detect the frequently executed paths in a program.
Generalized Index-Set Splitting
Cited by 1 (0 self)
Abstract. This paper introduces Index-Set Splitting (ISS), a technique that splits a loop containing several conditional statements into several loops with less complex control flow. Contrary to the classic loop unswitching technique, ISS splits loops when the conditional is loop variant. ISS uses an Index Sub-range Tree (IST) to identify the structure of the conditionals in the loop and to select which conditionals should be eliminated. This decision is based on an estimation of the code growth for each splitting: a greedy algorithm spends a pre-determined code growth budget. ISTs separate the decision about which splits to perform from the actual code generation for the split loops. The use of ISS to improve a loop fusion framework is then discussed. ISS opportunity identification in the SPEC2000 benchmark suite and three other suites demonstrates that ISS is a general technique that may benefit other compilers.
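The basic transformation can be sketched for the simplest case, a single loop-variant test of the form i < m. The split point is clamped into the iteration range so that both versions visit the same indices in the same order. This is a minimal illustration only: the function names are hypothetical, and the Index Sub-range Tree and code growth budget from the abstract are not modelled.

```python
def original_loop(n, m):
    """One loop whose body branches on a loop-variant condition."""
    out = []
    for i in range(n):
        if i < m:                  # depends on i, so unswitching does not apply
            out.append(("A", i))
        else:
            out.append(("B", i))
    return out


def split_loop(n, m):
    """The same work as two branch-free loops over disjoint index sub-ranges."""
    out = []
    split = max(0, min(m, n))      # clamp the split point into [0, n]
    for i in range(0, split):      # sub-range where i < m always holds
        out.append(("A", i))
    for i in range(split, n):      # sub-range where i < m never holds
        out.append(("B", i))
    return out
```

Each additional conditional that is split this way duplicates the loop body again, which is why the abstract ties the choice of splits to a code growth estimate.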
Program decision logic optimization using predication and control speculation
Cited by 1 (1 self)
The mainstream arrival of predication, a means other than branching of selecting instructions for execution, has required compiler architects to reformulate fundamental analyses and transformations. Traditionally, the compiler has generated branches straightforwardly to implement control flow designed by the programmer and has then performed sophisticated “global” optimizations to move and optimize code around them. In this model, the inherent tie between the control state of the program and the location of the single instruction pointer serialized runtime evaluation of control and limited the extent to which the compiler could optimize the control structure of the program (without extensive code replication). Predication provides a means of control independent of branches and instruction fetch location, freeing both compiler and architecture from these restrictions; effective compilation of predicated code, however, requires sophisticated understanding of the program’s control structure. This article explores a representational technique which, through direct code analysis, maps the program’s control component into a canonical database, a reduced ordered binary decision diagram (ROBDD), which fully enables the compiler to utilize and manipulate predication. This abstraction is then applied to optimize the program’s control component, transforming it into a form more amenable to instruction-level parallel (ILP) execution.

