Clustered Speculative Multithreaded Processors
, 1999
Cited by 143 (9 self)
In this paper we present a processor microarchitecture that can simultaneously execute multiple threads and has a clustered design for scalability purposes. A main feature of the proposed microarchitecture is its capability to spawn speculative threads from a single-thread application at run-time. These speculative threads use otherwise idle resources of the machine. Spawning a speculative thread involves predicting its control flow as well as its dependences with other threads and the values that flow through them. In this way, threads that are not independent can be executed in parallel. Control-flow, data value and data dependence predictors particularly designed for this type of microarchitecture are presented. Results show the potential of the microarchitecture to exploit speculative parallelism in programs that are hard to parallelize at compile-time, such as the SpecInt95. For a 4-thread unit configuration, some programs such as ijpeg and li can exploit an average degree of parallelism of more than 2 threads per cycle. The average degree of parallelism for the whole SpecInt95 suite is 1.6 threads per cycle. This speculative parallelism results in significant speedups for all the SpecInt95 programs when compared with a single-thread execution.
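The idea of predicting the values that flow between dependent threads can be illustrated with a generic stride predictor. This is only a sketch under stated assumptions: the abstract does not describe the paper's own predictors, so the class name `StridePredictor`, its table layout, and the use of a plain stride scheme are all hypothetical stand-ins.

```python
class StridePredictor:
    """Generic stride value predictor (illustrative, not the paper's design)."""

    def __init__(self):
        # key (e.g. a register or memory location) -> (last_value, stride)
        self.table = {}

    def predict(self, key):
        """Return the predicted next value, or None if the key is unseen."""
        if key not in self.table:
            return None
        last_value, stride = self.table[key]
        return last_value + stride

    def update(self, key, value):
        """Record the value actually observed and refresh the stride."""
        if key in self.table:
            last_value, _ = self.table[key]
            self.table[key] = (value, value - last_value)
        else:
            self.table[key] = (value, 0)


def count_correct(predictor, key, values):
    """Count how many values in the sequence were predicted exactly."""
    correct = 0
    for value in values:
        if predictor.predict(key) == value:
            correct += 1
        predictor.update(key, value)
    return correct
```

A loop induction variable that a speculative thread needs as a live-in value is the kind of input such a predictor handles well: after two observations the stride is known and later values are predicted exactly.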
Enhancing Software Reliability with Speculative Threads
, 2002
Cited by 55 (0 self)
This paper advocates the use of a monitor-and-recover programming paradigm to enhance the reliability of software, and proposes an architectural design that allows software and hardware to cooperate in making this paradigm more efficient and easier to program. We propose that programmers write monitoring functions assuming simple sequential execution semantics. Our architecture speeds up the computation by executing the monitoring functions speculatively in parallel with the main computation. For recovery, programmers can define fine-grain transactions whose side effects, including all register modifications and memory writes, can either be committed or aborted under program control. Transactions are implemented efficiently by treating them as speculative threads. Our experimental
Network-level polymorphic shellcode detection using emulation
- In Proceedings of the GI/IEEE SIG SIDAR Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA)
, 2006
Cited by 22 (10 self)
As state-of-the-art attack detection technology becomes more prevalent, attackers are likely to evolve, employing techniques such as polymorphism and metamorphism to evade detection. Although recent results have been promising, most existing proposals can be defeated using only minor enhancements to the attack vector. We present a heuristic detection method that scans network traffic streams for the presence of polymorphic shellcode. Our approach relies on a NIDS-embedded CPU emulator that executes every potential instruction sequence, aiming to identify the execution behavior of polymorphic shellcodes. Our analysis demonstrates that the proposed approach is more robust to obfuscation techniques such as self-modification than previous proposals, but also highlights advanced evasion techniques that need to be examined more closely to reach a satisfactory solution to the polymorphic shellcode detection problem.
On Dynamic Speculative Thread Partitioning and the MEM-slicing Algorithm
- In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques (PACT)
, 1999
Cited by 17 (4 self)
A dynamic speculative multithreaded processor automatically extracts thread-level parallelism from sequential binary applications without software support. The hardware is responsible for partitioning the program into threads and managing inter-thread dependencies. Currently published dynamic thread partitioning algorithms work by detecting loops or procedures, or by partitioning at fixed intervals. Research has thus far examined these algorithms in isolation from one another. This paper makes two contributions. First, it quantitatively compares different dynamic partitioning algorithms in the context of a fixed architecture. The architecture is a single-chip shared memory multiprocessor enhanced to allow thread and value speculation. Second, this paper presents a new dynamic partitioning algorithm called MEM-slicing. Insights into the development and operation of this algorithm are presented. The technique is particularly suited to irregular, non-numeric programs, and greatly outperforms oth...
Loop Termination Prediction
, 2000
Cited by 15 (2 self)
Deeply pipelined high-performance processors require highly accurate branch prediction to drive their instruction fetch. However, there remains a class of events that are not easily predictable by standard two-level predictors. One such event is loop termination. In deeply nested loops, loop terminations can account for a significant share of the mispredictions. We propose two techniques for dealing with loop terminations. A simple hardware extension to existing prediction architectures called Loop Termination Prediction is presented, which captures the long regular repeating patterns of loops. In addition, a software technique called Branch Splitting is examined, which breaks loops whose iteration counts are too high for current predictors to detect into smaller loops that may be effectively captured. Our results show that for many programs adding a small loop termination buffer can reduce the misprediction rate by up to a difference of 2%.
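The notion of capturing "long regular repeating patterns of loops" can be sketched as a small table that remembers each loop branch's trip count and predicts the exit once the count is reached. This is a minimal illustration, not the paper's hardware design: the class name, the table fields, and the "same trip count seen twice in a row" confidence rule are all assumptions.

```python
class LoopTerminationPredictor:
    """Toy trip-count predictor for loop branches (illustrative only)."""

    def __init__(self):
        # pc -> [taken count in current run, last trip count, confident flag]
        self.table = {}

    def predict(self, pc):
        """Return True if the branch is predicted taken (loop continues)."""
        entry = self.table.get(pc)
        if entry is None:
            return True  # default assumption: loop branches are taken
        count, trip, confident = entry
        # Predict the exit only when the learned trip count has been reached.
        return not (confident and count == trip)

    def update(self, pc, taken):
        """Train the table with the actual branch outcome."""
        entry = self.table.setdefault(pc, [0, None, False])
        if taken:
            entry[0] += 1
        else:
            # Loop exited: become confident only if the trip count repeated.
            entry[2] = entry[1] == entry[0]
            entry[1] = entry[0]
            entry[0] = 0


def count_mispredictions(predictor, outcomes, pc=0x400):
    """Replay a sequence of taken/not-taken outcomes and count misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

For a loop whose branch is taken four times and then falls through, replayed four times, this sketch misses only the first two exits while it learns the trip count; an always-taken guess would miss all four.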
Frequent Loop Detection Using Efficient Non-Intrusive On-Chip Hardware
- In Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
, 2003
Cited by 13 (1 self)
Dynamic software optimization methods are becoming increasingly popular for improving software performance and power. The first step in dynamic optimization consists of detecting frequently executed code, or “critical regions.” Previous critical region detectors have been targeted to desktop processors. We introduce a critical region detector targeted to embedded processors, with the unique features of being very size- and power-efficient, and being completely non-intrusive to the software’s execution – features needed in timing-sensitive embedded systems. Our detector not only finds the critical regions, but also determines their relative frequencies, a potentially important feature for selecting among alternative dynamic optimization methods. Our detector uses a tiny cache coupled with a small amount of logic. We provide results of extensive explorations across seventeen embedded system benchmarks. We show that highly accurate results can be achieved with only a 0.02% power overhead and acceptable size overhead. Our detector is currently being used as part of a dynamic hardware/software partitioning approach, but is applicable to a wide variety of situations.
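A "tiny cache coupled with a small amount of logic" that reports relative loop frequencies can be modelled in software as a small table of saturating counters indexed by backward-branch address. The sketch below is an assumption-laden illustration rather than the paper's circuit: the class name, the table size, the eviction rule, and the halve-on-saturation policy are all hypothetical.

```python
class FrequentLoopDetector:
    """Software model of a small loop-frequency table (illustrative only)."""

    def __init__(self, entries=4, max_count=255):
        self.entries = entries      # number of table slots (assumed)
        self.max_count = max_count  # saturating counter limit (assumed)
        self.counts = {}            # branch address -> counter

    def observe(self, branch_addr, target_addr):
        """Record one taken branch; only backward branches mark loops."""
        if target_addr >= branch_addr:
            return
        if branch_addr not in self.counts:
            if len(self.counts) >= self.entries:
                # Evict the least frequently seen loop to make room.
                victim = min(self.counts, key=self.counts.get)
                del self.counts[victim]
            self.counts[branch_addr] = 0
        if self.counts[branch_addr] == self.max_count:
            # Halve every counter so the relative ordering is preserved.
            for addr in self.counts:
                self.counts[addr] //= 2
        self.counts[branch_addr] += 1

    def relative_frequencies(self):
        """Return each tracked loop's share of all counted iterations."""
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {addr: count / total for addr, count in self.counts.items()}
```

Feeding the detector only branch and target addresses mirrors the non-intrusive property described above: the monitored program is never modified, only observed.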
On the Performance Potential of Different Types of Speculative Thread-Level Parallelism
, 2006
Cited by 12 (1 self)
Recent research in thread-level speculation (TLS) has proposed several mechanisms for optimistic execution of difficult-to-analyze serial codes in parallel. Though it has been shown that TLS helps to achieve higher levels of parallelism, evaluation of the unique performance potential of TLS, i.e., the performance gain that can be achieved only through speculation, has not received much attention. In this paper, we evaluate this aspect by separating the speedup achievable via true TLP (thread-level parallelism) and TLS for the SPEC CPU2000 benchmark. Further, we dissect the performance potential of each type of speculation: control speculation, data dependence speculation and data value speculation. To the best of our knowledge, this is the first dissection study of its kind. Assuming an oracle TLS mechanism (which corresponds to perfect speculation and zero threading overhead), whereby the execution time of a candidate program region (for speculative execution) can be reduced to zero, our study shows that, at the loop level, the upper bound on the arithmetic mean and geometric mean speedup achievable via TLS across SPEC CPU2000 is 39.16% (standard deviation = 31.23) and 18.18% respectively.
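The oracle bound and the gap between the two reported means can be illustrated with a short calculation. The sketch assumes that reducing a candidate region's time to zero gives a speedup ratio of 1 / (1 - f), where f is the fraction of baseline time spent in candidate regions, and that percentages are derived from the mean of those ratios; the abstract does not confirm either convention, and the input fractions are made up.

```python
import statistics


def oracle_speedup(candidate_fraction):
    """Upper-bound speedup ratio if candidate regions cost zero time.

    candidate_fraction is the share of baseline execution time spent in
    regions chosen for speculative execution (0 <= fraction < 1).
    """
    if not 0.0 <= candidate_fraction < 1.0:
        raise ValueError("candidate_fraction must be in [0, 1)")
    return 1.0 / (1.0 - candidate_fraction)


def summarize(fractions):
    """Return (arithmetic, geometric) mean speedup, both as percentages."""
    ratios = [oracle_speedup(f) for f in fractions]
    arithmetic = (statistics.mean(ratios) - 1.0) * 100.0
    geometric = (statistics.geometric_mean(ratios) - 1.0) * 100.0
    return arithmetic, geometric


# Hypothetical candidate-region fractions for three made-up benchmarks.
HYPOTHETICAL_FRACTIONS = [0.0, 0.2, 0.5]
arithmetic_pct, geometric_pct = summarize(HYPOTHETICAL_FRACTIONS)
```

The geometric mean never exceeds the arithmetic mean, and the gap widens when a few programs benefit far more than the rest, which is consistent with the large standard deviation quoted alongside the arithmetic mean.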
Exploiting Speculative Thread-Level Parallelism on a SMT Processor
- International Conference on High Performance Computing and Networking
, 1999
Cited by 9 (0 self)
In this paper we present a run-time mechanism to simultaneously execute multiple threads from a sequential program on a simultaneous multithreaded (SMT) processor. The threads are speculative in the sense that they are created by predicting the future control flow of the program. Moreover, threads are not necessarily independent. Data dependences among simultaneously executed threads may exist. To avoid the
Runtime Predictability of Loops
, 2001
Cited by 7 (2 self)
To obtain the benefits of aggressive, wide-issue architectures, a large window of valid instructions must be available. While researchers have been successful in obtaining high accuracies with a range of dynamic branch predictors, there still remains the need for more aggressive instruction delivery.
Exploiting postdominance for speculative parallelization
- In Proceedings of the 13th International Symposium on High-Performance Computer Architecture
, 2007
Cited by 6 (0 self)

