The Case for Hardware Transactional Memory in Software Packet Processing
Abstract
Cited by 2 (2 self)
Software packet processing is becoming more important to enable differentiated and rapidly evolving network services. With increasing numbers of programmable processor and accelerator cores per network node, it is a challenge to support sharing and synchronization across them in a way that is scalable and easy to program. In this paper, we focus on parallel/threaded applications that have irregular control flow and frequently updated shared state that must be synchronized across threads. However, conventional lock-based synchronization is both difficult to use and often results in frequent conservative serialization of critical sections. Instead, we propose that transactional memory (TM) is a good match for software packet processing: it (i) allows the system to optimistically exploit parallelism between the processing of packets whenever it is safe to do so, and (ii) is easy for a programmer to use. With the NetFPGA [1] platform and four network packet processing applications that are threaded and share memory, we evaluate hardware support for TM (HTM) using the reconfigurable FPGA fabric. Relative to NetThreads [2], our two-processor, four-way-multithreaded system with conventional lock-based synchronization, we find that adding HTM achieves 6%, 54%, and 57% increases in packet throughput for three of the four packet processing applications studied, due to reduced conservative serialization.
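To illustrate the "conservative serialization" mentioned above, here is a minimal sketch, assuming each packet's critical section can be summarized by hypothetical read and write sets of shared locations (the flow-table keys are made up for the example). A single global lock serializes every pair of critical sections, whereas optimistic TM only needs to serialize pairs that truly conflict, i.e. where one writes a location the other reads or writes.

```python
from itertools import combinations

def conflicts(a, b):
    """True if either transaction writes a location the other reads or writes."""
    return bool(a["writes"] & (b["reads"] | b["writes"])
                or b["writes"] & (a["reads"] | a["writes"]))

# Hypothetical per-packet critical sections, each updating its own flow's entry.
# Packets 1 and 3 both read the shared "stats" entry; read-read sharing is not a conflict.
packets = [
    {"id": 0, "reads": {"flow:A"}, "writes": {"flow:A"}},
    {"id": 1, "reads": {"flow:B", "stats"}, "writes": {"flow:B"}},
    {"id": 2, "reads": {"flow:A"}, "writes": {"flow:A"}},
    {"id": 3, "reads": {"flow:C", "stats"}, "writes": {"flow:C"}},
]

def lock_serialized_pairs(pkts):
    # A global lock forces every pair of critical sections to run one after the other.
    return [(p["id"], q["id"]) for p, q in combinations(pkts, 2)]

def tm_serialized_pairs(pkts):
    # Optimistic execution only has to serialize pairs with a true data conflict.
    return [(p["id"], q["id"]) for p, q in combinations(pkts, 2) if conflicts(p, q)]

print(len(lock_serialized_pairs(packets)))  # 6
print(tm_serialized_pairs(packets))         # [(0, 2)]
```

Only the two packets of flow A must be ordered; the other five pairs could proceed in parallel under TM but are serialized by the lock.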
General Terms
Abstract
We propose NetTM: support for hardware transactional memory (HTM) in an FPGA-based soft multithreaded multicore that matches the strengths of FPGAs. We evaluate our system using the NetFPGA [6] platform and four network packet processing applications that are threaded and share memory. Relative to NetThreads [5], an existing two-processor, four-way-multithreaded system with conventional lock-based synchronization, we find that adding HTM support (i) maintains a reasonable operating frequency of 125 MHz with an area overhead of 20%, (ii) can transactionally execute lock-based critical sections with no software modification, and (iii) achieves 6%, 55%, and 57% increases in packet throughput for three of the four packet processing applications studied, due to reduced false synchronization.
Categories and Subject Descriptors: C.1.4 [Processor architectures]: Parallel Architectures; C.3 [Special-purpose and application-based systems]:
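Point (ii), running an unmodified lock-based critical section transactionally, can be approximated in software. The sketch below is a rough software analogue, not NetTM's hardware mechanism: reads record a per-key version, writes are buffered, and commit validates the recorded versions before publishing the writes. After a bounded number of aborts it falls back to holding the lock, mirroring the usual practice of pairing HTM with a lock-based fallback path. The names `SharedState`, `Tx`, `run_critical_section`, and `count_packets` are hypothetical.

```python
import threading

class SharedState:
    """Hypothetical versioned shared memory: each key carries a commit version."""
    def __init__(self):
        self.values = {}
        self.versions = {}
        # Reentrant so the fallback path can hold it while the body reads and commits.
        self.lock = threading.RLock()

class Tx:
    def __init__(self, state):
        self.state = state
        self.read_versions = {}  # key -> version seen at first read
        self.writes = {}         # key -> buffered value, invisible until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        with self.state.lock:  # take a consistent (value, version) snapshot
            value = self.state.values.get(key, 0)
            version = self.state.versions.get(key, 0)
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        with self.state.lock:
            # Abort if another transaction committed to any key we read.
            for key, seen in self.read_versions.items():
                if self.state.versions.get(key, 0) != seen:
                    return False
            for key, value in self.writes.items():
                self.state.values[key] = value
                self.state.versions[key] = self.state.versions.get(key, 0) + 1
            return True

def run_critical_section(state, body, max_attempts=4):
    """Run body(tx) speculatively; after repeated aborts, run it under the lock."""
    for _ in range(max_attempts):
        tx = Tx(state)
        body(tx)
        if tx.commit():
            return "speculative"
    with state.lock:  # no other commit can intervene while we hold the lock
        tx = Tx(state)
        body(tx)
        if not tx.commit():
            raise RuntimeError("fallback commit cannot fail while holding the lock")
    return "fallback"

def count_packets(state, flow, n):
    # The critical-section body is the same code whether it runs speculatively or locked.
    for _ in range(n):
        run_critical_section(state, lambda tx: tx.write(flow, tx.read(flow) + 1))

state = SharedState()
threads = [threading.Thread(target=count_packets, args=(state, f, 1000))
           for f in ("flow:A", "flow:B", "flow:A")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state.values["flow:A"], state.values["flow:B"])  # 2000 1000
```

Threads counting different flows never invalidate each other's reads, so they commit speculatively; only the two flow-A threads can abort one another.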
Understanding and Improving Bloom Filter Configuration for Lazy Address-Set Disambiguation
Abstract
by disambiguating address-sets using bit-vector-based Bloom filters, which are efficient but can report false conflicts that do not exist. Systems with lazy conflict detection often use Bloom filters unconventionally, testing sets for null intersection via Bloom filter intersection, in contrast with the conventional approach of issuing membership queries into the Bloom filter. In this dissertation we develop much-needed theory for the probability of false conflicts in Bloom filter null-intersection tests, notably demonstrating that Bloom filter intersection requires substantially larger bit-vectors to provide statistical behavior equivalent to querying. Furthermore, we recognize that our theoretical implications counter practical intuition, and thus use RingSTM to evaluate the theory in practice by implementing and comparing the Bloom filter configurations. We find that despite its overheads, the queue-of-queries approach reduces execution time and is thus the most compelling alternative to Bloom filter intersection for lazy address-set disambiguation.
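The two disjointness tests contrasted above can be sketched as follows. This is a minimal illustration, not RingSTM's implementation: the `BloomFilter` class, the salted SHA-256 hashing, and the parameters (m = 256 bits, k = 4 hash functions, 8 addresses per set) are assumptions chosen for demonstration. The intersection test ANDs the read and write filters; the queue-of-queries test keeps the exact read addresses and queries each one against the write filter.

```python
import hashlib
import random

class BloomFilter:
    """Unpartitioned Bloom filter: an m-bit vector (stored as an int) with k hashes."""
    def __init__(self, m, k):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, addr):
        # k salted hashes of the address; the salting scheme is an arbitrary choice.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, addr):
        for pos in self._positions(addr):
            self.bits |= 1 << pos

    def might_contain(self, addr):
        return all((self.bits >> pos) & 1 for pos in self._positions(addr))

def build(addrs, m, k):
    bf = BloomFilter(m, k)
    for a in addrs:
        bf.add(a)
    return bf

def overlap_by_intersection(bf_a, bf_b):
    # Null-intersection test: any common set bit is reported as a conflict.
    return (bf_a.bits & bf_b.bits) != 0

def overlap_by_queries(read_queue, bf_write):
    # Queue-of-queries: query each exact read address against the write filter.
    return any(bf_write.might_contain(a) for a in read_queue)

def false_conflict_counts(m=256, k=4, set_size=8, trials=500, seed=0):
    rng = random.Random(seed)
    by_intersection = by_queries = 0
    for _ in range(trials):
        addrs = rng.sample(range(1 << 20), 2 * set_size)  # disjoint read and write sets
        reads, writes = addrs[:set_size], addrs[set_size:]
        bf_reads, bf_writes = build(reads, m, k), build(writes, m, k)
        by_intersection += overlap_by_intersection(bf_reads, bf_writes)
        by_queries += overlap_by_queries(reads, bf_writes)
    return by_intersection, by_queries

print(false_conflict_counts())  # (false conflicts via intersection, via queries)
```

Every read address that a query flags has all of its bits set in both filters, so the intersection test reports a conflict whenever the query test does; on disjoint sets it also reports many more, which is why intersection needs larger bit-vectors to match querying.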
Nonnumerical Algorithms and Problems—computations
Abstract
A Bloom filter is a probabilistic bit-array-based set representation that has recently been applied to address-set disambiguation in systems that ease the burden of parallel programming. However, many of these systems intersect the Bloom filter bit-arrays to approximate address-set intersection and decide set disjointness. This contrasts with the conventional and well-studied approach of making individual membership queries into the Bloom filter. In this paper we present much-needed probabilistic models for the unconventional application of testing set disjointness using Bloom filters. With these models, we demonstrate that intersecting Bloom filters requires substantially larger bit-arrays to provide the same probability of false set-overlap as querying into the bit-array. When intersection is unavoidable, we prove that partitioned Bloom filters require less space than unpartitioned ones. Finally, we show that for Bloom filters with a single hash function, surprisingly, intersection and querying share the same probability of false set-overlap.
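The single-hash-function result has a short explanation: with k = 1, the intersection of the filters for A and B is nonzero exactly when some element of A hashes to a bit that is set in B's filter, which is precisely the event that a membership query for that element returns a positive. The Monte Carlo sketch below checks that the two tests decide identically for k = 1 and diverge for k > 1. The hashing scheme and parameters are illustrative assumptions, and the sketch does not model the paper's partitioned filters or its analytical formulas.

```python
import hashlib
import random

def positions(addr, m, k):
    """k hypothetical salted SHA-256 hash positions for an address in an m-bit array."""
    return [int.from_bytes(hashlib.sha256(f"{i}:{addr}".encode()).digest()[:8], "big") % m
            for i in range(k)]

def bloom(addrs, m, k):
    bits = 0
    for a in addrs:
        for p in positions(a, m, k):
            bits |= 1 << p
    return bits

def query_says_overlap(addrs_a, bits_b, m, k):
    # Overlap if any element of A passes a membership query into B's filter.
    return any(all((bits_b >> p) & 1 for p in positions(a, m, k)) for a in addrs_a)

def intersection_says_overlap(addrs_a, bits_b, m, k):
    # Overlap if A's filter and B's filter share any set bit.
    return (bloom(addrs_a, m, k) & bits_b) != 0

def disagreements(k, m=64, set_size=6, trials=300, seed=1):
    """Count trials on disjoint sets where the two tests reach different decisions."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        addrs = rng.sample(range(1 << 20), 2 * set_size)
        a, b = addrs[:set_size], addrs[set_size:]
        bits_b = bloom(b, m, k)
        if query_says_overlap(a, bits_b, m, k) != intersection_says_overlap(a, bits_b, m, k):
            count += 1
    return count

print(disagreements(k=1))  # 0: with one hash function both tests decide identically
print(disagreements(k=3))  # nonzero: intersection flags overlaps that queries do not
```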
Overlay Architectures for FPGA-Based Software Packet Processing
2011
Abstract
Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function, custom-made ASICs. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time, and since software is the preferred way of expressing complex computation, we are interested in finding a platform that executes packet processing software with the best possible throughput. Because FPGAs are widely used in network equipment and can implement processors, we are motivated to investigate executing software directly on FPGAs. Off-the-shelf soft processors on FPGA fabric are currently geared towards embedded sequential tasks; in contrast, network processing is most often inherently parallel across packet flows, if not across individual packets. Our goal is to allow multiple threads of execution in an FPGA to reach a higher aggregate throughput than commercially available shared-memory soft multiprocessors via improvements to the underlying soft processor architecture. We study a number of processor pipeline organizations to identify which ones can scale to a larger number of execution threads, and find that tuning multithreaded pipelines can provide compact cores with high throughput. We then perform a design space exploration of multicore soft systems, compare single-threaded and multithreaded designs to identify scalability limits and
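Why multithreaded pipelines can deliver high throughput from a compact core can be seen in a toy issue model. This is a generic illustration of fine-grained (round-robin) multithreading, not the thesis's processor design; the assumption is that each thread's next instruction depends on its previous one, so a thread may issue again only `hazard_latency` cycles later. With enough threads to cover that latency, the pipeline issues one instruction every cycle without needing forwarding or hazard-stall logic, a commonly cited reason such cores stay small.

```python
def simulate_ipc(num_threads, hazard_latency, cycles=10_000):
    """Toy fine-grained multithreading model (an assumption, not the thesis's pipeline).

    Each thread's next instruction depends on its previous one, so a thread may issue
    again only `hazard_latency` cycles after its last issue. At most one instruction
    issues per cycle, chosen round-robin among ready threads.
    """
    ready_at = [0] * num_threads  # earliest cycle at which each thread may issue
    next_thread = 0
    issued = 0
    for cycle in range(cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + hazard_latency
                issued += 1
                next_thread = (t + 1) % num_threads
                break
    return issued / cycles  # instructions per cycle

for threads in (1, 2, 4, 5, 8):
    print(threads, simulate_ipc(threads, hazard_latency=5))
```

With `hazard_latency=5` this prints IPC values of 0.2, 0.4, 0.8, 1.0, and 1.0, i.e. min(1, threads / 5): throughput grows linearly with thread count until the threads fully hide the latency, after which extra threads add no issue bandwidth.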

