The ALPBench Benchmark Suite for Complex Multimedia Applications
- In Proc. of the IEEE Int. Symp. on Workload Characterization, 2005
Cited by 23 (0 self)
Multimedia applications are becoming increasingly important for a large class of general-purpose processors. Contemporary media applications are highly complex and demand high performance. A distinctive feature of these applications is that they have significant parallelism, including thread-, data-, and instruction-level parallelism, that is potentially well-aligned with the increasing parallelism supported by emerging multi-core architectures. Designing systems to meet the demands of these applications therefore requires a benchmark suite that comprises these complex applications and exposes the parallelism present in them. This paper makes two contributions. First, it presents ALPBench, a publicly available benchmark suite that pulls together five complex media applications from various sources: speech recognition (CMU Sphinx 3), face recognition (CSU), ray tracing (Tachyon), MPEG-2 encode (MSSG), and MPEG-2 decode (MSSG). We have modified the original applications to expose thread-level and data-level parallelism using POSIX threads and sub-word SIMD (Intel’s SSE2) instructions, respectively. Second, the paper provides a performance characterization of the ALPBench benchmarks, with a focus on parallelism. Such a characterization is useful for architects and compiler writers for designing systems and compiler optimizations for these applications.
ALP: Efficient Support for All Levels of Parallelism for Complex Media Applications
- ACM Trans. Archit. Code Optim., 2005
Cited by 9 (2 self)
The real-time execution of contemporary complex media applications requires energy-efficient processing capabilities beyond those of current superscalar processors. We observe that the complexity of contemporary media applications requires support for multiple forms of parallelism, including ILP, TLP, and various forms of DLP, such as subword SIMD, short vectors, and streams. Based on our observations, we propose an architecture, called ALP, that efficiently integrates all of these forms of parallelism with evolutionary changes to the programming model and hardware. The novel part of ALP is a DLP technique called SIMD vectors and streams (SVectors/SStreams), which is integrated within a conventional superscalar-based CMP/SMT architecture with subword SIMD. This technique lies between subword SIMD and vectors, providing significant benefits over the former at a lower cost than the latter. Our evaluations show that each form of parallelism supported by ALP is important. Specifically, SVectors/SStreams are effective, compared to a system with the other enhancements in ALP. They give speedups of 1.1 to 3.4X and energy-delay product improvements of 1.1 to 5.1X for applications with DLP.
Using classifier ensembles to label spatially disjoint data
- 2007
We describe an ensemble approach to learning from arbitrarily partitioned data. The partitioning comes from the distributed processing requirements of a large scale simulation. The volume of the data is such that classifiers can train only on data local to a given partition. As a result of the partition reflecting the needs of the simulation, the class statistics can vary from partition to partition. Some classes will likely be missing from some partitions. We combine a fast ensemble learning algorithm with probabilistic majority voting in order to learn an accurate classifier from such data. Results from simulations of an impactor bar crushing a storage canister and from facial feature recognition show that regions of interest are successfully identified in spite of the class imbalance in the individual training sets.
Energy Efficient Support for . . . FOR COMPLEX MEDIA APPLICATIONS
- 2005
Real-time complex media applications are becoming increasingly common on general-purpose systems such as desktop, laptop, and handheld computers. However, real-time execution of such complex media applications needs a considerable amount of processing power that often surpasses the capabilities of current superscalar processors. Further, high-performance processors are often constrained by power and energy consumption, especially in the mobile systems where media applications have become popular. The objective of this dissertation is to develop general-purpose processors that can meet the performance demands of future media applications in an energy-efficient way, while also continuing to work well on other common workloads for desktop, laptop, and handheld systems. Fortunately, most media applications have a lot of parallelism that can be exploited for energy-efficient high-performance designs. Media applications exhibit multiple types of parallelism: thread-level parallelism (TLP), data-level parallelism (DLP), and instruction-level parallelism (ILP). In this work, we investigate exploiting all three of these forms of parallelism to provide both high performance and energy efficiency. This dissertation makes three broad contributions. First, we analyze the parallelism in complex

