## PERFORMANCE ANALYSIS OF H.263 VIDEO ENCODER (1999)

### BibTeX

@MISC{Nguyen99performanceanalysis,

author = {Thinh Pq Nguyen and For Viram},

title = {PERFORMANCE ANALYSIS OF H.263 VIDEO ENCODER},

year = {1999}

}

### Abstract

VIRAM (Vector Intelligent Random Access Memory) is a vector processor architecture with embedded memory, designed for portable multimedia processing devices. Its vector processing capability yields high-performance multimedia processing, while embedded DRAM technology provides high memory bandwidth at low energy consumption. In this thesis, we evaluate the performance of VIRAM and compare it with that of other digital signal processors (DSPs) and conventional SIMD (Single Instruction Multiple Data) media extensions in the context of video coding. In particular, we examine motion estimation (ME) and the discrete cosine transform (DCT), which have been shown to dominate typical video encoders such as H.263. In doing so, we point out the advantages and disadvantages of certain VIRAM design choices with respect to video coding. We also show that VIRAM outperforms other architectures by 4.6x to 8.7x in computing motion estimation and by 1.2x to 5.9x in computing the discrete cosine transform. With appropriate VIRAM features, our simulation shows that VIRAM can achieve near real-time encoding of standard QCIF video sequences using exhaustive search for motion

### Citations

228 | Displacement measurement and its application in interframe image coding - Jain, Jain - 1981

Citation Context ...quares in the search region are computed to find the best match. Therefore, other motion estimation techniques have been proposed such as three-step search [11] and two-dimensional logarithmic search [10] to reduce the time complexity. These techniques compute SAD only at selected locations rather than all the possible 16x16 squares in the search region. As a result, they are faster to compute at the ...
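The reduced-complexity idea in this context can be illustrated with a minimal Python sketch of three-step search. This is a common textbook formulation (step sizes 4, 2, 1, eight neighbours per step), not necessarily the variant evaluated in the thesis; the names `sad` and `three_step_search`, the list-of-rows frame layout, and the default parameters are all assumptions made for illustration. SAD is the sum of absolute differences between two blocks.

```python
def sad(cur, ref, cx, cy, rx, ry, n=16):
    """Sum of absolute differences between the n x n block of `cur` at
    (cx, cy) and the n x n block of `ref` at (rx, ry).
    Frames are lists of rows of integer pixel values (an assumed layout)."""
    total = 0
    for dy in range(n):
        cur_row = cur[cy + dy]
        ref_row = ref[ry + dy]
        for dx in range(n):
            total += abs(cur_row[cx + dx] - ref_row[rx + dx])
    return total


def three_step_search(cur, ref, cx, cy, n=16, first_step=4):
    """Return (best_sad, (mvx, mvy), candidates_evaluated).

    Starts at zero displacement, tests the 8 neighbours at the current
    step size, recentres on the best candidate, then halves the step."""
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, n)
    evaluated = 1
    step = first_step
    while step >= 1:
        centre = best_mv  # neighbours are taken around the best so far
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # centre was already evaluated
                mvx, mvy = centre[0] + dx, centre[1] + dy
                rx, ry = cx + mvx, cy + mvy
                if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                    continue  # candidate block falls outside the frame
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                evaluated += 1
                if cost < best_cost:
                    best_cost, best_mv = cost, (mvx, mvy)
        step //= 2
    return best_cost, best_mv, evaluated
```

With steps of 4, 2, and 1 the search covers displacements up to ±7 while evaluating at most 25 candidates, compared with the 225 candidates an exhaustive search over the same ±7 window would test. The trade-off is that the result may be a local rather than a global minimum.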

77 | Vector Microprocessors - Asanovic - 1998

Citation Context ...or traditional applications with complex control flow. On the other hand, the kernels of multimedia applications are often characterized by large amounts of data parallelism and high memory bandwidth [2]. For instance, standardized video codecs such as MPEG-4 and H.263 consist of motion estimation (ME) and discrete cosine transform (DCT), both requiring high memory bandwidth and involving large amoun...

29 | A media-enhanced vector architecture for embedded memory systems - Kozyrakis - 1999

Citation Context ...popularity of portable and hand-held devices such as digital cameras and wireless videophones has created a need for hardware architectures designed for mobile, portable video processing applications [13]. Conventional microprocessors are not well suited to video processing because they are optimized for traditional applications with complex control flow. On the other hand, the kernels of multimedia a...

11 | Low Power Memory Storage and Transfer Organization for the MPEG-4 Full Pel Motion Estimation on a Multimedia Processor - Brockmeyer, Nachtergaele, et al. - 1999

9 | The Energy Efficiency of IRAM Architectures - Fromm, et al. - 1997

Citation Context ...e memory bandwidth by 100 times as compared to conventional microprocessor systems [15]. The target power consumption for the vector unit and memory is 2 watts, which is suitable for portable devices [13,5]. VIRAM has two vector arithmetic functional units (VAFU) and one vector memory functional unit (VMFU). Both VAFU and VMFU have four 64-bit vector data paths that can be used to perform eight 32-bit o...

8 | A Case for Intelligent DRAM - Patterson, et al. - 1997

Citation Context ...processor and main memory are placed on the same chip, VIRAM can potentially increase memory bandwidth by 100 times as compared to conventional microprocessor systems [15]. The target power consumption for the vector unit and memory is 2 watts, which is suitable for portable devices [13,5]. VIRAM has two vector arithmetic functional units (VAFU) and one vector memory f...

2 | How multimedia workloads will change processor design - 1997

2 | Vector IRAM Memory Performance For Image Access Patterns - Fromm, et al. - 1999

Citation Context ...O interface with four 100MB/s parallel lines. Since VIRAM is not yet available, the performance results in this thesis are based on a near cycle-accurate simulator that was developed at U.C. Berkeley [6]. (Section 4, H.263 Performance Characterization) To characterize the performance of the encoder, we use the H.263 version 2 written by Telenor [15], a popular public-domain implementation of H.263. Our test env...

2 | Using MMX™ Instructions to Compute the Absolute Difference - Intel Corp

Citation Context ...p factor of VIRAM over MMX. On the Pentium MMX, the measured time is multiplied by the clock rate to obtain the number of cycles, and the measurement uses the SAD routine provided by Intel Corp [8]. On VIRAM, we use the cycle-accurate performance simulator. Figure 6 shows the efficiency of hardware usage on motion estimation. VIRAM-1 is the current design that can generate four addresses per cy...
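The time-to-cycles conversion described in this context is simple arithmetic, sketched below in Python. The function names and the numbers in the usage note are hypothetical and are not measurements from the thesis.

```python
def cycles_from_time(elapsed_us, clock_mhz):
    """Convert a measured wall-clock time into processor cycles.

    Microseconds times megahertz gives cycles directly, because
    (1e-6 s) * (1e6 cycles/s) = 1 cycle."""
    return elapsed_us * clock_mhz


def speedup(baseline_cycles, candidate_cycles):
    """Speedup of the candidate over the baseline, compared in cycles so
    that differing clock rates do not distort the comparison."""
    return baseline_cycles / candidate_cycles
```

As an illustration with made-up figures, a routine timed at 500 microseconds on a 200 MHz processor corresponds to `cycles_from_time(500, 200)` cycles, which can then be divided by a simulator's cycle count through `speedup`.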

2 | Practical Fast 1-D DCT Algorithms with 11 Multiplications - Loeffler, et al.

Citation Context ...e 7. Flow graph for 1-D DCT algorithm by Arai, Agui, and Nakajima. There are several fast methods for computing the DCT. In this thesis, we consider only two popular algorithms for computing DCT: LLM [14] and AAN [1]. While the original LLM algorithm uses 11 multiplications and 29 additions, we implement the alternate LLM [14], which uses 12 multiplications and 32 additions. The advantage of ...
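For reference, the transform that LLM and AAN accelerate can be written directly from the DCT-II definition. The Python sketch below is a naive baseline with orthonormal scaling, not an implementation of either fast algorithm; the names `dct_1d` and `dct_2d` are assumptions made for illustration.

```python
import math


def dct_1d(x):
    """Direct orthonormal DCT-II of a sequence, computed from the definition.

    For 8 inputs this performs 64 cosine multiplications, which is the cost
    that fast factorizations such as LLM and AAN are designed to reduce."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = 0.0
        for i in range(n):
            acc += x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # `cols` holds transformed columns; transpose back to row-major order.
    return [list(row) for row in zip(*cols)]
```

The row-then-column structure is why a 2-D 8x8 DCT costs sixteen 1-D transforms, so any saving in the 1-D kernel is multiplied across the block.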

1 | Motion Estimation Algorithms for Video Compression - Furht, et al.

Citation Context ... 22327 27367 (82.8%) 2967 (9.0%) 2706 (8.2%) 33040 (Table 1. Distribution of time spent on individual components of H.263.) Motion estimation is used to exploit the inherent temporal redundancy of video [7]. In a typical motion estimation process, each frame of the video is divided into 16x16 macroblocks as described in Section 2. Given a macroblock in the current frame, the goal is to find the 16x16 re...

1 | Motion Compensated Interframe Coding for Video Conferencing - Koga, et al. - 1982

Citation Context ...ch where the SAD of every possible 16x16 square in the search region is computed to find the best match. Therefore, other motion estimation techniques have been proposed such as three-step search [11] and two-dimensional logarithmic search [10] to reduce the time complexity. These techniques compute SAD only at selected locations rather than all the possible 16x16 squares in the search region. As ...
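The exhaustive search described in this context can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's VIRAM or Telenor code: the names `sad` and `full_search`, the list-of-rows frame layout, and the default ±15 pixel search range are all hypothetical choices. SAD is the sum of absolute differences between two blocks.

```python
def sad(cur, ref, cx, cy, rx, ry, n=16):
    """Sum of absolute differences between the n x n block of `cur` at
    (cx, cy) and the n x n block of `ref` at (rx, ry).
    Frames are lists of rows of integer pixel values (an assumed layout)."""
    total = 0
    for dy in range(n):
        cur_row = cur[cy + dy]
        ref_row = ref[ry + dy]
        for dx in range(n):
            total += abs(cur_row[cx + dx] - ref_row[rx + dx])
    return total


def full_search(cur, ref, cx, cy, search_range=15, n=16):
    """Return (best_sad, (mvx, mvy)) after testing every displacement in
    [-search_range, +search_range] in both directions.
    Returns None if no candidate block fits inside the reference frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = cx + mvx, cy + mvy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block falls outside the frame
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, (mvx, mvy))
    return best
```

Under the assumed ±15 range, an interior macroblock requires 31 x 31 = 961 SAD evaluations of 256 pixel differences each, and a QCIF frame (176x144) contains 99 macroblocks. That volume of regular, data-parallel arithmetic is what makes the kernel a natural target for vector hardware.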