Potential benefits of delta encoding and data compression for HTTP (Corrected version), 1997
Scalable kernel performance for Internet servers under realistic loads, 1998
Cited by 86 (9 self)
UNIX Internet servers with an event-driven architecture often perform poorly under real workloads, even if they perform well under laboratory benchmarking conditions. We investigated the poor performance of event-driven servers. We found that the delays typical in wide-area networks cause busy servers to manage a large number of simultaneous connections. We also observed that the select system call implementation in most UNIX kernels scales poorly with the number of connections being managed by a process. The UNIX algorithm for allocating file descriptors also scales poorly. These algorithmic problems lead directly to the poor performance of event-driven servers. We implemented scalable versions of the select system call and the descriptor allocation algorithm. This led to an improvement of up to 58% in Web proxy and Web server throughput, and dramatically improved the scalability of the system.
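The abstract notes that the UNIX descriptor allocation algorithm scales poorly. UNIX must hand out the lowest-numbered free descriptor, and a straightforward implementation finds it with a linear scan of the descriptor table. The Python sketch below contrasts that scan with one scalable alternative, a min-heap of freed descriptors that finds the lowest free slot in O(log n). Both class names and the heap-based design are assumptions for illustration; the paper's own data structure may differ.

```python
import heapq


class LinearFdTable:
    """Returns the lowest free descriptor by scanning from 0: O(n) per allocation."""

    def __init__(self) -> None:
        self.in_use: list[bool] = []  # in_use[fd] is True while fd is allocated

    def alloc(self) -> int:
        for fd, used in enumerate(self.in_use):
            if not used:
                self.in_use[fd] = True
                return fd
        self.in_use.append(True)  # no hole below the end: grow the table
        return len(self.in_use) - 1

    def free(self, fd: int) -> None:
        self.in_use[fd] = False


class HeapFdTable:
    """Returns the lowest free descriptor in O(log n) using a min-heap of freed slots."""

    def __init__(self) -> None:
        self.next_fd = 0            # lowest descriptor never handed out
        self.holes: list[int] = []  # min-heap of freed descriptors below next_fd

    def alloc(self) -> int:
        if self.holes:
            return heapq.heappop(self.holes)  # smallest freed descriptor
        fd = self.next_fd
        self.next_fd += 1
        return fd

    def free(self, fd: int) -> None:
        heapq.heappush(self.holes, fd)


table = HeapFdTable()
print([table.alloc() for _ in range(5)])  # [0, 1, 2, 3, 4]
table.free(3)
table.free(1)
print(table.alloc(), table.alloc(), table.alloc())  # 1 3 5
```

Both tables return the same descriptors for the same sequence of calls; only the cost of `alloc` differs. The sketch assumes callers never free a descriptor twice.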
Memory-System Design Considerations for Dynamically-Scheduled Microprocessors, 1997
Cited by 66 (4 self)
Keith Istvan Farkas, Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, University of Toronto, 1997.
Dynamically-scheduled processors challenge hardware and software architects to develop designs that balance hardware complexity and compiler technology against performance targets. This dissertation presents a first thorough look at some of the issues introduced by this hardware complexity. The investigation focuses on the register file and the other components of the data memory system: the lockup-free data cache, the stream buffers, and the interface to the lower levels of the memory system. The investigation is based on software models that incorporate the features of a dynamically-scheduled processor affecting the design of the data memory components. The models represent a balance between accuracy and generality, and ar...
Memory Consistency Models for Shared-Memory Multiprocessors, 1995
- WRL Research Report
Cited by 61 (1 self)
The memory consistency model for a shared-memory multiprocessor specifies the behavior of memory with respect to read and write operations from multiple processors. As such, the memory model influences many aspects of system design, including the design of programming languages, compilers, and the underlying hardware. Relaxed models that impose fewer memory ordering constraints offer the potential for higher performance by allowing hardware and software to overlap and reorder memory operations. However, fewer ordering guarantees can compromise programmability and portability. Many previously proposed models either fail to provide reasonable programming semantics or favor programming ease at the cost of performance. Furthermore, the lack of consensus on an acceptable model hinders software portability across different systems. This dissertation focuses on providing a balanced solution that directly addresses the trade-off between programming ease and performance. To address programmability, we propose an alternative method for specifying memory behavior that presents a higher-level abstraction to the programmer. We show that with only a few types of information supplied by the ...
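The store-buffering litmus test is a standard illustration of this trade-off (it is not taken from the dissertation). Processor 1 stores to `x` and then loads `y`; processor 2 stores to `y` and then loads `x`. Under sequential consistency every execution is an interleaving that preserves each processor's program order, so the outcome `r1 == r2 == 0` is impossible. A relaxed model that lets a load bypass an earlier store to a different address, as a store buffer does, allows it. The Python sketch below enumerates both cases; the tuple encoding of operations and the helper names are assumptions made for brevity.

```python
from itertools import combinations

# Store-buffering litmus test. Each operation is (kind, location, register).
P1 = [("st", "x", None), ("ld", "y", "r1")]
P2 = [("st", "y", None), ("ld", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each list's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        taken = set(slots)  # positions filled by a's operations
        it_a, it_b = iter(a), iter(b)
        yield [next(it_a) if i in taken else next(it_b) for i in range(n)]


def run(ops):
    """Execute ops in order against one shared memory and return (r1, r2)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, loc, reg in ops:
        if kind == "st":
            mem[loc] = 1
        else:
            regs[reg] = mem[loc]
    return regs["r1"], regs["r2"]


def reorderings(thread):
    """Program order, plus the load-before-store order a store buffer permits.
    Swapping is legal here only because the store and load use different locations."""
    yield thread
    yield [thread[1], thread[0]]


sc = {run(ops) for ops in interleavings(P1, P2)}
relaxed = {
    run(ops)
    for t1 in reorderings(P1)
    for t2 in reorderings(P2)
    for ops in interleavings(t1, t2)
}
print(sorted(sc))       # [(0, 1), (1, 0), (1, 1)]
print(sorted(relaxed))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The extra `(0, 0)` outcome is exactly the kind of behavior a programmer must reason about once the hardware is allowed to reorder memory operations.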
Register File Design Considerations in Dynamically Scheduled Processors, 1995
- In Proceedings of the Second IEEE Symposium on High-Performance Computer Architecture
Cited by 40 (1 self)
Using the SPEC92 benchmarks, we investigated the register file requirements of dynamically scheduled processors that use register renaming and dispatch queues. We looked at processors capable of issuing either four or eight instructions per cycle and found that, in most cases, implementing precise exceptions requires relatively few additional registers compared to imprecise exceptions. Systems with aggressive non-blocking load support were able to achieve performance similar to processors with perfect memory systems at the cost of some additional registers. Given our machine assumptions, we found that the performance of a four-issue machine with a 32-entry dispatch queue tends to saturate around 80 registers. For an eight-issue machine with a 64-entry dispatch queue, performance does not saturate until about 128 registers. Assuming the machine cycle time is proportional to the register file cycle time, the eight-issue machine yields only 20% higher performance than the four-issue machine due in part...
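A toy model of register renaming shows why the physical register count bounds the instruction window: each instruction that writes a register takes a physical register from a free list, and the register it replaces is returned only when the instruction commits. Holding the old mapping until commit is what allows precise state to be recovered, which is one plausible reason precise exceptions cost some extra registers. The `Renamer` class below is a hypothetical Python sketch, not the simulator used in the paper.

```python
from collections import deque
from typing import Optional


class Renamer:
    """Toy renamer: architectural-to-physical map plus a FIFO free list."""

    def __init__(self, num_arch: int, num_phys: int) -> None:
        assert num_phys >= num_arch, "need a physical register per architectural one"
        self.map = list(range(num_arch))              # arch reg i starts in phys reg i
        self.free = deque(range(num_arch, num_phys))  # spare physical registers

    def rename(self, dest: int, srcs: list[int]) -> Optional[tuple[int, list[int], int]]:
        """Return (new_phys, phys_srcs, old_phys), or None if dispatch must stall."""
        if not self.free:
            return None
        phys_srcs = [self.map[s] for s in srcs]  # read sources before remapping dest
        old_phys = self.map[dest]                # kept live until this instruction commits
        new_phys = self.free.popleft()
        self.map[dest] = new_phys
        return new_phys, phys_srcs, old_phys

    def commit(self, old_phys: int) -> None:
        """At commit, the previous mapping of the destination can be recycled."""
        self.free.append(old_phys)


r = Renamer(num_arch=32, num_phys=40)
in_flight = 0
while r.rename(dest=in_flight % 32, srcs=[]) is not None:
    in_flight += 1
print(in_flight)  # 8: the 8 spare physical registers bound the window
```

In this model, adding physical registers helps only until some other limit (issue width, dispatch queue size) dominates, which matches the saturation behavior the abstract reports.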
A system for recognizing a large class of engineering drawings, 1997
- IEEE Trans. Pattern Anal. Mach. Intell.
Cited by 10 (0 self)
We present a system for recognizing a large class of engineering drawings characterized by alternating instances of symbols and connection lines. The class includes domains such as flowcharts, logic and electrical circuits, and chemical plant diagrams. The output of the system, a netlist identifying the symbol types and interconnections, may be used for design simulation or as a compact, portable representation of the drawing. The automatic recognition task is divided into two stages: 1) domain-independent rules are used to segment symbols from connection lines in the drawing image, which has been thinned, vectorized, and preprocessed in routine ways; 2) a drawing understanding subsystem works in concert with a set of domain-specific matchers to classify symbols and correct errors automatically. A graphical user interface is provided to correct residual errors interactively and to log data for reporting errors objectively. The system has been tested on a database of 64 printed images drawn from textbooks and handbooks in different domains and scanned at 150 and 300 dpi resolution.
Index Terms: Symbolic drawings, flow diagrams, segmentation and labeling, domain independence, automatic and interactive ...
Efficient Dynamic Procedure Placement, 1998
Cited by 8 (0 self)
Commercial applications such as database servers often have very large instruction footprints and consequently are frequently stalled due to instruction cache misses. A large fraction of the i-cache misses are typically due to conflicts in the relatively small direct-mapped on-chip instruction caches. A variety of tools have been developed to try to order the procedures of an application to minimize these conflicts. Such tools often make use of profile information to place procedures so that procedures that frequently call each other do not conflict in the i-cache. However, users often avoid using any kind of tool that requires them to do extra profiling and linking steps to optimize their application. In addition, any tool that does a static layout of procedures (whether using profiling information or not) cannot adapt to varying application workloads that cause very different application behavior. We have developed a method called DPP (dynamic procedure placement) for pl...
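In a direct-mapped cache each address maps to exactly one line, `(address // line_size) % num_lines`, so two procedures conflict when their code maps to overlapping lines. Procedure placement tries to lay out procedures that frequently call each other so that their line sets are disjoint. The Python sketch below computes that overlap; the 8 KB cache with 32-byte lines, the example addresses, and the helper names are assumed values for illustration, not details of DPP.

```python
CACHE_SIZE = 8 * 1024  # assumed 8 KB direct-mapped instruction cache
LINE_SIZE = 32         # assumed 32-byte cache lines
NUM_LINES = CACHE_SIZE // LINE_SIZE


def cache_lines(start: int, size: int) -> set[int]:
    """Cache line indices occupied by code in [start, start + size)."""
    first = start // LINE_SIZE
    last = (start + size - 1) // LINE_SIZE
    return {block % NUM_LINES for block in range(first, last + 1)}


def conflicts(a: tuple[int, int], b: tuple[int, int]) -> set[int]:
    """Cache lines shared by two procedures, each given as (start_address, size)."""
    return cache_lines(*a) & cache_lines(*b)


caller = (0x0000, 256)
print(conflicts(caller, (0x2000, 128)))  # {0, 1, 2, 3}: callee is exactly 8 KB away
print(conflicts(caller, (0x2100, 128)))  # set(): shifting by 256 bytes removes the overlap
```

A static tool would choose such shifts once from a profile; the abstract's point is that a fixed layout cannot follow workloads whose call patterns change.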
Reducing Compulsory and Capacity Misses, 1990
This paper investigates several methods for reducing cache miss rates. Longer cache lines can decrease cache miss rates when used in conjunction with miss caches. Prefetch techniques can also be used to reduce cache miss rates. However, stream buffers are better than either of these two approaches. They are shown to have lower miss rates than an optimal line size for each program, and have better or near-equal performance to traditional prefetch techniques even when single instruction-issue latency is assumed for prefetches. Stream buffers in conjunction with victim caches can often provide a reduction in miss rate equivalent to a doubling or quadrupling of cache size. In some cases the reduction in miss rate provided by stream buffers and victim caches is larger than that of any size cache. Finally, the potential for compiler optimizations to increase the performance of stream buffers is investigated. This tech note is a copy of a paper that was submitted to ...
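A stream buffer is a small FIFO that, after a miss, prefetches the lines that sequentially follow the missed line; a later miss that matches the head of the FIFO is served from the buffer rather than from memory. The simplified Python simulator below models one stream buffer beside a direct-mapped cache and counts accesses that must go to memory. It assumes prefetches arrive instantly and checks only the head entry; the function name, parameters, and defaults are hypothetical.

```python
from collections import deque


def simulate(addresses, num_lines=64, line_size=16, depth=4, use_stream_buffer=True):
    """Count accesses that go to memory for a direct-mapped cache,
    optionally backed by one sequential stream buffer."""
    cache = [None] * num_lines  # cache[i] holds the line number stored in set i
    stream = deque()            # FIFO of prefetched line numbers
    memory_misses = 0
    for addr in addresses:
        line = addr // line_size
        idx = line % num_lines
        if cache[idx] == line:
            continue  # ordinary cache hit
        if use_stream_buffer and stream and stream[0] == line:
            stream.popleft()  # served from the head of the stream buffer
            # keep the FIFO full by prefetching the next sequential line
            stream.append(stream[-1] + 1 if stream else line + 1)
        else:
            memory_misses += 1  # true miss: fetch from memory
            if use_stream_buffer:
                # restart the stream with the lines that follow the miss
                stream = deque(range(line + 1, line + 1 + depth))
        cache[idx] = line  # fill the cache from memory or from the buffer
    return memory_misses


sweep = range(0, 1024, 4)  # sequential word accesses over 64 lines
print(simulate(sweep, use_stream_buffer=False))  # 64
print(simulate(sweep))                           # 1
```

For a purely sequential sweep the cache alone takes one compulsory miss per line, while the stream buffer turns all but the first into buffer hits; non-sequential patterns benefit far less.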
Comparative Evaluation of Fine- and Coarse-Grain ..., 1999
- In Proceedings of the Fifth International Symposium on High-Performance Computer Architecture
Symmetric multiprocessors (SMPs) connected with low-latency networks provide attractive building blocks for software distributed shared memory systems. Two distinct approaches have been used: the fine-grain approach that instruments application loads and stores to support a small coherence granularity, and the coarse-grain approach based on virtual memory hardware that provides coherence at a page granularity. Fine-grain systems offer a simple migration path for applications developed on hardware multiprocessors by supporting coherence protocols similar to those implemented in hardware. On the other hand, coarse-grain systems can potentially provide higher performance through more optimized protocols and larger transfer granularities, while avoiding instrumentation overheads. Numerous studies have examined each approach individually, but major differences in experimental platforms and applications make comparison of the approaches difficult. This paper presents a detailed...
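Coherence granularity is one concrete axis of this comparison: when two processors write distinct but nearby data, a coarse unit such as a page may be shared (false sharing) where a fine-grain block is not. The Python sketch below counts the coherence units written by both processors; the 64-byte block, 4 KB page, access pattern, and function name are assumed values chosen for illustration, not parameters from the paper.

```python
FINE = 64      # assumed fine-grain coherence block, in bytes
COARSE = 4096  # assumed virtual-memory page size, in bytes


def shared_units(writes_a, writes_b, granularity):
    """Coherence units (block or page numbers) written by both processors."""
    units_a = {addr // granularity for addr in writes_a}
    units_b = {addr // granularity for addr in writes_b}
    return units_a & units_b


# Two processors update alternating 256-byte chunks of one 8 KB array (8-byte words).
writes_p0 = [base + off for base in range(0, 8192, 512) for off in range(0, 256, 8)]
writes_p1 = [base + off for base in range(256, 8192, 512) for off in range(0, 256, 8)]

print(shared_units(writes_p0, writes_p1, FINE))    # set()
print(shared_units(writes_p0, writes_p1, COARSE))  # {0, 1}
```

With this interleaved pattern no 64-byte block is written by both processors, but both 4 KB pages are, so a page-based protocol would have to reconcile each page between the two writers. Larger units can still win when data is not falsely shared, which is the trade-off the paper sets out to measure.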

