Results 1-10 of 130
Analytical and numerical aspects of certain nonlinear evolution equations
Journal of Computational Physics, 1984
Cited by 73 (3 self)
Various numerical methods are used in order to approximate the Korteweg-de Vries equation, namely: (i) Zabusky-Kruskal scheme, (ii) hopscotch method, (iii) a scheme due to Goda, (iv) a proposed local scheme, (v) a proposed global scheme, (vi) a scheme suggested by Kruskal, (vii) split step Fourier method by Tappert, (viii) an improved split step Fourier method, and (ix) pseudospectral method by Fornberg and Whitham. Comparisons between our proposed scheme, which is developed using notions of the inverse scattering transform, and the other utilized schemes are obtained.
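As an illustration of the first scheme named in this abstract, here is a minimal sketch of the Zabusky-Kruskal leapfrog discretization for the Korteweg-de Vries equation written as u_t + u u_x + delta^2 u_xxx = 0 on a periodic grid. The function name, the forward-Euler start-up step, and the parameter values (delta = 0.022, cosine initial data on [0, 2]) are assumptions chosen for the example, not details taken from the paper.

```python
import math

def zabusky_kruskal(u0, dx, dt, steps, delta=0.022):
    """Advance u_t + u*u_x + delta^2*u_xxx = 0 on a periodic grid (steps >= 1)."""
    n = len(u0)
    c1 = dt / (3.0 * dx)           # coefficient of the averaged nonlinear term
    c2 = delta ** 2 * dt / dx ** 3  # coefficient of the centered third difference

    def increment(u):
        # Full leapfrog increment (spans two time levels) evaluated at level u.
        out = []
        for j in range(n):
            um2, um1 = u[(j - 2) % n], u[(j - 1) % n]
            up1, up2 = u[(j + 1) % n], u[(j + 2) % n]
            nonlinear = c1 * (up1 + u[j] + um1) * (up1 - um1)
            dispersive = c2 * (up2 - 2.0 * up1 + 2.0 * um1 - um2)
            out.append(nonlinear + dispersive)
        return out

    prev = list(u0)
    inc = increment(prev)
    # Assumed start-up: one forward-Euler step (half the leapfrog increment).
    curr = [prev[j] - 0.5 * inc[j] for j in range(n)]
    for _ in range(steps - 1):
        inc = increment(curr)
        prev, curr = curr, [prev[j] - inc[j] for j in range(n)]
    return curr

# Example run: cosine initial data on a periodic domain of length 2.
n = 64
dx = 2.0 / n
u_init = [math.cos(math.pi * j * dx) for j in range(n)]
u_final = zabusky_kruskal(u_init, dx, dt=1e-3, steps=200)
```

Both difference terms telescope to zero when summed over a periodic grid, so the discrete sum of u is preserved up to round-off, which makes a convenient sanity check on an implementation.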
Row Projection Methods For Large Nonsymmetric Linear Systems
SIAM J. Scientific and Statistical Computing, 1992
Cited by 41 (5 self)
Three conjugate gradient accelerated row projection (RP) methods for nonsymmetric linear systems are presented and their properties described. One method is based on Kaczmarz's method and has an iteration matrix that is the product of orthogonal projectors; another is based on Cimmino's method and has an iteration matrix that is the sum of orthogonal projectors. A new RP method which requires fewer matrix-vector operations, explicitly reduces the problem size, is error reducing in the 2-norm, and consistently produces better solutions than other RP algorithms is also introduced. Using comparisons with the method of conjugate gradient applied to the normal equations, the properties of RP methods are explained. A row partitioning approach is described which yields parallel implementations suitable for a wide range of computer architectures, requires only a few vectors of extra storage, and allows computing the necessary projections with small errors. Numerical testing verifies the robu...
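For readers unfamiliar with row projection, the following is a minimal sketch of the plain cyclic Kaczmarz iteration that the first RP method builds on: each step orthogonally projects the current iterate onto the hyperplane defined by one row of the system. It deliberately omits the conjugate gradient acceleration and block row partitioning described in the abstract; the function name and the sweep count are assumptions for illustration.

```python
def kaczmarz(A, b, sweeps=100):
    """Cyclic Kaczmarz iteration for A x = b, starting from x = 0."""
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, bi in zip(A, b):
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # an all-zero row defines no hyperplane; skip it
            residual = bi - sum(a * xj for a, xj in zip(row, x))
            scale = residual / norm_sq
            # Orthogonal projection onto the hyperplane row . x = bi.
            x = [xj + scale * a for xj, a in zip(x, row)]
    return x

# Small nonsymmetric-free example: 2x + y = 3, x + 3y = 5.
solution = kaczmarz([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

One full sweep applies the product of the individual row projectors, which is the structure the abstract refers to; Cimmino's variant would instead average the projections computed from the same iterate.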
OPTIMIZATION AND PERFORMANCE MODELING OF STENCIL COMPUTATIONS ON MODERN MICROPROCESSORS
Cited by 26 (7 self)
Stencil-based kernels constitute the core of many important scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. In this paper, we explore the impact of trends in memory subsystems on a variety of stencil optimization techniques and develop performance models to analytically guide our optimizations. Our work targets cache reuse methodologies across single and multiple stencil sweeps, examining cache-aware algorithms as well as cache-oblivious techniques on the Intel Itanium2, AMD Opteron, and IBM Power5. Additionally, we consider stencil computations on the heterogeneous multicore design of the Cell processor, a machine with an explicitly managed memory hierarchy. Overall our work represents one of the most extensive analyses of stencil optimizations and performance modeling to date. Results demonstrate that recent trends in memory system organization have reduced the efficacy of traditional cache-blocking optimizations. We also show that a cache-aware implementation is significantly faster than a cache-oblivious approach, while the explicitly managed memory on Cell enables the highest overall efficiency: Cell attains 88% of algorithmic peak while the best competing cache-based processor only achieves 54% of algorithmic peak performance.
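The loop restructuring behind cache blocking can be sketched as follows for a 5-point Jacobi sweep on a 2-D grid. This is only an illustration of the traversal order: the tile size, the function names, and the choice of stencil are assumptions, and interpreted Python will not show the memory-system effects the paper measures.

```python
def jacobi_sweep(grid):
    """One unblocked 5-point Jacobi sweep over the interior points."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out

def jacobi_sweep_blocked(grid, tile=4):
    """Same sweep, visiting the interior tile by tile (cache-blocking order)."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ii in range(1, rows - 1, tile):          # tile origin, row direction
        for jj in range(1, cols - 1, tile):      # tile origin, column direction
            for i in range(ii, min(ii + tile, rows - 1)):
                for j in range(jj, min(jj + tile, cols - 1)):
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out

sample = [[float((3 * i + 5 * j) % 7) for j in range(9)] for i in range(7)]
blocked = jacobi_sweep_blocked(sample, tile=4)
```

Because a Jacobi sweep reads only the old grid, the blocked and unblocked orders produce identical results; blocking changes only which data are touched close together in time.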
Techniques for interactive design using the PDE method
ACM Transactions on Graphics, 1999
Cited by 18 (5 self)
Interactive design of practical surfaces using the partial differential equation (PDE) method is considered. The PDE method treats surface design as a boundary value problem (ensuring that surfaces can be defined using a small set of design parameters). Owing to the elliptic nature of the PDE operator, the boundary conditions imposed around the edges of the surface control the internal shape of the surface. Moreover, surfaces obtained in this manner tend to be smooth and fair. The PDE chosen has a closed form solution allowing the interactive manipulation of the surfaces in real time. Thus we present efficient techniques by which we show how surfaces of practical significance can be constructed interactively in real time.
A Cost Analysis for a Higher-order Parallel Programming Model
1996
Cited by 17 (1 self)
Programming parallel computers remains a difficult task. An ideal programming environment should enable the user to concentrate on the problem solving activity at a convenient level of abstraction, while managing the intricate low-level details without sacrificing performance. This thesis investigates a model of parallel programming based on the Bird-Meertens Formalism (BMF). This is a set of higher-order functions, many of which are implicitly parallel. Programs are expressed in terms of functions borrowed from BMF. A parallel implementation is defined for each of these functions for a particular topology, and the associated execution costs are derived. The topologies which have been considered include the hypercube, 2-D torus, tree and the linear array. An analyser estimates the costs associated with different implementations of a given program and selects a cost-effective one for a given topology. All the analysis is performed at compile time which has the advantage of reducing run...
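The flavour of programming with implicitly parallel higher-order functions can be sketched with a map followed by a reduce. The helper names below are hypothetical, and the round count is only a stand-in for a cost measure: it reflects the well-known fact that pairwise combination of p values with an associative operator needs ceil(log2 p) combining rounds, not the cost model derived in the thesis.

```python
import operator

def bmf_map(f, xs):
    """Apply f to every element; each application is independent."""
    return [f(x) for x in xs]

def tree_reduce(op, xs):
    """Reduce a non-empty sequence with an associative op by pairwise rounds.

    Returns (value, rounds), where rounds counts the combining levels.
    """
    level = list(xs)
    rounds = 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element carries over to the next round
        level = nxt
        rounds += 1
    return level[0], rounds

# Sum of squares expressed as reduce(+) composed with map(square).
total, rounds = tree_reduce(operator.add, bmf_map(lambda x: x * x, range(1, 9)))
```

A compile-time analyser of the kind the abstract describes would compare such counts, weighted by communication costs for a given topology, across alternative implementations of the same program.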
Efficient Electrostatic and Electromagnetic Simulation Using IES³
IEEE Computational Science and Engineering, 1998
Cited by 17 (2 self)
Integral equation techniques are often used to extract models of integrated circuit structures. This extraction involves solving a dense system of linear equations, and using direct solution methods is prohibitive for large problems. In this paper, we present IES³ (pronounced "ice cube"), a fast Integral Equation Solver for three-dimensional problems with arbitrary kernels. We apply our method to solving electrostatic problems and electromagnetic problems in the electrically small regime (i.e., when circuit structures are at most a wavelength or so in size). The overall approach gives O(N log N) complexity, where N is the number of panels in a discretization of the conductor surfaces. 1 Introduction Extracting compact, accurate linear models for packages, interconnect, and components plays a significant role in modern Radio Frequency designs. Models can be extracted in a variety of ways, but for the high accuracy that critical sections of RF designs demand, only direct numeric sim...
THE PARFORM - A High Performance Platform for Parallel Computing in a Distributed Workstation Environment
1992
Cited by 16 (1 self)
The typical workstation in a LAN is idle during large periods of time. Under the concept of a hypercomputer this unused, distributed computing power can be put at the disposal of the user. The dynamic, heterogeneous, and distributed environment calls for a platform taking care of transparency, parallelization, load balancing and other issues. We describe such a system which, by optimized design and dynamic load distribution, proves faster than many related approaches. Performance measurements and an analytic model on scalability are presented for an explicit finite difference PDE solver. Keywords: Distributed parallel computing, idle workstations, dynamic load balancing, scalability, finite difference methods. Subject Classification: D.1.3, D.4.7, C.2.5 (ACM/CRCS) 1 Introduction Present trends in supercomputer development emphasize expensive technologies and specialized architectural concepts. Furthermore, we observe a significant increase of Supported by Siemens AG, ZFE, Germany...
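The benchmark workload mentioned in this abstract is an explicit finite difference PDE solver. A minimal sketch of such a kernel, assuming the 1-D heat equation u_t = u_xx with fixed end values and the standard forward-time, centered-space update, is given below; the equation, grid, and function name are assumptions for illustration and say nothing about how the PARFORM distributes the work.

```python
import math

def ftcs_heat(u0, r, steps):
    """Explicit (FTCS) time stepping for u_t = u_xx; r = dt / dx**2 <= 0.5."""
    u = list(u0)
    for _ in range(steps):
        interior = [u[j] + r * (u[j - 1] - 2.0 * u[j] + u[j + 1])
                    for j in range(1, len(u) - 1)]
        u = [u[0]] + interior + [u[-1]]  # end values are held fixed
    return u

# One sine mode on [0, 1] with 20 intervals, advanced to t = 0.1.
n = 20
initial = [math.sin(math.pi * j / n) for j in range(n + 1)]
final = ftcs_heat(initial, r=0.25, steps=160)
```

Each new value depends only on three neighbouring old values, so the grid can be split into contiguous chunks that exchange a single boundary value per step, which is what makes this kind of solver a natural load-balancing test case.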
Solution and linear estimation of 2-D nearest-neighbor models
Proceedings of the IEEE, 1990
Cited by 14 (5 self)
This paper considers the smoothing problem for 2-D random fields described by stochastic nearest-neighbor models (NNMs). The class of 2-D estimation problems that can be modeled in this way is quite large since NNMs arise whenever partial differential equations are discretized with finite difference methods. The NNM smoother is obtained by using a general smoothing technique developed in [1]-[3] for boundary-value processes in one or several dimensions. In this approach, the smoother is described by a Hamiltonian system of twice the dimension of the original system. For the problem considered here, the smoother is itself in NNM form. By converting this 2-D NNM system into an equivalent 1-D two-point boundary-value descriptor system (TPBVDS) of large dimension, a recursive and stable solution technique is obtained. Under slightly restrictive assumptions, an even faster procedure can be obtained by using the FFT with respect to one of the space dimensions to convert the 1-D TPBVDS mentioned above into a set of decoupled TPBVDSs of low order which can be solved in parallel. This fast implementation of the smoother is illustrated by two examples, corresponding respectively to the discretized Poisson and heat equations.