Poster: Differentiating the BLAS
by
H. Martin Bücker
Aachen University of Technology
Coauthors: Paul D. Hovland (Argonne National Laboratory)
Primarily due to the deep memory hierarchies of today's microprocessors, straightforward implementations of simple loops in high-level programming languages such as Fortran, C, or C++ almost never achieve the processor's theoretical peak performance. From a conceptual point of view, computations based on simple loops are crucial for a differentiated version of the BLAS. The A0-poster presents some timing results obtained from preliminary hand-coded implementations of differentiated versions of some BLAS operations and discusses the process of automating the tedious task of optimizing this code for a particular architecture.
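To illustrate why simple loops dominate a differentiated BLAS, the following is a minimal sketch of a forward-mode derivative of the dot product s = x^T y, assuming p derivative directions are propagated at once. The function name g_ddot, the argument order, and the row-major layout of the derivative arrays are hypothetical choices made for this illustration; they are not taken from the poster and say nothing about how the authors' hand-coded versions are organized.

```c
#include <stddef.h>

/* Hypothetical forward-mode derivative of the BLAS dot product s = x^T y.
 * Assumed layout: g_x[i*p + k] is the k-th directional derivative of x[i],
 * and likewise for g_y. On return, g_s[k] holds the k-th derivative of s.
 * Names and layout are illustrative assumptions, not the authors' interface. */
double g_ddot(size_t n, size_t p,
              const double *x, const double *g_x,
              const double *y, const double *g_y,
              double *g_s)
{
    double s = 0.0;

    for (size_t k = 0; k < p; ++k)
        g_s[k] = 0.0;

    for (size_t i = 0; i < n; ++i) {
        /* Product rule, applied to all p derivative directions:
         * d(x[i]*y[i]) = dx[i]*y[i] + x[i]*dy[i]. */
        for (size_t k = 0; k < p; ++k)
            g_s[k] += g_x[i * p + k] * y[i] + x[i] * g_y[i * p + k];

        /* Original (undifferentiated) computation. */
        s += x[i] * y[i];
    }
    return s;
}
```

Even in this small case, the differentiated routine adds an inner loop over the p directions to every original operation. Loops of this kind are what would have to be tuned for a given memory hierarchy.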
Date received: February 11, 2000
Copyright © 2000 by the author(s). The author(s) of this document and the organizers of the conference have granted their consent to include this abstract in Atlas Mathematical Conference Abstracts. Document # cads-74.