Comparing parallelism extraction techniques: superscalar processors, pipelined processors, and multiprocessors
Bibliographic Details
Contributors: Lilja, David J. (Author), Yew, Pen-Chung (Author)
Format: Book
Language: English
Published: Urbana, Ill. 1990
Series: Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report 954
Abstract: "We compare the ability of a superscalar processor, a pipelined processor, and a multiprocessor, all with the same degree of architectural parallelism, to automatically extract parallelism from scientific application programs. We find that the loop-level parallelism of the multiprocessor performs better than the instruction-level parallelism of the other processors on programs with high inherent parallelism. This performance difference is due to register allocation difficulties and instruction look-ahead requirements in the superscalar and pipelined processors.
The results suggest that dynamic loop unrolling is inadequate for obtaining the best performance in these processors and that it should be done by the compiler to allow for more intelligent register allocation. The multiprocessor is shown to have generally the highest memory and functional unit bandwidth requirements while the pipelined processor requires significantly more registers in order to hide the memory latency as efficiently as the other configurations. We show that there is significant fine-grain parallelism within parallel loop iterations as well as in the sequential code between parallel loops. Hence, a combination of fine-grain and coarse-grain parallelism extraction techniques are necessary in order to maximize performance."
Extent: 32 pages