Availability: Comparing parallelism extraction techniques

Comparing parallelism extraction techniques: superscalar processors, pipelined processors, and multiprocessors

Bibliographic details
Contributors:	Lilja, David J. (author), Yew, Pen-Chung (author)
Format:	Book
Language:	English
Published:	Urbana, Ill. 1990
Series:	Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report 954
Subjects:	Multiprocessors > Evaluation
Abstract:	"We compare the ability of a superscalar processor, a pipelined processor, and a multiprocessor, all with the same degree of architectural parallelism, to automatically extract parallelism from scientific application programs. We find that the loop-level parallelism of the multiprocessor performs better than the instruction-level parallelism of the other processors on programs with high inherent parallelism. This performance difference is due to register allocation difficulties and instruction look-ahead requirements in the superscalar and pipelined processors. The results suggest that dynamic loop unrolling is inadequate for obtaining the best performance in these processors and that it should be done by the compiler to allow for more intelligent register allocation. The multiprocessor is shown to have generally the highest memory and functional unit bandwidth requirements while the pipelined processor requires significantly more registers in order to hide the memory latency as efficiently as the other configurations. We show that there is significant fine-grain parallelism within parallel loop iterations as well as in the sequential code between parallel loops. Hence, a combination of fine-grain and coarse-grain parallelism extraction techniques are necessary in order to maximize performance."
Extent:	32 pp.

Internal format

MARC


LEADER	00000nam a2200000 cb4500
001	BV008949514
003	DE-604
005	00000000000000.0
007	t\|
008	940206s1990 xx \|\|\|\| 00\|\|\| eng d
035			\|a (OCoLC)22145303
035			\|a (DE-599)BVBBV008949514
040			\|a DE-604 \|b ger \|e rakddb
041	0		\|a eng
049			\|a DE-29T
100	1		\|a Lilja, David J. \|e Verfasser \|4 aut
245	1	0	\|a Comparing parallelism extraction techniques \|b superscalar processors, pipelined processors, and multiprocessors \|c David J. Lilja ; Pen-Chung Yew
264		1	\|a Urbana, Ill. \|c 1990
300			\|a 32 S.
336			\|b txt \|2 rdacontent
337			\|b n \|2 rdamedia
338			\|b nc \|2 rdacarrier
490	1		\|a Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report \|v 954
520	3		\|a Abstract: "We compare the ability of a superscalar processor, a pipelined processor, and a multiprocessor, all with the same degree of architectural parallelism, to automatically extract parallelism from scientific application programs. We find that the loop-level parallelism of the multiprocessor performs better than the instruction-level parallelism of the other processors on programs with high inherent parallelism. This performance difference is due to register allocation difficulties and instruction look-ahead requirements in the superscalar and pipelined processors
520	3		\|a The results suggest that dynamic loop unrolling is inadequate for obtaining the best performance in these processors and that it should be done by the compiler to allow for more intelligent register allocation. The multiprocessor is shown to have generally the highest memory and functional unit bandwidth requirements while the pipelined processor requires significantly more registers in order to hide the memory latency as efficiently as the other configurations. We show that there is significant fine-grain parallelism within parallel loop iterations as well as in the sequential code between parallel loops. Hence, a combination of fine-grain and coarse-grain parallelism extraction techniques are necessary in order to maximize performance.
650		4	\|a Multiprocessors \|x Evaluation
700	1		\|a Yew, Pen-Chung \|e Verfasser \|4 aut
830		0	\|a Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report \|v 954 \|w (DE-604)BV008930033 \|9 954
943	1		\|a oai:aleph.bib-bvb.de:BVB01-005905171

Record in the search index

_version_	1818951112842018816
any_adam_object
author	Lilja, David J. Yew, Pen-Chung
author_facet	Lilja, David J. Yew, Pen-Chung
author_role	aut aut
author_sort	Lilja, David J.
author_variant	d j l dj djl p c y pcy
building	Verbundindex
bvnumber	BV008949514
ctrlnum	(OCoLC)22145303 (DE-599)BVBBV008949514
format	Book
fullrecord	<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>02398nam a2200313 cb4500</leader><controlfield tag="001">BV008949514</controlfield><controlfield tag="003">DE-604</controlfield><controlfield tag="005">00000000000000.0</controlfield><controlfield tag="007">t\|</controlfield><controlfield tag="008">940206s1990 xx \|\|\|\| 00\|\|\| eng d</controlfield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)22145303</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(DE-599)BVBBV008949514</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">DE-604</subfield><subfield code="b">ger</subfield><subfield code="e">rakddb</subfield></datafield><datafield tag="041" ind1="0" ind2=" "><subfield code="a">eng</subfield></datafield><datafield tag="049" ind1=" " ind2=" "><subfield code="a">DE-29T</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Lilja, David J.</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Comparing parallelism extraction techniques</subfield><subfield code="b">superscalar processors, pipelined processors, and multiprocessors</subfield><subfield code="c">David J. 
Lilja ; Pen-Chung Yew</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Urbana, Ill.</subfield><subfield code="c">1990</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">32 S.</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="b">n</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="b">nc</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="490" ind1="1" ind2=" "><subfield code="a">Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report</subfield><subfield code="v">954</subfield></datafield><datafield tag="520" ind1="3" ind2=" "><subfield code="a">Abstract: "We compare the ability of a superscalar processor, a pipelined processor, and a multiprocessor, all with the same degree of architectural parallelism, to automatically extract parallelism from scientific application programs. We find that the loop-level parallelism of the multiprocessor performs better than the instruction-level parallelism of the other processors on programs with high inherent parallelism. This performance difference is due to register allocation difficulties and instruction look-ahead requirements in the superscalar and pipelined processors</subfield></datafield><datafield tag="520" ind1="3" ind2=" "><subfield code="a">The results suggest that dynamic loop unrolling is inadequate for obtaining the best performance in these processors and that it should be done by the compiler to allow for more intelligent register allocation. The multiprocessor is shown to have generally the highest memory and functional unit bandwidth requirements while the pipelined processor requires significantly more registers in order to hide the memory latency as efficiently as the other configurations. 
We show that there is significant fine-grain parallelism within parallel loop iterations as well as in the sequential code between parallel loops. Hence, a combination of fine-grain and coarse-grain parallelism extraction techniques are necessary in order to maximize performance.</subfield></datafield><datafield tag="650" ind1=" " ind2="4"><subfield code="a">Multiprocessors</subfield><subfield code="x">Evaluation</subfield></datafield><datafield tag="700" ind1="1" ind2=" "><subfield code="a">Yew, Pen-Chung</subfield><subfield code="e">Verfasser</subfield><subfield code="4">aut</subfield></datafield><datafield tag="830" ind1=" " ind2="0"><subfield code="a">Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report</subfield><subfield code="v">954</subfield><subfield code="w">(DE-604)BV008930033</subfield><subfield code="9">954</subfield></datafield><datafield tag="943" ind1="1" ind2=" "><subfield code="a">oai:aleph.bib-bvb.de:BVB01-005905171</subfield></datafield></record></collection>
id	DE-604.BV008949514
illustrated	Not Illustrated
indexdate	2024-12-20T09:29:19Z
institution	BVB
language	English
oai_aleph_id	oai:aleph.bib-bvb.de:BVB01-005905171
oclc_num	22145303
open_access_boolean
owner	DE-29T
owner_facet	DE-29T
physical	32 S.
publishDate	1990
publishDateSearch	1990
publishDateSort	1990
record_format	marc
series	Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report
series2	Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report
spelling	Lilja, David J. Verfasser aut Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors David J. Lilja ; Pen-Chung Yew Urbana, Ill. 1990 32 S. txt rdacontent n rdamedia nc rdacarrier Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report 954 Abstract: "We compare the ability of a superscalar processor, a pipelined processor, and a multiprocessor, all with the same degree of architectural parallelism, to automatically extract parallelism from scientific application programs. We find that the loop-level parallelism of the multiprocessor performs better than the instruction-level parallelism of the other processors on programs with high inherent parallelism. This performance difference is due to register allocation difficulties and instruction look-ahead requirements in the superscalar and pipelined processors The results suggest that dynamic loop unrolling is inadequate for obtaining the best performance in these processors and that it should be done by the compiler to allow for more intelligent register allocation. The multiprocessor is shown to have generally the highest memory and functional unit bandwidth requirements while the pipelined processor requires significantly more registers in order to hide the memory latency as efficiently as the other configurations. We show that there is significant fine-grain parallelism within parallel loop iterations as well as in the sequential code between parallel loops. Hence, a combination of fine-grain and coarse-grain parallelism extraction techniques are necessary in order to maximize performance. Multiprocessors Evaluation Yew, Pen-Chung Verfasser aut Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report 954 (DE-604)BV008930033 954
spellingShingle	Lilja, David J. Yew, Pen-Chung Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report Multiprocessors Evaluation
title	Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors
title_auth	Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors
title_exact_search	Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors
title_full	Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors David J. Lilja ; Pen-Chung Yew
title_fullStr	Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors David J. Lilja ; Pen-Chung Yew
title_full_unstemmed	Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors David J. Lilja ; Pen-Chung Yew
title_short	Comparing parallelism extraction techniques
title_sort	comparing parallelism extraction techniques superscalar processors pipelined processors and multiprocessors
title_sub	superscalar processors, pipelined processors, and multiprocessors
topic	Multiprocessors Evaluation
topic_facet	Multiprocessors Evaluation
volume_link	(DE-604)BV008930033
work_keys_str_mv	AT liljadavidj comparingparallelismextractiontechniquessuperscalarprocessorspipelinedprocessorsandmultiprocessors AT yewpenchung comparingparallelismextractiontechniquessuperscalarprocessorspipelinedprocessorsandmultiprocessors
