Comparing parallelism extraction techniques: superscalar processors, pipelined processors, and multiprocessors
Contributors: | Lilja, David J.; Yew, Pen-Chung |
---|---|
Format: | Book |
Language: | English |
Published: | Urbana, Ill., 1990 |
Series: | Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report ; 954 |
Subjects: | Multiprocessors; Evaluation |
Abstract: | "We compare the ability of a superscalar processor, a pipelined processor, and a multiprocessor, all with the same degree of architectural parallelism, to automatically extract parallelism from scientific application programs. We find that the loop-level parallelism of the multiprocessor performs better than the instruction-level parallelism of the other processors on programs with high inherent parallelism. This performance difference is due to register allocation difficulties and instruction look-ahead requirements in the superscalar and pipelined processors. The results suggest that dynamic loop unrolling is inadequate for obtaining the best performance in these processors and that it should be done by the compiler to allow for more intelligent register allocation. The multiprocessor is shown to have generally the highest memory and functional unit bandwidth requirements while the pipelined processor requires significantly more registers in order to hide the memory latency as efficiently as the other configurations. We show that there is significant fine-grain parallelism within parallel loop iterations as well as in the sequential code between parallel loops. Hence, a combination of fine-grain and coarse-grain parallelism extraction techniques are necessary in order to maximize performance." |
Extent: | 32 pp. |
Internal format
MARC
LEADER | 00000nam a2200000 cb4500 | ||
---|---|---|---|
001 | BV008949514 | ||
003 | DE-604 | ||
005 | 00000000000000.0 | ||
007 | t| | ||
008 | 940206s1990 xx |||| 00||| eng d | ||
035 | |a (OCoLC)22145303 | ||
035 | |a (DE-599)BVBBV008949514 | ||
040 | |a DE-604 |b ger |e rakddb | ||
041 | 0 | |a eng | |
049 | |a DE-29T | ||
100 | 1 | |a Lilja, David J. |e Verfasser |4 aut | |
245 | 1 | 0 | |a Comparing parallelism extraction techniques |b superscalar processors, pipelined processors, and multiprocessors |c David J. Lilja ; Pen-Chung Yew |
264 | 1 | |a Urbana, Ill. |c 1990 | |
300 | |a 32 S. | ||
336 | |b txt |2 rdacontent | ||
337 | |b n |2 rdamedia | ||
338 | |b nc |2 rdacarrier | ||
490 | 1 | |a Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report |v 954 | |
520 | 3 | |a Abstract: "We compare the ability of a superscalar processor, a pipelined processor, and a multiprocessor, all with the same degree of architectural parallelism, to automatically extract parallelism from scientific application programs. We find that the loop-level parallelism of the multiprocessor performs better than the instruction-level parallelism of the other processors on programs with high inherent parallelism. This performance difference is due to register allocation difficulties and instruction look-ahead requirements in the superscalar and pipelined processors | |
520 | 3 | |a The results suggest that dynamic loop unrolling is inadequate for obtaining the best performance in these processors and that it should be done by the compiler to allow for more intelligent register allocation. The multiprocessor is shown to have generally the highest memory and functional unit bandwidth requirements while the pipelined processor requires significantly more registers in order to hide the memory latency as efficiently as the other configurations. We show that there is significant fine-grain parallelism within parallel loop iterations as well as in the sequential code between parallel loops. Hence, a combination of fine-grain and coarse-grain parallelism extraction techniques are necessary in order to maximize performance. | |
650 | 4 | |a Multiprocessors |x Evaluation | |
700 | 1 | |a Yew, Pen-Chung |e Verfasser |4 aut | |
830 | 0 | |a Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report |v 954 |w (DE-604)BV008930033 |9 954 | |
943 | 1 | |a oai:aleph.bib-bvb.de:BVB01-005905171 |
Record in the search index
_version_ | 1818951112842018816 |
---|---|
any_adam_object | |
author | Lilja, David J. Yew, Pen-Chung |
author_facet | Lilja, David J. Yew, Pen-Chung |
author_role | aut aut |
author_sort | Lilja, David J. |
author_variant | d j l dj djl p c y pcy |
building | Verbundindex |
bvnumber | BV008949514 |
ctrlnum | (OCoLC)22145303 (DE-599)BVBBV008949514 |
format | Book |
id | DE-604.BV008949514 |
illustrated | Not Illustrated |
indexdate | 2024-12-20T09:29:19Z |
institution | BVB |
language | English |
oai_aleph_id | oai:aleph.bib-bvb.de:BVB01-005905171 |
oclc_num | 22145303 |
open_access_boolean | |
owner | DE-29T |
owner_facet | DE-29T |
physical | 32 S. |
publishDate | 1990 |
publishDateSearch | 1990 |
publishDateSort | 1990 |
record_format | marc |
series | Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report |
series2 | Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report |
spelling | Lilja, David J. Verfasser aut Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors David J. Lilja ; Pen-Chung Yew Urbana, Ill. 1990 32 S. txt rdacontent n rdamedia nc rdacarrier Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report 954 Abstract: "We compare the ability of a superscalar processor, a pipelined processor, and a multiprocessor, all with the same degree of architectural parallelism, to automatically extract parallelism from scientific application programs. We find that the loop-level parallelism of the multiprocessor performs better than the instruction-level parallelism of the other processors on programs with high inherent parallelism. This performance difference is due to register allocation difficulties and instruction look-ahead requirements in the superscalar and pipelined processors The results suggest that dynamic loop unrolling is inadequate for obtaining the best performance in these processors and that it should be done by the compiler to allow for more intelligent register allocation. The multiprocessor is shown to have generally the highest memory and functional unit bandwidth requirements while the pipelined processor requires significantly more registers in order to hide the memory latency as efficiently as the other configurations. We show that there is significant fine-grain parallelism within parallel loop iterations as well as in the sequential code between parallel loops. Hence, a combination of fine-grain and coarse-grain parallelism extraction techniques are necessary in order to maximize performance. Multiprocessors Evaluation Yew, Pen-Chung Verfasser aut Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report 954 (DE-604)BV008930033 954 |
spellingShingle | Lilja, David J. Yew, Pen-Chung Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors Center for Supercomputing Research and Development <Urbana, Ill.>: CSRD report Multiprocessors Evaluation |
title | Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors |
title_auth | Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors |
title_exact_search | Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors |
title_full | Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors David J. Lilja ; Pen-Chung Yew |
title_fullStr | Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors David J. Lilja ; Pen-Chung Yew |
title_full_unstemmed | Comparing parallelism extraction techniques superscalar processors, pipelined processors, and multiprocessors David J. Lilja ; Pen-Chung Yew |
title_short | Comparing parallelism extraction techniques |
title_sort | comparing parallelism extraction techniques superscalar processors pipelined processors and multiprocessors |
title_sub | superscalar processors, pipelined processors, and multiprocessors |
topic | Multiprocessors Evaluation |
topic_facet | Multiprocessors Evaluation |
volume_link | (DE-604)BV008930033 |
work_keys_str_mv | AT liljadavidj comparingparallelismextractiontechniquessuperscalarprocessorspipelinedprocessorsandmultiprocessors AT yewpenchung comparingparallelismextractiontechniquessuperscalarprocessorspipelinedprocessorsandmultiprocessors |