Lecture: Computer Organization and Architecture: Chapter 15
Pages: 30
File type: ppt
Size: 448.00 KB
Document information:
The lecture Computer Organization and Architecture: Chapter 15 - IA-64 Architecture introduces Background to IA-64; Motivation; Superscalar v IA-64; Why New Architecture;...
Content extracted from the document:
William Stallings
Computer Organization and Architecture
6th Edition
Chapter 15
IA-64 Architecture

Background to IA-64
• Pentium 4 appears to be last in x86 line
• Intel & Hewlett-Packard (HP) jointly developed
• New architecture
  - 64 bit architecture
  - Not extension of x86
  - Not adaptation of HP 64 bit RISC architecture
• Exploits vast circuitry and high speeds
• Systematic use of parallelism
• Departure from superscalar

Motivation
• Instruction level parallelism
  - Implicit in machine instruction
  - Not determined at run time by processor
• Long or very long instruction words (LIW/VLIW)
• Branch predication (not the same as branch prediction)
• Speculative loading
• Intel & HP call this Explicit Parallel Instruction Computing (EPIC)
• IA-64 is an instruction set architecture intended for implementation on EPIC
• Itanium is first Intel product

Superscalar v IA-64
[slide content not in extracted text]

Why New Architecture?
• Not hardware compatible with x86
• Now have tens of millions of transistors available on chip
• Could build bigger cache
  - Diminishing returns
• Add more execution units
  - Increase superscaling
  - "Complexity wall"
  - More units makes processor "wider"
  - More logic needed to orchestrate
  - Improved branch prediction required
  - Longer pipelines required
  - Greater penalty for misprediction
  - Larger number of renaming registers required
  - At most six instructions per cycle

Explicit Parallelism
• Instruction parallelism scheduled at compile time
  - Included with machine instruction
• Processor uses this info to perform parallel execution
• Requires less complex circuitry
• Compiler has much more time to determine possible parallel operations
• Compiler sees whole program

General Organization
[slide content not in extracted text]

Key Features
• Large number of registers
  - IA-64 instruction format assumes 256
    - 128 * 64 bit integer, logical & general purpose
    - 128 * 82 bit floating point and graphic
  - 64 * 1 bit predicated execution registers (see later)
  - To support high degree of parallelism
• Multiple execution units
  - Expected to be 8 or more
  - Depends on number of transistors available
  - Execution of parallel instructions depends on hardware available
    - 8 parallel instructions may be split into two lots of four if only four execution units are available

IA-64 Execution Units
• I-Unit
  - Integer arithmetic
  - Shift and add
  - Logical
  - Compare
  - Integer multimedia ops
• M-Unit
  - Load and store
    - Between register and memory
  - Some integer ALU
• B-Unit
  - Branch instructions
• F-Unit
  - Floating point instructions

Instruction Format Diagram
[slide content not in extracted text]

Instruction Format
• 128 bit bundle
  - Holds three instructions (syllables) plus template
  - Can fetch one or more bundles at a time
  - Template contains info on which instructions can be executed in parallel
    - Not confined to single bundle
    - e.g. a stream of 8 instructions may be executed in parallel
    - Compiler will have reordered instructions to form contiguous bundles
    - Can mix dependent and independent instructions in same bundle
  - Instruction is 41 bit long
    - More registers than usual RISC
    - Predicated execution registers (see later)

Assembly Language Format
• [qp] mnemonic [.comp] dest = srcs //
• qp - predicate register
  - 1 at execution: execute and commit result to hardware
  - 0: result is discarded
• mnemonic - name of instruction
• comp - one or more instruction completers used to qualify mnemonic
• dest - one or more destination operands
• srcs - one or more source operands
• // - comment
• Instruction groups and stops indicated by ;;
  - Sequence without read after write or write after write
  - Do not need hardware register dependency checks

Assembly Examples
    ld8 r1 = [r5] ;;     // first group
    add r3 = r1, r4      // second group
• Second instruction depends on value in r1
  - Changed by first instruction
  - Cannot be in same group for parallel execution

Predication
[slide content not in extracted text]

Speculative Loading
[slide content not in extracted text]

Control & Data Speculation
• Control
  - AKA speculative loading
  - Load data from memory before needed
• Data
  - Load moved before store that might alter memory location
  - Subsequent check on the value

Software Pipelining
L1: ld4 r4=[r5],4 ;;     // cycle 0 load postinc 4
    add r7=r4,r9 ;;      // cycle 2
    st4 [r6]=r7,4        // cycle 3 store postinc 4
    br.cloop L1 ;;       // cycle 3
• Adds constant to one vector and stores result in another
• No opportunity for instruction level parallelism
• Instructions in iteration x all executed before iteration x+1 begins
• If no address conflicts between loads and stores, can move independent instructions from loop x+1 to loop x

Unrolled Loop
    ld4 r32=[r5],4;;     // cycle 0
    ld4 r33=[r5],4;;     // cycle 1
    ld4 r34=[r5],4       // cycle 2
    add r36=r32,r9;;     //cy ...
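
The Software Pipelining slide says that, when loads and stores do not conflict, independent instructions from iteration x+1 can be moved into iteration x. The Python sketch below illustrates that idea only; it is not IA-64 code. It assumes a simplified three-stage model (load, add, store) and ignores the load latency shown in the cycle comments. The function names are hypothetical.

```python
def add_constant(src, c):
    """Sequential model: each iteration finishes before the next one starts."""
    dst = []
    for x in src:        # corresponds to: ld4 r4=[r5],4
        y = x + c        # corresponds to: add r7=r4,r9
        dst.append(y)    # corresponds to: st4 [r6]=r7,4
    return dst


def add_constant_pipelined(src, c):
    """Pipelined model: one step stores element i-2, adds element i-1, loads element i."""
    n = len(src)
    dst = [None] * n
    loaded = None   # value loaded during the previous step
    summed = None   # sum computed during the previous step
    for step in range(n + 2):          # two extra steps drain the pipeline
        # Oldest stage runs first so every value is consumed before it is overwritten.
        if step >= 2:
            dst[step - 2] = summed     # store stage
        if 1 <= step <= n:
            summed = loaded + c        # add stage
        if step < n:
            loaded = src[step]         # load stage
    return dst
```

Both functions return the same list for the same input. In the pipelined version, the three stages inside one step touch different elements, so they have no dependency on each other and could in principle be issued together.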
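
The Instruction Format slide describes a 128-bit bundle holding three 41-bit instructions plus a template, which leaves 128 - 3 * 41 = 5 bits for the template. The sketch below packs and unpacks such a bundle as a Python integer. The field order (template in the lowest bits, followed by slots 0 to 2) is an assumption that the slides do not state, and the function names are hypothetical.

```python
TEMPLATE_BITS = 5                      # 128 - 3 * 41
SLOT_BITS = 41                         # one instruction (syllable)
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template, slots):
    """Combine a template and three 41-bit instruction slots into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each instruction must fit in 41 bits")
        # Assumed layout: slot i starts right after the template and earlier slots.
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit integer back into (template, [slot0, slot1, slot2])."""
    template = bundle & TEMPLATE_MASK
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    ]
    return template, slots
```

A round trip such as `unpack_bundle(pack_bundle(t, [a, b, c]))` returns `(t, [a, b, c])`, and the packed value never needs more than 128 bits.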