Lecture: Computer Organization and Architecture, Chapter 15

Pages: 30 | File type: ppt | Size: 448.00 KB

10.10.2023

Document information:

The lecture Computer Organization and Architecture, Chapter 15 - IA-64 Architecture introduces: Background to IA-64; Motivation; Superscalar v IA-64; Why New Architecture;...
Content extracted from the document:
William Stallings
Computer Organization and Architecture
6th Edition
Chapter 15
IA-64 Architecture

Background to IA-64
• Pentium 4 appears to be last in x86 line
• Intel & Hewlett Packard (HP) jointly developed
• New architecture
  - 64-bit architecture
  - Not extension of x86
  - Not adaptation of HP 64-bit RISC architecture
• Exploits vast circuitry and high speeds
• Systematic use of parallelism
• Departure from superscalar

Motivation
• Instruction level parallelism
  - Explicit in machine instruction
  - Not determined at run time by processor
• Long or very long instruction words (LIW/VLIW)
• Branch predication (not the same as branch prediction)
• Speculative loading
• Intel & HP call this Explicit Parallel Instruction Computing (EPIC)
• IA-64 is an instruction set architecture intended for implementation on EPIC
• Itanium is first Intel product

Superscalar v IA-64

Why New Architecture?
• Not hardware compatible with x86
• Now have tens of millions of transistors available on chip
• Could build bigger cache
  - Diminishing returns
• Add more execution units
  - Increase superscaling
  - “Complexity wall”
  - More units make processor “wider”
  - More logic needed to orchestrate
  - Improved branch prediction required
  - Longer pipelines required
  - Greater penalty for misprediction
  - Larger number of renaming registers required
  - At most six instructions per cycle

Explicit Parallelism
• Instruction parallelism scheduled at compile time
  - Included with machine instruction
• Processor uses this info to perform parallel execution
• Requires less complex circuitry
• Compiler has much more time to determine possible parallel operations
• Compiler sees whole program

General Organization

Key Features
• Large number of registers
  - IA-64 instruction format assumes 256
    - 128 * 64-bit integer, logical & general purpose
    - 128 * 82-bit floating point and graphic
  - 64 * 1-bit predicated execution registers (see later)
  - To support high degree of parallelism
• Multiple execution units
  - Expected to be 8 or more
  - Depends on number of transistors available
  - Execution of parallel instructions depends on hardware available
    - 8 parallel instructions may be split into two lots of four if only four execution units are available

IA-64 Execution Units
• I-Unit
  - Integer arithmetic
  - Shift and add
  - Logical
  - Compare
  - Integer multimedia ops
• M-Unit
  - Load and store
    - Between register and memory
  - Some integer ALU
• B-Unit
  - Branch instructions
• F-Unit
  - Floating point instructions

Instruction Format Diagram

Instruction Format
• 128-bit bundle
  - Holds three instructions (syllables) plus template
  - Can fetch one or more bundles at a time
  - Template contains info on which instructions can be executed in parallel
    - Not confined to single bundle
    - e.g. a stream of 8 instructions may be executed in parallel
    - Compiler will have reordered instructions to form contiguous bundles
    - Can mix dependent and independent instructions in same bundle
  - Instruction is 41 bits long
    - More registers than usual RISC
    - Predicated execution registers (see later)

Assembly Language Format
• [qp] mnemonic [.comp] dest = srcs //
• qp - predicate register
  - 1 at execution then execute and commit result to hardware
  - 0 result is discarded
• mnemonic - name of instruction
• comp - one or more instruction completers used to qualify mnemonic
• dest - one or more destination operands
• srcs - one or more source operands
• // - comment
• Instruction groups and stops indicated by ;;
  - Sequence without read after write or write after write
  - Do not need hardware register dependency checks

Assembly Examples
    ld8 r1 = [r5] ;;     //first group
    add r3 = r1, r4      //second group
• Second instruction depends on value in r1
  - Changed by first instruction
  - Cannot be in same group for parallel execution

Predication

Speculative Loading

Control & Data Speculation
• Control
  - AKA Speculative loading
  - Load data from memory before needed
• Data
  - Load moved before store that might alter memory location
  - Subsequent check in value

Software Pipelining
    L1: ld4 r4=[r5],4 ;;     //cycle 0 load postinc 4
        add r7=r4,r9 ;;      //cycle 2
        st4 [r6]=r7,4        //cycle 3 store postinc 4
        br.cloop L1 ;;       //cycle 3
• Adds constant to one vector and stores result in another
• No opportunity for instruction level parallelism
• Instructions in iteration x all executed before iteration x+1 begins
• If no address conflicts between loads and stores, can move independent instructions from loop x+1 to loop x

Unrolled Loop
    ld4 r32=[r5],4;;     //cycle 0
    ld4 r33=[r5],4;;     //cycle 1
    ld4 r34=[r5],4       //cycle 2
    add r36=r32,r9;;     //cy ...
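
The instruction-group rule given under Assembly Language Format (a group is a sequence with no read-after-write or write-after-write register dependency, closed by a ;; stop) can be illustrated with a small Python sketch. It is not an IA-64 assembler. The `Instr` record and the `split_into_groups` function are hypothetical names introduced here for illustration. Only register operands are modeled, a post-increment load is treated as also writing its base register, and memory dependencies and any architectural special cases are ignored.

```python
from typing import List, NamedTuple, Set


class Instr(NamedTuple):
    """Hypothetical, simplified instruction: source text plus register sets."""
    text: str
    writes: Set[str]
    reads: Set[str]


def split_into_groups(instrs: List[Instr]) -> List[List[Instr]]:
    """Greedily start a new group (i.e. place a ;; stop) before any
    instruction that reads or writes a register already written by an
    earlier instruction in the current group (RAW or WAW)."""
    groups: List[List[Instr]] = []
    current: List[Instr] = []
    written: Set[str] = set()  # registers written so far in this group
    for ins in instrs:
        raw = ins.reads & written   # read-after-write conflict
        waw = ins.writes & written  # write-after-write conflict
        if current and (raw or waw):
            groups.append(current)
            current, written = [], set()
        current.append(ins)
        written |= ins.writes
    if current:
        groups.append(current)
    return groups


# The two instructions from the Assembly Examples slide.
example = [
    Instr("ld8 r1 = [r5]", writes={"r1"}, reads={"r5"}),
    Instr("add r3 = r1, r4", writes={"r3"}, reads={"r1", "r4"}),
]

# The visible part of the Unrolled Loop listing; each post-increment
# load is modeled as both reading and writing r5 (an assumption of
# this sketch about how to encode the operand sets).
unrolled = [
    Instr("ld4 r32=[r5],4", writes={"r32", "r5"}, reads={"r5"}),
    Instr("ld4 r33=[r5],4", writes={"r33", "r5"}, reads={"r5"}),
    Instr("ld4 r34=[r5],4", writes={"r34", "r5"}, reads={"r5"}),
    Instr("add r36=r32,r9", writes={"r36"}, reads={"r32", "r9"}),
]

for name, seq in (("example", example), ("unrolled", unrolled)):
    for n, group in enumerate(split_into_groups(seq), start=1):
        print(f"{name} group {n}: " + " | ".join(i.text for i in group))
```

Under these assumptions the sketch puts the ld8 and the add in separate groups, as the slide states, and for the visible part of the unrolled loop it places stops after the first and second loads while letting the third load share a group with the first add, which matches where ;; appears in the listing.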