microarchitecture.pdf
Contents
1 Introduction.......................................................................................................................3
1.1 About this manual.......................................................................................................3
1.2 Microprocessor versions covered by this manual........................................................4
2 Out-of-order execution (all processors except P1, PMMX)................................................5
2.1 Instructions are split into uops.....................................................................................5
2.2 Register renaming......................................................................................................6
3 Branch prediction (all processors).....................................................................................7
3.1 Prediction methods for conditional jumps....................................................................7
3.2 Branch prediction in P1.............................................................................................13
3.3 Branch prediction in PMMX, PPro, P2, and P3.........................................................17
3.4 Branch prediction in P4 and P4E..............................................................................18
3.5 Branch prediction in PM and Core2..........................................................................21
3.6 Branch prediction in AMD64.....................................................................................22
3.7 Indirect jumps (all processors except PM and Core2)...............................................25
3.8 Returns (all processors except P1)...........................................................................25
3.9 Static prediction........................................................................................................26
3.10 Close jumps............................................................................................................27
4 Pentium 1 and Pentium MMX pipeline.............................................................................29
4.1 Pairing integer instructions........................................................................................29
4.2 Address generation interlock.....................................................................................33
4.3 Splitting complex instructions into simpler ones........................................................33
4.4 Prefixes.....................................................................................................................34
4.5 Scheduling floating point code..................................................................................35
5 Pentium Pro, II and III pipeline.........................................................................................38
5.1 The pipeline in PPro, P2 and P3...............................................................................38
5.2 Instruction fetch........................................................................................................38
5.3 Instruction decoding..................................................................................................39
5.4 Register renaming....................................................................................................43
5.5 ROB read..................................................................................................................43
5.6 Out of order execution..............................................................................................47
5.7 Retirement................................................................................................................48
5.8 Partial register stalls..................................................................................................49
5.9 Partial memory stalls.................................................................................................52
5.10 Bottlenecks in PPro, P2, P3....................................................................................53
6 Pentium M pipeline..........................................................................................................55
6.1 The pipeline in PM....................................................................................................55
6.2 The pipeline in Core Solo and Duo...........................................................................56
6.3 Instruction fetch........................................................................................................56
6.4 Instruction decoding..................................................................................................56
6.5 Loop buffer...............................................................................................................58
6.6 Micro-op fusion.........................................................................................................58
6.7 Stack engine.............................................................................................................60
6.8 Register renaming....................................................................................................62
6.9 Register read stalls...................................................................................................62
6.10 Execution units.......................................................................................................64
6.11 Execution units that are connected to both port 0 and 1..........................................64
6.12 Retirement..............................................................................................................66
6.13 Partial register access.............................................................................................66
6.14 Partial memory stalls...............................................................................................68
6.15 Bottlenecks in PM...................................................................................................68
7 Core 2 pipeline................................................................................................................71
7.1 Pipeline.....................................................................................................................71
7.2 Instruction fetch and predecoding.............................................................................71
7.3 Instruction decoding..................................................................................................73
7.4 Micro-op fusion.........................................................................................................74
7.5 Macro-op fusion........................................................................................................74
7.6 Stack engine.............................................................................................................76
7.7 Register renaming....................................................................................................76
7.8 Register read stalls...................................................................................................76
7.9 Execution units.........................................................................................................78
7.10 Retirement..............................................................................................................80
7.11 Partial register access.............................................................................................80
7.12 Partial memory stalls...............................................................................................81
7.13 Cache and memory access.....................................................................................81
7.14 Breaking dependence chains..................................................................................82
7.15 Bottlenecks in Core2...............................................................................................83
8 Pentium 4 (NetBurst) pipeline..........................................................................................85
8.1 Data cache...............................................................................................................85
8.2 Trace cache..............................................................................................................85
8.3 Instruction decoding..................................................................................................90
8.4 Execution units.........................................................................................................91
8.5 Do the floating point and MMX units run at half speed?............................................93
8.6 Transfer of data between execution units..................................................................96
8.7 Retirement................................................................................................................98
8.8 Partial registers and partial flags...............................................................................99
8.9 Partial memory access............................................................................................100
8.10 Memory intermediates in dependence chains.......................................................100
8.11 Breaking dependence chains................................................................................102
8.12 Choosing the optimal instructions.........................................................................102
8.13 Bottlenecks in P4 and P4E....................................................................................105
9 AMD64 pipeline.............................................................................................................108
9.1 The pipeline in AMD64............................................................................................108
9.2 Instruction fetch......................................................................................................110
9.3 Predecoding and instruction length decoding..........................................................110
9.4 Single, double and vector path instructions.............................................................111
9.5 Integer execution pipes...........................................................................................112
9.6 Floating point execution pipes.................................................................................112
9.7 Mixing instructions with different latency.................................................................114
9.8 64 bit versus 128 bit instructions.............................................................................115
9.9 Data delay between differently typed instructions...................................................116
9.10 Partial register access...........................................................................................117
9.11 Partial flag access.................................................................................................117
9.12 Partial memory stalls.............................................................................................118
9.13 Loops....................................................................................................................118
9.14 Cache...................................................................................................................119
9.15 Bottlenecks in AMD64...........................................................................................120
10 Comparison of microarchitectures...............................................................................122
10.1 The AMD kernel....................................................................................................122
10.2 The Pentium 4 kernel............................................................................................123
10.3 The Pentium M kernel...........................................................................................124
10.4 Intel Core 2 microarchitecture...............................................................................125
10.5 Conclusion............................................................................................................126
10.6 Future trends........................................................................................................128
11 Literature.....................................................................................................................129