An Overview of how Each Inctruction Activates Each Stage
This part of the lesson explains how every stage is activated by each type of instruction. In order to do that we will use a simple list of unconnected instructions:
Program
lui a5 30
addi sp, s0, 4
srai sp, s0, 4
sub ra, sp, gp
blt x12 x10 -12
lw a1 4(t2)
sw a1 8(t2)
jal a5 30
Ripes will be used for the simulation.
Video Transcript
We will now focus on how each type of instruction activates each stage. Please load the 5 stage pipelined processor without Hazard Detection and without Forwarding, in Extended mode, with no initializations, and copy the program in the editor tab. Let us now start the simulation and consider the fetch stage.
-
The fetch stage contains the circuit that calculates the next Program counter value, choosing between the current value plus four (which means fetching the instruction immediately following thecurrent one, as instructions are 32 bits, or 4 bytes wide) and the alu result, that for example holds the destination address of branch and jump instructions. Mind however that the signal that actually makes the choice is calculated in the execution stage. Therefore we can say that although topologically this circuit belongs to the Fetch stage, it functionally belongs to the execution stage, as that is where it is controlled from. As for the instruction, in this stage it has not been decoded yet, so it has no semantic meaning for the processor. This means that each instruction is treated the same way: in this stage it is just this hex number. For each instruction we can verify that its hex code is matched, at the output of the instruction memory.
-
Starting from the decode stage, not all the signals will be used by each instruction. For example the load upper immediate instruction here does not feature source registers fields, and yet we see some values nonetheless. That is because those very bits that in other instructions decode into the two source registers here are are used for something else, the immediate. The processor ignores the values that have nothing to do with the current instruction, it just does not use them. Therefore, we will ignore them, but you have to know the Instruction Set to know what to ignore and what to check. This is particularly important because it is here in the decode stage, that all the signals that control the other stages are calculated. Here on the right you see the cheat sheet that you can find at the end of the lesson menu.
-
For the 'load upper immediate' instruction we can check the address of the destination register, the immediate, which is shifted by 12 positions as per instruction specification, and the following flags of the Control Unit from top to bottom:
- The register file write enable which is enabled, since this instruction will eventually write to a register.
- The write data select signal, which selects the alu result, which in this case will just contain a copy of the immediate. Remember that the register file is being written later, from the write back stage, through the write enable signal, the write address signal, and the write data signal. This means that the Register File chip although topologically part of the decode stage, is functionally part both of de decode stage itself for its read function, but also of the write back stage for the write function.
- The next five signals are only for memory and branch or jump instructions, so we ignore them.
- The next three signals select the first source register and the immediate as alu inputs, and 'load upper immediate' as alu function.
- We also ignore the last signal as it is used only by branch instructions.
-
Next cycle. -
For the 'add immediate' instruction we can check the address of the source register 1, and destination register, the immediate, and the following flags of the Control Unit from top to bottom:
- The register file write enable which is enabled, since this instruction will eventually write to a register.
- The write data select signal, which selects the alu result, which in this case will just contain the sum of the immediate and the content of source register 1.
- These three signals again select the first source register and the immediate as alu inputs and 'add' as alu function.
-
Next cycle. -
For the 'shift right arithmetic' immediate instruction, the analysis is mostly similar to the last instruction, except for the alu function, which is 'shift right arithmetic'.
-
Next cycle. -
For the subtraction instruction we can check the address of the source registers, and destination register. The analysis of the control unit is the same as the shift right immediate instruction, except the second alu input is selected to be the content of source register 2, instead of immediate, and the alu function is 'subtract'.
-
Next cycle. -
For the 'branch less than' instruction we can check the address of the source registers, the immediate, and the following flags of the Control Unit from top to bottom:
- The register file write enable which is disabled, since this instruction will not eventually write to a register, or do anything really in the memory and write back stages.
- We ignore the write data select and the three memory signals for this very reason.
- We ignore the do jump signal, but we check the do branch signal being enabled.
- The next three signals select the program counter and the immediate as alu inputs, and 'add' as alu function, as the alu will contain the destination address as a relative offset, from the current value of program counter.
- The last signal selects 'less than' as branch comparison.
-
Next cycle. -
For the load instruction we can check the address of the source and destination registers, the immediate, and the following flags of the Control Unit from top to bottom:
- The register file write enable, which is enabled since this instruction will eventually write to a register.
- The write data select which is set to the memory output, as the value to be written to the register file will be the one retrieved from memory.
- The next three signals instruct the memory not to write, but to read an entire word from memory.
- We ignore the do jump and do branch flag.
- The next three signals select the first source register and the immediate as alu inputs, and add as alu function, as the alu will contain the memory load address as a relative offset, from the value of the source register.
- We also ignore the last signal as it is used only by branch instructions.
-
Next cycle. -
For the store instruction the analysis is similar to the load, except the memory is instructed to write a word, instead of reading one.
-
Next cycle. -
Finally for the 'jump and link' instruction we can check the destination register and the immediate, and the following flags of the Control Unit from top to bottom:
- The register file write enable, which is enabled, since this instruction will eventually write to a register.
- The write data select which is set to the Program counter plus four signal, as that is the value to be saved to the register file when a jump is done. The reasons have to do with how calls in operating systems are implemented, but this is a matter for another course.
- We ignore all the next signals except the do jump flag, which is lit, and the three flags that select the program counter and immediate as alu inputs, and 'add' as alu function, as this is how the jump address is calculated, as an offset from the program counter.
This was all of the decode cycle. Onto the next stages, we are going to go faster as most of the functionality has already been decided.
-
-
The execution stage is where all actual calculations take place. This means either alu functions or branch comparisons. Note that here is also where the Program counter select signal is calculated, for controlling the program counter mux in the fetch stage. Now for each instruction we will see what these operations mean. Mind that these are not the only ones, but there is at least one representative of each major category.
-
The 'load upper immediate' instruction uses the alu, which has as inputs zero and the already shifted immediate. The alu function is 'load upper immediate', which simply passes the second alu operator, the immediate, to the output. That is the value that will be written to the destination register, x15 in this case.
-
Next cycle. -
The 'add immediate' instruction also uses the alu, with inputs the content of source register 1, which is x8 in this case, and the immediate, in this case 4. The alu function sums the two, and the result is the value being written to the destination register, x2 in this case.
-
Next cycle. -
The 'shift right arithmetic immediate' instruction uses the execution stage, just like the 'add immediate' instruction, besides the alu function which is 'arithmetic shift right', which shifts right with most significant bit extension, thus maintaining the sign of the operand.
-
Next cycle. -
The subtraction instruction uses the alu, with inputs the two source registers, x2 and x3 in this case. The alu function is 'subtract' and the result is the value being written to the destination register, x1 in this case.
-
Next cycle. -
The 'branch less than' instruction uses both the alu, the branch comparison chip and the program counter select circuit. The alu inputs are the prorgram counter and the immediate, which represent respectively the value of the program counter, from when the branch instruction was in the fetch stage, which is 0 ex 10 instead of the current 0 ex 18, and the offset from it that the branch is targeting, that is -12, or three instructions back. Note that saying "the value of the program counter, from when the instruction was in the branch stage" is the same as saying "the instruction address". The alu function is 'add' and the result is the branch destination address. The branch comparison chip takes as inputs the comparison type, in this case 'less than', and the values of the two source registers. Its result is a flag that signals whether the comparison is satisfied or not. The name is 'branch taken' because when the comparison is satisfied the branch is considered taken, and the mux selector in the fetch stage must select the destination address. Let us then study up close the program counter select circuit, which is made of one 'and' and one 'or' gate. If we notice the names of the signals, we see that the resulting flag, the program counter select circuit, is activated only if, case 1, the instruction is a jump instruction, or, case 2, if it is a branch instruction with at the same time the branch being taken. If you initialized no registers at the beginning of the simulation, the branch should not be taken right now, as the values of both the source registers in input to the branch comparison chip are zero, thus the first one, x12, is not less than the second one, x10. Just for fun we can rewind by one cycle the simulation, and create a case in which the branch is taken, for example by setting the x10 register to the value 2. Now, back at cycle 6 we can see that the program counter selector is activated, and the next program counter value is going to be the current alu result, which is 4. Now go back again one cycle and reset x10 to zero, so we can see the next instructions passing by the stage.
-
Next two cycles. -
Both the load and store instructions use the alu only, with inputs the content of the first source register and an immediate. The function is 'add' and the result is the address of the data memory that they target, just one for reading from it and the other for writing to it.
-
Next cycle. -
Finally the 'jump and link' instruction uses the alu much like the branch instruction, therefore calculating the sum of program counter and an offset, but bypasses the branch taken signal, activating the program counter select flag, in order to jump with no conditions.
This was the execution stage.
-
-
The memory stage is only used by the load and store instructions, which are the only ones to interact with the data memory. You recognize the data memory not being used when the value of the memory operation flag is 'nop'. With the load instruction instead its value is 'load word', and with the store instruction it is 'store word'. For both we can use the Memory tab, to check that memory is being used correctly: in the case of the load, we can just check the value of the memory cell corresponding to the address, in this case 2, whereas for the store, we have to check that the input data, for this case zero, is written to the memory cell of the input address, in this case 8, but at the next cycle. This is due to the fact that, as any register based circuit, any memory shows its output one cycle after having received the input and the write flag.
-
Finally we come to the write back stage, which is only used by the instructions that write one of the registers, which we can recognize by the activated register file write signal. The exact register address is given by the write index signal.
- The 'load upper immediate' instruction has the signal lit, so it writes the alu result, which in this case represents the shifted immediate. The write address is in this case 15.
- In the next three cycles, the 'add immediate' instruction, the 'shift right arithmetic immediate' instruction and the subtraction instruction all do the same as the 'load upper immediate' instruction, with the alu result being the result of the respective operations.
Next cycle.- The branch doesn't do anything in the write back stage, and we can see the disabled register file write signal.
Next cycle.- The load instruction writes to the x11 register the output of the data memory.
Next cycle.- The store doesn't do anything in the write back stage, and we can see the disabled register file write signal.
Next cycle.- Finally the 'jump and link' instruction writes the program counter plus four signal, to the register 15.
With this the overview of the impact of each instruction of each stage is concluded.
Finally, let us see how to graph the execution of the program we just followed. Ripes does it for us with this button, but we have to know how to read the diagram.
- Rewind the whole program. If we press the pipeline diagram button now, we can verify that the pipeline, at cycle zero, only contains the load upper immediate instruction, in the fetch stage.
- If we do the same at the next cycle, we can verify that the pipeline, at cycle one, contains the load upper immediate instruction, in the decode stage, and the add immediate instruction, in the fetch stage.
- If we do that one last time at the next cycle, we can verify that at cycle two, the load upper immediate instruction is in the execution stage and the other two instructions are in the two following stages.
The diagram is constructed like this for all following stages.