DEMO - Program Towards Hazards

This part of the lesson will deal with the execution of an actual meaningful program: try to always keep in mind what the instruction is doing and what configuration it expects when it starts executing.

Program

  • Ripes Processor: 5-stage processor (w/o forwarding or hazard detection) Extended

  • Registers to Initialize:

    • t1 = 0x123
    • t2 = 0xbc
    • a0 = 0x00000fff
  • Program:

sw t1 0(t2)         # M[0xbc] = 0x123
LOOP:
   lw a1 0(t2)      # a1 = M[0xbc] 
   add a2 a1 a1     # a2 = a1+a1
   sw a2 0(t2)      # M[0xbc] = a2
   blt a2 a0 LOOP   # jump if a2<a0

Video Transcript

Now onto the next demo:

  • Load the 5 stage processor without forwarding or hazard detection with the extended layout. This is the same processor as the last demo.
  • Initialise register t1 to the value 0 ex 123, t2 to the value 0xbc, a0 to the value 0x00000fff.
  • Copy and paste the program below in Ripes' Editor tab.

Let us first understand what the program does. The first instruction loads the value of t1 into the memory cell with address "bc", which unlike the last program is addressed with a register instead of with an offset. The second instruction loads such value into the register a1. The next instruction sets a2 to two times a1, and then the program stores this value in the same memory cell 'bc'. Then the program jumps to the start of the loop if a2 is less than a0. Basically, the program doubles the value inside a2 at each loop execution, until it becomes greater than a0. By changing the initialization of a0 we can control how long the program will run (bigger a0 means it runs for more cycles).

Onto the program, let us focus mainly on the decode stage.

Cycle zero to cycle two

In the decode stage, both the first and second instructions need registers which were initialised, x6 and x7, or t1 and t2. We can verify their values. Let us also keep in mind that the load instruction will write register x11 in the write back stage, which means in cycle five.

Cycle three

From the analysis of the assembly code, we saw that the add instruction, which is now in the decode stage, uses the value of a1, or x11, that is supposed to be set by the load instruction as 0 ex 123. However, we can see that the value being retrieved for the address 0x0b, which is x11, is zero. That is because, as we were saying for the previous cycle, the load instruction will write the a1 register in cycle five, that is in two cycles. By then the add instruction will be in the memory stage, and cannot use the alu anymore to calculate its sum.

Towards Hazards

Imagine being the engineers that invented the pipeline. You just subdivided the processor execution into 5 stages, each that executes faster than the full processor. This lets you process more instructions in the same amount of time, because the execution of each happens in parallel to that of four other instructions, at any given time. This also means that each instruction begins its execution, when two or three other instructions are still not completely executed, which is the same as to say that any register that is modified by a given instruction, is not actually modified until at least three cycles after the instruction starts execution. So what if, as in this case with the register a1, the instruction that immediately follows the load needs the loaded value? More in general, given any instruction that modifies the value of a register, what if an instruction immediately after needs to use the new value of that register? If the problem propagates, it means that execution will contain errors and it cannot be continued.

In cycle four we see that the result of the alu is zero and not the expected 0 ex 246, because the values of the registers were retrieved as zero instead of the expected 0 ex 123. This means that the value that will be written to the a2 register will be zero, which is wrong.

Therefore no, the execution cannot be continued as the current implementation of the pipeline leads to errors.

How to solve problems like this will be the topic of the next lesson, and as assignment you will be asked to analyse another assembly program, which contains a slightly different type of error, and write a small report detailing how you found it, much in the same way in which we have commented on the last two programs.

So, you will also be asked to speculate a solution, based on the following part of the lesson (and, if you want some spoilers, by trying to understand how the other three pipelined processors available in Ripes, solve the problem. Mind that this is not strictly necessary, as those processors will be the main topic of the next lesson) .

Back to our problem, hopefully you have now understood the need for some tweaks to the current pipeline hardware. Let us then try to understand intuitively how this can be solved, which again will be the topic of the next lesson.

In this particular case, the load instruction is writing back to the register file the value 0 ex 123 in cycle five, while the following instruction, the add, retrieves it in cycle three. This kind of occurrence is called a hazard (or more precisely a dependence, which is a possible hazard). There are two ways to make sure that the add can retrieve the correct value.

The first way is to make sure that the add instruction is in the decode stage at cycle five. This implies stalling it there for two cycles, by disabling the IDEX interstage register, while the load continues executing. This also implies that all instructions after the add instruction will be stalled, as if in a cue. Naturally, there would be the need of a chip that detects such occurrences, called a Hazard Detection Unit.

Another way is to pinpoint where exactly the value that the add instruction needs is ready, or where it is really needed. In our case, the value is ready as soon as it is retrieved for memory, in the memory stage, which is one cycle before the original case. At the same time, the value is retrieved in the decode stage but it is not needed there, it is only needed for the alu in the execution stage, that is one cycle after the original case. It is possible to forward the needed value from a point of the pipeline to another, in order to minimise the stall cycles of an instruction that is waiting for a value to be ready. This means in our case, forwarding the value 0 ex 123 for example from the write back stage directly to the execution stage, so that the add instruction must only wait for one cycles instead of two. Again, there would be the need of a chip that handles such occurrences, called a Forwarding Unit.

In the next lesson we will see exactly how to generalise this.