DEMO - Simple Meaningless Program

This part of the lesson will deal with the execution of a meaningless program. Since it is meaningless, don't try to understand what it does, focus on how instructions communicate with one another when they are together inside the pipeline.

Program

  • Ripes Processor: 5-stage processor (w/o forwarding or hazard detection) Extended
  • Registers to Initialize:
    • a0 = 0x5
  • Program:
sw a0 0xbc(x0)   # M[0xbc]=a0
li a1 0x7        # -> a1 = 7
li a2 0x3        # -> a2 = 3
lw a3 0xbc(x0)   # a3=M[0xbc]
LOOP:
    add a4 a0 a1     # -> a4=a0+a1
    sub a5 a1 a2     # -> a5=a1-a2
    add a4 a3 a3     # -> a4=a3+a3
    slli a3 a3 1     # -> a3=a3<<1
    bne a5 x0 LOOP    # Branch if a5 is different from zero

Video Transcript

The first program actually does nothing meaningful, which helps us focus on how it flows through the processor. We will comment on each cycle of the execution. I suggest pausing the video whenever you feel like you are ready to analyse the next step yourself, and then let the video continue to verify your work. For the Fetch stage we will focus on the choice of the next PC value, for the Decode stage we will focus on the flags that influence the behaviour of the other stages, for the execution stage we will focus on abstracting the alu function, in relation to the instruction function, while in the last two stages we will focus on the completion of the processing of each instruction.

Side note, it is very important that you know hex values and register ABI names by heart, as they will be used interchangeably for naming registers. For example, often values in the visualisations will be in hex, while the register file on the right will have both decimal and ABI representations, or again the assembly code will use ABI register names. Make sure to know them well, not to get confused.

First of all:

  • Load the 5 stage processor without forwarding or hazard detection with the extended layout.
  • Initialise register a0 to the value 5.
  • Copy and paste the program below in Ripes' Editor tab.

Let us first understand what the program does, even if meaningless. The first instruction loads the value of a0 into the memory cell with address "bc". The second and the third instructions load 7 and 3 in a1 and a2 respectively. The fourth loads the content of the memory cell "bc" into a3. At this point we will expect a3 to hold the same value as a0. The next four instructions are simple arithmetic operations. At the end of each we should see respectively a4 equal to 12, a5 equal to 4, a4 equal to 10, a3 equal to 10, because shifting left by 1 position is equivalent to multiplying by 2. The last instruction branches back to the label if a5 is different from zero. Since neither a1 or a2 are being written and a5 will remain their sum, we expect the branch to continue to be taken forever, while both a3 and a4 continue to be doubled forever at each loop.

Let us now comment the execution, cycle by cycle.

Cycle Zero

The first instruction is in the Fetch Stage. We can see that the instruction in hexadecimal coming out of the instruction memory, matches the one of the first instruction. This can be done with all following instructions. As we can see from this mux, the next program counter value will be the current one plus four, which means that the instruction immediately following the store will be loaded, in this case the load immediate, or addi. This will not always be the case, we will see what happens when the branch reaches the ex stage, which is where this mux that chooses the next program counter is controlled from.

Cycle One

For the Fetch stage the same considerations apply as before. At the same time the Decode stage now contains the store instruction. I remind you that not all the signals in this stage are used by each instruction. For example the store instruction does not have a destination register field, as it uses that same bit range to encode part of the immediate, but we can see some value here nonetheless. The processor ignores the values that have nothing to do with the current instruction, by just not using them. Therefore, we will ignore them too, but you have to know the Instruction Set, to know what to ignore and what to check. For the store instruction, we can verify that the source register 1, which is x0, and the source register 2, which is x10, are indeed retrieved from the register file. At the beginning of the program we expect x10 to contain the value 5, and we can verify it here. Also the immediate output corresponds to the expected value of 188. We can also verify that the register file write enable is not activated, as the store instruction does not have to write to the register file, later in the write back stage. On the other hand, the data memory write enable is active, as the store instruction will write to data memory in the memory stage. The other lit signal is the second operator selector for the alu, which will be used in the next cycle to select the immediate. Finally, one of the signals that is always a good measure to check is the alu control, which now has the value "add". The store instruction uses the addition function of the alu, because it calculates the address for writing the data memory, by adding an offset, in this case 188, to the value of a register, in this case x0.

Cycle Two

The execution stage now contains the store instruction and, as expected, the alu is summing the immediate 188 with the content of x0, which is always zero, resulting in 188. This value will be used to address the data memory during the next cycle.

The decode stage contains the load immediate instruction: as before we have to check only the relevant signals. In this case the register file write enable, as this instruction will write 7 in register x11 in the write back stage. The immediate 7, and the flag that allows the alu to use it, as well as the alu control flag, which is once again set on "add". Finally, the write back select signal is set on alu result. That is because in the write back stage the value that has to be written, in this case to x11, is the result of the sum of 0 and the immediate 7.

As an example of why we ought to ignore the non relevant signals, the register 2 index signal contains the value 7. This has nothing to do with the immediate, just with its encoding. That very same subset of bits, that in other instructions decodes to the source register 2 index field, is used by the addi instruction to encode the immediate 7.

Cycle Three

The store instruction is now in the Memory stage, and it will be here that it will complete its execution. It will do nothing in the next stage, unlike most other instructions. Here we can see the memory write enable being used, after being calculated in the decode stage two cycles ago. From the decode stage also comes the data in, which is just the content of register 2 and we can confirm it is the value 5, while as already said the address comes from the execution stage, as it is just the alu result.

However, if we look in the memory tab for the memory cell 0xbc, we see that it does not contain the value 5 just yet. This is because, as for any register and therefore any memory, it will show its output at the next cycle.

We can verify in the execution stage that the alu is indeed summing the immediate, with the value zero.

We are not going to repeat anything about the fetch stage, as it contains the very same type of instruction as last cycle. You can use it as exercise also for the next three cycles.

Cycle Four

Here in the write back stage the instruction does nothing, but we can now verify in the Memory tab that it has actually written the value 5 into memory cell 0xbc.

The instruction in the memory stage also does nothing, as the addi instruction has nothing to do with the data memory.

We are not going to repeat anything about the execution stage, as it contains the very same type of instruction as last cycle. You can use it as exercise.

Finally, the decode stage contains the load: we can see that it will write to the register file in a few cycles, that it uses the immediate 0xbc in the next stage and that it reads from data memory.

Cycle Five

Here in the write back stage the add immediate instruction finally completes its processing, by selecting the alu result, that two cycles ago was calculated as the sum of 0 and 7, and writing it to the register file at index 0xb. Mind that at the same time another instruction is in the decode stage, and is doing some other thing in parallel. Since it is an add instruction it is reading two registers from the register file. This instruction is the first of a streak of 4 arithmetic operations, two examples of which you have already seen, so we will focus less on the details from now on.

Analysing the assembly, we found that we expect this add instruction to retrieve the register a0 with value 5, and the register a1 with value 7. However, on the right we can see that the register a1 is still zero, since the value is being written in this cycle and will be visible from the next. Fortunately, the implementation of the register file of this specific processor, allows for reading a register at the same time of it being written, so with input 0xb, which corresponds to a1, we can still see the value 7. But what if this was not the case or worse, what if the previous instruction, the load, needed the value of a1? The value was not ready at the previous cycle. We will cover this case in the next part of the lesson, so remember it.

So the add instruction retrieves the correct values of 5 and 7, and will write their sum in register 14 in the write back stage.

Cycle Six

The next sub instruction needs the values of registers a1 and a2 being 7 and 3 respectively. Again while a1 is ready from the previous cycle, a2 is being written to the register file right now, as you can see, but thanks to the implementation of this specific register file, it can still be used by the sub instruction.

Cycle Seven

The same happens now, when the load instruction is writing 5 to the register a3 from the write back stage,and the instruction in the decode stage needs to use it.

Cycle Eight

None of the add and sub instructions do anything in the memory stage, and they complete their execution in the write back stage. Now, the first instruction of the streak is here in the write back, and all the others will complete here in the next cycles in the same way.

Cycle Nine

Now the branch instruction is in the decode stage and needs the value of a5, which again is being written at the same time with the value 4. Being the last instruction of the program, the fetch stage remains empty. For confirmation we can see the program counter value in the fetch stage being 24, that is the address of the instruction just after the branch, which is empty. If you manage to sneak a malicious instruction there, congratulations, you have created a very rudimentary virus.

Anyway let us focus on the decoding of the branch instruction. The three lit flags are the do branch flag, the flag that switches the first alu input to the program counter, and the flag that switches the second alu input to the immediate. The alu operation is again "Add", as the branch instruction in the next cycle will sum the program counter and the negative immediate, to obtain the address of the instruction to jump backwards to.

Remember that the difference between jump and branch is that the jump requires no condition, while the branch does.

Cycle Ten

The first two stages are empty. The next instruction to fetch is being calculated in this cycle in the execution stage, and depends on whether the branch is taken. You can see the 'branch taken' flag being activated, and ultimately selecting the result of the alu as next program counter, back in the fetch stage. The new value of the program counter is the branch instruction address minus 16, which is the distance in bytes between the branch instruction and the targeted label. Note that the address on any instruction is the same value of the program counter, taken from when the instruction was in the decode stage: they are the same thing. The branch instruction will do nothing in the next two stages. At the next cycle the instruction whose address is the new program counter value will be retrieved. In this case, it is the first instruction of the loop.

From now on the execution will continue in the same way indefinitely.