Grouping the Components of the Single-Cycle Processor
This part of the lesson deals with re-arranging the components of the single cycle processor, together with all the modifications to the existing signals that we have to make in order to maintain the same exact functionality and avoid runtime execution errors.
Recap of the the computation flow of a processor
Let us build the actual pipelined processor. As we anticipated during the last part of the lesson, the point is to divide the components of the processor by function, and to divide the clusters with registers. Let us then start with a quick high level recap of the flow of the instruction in the processor.
-
The processor calculates the new program counter value, gives it as address input to the instruction memory, and retrieves the corresponding instruction. We will call this function "Instruction Fetch", or 'IF'.
-
The processor decodes the value of all its fields, which includes retrieving the values of registers from the register file. We will call this function "Instruction Decode", or 'ID'.
-
The processor selects the correct inputs and the correct alu function, to make the calculation associated to the current instruction. The branch instructions will also use the Branch Comparison chip. We will call this function "Execution", or 'EX'.
-
If applicable to the current instruction, the processor interacts with the data memory to either read it or write it (think the load and store instructions). We will call this function "Memory", or 'MEM'.
-
Again if applicable, the processor writes the results of the Execution or of the Memory operations to the register file. We will call this function "Write Back", or 'WB'.
Building the pipeline piece by piece
Now let us build each stage, that is each of the new "areas" of the processor, constituted by the components grouped by function and divided by registers. In doing so, we can compile the table below, which keeps track of all components and connections for each stage.
| Stage | Which component are in this stage? | Which signals come from the following stages? | Which signals go towards previous stages? | Which signals go into the interstage register towards the next stage? |
|---|---|---|---|---|
| IF | ||||
| ID | ||||
| EX | ||||
| MEM | ||||
| WB |
Let us follow the same points we layed out in the recap above.
-
IF Stage: The first area contains the components that perform the "Instruction Fetch" function, which means the Program counter register, the mux that selects its next value, the plus four adder and of course the instruction memory. The other chips, like the decode chip, are part of the next stages as they decode the instruction, not just fetch it. Now that we have the components, we ought to recognize which signals go into the interstage register towards the next stage, which will be all the signals that are needed in the following stages of the processor, and which signals come from one of those following stages, which are usually needed when a signal that controls one of the components in a given stage, is calculated in one of the following stages. Since this is the first stage there will be no signal going towards previous stages, but those will be needed in the following stages when sending back, the signals that are calculated after they are needed. This will be more clear as we go along.
-
Of the outputs of the components of this stage, all three are needed in the following chips: the instruction signal is needed in the decode chip, the program counter signal in one of the alu operator muxes, the program counter plus four signal in the mux that selects the register file write data. They therefore all go into the interstage register, which since they are all 32 bit signals will be 96 bits wide.
-
Of the inputs of the components only one is not covered by neither the components themselves, nor signals coming from the previous stage, that is the input of the mux that selects the next program counter value linked to the alu result. For example, a branch instruction with its branch taken, needs the alu to calculate its branch address, to feed the program counter. This is an example of a signal that is calculated in a later stage and sent back to a previous one.
-
-
ID Stage: The second area contains the components that perform the "instruction decode" function, which means firstly extracting opcode and source register indexes, therefore we need the decode chip, secondly the value of said registers, so we need the whole register file, and finally the immediates if any, so we need the immediate chip.
-
Of the outputs of these components, all three are needed in the following chips: the two values of the source registers are needed as inputs of the alu, and of the branch comparison chip, as well as data memory for the source register 2, while the immediate is only needed as alu input. Note that since the program counter signal is not used here, it has just to be passed through towards the next stage. The same goes with the program counter plus four signal.
-
Of the inputs of the components, again only one is not covered by the components themselves, which is the register file data in, calculated all the way back at the last mux. This is calculated at the end of all other operations, because many signals can be written in a register: arithmetic operations instructions write the alu result, loads write the memory output, jumps write the program counter plus four signal. There is however one observation that has to be done: when writing the register file, the data in signal and the write index signal must refer to the same instruction. This is not an issue in the single cycle processor, as the whole processor is processing a single instruction at any given time, but what we are doing is dividing the processor into stages, so that each stage can contain an instruction, and therefore process multiple instructions at the same time. That means that being the register file and the mux that selects the data in in two different stages, when writing the register file the instruction that will be in the same stage as the register file will be different than the one that the data in signal refers to. How to make sure that the write index signal will also refers to that same instruction, and not to the one currently in the decode stage? The solution is cutting the write index sigal and pass it on to the interstage register, towards the next stage, and pass it through all the stages until the same stage where the data in signal is calculated, to then be 'passed back' together with it. This ensures that the two signals refer to the same instruction, when writing the register. This also means that although we are placing the register file in this decode stage, functionally it is also part of the same stage that calculates the data in signal, which we have still to build. We can say that the register file chip is part of the decode stage when reading from it, and part of the other stage when writing to it.
-
-
EX Stage: The third area contains the components that perform the "execution function", that is the alu and the two muxes that select its inputs and the branch comparison chip.
-
Of the outputs of these components, the alu result and the value of source register two will go in the interstage register, towards the next stage, together with the write index signal and the program counter plus four signal, that are not used here. The alu result is also passed back to the fetch stage, where it is used as a possible next program counter value, as we already said.
-
Of the inputs of the components, all are covered by the components themselves or by signals passed from the previous stage, therefore no external ones are needed.
-
-
MEM Stage: The fourth area contains the components that perform the 'memory function', which is just the data memory.
-
Of its outputs, the only one, the data out signal, is passed to the next stage, together with the alu result signal, that other than being used here as the address input of the data memory, is also used as input of the register file write data selector mux, and together with the write index signal and the program counter plus four signal, that are again not used here.
-
Of its inputs, both are covered by the signals coming from the previous stage, so no external ones are needed.
-
-
WB Stage: Finally we get to the last stage, consisting of the only remaining component, the register file write data selector mux, that simply passes its output and the write index to the decode stage to perform the write operation. Remember that functionally the register file is also part of this stage, as its write function is controlled from here.
We now have a rudimentary pipelined processor. In the next part of the lesson, we will briefly see how each type of instruction activates each of its stages.
As an assignment you will be asked to load the same processor on Ripes but with the Extended layout, notice all the chips and signals that were previously hidden, and place them in the correct spot in the various stages, completing the table.
Wrap Up Exercises
- Where in the groups we just defined are placed the components that were hidden in the standard visualization of the processor but that are shown in the extended visualization? Complete the table above with all the new components and signals.
If needed, here you can find a sketchable screenshot of the extended single cycle processor.
- Synthesize a circuit in SHEAS that receives in input any RV32I instruction in hex form and outputs a single bit flag that indicates whether or not the input is a jump instruction.
Please write your answers on a document, to then submit as PDF in the "Assignments" section. Feel free to add images to your answers, if it helps to explain something more concisely.