Learn by Fixing: Another Verilog CPU

Because I often work with students, I’m always on the lookout for a simple CPU, preferably in Verilog, in the Goldilocks zone. That is, not too simple and not too hard. I had high hopes for this 16-bit RISC processor presented by [fpga4student], but without some extra work, it probably isn’t usable for its intended purpose.

The CPU itself is pretty simple and fits on a fairly long web page. However, the details about it are a bit sparse. This isn’t always a bad thing. You can give students too much help. Then again, you can also give too little. What was worse, though, is that one of the modules needed to get it to work was missing! You might argue it was an exercise left to the reader, but it probably should have been explained that way.

At first, I was ready to delete the bookmark and move on. Then I decided that the process of fixing this design and doing a little analysis on it might actually be more instructive than just studying a fully working design. So I decided to share my fix with you and look inside the design a bit more. On top of that, I’ll show you how to get the thing to run in an online simulator so you can experiment with no software installation. Of course, if you are comfortable with a Verilog toolchain (like the ones from Xilinx or Altera, or even free ones like Icarus or CVer) you should have no problem making that work, either. This time I’ll focus on how the CPU works and next time I’ll show you how to simulate it with some free tools.

The Design

Let’s start with a block diagram of the CPU. It isn’t much different from other RISC architectures, especially those that don’t use a pipeline. A program counter (PC) drives the instruction memory. There’s a dedicated adder to add four to the PC for each instruction since each instruction is four bytes. A mux lets you load the PC with the address of the next instruction or with a jump target (actually, an absolute jump, a computed branch, or a return address). There’s another dedicated adder for the computed branches.
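The PC logic described above can be sketched in a few lines of Verilog. This is not the code from the original post: the module name, port names, the two-bit select encoding, and the choice to make branches relative to the following instruction are all assumptions made for illustration. The step size is a parameter set to the instruction size given in the text.

```verilog
// Illustrative sketch only; names and the pc_src encoding are assumptions.
module next_pc #(
    parameter WIDTH       = 16, // assumed PC width
    parameter INSTR_BYTES = 4   // bytes per instruction, as stated in the text
) (
    input  wire [WIDTH-1:0] pc,            // current program counter
    input  wire [WIDTH-1:0] branch_offset, // byte offset for a computed branch
    input  wire [WIDTH-1:0] jump_target,   // absolute jump address
    input  wire [WIDTH-1:0] return_addr,   // return address
    input  wire [1:0]       pc_src,        // hypothetical select encoding
    output reg  [WIDTH-1:0] pc_next
);
    // First dedicated adder: address of the following instruction.
    wire [WIDTH-1:0] pc_plus = pc + INSTR_BYTES;

    // Second dedicated adder: computed branch target
    // (assumed to be relative to the following instruction).
    wire [WIDTH-1:0] pc_branch = pc_plus + branch_offset;

    // The mux that picks what gets loaded into the PC.
    always @(*) begin
        case (pc_src)
            2'b00:   pc_next = pc_plus;
            2'b01:   pc_next = pc_branch;
            2'b10:   pc_next = jump_target;
            default: pc_next = return_addr;
        endcase
    end
endmodule
```

Because the block is purely combinational, the PC register itself would simply capture pc_next on each clock edge.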

The processing happens in an arithmetic logic unit (ALU) that performs various operations. The destination can be main memory or one of the registers. The register file uses an old trick to avoid a common problem. Suppose you can read only one register per cycle. If you only allow one register in an instruction, that’s fine. But if you allow an instruction to do something like add two registers, you’ll have trouble reading both of them unless you stretch out the instruction time. That’s why the register file has two output ports.

In fact, the register file is at least one place where the design would not synthesize to real hardware as well as it could. For one thing, there’s a for loop in the initial block to zero the registers. Most synthesis tools will just throw that away. You’d be better off with a reset signal. The other potential problem depends on exactly which FPGA you target and which tools you use.
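As a rough idea of what a reset-based version might look like, here is a minimal two-read-port register file that clears itself on a synchronous reset instead of in an initial block. It is a sketch, not the original module: the name regfile_reset, the port names, the eight-register depth, and the active-high synchronous reset are all assumptions.

```verilog
// Illustrative sketch only; names, sizes, and reset style are assumptions.
module regfile_reset #(
    parameter WIDTH = 16, // assumed register width
    parameter ADDR  = 3   // assumed address width (8 registers)
) (
    input  wire             clk,
    input  wire             rst,    // synchronous, active-high reset
    input  wire             we,     // write enable
    input  wire [ADDR-1:0]  waddr,
    input  wire [WIDTH-1:0] wdata,
    input  wire [ADDR-1:0]  raddr1, // first read port
    output wire [WIDTH-1:0] rdata1,
    input  wire [ADDR-1:0]  raddr2, // second read port
    output wire [WIDTH-1:0] rdata2
);
    reg [WIDTH-1:0] regs [0:(1<<ADDR)-1];
    integer i;

    always @(posedge clk) begin
        if (rst) begin
            // The loop unrolls into one reset per register.
            for (i = 0; i < (1<<ADDR); i = i + 1)
                regs[i] <= {WIDTH{1'b0}};
        end else if (we) begin
            regs[waddr] <= wdata;
        end
    end

    // Both read ports are combinational views of the same storage.
    assign rdata1 = regs[raddr1];
    assign rdata2 = regs[raddr2];
endmodule
```

One trade-off to keep in mind: clearing every entry at once typically pushes the tools toward plain flip-flops rather than dedicated RAM, which may or may not matter for a file this small.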

The designer provides two read ports to the registers, but the underlying storage is the same. This would make it hard to use specialized RAM cells if they were available. Another common approach is to simply use two separate register banks, one for each read port. A write sends data to both banks so from the outside you can’t tell the difference. Frequently, though, this will result in a faster and more compact design.

It would be interesting (and not very difficult) to rewrite the register file to do this. However, if you aren’t going to build down to hardware you probably won’t notice any difference.
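A minimal sketch of that rewrite might look like the following. The module name regfile_2bank, the port names, and the sizes are assumptions, and there is deliberately no reset here so that each bank is a plain one-write, one-read memory. Whether a given tool actually maps it onto RAM cells depends on the FPGA and the tool, and some RAM blocks require a registered read, which this sketch does not use.

```verilog
// Illustrative sketch only; names and sizes are assumptions.
module regfile_2bank #(
    parameter WIDTH = 16, // assumed register width
    parameter ADDR  = 3   // assumed address width (8 registers)
) (
    input  wire             clk,
    input  wire             we,     // write enable
    input  wire [ADDR-1:0]  waddr,
    input  wire [WIDTH-1:0] wdata,
    input  wire [ADDR-1:0]  raddr1, // read port served by bank A
    output wire [WIDTH-1:0] rdata1,
    input  wire [ADDR-1:0]  raddr2, // read port served by bank B
    output wire [WIDTH-1:0] rdata2
);
    // Two identical copies of the register contents.
    reg [WIDTH-1:0] bank_a [0:(1<<ADDR)-1];
    reg [WIDTH-1:0] bank_b [0:(1<<ADDR)-1];

    // Every write goes to both banks, so they never disagree.
    always @(posedge clk) begin
        if (we) begin
            bank_a[waddr] <= wdata;
            bank_b[waddr] <= wdata;
        end
    end

    // Each bank has exactly one read port.
    assign rdata1 = bank_a[raddr1];
    assign rdata2 = bank_b[raddr2];
endmodule
```

In simulation the registers start out undefined until they are written, so a test program would need to load each register before reading it.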

Like most similar CPUs, the entire control works out to muxes selecting what data gets sent where. In particular, there are four muxes in the processor’s data path:

PCSrc – Routes the “next” PC value to the program counter

RegDst – Selects which register to write from two fields in the instruction (the diagram shows three inputs, but that appears to be an error)

BSrc – Selects the second argument to the ALU (either an immediate value or a register value)

WBSrc – The “write back” mux selects what data is sent back to the registers for writing
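Three of these muxes are simple two-way selectors, and a sketch makes that concrete. Everything here is an assumption for illustration: the module and port names, the three-bit register fields, and which select value picks which input. The polarities were chosen so that a select of 0 on b_src picks the register and a select of 0 on wb_src picks the ALU result.

```verilog
// Illustrative sketch only; names, widths, and polarities are assumptions.
module datapath_muxes (
    // RegDst: which instruction field names the register to write
    input  wire [2:0]  field_a,    // hypothetical candidate field
    input  wire [2:0]  field_b,    // hypothetical candidate field
    input  wire        reg_dst,
    output wire [2:0]  write_reg,
    // BSrc: second ALU argument
    input  wire [15:0] reg_data2,  // value from the second read port
    input  wire [15:0] immediate,  // immediate value from the instruction
    input  wire        b_src,
    output wire [15:0] alu_b,
    // WBSrc: data written back to the registers
    input  wire [15:0] alu_result,
    input  wire [15:0] mem_data,
    input  wire        wb_src,
    output wire [15:0] write_data
);
    assign write_reg  = reg_dst ? field_a   : field_b;
    assign alu_b      = b_src   ? immediate : reg_data2;
    assign write_data = wb_src  ? mem_data  : alu_result;
endmodule
```

The fourth mux, PCSrc, has more than two choices and so needs a wider select, but the idea is the same.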

Design Tables

The rest of the design shows the thirteen instructions, the five instruction formats, and the control signals required for each of the formats. The nuances of the instructions in each category depend on what the ALU is set to do. In other words, an add instruction and a subtract instruction are exactly the same except for what the ALU does. As you might imagine, the ALU takes two inputs and an operation code and produces an output.
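To show how little separates an add from a subtract, here is a small ALU sketch. It is not the ALU from the original post: the module name, the three-bit op encoding, the particular operations listed, and the zero output are all assumptions chosen for illustration.

```verilog
// Illustrative sketch only; the op encoding and operation list are assumptions.
module alu_sketch (
    input  wire [15:0] a,      // first argument
    input  wire [15:0] b,      // second argument
    input  wire [2:0]  op,     // hypothetical operation code
    output reg  [15:0] result,
    output wire        zero    // high when the result is zero
);
    always @(*) begin
        case (op)
            3'b000:  result = a + b; // add
            3'b001:  result = a - b; // subtract
            3'b010:  result = ~a;    // invert
            3'b011:  result = a & b; // and
            3'b100:  result = a | b; // or
            default: result = a + b; // unused codes fall back to add
        endcase
    end

    assign zero = (result == 16'd0);
endmodule
```

The rest of the data path is identical for every one of these operations. Only the op input changes.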

The original post doesn’t actually say which instructions are in which category, but it is pretty easy to puzzle out. The load and store instructions are in the memory access format. The branch on equal and branch on not equal instructions are in the branch category. The jump instruction has its own format. All the other instructions are “data processing.” One table shows a “hamming distance” opcode, but this doesn’t appear anywhere else, including in the code, so I suspect it is a cut and paste error.

The two tables do a good job of summarizing the operations needed to make the CPU work. There are nine unique control signals:

RegDst – This corresponds to the mux in the diagram of the same name and selects if the destination is a register (shows up as reg_dst in the code)

ALUSrc – Selects the source of the ALU argument (same as the BSrc mux in the diagram, and shows up as alu_src in the code)

MemtoReg – Active when a memory to register transfer occurs (mem_to_reg in the code)

RegWrite – Set when the write should go to a register (reg_write in the code)

MemRead – Set when a memory read is the source data for the instruction (mem_read in the code)

MemWrite – Set when memory is the write destination (mem_write in the code)

Branch – Active when a branch is in progress (combination of beq and bne signals in the code)

ALUOp – Combined with part of the instruction, selects the operation to perform in the ALU (alu_op in the code)

Jump – Active when a jump is in progress

The table corresponds directly to the Verilog in the control unit except for the name changes, which is unfortunate as it makes the table a little harder to follow. For example, here is the code for a data processing instruction with opcode 0010:

    4'b0010: // data_processing
    begin
        reg_dst = 1'b1;
        alu_src = 1'b0;
        mem_to_reg = 1'b0;
        reg_write = 1'b1;
        mem_read = 1'b0;
        mem_write = 1'b0;
        beq = 1'b0;
        bne = 1'b0;
        alu_op = 2'b00;
        jump = 1'b0;
    end

Compare that to the table in the original post and you’ll see it maps directly. In English, the instruction reads from two registers and writes back to the registers with an ALU operation code of 0, and it isn’t a jump or a branch.

Inexplicably, this block is repeated for all the data processing instructions even though it shouldn’t be necessary. Fortunately, for simulation, it won’t really matter, and most synthesis tools will figure it out and merge the identical code for you.
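If you wanted to tidy that up yourself, Verilog lets several case items share one block. The sketch below shows the idea. Only opcode 0010 comes from the code above. The other two opcodes are placeholders assumed to be data processing instructions, the module name is made up, and the other instruction formats are left out entirely.

```verilog
// Illustrative sketch only. Opcode 0010 is from the article; 0011 and 0100
// are assumed placeholders. Other instruction formats are omitted.
module control_sketch (
    input  wire [3:0] opcode,
    output reg        reg_dst, alu_src, mem_to_reg, reg_write,
    output reg        mem_read, mem_write, beq, bne, jump,
    output reg  [1:0] alu_op
);
    always @(*) begin
        // Defaults: write nothing, no branch, no jump.
        reg_dst    = 1'b0;
        alu_src    = 1'b0;
        mem_to_reg = 1'b0;
        reg_write  = 1'b0;
        mem_read   = 1'b0;
        mem_write  = 1'b0;
        beq        = 1'b0;
        bne        = 1'b0;
        alu_op     = 2'b00;
        jump       = 1'b0;

        case (opcode)
            4'b0010,           // data processing (from the article)
            4'b0011,           // assumed data processing opcode
            4'b0100: begin     // assumed data processing opcode
                reg_dst   = 1'b1;
                reg_write = 1'b1;
            end
            default: ;         // remaining formats not shown
        endcase
    end
endmodule
```

Setting defaults at the top of the block also means each case only has to list the signals that differ, which keeps every output assigned on every path.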

Next Time

In the next installment, I’ll show you how to load the design into one of my favorite quick design tools, EDA Playground. There was a missing file and some massaging necessary to get it to work with the online tool. However, the CPU does work as promised, once you figure out a few quirks. If you want a sneak peek at the simulation, you can check out the video below.
