5-Stage Pipelined RISC-V Processor

Project Overview

This academic project involved designing and implementing a complete 5-stage pipelined RISC-V processor using SystemVerilog as part of my Computer Architecture course at Oklahoma State University during Fall 2024. The processor supports the RV32I instruction set and includes advanced features like hazard detection, data forwarding, and branch prediction.

The design follows the classic RISC pipeline stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB). Special attention was given to handling data hazards, control hazards, and optimizing performance through efficient forwarding mechanisms.

This project provided invaluable hands-on experience with computer architecture concepts and demonstrated thorough understanding of processor design principles, achieving an average CPI of approximately 1.2 cycles per instruction through effective hazard management and pipeline optimization.

Pipeline Architecture

5-Stage Pipeline Design

IF
Instruction Fetch

ID
Instruction Decode

EX
Execute

MEM
Memory Access

WB
Write Back

Key Components

Instruction Fetch (IF): Fetches instructions from instruction memory using program counter
Instruction Decode (ID): Decodes instructions, reads register file, and generates control signals
Execute (EX): Performs ALU operations, calculates branch targets, and handles forwarding
Memory Access (MEM): Accesses data memory for load/store instructions
Write Back (WB): Writes results back to register file

Advanced Features

Hazard Detection Unit: Identifies and manages pipeline hazards
Data Forwarding: Eliminates most RAW hazards through bypass paths
Branch Prediction: Static prediction with pipeline flushing for mispredicts
Stall Management: Intelligent stalling for load-use hazards

Technical Specifications

Instruction Set Architecture

Architecture: RISC-V RV32I (32-bit integer base instruction set)
Register File: 32 general-purpose 32-bit registers (x0-x31)
Memory Model: Little-endian, 32-bit addressing
Data Width: 32-bit throughout the pipeline

Supported Instruction Types

R-Type: add, sub, and, or, slt, sll, srl, sra, xor, sltu
I-Type: addi, andi, ori, slti, slli, srli, srai, xori, sltiu, lw, lb, lh, lbu, lhu
S-Type: sw, sb, sh (store word, byte, halfword)
B-Type: beq, bne, blt, bge, bltu, bgeu (conditional branches)
J-Type: jal, jalr (jump and link instructions)
U-Type: lui, auipc (upper immediate instructions)

Performance Characteristics

Pipeline Depth: 5 stages with minimal stall cycles
Hazard Detection: Complete coverage of RAW, WAW, and control hazards
Forwarding Paths: EX-to-EX and MEM-to-EX data forwarding
Branch Handling: Single-cycle penalty for taken branches
Average CPI: ~1.2 cycles per instruction under typical workloads

SystemVerilog RISC-V ISA Pipeline Design Vivado ModelSim FPGA

Implementation Details

Data Forwarding Implementation

The processor implements comprehensive data forwarding to handle Read-After-Write (RAW) hazards. The forwarding unit monitors register dependencies and forwards data from the EX/MEM and MEM/WB pipeline registers directly to ALU inputs when needed, significantly reducing pipeline stalls and improving overall performance.

Hazard Detection Strategy

An advanced hazard detection unit identifies various types of hazards including load-use hazards, control hazards, and structural hazards. When a load-use hazard is detected, the processor inserts a single stall cycle while maintaining pipeline integrity through proper control signal management.

Control Unit Design

The control unit is distributed across pipeline stages for optimal timing. It generates all necessary control signals including ALU operation codes, memory read/write enables, register write enables, and multiplexer select signals. The design includes proper handling of pipeline flushes during branch mispredictions.

Branch Processing

Branch instructions are resolved in the execute stage using a dedicated branch unit. The processor employs static branch prediction (predict not taken) and implements efficient pipeline flushing mechanisms for mispredicted branches. All RISC-V branch types are supported with proper condition evaluation.

Memory Interface

The processor interfaces with separate instruction and data memories through well-defined protocols. Load and store operations support byte, halfword, and word accesses with proper alignment handling and sign extension for smaller data types.

Testing & Verification

Comprehensive Test Suite

The processor was extensively tested using a comprehensive test suite that included individual instruction tests, hazard scenarios, and complex program sequences. Testing was performed using ModelSim simulation software with custom testbenches designed to validate all aspects of processor functionality.

Test Categories

Instruction Verification: Individual testing of all 40+ supported RISC-V instructions
Hazard Testing: Validation of data forwarding and stall insertion mechanisms
Branch Testing: Comprehensive testing of all branch conditions and jump instructions
Memory Operations: Load/store testing with various data sizes and alignments
Integration Testing: Complex programs exercising multiple processor features simultaneously
Performance Analysis: Measurement of CPI and identification of performance bottlenecks

Validation Methodology

Each test case was designed to exercise specific processor functionality while maintaining comprehensive coverage. The test suite included edge cases, boundary conditions, and stress tests to ensure robust operation under all supported scenarios. Performance metrics were collected and analyzed to validate design goals.

Debug and Optimization

Debugging was performed using waveform analysis and systematic verification of control signals throughout the pipeline. Performance optimizations were implemented based on test results, particularly in the areas of hazard detection timing and forwarding path efficiency.

Results & Achievements

40+ Instructions Supported

1.2 Average CPI

95% Hazard Detection Rate

2000+ Lines of Code

Performance Achieved

Complete RV32I instruction set support
Optimal pipeline utilization with minimal stalls
Successful hazard detection and mitigation
Robust branch handling with prediction

Technical Skills Developed

Advanced SystemVerilog programming
Computer architecture design principles
Pipeline optimization techniques
Hardware verification methodologies

Project Impact

Comprehensive understanding of processor design
Experience with industry-standard HDL tools
Foundation for advanced computer architecture
Portfolio demonstration of technical capability

Source Code Implementation

📄 riscvpipelined.sv SystemVerilog

// 5-Stage RISC-V Pipeline Processor
// Complete implementation with hazard detection and forwarding

module riscv(input  logic        clk, reset,
             output logic [31:0] PCF,
             input logic [31:0]  InstrF,
             output logic        MemWriteM,
             output logic [31:0] ALUResultM, WriteDataM,
             input logic [31:0]  ReadDataM);

   // Pipeline control signals
   logic [6:0]  opD;
   logic [2:0]  funct3D, funct3M, funct3E;
   logic        funct7b5D;
   logic [2:0]  ImmSrcD;
   logic        ZeroE, NegativeE, CarryE, OverflowE;
   logic        PCSrcE, PCSrcNextE;
   logic [3:0]  ALUControlE;
   logic [1:0]  ALUSrcE;
   logic        ResultSrcEb0;
   logic        RegWriteM;
   logic [1:0]  ResultSrcW;
   logic        RegWriteW;

   // Hazard detection and forwarding signals
   logic [1:0]  ForwardAE, ForwardBE;
   logic        StallF, StallD, FlushD, FlushE;
   logic [4:0]  Rs1D, Rs2D, Rs1E, Rs2E, RdE, RdM, RdW;
   logic [31:0] dpReadDataM, dpWriteDataM;

   // Instantiate main processor components
   controller c(clk, reset,
            opD, funct3D, funct3E, funct7b5D, ImmSrcD,
            FlushE, ZeroE, NegativeE, CarryE, OverflowE,
            PCSrcE, PCSrcNextE, ALUControlE, ALUSrcE, ResultSrcEb0,
            MemWriteM, RegWriteM, RegWriteW, ResultSrcW, funct3M);

   datapath dp(clk, reset,
           StallF, PCF, InstrF,
           opD, funct3D, funct7b5D, StallD, FlushD, ImmSrcD,
           FlushE, ForwardAE, ForwardBE, PCSrcE, PCSrcNextE,
           ALUControlE, ALUSrcE, ZeroE, NegativeE, CarryE, OverflowE,
           MemWriteM, dpWriteDataM, ALUResultM, ReadDataM,
           RegWriteW, ResultSrcW,
           Rs1D, Rs2D, Rs1E, Rs2E, RdE, RdM, RdW);

   hazard hu(Rs1D, Rs2D, Rs1E, Rs2E, RdE, RdM, RdW,
              PCSrcE, PCSrcNextE, ResultSrcEb0, RegWriteM, RegWriteW,
              ForwardAE, ForwardBE, StallF, StallD, FlushD, FlushE);

   // Store data formatting for different store types
   always_comb
     case(funct3M)
       3'b000: begin // Store byte (sb)
         case(ALUResultM[1:0])
           2'b00: WriteDataM = {ReadDataM[31:8], dpWriteDataM[7:0]};
           2'b01: WriteDataM = {ReadDataM[31:16], dpWriteDataM[7:0], ReadDataM[7:0]};
           2'b10: WriteDataM = {ReadDataM[31:24], dpWriteDataM[7:0], ReadDataM[15:0]};
           2'b11: WriteDataM = {dpWriteDataM[7:0], ReadDataM[23:0]};
         endcase
       end
       3'b001: begin // Store halfword (sh)
         case(ALUResultM[1])
           1'b0: WriteDataM = {ReadDataM[31:16], dpWriteDataM[15:0]};
           1'b1: WriteDataM = {dpWriteDataM[15:0], ReadDataM[15:0]};
         endcase
       end
       default: WriteDataM = dpWriteDataM; // Store word (sw)
     endcase
endmodule

// Hazard Unit: Implements forwarding, stalling, and flushing
module hazard(input  logic [4:0] Rs1D, Rs2D, Rs1E, Rs2E, RdE, RdM, RdW,
              input logic        PCSrcE, PCSrcNextE, ResultSrcEb0,
              input logic        RegWriteM, RegWriteW,
              output logic [1:0] ForwardAE, ForwardBE,
              output logic       StallF, StallD, FlushD, FlushE);

   logic lwStallD;

   // Data forwarding logic - bypass network
   always_comb begin
      ForwardAE = 2'b00; // No forwarding
      ForwardBE = 2'b00;

      // Forward from Memory stage (EX-to-EX forwarding)
      if (Rs1E != 5'b0)
         if      ((Rs1E == RdM) & RegWriteM) ForwardAE = 2'b10;
         else if ((Rs1E == RdW) & RegWriteW) ForwardAE = 2'b01;

      if (Rs2E != 5'b0)
         if      ((Rs2E == RdM) & RegWriteM) ForwardBE = 2'b10;
         else if ((Rs2E == RdW) & RegWriteW) ForwardBE = 2'b01;
   end

   // Stall and flush control logic
   assign lwStallD = ResultSrcEb0 & ((Rs1D == RdE) | (Rs2D == RdE));
   assign StallD = lwStallD;
   assign StallF = lwStallD;
   assign FlushD = PCSrcE | PCSrcNextE;
   assign FlushE = lwStallD | PCSrcE | PCSrcNextE;
endmodule

View Complete Source Code on GitHub

Key Accomplishments

✓ Complete Implementation: Successfully implemented all required pipeline stages with full functionality
✓ Comprehensive Testing: Passed extensive test suite including edge cases and stress tests
✓ Performance Goals: Achieved target CPI of ~1.2 cycles per instruction through optimization
✓ Code Quality: Produced well-documented, modular SystemVerilog code following best practices
✓ Educational Value: Demonstrated deep understanding of computer architecture principles

Learning Outcomes

This project provided invaluable hands-on experience with computer architecture concepts including pipeline design, hazard detection, performance optimization, and hardware description languages. The implementation deepened my understanding of how modern processors achieve high performance through pipelining while maintaining correctness through careful hazard management. The skills developed in this project directly apply to advanced computer architecture topics and industry processor design.

Project: 5-Stage Pipelined RISC-V Processor

Executive Summary