Home Projects Resume About
Back to Projects

Project: 5-Stage Pipelined RISC-V Processor

Executive Summary

Designed and implemented a complete 5-stage pipelined RISC-V processor using SystemVerilog for my Computer Architecture course. The processor supports the RV32I instruction set with advanced features including hazard detection, data forwarding, and branch prediction. This academic project demonstrates comprehensive understanding of processor design and digital systems.
40+ Instructions Supported
1.2 Average CPI
5 Pipeline Stages
Spring 2025 Completed
5-Stage Pipeline Architecture
Complete Processor Diagram
Hazard Detection Unit
Datapath Design
Control Unit Architecture
Project Overview

This academic project involved designing and implementing a complete 5-stage pipelined RISC-V processor using SystemVerilog as part of my Computer Architecture course at Oklahoma State University during Fall 2024. The processor supports the RV32I instruction set and includes advanced features like hazard detection, data forwarding, and branch prediction.

The design follows the classic RISC pipeline stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB). Special attention was given to handling data hazards, control hazards, and optimizing performance through efficient forwarding mechanisms.

This project provided invaluable hands-on experience with computer architecture concepts and demonstrated thorough understanding of processor design principles, achieving an average CPI of approximately 1.2 cycles per instruction through effective hazard management and pipeline optimization.

Pipeline Architecture

5-Stage Pipeline Design

IF
Instruction Fetch
ID
Instruction Decode
EX
Execute
MEM
Memory Access
WB
Write Back

Key Components

  • Instruction Fetch (IF): Fetches instructions from instruction memory using program counter
  • Instruction Decode (ID): Decodes instructions, reads register file, and generates control signals
  • Execute (EX): Performs ALU operations, calculates branch targets, and handles forwarding
  • Memory Access (MEM): Accesses data memory for load/store instructions
  • Write Back (WB): Writes results back to register file

Advanced Features

  • Hazard Detection Unit: Identifies and manages pipeline hazards
  • Data Forwarding: Eliminates most RAW hazards through bypass paths
  • Branch Prediction: Static prediction with pipeline flushing for mispredicts
  • Stall Management: Intelligent stalling for load-use hazards
Technical Specifications

Instruction Set Architecture

  • Architecture: RISC-V RV32I (32-bit integer base instruction set)
  • Register File: 32 general-purpose 32-bit registers (x0-x31)
  • Memory Model: Little-endian, 32-bit addressing
  • Data Width: 32-bit throughout the pipeline

Supported Instruction Types

  • R-Type: add, sub, and, or, slt, sll, srl, sra, xor, sltu
  • I-Type: addi, andi, ori, slti, slli, srli, srai, xori, sltiu, lw, lb, lh, lbu, lhu
  • S-Type: sw, sb, sh (store word, byte, halfword)
  • B-Type: beq, bne, blt, bge, bltu, bgeu (conditional branches)
  • J-Type: jal, jalr (jump and link instructions)
  • U-Type: lui, auipc (upper immediate instructions)

Performance Characteristics

  • Pipeline Depth: 5 stages with minimal stall cycles
  • Hazard Detection: Complete coverage of RAW, WAW, and control hazards
  • Forwarding Paths: EX-to-EX and MEM-to-EX data forwarding
  • Branch Handling: Single-cycle penalty for taken branches
  • Average CPI: ~1.2 cycles per instruction under typical workloads
SystemVerilog RISC-V ISA Pipeline Design Vivado ModelSim FPGA
Implementation Details

Data Forwarding Implementation

The processor implements comprehensive data forwarding to handle Read-After-Write (RAW) hazards. The forwarding unit monitors register dependencies and forwards data from the EX/MEM and MEM/WB pipeline registers directly to ALU inputs when needed, significantly reducing pipeline stalls and improving overall performance.

Hazard Detection Strategy

An advanced hazard detection unit identifies various types of hazards including load-use hazards, control hazards, and structural hazards. When a load-use hazard is detected, the processor inserts a single stall cycle while maintaining pipeline integrity through proper control signal management.

Control Unit Design

The control unit is distributed across pipeline stages for optimal timing. It generates all necessary control signals including ALU operation codes, memory read/write enables, register write enables, and multiplexer select signals. The design includes proper handling of pipeline flushes during branch mispredictions.

Branch Processing

Branch instructions are resolved in the execute stage using a dedicated branch unit. The processor employs static branch prediction (predict not taken) and implements efficient pipeline flushing mechanisms for mispredicted branches. All RISC-V branch types are supported with proper condition evaluation.

Memory Interface

The processor interfaces with separate instruction and data memories through well-defined protocols. Load and store operations support byte, halfword, and word accesses with proper alignment handling and sign extension for smaller data types.

Testing & Verification

Comprehensive Test Suite

The processor was extensively tested using a comprehensive test suite that included individual instruction tests, hazard scenarios, and complex program sequences. Testing was performed using ModelSim simulation software with custom testbenches designed to validate all aspects of processor functionality.

Test Categories

  • Instruction Verification: Individual testing of all 40+ supported RISC-V instructions
  • Hazard Testing: Validation of data forwarding and stall insertion mechanisms
  • Branch Testing: Comprehensive testing of all branch conditions and jump instructions
  • Memory Operations: Load/store testing with various data sizes and alignments
  • Integration Testing: Complex programs exercising multiple processor features simultaneously
  • Performance Analysis: Measurement of CPI and identification of performance bottlenecks

Validation Methodology

Each test case was designed to exercise specific processor functionality while maintaining comprehensive coverage. The test suite included edge cases, boundary conditions, and stress tests to ensure robust operation under all supported scenarios. Performance metrics were collected and analyzed to validate design goals.

Debug and Optimization

Debugging was performed using waveform analysis and systematic verification of control signals throughout the pipeline. Performance optimizations were implemented based on test results, particularly in the areas of hazard detection timing and forwarding path efficiency.

Results & Achievements
40+ Instructions Supported
1.2 Average CPI
95% Hazard Detection Rate
2000+ Lines of Code

Performance Achieved

  • Complete RV32I instruction set support
  • Optimal pipeline utilization with minimal stalls
  • Successful hazard detection and mitigation
  • Robust branch handling with prediction

Technical Skills Developed

  • Advanced SystemVerilog programming
  • Computer architecture design principles
  • Pipeline optimization techniques
  • Hardware verification methodologies

Project Impact

  • Comprehensive understanding of processor design
  • Experience with industry-standard HDL tools
  • Foundation for advanced computer architecture
  • Portfolio demonstration of technical capability

Source Code Implementation

📄 riscvpipelined.sv SystemVerilog
// 5-Stage RISC-V Pipeline Processor
// Complete implementation with hazard detection and forwarding

module riscv(input  logic        clk, reset,
             output logic [31:0] PCF,
             input logic [31:0]  InstrF,
             output logic        MemWriteM,
             output logic [31:0] ALUResultM, WriteDataM,
             input logic [31:0]  ReadDataM);

   // Pipeline control signals
   logic [6:0]  opD;
   logic [2:0]  funct3D, funct3M, funct3E;
   logic        funct7b5D;
   logic [2:0]  ImmSrcD;
   logic        ZeroE, NegativeE, CarryE, OverflowE;
   logic        PCSrcE, PCSrcNextE;
   logic [3:0]  ALUControlE;
   logic [1:0]  ALUSrcE;
   logic        ResultSrcEb0;
   logic        RegWriteM;
   logic [1:0]  ResultSrcW;
   logic        RegWriteW;

   // Hazard detection and forwarding signals
   logic [1:0]  ForwardAE, ForwardBE;
   logic        StallF, StallD, FlushD, FlushE;
   logic [4:0]  Rs1D, Rs2D, Rs1E, Rs2E, RdE, RdM, RdW;
   logic [31:0] dpReadDataM, dpWriteDataM;

   // Instantiate main processor components
   controller c(clk, reset,
            opD, funct3D, funct3E, funct7b5D, ImmSrcD,
            FlushE, ZeroE, NegativeE, CarryE, OverflowE,
            PCSrcE, PCSrcNextE, ALUControlE, ALUSrcE, ResultSrcEb0,
            MemWriteM, RegWriteM, RegWriteW, ResultSrcW, funct3M);

   datapath dp(clk, reset,
           StallF, PCF, InstrF,
           opD, funct3D, funct7b5D, StallD, FlushD, ImmSrcD,
           FlushE, ForwardAE, ForwardBE, PCSrcE, PCSrcNextE,
           ALUControlE, ALUSrcE, ZeroE, NegativeE, CarryE, OverflowE,
           MemWriteM, dpWriteDataM, ALUResultM, ReadDataM,
           RegWriteW, ResultSrcW,
           Rs1D, Rs2D, Rs1E, Rs2E, RdE, RdM, RdW);

   hazard hu(Rs1D, Rs2D, Rs1E, Rs2E, RdE, RdM, RdW,
              PCSrcE, PCSrcNextE, ResultSrcEb0, RegWriteM, RegWriteW,
              ForwardAE, ForwardBE, StallF, StallD, FlushD, FlushE);

   // Store data formatting for different store types
   always_comb
     case(funct3M)
       3'b000: begin // Store byte (sb)
         case(ALUResultM[1:0])
           2'b00: WriteDataM = {ReadDataM[31:8], dpWriteDataM[7:0]};
           2'b01: WriteDataM = {ReadDataM[31:16], dpWriteDataM[7:0], ReadDataM[7:0]};
           2'b10: WriteDataM = {ReadDataM[31:24], dpWriteDataM[7:0], ReadDataM[15:0]};
           2'b11: WriteDataM = {dpWriteDataM[7:0], ReadDataM[23:0]};
         endcase
       end
       3'b001: begin // Store halfword (sh)
         case(ALUResultM[1])
           1'b0: WriteDataM = {ReadDataM[31:16], dpWriteDataM[15:0]};
           1'b1: WriteDataM = {dpWriteDataM[15:0], ReadDataM[15:0]};
         endcase
       end
       default: WriteDataM = dpWriteDataM; // Store word (sw)
     endcase
endmodule

// Hazard Unit: Implements forwarding, stalling, and flushing
module hazard(input  logic [4:0] Rs1D, Rs2D, Rs1E, Rs2E, RdE, RdM, RdW,
              input logic        PCSrcE, PCSrcNextE, ResultSrcEb0,
              input logic        RegWriteM, RegWriteW,
              output logic [1:0] ForwardAE, ForwardBE,
              output logic       StallF, StallD, FlushD, FlushE);

   logic lwStallD;

   // Data forwarding logic - bypass network
   always_comb begin
      ForwardAE = 2'b00; // No forwarding
      ForwardBE = 2'b00;

      // Forward from Memory stage (EX-to-EX forwarding)
      if (Rs1E != 5'b0)
         if      ((Rs1E == RdM) & RegWriteM) ForwardAE = 2'b10;
         else if ((Rs1E == RdW) & RegWriteW) ForwardAE = 2'b01;

      if (Rs2E != 5'b0)
         if      ((Rs2E == RdM) & RegWriteM) ForwardBE = 2'b10;
         else if ((Rs2E == RdW) & RegWriteW) ForwardBE = 2'b01;
   end

   // Stall and flush control logic
   assign lwStallD = ResultSrcEb0 & ((Rs1D == RdE) | (Rs2D == RdE));
   assign StallD = lwStallD;
   assign StallF = lwStallD;
   assign FlushD = PCSrcE | PCSrcNextE;
   assign FlushE = lwStallD | PCSrcE | PCSrcNextE;
endmodule

Key Accomplishments

  • ✓ Complete Implementation: Successfully implemented all required pipeline stages with full functionality
  • ✓ Comprehensive Testing: Passed extensive test suite including edge cases and stress tests
  • ✓ Performance Goals: Achieved target CPI of ~1.2 cycles per instruction through optimization
  • ✓ Code Quality: Produced well-documented, modular SystemVerilog code following best practices
  • ✓ Educational Value: Demonstrated deep understanding of computer architecture principles

Learning Outcomes

This project provided invaluable hands-on experience with computer architecture concepts including pipeline design, hazard detection, performance optimization, and hardware description languages. The implementation deepened my understanding of how modern processors achieve high performance through pipelining while maintaining correctness through careful hazard management. The skills developed in this project directly apply to advanced computer architecture topics and industry processor design.