Jamie Iles


RXV: RISC-V RV32IMA CPU

RXV microarchitecture
Trial layout on SKY130A
FPGA SoC Diagram

The RXV core is a RISC-V RV32IMA CPU that runs mainline Linux and achieves a CoreMark score of 2.61/MHz. The core is in-order with a 7-stage pipeline and features out-of-order completion, register renaming, dynamic branch prediction, indirect branch target prediction, set-associative caches, fully associative TLBs and AXI4 support. A design for the Digilent Arty S7 board uses a 32KB 8-way set-associative cache and a 16-entry fully associative TLB for each of the instruction and data paths. Externally, Xilinx IP provides the AXI4 interconnect, SPI controller, interrupt controller, DDR3 controller and 16550A UART.
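As a rough illustration of what the cache parameters imply, the sketch below derives the set count and address bit split for a 32KB 8-way cache. The 64-byte line size is an assumption made for the example (the line size is not stated here), and cache_geometry is a hypothetical helper, not part of the RXV sources.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Derive sets and address field widths for a set-associative cache.

    All three parameters are assumed to be powers of two.
    """
    for value in (size_bytes, ways, line_bytes):
        assert value > 0 and value & (value - 1) == 0, "expected a power of two"
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "way_bytes": size_bytes // ways,           # bytes covered by one way
        "offset_bits": line_bytes.bit_length() - 1,
        "index_bits": sets.bit_length() - 1,
    }

# 32KB, 8-way as in the Arty S7 design; the 64-byte line is an ASSUMPTION.
geometry = cache_geometry(32 * 1024, 8, 64)
print(geometry)
```

With these numbers each way covers 4KB, the same as the base page size, so the set index fits entirely within the page offset. That would be consistent with looking up cache tags and the TLB in the same pipeline stage, although nothing here confirms that this was the reason for the choice.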

On top of the RV32IMA base, the following extensions are implemented:

  • Zicsr
  • Zifencei
  • Sstc
  • Sscofpmf
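The extension set above is what would typically be advertised to software as an ISA string, for example in a device tree. The sketch below builds such a string. isa_string is a hypothetical helper, and sorting the Z extensions purely alphabetically is a simplification of the full RISC-V naming rules that happens to be sufficient for this list.

```python
def isa_string(xlen, base, extensions):
    """Build a lower-case ISA string such as rv32ima_zicsr_...

    Z* extensions are listed before S* extensions; within each group
    the names are sorted alphabetically (a simplification).
    """
    names = [e.lower() for e in extensions]
    z_exts = sorted(n for n in names if n.startswith("z"))
    s_exts = sorted(n for n in names if n.startswith("s"))
    return "_".join([f"rv{xlen}{base.lower()}"] + z_exts + s_exts)

print(isa_string(32, "IMA", ["Zicsr", "Zifencei", "Sstc", "Sscofpmf"]))
# rv32ima_zicsr_zifencei_sscofpmf_sstc
```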

The pipeline consists of:

  • Fetch 1: next PC generation, combined BTB/PHT lookup
  • Fetch 2: instruction cache tag lookup, instruction TLB lookup, branch prediction resteer
  • Fetch 3: tag compare, prefetch queue write
  • Decode: instruction decode, register rename, misidentified branch resteer and issue. All instructions apart from atomic fetch/op/store are a single uop; atomic operations may require more than one uop
  • Execute: execute in one of 4 functional units:
    • LSU: fully pipelined, 3 cycle load-use latency. Misaligned load/store exceptions are raised in the first cycle; invalid AMO (atomic to device memory) and page faults are handled in the second cycle
    • Integer: integer operations, branches, CSR accesses, 1 cycle latency with result forwarding back to input. Illegal instruction, branch alignment and environment call exceptions are raised here
    • Multiply: fully pipelined 5 cycle result-use latency
    • Divide: non-pipelined 33 cycle result-use latency
  • Writeback: results are written back to the register file
  • Commit: rename file updated and exceptions raised
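To make the latency figures above concrete, here is a minimal timing sketch of in-order issue with out-of-order completion. It assumes one instruction issues per cycle, that "N cycle latency" means a dependent instruction can issue N cycles after its producer, and that register renaming removes write-after-write stalls. These are simplifying assumptions for illustration, and schedule is a hypothetical helper rather than a model of the real issue logic.

```python
# Result-use latencies taken from the functional unit list above.
LATENCY = {"int": 1, "lsu": 3, "mul": 5, "div": 33}
# Only the divider is described as non-pipelined.
PIPELINED = {"int": True, "lsu": True, "mul": True, "div": False}

def schedule(program):
    """Return the issue cycle of each instruction.

    program is a list of (unit, dest, sources) tuples in program order.
    """
    ready = {}       # register -> first cycle a consumer may issue
    unit_free = {}   # unit -> first cycle the unit accepts a new operation
    next_issue = 0   # in-order issue: never earlier than the previous one
    issue_cycles = []
    for unit, dest, sources in program:
        cycle = max([next_issue, unit_free.get(unit, 0)]
                    + [ready.get(src, 0) for src in sources])
        issue_cycles.append(cycle)
        if dest is not None:
            ready[dest] = cycle + LATENCY[unit]
        unit_free[unit] = cycle + (1 if PIPELINED[unit] else LATENCY[unit])
        next_issue = cycle + 1
    return issue_cycles

# A load, a dependent add, then an independent add stuck behind it.
program = [
    ("lsu", "x1", ["x2"]),
    ("int", "x3", ["x1"]),
    ("int", "x4", ["x5"]),
]
print(schedule(program))  # [0, 3, 4]
```

The third instruction has no data dependency but still waits, because issue is in order. An independent instruction placed between the load and its consumer would instead issue during the load shadow and could complete before the load.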

Atomic operations are broken down into multiple micro-ops using register rename.
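One way to picture this is a fetch/op/store atomic being cracked into a load, an ALU operation and a store that communicate through temporary registers. The sketch below is a functional illustration only: the uop tuple format, the temporaries t0 and t1 and the final move are assumptions made for the example, not the actual RXV micro-op encoding (in a renaming design the final move could simply be a rename table update).

```python
MASK32 = 0xFFFFFFFF

# Functional behaviour of a few AMO operations: f(old_value, source).
AMO_OPS = {
    "amoswap": lambda old, src: src,
    "amoadd": lambda old, src: (old + src) & MASK32,
    "amoand": lambda old, src: old & src,
    "amoor": lambda old, src: old | src,
    "amoxor": lambda old, src: old ^ src,
}

def crack_amo(name, rd, rs1, rs2):
    """Split `name rd, rs2, (rs1)` into hypothetical micro-ops."""
    return [
        ("load", "t0", rs1),              # t0 <- mem[rs1]
        ("alu", "t1", name, "t0", rs2),   # t1 <- op(t0, rs2)
        ("store", rs1, "t1"),             # mem[rs1] <- t1
        ("move", rd, "t0"),               # rd <- original memory value
    ]

def run(uops, regs, mem):
    """Execute micro-ops against a register dict and a memory dict."""
    for uop in uops:
        kind = uop[0]
        if kind == "load":
            _, dest, addr = uop
            regs[dest] = mem[regs[addr]]
        elif kind == "alu":
            _, dest, name, a, b = uop
            regs[dest] = AMO_OPS[name](regs[a], regs[b])
        elif kind == "store":
            _, addr, src = uop
            mem[regs[addr]] = regs[src]
        elif kind == "move":
            _, dest, src = uop
            regs[dest] = regs[src]

regs = {"a0": 0x1000, "a1": 5, "a2": 0}
mem = {0x1000: 7}
run(crack_amo("amoadd", "a2", "a0", "a1"), regs, mem)
print(regs["a2"], mem[0x1000])  # 7 12
```

Using temporaries rather than writing rd first keeps the sequence correct even when rd is the same register as rs1 or rs2.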

Performance Counters

In addition to the fixed mcycle and minstret counters, there are a configurable number of performance counters that can be used for profiling:

- PMU_CYCLES
- PMU_INSTRET
- PMU_BRANCH
- PMU_BRANCH_MISPRED
- PMU_FE_STALL
- PMU_BE_STALL
- PMU_L1D_READ
- PMU_L1D_READ_MISS
- PMU_L1D_WRITE
- PMU_L1D_WRITE_MISS
- PMU_L1I_READ
- PMU_L1I_READ_MISS
- PMU_DTLB_READ
- PMU_DTLB_READ_MISS
- PMU_ITLB_READ
- PMU_ITLB_READ_MISS
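Raw counts are usually most useful as ratios. The sketch below turns a dictionary of event counts into a few derived metrics. derived_metrics is a hypothetical helper, the sample numbers are invented purely to exercise it, and treating PMU_FE_STALL and PMU_BE_STALL as cycle counts is an assumption.

```python
def derived_metrics(counts):
    """Compute common ratios; a metric is None if its denominator is 0 or missing."""
    def ratio(numerator, denominator):
        den = counts.get(denominator, 0)
        return counts.get(numerator, 0) / den if den else None

    return {
        "ipc": ratio("PMU_INSTRET", "PMU_CYCLES"),
        "branch_mispredict_rate": ratio("PMU_BRANCH_MISPRED", "PMU_BRANCH"),
        "l1d_read_miss_rate": ratio("PMU_L1D_READ_MISS", "PMU_L1D_READ"),
        "l1d_write_miss_rate": ratio("PMU_L1D_WRITE_MISS", "PMU_L1D_WRITE"),
        "l1i_read_miss_rate": ratio("PMU_L1I_READ_MISS", "PMU_L1I_READ"),
        "dtlb_miss_rate": ratio("PMU_DTLB_READ_MISS", "PMU_DTLB_READ"),
        "itlb_miss_rate": ratio("PMU_ITLB_READ_MISS", "PMU_ITLB_READ"),
        # Assumes the stall events count cycles.
        "frontend_stall_fraction": ratio("PMU_FE_STALL", "PMU_CYCLES"),
        "backend_stall_fraction": ratio("PMU_BE_STALL", "PMU_CYCLES"),
    }

# Invented sample values, not measurements from the RXV core.
sample = {
    "PMU_CYCLES": 1000,
    "PMU_INSTRET": 500,
    "PMU_BRANCH": 100,
    "PMU_BRANCH_MISPRED": 5,
}
for name, value in derived_metrics(sample).items():
    print(name, value)
```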

FPGA Implementation

The default Arty S7 configuration has:

  • RXV Core
    • 32KB 8-way set-associative instruction cache
    • 32KB 8-way set-associative data cache
    • 16-entry fully associative instruction TLB
    • 16-entry fully associative data TLB
    • MTIME reference at ~10MHz
    • Separate AXI4 instruction+data busses
  • Xilinx AXI interrupt controller
  • Xilinx AXI 16550A UART
  • Xilinx AXI SPI master with 1 chip select and 256 byte FIFO
  • Xilinx AXI4 SmartConnect
  • Xilinx AXI memory adapter connecting to the BootROM
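With Sstc, supervisor timer deadlines are expressed as a 64-bit compare value in timer ticks, so software needs to convert wall-clock delays into ticks of the MTIME reference. The sketch below shows that arithmetic, taking the "~10MHz" figure above as exactly 10MHz for illustration. us_to_ticks and next_compare are hypothetical helper names.

```python
TIMEBASE_HZ = 10_000_000  # ASSUMPTION: the ~10MHz reference treated as exact
MASK64 = 0xFFFFFFFFFFFFFFFF

def us_to_ticks(delay_us, timebase_hz=TIMEBASE_HZ):
    """Convert a delay in microseconds to timer ticks (rounding down)."""
    return delay_us * timebase_hz // 1_000_000

def next_compare(now_ticks, delay_us, timebase_hz=TIMEBASE_HZ):
    """Compare value for a deadline delay_us from now, wrapped to 64 bits."""
    return (now_ticks + us_to_ticks(delay_us, timebase_hz)) & MASK64

print(us_to_ticks(1000))        # 10000 ticks for 1ms
print(next_compare(123456, 1))  # 123466
```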

The BootROM loads the generic platform build of OpenSBI, a flattened device tree and a gzip-compressed Linux kernel from a FAT partition on the micro SD card, then transfers control to OpenSBI.

© Jamie Iles 2017-2023