Bit twiddling

explorations into hardware and software

Porting the Keynsham SoC to the DE0-CV


The Keynsham SoC featuring the Oldland CPU now runs on the DE0-CV. The DE0-CV is a bigger board than the DE0-NANO that I was previously using and has a number of nice features:

  • A bigger FPGA - a Cyclone V device.
  • 64MB SDRAM versus 32MB on the DE0-NANO.
  • VGA output with a simple 12-bit DAC.
  • PS/2 controller.
  • Six 7-segment displays.
  • More accessible pushbuttons.

On the new board the Keynsham SoC uses only ~25% of the logic resources, so there is plenty of room for expansion!

Porting to the new board was a relatively pain-free exercise. I first added support for multiple SoC configurations, so it is now possible to describe each configuration with a YAML file that sets the memory map of the SoC and CPU parameters such as cache sizes, number of TLB entries, etc.

Editing the pin assignments was a fairly tedious process, as was regenerating the Altera IP components. I couldn’t find an easy way to take a component that was previously instantiated for a Cyclone IV and move the same configuration to a Cyclone V, so they just had to be recreated manually.

The PLLs on the Cyclone V are different and have a couple of extra non-optional ports compared to the Cyclone IV PLLs. In particular, there is a “locked” output and a “reset” input, because the PLL can seemingly lose lock at some point during running. For now I have tied the active-high reset to 0 and left the locked output unconnected, but I may wire this output to the SDRAM state machine to wait for PLL lock before initialising if this proves important. There doesn’t look to be a good way to handle the PLL losing lock while running, so that might have to be a fatal error.

The virtual JTAG and oldland-test script really paid off here - it was very quick to verify that everything was performing as expected. I plan on adding support for the 7-segment displays next, followed by a text-mode VGA controller, to take advantage of the DE0-CV’s features.

Simulating Verilog on Linux


Simulating Verilog, particularly once a design grows beyond anything simple, can be both tricky and frustrating. I’m going to describe the simulation process that I’ve used during development of the Oldland CPU project in the hope that it might be useful to others.

Don’t develop in FPGA tools

It is entirely possible to develop a design in the FPGA tools, but this doesn’t scale - you’re confined to the lacklustre editor, long compile times and cryptic error messages, in addition to being tied to a vendor flow. Instead, I’d recommend a flow where development happens outside of the FPGA tools, with the FPGA as a validation target rather than the main flow.

FPGA tools are generally less friendly than some of the free tools, notably:

  • It takes a surprisingly long time to even complain about syntax errors.
  • Some of the errors/warnings are detected post-synthesis. This can be particularly confusing for things like combinational loops once optimizations have been applied - the loop can include generated nodes that have no clear origin and aren’t easily mapped to source lines. Verilator is so much better here!

Use multiple simulators

For any moderately complex design, it’s important to use multiple simulators, typically using one as a golden reference for behaviour and another to develop the RTL. In the Oldland CPU case, I have a behavioural model of the CPU, written in C, that I use as the golden reference; it allows me to quickly prototype new features and be sure that the design is sensible. When I implement a new feature, I first make sure that it works in the software simulation, then I develop the Verilog, making sure that all simulations match. The software model runs at high speed, making it practical to run real software on it and use it for higher-level system tests.

For the Verilog itself, there are two main free options on Linux: Icarus Verilog and Verilator.

Icarus is a compliant, event-driven simulator, so it provides an accurate model of the circuit and supports the full language, including non-synthesizable constructs, which makes it useful for certain models. For example, SDRAM vendor models such as the Micron model I use in the Oldland project use delays to model real-world timing, so this simulator can model board-level delays and give confidence that the design will work on real hardware. The main downsides to Icarus are that it only supports Verilog and not SystemVerilog, and that the accuracy of the simulation means it is not the fastest.

Verilator takes a different approach, converting Verilog into optimized C++ models that work on a cycle basis, dropping the multiple drive levels and event triggering. This means that Verilator models can be many times faster than Icarus models and are also capable of running real-world applications. Verilator also supports elements of SystemVerilog, which can be beneficial to a design.

Where possible it’s best to support every simulator you can and verify them all against each other. In some cases it can even be possible to run the simulations in lockstep and verify that they all maintain the same state.
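
As a concrete (and heavily simplified) illustration of the lockstep idea, the sketch below steps a C behavioural model and an RTL simulation together and compares architectural state after each step. The cmodel_*/rtl_* functions and the 16-register count are hypothetical stand-ins rather than the real Oldland interfaces, and in practice the comparison usually happens at instruction retirement or some other well-defined synchronisation point rather than on every raw clock cycle.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical interfaces to the C behavioural model and the RTL simulation. */
extern void cmodel_step(void);
extern void rtl_step(void);
extern uint32_t cmodel_get_reg(unsigned int r);
extern uint32_t rtl_get_reg(unsigned int r);
extern uint32_t cmodel_get_pc(void);
extern uint32_t rtl_get_pc(void);

static void run_lockstep(unsigned long steps)
{
        for (unsigned long n = 0; n < steps; ++n) {
                /* Advance both models by one step. */
                cmodel_step();
                rtl_step();

                if (cmodel_get_pc() != rtl_get_pc()) {
                        fprintf(stderr, "PC mismatch at step %lu\n", n);
                        exit(EXIT_FAILURE);
                }

                /* Compare the general purpose registers. */
                for (unsigned int r = 0; r < 16; ++r) {
                        if (cmodel_get_reg(r) != rtl_get_reg(r)) {
                                fprintf(stderr, "r%u mismatch at step %lu\n",
                                        r, n);
                                exit(EXIT_FAILURE);
                        }
                }
        }
}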

Integrate debugging

In general, debugging Verilog is harder work than debugging software, but there are some advantages. Most simulators support writing trace files such as VCD or LXT2, which let you view the state of any signal in the design at any time - fantastic for post-mortem analysis, but not interactive. Additionally, dumping traces of the entire system can really slow down the simulation, so consider adding a mechanism to start tracing on a given command, event or state.
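
One simple way to implement that kind of trigger in a C simulation harness is sketched below; the trigger conditions (a requested cycle number or a particular PC value) are just illustrative examples, not the Oldland mechanism.

#include <stdbool.h>
#include <stdint.h>

static bool trace_enabled;
static uint64_t trigger_cycle = UINT64_MAX;     /* set via a debug command */
static uint32_t trigger_pc = 0xffffffff;        /* likewise */

static void update_trace_trigger(uint64_t cycle, uint32_t pc)
{
        /* Start dumping once either trigger condition is hit. */
        if (cycle >= trigger_cycle || pc == trigger_pc)
                trace_enabled = true;
}

static void dump_cycle(uint64_t cycle)
{
        if (!trace_enabled)
                return;

        (void)cycle;    /* the real code writes the trace record out here */
}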

For interactive debugging, build debugging into the design and present it to the outside world in a consistent way across all simulations. In the Oldland CPU, this interface is the JTAG debugger, and it gets exported as a TCP socket. In the software simulation there is a thread that receives events from the socket and interacts with the model. For the Verilog there is a common debug controller that is present in the Icarus and Verilator simulations, and also in the FPGA design. The Icarus simulation uses a small VPI stub to connect the debug controller to a C library that talks to the socket, the Verilator model uses direct C calls, and the FPGA uses the Altera virtual JTAG.
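
To make that more concrete, here is a rough sketch of the socket side of such a debug interface. The command layout and the model_read_reg()/model_write_reg() helpers are hypothetical stand-ins, not the actual Oldland protocol, and error handling is omitted for brevity.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Stand-ins for the simulation model's state. */
static uint32_t regs[16];
static uint32_t model_read_reg(uint32_t r) { return regs[r & 15]; }
static void model_write_reg(uint32_t r, uint32_t v) { regs[r & 15] = v; }

/* One fixed-size command per request, one 32-bit response per reply. */
struct dbg_cmd {
        uint8_t  op;            /* 0: read reg, 1: write reg, ... */
        uint32_t addr;
        uint32_t value;
} __attribute__((packed));

int debug_listen(uint16_t port)
{
        struct sockaddr_in addr;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(port);

        bind(fd, (struct sockaddr *)&addr, sizeof(addr));
        listen(fd, 1);

        return accept(fd, NULL, NULL);
}

void debug_serve(int conn)
{
        struct dbg_cmd cmd;

        while (read(conn, &cmd, sizeof(cmd)) == (ssize_t)sizeof(cmd)) {
                uint32_t resp = 0;

                switch (cmd.op) {
                case 0:
                        resp = model_read_reg(cmd.addr);
                        break;
                case 1:
                        model_write_reg(cmd.addr, cmd.value);
                        break;
                /* ...memory access, run/stop, trace control, etc... */
                }

                if (write(conn, &resp, sizeof(resp)) != (ssize_t)sizeof(resp))
                        break;
        }
}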

The end result is that the debugger can connect to the C model, either Verilog simulator or the FPGA itself, and can load programs, read and write the registers, memory and any control registers to get the system into a specific state, and trigger tracing.

Conclusion

Maintaining multiple simulators is the only way to develop quickly and be sure that you don’t introduce bugs. The most important things that I have learnt are:

  • Never develop in the FPGA tools, treat them as a validation target for your design.
  • You must be able to build all simulations with a single command.
  • All of your tests should be run with a single command on all simulators and should take as short a time as possible.
  • The framework for running your tests should not be dependent on the simulation type.
  • Use Verilator for linting even if you don’t use it for simulation - the errors and warnings are far superior to those from Icarus and the FPGA tools.

Debugging Bare-metal on a Raspberry Pi


I’ve recently released an early version of a GDB remote serial protocol implementation that runs bare-metal on a Raspberry Pi and allows you to debug bare-metal applications without the use of a JTAG probe. Once running, you can connect with GDB as a remote target and perform all of the normal debugging that you’d expect; you can even single-step through interrupt handlers and exceptions.

The implementation uses the security extensions of the ARM core so that the target application requires no modification. The security extensions are present in v6K and later revisions of the ARM architecture and are marketed as TrustZone. These extensions allow the CPU and peripherals to be partitioned into secure and non-secure regions so that it is possible to run a small trusted system and prevent a larger OS like Linux from accessing secret data. The Broadcom SoC on the Raspberry Pi doesn’t implement enough (or have sufficient documentation – who knows?) to do any useful security hardening with these extensions – there would usually be a protection controller (TZPC) for gating peripheral accesses and a secure memory adapter, along with on-chip memory that the non-secure world can’t access, such as TCMs, which this SoC doesn’t have.

So it’s no good for security, but the extra operating modes mean we can create a debugger. The security extensions introduce a new operating mode – secure monitor mode – and a new set of vectors which the CPU can be configured to branch to on certain, well-defined events. A new CPU instruction, SMC (Secure Monitor Call), provides a way for the CPU to enter monitor mode, and the CPU can be configured to branch to the FIQ/IRQ vectors in monitor mode rather than the normal vectors. So to implement a debugger we can use the FIQ entry for handling input from the serial port and the SMC entry for breakpoints; the rest is just glue.
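
As a rough illustration of the breakpoint half of that glue (a sketch, not the project’s actual code): to plant a breakpoint, the stub saves the original instruction and overwrites it with an SMC, which should encode to 0xe1600070 for SMC #0 with an AL condition in A32 code. Executing it traps to the monitor’s SMC vector, where the stub reports the stop to GDB.

#include <stdint.h>

#define SMC_0   0xe1600070u     /* SMC #0, AL condition (A32 encoding) */

struct breakpoint {
        uint32_t *addr;
        uint32_t saved;
};

static void breakpoint_set(struct breakpoint *bp, uint32_t *addr)
{
        bp->addr = addr;
        bp->saved = *addr;      /* remember the original instruction */
        *addr = SMC_0;          /* replace it with a trap into the monitor */

        /*
         * Clean the D-cache line and invalidate the I-cache for addr here
         * so that the core fetches the new instruction.
         */
}

static void breakpoint_clear(struct breakpoint *bp)
{
        *bp->addr = bp->saved;  /* restore the original instruction */
        /* ...and the same cache maintenance again... */
}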

This does impose a few limitations though:

  • The security extensions can’t be used by the application being debugged – that’s probably fine as they aren’t useful on this SoC anyway.
  • The core can only handle one FIQ and that’s used by the debugger for the UART – the application will have to make do with IRQs. The nice thing about using the FIQ is that it can effectively be turned into an NMI for the application.
  • The UART is used for communication with the debugger – the application can’t use it.
  • The monitor mode can’t really single-step the core – GDB is clever enough to emulate this with breakpoints at the next instruction, but it does mean that whilst single stepping the core can jump to the IRQ vector and handle a whole IRQ, unlike a JTAG debugger, which single-steps the whole core in lockstep.
  • Watchpoints can’t be configured to branch to the monitor mode without application support so hardware watchpoints don’t work.

The code is available on my GitHub page – please give it a go; patches are welcome!

Finely Grained Dynamic Debug Print Statements With GCC


Introduction

Any reasonably sized program needs a half-decent debug logging mechanism. At a minimum we want something that allows us to turn debug prints on or off at compile time for production builds, but often we want more than that – we need to handle different verbosity levels, and once we have a number of components we probably want to be able to selectively enable debug for some of them. The other part of debug logging is where we store the information – it may be stdout, a file or syslog, but for now let’s assume stdout for simplicity.

A debug mechanism needs to be flexible – enabling too much debug can often prevent the problem you’re trying to debug from appearing, especially if it’s timing related, so sometimes it can be really handy to selectively enable debug statements at a finer granularity than module or verbosity level. With a few GCC extensions we can achieve exactly that.

The debug metadata

To control which statements are enabled, we need to associate some metadata with each statement that allows us to find it. The source filename, line number and function name should be sufficient, so we can create a structure like:

The debug key
struct debug_key {
  const char        *filename;
  const char        *function;
  unsigned int      line_no;
  bool              enabled;
};

with an additional flag to say whether the statement is enabled or not. Now, we can create a macro to instantiate everything:

The debug() definition
#define debug(fmt, ...) ({                                               \
        static struct debug_key __key = {                                \
                .filename = __FILE__,                                    \
                .function = __func__,                                    \
                .line_no = __LINE__,                                     \
        };                                                               \
        if (__key.enabled)                                               \
                printf(fmt, ##__VA_ARGS__);                              \
})

Now we have a debug key for the statement, and some conditional code that will only print when the key’s enabled field is true. All that’s left is a way to manipulate that field.

GNU extensions to the rescue

The GNU toolchain has a really neat feature whereby if you create a new ELF section that has a valid C identifier as its name then you get some handy variables thrown in for free. For example, if you have a section named foo, then you’ll have some extern variables named __start_foo and __stop_foo that you can use to find the section’s address in memory.
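
As a tiny standalone example of that feature (nothing specific to the debug code yet), putting a couple of variables into a section named foo is enough to make the __start_foo and __stop_foo symbols appear:

Section start/stop example
#include <stdio.h>

/* Place a couple of integers into a custom section named "foo". */
static int a __attribute__((section("foo"), __used__)) = 1;
static int b __attribute__((section("foo"), __used__)) = 2;

int main(void)
{
        /* Created automatically by the GNU linker for the "foo" section. */
        extern int __start_foo[], __stop_foo[];

        for (int *p = __start_foo; p < __stop_foo; ++p)
                printf("%d\n", *p);

        return 0;
}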

Using a new section we can build up an array of pointers to the debug keys and then iterate over them. So we can revise our debug macro to:

With extra sections
#define __dbgkey __attribute__((section("dbgkey"), __used__))

#define debug(fmt, ...) ({                                               \
        static struct debug_key __key = {                                \
                .filename = __FILE__,                                    \
                .function = __func__,                                    \
                .line_no = __LINE__,                                     \
        };                                                               \
        static struct debug_key *__dbgkey __key_addr = &__key;           \
        if (__key.enabled)                                               \
                printf(fmt, ##__VA_ARGS__);                              \
})

which is the same as before, but with the addition of the __key_addr variable that holds the address of the debug key. This is marked with __dbgkey which I’ve defined to tell the toolchain to put the variable in the dbgkey section and to tell GCC that the variable really is used so that it doesn’t get optimized out or give us build warnings. Storing the address of the key in this section rather than the key itself means that we don’t have to worry about the padding and alignment of the key if we decide to change it at any point, though it could be done that way to save some bytes.

Putting it all together

Now that we have our table of debug key pointers, we can selectively enable debug keys by iterating over that table. For example, to enable all debug statements in the function foo() in the file foo.c:

Enabling the keys
static void enable_keys(void)
{
        /* Provided automatically by the linker for the dbgkey section. */
        extern struct debug_key *__start_dbgkey[];
        extern struct debug_key *__stop_dbgkey[];

        struct debug_key **addr;

        for (addr = __start_dbgkey; addr < __stop_dbgkey; ++addr) {
                struct debug_key *key = *addr;
                if (!strcmp(key->filename, "foo.c") &&
                    !strcmp(key->function, "foo"))
                        key->enabled = true;
        }
}

Obviously we wouldn’t want to open-code this sort of thing in our application, but it could easily grab descriptions from files, environment variables or the command line.
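
As a sketch of the environment variable approach (the DEBUG_KEYS name and the file:function,file:function format are just made up for illustration, reusing the struct debug_key and dbgkey section from above):

Enabling keys from the environment
#include <stdlib.h>
#include <string.h>

static void enable_keys_from_env(void)
{
        extern struct debug_key *__start_dbgkey[];
        extern struct debug_key *__stop_dbgkey[];
        const char *spec = getenv("DEBUG_KEYS");
        char *copy, *entry, *saveptr;

        if (!spec)
                return;

        copy = strdup(spec);
        for (entry = strtok_r(copy, ",", &saveptr); entry;
             entry = strtok_r(NULL, ",", &saveptr)) {
                struct debug_key **addr;
                char *func = strchr(entry, ':');

                if (!func)
                        continue;
                *func++ = '\0';

                for (addr = __start_dbgkey; addr < __stop_dbgkey; ++addr) {
                        struct debug_key *key = *addr;

                        if (!strcmp(key->filename, entry) &&
                            !strcmp(key->function, func))
                                key->enabled = true;
                }
        }
        free(copy);
}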

This could also easily be extended to include verbosity levels, modules etc., but this is a pretty self-contained example to illustrate the principles.
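
For what it’s worth, one way such an extension could look (the level and module fields here are hypothetical, not part of the code above) is simply to carry more metadata in the key for the enabling code to match against:

Extending the debug key
struct debug_key {
        const char      *filename;
        const char      *function;
        unsigned int    line_no;
        unsigned int    level;          /* hypothetical verbosity level */
        const char      *module;        /* hypothetical module name */
        bool            enabled;
};

The debug() macro would then take the level (and perhaps a per-file module name) as extra arguments, and the key-matching loop gains a couple more comparisons.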