Digital design manager Neil Howard of SWINDON Silicon Systems discusses the functional safety perspective of a System on Chip design
‘Functional Safety’ is becoming increasingly important in industrial and automotive applications, as is adherence to the relevant standards (IEC 61508 and ISO 26262 respectively). Functional Safety is about building systems that are dependable. It is important that any faults are detected at the earliest opportunity so that the faulty part can be repaired or replaced, rather than the fault only becoming apparent when the system fails. Mixed-signal System on Chip designs for industrial and automotive applications typically comprise sensors and/or actuators with a wired or wireless communications link to a host. The processor within the SoC must continually perform diagnostic checks on the system to ensure that it is fault-free, so that the results it provides are valid and can be relied upon.
Techniques described elsewhere for improving Functional Safety in SoCs typically focus on the processor. In this brief article I focus on the practical issues of building a Functional Safety System on Chip design from its various components.
The presence of firmware significantly complicates Functional Safety. In abstract terms, the problem with a processor executing firmware is that the state space rapidly becomes too large, making it impossible to verify every possible state. We want to minimize the number of states not covered in simulation. In particular, interrupts can arrive on any processor clock cycle, which explodes the state space, so they should be avoided. Instead, for example:
- Firmware should be split into tasks of fairly well-determined execution duration.
- After each task ends, wait by polling for the next 1-millisecond boundary.
- Poll status to determine the next task to perform.
- Separate out control from data paths wherever possible.
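The polled, interrupt-free structure described in the list above can be sketched as a small behavioural model. This is only an illustration under stated assumptions: `SimClock`, `run_schedule`, the task names and the durations are all hypothetical, and a simulated microsecond counter stands in for a hardware timer.

```python
TICK_US = 1000  # the 1-millisecond boundary from the text, in microseconds


class SimClock:
    """Hypothetical free-running microsecond counter (stands in for a timer)."""

    def __init__(self):
        self.us = 0

    def advance(self, us):
        self.us += us


def next_boundary(now_us):
    """Return the first 1 ms boundary strictly after now_us."""
    return (now_us // TICK_US + 1) * TICK_US


def run_schedule(clock, durations_us, pick_next, cycles):
    """Run `cycles` tasks in sequence, each one starting on a 1 ms boundary."""
    starts = []
    for _ in range(cycles):
        task = pick_next()                 # poll status; no interrupts involved
        starts.append((task, clock.us))
        clock.advance(durations_us[task])  # task body with a bounded duration
        target = next_boundary(clock.us)
        while clock.us < target:           # busy-poll the timer
            clock.advance(1)
    return starts


# Usage: a polled status flag selects the next task (names are illustrative).
status = {"sample_ready": False}


def pick_next():
    task = "filter" if status["sample_ready"] else "sample"
    # Toggled here to stand in for the tasks setting and clearing the flag.
    status["sample_ready"] = not status["sample_ready"]
    return task


starts = run_schedule(SimClock(), {"sample": 300, "filter": 1450}, pick_next, 4)
```

Because every task starts on a tick boundary, the set of processor states at task entry stays small and repeatable, which is the point of avoiding interrupts.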
A typical watchdog timer that merely checks if the watchdog has been patted ‘not later than time tmax’ is inadequate. Better practice is to:
- Break firmware down into tasks, each with a particular ID and minimum and maximum execution durations tmin and tmax. Before starting a task, the scheduler programs the watchdog with this information.
- On completion, the task pats the watchdog with the required ID.
- A watchdog reset occurs if the watchdog is patted before tmin, if it has not been patted by tmax, or if the data does not match the required ID.
Ideally the watchdog should use an independent oscillator and local power supply.
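As a rough illustration of the windowed, ID-checked watchdog described above, the following behavioural model may help. The class and method names are hypothetical, time is an abstract integer count, and treating a pat that arrives while the watchdog is not armed as a fault is an assumption that goes beyond the text.

```python
class WindowWatchdog:
    """Behavioural model of a watchdog with a per-task ID and time window."""

    def __init__(self):
        self.armed = False
        self.reset_asserted = False

    def arm(self, task_id, t_min, t_max, now):
        """Scheduler programs the expected ID and window before a task starts."""
        self.task_id, self.t_min, self.t_max = task_id, t_min, t_max
        self.start = now
        self.armed = True

    def pat(self, task_id, now):
        """Task reports completion; a wrong ID or wrong timing forces a reset."""
        elapsed = now - self.start if self.armed else None
        if (not self.armed                 # assumption: a stray pat is a fault
                or task_id != self.task_id
                or elapsed < self.t_min    # finished suspiciously early
                or elapsed > self.t_max):  # finished too late
            self.reset_asserted = True
        self.armed = False

    def tick(self, now):
        """Driven by the watchdog's own timebase; catches a missing pat."""
        if self.armed and now - self.start > self.t_max:
            self.reset_asserted = True


# Usage: task 7 is expected to take between 200 and 400 time units.
wd = WindowWatchdog()
wd.arm(task_id=7, t_min=200, t_max=400, now=0)
wd.pat(task_id=7, now=300)  # inside the window with the right ID: no reset
```

The lower bound catches a task that returns early because it skipped work, which a conventional ‘not later than tmax’ watchdog cannot detect.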
DMA transfers upset normal processor operation just as interrupts do and hence should be avoided where possible.
In ‘large’ SoCs, DMA must transfer chunks of data in order to maximize data throughput by using burst accesses but these bursts intrude on processor execution. In ‘small’ SoCs, data throughput is less significant than getting data into/out of peripherals quickly enough to minimize the amount of buffering within each peripheral. Techniques to assist Functional Safety include:
- Firmware sets up a single multi-channel controller with DMA transfers cycling around all DMA channels for all peripherals, including diagnostic memory-to-memory transfers. Consequently there are only two bus masters in the System on Chip.
- Accesses are arbitrated at the slaves rather than at the masters.
- Limit the DMA-controller’s access to just the DMA-able peripherals and memory (ideally just some banks within memory).
- Deterministic memory access: DMA transfers are completely transparent to the CPU. For example:
  - DMA accesses take precedence over CPU accesses and are never back-to-back.
  - DMA accesses are always zero wait-state.
  - CPU accesses to DMA-able SRAM might complete in 1 cycle when no DMA access is pending, but they are always forced to take 1 wait state.
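One reading of the deterministic-access rules above is that, because DMA accesses are single-cycle and never back-to-back, the SRAM is guaranteed to be free in at least one of the two cycles that a CPU access always occupies, so the CPU sees the same latency whether or not DMA is active. The sketch below checks that reading exhaustively on a small model; the function names and the cycle-level model are assumptions, not taken from the article.

```python
from itertools import product


def cpu_latency(dma_busy, start):
    """Cycles taken by a CPU access issued at cycle `start`.

    dma_busy[c] is True when DMA owns the SRAM in cycle c (DMA has priority).
    The CPU is always held for at least 1 wait state, i.e. 2 cycles in total.
    """
    cycle = start
    while cycle < len(dma_busy) and dma_busy[cycle]:
        cycle += 1                 # SRAM taken by DMA, CPU has to wait
    done = max(cycle, start + 1)   # forced wait state even if the SRAM was free
    return done - start + 1


def legal_patterns(n):
    """All n-cycle DMA activity patterns with no back-to-back DMA accesses."""
    for p in product([False, True], repeat=n):
        if not any(a and b for a, b in zip(p, p[1:])):
            yield p


# Every legal DMA pattern gives the CPU exactly the same 2-cycle access time.
latencies = {cpu_latency(p, s) for p in legal_patterns(6) for s in range(6)}

# Breaking the "never back-to-back" rule makes the CPU timing data-dependent.
stretched = cpu_latency((True, True, False), 0)
```

In this model `latencies` contains the single value 2, while the back-to-back pattern stretches the access to 3 cycles, which is the kind of timing variation the rules are meant to exclude.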
An existing serial communications controller IP will typically contain a transmitter, a receiver, FIFOs, control/status register blocks and a processor bus interface. Usually this IP will contain additional features that are not needed, or it may lack features such as a second receiver for diagnostic checking.
We want to be able to peel the outer layer of the IP away to reveal the basic IP components, which is where the real value of the IP resides. For example:
- Instantiate the basic IP components directly e.g. 1 transmitter and 2 receivers;
- Aggregate control and status registers into a single bank of SoC-level registers;
- Tie off unused control signals into the basic IP components rather than driving them from a register;
- Separate the control from data by only using the standard bus interface for data.
Even if the processor in the SoC has a Memory Protection Unit, it is desirable to apply memory protection mechanisms to the DMA controller and control registers as well.
SRAM interfaces can include memory protection to block DMA access to certain regions but this assumes that the bus it is connected to identifies the master (CPU or DMA) that bus transactions originate from. Instead, programmable ‘watermark addresses’ can determine the region(s) of memory that the DMA bus can write and read, before arbitration with the CPU bus.
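A watermark check of this kind amounts to a simple range comparison on the DMA bus address. The sketch below is a minimal model assuming a single window shared by reads and writes; the `DmaWindow` type, its field names and the example addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DmaWindow:
    """Programmable watermark addresses bounding the DMA-accessible region."""
    low: int   # lowest address the DMA bus may access (inclusive)
    high: int  # first address above the DMA region (exclusive)


def dma_access_allowed(window, addr, length=1):
    """True if the access [addr, addr + length) lies wholly inside the window."""
    return length > 0 and window.low <= addr and addr + length <= window.high


# Usage: only one bank of SRAM (hypothetical addresses) is DMA-able.
window = DmaWindow(low=0x2000, high=0x3000)
ok = dma_access_allowed(window, 0x2FFC, length=4)       # last word in the bank
blocked = dma_access_allowed(window, 0x2FFE, length=4)  # straddles the watermark
```

Because the check sits on the DMA bus before arbitration, it does not rely on the shared bus carrying a master identifier.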
Control registers of all peripherals should be aggregated into a System on Chip level control block which can:
- Permit specific registers to be locked down. Writing to a locked register causes an exception. The lockdown should be applied through a lock sequence rather than a single data value.
- Permit register addresses to be allocated so that there is a Hamming distance of at least 2 between the addresses of any two control registers.
- Allow ECC to be applied to these control registers.
And obviously ensure that exceptions are generated for all invalid addresses.
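One simple way to obtain a Hamming distance of at least 2 between register addresses, offered here only as an illustration, is to use addresses of even parity: any single-bit address error then lands on an unallocated address and raises an exception instead of silently selecting another register. The register names, the 5-bit address width and the `InvalidAddress` exception are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def even_parity_addresses(bits):
    """All `bits`-wide addresses with an even number of 1s (distance >= 2)."""
    return [a for a in range(1 << bits) if bin(a).count("1") % 2 == 0]


class InvalidAddress(Exception):
    """Raised for any access to an address with no register behind it."""


# Hypothetical SoC-level control block with four aggregated registers.
REG_NAMES = ("UART_CTRL", "ADC_CTRL", "DMA_CTRL", "WDT_CTRL")
REGISTER_MAP = dict(zip(even_parity_addresses(5), REG_NAMES))


def select_register(addr):
    """Decode an address, generating an exception for every invalid one."""
    if addr not in REGISTER_MAP:
        raise InvalidAddress(hex(addr))
    return REGISTER_MAP[addr]


# Usage: a correct address decodes; a single-bit error is caught.
name = select_register(0b00011)
try:
    select_register(0b00011 ^ 0b00100)  # one address bit flipped
    caught = False
except InvalidAddress:
    caught = True
```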
All internal volatile and non-volatile memory is normally protected by Error Correction Codes (ECC). Rather than having one ECC block per memory, it may be possible to move ECC closer to the bus masters and to reuse the SRAMs’ ECC encode/decode logic for the centralized control registers as well. In any case, it must be possible to bypass the ECC in order to write invalid codes, so that diagnostics can check that the ECC operates correctly.
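The need for an ECC bypass can be illustrated with a toy model. The sketch below uses a Hamming(7,4) single-error-correcting code purely as a stand-in (a real SoC would more likely protect wider words, often with double-error detection as well), and the `EccMemory` class and its `write_raw` bypass port are hypothetical names.

```python
DATA_POS = (3, 5, 6, 7)  # 1-indexed codeword positions holding the data bits


def syndrome(code):
    """XOR of the (1-indexed) positions of all set bits; 0 for a valid code."""
    s = 0
    for pos in range(1, 8):
        if (code >> (pos - 1)) & 1:
            s ^= pos
    return s


def encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    code = 0
    for i, pos in enumerate(DATA_POS):
        if (nibble >> i) & 1:
            code |= 1 << (pos - 1)
    s = syndrome(code)           # tells us which parity bits must be set
    for p in (1, 2, 4):
        if s & p:
            code |= 1 << (p - 1)
    return code


def decode(code):
    """Return (nibble, corrected), repairing at most one flipped bit."""
    s = syndrome(code)
    if s:
        code ^= 1 << (s - 1)     # the syndrome is the position of the bad bit
    nibble = 0
    for i, pos in enumerate(DATA_POS):
        nibble |= ((code >> (pos - 1)) & 1) << i
    return nibble, s != 0


class EccMemory:
    """Memory model with a normal ECC path and a diagnostic bypass path."""

    def __init__(self, size):
        self.cells = [encode(0)] * size

    def write(self, addr, nibble):
        self.cells[addr] = encode(nibble)

    def write_raw(self, addr, code):
        self.cells[addr] = code & 0x7F  # bypass: store the codeword unchecked

    def read(self, addr):
        return decode(self.cells[addr])


# Diagnostic: inject a single-bit error via the bypass and check it is repaired.
mem = EccMemory(4)
mem.write(2, 0xA)
mem.write_raw(2, mem.cells[2] ^ (1 << 4))
value, corrected = mem.read(2)
```

Without the raw write path the firmware could only ever store valid codewords, so the correction logic would never be exercised by the diagnostics.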
Reusing a proven implementation flow is also very important for Functional Safety.
A standard design flow that is fully automated – including through layout (place-and-route) – strongly encourages fixing issues properly (at source RTL) and helps reduce the number of tool warnings to a sensible number that can actually be reviewed properly.
Unfortunately, third-party IP generally creates difficulties here. Adherence to suitably strict coding and implementation rules is still the exception rather than the rule with such IP, whose authors generally rely too much on ‘downstream’ synthesis optimization. Too much still has to be taken on faith with IP, and warnings at the interfaces of the IP are particularly problematic. Generally speaking, third-party IP should be avoided if possible.
This article has covered some of the practicalities of System on Chip design. It is not usually possible to design the entire system from scratch, and it is equally impossible to design the system first and then retrofit Functional Safety features. What is needed instead is a top-down approach in which Functional Safety is taken into account when selecting and integrating IP, data transfers around the SoC are carefully planned and the supporting hardware is provided. This planning allows firmware to be written in a way that reduces risk and helps ensure that the SoC design fulfils all Functional Safety requirements.