Chilling out

One common question with serial RapidIO (sRIO) boards is how to support hot swap. Generally speaking, hot swapping should be easy: simply insert or remove the hardware, with some expectation that data transmission will restart.

The sRIO protocol targets high-speed chip-to-chip communication and data transfer. Compared with gigabit Ethernet and PCI Express, sRIO offers lower latency, less header overhead, higher efficiency and guaranteed delivery. Ironically, these key benefits are the crux of the hot swap headache. The devil is in the detail: if the user doesn’t understand how delivery is tracked, packets stop, ports go into error states and, potentially, the system goes down.

To achieve guaranteed delivery, sRIO uses Acknowledge Identification (AckID) counters to track each packet and ensure it has been delivered, ultimately to its destination. AckID is located in the physical layer, not in the packet header. Each port keeps track of its own AckID counters, and these values are used in the packet acknowledgment control symbol to update the packet transmit status between devices.

Each port has inbound, outbound and outstanding AckIDs. Inbound AckID is the next AckID value an input port expects. When the input port receives a packet, it uses that value as the packet ID and sends a response indicating whether or not the packet with that ID has been accepted. If the packet is accepted, inbound AckID increases by 1, ready for the next expected packet. Outbound AckID is the AckID value the output port will give its next packet. When the device sends a packet, outbound AckID increases (from 0 to 1 for the first packet), but outstanding AckID does not increase until a response arrives.

AckID is a 5-bit counter and rolls over to 0 after it reaches 31. When the sending port receives a packet-accepted control symbol, it compares the AckID in the response with its outstanding AckID. If they match, it increases outstanding AckID by 1. If not, it reports an ‘unexpected AckID error’.

The only requirement to support hot swap, then, is that the inbound and outbound AckIDs match on both sides of the link.

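
The counter behaviour described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the `Port` class, its method names and the one-packet-at-a-time exchange are hypothetical and do not correspond to any real sRIO device driver or register interface.

```python
ACKID_BITS = 5
ACKID_MOD = 1 << ACKID_BITS  # 32 values, 0..31, then roll over to 0


class Port:
    """Toy model of one port's AckID bookkeeping (hypothetical, not a real API)."""

    def __init__(self):
        self.inbound = 0      # next AckID this port expects to receive
        self.outbound = 0     # AckID to give the next packet this port sends
        self.outstanding = 0  # oldest sent AckID not yet acknowledged

    def send(self):
        """Return the AckID of the packet being sent and advance outbound."""
        ackid = self.outbound
        self.outbound = (self.outbound + 1) % ACKID_MOD
        return ackid

    def receive(self, ackid):
        """Accept the packet if its AckID is the expected one.

        Returns the AckID carried back in the acceptance response,
        or None when the packet is not accepted.
        """
        if ackid != self.inbound:
            return None
        self.inbound = (self.inbound + 1) % ACKID_MOD
        return ackid

    def on_accepted(self, ackid):
        """Handle an acceptance response on the sending side."""
        if ackid != self.outstanding:
            raise RuntimeError("unexpected AckID error")
        self.outstanding = (self.outstanding + 1) % ACKID_MOD


# Send 33 packets from a to b so the 5-bit counters pass 31 and roll over.
a, b = Port(), Port()
for _ in range(33):
    a.on_accepted(b.receive(a.send()))
```

Both sides advance in lockstep as long as every packet is accepted, which is exactly the condition that a restarted link partner breaks.
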
Serial RapidIO ports on a device that has been hot swapped, power cycled or reset will begin transmission with an AckID value of 0 and will expect to receive packets from the connected port beginning with an AckID value of 0. If the connected port was previously transmitting and receiving, it is unlikely to resume transmission with an AckID value of 0 or to expect the next packet it receives to have an AckID value of 0. The link therefore goes into an error state and traffic is halted. The sRIO spec does not provide for automatic hardware resynchronisation, but it does allow software to resynchronise the AckIDs so that transmission can resume. While this gives the user the ways and means to resolve a broken link, it does require software to manage hot swap activity. So those eval boards that lock up during hot swap in the lab are most likely not running the application layer software support.
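
As a rough sketch of what that software has to achieve, the model below shows a surviving port being realigned with a link partner that has restarted at 0. Everything here is an assumption made for illustration: `Port`, `in_sync` and `resync_to_fresh_partner` are hypothetical names, and real hardware would be resynchronised through the device's own link maintenance mechanisms, which are not modelled.

```python
from dataclasses import dataclass

ACKID_MOD = 32  # 5-bit AckID space, values 0..31


@dataclass
class Port:
    """Hypothetical snapshot of one port's AckID counters."""
    inbound: int = 0      # next AckID expected from the link partner
    outbound: int = 0     # AckID for the next packet to send
    outstanding: int = 0  # oldest sent AckID still awaiting acknowledgment


def in_sync(a: Port, b: Port) -> bool:
    """True when each side's outbound AckID equals the other's inbound AckID."""
    return a.outbound == b.inbound and b.outbound == a.inbound


def resync_to_fresh_partner(survivor: Port, fresh: Port) -> None:
    """Align the surviving port with a partner that restarted at 0.

    Assumption: any packets the survivor had in flight are abandoned,
    so outstanding is reset together with outbound.
    """
    survivor.outbound = fresh.inbound % ACKID_MOD
    survivor.outstanding = survivor.outbound
    survivor.inbound = fresh.outbound % ACKID_MOD


# A port that was mid-traffic when its partner was swapped out.
survivor = Port(inbound=17, outbound=9, outstanding=9)
fresh = Port()  # the replacement board starts with every counter at 0

link_ok_before = in_sync(survivor, fresh)
resync_to_fresh_partner(survivor, fresh)
link_ok_after = in_sync(survivor, fresh)
```

Here `link_ok_before` captures the mismatch that halts traffic, and `link_ok_after` the matched state the recovery software is aiming for.
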