Saturday, February 4, 2012

Even "Design Patterns" still have bugs. A common Queue bug.

Recently I picked up the book Design Patterns for Embedded Systems in C: An Embedded Software Engineering Toolkit by Bruce Powel Douglass. I thought I might pick up some new techniques by reading this book. Always looking for that non-existent magic bullet for creating bug free code all of the time. Alas I was disappointed at the very first Design Pattern, the Queue. The queue design pattern as implemented in the book has a significant bug. The second queue design pattern, will show the bug even faster.

The queue design pattern is the classic first-in/first-out single writer, single reader queue. This 'Design Pattern' has been around for decades. I first used it on a Motorola/Hitachi 6301 in assembly language. Today I use Dave Hylands 'cbuf.h' to implement my queues.

Typically such queses are used where the writer is inside an interrupt routine receiving data from an external process, via something like a UART, Ethernet etc. and the reader is outside of the interrupt. Such a queue allows for the data to be received when presented from an asynchronous source, so it is not lost, rather than processing the data in the interrupt, which would length its run time (always a bad thing). The reader of the queue processes the data when it has the time and resources to do so, outside of the interrupt.

So what is this nasty queue bug? It is the asynchronous usage of the 'size' counter both inside, and outside of the interrupt routine without any type of protection:

#define QUE_SIZE (512U)
uint16_t size_u16;
void que_write( uint8_t const data_u8 )

uint8_t que_read( void )

On parts where 'size' is incremented atomically, say as a 16 or 32 bit indivisible operation there is no problem at all. Our nasty race condition bug will show itself at apparently completely random times on 8-bit machines.

Jack Ganssle went into the gory details of this class of race condition in his article on Asynchronicity (Also take a look at Herb Sutter's multi-part article on Lock-Free Code: A False Sense of Security), so I'll just summarize the problem here. Which is, if the que_write() interrupt happens just as que_read() runs, while the queue size count is 0x00FF, the size will end up wrong. que_read() reads the value of 0x00FF which is interrupted by que_write() that increments the value to 0x0100. que_read() not knowing this happened decrements the 'size' count and saves it as 0x1FE or 0x00FE depending on the implementation. A byte has been lost, or 256 have been gained, in the 'size' count. Things could deteriorate farther when the head/tail full/empty flags no longer end up matching the 'size' count.

Also, properly sizing a queue can be tricky. If it is to small, the queue will overflow and data will be lost. To large and memory space is wasted.

In the case of 'cbuf.h' the size must always be a power of two. As I've covered before I like to use compile time assertions to keep from shooting myself in the foot. I use the following to make sure a non-power-of-two will fail to compile:

#define UART0_RX_BUFFER_SIZE (128U) /* Size of receive buffer [Defined in my project file] */

/* Make sure buffer is sized to be a power of two [In my actual code, in different file]: */
#error RX buffer size is not a power of 2

Something else that really bugs me about the book is that I ordered it in eBook format rather paper format. Elsevier eBook format is some creation of theirs that can not be read outside of their special reader for Windows or the Mac. So if you want to read on Linux, xBSD or a tablet, your just out of luck. Also as an anti-piracy measure if you print out anything, it comes out fuzzy. To make matters even worse I found they charged my credit card $1.19 for a fright charge! A freight charge for an eBook?? You'll be far better off to buy the real paper book, take a band-saw to the binding and scan it yourself that use this eBook carp. Maybe you'll want to join the Elsevier Boycott as well...