Showing posts with label GCC. Show all posts
Showing posts with label GCC. Show all posts

Sunday, February 27, 2011

The Anatomy of a Race Condition: Toyota vs AVR XMega

NASA has released their report on Toyota's sudden acceleration problem. The report indicates that there was no problems found with the electronics, hardware or software.
They blame the issue on user error and bad floor mats. As no problem was found, we can be 100% certain that no problems at all exist in the hundreds of thousands of lines of software code, in the vehicles electronics, right?
"Because proof that the ETCS-i caused the reported UAs [Unintended Accelerations] was not found does not mean it could not occur." - pg 17.
"Today's vehicles are sufficiently complex that no reasonable amount of analysis or testing can prove electronics and software have no errors. Therefore, absence of proof that the ETCS-i has caused a UA does not vindicate the system." - pg. 20.
Something that I find most annoying is that the areas where the embedded system hardware is discussed the most, is the area of the most redaction (blacked out sections). Why?

If a problem was to still be lurking, unfound, it could be what is known as a Software Race Condition. What does a software race condition actually look like?

We can find an easy example to pick apart in the AVR-LibC bug tracker, bug#29774: "Prologue/epilogue stack pointer manipulation not interrupt safe in [AVR] XMega".

To understand the problem here, you need a bit of historical background. In AVR's prior to the XMega, when an Enable Interrupt instruction was executed, the instruction following the Enable was guaranteed to execute with interrupts still turned off. In the mists of time someone thought it was a cool hack to save an instruction cycle by restoring half of the stack pointer, enabling interrupts,then restoring the other half of the stack pointer. The problem with such novel hacks is they invariably come back to bite you in the future.
Like a bad Soap-Opera story you can probably already see where thisis going? In the XMega when interrupts are enabled the following instruction is not guaranteed to execute before an interrupt occurs. Now the stage is set for the race condition.

The current generation of XMega parts can run code in a singlec ycle at up to 32 MHz. That means we have at minimum one 1/32 MHz, or31.25 nano-second window for the software race to happen. In a complex system there are probably more than one interrupt enable happening. To add more pain, the XMega can nest interrupts three levels deep.

You see that if an interrupt occurs exactly at the point where interrupts are enabled, only half of the stack pointer has been restored. So the new interrupt saves its registers someplace,odds are high it is not the right place! The new interrupt eventually returns, tries to restore its registers,from someplace that might have been read-only-memory, and bang we are off to the races with a crashed system doing who knows what.Maybe a full open throttle? No message shows up in any logs because there was no event logged through a call to the event logging system,as this was never an anticipated event; "systematic software malfunction in the main central processor unit (CPU) that is not detected by the monitor system".

Due to the short length of the 31.25 ns race window possibility, a crash may never happen, may happen every 18 hours and 22 minutes, or as often as I win the lottery [Give Wheeling Systems a try]. It could take some certain combination of options and users actions to cause the conditions of enabling interrupts while returning from an interrupt, while getting an interrupt. Turn the radio dial, press the brake peddle while the over automated headlights turn themselves on perhaps?

While the this bug was actually reported in the AVR-LibC bug tracker, the problem is actually within the AVR port of GCC. Specifically the file gcc-version/gcc/config/avr/libgcc.S.

I fixed my copy of WinAVR-GCC with a hex editor, so my projects would not suffer for this bug. Realistically how many other people will have done that? Not many I would guess. It is impossible to tell from the hideous Atmel website (all glitz, no useful information) what the state of the bug truly might be today.

For those that want to fix the problem the solution is to simply write the lower half of the stack pointer first: "To prevent corruption when updating the Stack Pointer from software, a write to SPL will automatically disable interrupts for up to 4 instructions or until the next I/O memory write". As GCC does not yet nativity support the XMega, the XMega features are maintained as a set of patches. Those patches have been updated to fix the problem.

It would be easy for some to say that one should not use Open Source compilers for real production projects, as I've seen a few prominent people state. I have a copy of IAR's AVR compiler, at no small price tag, that I've seen produce complete crap for output. So just because you paid, perhaps a lot of money, for it doesn't mean it is error free.

Some standards require that the code generated by the tool be inspected. At what level of detail is the question? I once actually ran into an assembler that produced correct listings, however the generated .HEX file was wrong. That problem took days to find. Disassembling, with an independent tool from a different provider, the generated .HEX file is one option, however it is not always easy to figure out what optimized compiler code is doing in a reasonable amount of time.

What kind of tool problems have you ran into?

Now returning back to Toyota. Section 6.7.1.2 tells us that a Renesas,formally NEC, V850E1, and GreenHills ISO/ANSI Compiler are used for the control software of interest to us. Alas the section that might shed light on Race Conditions is completely redacted.

Starting on page 112 Tin Whiskers become a prominent failure mechanism. Keep in mind that according to page 19 of the report only six vehicles were analyzed. The whisker problem discussed from a seventh vehicle accelerator assembly only.

Tin (and Other Metal Whisker) Whisker are such a problem that NASA has given them their own Homepage. Tin Whiskering on PCBA Capacitors in Storage by Terry Munson gives a different, but still depressing, view of the Tin Whisker problem.

The comments about the whiskers over at Circuit Assembly Magazine are also educational, that I recommend that you read, for example:

Brett Emison pokes several holes in the NASA report, What NASA's Report Said About Toyota Sudden Acceleration for example:
The NHTSA/NASA report did little to address issues documented by drivers who actually experienced an unexplained sudden unintended acceleration event.
NASA's findings do not solve the question of what caused Kevin Haggerty's well documented sudden acceleration event. Haggerty owned a 2007 Toyota Avalon that experienced at least 5 different sudden acceleration events. Haggerty did not have accessory floor mats and his OEM mats were secured in place. Sticky pedals couldn't have caused the problem because he didn't have his foot on the pedal. On Haggerty's final incident, he was actually able to drive the vehicle while the engine was racing out of control into his local Toyota dealership.
He got to the parking lot, shifted to neutral and stopped the car with its brake smoking and engine racing out of control. He got out of the car and the engine was still racing (no pedal misapplication) Service technicians were able to look at he car and confirm the unintended acceleration was not caused by floor mats, sticking pedals or driver error. They also confirmed no computer error codes (meaning the computer was not detecting whatever was causing the problem).
Bob Landman
LDF Coatings, LLC
http://www.ldfcoatings.com
I do wonder if NASA applied their Software Safety Guidebook to Toyota's source code? I assume they did not, as it is not listed among the techniques they did apply.

...Tune in tomorrow for an other episode of As The Stack Churns...

Sunday, April 19, 2009

The Power of Ten 10 Rules for Writing Safety Critical Code

I just came across a site in an ad on Embedded.com that every reader of this blog needs to check out:

The Power of Ten 10 Rules for Writing Safety Critical Code.

Their rule #10 comments supports our position you want every useful compiler warning you can get.

Do you just ignore Compiler Warnings?

Something I just saw on the AVR-GCC list:

"I mean the compiler gives some of the most stupid warnings, such as , when a function that is declared but not used..."

or a past favorite of mine "It is only a warning, just ignore it". Yellow colored traffic lights are "only warnings", that most people do seem to ignore, and governments are gaming to enhance revenue; sorry wrong blog...

I have always had a zero tolerance for warnings in code. If you have a warning in your code, your code is broken.

If you are using GCC here are the warnings you can enable, that I use in my own Makefiles:

Make sure you have -W and -Wall in your CFLAGS.

CFLAGS += -W -Wall -Wstrict-prototypes -Wchar-subscripts

I generally run with every warning/error message turned on with the exception of pedantic and unreachable-code. The later frequently gives bogus results and the former goes off on commonly accepted code.

# -Werror : Make all warnings into errors.
CFLAGS +=  -Werror

# -pedantic : Issue all the mandatory diagnostics listed in the C
# standard. Some of them are left out by default, since they trigger frequently
# on harmless code.
#
# -pedantic-errors : Issue all the mandatory diagnostics, and make all
# mandatory diagnostics into errors. This includes mandatory diagnostics that
# GCC issues without -pedantic but treats as warnings.
#CFLAGS +=  -pedantic

#-Wunreachable-code
#Warn if the compiler detects that code will never be executed. [Seems
to give bogus results]
#CFLAGS += -Wunreachable-code

#Warn if an undefined identifier is evaluated in an `#if' directive.
CFLAGS += -Wundef

# Dump the address, size, and relative cost of each statement into comments in
# the generated assembler code. Used for debugging avr-gcc.
CFLAGS += -msize

# -Winline : Warn when a function marked inline could not be
#        substituted, and will give the reason for the failure.
CFLAGS +=  -Winline


Most of the following are turnned on via -Wall:

# Functions prologues/epilogues expanded as call to appropriate
# subroutines. Code size will be smaller.  Use subroutines for function
# prologue/epilogue. For complex functions that use many registers (that needs
# to be saved/restored on function entry/exit), this saves some space at the
# cost of a slightly increased execution time.
CFLAGS += -mcall-prologues

# Use rjmp/rcall (limited range) on >8K devices. On avr2 and avr4 architectures
# (less than 8 KB or flash memory), this is always the case. On avr3 and avr5
# architectures, calls and jumps to targets outside the current function will
# by default use jmp/call instructions that can cover the entire address range,
# but that require more flash ROM and execution time.
#CFLAGS += -mshort-calls

# Do not generate tablejump instructions. By default, jump tables can be used
# to optimize switch statements. When turned off, sequences of compare
# statements are used instead. Jump tables are usually faster to execute on
# average, but in particular for switch statements where most of the jumps
# would go to the default label, they might waste a bit of flash memory.
# CFLAGS += -mno-tablejump

# Allocate to an enum type only as many bytes as it needs for the declared
# range of possible values. Specifically, the enum type will be equivalent to
# the smallest integer type which has enough room.
# CFLAGS += -fshort-enums

# Dump the address, size, and relative cost of each statement into comments in
# the generated assembler code. Used for debugging avr-gcc.
CFLAGS += -msize

# Dump the internal compilation result called "RTL" into comments in the
# generated assembler code. Used for debugging avr-gcc.
# CFLAGS += -mrtl

# Generate lots of debugging information to stderr.
#CFLAGS += -mdeb

#-Wchar-subscripts
#Warn if an array subscript has type char. This is a common cause of
error, as programmers often forget that this type is signed on some
machines. This warning is enabled by -Wall.
#
#-Wcomment
#Warn whenever a comment-start sequence `/*' appears in a `/*'
comment, or whenever a Backslash-Newline appears in a `//' comment.
This warning is enabled by -Wall.
#
#-Wfatal-errors
#This option causes the compiler to abort compilation on the first
error occurred rather than trying to keep going and printing further
error messages.
#
#-Wformat
#Check calls to printf and scanf, etc., to make sure that the
arguments supplied have types appropriate to the format string
specified, and that the conversions specified in the format string
make sense.
#
#-Winit-self (C, C++, Objective-C and Objective-C++ only)
#Warn about uninitialized variables which are initialized with
themselves. Note this option can only be used with the -Wuninitialized
option, which in turn only works with -O1 and above.
#
#-Wimplicit-int
#Warn when a declaration does not specify a type. This warning is
enabled by -Wall.
#
#-Wimplicit-function-declaration
#-Werror-implicit-function-declaration
#Give a warning (or error) whenever a function is used before being
declared. The form -Wno-error-implicit-function-declaration is not
supported. This warning is enabled by -Wall (as a warning, not an
error).
#
#-Wimplicit
#Same as -Wimplicit-int and -Wimplicit-function-declaration. This
warning is enabled by -Wall.
#
#-Wmain
#Warn if the type of `main' is suspicious. `main' should be a function
with external linkage, returning int, taking either zero arguments,
two, or three arguments of appropriate types. This warning is enabled
by -Wall.
#
#-Wmissing-braces
#Warn if an aggregate or union initializer is not fully bracketed. In
the following example, the initializer for `a' is not fully bracketed,
but that for `b' is fully bracketed.
#
#          int a[2][2] = { 0, 1, 2, 3 };
#          int b[2][2] = { { 0, 1 }, { 2, 3 } };
#
#This warning is enabled by -Wall.
#
#-Wmissing-include-dirs (C, C++, Objective-C and Objective-C++ only)
#Warn if a user-supplied include directory does not exist.
#
#
#-Wparentheses
#Warn if parentheses are omitted in certain contexts, such as when
there is an assignment in a context where a truth value is expected,
or when operators are nested whose precedence people often get
confused about.
#
#This warning is enabled by -Wall.
#
#-Wsequence-point
#Warn about code that may have undefined semantics because of
violations of sequence point rules in the C standard.
#
#This warning is enabled by -Wall.
#
#-Wreturn-type
#Warn whenever a function is defined with a return-type that defaults
to int. Also warn about any return statement with no return-value in a
function whose return-type is not void.
#
#This warning is enabled by -Wall.
#
#-Wswitch
#Warn whenever a switch statement has an index of enumerated type and
lacks a case for one or more of the named codes of that enumeration.
(The presence of a default label prevents this warning.) case labels
outside the enumeration range also provoke warnings when this option
is used. This warning is enabled by -Wall.
#
#-Wswitch-default
#Warn whenever a switch statement does not have a default case.
#
#-Wswitch-enum
#Warn whenever a switch statement has an index of enumerated type and
lacks a case for one or more of the named codes of that enumeration.
case labels outside the enumeration range also provoke warnings when
this option is used.
#
#-Wtrigraphs
#Warn if any trigraphs are encountered that might change the meaning
of the program (trigraphs within comments are not warned about). This
warning is enabled by -Wall.
#
#-Wunused-function
#Warn whenever a static function is declared but not defined or a
non-inline static function is unused. This warning is enabled by
-Wall.
#
#-Wunused-label
#Warn whenever a label is declared but not used. This warning is
enabled by -Wall.
#
#-Wunused-parameter
#Warn whenever a function parameter is unused aside from its declaration.
#
#-Wunused-variable
#Warn whenever a local variable or non-constant static variable is
unused aside from its declaration This warning is enabled by -Wall.
#
#-Wunused-value
#Warn whenever a statement computes a result that is explicitly not
used. This warning is enabled by -Wall.
#
#To suppress this warning cast the expression to `void'.
#
#-Wunused
#All the above -Wunused options combined.
#
#-Wuninitialized
#Warn if an automatic variable is used without first being initialized
or if a variable may be clobbered by a setjmp call.
#
#This warning is enabled by -Wall.
#
#-Wstring-literal-comparison
#Warn about suspicious comparisons to string literal constants. In C,
direct comparisons against the memory address of a string literal,
such as if (x == "abc"), typically indicate a programmer error, and
even when intentional, result in unspecified behavior and are not
portable.
#
#-Wall
# All of the above `-W' options combined. This enables all the warnings about
# constructions that some users consider questionable, and that are easy to
# avoid (or modify to prevent the warning), even in conjunction with macros.
# This also enables some language-specific warnings described in C++ Dialect
# Options and Objective-C and Objective-C++ Dialect Options.
#
#
#
#
#-Wextra
#-Wfloat-equal
#Warn if floating point values are used in equality comparisons.
#
#-Wtraditional (C only)
#Warn about certain constructs that behave differently in traditional
and ISO C. Also warn about ISO C constructs that have no traditional C
equivalent, and/or problematic constructs which should be avoided.
#
#-Wdeclaration-after-statement (C only)
#Warn when a declaration is found after a statement in a block
#
#
#-Wshadow
#Warn whenever a local variable shadows another local variable,
parameter or global variable or whenever a built-in function is
shadowed.
#
#
#-Wunsafe-loop-optimizations
#Warn if the loop cannot be optimized because the compiler could not
assume anything on the bounds of the loop indices.
#
#
#-Wpointer-arith
#Warn about anything that depends on the 'size of' a function type or
of void. GNU C assigns these types a size of 1, for convenience in
calculations with void * pointers and pointers to functions.
#
#-Wbad-function-cast (C only)
#Warn whenever a function call is cast to a non-matching type. For
example, warn if int malloc() is cast to anything *.
#
#-Wcast-qual
#Warn whenever a pointer is cast so as to remove a type qualifier from
the target type. For example, warn if a const char * is cast to an
ordinary char *.
#
#-Wcast-align
#Warn whenever a pointer is cast such that the required alignment of
the target is increased. For example, warn if a char * is cast to an
int * on machines where integers can only be accessed at two- or
four-byte boundaries.
#
#-Wwrite-strings
#When compiling C, give string constants the type const char[length]
so that copying the address of one into a non-const char * pointer
will get a warning; when compiling C++, warn about the deprecated
conversion from string constants to char *. These warnings will help
you find at compile time code that can try to write into a string
constant, but only if you have been very careful about using const in
declarations and prototypes. Otherwise, it will just be a nuisance;
this is why we did not make -Wall request these warnings.
#
#-Wconversion
#Warn if a prototype causes a type conversion that is different from
what would happen to the same argument in the absence of a prototype.
This includes conversions of fixed point to floating and vice versa,
and conversions changing the width or signedness of a fixed point
argument except when the same as the default promotion.
#
#
#-Wsign-compare
#Warn when a comparison between signed and unsigned values could
produce an incorrect result when the signed value is converted to
unsigned. This warning is also enabled by -Wextra; to get the other
warnings of -Wextra without this warning, use `-Wextra
-Wno-sign-compare'.
#
#
#-Waggregate-return
#Warn if any functions that return structures or unions are defined or
called. (In languages where you can return an array, this also elicits
a warning.)
#
#
#-Wstrict-prototypes (C only)
#Warn if a function is declared or defined without specifying the
argument types. (An old-style function definition is permitted without
a warning if preceded by a declaration which specifies the argument
types.)
#
#-Wold-style-definition (C only)
#Warn if an old-style function definition is used. A warning is given
even if there is a previous prototype.
#
#-Wmissing-prototypes (C only)
#Warn if a global function is defined without a previous prototype
declaration. This warning is issued even if the definition itself
provides a prototype. The aim is to detect global functions that fail
to be declared in header files.
#
#-Wmissing-declarations (C only)
#Warn if a global function is defined without a previous declaration.
Do so even if the definition itself provides a prototype. Use this
option to detect global functions that are not declared in header
files.
#
#-Wmissing-field-initializers
#Warn if a structure's initializer has some fields missing.
#
#-Wmissing-noreturn
#Warn about functions which might be candidates for attribute
noreturn. Note these are only possible candidates, not absolute ones.
Care should be taken to manually verify functions actually do not ever
return before adding the noreturn attribute, otherwise subtle code
generation bugs could be introduced. You will not get a warning for
main in hosted C environments.
#
#-Wmissing-format-attribute
#Warn about function pointers which might be candidates for format
attributes. Note these are only possible candidates, not absolute
ones.
#
#-Wpacked
#Warn if a structure is given the packed attribute, but the packed
attribute has no effect on the layout or size of the structure.
#
#-Wpadded
#Warn if padding is included in a structure, either to align an
element of the structure or to align the whole structure. Sometimes
when this happens it is possible to rearrange the fields of the
structure to reduce the padding and so make the structure smaller.
#
#-Wredundant-decls
#Warn if anything is declared more than once in the same scope, even
in cases where multiple declaration is valid and changes nothing.
#
#-Wnested-externs (C only)
#Warn if an extern declaration is encountered within a function.
#
#-Wunreachable-code
#Warn if the compiler detects that code will never be executed.
#
#-Winline
#Warn if a function can not be inlined and it was declared as inline.
Even with this option, the compiler will not warn about failures to
inline functions declared in system headers.
#
#-Winvalid-pch
#Warn if a precompiled header (see Precompiled Headers) is found in
the search path but can't be used.
#
#-Wvolatile-register-var
#Warn if a register variable is declared volatile. The volatile
modifier does not inhibit all optimizations that may eliminate reads
and/or writes to register variables.
#
#-Wdisabled-optimization
#Warn if a requested optimization pass is disabled. This warning does
not generally indicate that there is anything wrong with your code; it
merely indicates that GCC's optimizers were unable to handle the code
effectively. Often, the problem is that your code is too big or too
complex; GCC will refuse to optimize programs when the optimization
itself is likely to take inordinate amounts of time.
#
#-Wstack-protector
#This option is only active when -fstack-protector is active. It warns
#about functions that will not be protected against stack smashing.

Saturday, February 28, 2009

In C are you a Righty or Lefty?

Do you write your code, like almost everyone does, like this (Those are Zeros if you have a funky font):
if( x == 0 ){...}
or do you do it correctly and do it this way?:
if( 0 == x ){...}
Why is the latter the correct way? It prevents you from making this mistake:
if( x = 0 ){...}
"Unless Debugging is an Obsession" put the constants on the left in any conditional test. Also use a lot of parentheses, you can never have to many parentheses, if there is more than one condition in the test. When you put the condition on the left, the compiler will refuse to compile the code at all, because you can not assign a value to a constant. Putting the constants on the right may elicit a warning if you are using a good quality compiler, if your lucky. I've been giving out this advice for years. The responses have been interesting:
I've never made that mistake. I don't need such crutches. -- AVR GCC List It does not read right. -- Well known Compiler Guru, in private email.
What is wrong with reading it as "if Zero is equivalent to X"? Do you want to ship products on time and under budget, or do you want to write code in the way that everyone else does?

Friday, December 19, 2008

Embedded System Compilers generate dangerous code

Volatiles Are Miscompiled, and What to Do about It by Eric Eide and John Regehr raises some troubling concerns about the tools that form the cornerstone of many Embedded Systems that we depend on daily. They ask the question Why are compilers so buggy? then follow up with several reasons. Mostly due to the badly generated code that involves the C keyword 'volatile'. Code is presented to test and in some cases correct this hidden software danger.