Saturday, February 27, 2010

Explaining Interrupts to Non-Programmers, or "I got a go Potty!".

I needed to explain interrupts to people who know nothing about how computers work. Can you think of a better example?

EMI has been getting a lot of blame for the Toyota problems. I speculate that the problem is something even more insidious: an interrupt race condition.

Designing for EMI compliance is not new, as this simple, if archaic, introduction to the subject from 1999 shows: http://www.zilog.com/docs/appnotes/an_noise_imm.pdf

What those of us in the Embedded System industry want to see is the Toyota Source Code. The standard corporate argument is that this would be a "Trade Secret", and would lead to a loss of revenue if made available. Perhaps. But in Toyota's case they've already lost trust and a substantial amount of revenue. The only way to regain trust is an independent analysis of their software.

There are experts in such systems, like myself, who know how to look for things like improperly handled timer overflow interrupts, or other race conditions. Looking at the source code is what needs to be done to put the question of a software problem to rest. What we lack is the source code from Toyota to analyze.

Even without EMI causing problems for Embedded Systems (ECUs), there are other, more subtle ways that things can go wrong with software that is not correctly written.

In any modern system there are things known as "Interrupts" running. The timing of interrupts can be tricky for those without experience. You could unknowingly create a single-instruction window that lasts only nanoseconds; if an interrupt falls in *exactly* the wrong place, things go bad and do not recover. It may take exactly the right sequence of events, falling exactly in that aberrant timing window, to trigger the failure, which makes it *extremely* hard to reproduce.
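Here is a minimal simulation of such a window, written in Python rather than as real firmware. It assumes a 16-bit hardware timer extended by a software overflow counter that an overflow interrupt increments. The names `Timer`, `read_naive`, and `read_safe` are hypothetical, invented for this illustration, and have nothing to do with any actual ECU code.

```python
class Timer:
    """Simulated 16-bit hardware counter plus a software overflow count."""

    def __init__(self, count=0, overflows=0):
        self.count = count          # what the hardware register would hold
        self.overflows = overflows  # what the overflow interrupt maintains

    def tick(self):
        """Advance one timer tick; on wrap-around, act as the overflow ISR."""
        self.count = (self.count + 1) & 0xFFFF
        if self.count == 0:
            self.overflows += 1


def read_naive(timer, ticks_between_reads):
    """Read the high and low halves separately, with no protection."""
    high = timer.overflows
    for _ in range(ticks_between_reads):  # time passes between the two reads
        timer.tick()
    low = timer.count
    return (high << 16) | low


def read_safe(timer, ticks_between_reads):
    """Re-read until the overflow count is unchanged across the read."""
    while True:
        high = timer.overflows
        for _ in range(ticks_between_reads):
            timer.tick()
        low = timer.count
        if high == timer.overflows:  # no overflow interrupt slipped in
            return (high << 16) | low


if __name__ == "__main__":
    # One tick before wrap-around: the overflow lands between the two reads.
    print(hex(read_naive(Timer(count=0xFFFF), 1)))
    print(hex(read_safe(Timer(count=0xFFFF), 1)))
```

With the timer one tick from wrapping, the naive read pairs the old overflow count with the new counter value and reports 0, so time appears to jump backwards by 65536 ticks. Started anywhere else in the count, the same code gives the right answer, which is why this kind of bug can survive a great deal of testing.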

Let's put this in a non-programming perspective:

Let's say you and a group of friends are all sitting around your living room talking, drinking some healthy Green Tea. We will call this a running program. Now your two-year-old daughter enters the room and forcefully announces for all to hear "I got go potty!". Your conversation has been "interrupted", and unless you like it messy, you immediately stop what you are doing, put your tea down, and go service your daughter's request. Once her needs have been dealt with, you return to your friends, pick up your tea, and continue your conversation where you left off.

The point at which your daughter makes her request is completely random in relation to the conversation that is taking place. There might be one particular spot in the conversation where handling the request could have a worse overall long-term outcome, such as the moment you have just spilled boiling hot tea on someone. Such can be the timing of interrupts.

Program Runs: Chat with friends while drinking tea.
Program is interrupted: I got to go potty!
Save Program State: Pause conversation, put down tea.
Handle interrupt request: Take daughter to bathroom; need I say more here?
Interrupt ends.
Restore original program state: Pick up tea, continue conversation.
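For readers who do program, the steps above can be sketched as a toy model. This is only an illustration of the save, handle, restore sequence; `service_interrupt` and `potty_break` are made-up names, and a real processor saves registers on a stack rather than copying a dictionary.

```python
def service_interrupt(state, handler):
    """Run a handler, then put the interrupted program's state back."""
    saved = dict(state)   # Save program state
    handler(state)        # Handle interrupt request (free to change anything)
    state.clear()
    state.update(saved)   # Restore original program state


def potty_break(state):
    """The handler: it tramples on whatever the program was doing."""
    state["holding_tea"] = False
    state["topic"] = "bathroom"


if __name__ == "__main__":
    program = {"holding_tea": True, "topic": "green tea"}
    service_interrupt(program, potty_break)
    print(program)  # the conversation resumes exactly where it left off
```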
http://www.softwaresafety.net

Saturday, February 20, 2010

Software Quality Assurance Outline

I came across a short introduction to Software Quality Assurance that is worth skimming if you are new to the field.

Alas, the site is abysmal: the document is hard to see under all of the ads, and the site wants you to register to download it. In almost all cases when I'm asked to register at a site, I just move along. I have no idea what they want my data for, nor how they plan on securing it; in my view they just don't need it.

Another almost useful but abysmal site, because most of the links either do not work or lead to pay-per-view sites, is PDFGeni; for example, Software Safety Assessment. At least they list one of the works I was involved with several years ago, Mine Safety and Health Administration (MSHA) – ACC -System Safety .... You can find more about that at my hardware site.

Wednesday, February 10, 2010

Prius software bug?

For several *years* Toyota has taken the standard approach to potential software bugs: "It is the user's fault." I recall discussing the purported "floor mat" problem a few years ago with someone else in the Embedded System industry.

Myself, I think the cause that will ultimately be found will be EMI, probably from intermodulation, or "intermod" (frequency mixing), which makes simulating the problem in the lab very difficult.

Something that has been missed by most is that Ford licenses some of Toyota's technology, and they too have issued a recall.

For anyone with the time and inclination to dig into the official reports, this is the place:

http://www-odi.nhtsa.dot.gov/cars/problems/defect/

The Investigations Search Engine will allow searches of current and past NHTSA Investigations of vehicles, tires and equipment opened since 1972, by single year, make or model. An optional item of Vehicle Component may be selected to help narrow the focus of the search.

Sunday, February 7, 2010

"What You See Is Not What You Execute (WYSINWYX)"

Embedded.com [Which annoyingly refuses to load in Opera.] has posted an article worth reading: When good compilers go bad, or What you see is not what you execute by Paul Anderson and Thomas W. Reps. Also don't miss the comments at the end of the article.

Paul and Thomas cover some interesting examples, where the compiler produces correct but unexpected code, especially in security applications.

To me it seems to come down to writing your own security functions in assembly language.

Alas even writing in assembly language is no guarantee that What You See Is What You Execute (WYSIWYX). Long ago I had an 1805 based project that I was developing using an Avocet Systems assembler.

The assembler generated a correct listing file, so I blamed my own code for days for the project continually crashing. When I'd simplified the code to the point that almost none of it was left, I found that the HEX file being produced was wrong! The HEX file had the high and low bytes swapped in a long jump instruction, just the opposite of what the listing showed! To Avocet's credit they did quickly supply a fix.
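A rough sketch of the kind of independent check that would have caught this sooner: decode the HEX record yourself and compare it with the bytes in the listing. This assumes the HEX file was in Intel HEX format, which I have not confirmed for that assembler. The function names and the three example bytes (an opcode of 0xC0 followed by a 16-bit target of 0x1234) are illustrative assumptions, not the actual code from that project.

```python
def parse_hex_record(line):
    """Decode one Intel HEX record into (address, record_type, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5:
        raise ValueError("record too short")
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    length = raw[0]
    if len(raw) != length + 5:
        raise ValueError("length field does not match record size")
    address = (raw[1] << 8) | raw[2]
    record_type = raw[3]
    data = raw[4:4 + length]
    return address, record_type, data


def matches_listing(record, address, expected):
    """True if a data record holds exactly the bytes the listing shows."""
    rec_address, rec_type, data = parse_hex_record(record)
    return rec_type == 0x00 and rec_address == address and data == expected


if __name__ == "__main__":
    listing = bytes([0xC0, 0x12, 0x34])  # what the listing file claims
    print(matches_listing(":03000000C01234F7", 0x0000, listing))
    print(matches_listing(":03000000C03412F7", 0x0000, listing))
```

Note that both example records carry the same checksum, F7. The record checksum is a simple byte sum, so swapping two bytes leaves it unchanged, and a loader that only verifies checksums would accept the bad file without complaint.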

In that same 1805 era I also had a "bug" that turned out to really be a bad 1805. The 1805 XOR instruction was broken, but only on certain bit patterns! This is why some safety standards mandate that the execution unit be tested at power on. They then go on to mandate that the system must be operational in under one second. An exhaustive test of today's complex micros is not practical to do in under a second. The Paper Pushers never cease to find ways to make conflicting requirements...
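To give a feel for what a pattern-based power-on check looks like, here is a small sketch in Python. It is a model of the idea only: `xor_self_test` and `faulty_xor` are hypothetical names, the failing pattern is invented, and on real hardware the expected values would normally come from a precomputed table, since the same execution unit that is under suspicion would otherwise be computing the reference.

```python
def xor_self_test(xor_under_test, width=8):
    """Check an XOR implementation against a small set of bit patterns.

    Returns None on success, or the first (a, b) pair that failed.
    """
    mask = (1 << width) - 1
    patterns = [0x00, mask, 0x55 & mask, 0xAA & mask]   # all-0, all-1, checkerboards
    patterns += [1 << bit for bit in range(width)]      # walking ones
    for a in patterns:
        for b in patterns:
            expected = (a | b) & ~(a & b) & mask        # XOR built from OR/AND/NOT
            if xor_under_test(a, b) & mask != expected:
                return (a, b)
    return None


def faulty_xor(a, b):
    """Simulated bad part: wrong answer on one specific pair of operands."""
    if (a, b) == (0x55, 0xAA):
        return 0x00
    return a ^ b


if __name__ == "__main__":
    print(xor_self_test(lambda a, b: a ^ b))
    print(xor_self_test(faulty_xor))
```

Even this toy version makes the conflict visible: twelve patterns give 144 operand pairs for one instruction at one width, and a fault that shows up only on some other pair sails straight through.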

Getting back to the WYSINWYX issue: in a safety application it is always good, and sometimes required, to look at the generated output in an independent tool, something like the IDA Pro disassembler.

Why would you want to look at the results in an independent tool?

I ran into a problem with an expensive commercial circuit board program. Somehow I managed to get a big honking hole in the middle of the board, but neither the layout tool nor the built-in Gerber viewer showed the problem, because they both used the same rendering engine. After that day I learned to always view my Gerber files in an independent Gerber tool. I've also switched to higher quality Open Source layout tools.

"Don't let these disasters happen to you: A pox on modern engineering"

If you have not seen the two-part series Don't let these disasters happen to you: A pox on modern engineering by Lewin A.R.W. Edwards, from 2006, on IBM's developerWorks, you should check it out:

Part 1
Part 2
  • It's no longer possible to build any practical device that doesn't rely on patented technology
  • System complexity issues
  • Sustained high production volume from an established vendor is not, by itself, a guarantee of quality
  • Requirements that are not driven by engineering goals are constantly injected into the design process
  • Purchasing and engineering are frequently separated
  • Modern components are designed for modern mass-production techniques
  • Service information is becoming increasingly difficult to obtain and less useful when you do manage to find it
  • The feedback loop between customers and engineering is often nonexistent today

Saturday, February 6, 2010

Would you like a competitive quote to a free product?

Continuing on our rant about tool prices, I got a call from IAR Systems this week.

The IAR person wanted to know what we thought of their compiler, as we have three seats for the AVR. I told the fellow that I was only using it in a single project now, and that I'd switched all but one of my projects to the Open Source WinAVR compiler, based on GCC.

They then asked me if I'd like a competitive quote to WinAVR. When I asked how you have a competitive quote to something that is free, my question was met with silence. Kind of sums up my view of IAR.

Texas Instruments SafeRTOS Contest

Texas Instruments is sponsoring the DESIGNStellaris 2010 design contest run by Circuit Cellar, with a prize of up to $10,000 USD.

I bring this up in our Software Safety blog because the Stellaris ARM chip comes preloaded with SafeRTOS, an offshoot of the FreeRTOS project.

"SafeRTOS is a unique real-time, deterministic operating system specially designed for critical applications. It is available pre-certified according to key standards in markets including Industrial and Medical. First certified by TÜV SÜD in 2007, SafeRTOS was developed in compliance with IEC61508 SIL3 [Safety Integrity Level], and it continues to set the pace as the first pre-certified real-time operating system available in the ROM of a micro-controller. The Texas Instruments LM3S9B96 is now supplied with SAFERTOS embedded in ROM at no additional cost, saving tens of thousands of dollars and offering a low risk path to certification."

Oddly this means that TI is competing directly with themselves.

A few years ago TI came out with their virtually unknown dual-core ARM part, the TMS570, aimed at automotive safety applications.

"Running in asynchronous mode, the TMS570 device was the first to achieve IEC 61508 compliance, the highest level of safety and reliability for automotive applications. The TMS570 is the first homogeneous Cortex™ ARM® R4 based MCU target safety critical automotive applications with a patent-pending implementation of the lock-step cores."

The TI contest comes with tools limited to six months of use. Something that has always irritated me is that chip vendors, for the most part, do not supply free tools to support their parts. Microchip has always spouted the party line of "If we have free tools, then we will not get third party tool suppliers to support us.", which is pure BS. Companies like Image Craft, as well as many others, exist just fine supporting other companies' products. Either be in the component business or the tool business, not both. After all, we will spend far more time with the tools than we will with the chips.

Something else that has irritated me even more, and for a long time, is TI's support.

Years ago I designed a TI battery charger chip into a product. We had some problems with it. When the front line FAE support, who have always been a great asset, could not find any problems with our implementation, they moved us up to the next level of support, where I was told, and this is a direct quote: "We do not have time to help you." Today they seem to wonder why we don't design in more of their products...

Virtual Network Computing (VNC) Security Implications?

RealVNC has announced their VNC® Viewer Plus for direct connection to the all new 2010 Intel® Core™ vPro™ Processor Family.

"With VNC Viewer Plus, IT or a service provider help desk no longer needs to rely on a functioning operating system and network drivers to take control of an affected user’s PC. VNC Viewer Plus uses the Intel Core vPro processor’s out-of band KVM access so that complex issues such as OS failures and boot problems can be diagnosed and addressed remotely. Users are even able to remotely watch a full PC boot sequence, manipulate BIOS settings or re-install an operating system without the need to take a physical trip to the PC."

There can not possibly be any security implications to that, can there...