Saturday, June 19, 2010

Proper handling of Proper Names

Patrick McKenzie recently wrote a blog entry, Falsehoods Programmers Believe About Names, prompted by John Graham-Cumming's understandable rant about databases that cannot handle proper names. John's last name contains a hyphen.

Patrick gives a list of forty assumptions that are wrong when handling the entry of proper names into a system. This article on Slashdot further elaborates on and amends Patrick's list.

Alas, in my experience the problem of not handling non-alphanumeric characters is not limited to web forums. I've wanted to use "+5" and "-5" as net labels in CAD packages in the past. The leading sign caused the program to explode, because it tried to treat the label as an arithmetic operation and did not know how to handle one in a net label. That leads to the next problem: all inputs must be sanitized in some fashion to keep them from crashing the system.
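One way to picture the fix is to treat a net label as an opaque string and check only the characters it contains, never evaluating it. The sketch below is a guess at what such a check might look like, not how any particular CAD package works. The function name net_label_is_valid, the 32-character limit, and the rule "printable ASCII with no spaces" are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define NET_LABEL_MAX 32  /* assumed length limit, purely illustrative */

/* Hypothetical check: accept a label made only of printable, non-space
 * ASCII characters. The label is never parsed or evaluated, so "+5" and
 * "-5" are ordinary labels rather than arithmetic. */
bool net_label_is_valid(const char *label)
{
    if (label == NULL || label[0] == '\0') {
        return false;                 /* missing or empty label */
    }

    size_t n = 0;
    for (const char *p = label; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (c <= 0x20 || c >= 0x7F) {
            return false;             /* space, control character, or non-ASCII */
        }
        if (++n > NET_LABEL_MAX) {
            return false;             /* longer than the assumed limit */
        }
    }
    return true;
}
```

With these rules "+5" and "-5" pass, while an empty label or one containing a space or a newline is rejected before it reaches the rest of the program.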

How do you trade off making inputs secure versus allowing what should be valid proper names and addresses? Unicode and the International Components for Unicode (ICU) can help, but they don't always fit into the memory space of small embedded systems. Is there a smaller solution?
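One possible smaller approach, offered only as a sketch: store names as UTF-8 and check just that the bytes are well-formed and free of control characters, without any of ICU's tables. This gives up normalization, collation and case mapping, so it is not a replacement for ICU. The function names name_is_acceptable and name_is_acceptable_cstr are hypothetical, and the decision to reject only control characters is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if buf[0..len) is well-formed UTF-8 and
 * contains no ASCII control characters. Needs no lookup tables. */
bool name_is_acceptable(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t b = buf[i];
        uint32_t cp;      /* decoded code point */
        size_t extra;     /* continuation bytes expected after the lead byte */

        if (b < 0x80) {
            if (b < 0x20 || b == 0x7F) return false;   /* control character */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0)   { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0)   { cp = b & 0x07; extra = 3; }
        else return false;             /* stray continuation or invalid lead byte */

        if (len - i <= extra) return false;            /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            uint8_t c = buf[i + k];
            if ((c & 0xC0) != 0x80) return false;      /* not a continuation byte */
            cp = (cp << 6) | (uint32_t)(c & 0x3F);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if (extra == 1 && cp < 0x80) return false;
        if (extra == 2 && cp < 0x800) return false;
        if (extra == 3 && cp < 0x10000) return false;
        if (cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;

        i += extra + 1;
    }
    return true;
}

/* Convenience wrapper for NUL-terminated strings. */
bool name_is_acceptable_cstr(const char *s)
{
    return name_is_acceptable((const uint8_t *)s, strlen(s));
}
```

Under these assumptions a hyphenated name such as "Graham-Cumming" and an accented one such as "José" are accepted, while truncated or overlong byte sequences are refused. Punctuation is deliberately left alone here; escaping it for a database or a display is better done at the point of use than by rejecting the name.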