
Feedback: Software Development Paradigm Trap

Greg Nelson - 8/4/2006

Mark,

Great article -- I just found out about it from Jack Ganssle's "Embedded Muse" newsletter. Being a fan of philosophy of science (including Kuhn) I think your concepts are very good... I wish they were coming into practice faster. (There's a supposition out there that the old paradigm can only be replaced when the last of its adherents dies off!)

There has certainly been a lot of academic research on parallelism and how to use it well. While some of this has been mired in the old paradigm (how to get old FORTRAN scientific apps and COBOL business apps to take advantage of parallel hardware without rewriting them from scratch), a big part of this has been how to split tasks in ways that don't introduce new hazards (cache coherency, etc.) or inefficiencies (processors waiting on each other's results). The idea of offloading I/O tasks has certainly been used to great effect (from IBM mainframes to modern PCs), but it's not clear how this helps to address fundamentally complex computational algorithms. It seems like a lot of smart people have been trying to divide tasks up among processors for a long time, with some success but with many unresolved challenges.

This is connected to another untamed issue that falls only partially within the technical realm -- the specifications themselves. Consider, for example, the Therac disasters. In a sense, you could say these were caused by code being written to specifications which were then not followed by the users. Even though the problem came down to a single computation that (in your proposal) would probably reside on a single processor, the results were wrong because the code was not written to do what the users were doing with it.

I believe specifications become even more critical and difficult in light of interacting processors with additional interfaces between them. In addition to a specification of every task, you now potentially have a multi-layer hardware-and-software specification for each pair of communicating processes. There are seemingly simple "standards" out there like SPI -- and even SPI doesn't work the same way everywhere. (Why are there two different choices for clock phase and clock polarity, anyway?) We recently ran into an implementation that failed intermittently because it relied on the deassertion of the "slave select" signal for framing: depending on the speed of the memory out of which the code was running on the faster master processor, the slave select signal could be deasserted for an interval so short (tens of ns) that the slower slave processor never saw it change. To fix it, we had to discard the "reuse" of a ready-made FPGA IP core and build our own SPI IP.
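To make the ambiguity concrete: SPI's "mode" is really two independent flags, clock polarity (CPOL, the idle level of SCK) and clock phase (CPHA, which clock edge samples data), conventionally packed into mode numbers 0 through 3. A master and slave configured with different flags will shift data on edges the other side isn't watching. Here is a minimal sketch in C; the helper names are hypothetical, not from any vendor API.

```c
#include <stdbool.h>

/* The two independent SPI clocking choices. */
typedef struct {
    bool cpol;  /* true = SCK idles high */
    bool cpha;  /* true = data sampled on the second (trailing) edge */
} spi_mode_t;

/* Decode the conventional mode numbers 0..3: bit 1 is CPOL, bit 0 is CPHA. */
spi_mode_t spi_mode(int mode)
{
    spi_mode_t m;
    m.cpol = (mode & 2) != 0;
    m.cpha = (mode & 1) != 0;
    return m;
}

/* Master and slave only agree on when data is valid if BOTH flags match. */
bool spi_compatible(spi_mode_t master, spi_mode_t slave)
{
    return master.cpol == slave.cpol && master.cpha == slave.cpha;
}
```

Note that mode compatibility still says nothing about framing conventions such as slave-select deassertion timing, which is exactly the kind of gap a per-interface specification would have to close.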

A completely different aspect to this problem, one that I think is often overlooked, is that the poor quality of software is not always a result of poor engineering -- often it is attributable to poor management. I fought my old boss for years to institute code reviews, which are well documented as being more effective at getting the bugs out than testing. What happened? Every development cycle was accompanied by a testing cycle, done by the same people who wrote the code, and no code reviews were ever instituted. (The company I work for now is too small to have independent code reviews -- I'm the only software person!)

Perhaps even worse are cases where the non-technical management (CEOs, boards, owners) say "get the product out on X date" without regard for whether it is a quality product. I don't believe the developers at Microsoft (or name any other brand name company you prefer) want to send out operating systems and applications with gaping security holes. But when your product has been announced as "App 2005" and it's 2006 with no release in sight, you can be sure the upper management has their vision blurred with anger and is no longer focusing on quality. (Admittedly, some of this is due to the difficulty-of-development problems you are raising in your article.)

Also, I am concerned that your proposal can only fly in larger companies that can afford larger teams of software developers. Your home construction analogy is very personally familiar, so I'll start there. If I want my house built fast, and I can afford it, I can hire a big team of specialists. On the other hand, if I want to save money, I have to sacrifice speed and learn to do each of the jobs (framing, roofing, plumbing, electrical work, painting) myself. Applying this to the idea of developing multiple distinct apps on discrete micros requires me to serialize each of the jobs, and I suspect the product will take much longer to get to market. (As you've pointed out, this may only happen the first time. However, the first time is usually the only time for small, entrepreneurial companies.)

Related to this are cost issues with many small processors which are insignificant in some industries but murderous in others. Packaging and interconnect are inherently pricey. (In your 200MIPS ARM versus 20 * 10MIPS 8051s, did you evaluate the street pricing? I'm guessing we're talking about $15 for the ARM and $60 for the 8051s.)

Personally, the greatest hope I see is in getting people more familiar with things like the state machine approaches Peter Wolstenholme talks about in his commentary. I have personally taken a couple of large, complicated applications that were subtly and tragically broken and, by reanalysing them as a set of moderately complex interacting state machines, been able to rewrite core sections to make them reliable and correct. Ideas like protection rings and hardware-backed memory management and protection are other ways of taking your hardware-oriented multiprocessor ideas and implementing them with greater reliability on single processors.
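The appeal of that style is that every (state, event) pair gets an explicit next state in a table, so there are no "impossible" untested paths hiding in nested conditionals. As an illustration only (the states and events below are hypothetical, not from any application mentioned above), here is the shape of a table-driven state machine in C for a simple frame parser:

```c
/* States and events are fully enumerated... */
typedef enum { S_IDLE, S_IN_FRAME, S_ESCAPE, N_STATES } state_t;
typedef enum { EV_SOF, EV_DATA, EV_ESC, EV_EOF, N_EVENTS } event_t;

/* ...so the whole behavior is one reviewable table: every cell is a
   deliberate decision, including the "ignore this event" cells. */
static const state_t next_state[N_STATES][N_EVENTS] = {
    /*             SOF          DATA        ESC         EOF       */
    [S_IDLE]     = { S_IN_FRAME, S_IDLE,     S_IDLE,     S_IDLE     },
    [S_IN_FRAME] = { S_IN_FRAME, S_IN_FRAME, S_ESCAPE,   S_IDLE     },
    [S_ESCAPE]   = { S_IN_FRAME, S_IN_FRAME, S_IN_FRAME, S_IN_FRAME },
};

/* One transition per input event; no hidden control flow. */
state_t step(state_t s, event_t e)
{
    return next_state[s][e];
}
```

A rewrite in this style can be checked cell by cell against the specification, which is much harder to do with ad-hoc flag-and-branch code.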

Things may have changed since I was in school, but I don't think these ideas are taught or emphasized nearly enough.

Sincerely,
Greg Nelson, Engineering Manager
PGT Instruments, Inc.


Mark Bereit - 8/6/2006

Greg,

Thanks for a number of excellent observations.

I have had several responses that say "your approach won't work because..." The reasons are varied: cost, power consumption, programming approaches, management mindset. Well, I don't think I have a "silver bullet" solution, and I tend not to believe in them, because one size never fits all. But I think that there are some interesting approaches which could be pursued in the embedded systems domain that, if successful, could spill over into more mainstream development. And some of those problems could subside on their own. For example, in semiconductors, cost follows popularity, not gate count. And programmer thought processes do change for advantage; structured programming, object orientation and Extreme Programming have all been shifts that were made where advantageous.

More discussion on the "could be" and "may be" of development can only help broaden our horizons. Thanks for your thoughts!