Are we insane? 7 simple rules for parallel programming

One of my favorite definitions of insanity is: “doing the same thing over and over again and expecting different results.”

With that definition in mind, think about parallel computing. The goal is to make parallel programming a routine skill expected of every competent programmer. In the 1980s and 1990s we addressed this problem by creating hundreds of parallel programming languages and APIs. Given that today only a minuscule fraction of programmers write parallel software, clearly that approach didn’t work. So what are we doing this time? Well, so far, we’re just creating lots of new parallel programming languages: Fortress, CUDA, Ct, TBB, Chapel, X10, OpenMP 3.0, UPC, etc.

Are we insane? Sometimes I suspect that we may be.

History is a powerful guide. If we want to get it right this time around, we should step back and think about the lessons we’ve learned from the long history of parallel computing. I have studied this history extensively and boiled things down to 7 simple rules parallel programming environment developers MUST follow if they wish to succeed.

1. Less is more … i.e. too many parallel programming languages damage our cause.

2. If commercial relevance is the goal, then industry must play a leadership role in creating the technology.

3. Metrics matter … if your metrics are all kernels and “toy-codes”, then you’ll produce silly, toy technologies.

4. A successful parallel programming technology must address platforms ISVs care about, not the platforms YOU want the ISVs to care about.

5. Good programming technologies abstract the hardware, but they don’t hide it.

6. It’s called “Amdahl’s law,” not “Amdahl’s recommendation.”

7. Everything starts and ends with the programmer. Hardware doesn’t matter.
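
To make rule 6 concrete, here is a minimal sketch of the standard form of Amdahl’s law: if a fraction s of a program’s run time is inherently serial, the best possible speedup on N cores is 1 / (s + (1 - s) / N). The function name `amdahl_speedup` and the 10% serial fraction are illustrative assumptions of mine, not measurements from any real application.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the run time cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 10% of the run time is serial.
    for cores in (1, 4, 16, 64):
        print(f"{cores:3d} cores -> at most {amdahl_speedup(0.10, cores):.2f}x")
```

With a 10% serial fraction, 64 cores buy you at most roughly 8.8x, and no number of cores will ever get you past 10x. That is why it is a law and not a recommendation.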

In the next few blogs, I will expand on each of these rules. I will provide examples of these rules in action from the history of parallel computing. But I’d be interested in any examples you might have to suggest. And of course, I’d be interested in any lessons from the history of parallel computing you might have to offer.

Responses to comments on my first blog

First off, I want to thank everyone who commented on my first blog. I wish I had time to respond to each one in detail. I agree with much of the content in these comments. In particular, I agree that (1) we will never find a one-size-fits-all solution and (2) we will need to build off legacy programming technologies, so extensions like OpenMP and TBB are vitally important. In the following, I’ll provide a few random comments on your comments.

1. Declarative languages: these are good when specialized to the needs of a large and very important domain (e.g. databases with SQL). But they usually don’t work well enough with legacy software. Another issue is the learning curve associated with declarative languages. I know this from experience: I’ve tried to push some of these languages in the past, and have come to realize that programmers just don’t want to move off their familiar base languages. Hence, if a language strays too far from C, it has a low probability of success.

2. Graphical IDEs that let you program by drawing boxes on a screen and connecting them with links to represent the flow of data have proven very successful in digital signal processing. I’ve used a couple of these in the past and when they fit the problem domain at hand, they are hard to beat. But once again, they just aren’t general enough. Extending them to handle general acyclic graphs with logically atomic operations makes sense. Very smart people have tried these and other generalizations of these visual languages, but as you generalize them and move away from straightforward dataflow algorithms, they become awkward to use.

3. The Multicore organization: I know of this organization but I haven’t been following its work very closely. I plan to do so. I am a big believer in industry groups working together to solve tough problems. I look at the OpenMP Architecture Review Board as a good example of how industry groups can make a big difference in a relatively short amount of time.

4. I’m not sure I agree with a focus on synchronous processes. But I agree that “asynchronous threads are the work of the devil.” I believe that in the future … maybe five years; maybe 10 years … we will look back on the era of asynchronous threads operating within a single address space as a period of crazy, evil programming. I should point out that most of my colleagues at Intel roll their eyes and comment on my weak grasp of reality when I say things like this. But if you’ve ever spent painful days of frustration hunting down a trivial race condition, you’ll understand what I mean.

5. Do we really need parallel programming? Aren’t serial programs plenty fast for much of what we do on a computer? Do I really need multiple cores just to run my word processor? These are good questions. Clearly, if the workloads we run on our computers don’t change, then there will be little need for many-core chips running parallel programs. I can keep four cores busy by running one program on each core. But if you give me 64 cores, can I really use them all to read email and edit a document? Probably not. We believe workloads will evolve into something we call “model-based computing.” The short answer is that workloads will evolve so they don’t just collect, organize and present data. Emerging workloads will increasingly focus on turning data into knowledge by constructing models. It’s a long story and I can’t do justice to it right now … maybe I’ll address this topic in a later blog.
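
For readers who have never chased one, the kind of “trivial race condition” mentioned in point 4 can be sketched in a few lines. This is only an illustration under my own assumptions: the helper name `run_counter` and the thread and iteration counts are hypothetical. Several threads share one counter in a single address space; without a lock, the read-modify-write in `counter += 1` can interleave and silently lose updates.

```python
import threading


def run_counter(num_threads: int, increments: int, use_lock: bool) -> int:
    """Have several threads bump one shared counter and return its final value."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            if use_lock:
                with lock:        # only one thread at a time may update the counter
                    counter += 1
            else:
                counter += 1      # read, add, write back: another thread can slip in between

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    expected = 4 * 100_000
    print("expected     :", expected)
    print("with lock    :", run_counter(4, 100_000, use_lock=True))
    print("without lock :", run_counter(4, 100_000, use_lock=False))
```

The locked version always returns the expected total. The unlocked version may or may not, depending on the interpreter version and on timing, and that is exactly what makes these bugs so painful: the program can pass every test you run and still be wrong.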

3 Responses to Are we insane? 7 simple rules for parallel programming

  1. Lord Volton says:

    “Do we really need parallel programming? Aren’t serial programs plenty fast for much of what we do on a computer? Do I really need multiple cores just to run my word processor?”
    A lot of reporters love to use this flawed logic. My TRS-80 Model III has a word processor but I wouldn’t trade in my laptop for it.
    With more power comes more features and functionality. And, of course, all of the things reporters never considered.
    It’s a very silly argument. All the cores Intel and other companies can build will eventually be utilized.
    Just like we tend to find uses for all energy we produce. The more energy we generate the more uses we find for it.

  2. John Dallman says:

    Asynchronous threads in a shared address space are indeed a crazy and evil way of doing programming. However, they’re probably inescapable while we keep on trying to add threading to existing architectures that weren’t really designed for it. Something quite different, that isn’t von Neumann architecture at all, may well be needed to make good use of all the transistors that can go on a chip these days. The EDGE/TRIPS project seems to be sufficiently radical, although I have no idea if it will succeed. It’s always as well to remember that computer architecture isn’t something we do in a vacuum. It’s about trying to put transistors to productive use.
    Special-purpose declarative languages tend to have a common flaw. They’re special-purpose: they map onto some problems very well, but other problems don’t fit the language well. They also don’t help with existing code bases, and some of those are quite important.
    On “serial is enough”, no, it isn’t. Remember that most reporters don’t understand any kind of computing more complex than their own word processing and maybe some page layout. However, serial is enough for some things. To make an analogy, I quite often have Notepad, Word and Emacs all open and editing at the same time. That may sound crazy, but Word is for documentation with tables, Emacs is for source code and Notepad is for five-line test scripts. Simple jobs can be done with simpler tools.
    The idea of applications that centre round models is very valid. Some kinds of applications have worked that way for decades. They have the common property that users will always build models to the limits of the machine’s capability. As they get better machines, they build more complex models, and accomplish their work, whatever it is, with increasing quality (by some measurement of quality that depends on what they’re doing).
    But don’t just assume that if lots of cores can be built, they will be used. If they’re too hard to use in some way, they won’t get used. Lots of processors down the years have had features that weren’t used, because they weren’t useful.

  3. Hello Timothy,
    Though I’m not a programmer in the sense of a coder, I work with real English as an assembler programming language for the technology that I call AI-based expert translation (I described it in my comments on Sean Koehl’s posts). I consider a really working multi-core processor to be a model of the human brain. After all, what could we simulate except the brain for processing information?
    The practical model of the human brain for me is a Cell processor: the cerebellum is like a PU (or PPE) and the sensory areas of the cerebral cortex are like APUs (SPEs). I have read several articles about programming the Cell processor, for example, by Nicholas Blachford and Bernard Cole.
    Also, the programming model as a concept is described in the patent “External data interface in a computer architecture for broadband networks” by Masakazu Suzuoki and Takeshi Yamazaki.
    And I found an HPCwire review, “The Potential of the Cell Processor for Scientific Computing”. It makes a general point:
    “We also conclude that Cell’s heterogeneous multi-core implementation is inherently better suited to the HPC environment than homogeneous commodity multi-core processors.”
    And I’m also projecting these developments onto the future mobile version of the Cell processor (by Toshiba) for my Cell PC Platform project. What do you think about the Cell processor?