Parallel Programming with Transactions

One of the challenges of parallel programming is synchronizing concurrent access to shared memory. Today, programmers use locks for synchronization, but locks have many pitfalls that make them difficult to use for building large, robust, parallel applications. In the past several years, my group has been working on a new synchronization construct called transactional memory. Transactional memory promises to address many of the pitfalls of locks, providing a synchronization construct that supports composing robust parallel applications in a much better way than locks can.

The idea behind transactional memory (TM) is to replace lock-based synchronization with transactions over shared memory. Using a new transactional language construct, the programmer can declare that a particular code block should execute atomically and in isolation, as if the whole block executes in an all-or-nothing fashion without any interference from other threads. Meanwhile, the system under the hood allows multiple transactions to execute concurrently as long as it can still provide the illusion of atomicity and isolation.

Databases have used transactions for decades, so the idea of using transactions for concurrency control is not new. TM simply brings some of the ideas that have proven so successful in databases to mainstream programming languages such as C++ and Java, and to future languages that will support parallelism from the ground up.

The problems with locks

When using locks, a programmer faces a basic tension between ease of use and scalability. Simplistic coarse-grain locking is easy to use but can result in synchronization bottlenecks that hurt scalability. There are several ways you (the programmer) can eliminate these bottlenecks: You can associate locks with individual data elements (fine-grain locking) so that different threads can access disjoint data concurrently. You can use reader-writer locks for individual data elements so that more than one thread can read the same data element concurrently. Or, if you are one of the few genius programmers who understands memory models or non-blocking algorithms, you can try to eliminate locks and write code in which concurrent threads access shared data without any locking (send me your resume if you know how to do this).
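
To make the trade-off concrete, here is a minimal sketch in C++ (with invented names such as CoarseTable and FineTable) of the same fixed-size hash table protected first by a single lock and then by per-bucket locks. The fine-grain version lets threads that touch different buckets proceed concurrently, at the cost of a more intricate locking discipline.

```cpp
#include <list>
#include <mutex>
#include <utility>

// Coarse-grain: one lock guards the whole table. Simple, but every
// insert serializes against every other operation.
class CoarseTable {
    static const int kBuckets = 64;
    std::list<std::pair<int, int>> buckets_[kBuckets];
    std::mutex lock_;                        // single table-wide lock
public:
    void insert(int key, int value) {
        std::lock_guard<std::mutex> g(lock_);
        buckets_[key % kBuckets].push_back({key, value});
    }
};

// Fine-grain: one lock per bucket. Threads hitting different buckets
// no longer contend, but the locking is now per-element and easier to get wrong.
class FineTable {
    static const int kBuckets = 64;
    std::list<std::pair<int, int>> buckets_[kBuckets];
    std::mutex locks_[kBuckets];             // one lock per bucket
public:
    void insert(int key, int value) {
        int b = key % kBuckets;
        std::lock_guard<std::mutex> g(locks_[b]);
        buckets_[b].push_back({key, value});
    }
};
```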

The problem with these optimizations is that they risk introducing bugs due to deadlocks and data races. Programmers who do these kinds of optimizations know how tricky it is to get them right.
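
For example, here is a minimal sketch (with an invented Account type) of the classic deadlock that fine-grain locking invites: two threads transfer money in opposite directions and acquire the two per-account locks in opposite orders.

```cpp
#include <mutex>
#include <thread>

struct Account {
    std::mutex lock;
    int balance = 0;
};

// Naive fine-grain transfer: locks the source, then the destination.
// If thread 1 runs transfer(a, b, ...) while thread 2 runs
// transfer(b, a, ...), each can grab its first lock and then block
// forever waiting for the other's lock: a deadlock.
void transfer(Account& from, Account& to, int amount) {
    std::lock_guard<std::mutex> g1(from.lock);
    std::lock_guard<std::mutex> g2(to.lock);   // may deadlock here
    from.balance -= amount;
    to.balance   += amount;
}

int main() {
    Account a, b;
    std::thread t1(transfer, std::ref(a), std::ref(b), 10);
    std::thread t2(transfer, std::ref(b), std::ref(a), 20);
    t1.join();
    t2.join();
}
```

The usual fix is a global lock-ordering convention (or an operation like std::lock that acquires both locks together), but such conventions are exactly the kind of protocol that is hard to express and enforce once code crosses module boundaries.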

What’s perhaps worse is that locks don’t support composition. By this I mean that with locks it’s hard to build thread-safe software components that can be composed into larger programs without introducing concurrency bugs or sacrificing scalability. Even after you’ve performed optimizations and built a scalable software component, once other programmers use your component to create applications, they’ll likely wrap a lock around pieces of code that use your component, effectively reverting to coarse-grain locking and losing the benefits of your tuning. If you want other programmers to use your software component and still retain the scalability that you worked so hard to achieve, you have to define and expose locking protocols through the component’s interface. Today’s languages provide no support for expressing, enforcing, or validating locking protocols, so the best you can do is to use comments to specify the protocols and hope that clients of the interface use locks correctly.
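
As a concrete sketch of the composition problem (with an invented SafeSet component), consider a thread-safe set whose individual operations are internally synchronized. A client who needs to move an element from one set to another atomically cannot do so through the component’s interface alone, so the typical workaround is an application-level lock around every use of both sets, which is coarse-grain locking all over again.

```cpp
#include <mutex>
#include <set>

// A scalable, internally synchronized component (details elided).
class SafeSet {
    std::set<int> items_;
    mutable std::mutex lock_;
public:
    void insert(int x)         { std::lock_guard<std::mutex> g(lock_); items_.insert(x); }
    bool erase(int x)          { std::lock_guard<std::mutex> g(lock_); return items_.erase(x) > 0; }
    bool contains(int x) const { std::lock_guard<std::mutex> g(lock_); return items_.count(x) > 0; }
};

// Client code: move x from one set to the other, atomically.
// Each call below is thread-safe on its own, but between the erase and
// the insert another thread can observe x in neither set. The component's
// interface gives the client no way to compose the two calls into one
// atomic step, so clients fall back on one big lock around everything.
SafeSet red, blue;
std::mutex big_lock;   // the typical workaround: coarse-grain locking again

void move_to_blue(int x) {
    std::lock_guard<std::mutex> g(big_lock);
    if (red.erase(x))
        blue.insert(x);
}
```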

Because it’s hard to compose applications using locks, locks don’t support mainstream application development very well. Today, developers build large, commercial applications by composing together software components that are written by different developers often working for different software vendors. Large applications can comprise many libraries that interact in complex and deeply nested ways. Some refer to the development of such applications as “programming in the large”, a phrase that characterizes the scale and complexity of both the programs and their development. To enable the development of commercial parallel applications, we need a synchronization mechanism that supports programming in the large better than locks do. Transactional memory promises to be such a mechanism.

Transactional memory

The best way to provide TM to the programmer is to introduce a new language construct of the form atomic { B } that executes the statements in block B as a transaction. Using an atomic block construct, the programmer simply declares that a block of code should execute atomically and in isolation. The system takes responsibility for implementing atomicity and isolation while allowing as much concurrency as possible. In this way, transactions improve the programmer’s productivity by shifting some of the difficult concurrency-control problems from the application developer to the system designer.
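
As a minimal sketch of the construct (assuming a TM-enabled compiler that accepts an atomic block; the exact keyword varies across prototypes, and this is not standard C++), here is a bank-account transfer that must debit one account and credit another as one indivisible step:

```cpp
struct Account {
    int balance = 0;
};

// With an atomic block, the programmer declares *what* must appear
// atomic and isolated; the TM system decides *how* to achieve it.
// Two concurrent transfers over disjoint accounts can commit in
// parallel; conflicting ones are serialized by the runtime.
void transfer(Account& from, Account& to, int amount) {
    atomic {                       // proposed construct; requires a TM compiler
        from.balance -= amount;
        to.balance   += amount;
    }
}
```

There is no lock to acquire, no lock-ordering protocol to follow, and no deadlock of the kind shown earlier, because the programmer never names the resources being synchronized.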

An implementation of TM compiles the atomic block statement into code that makes explicit calls into a TM library. The TM library tracks each memory access inside a transaction and allows transactions to execute in parallel as long as they don’t conflict; that is, as long as no transaction writes to a memory location that another concurrently executing transaction has read or written.
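
The sketch below shows roughly what such generated code might look like. The library interface (tm_begin, tm_read, tm_write, tm_commit) is invented for illustration and is deliberately degenerate: it serializes every transaction behind one global lock. A real TM library would instead record each transaction’s read and write sets, detect conflicts, and abort and retry a transaction rather than blocking all others.

```cpp
#include <mutex>

// --- Hypothetical TM library interface (names invented for illustration).
// This stub serializes every transaction behind one global lock. A real
// TM library would log reads and writes per transaction, detect
// conflicts, and roll back and retry instead of excluding everyone.
static std::mutex tm_global_lock;

void tm_begin()                 { tm_global_lock.lock(); }
int  tm_read(const int* addr)   { return *addr; }             // real TM: record in read set
void tm_write(int* addr, int v) { *addr = v; }                 // real TM: buffer/log the write
void tm_commit()                { tm_global_lock.unlock(); }   // real TM: validate, then publish

struct Account { int balance = 0; };

// Roughly the code a TM compiler might generate for:
//     atomic { from.balance -= amount; to.balance += amount; }
void transfer_instrumented(Account& from, Account& to, int amount) {
    tm_begin();
    tm_write(&from.balance, tm_read(&from.balance) - amount);
    tm_write(&to.balance,   tm_read(&to.balance)   + amount);
    tm_commit();
}
```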

As with locks, the programmer has to use atomic blocks correctly to avoid high-level data races and to ensure forward progress. The programmer has to use atomic blocks in a way that correctly protects program invariants and data structure consistency in the presence of concurrent updates by multiple threads. And the programmer has to co-ordinate among threads correctly to avoid deadlock or livelock. Transactions don’t excuse the programmer from getting the synchronization and co-ordination correct in their application.
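
For example (continuing the illustrative atomic block syntax from above), splitting a check and the update that depends on it into two separate atomic blocks leaves a window in which another thread can invalidate the check. Each block is atomic on its own, but the invariant spans both:

```cpp
struct Account { int balance = 0; };

// Buggy: each block is individually atomic, but another thread's
// transaction can commit between the check and the update, driving
// the balance negative. This is a high-level data race.
void withdraw_racy(Account& acct, int amount) {
    bool ok;
    atomic { ok = (acct.balance >= amount); }
    // <-- another thread's transaction can commit here
    if (ok)
        atomic { acct.balance -= amount; }
}

// Correct: the check and the update sit inside one atomic block, so the
// invariant "balance never goes negative" is protected as a unit.
void withdraw(Account& acct, int amount) {
    atomic {
        if (acct.balance >= amount)
            acct.balance -= amount;
    }
}
```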

But unlike with locks, the programmer doesn’t have to deal with eliminating locking bottlenecks when using transactional memory. Rather than worrying about optimizing locking bottlenecks and defining locking protocols to support modular software engineering, the programmer instead tunes the component to avoid conflicts between concurrent transactions. So the programmer still has to be concerned with the inherent scalability of the program’s underlying algorithms and data structures, but the hard optimization problems of using fine-grain locks, reader-writer locks, or no locks at all are delegated to the compiler and TM library (and possibly to the hardware as well). Transactions thereby significantly reduce the tension between ease of use and scalability.
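
As a small illustration of that kind of tuning (again using the illustrative atomic block syntax and an invented per-thread counter array), a single global statistics counter updated inside every transaction makes all transactions conflict with one another; spreading the counter across threads removes the conflict without changing the algorithm:

```cpp
const int kMaxThreads = 64;

// Conflict hot spot: every transaction writes the same word, so no two
// transactions can commit in parallel even if their real work is disjoint.
int ops_total = 0;

void do_work_contended(/* ... */) {
    atomic {
        // ... real work on disjoint data ...
        ops_total++;                     // all transactions conflict here
    }
}

// Tuned: each thread updates its own slot, so transactions touching
// disjoint data no longer conflict (assuming word-granularity conflict
// detection; a real system may also want padding to avoid false sharing).
// The total is summed outside the hot path when it is actually needed.
int ops_by_thread[kMaxThreads] = {0};

void do_work_tuned(int my_thread_id /* , ... */) {
    atomic {
        // ... real work on disjoint data ...
        ops_by_thread[my_thread_id]++;   // disjoint locations, no conflict
    }
}
```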

In many ways, transactional memory is similar to other language features such as garbage collection (GC) that improve program robustness and programmer productivity by delegating some difficult aspect of programming to the system. The GC analogy is actually a good one. GC delegates memory management to the system, eliminating some pitfalls associated with explicit memory management (such as dangling pointers) while alleviating others (such as memory leaks). Garbage collection supports programming in the large by eliminating the need for programmer-defined memory management conventions at module interfaces (e.g., who has responsibility for freeing which piece of memory). But when programming in a garbage-collected language, the programmer must still take care not to leak memory (e.g., by inserting objects into a large collection pointed to by a static field). Dan Grossman at the University of Washington has a great article exploring the GC analogy.

For an overview of transactional memory, please see this ACM Queue article from last year.

The Intel Transactional Memory Compiler

My group at Intel has published a lot of our research on core TM technology, but our ultimate objective is to introduce new parallel programming technologies like TM into mainstream use. When introducing new programming technologies, the biggest hurdle is always getting a prototype into the hands of users so that you can get feedback to refine the technology and to improve its performance. One of our goals, therefore, has been to put this technology into the hands of early adopters. To this end, we’ve built a prototype that integrates transactions into the C programming language using the Intel production C compiler, and we’ve made this prototype available on whatif.intel.com. (We describe the prototype’s underlying technology in this paper, which appeared at the CGO 2007 conference.) By making our prototype available, we hope to encourage early adopters to experiment with the TM programming model and to build TM workloads. We also hope to get feedback so that we can refine the programming model. I encourage you to download the prototype and experiment with TM.

In future blogs, I’ll cover some of the challenges of transactional memory.

3 Responses to Parallel Programming with Transactions

  1. Joe Seigh says:

    I did an experimental STM implementation in Java mainly to see what an API for it would look like. Pretty ugly. Support for closures and pointer abstraction would be really nice. Won’t know how well it really scales until we get low-power octocores for the desktop. I figure 8 cores is where you can really see if something has decent scaling potential.
    BTW, lock-free, not just obstruction-free. Read transactions can be made wait-free if I change the implementation a little.

  2. Victor Louis says:

    Defects in multithreaded application programming like thread blocking, deadlock, and race conditions can also be detected using static analysis tools. Companies like Symbian, Juniper Networks, Research In Motion (BlackBerry), and Cisco are using Coverity Prevent, a static analysis code inspection tool, to analyze source code and fix defects. Coverity Prevent is also used by the Department of Homeland Security to scan many open source projects.