Throughput Computing for Risk: A Quick Note on Financial Engineering

One of the things my group does while developing parallel programming models is to try to understand the application programming models and patterns that our tools will be used to implement. We believe this is essential to any work on programming tools, especially with the resurgence of parallel computing, because domain knowledge helps enormously in choosing appropriate parallelization strategies.

It sounds obvious, but it’s surprising to me how often this isn’t the case with tools developers. With our work on Ct, we’ve been looking at a pretty broad set of applications, including image/video/signal processing, games, and select High Performance Computing market segments (the traditional niche of parallel computing). In the last category, computational finance, or financial engineering, has garnered a lot of attention for a few reasons. First, parts of these applications are archetypal Throughput Computing workloads: the algorithms have a seemingly insatiable appetite for compute cycles, and they exhibit a lot of parallelism. Second, much of the attention around GPGPU has focused on computational finance (GPUs do quite well with a subset of throughput computing workloads). Third, IT spending in the financial services sector is a significant percentage of global IT spending. Beyond these reasons, we found enough diversity in the required data-parallel programming patterns that we chose this field as the case study for our first externally published application note (more are on the way). Please check it out and send me your comments.
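To make the "insatiable appetite for compute, lots of parallelism" point concrete, here is a minimal sketch of the kind of kernel involved: Monte Carlo pricing of a European call option, where every simulated path is independent and the whole computation is elementwise over large vectors. This is illustrative Python/NumPy, not Ct; the function name and all parameter values are made-up examples.

```python
import numpy as np

def mc_european_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Price a European call by simulating terminal prices under
    geometric Brownian motion.

    Each path is independent, so every step below is a data-parallel
    elementwise operation over a vector of paths -- exactly the shape
    of workload that maps well onto SIMD units, many cores, and GPUs.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)  # one independent normal draw per path
    # Terminal asset price per path (elementwise exp over the whole vector).
    st = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    payoff = np.maximum(st - k, 0.0)  # elementwise max: trivially parallel
    return np.exp(-r * t) * payoff.mean()  # discounted average payoff

# Example parameters (assumed for illustration only).
price = mc_european_call(s0=100.0, k=100.0, r=0.05, sigma=0.2,
                         t=1.0, n_paths=200_000)
print(f"MC price: {price:.2f}")
```

Accuracy here scales only as the square root of the path count, which is why these workloads can absorb essentially any number of cores you throw at them.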

3 Responses to Throughput Computing for Risk: A Quick Note on Financial Engineering

  1. James says:

    Recently, quite a number of companies have made progress with what are called HPC SOA programming models to address exactly this problem. Companies like Platform Computing, DataSynapse, and Digipede, as well as Microsoft, all offer such application middleware.
    Platform’s Symphony DE is available for free download and use.
    I’d like to know people’s comments on these approaches as alternatives to MPI.

  2. I_B_M says:
    The note is very interesting – and the multicore scalability is VERY impressive.
    But why are there no absolute performance numbers (only speedups)?
    It’s also not clear what the data sizes are (vector sizes, etc.) or how the main memory bottleneck could be avoided with 8 cores if the speedup is indeed 50-100x (or more).
    The CPU used (Xeon 5345) is already somewhat outdated. Why not test with, for example, an E5472?