Author Archives: James Reinders (Intel)

TACC symposium and programming two SMP-on-a-chip devices

One presenter exclaimed “Time spent optimizing for MIC is time well spent because it optimizes your code for non-MIC processors at the same time.” Continue reading

Wellington and Austin: programming lots of cores

A couple of back-to-back opportunities to see great talks about harnessing lots of cores, and to give talks about programming options and why we do not need to give up on programmability in our quest for high performance. Wellington this week, Austin next week. Programming is not easy, and neither is parallel programming. Nevertheless, many [...] Continue reading

Coarse-grained locks and Transactional Synchronization explained

Coarse-grained locks, and the importance of transactions, are key concepts that motivate why Intel Transactional Synchronization Extensions (TSX) is useful.  I’ll do my best to explain them in this blog. In my blog “Transactional Synchronization in Haswell,” I describe new instructions (Intel TSX) that will improve the performance of coarse-grained locks.  Understanding coarse-grained locks and [...] Continue reading

Transactional Synchronization in Haswell

We have released details of Intel® Transactional Synchronization Extensions (TSX) for the future multicore processor code-named “Haswell”. The updated specification (Intel® Architecture Instruction Set Extensions Programming Reference) can be downloaded. In this blog, I’ll introduce Intel TSX and provide a little background. Please refer to The Transactional Synchronization Extensions Chapter (Chapter 8) in the manual [...] Continue reading

OPEN CASCADE introduced parallelism into SALOME SMESH Module (using our tools)

The OPEN CASCADE S.A.S and Intel Corporation software teams decided to join efforts to introduce parallel calculations into the Salome SMESH Module. They did this work with the help of Intel® Parallel Studio XE and wrote an article about it, which can be downloaded (for free) from Parallelism_in_SMESH.pdf Continue reading

"Award Winning" Intel Parallel Studio XE

HPCwire recognized Intel Parallel Studio XE, the same month we added even more to like with Intel Cluster Studio XE. Continue reading

MIC architecture support by software tools – SC11 wrap-up

This week we demonstrated the Knights Corner co-processor at SC11 and we had many developers demonstrating real results with the prototype systems. During the “SC11 season,” a number of tool vendors announced they will be providing versions of their software tailored to supporting MIC architecture, starting with the Knights Corner co-processor. Here are the ones I know [...] Continue reading

quick chat about MIC architecture with Mike Dewar, NAG

I ran into Mike Dewar at SC11 today as the exhibition draws to a close.  Mike is the CTO of NAG Ltd. – a company we’ve had the good fortune to work with for years. NAG is one of a handful of companies that have been providing feedback on our Knights Ferry (prototype MIC architecture). [...] Continue reading

Seeing One TeraFlop, the software side, and feeling a bit emotional

I’ve known this day was coming – but when I saw Knights Corner clearly sustaining a TeraFlop (DGEMM, wide range of block sizes) – I was surprised by my emotional reaction inside. Hard to describe; it was a good feeling. Tuesday November 15, 2011, we showed a Knights Corner co-processor for the first time outside [...] Continue reading

Ready for 2X Moore’s Law: Intel Cluster Studio XE

While Moore’s Law continues to double transistor count every 18 months, the translation into performance of the Top 500 computers in the world is resulting in a much faster pace. Helping software development keep pace requires great tools. Continue reading

Let’s rename "for" to "serial_for"…

Proposal: rename for in C and C++ to serial_for No more incumbent “for.” (it was voted off the island) (let’s assume parallel_for == cilk_for in this discussion) Consider: serial_for (i=0; i < n; i++) { body } vs. parallel_for (int i=0; i < n; i++) { body } serial_for allows the values of n and i [...] Continue reading

Parallelism as a First Class Citizen in C and C++, the time has come.

It is time to make Parallelism a full First Class Citizen in C and C++.  Hardware is once again ahead of software, and we need to close the gap so that application development is better able to utilize the hardware without low level programming. The time has come for high level constructs for task and [...] Continue reading

Hamburg Germany – ISC’11

I’m headed to Hamburg, Germany for the International Supercomputing Conference next week. We will be talking a lot about our Many Integrated Core (MIC) Architecture and how to realize amazing performance on highly parallel applications. We have some incredible demos and partners in our booth, so I hope you can drop by to visit us. [...] Continue reading