This week Intel Labs achieved an impressive #6 placement on the Graph500 ranking, a semi-annual listing of the highest performance machines for emerging “big data” supercomputing applications. The results were announced Tuesday at the Supercomputing 2011 conference in Seattle.
In computer science, “graphs” are a way to represent and explore the connections between things, such as all the stops in a subway system (pictured here for New York), all the links in a large computer network, or all the relationships between people on social networks like Facebook. Graph computing is also used to discover meaningful correlations between events for applications such as medical informatics and cybersecurity.
However, these graph searches are difficult to compute efficiently because, unlike the data in a typical database, graphs vary wildly in size, shape and number of connections between nodes. The most interesting graph problems explore connections across large populations or massive datasets: billions of nodes with many more possible relationships between them. Drawing meaning from such “Big Data,” which could represent any type of business, scientific, or social information, is a key challenge for supercomputing and cloud computing alike. Since any node could lead to any other node (or nodes) in a vast web, it is difficult to predict which piece of data will be needed next, resulting in many trips to slower memory, such as hard drives, before the next computation can be performed.
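To make the access-pattern problem concrete, here is a minimal Python sketch of a breadth-first search, the kind of graph traversal the Graph500 benchmark times, over a graph stored in compressed sparse row (CSR) form. It is an illustration only, not the PCL team's optimized code; the function names `to_csr` and `bfs` and the five-vertex example graph are hypothetical. Each `neighbors[...]` and `parent[v]` lookup goes to an address that depends on the graph data itself, which is why it is hard for the hardware to anticipate the next memory access.

```python
from collections import deque


def to_csr(num_vertices, edges):
    """Build a compressed sparse row (CSR) adjacency structure for an undirected graph."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # offsets[i]..offsets[i+1] is the slice of `neighbors` holding vertex i's neighbors
    offsets = [0] * (num_vertices + 1)
    for i in range(num_vertices):
        offsets[i + 1] = offsets[i] + degree[i]
    neighbors = [0] * offsets[num_vertices]
    fill = offsets[:-1]  # slicing makes a copy: next free slot per vertex
    for u, v in edges:
        neighbors[fill[u]] = v
        fill[u] += 1
        neighbors[fill[v]] = u
        fill[v] += 1
    return offsets, neighbors


def bfs(offsets, neighbors, root):
    """Return the BFS parent array (-1 = unreached) and the number of adjacency entries examined."""
    parent = [-1] * (len(offsets) - 1)
    parent[root] = root
    frontier = deque([root])
    examined = 0
    while frontier:
        u = frontier.popleft()
        for i in range(offsets[u], offsets[u + 1]):
            v = neighbors[i]      # which vertex comes next depends on the data
            examined += 1
            if parent[v] == -1:   # scattered, hard-to-predict read and write
                parent[v] = u
                frontier.append(v)
    return parent, examined


# Hypothetical example: a tiny five-vertex graph
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
offsets, neighbors = to_csr(5, edges)
parent, examined = bfs(offsets, neighbors, root=0)
print(parent, examined)
```

On this example `bfs` returns the parent array `[0, 0, 0, 1, 3]` after examining 10 adjacency entries (each undirected edge is stored twice). That count is only illustrative and is not the benchmark's official traversed-edges-per-second (TEPS) metric.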
Intel Architecture is well suited to graph problems because it has a cache hierarchy that is friendly to such irregular data access patterns. The challenge lies in creating algorithms that use this hardware effectively.
Satish Nadathur, Jatin Chhugani, and Changkyu Kim of our Parallel Computing Lab (PCL) demonstrated the excellent processing power and energy efficiency of Intel Architecture on the Graph500 benchmark. They achieved this result through innovative algorithm-architecture co-design, which let the benchmark use the memory and compute resources of a Xeon (Westmere) cluster much more efficiently. Sandia Lab’s Richard Murphy, the owner of the benchmark, commended the team on an “outstanding job.”
Even more impressive than the raw performance was the efficiency of PCL’s algorithmic approach. Compare Intel Labs’ result with the #1 entry on the list, from NNSA and IBM Research running on Blue Gene/Q. For the same graph size (2^32 vertices, over 4 billion), that team delivered 4.4x higher performance, but used more than 10 times the compute nodes and 17 times the cores. Although architectural differences account for some of this gap, much of Intel’s higher per-core efficiency came from the algorithmic optimizations of our researchers.
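The per-node and per-core comparison above is simple arithmetic. The hypothetical helper below makes it explicit using the rounded ratios quoted in the text; since the #1 entry used “more than 10 times” the nodes, the per-node figure is a lower bound, and both results are approximate.

```python
def relative_efficiency(perf_ratio, resource_ratio):
    """Performance per unit of hardware of the smaller system, relative to the larger one.

    perf_ratio:     larger system's performance / smaller system's performance
    resource_ratio: larger system's nodes (or cores) / smaller system's nodes (or cores)
    """
    return resource_ratio / perf_ratio


# Ratios taken from the text: 4.4x performance, >10x nodes, 17x cores
per_node = relative_efficiency(4.4, 10)  # lower bound, since the node ratio is "more than 10"
per_core = relative_efficiency(4.4, 17)
print(f"per node: {per_node:.1f}x, per core: {per_core:.1f}x")
```

This prints `per node: 2.3x, per core: 3.9x`, i.e. roughly four times the graph-search throughput per core on the Intel system.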
The team in Intel Labs is excited to be among the world-class researchers tackling this new challenge.