By John Du, reposted from our Chinese language blog.
Here I would like to discuss some hot technical topics. Regarding tera-scale computing, some readers of the Chinese blog commented on the on-chip communication and memory bandwidth solutions, so I will say more about them here.

About the interconnection for many cores on the chip

We know that a many-core system is an integrated system with many processor cores on one chip. The communication among these cores relies on the on-chip interconnection network. When designing the interconnect, we have three cost considerations: power, die area, and complexity.

First, power. The interconnect fabric is very power-hungry; it can consume up to 36% of the chip's power. Naturally, if we increase the fabric bandwidth, we also increase its power consumption. We need to balance the demand for bandwidth against the requirements of the power management system, so that power can be supplied dynamically, on demand, to save energy.

Second, die area. VLSI chips give designers a vast number of transistors with which to build various functional circuits. The interconnect network is built from these same transistors, and it can occupy more than 1/5 of the die area of a single core. The more area goes to the interconnect fabric, the less remains for computation, so there must be a trade-off at some point. We prefer not to sacrifice too much area from the computing functions, which means the die area available for the interconnect fabric is limited.

Third, design complexity. Every circuit design should be optimized. Obviously, a simple circuit is easy to optimize, while a complex one is difficult to optimize.

Consider four common network topologies: the bus, the bi-directional ring, the 2-D mesh, and the crossbar. The bus is the simplest, but it can carry only one message at a time. The bi-directional ring can carry several messages simultaneously, and its links can operate at a high rate.
But as the number of cores increases, the ring becomes less economical, because there is only one route between two cores. If we add one dimension to the topology, we get the 2-D mesh, which can handle more concurrent messages and offers many possible routes. If we keep increasing the dimensionality, as in a crossbar network, all the cores can communicate with each other simultaneously. Higher-dimension networks have better architectural properties; however, they are more difficult to design optimally.
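To make the trade-off a little more concrete, here is a small Python sketch that computes, under idealized assumptions, a rough wiring cost and the worst-case hop count (diameter) of each topology, plus the number of distinct shortest routes between two cores on a ring versus a 2-D mesh. The 8x10 layout for 80 cores, the cost measure (links, or crosspoints for the crossbar), and all function names are illustrative assumptions, not design data from the chip; the model also ignores router latency, link width, and power.

```python
from math import comb


def ring_hops(n: int, a: int, b: int) -> int:
    """Shortest hop count between cores a and b on a bi-directional ring of n cores."""
    d = abs(a - b) % n
    return min(d, n - d)


def ring_minimal_routes(n: int, a: int, b: int) -> int:
    """Number of distinct shortest paths on a bi-directional ring.

    Only one direction is shortest, except when the two cores sit exactly
    opposite each other (n even), where both directions tie.
    """
    d = abs(a - b) % n
    return 2 if d != 0 and 2 * d == n else 1


def mesh_hops(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Manhattan distance between two tiles (row, col) of a 2-D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mesh_minimal_routes(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Shortest paths in a mesh: choose which of the dx+dy hops are row moves."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return comb(dx + dy, dx)


def summarize(rows: int, cols: int) -> dict[str, dict[str, int]]:
    """Idealized cost and diameter for n = rows * cols cores."""
    n = rows * cols
    return {
        # One shared medium: cheap, but it carries only one message at a time.
        "bus": {"cost": 1, "diameter": 1},
        # Each core links to its neighbour; the farthest core is n // 2 hops away.
        "ring": {"cost": n, "diameter": n // 2},
        # Horizontal links plus vertical links between neighbouring tiles.
        "mesh": {"cost": rows * (cols - 1) + cols * (rows - 1),
                 "diameter": (rows - 1) + (cols - 1)},
        # Every input can reach every output directly: n * n crosspoints.
        "crossbar": {"cost": n * n, "diameter": 1},
    }


if __name__ == "__main__":
    rows, cols = 8, 10  # assumption: 80 cores laid out as an 8x10 mesh
    for name, m in summarize(rows, cols).items():
        print(f"{name:8s} cost={m['cost']:5d} diameter={m['diameter']}")
    print("ring shortest routes, core 0 -> 10:", ring_minimal_routes(80, 0, 10))
    print("mesh shortest routes, corner -> corner:",
          mesh_minimal_routes((0, 0), (rows - 1, cols - 1)))
```

With these assumptions the ring needs only 80 links but has a diameter of 40 hops and a single shortest route between most pairs of cores, while the 8x10 mesh uses 142 links, has a diameter of 16 hops, and offers 11,440 shortest routes between opposite corners. The crossbar reaches every core in one hop at the price of 6,400 crosspoints, which illustrates why higher-dimension networks are harder to fit into a limited power and area budget.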





2 Responses to Inside an 80-core chip: the on-chip communication and memory bandwidth solutions
How much of this will be applicable to 16 and 32 core solutions coming down the pipeline in the next 4-6 years?
On a similar note, when we have 16 cores, is any of this applicable to 4×4 multiple-chip systems?
Interesting stuff!
How many MPEG video streams can an n-CPU multi-core chip decode concurrently?