Chris Wilkerson is a senior staff research scientist at Intel Labs, Hillsboro, Oregon. He is interested in all things technical, and has published papers on a number of microarchitectural topics, including low-voltage and power-efficient CPU microarchitecture, and cache and memory subsystem power and performance. Chris holds more than 20 U.S. patents on these and other topics.
Imagine an SUV that transforms into a motorcycle at the push of a button. When lugging around the family or a car full of friends, you’d use the SUV option. When driving solo, you’d save on gas by pushing the button and taking the motorcycle. You’d have the power and capacity of the SUV when you need it and the fuel efficiency of the motorcycle when you don’t. Similarly, to maximize battery life in today’s mobile devices, it’s critical that each task performed on the system is completed with the right resources. Our CPUs should be small and efficient when performing lightweight tasks such as chatting or messaging, but also offer the performance headroom required for heavier workloads such as video and image processing.
Although we may be a long way from implementing technology like this in our cars, recent research shows promising results for CPUs, indicating a potential for 10% performance gains and 22% better energy efficiency. In fact, a recent collaboration between Intel Labs and the University of Texas (UT) has won acclaim for its approach to this problem.
The Intel Labs University Research Office has been working with an international team of researchers at universities such as UT, Carnegie Mellon, NC State and the American University of Beirut. This collaboration aims to develop designs that can configure themselves for a wide range of workloads, scaling down for lightweight tasks and scaling up to maximize performance for heavyweight tasks. In addition to providing funding, Intel encourages its engineers to work closely with students and faculty at the universities to maximize the exchange of ideas. This approach to research has started to bear fruit. The collaboration between Intel and Professor Yale Patt and his HPS Research Group at The University of Texas resulted in the development of MorphCore, a CPU that “morphs” between two configurations: one tailored for high-performance single-threaded workloads, and a second for higher-throughput multi-threaded workloads. At the 2012 International Symposium on Microarchitecture, this work received the “best paper” award for its “conceptual novelty and anticipated long term impact”.
To better understand how this works, recall that maximizing single-thread performance requires high-performance CPUs to allocate all available resources to the currently running thread. In particular, single-thread performance depends on large register files and other buffers that enable instruction re-ordering, allowing instructions to execute as soon as their inputs are ready rather than forcing them to wait for preceding instructions. In contrast, when concurrently executing several threads, it’s often more efficient for the CPU to delay work on a stalled thread and focus instead on other threads that are ready to run. In this mode, buffers that allow instruction re-ordering don’t improve performance, and may even reduce efficiency by continuing to consume power.
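To make the multi-threaded case concrete, here is a minimal sketch of a toy in-order core that hides stalls by switching threads. It is not a model of MorphCore or of any real CPU: the function name `run_in_order`, the latency values, and the simplification that every instruction must wait for the previous instruction of its own thread are all assumptions made for illustration.

```python
def run_in_order(threads):
    """Return the cycle at which a toy in-order core finishes all threads.

    Each thread is a list of instruction latencies in cycles. Assumptions:
    the core issues at most one instruction per cycle, and a thread cannot
    issue again until its previous instruction has completed.
    """
    next_instr = [0] * len(threads)  # index of the next instruction per thread
    ready_at = [0] * len(threads)    # cycle at which each thread may issue again
    cycle = 0
    finish = 0
    while any(i < len(t) for i, t in zip(next_instr, threads)):
        for tid, trace in enumerate(threads):
            if next_instr[tid] < len(trace) and ready_at[tid] <= cycle:
                latency = trace[next_instr[tid]]
                ready_at[tid] = cycle + latency
                finish = max(finish, ready_at[tid])
                next_instr[tid] += 1
                break  # only one issue slot per cycle
        cycle += 1
    return finish


# Hypothetical trace: short ALU operations (1 cycle) mixed with long loads (10 cycles).
trace = [1, 10, 1, 10]
print(run_in_order([trace]))         # 22 cycles for one thread
print(run_in_order([trace, trace]))  # 24 cycles for two threads
```

In this toy model, running the two threads back to back would take 44 cycles, while interleaving them takes 24, because the second thread issues while the first is waiting on a load. No instruction re-ordering within a thread is needed to get that benefit.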
UT’s MorphCore addresses this problem by modifying the design of a high-performance CPU to permit shutting down some buffers and repurposing others. Throughput mode splits the largest of these structures, the physical register file (PRF), into equal partitions, each storing the architectural state of one executing thread. This dramatically simplifies renaming, which assigns buffer space for fetched instructions. In addition, throughput mode turns off the load buffer and much of the store buffer, sacrificing memory reordering in favor of reduced power. With the optimizations explored so far, MorphCore simulations show a 10% performance improvement as well as a 22% gain in energy efficiency, measured as the energy-delay-squared product (Energy*Delay^2), versus a traditional high-performance CPU. With these and other promising ideas developed in collaboration with Intel’s academic partners, processors in the next 5-10 years may offer the best of both worlds: a high-performance mode to minimize delay and deliver the best user experience, and a throughput mode to maximize efficiency when single-thread performance is less important.
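One way to picture why equal partitions simplify renaming is the sketch below, in which each thread's architectural registers live at fixed slots inside that thread's partition, so finding a register becomes simple arithmetic. This is an illustration under stated assumptions, not a description of MorphCore's actual hardware: the class `PartitionedPRF`, its field names, and the sizes used in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionedPRF:
    """Toy model of a physical register file split evenly among threads."""

    num_physical_regs: int  # total PRF entries (hypothetical size)
    num_threads: int        # threads active in throughput mode
    num_arch_regs: int      # architectural registers per thread

    def __post_init__(self):
        if self.num_threads <= 0:
            raise ValueError("need at least one thread")
        if self.num_physical_regs % self.num_threads != 0:
            raise ValueError("PRF must split into equal partitions")
        if self.partition_size < self.num_arch_regs:
            raise ValueError("partition too small for one thread's state")

    @property
    def partition_size(self) -> int:
        """Number of PRF entries owned by each thread."""
        return self.num_physical_regs // self.num_threads

    def physical_index(self, thread_id: int, arch_reg: int) -> int:
        """Map (thread, architectural register) to a fixed PRF slot."""
        if not 0 <= thread_id < self.num_threads:
            raise ValueError("unknown thread")
        if not 0 <= arch_reg < self.num_arch_regs:
            raise ValueError("unknown architectural register")
        return thread_id * self.partition_size + arch_reg


# Hypothetical sizes: 128 physical registers shared by 8 threads.
prf = PartitionedPRF(num_physical_regs=128, num_threads=8, num_arch_regs=16)
print(prf.partition_size)        # 16
print(prf.physical_index(3, 5))  # 53
```

In this sketch the mapping is fixed, so no per-instruction allocation of register file entries is needed, which is the kind of simplification the paragraph above refers to.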
UPDATE 2/5/2013: Added a reference above to Professor Yale Patt and his group at UT.