Prototyping intelligent circuits (Intel Labs @ ISSCC part 2)

In my previous blog I wrote about four papers being presented at ISSCC next week that contribute to our efforts in Tera-scale Computing. Today, I’d like to continue scanning through the ISSCC abstracts and highlight four other papers, which I’d categorize as examples of “Intelligent Circuits.”

Fundamentally, digital circuits are at the heart of nearly every form of machine intelligence today. However, the qualities that we consider “intelligent” (the ability to perceive, learn, and adapt) are usually found at higher layers of the computer architecture. As sophisticated processors find their way into more and more devices, often with strict space and power requirements, we need our circuits to be smarter, to adapt to their environment and user needs.

The first example of this adaptability is seen in a pair of papers that describe a prototype processor that uses resilient circuits to enable “extreme” operation. This means either running at ultra-low, near-threshold voltages for low power or overclocking to ultra-high frequencies to improve performance. Usually, a processor is limited in its minimum voltage or maximum clock speed because, when circuits are pushed too far, errors begin to occur. However, even when pushed beyond the normal limits (called guardbands), these errors don’t happen often. If you catch and correct them, you can continue to push the circuits. This is very common for I/O circuits, but relatively new for processor logic and memory.
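
If you’d like a more concrete picture, here is a rough sketch in Python of that catch-and-correct loop. Everything in it (the error model, the step size, the replay-rate threshold) is an illustrative assumption of mine, not the control policy from the paper:

```python
# Illustrative sketch of resilient "beyond-guardband" operation: push the supply
# voltage down, detect the occasional error, replay the failed work, and settle
# where the replay overhead stops paying off. All names and numbers are made up.
import random

def run_cycle(vdd, v_guardband=1.0):
    """Pretend to execute one pipeline stage; errors get likelier below the guardband."""
    error_probability = max(0.0, (v_guardband - vdd) * 0.5)
    return random.random() > error_probability  # True = cycle completed cleanly

def adaptive_voltage_scaling(v_start=1.1, v_step=0.025, cycles=10_000, max_replay_rate=0.02):
    vdd = v_start
    while True:
        errors = sum(1 for _ in range(cycles) if not run_cycle(vdd))
        replay_rate = errors / cycles        # each detected error costs a replayed cycle
        if replay_rate > max_replay_rate:    # correction overhead too high: back off one step
            return vdd + v_step
        vdd -= v_step                        # still cheap to correct: keep pushing lower

if __name__ == "__main__":
    print(f"settled at roughly {adaptive_voltage_scaling():.3f} V")
```

The same loop, run in the other direction, captures the overclocking case: raise the frequency until the cost of replaying erroneous cycles outweighs the speed gained.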

The first of the two papers describes the core logic of the resilient design: “A 45nm Resilient and Adaptive Microprocessor Core for Dynamic Variation Tolerance.” It shows that one can gain more than 20% performance by reducing guardbands and effectively overclocking the chip. The second paper, entitled “PVT-and-Aging Adaptive Wordline Boosting for 8T SRAM Power Reduction,” describes a resilient memory (cache) to go along with the resilient core. When turning voltages down, it is actually the on-chip SRAM memory that tends to fail first, not the processor logic. So, this prototype incorporates “charge pumps” that boost the voltages on these SRAMs when they are vulnerable to errors, i.e. during memory reads and writes. This allows the overall processor to run at a much lower voltage. The prototype showed up to a 27% reduction in power consumption using this technique. Take a look at this video for a demonstration of this resilient technology.
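
Here is a toy model of the wordline-boosting idea, again with made-up numbers: the array sits at a low retention supply, and a charge-pump boost is applied only while a read or write is in flight. The real design adapts the boost to process, voltage, temperature, and aging, which this sketch does not attempt:

```python
# Hypothetical sketch of access-time wordline boosting for a low-voltage SRAM.
# Voltages and thresholds are invented for illustration only.
class BoostedSram:
    def __init__(self, vdd_low=0.55, boost=0.15, v_access_min=0.65):
        self.vdd_low = vdd_low            # retention-only supply for the array
        self.boost = boost                # charge-pump boost applied to the wordline
        self.v_access_min = v_access_min  # minimum wordline voltage for a safe access
        self.cells = {}

    def _wordline_voltage(self, accessing):
        # Boost only while a read/write is in flight; idle cells stay at vdd_low.
        return self.vdd_low + (self.boost if accessing else 0.0)

    def write(self, addr, value):
        assert self._wordline_voltage(True) >= self.v_access_min, "unsafe write"
        self.cells[addr] = value

    def read(self, addr):
        assert self._wordline_voltage(True) >= self.v_access_min, "unsafe read"
        return self.cells.get(addr)

sram = BoostedSram()
sram.write(0x10, 0xAB)
assert sram.read(0x10) == 0xAB
```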

The next paper, “A 5-to-25Gb/s 1.6-to-3.8mW/(Gb/s) Reconfigurable Transceiver in 45nm CMOS,” describes a “universal” transceiver for system I/O. The goal of this work is to create a single transmitter (TX) and receiver (RX) pair that could support almost any type of I/O in an enterprise platform, from board-to-board I/O in a server rack to chip-to-chip I/O within the box, and from standard copper wires to next-generation optical fibers. The transceiver can reconfigure to change voltages, speeds (5-25Gb/s), signaling methods, and the equalization technique used to account for noise during transmission. The end result could be a simplified approach to system I/O that helps reduce costs through the re-use of a single TX/RX design for many applications.
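
One way to think about “reconfigurable” here is as a per-link configuration applied to the same silicon. The hypothetical model below illustrates that idea; the field names, voltages, and equalization options are my own placeholders, not the paper’s actual register interface:

```python
# Illustrative configuration model for a "universal" reconfigurable link: one
# TX/RX design re-targeted per use by changing rate, supply, signaling, and
# equalization. Field names and values are assumptions for the sketch.
from dataclasses import dataclass

@dataclass
class LinkConfig:
    rate_gbps: float        # 5 to 25 Gb/s in this prototype's range
    vdd: float              # supply scaled with rate for energy per bit
    signaling: str          # e.g. "NRZ"
    tx_eq_taps: int         # transmit equalization taps
    rx_eq: str              # receive equalization, e.g. "CTLE" or "DFE"

    def validate(self):
        if not 5.0 <= self.rate_gbps <= 25.0:
            raise ValueError("rate outside the 5-25 Gb/s reconfiguration range")
        return self

# One design, three hypothetical targets:
chip_to_chip = LinkConfig(10.0, 0.9, "NRZ", 2, "CTLE").validate()
backplane    = LinkConfig(25.0, 1.1, "NRZ", 3, "DFE").validate()
low_power    = LinkConfig(5.0,  0.8, "NRZ", 1, "CTLE").validate()
```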

Finally, “A 320mV-to-1.2V On-Die Fine-Grained Reconfigurable Fabric for DSP/Media Accelerators in 32nm CMOS” describes a flexible accelerator that could speed up a variety of tasks in future mobile SoC chips as well as high-end tera-scale systems. The prototype is a fabric of basic elements that can be reconfigured on demand, transforming itself from one type of accelerator to another. Our researchers have shown that it can handle applications ranging from image processing to text search, along with a variety of digital signal processing tasks, at much higher efficiency than a CPU (as much as 2.6 trillion operations per watt). It can also scale down to ultra-low-voltage operation at 0.3V to support battery-conserving modes on future mobile devices.
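
As a loose software analogy (and only an analogy; this is not how the hardware is organized), imagine a grid of simple elements whose function is swapped by configuration rather than by new silicon. The operations and mappings below are invented for illustration:

```python
# Hypothetical sketch of the reconfigurable-fabric idea: the same grid of simple
# elements is re-programmed on demand to act as different accelerators.
OPERATIONS = {
    "mac": lambda a, b, acc: acc + a * b,                  # multiply-accumulate for DSP/filters
    "cmp": lambda a, b, acc: acc + (1 if a == b else 0),   # e.g. simple pattern matching
}

class Fabric:
    def __init__(self, rows, cols):
        self.shape = (rows, cols)  # metadata only in this sketch
        self.op = None

    def reconfigure(self, personality):
        # Swap every element's function; no new silicon, just new configuration.
        self.op = OPERATIONS[personality]

    def run(self, a_stream, b_stream, init=0):
        acc = init
        for a, b in zip(a_stream, b_stream):
            acc = self.op(a, b, acc)
        return acc

fabric = Fabric(8, 8)
fabric.reconfigure("mac")                 # behave like a DSP/filter accelerator
print(fabric.run([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32
fabric.reconfigure("cmp")                 # behave like a text-matching engine
print(fabric.run("tera", "text"))         # counts matching characters: 2
```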

6 Responses to Prototyping intelligent circuits (Intel Labs @ ISSCC part 2)

  1. Peng Zeng (Illinois Tech) says:

    Good idea and great job. Looking forward to reading these papers from ISSCC as soon as possible!

  2. Jianchen says:

    Thank you for sharing.
    I have read those papers, and I am curious about how the bandwidth or throughput is measured in your tera-scale and other NoCs.
    Since I think the input traffic pattern of the network affects the throughput, I would like to know which benchmark and compiler were used for those performance evaluations.

  3. Robert Sawyer says:

    So when you say error, you mean a qualifying error event (clock mistiming or voltage droop)?
    Was this integrated into the new 32nm i3 and i5 processors, or will this be in the next i9 line?

  4. Sean Koehl says:

    Yes, those are good examples. I might have said that we are catching faults but preventing them from becoming errors that would be noticed by applications.
    These are research circuits on a 45nm process. As research projects, they are concepts we are experimenting with as candidates for future product designs, but we have not announced implementation in a specific architecture yet. We tend to look 5-10 years into the future for a typical research project.

  5. Sean Koehl says:

    Jianchen, I passed this question along to the manager of the team that submitted this paper, Ram Krishnamurthy. His response: “We have on-die traffic generator circuits which are used for measuring throughput, bandwidth, power, etc. These on-die traffic generators enable injecting data traffic patterns into the NoC for worst-case power, max throughput, and random traffic for average power/performance measurements. We have shown the on-die traffic generator in the ISSCC die photo but have not described it in depth due to space constraints. We have, however, shown the power/performance measurements for worst-case power, max throughput, and random traffic.”

  6. Yuzhong Sun says:

    I felt a little dismayed after reading through the blog, although I was drawn in by the title “intelligent circuits.” Why? Those papers don’t go beyond lowering voltages or overclocking the logic circuits of processors. Maybe tera-scale computing has to compress its power. Regarding green computing, I wonder whether it is good news that 32nm circuits, produced through such complex manufacturing processes, would sit “idle” online for an undetermined period. In the field of HPC, the Green TOP 500 stands alongside the TOP 500. Thus, we should perhaps consider tera-scale computing from another view: how many end-user-relevant operations/second/watt/nm.
    Furthermore, the technologies in this blog still fall short of recycled-economy requirements: there is no intelligence by which circuits adapt to their next generations in a self-decomposing-and-evolving mode, so Intel and AMD continue to pollute the earth with their Moore’s-law-driven “intelligent circuits.” Just think about how quickly cell phones are replaced.
    Finally, we do need a brand-new way of looking at processor technology from the point of view of a green earth.
    -Yuzhong