Across the spectrum of computing devices, from supercomputers and datacenters to tablets and wearables, there is a constant pressure to increase capabilities while consuming less power. For the high end, it is because we want to fit more processing power into the same space. For mobile, we want to fit our computing into smaller and smaller spaces. At the VLSI Symposia this week in Hawaii, Intel is presenting five papers detailing fundamental advancements in circuits and semiconductor technology with the common theme of energy-efficient performance.
Fine-grained Power Management
For many types of devices, especially mobile ones, it has become common to use a “system on a chip” approach to save both space and power by integrating a variety of processors (e.g. CPUs, GPUs, radios, accelerators…) onto the same silicon chip.
Although the system may power the whole chip at one voltage, in reality each of these elements will require different amounts of power at different times depending on what they are doing. Hence it is becoming common to add voltage regulators (VRs) across the chip so that each region is powered only with the voltage level it really needs.
The ideal VR is fast, so that it can respond quickly to changes in processing workloads, and fully digital, so that anyone building a custom SoC can assemble its regulators quickly from standard building blocks. This is exactly the achievement outlined in an Intel Labs paper entitled “A 500 MHz, 68% efficient, Fully On-Die Digitally Controlled Buck Voltage Regulator on 22nm Tri-Gate CMOS.”
This voltage regulator, implemented on a research test chip, is fully digital, switches at 500 MHz, and has a bandwidth 20x larger than the best reported comparable design. This enables fast and efficient power management across a wide range of voltages and currents. And because the controller is digital, it will benefit from Moore’s Law, scaling readily to smaller sizes on future process technologies.
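To make the benefit of per-region voltages concrete, here is a rough sketch based on the standard CMOS switching-power approximation P ≈ C·V²·f. The block names, capacitances, voltages, and clock rates are made-up illustrative values and do not come from the paper.

```python
def dynamic_power(c_eff_farads, volts, freq_hz):
    """Approximate CMOS switching power: P = C * V^2 * f."""
    return c_eff_farads * volts ** 2 * freq_hz


# Hypothetical SoC blocks: (effective capacitance in F, required voltage in V, clock in Hz).
BLOCKS = {
    "cpu": (1.0e-9, 1.0, 2.0e9),
    "gpu": (1.5e-9, 0.8, 1.0e9),
    "radio": (0.2e-9, 0.6, 0.5e9),
}


def total_power(blocks, shared_voltage=None):
    """Sum switching power over all blocks.

    If shared_voltage is given, every block runs at that one voltage;
    otherwise each block runs at its own required voltage.
    """
    return sum(
        dynamic_power(c, shared_voltage if shared_voltage is not None else v, f)
        for c, v, f in blocks.values()
    )


# A single rail must satisfy the most demanding block.
highest_voltage = max(v for _, v, _ in BLOCKS.values())
single_rail = total_power(BLOCKS, shared_voltage=highest_voltage)
per_block = total_power(BLOCKS)
print(f"single rail: {single_rail:.3f} W, per-block rails: {per_block:.3f} W")
```

The sketch ignores leakage and the conversion losses of the regulators themselves, so it overstates the real saving; it only shows why the quadratic dependence on voltage makes fine-grained regulation attractive.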
Securing Small Devices
In addition to power management, security is a consideration that spans the entire spectrum of computing devices. Many of Intel’s Core and Xeon processors now incorporate Intel® Advanced Encryption Standard New Instructions (AES-NI). AES was adopted by the U.S. government in 2001 and is widely used across the software ecosystem to protect network traffic, personal data, and corporate IT infrastructure. AES-NI was designed to implement some of the complex and performance-intensive steps of the AES algorithm using hardware acceleration.
Smaller, battery-constrained devices (including phones, wearables, sensors, etc.) will require lightweight, ultra-low energy hardware accelerators that provide just enough encryption acceleration to ensure data integrity. As such, Intel Labs researchers wanted to find ways to accelerate the AES algorithm using as little space and power as possible. That’s the subject of the second paper (with an appropriately cryptic title), “340mV-1.1V, 289Gbps/W, 2090-Gate Nano-AES Hardware Accelerator with Area-optimized Encrypt/Decrypt GF(2^4)^2 Polynomials in 22nm Tri-Gate CMOS.” By right-sizing the performance to the needs of these small devices, researchers created circuits on a test chip that consume less than 1/10 the energy of previously reported work. The accelerator is also extremely small (using just over 2000 transistor gates), which helps to keep these devices compact.
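For readers curious about the finite-field arithmetic named in that title, the sketch below computes the AES S-box the textbook way: a multiplicative inverse in GF(2^8) followed by the affine transform from the AES standard. This is not the paper's circuit. Composite-field designs compute the same inverse over GF(2^4)^2 so that the hardware works on 4-bit values, and the function names here are illustrative only.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by the AES polynomial
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0 as AES requires."""
    result, power, exponent = 1, a, 254
    while exponent:
        if exponent & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        exponent >>= 1
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(byte):
    """AES S-box: field inversion followed by the affine transform."""
    inv = gf_inv(byte)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


print(hex(sbox(0x53)))  # 0xed, matching the worked example in the AES standard
```

The inversion is the expensive step in hardware, which is why area-optimized designs focus on it. As a side note on the headline figure, 289 Gbps/W works out to roughly 3.46 picojoules per bit.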
Speaking of transistors, Intel also continues to make advancements in the fundamental semiconductor building blocks of computing. The following sections were provided by Rob Willoner of our Technology and Manufacturing Group.
Connecting all those Transistors while Delivering High Performance and Low Power
We hear a lot about Moore’s Law, and how its progress enables more and more transistors to be placed on a chip, delivering benefits in performance, energy efficiency, and cost per transistor. But interconnecting all those billions of transistors on a chip without adversely affecting circuit performance and power is no easy feat, and it is getting more challenging over time. In an invited paper, “Process Technology Scaling in an Increasingly Interconnect Dominated World,” Intel describes how the delay and power penalties imposed by the resistance and capacitance of the interconnect system degrade circuit performance more and more severely as dimensions shrink. Interconnect resistances are increasing faster than the rate at which feature sizes scale, and capacitance improvements are constrained by the required mechanical strength of the assembled stack. Collectively, these cause a bottleneck in both local and global information transfer on a chip.
Novel processing techniques and materials (specifically, deposition methods and new conductor materials) are being explored as means to increase conductive cross-sectional area. Molecular ordering of the interlayer dielectric materials is an opportunity to simultaneously deliver lower capacitance with the required mechanical strength. In addition to these improvement paths, novel micro-architectures and applications that are more tolerant of resistive and capacitive scaling constraints must be explored in tandem.
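A small sketch can illustrate why conductive cross-sectional area matters so much. It uses the basic relation R = ρ·L/A and assumes, purely for illustration, a non-conducting liner of fixed thickness on the sidewalls and bottom of the wire. The dimensions, the liner thickness, the 0.7x shrink factor, and the use of a constant bulk-copper resistivity are all assumptions, not figures from the paper.

```python
RHO_COPPER = 1.7e-8  # ohm-metres, approximate bulk value (assumed constant here)


def wire_resistance(rho, length, width, height, liner=0.0):
    """R = rho * L / A, where a liner on both sidewalls and the bottom does not conduct."""
    conductive_width = width - 2 * liner
    conductive_height = height - liner
    if conductive_width <= 0 or conductive_height <= 0:
        raise ValueError("liner leaves no conductive cross section")
    return rho * length / (conductive_width * conductive_height)


def resistance_growth(scale, liner):
    """Ratio of wire resistance after shrinking every dimension by `scale`.

    The liner thickness is held fixed, since it is assumed not to shrink with the wire.
    """
    length, width, height = 1e-6, 40e-9, 80e-9  # hypothetical local wire
    before = wire_resistance(RHO_COPPER, length, width, height, liner)
    after = wire_resistance(RHO_COPPER, scale * length, scale * width, scale * height, liner)
    return after / before


ideal = resistance_growth(0.7, liner=0.0)
with_liner = resistance_growth(0.7, liner=2e-9)
print(f"no liner: {ideal:.2f}x, fixed 2 nm liner: {with_liner:.2f}x")
```

Even with ideal geometry, the resistance of a shrunken wire rises because the cross section shrinks faster than the length; a liner that does not scale makes it worse. The sketch leaves out the rise in resistivity itself at very small dimensions, which would push the numbers higher still.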
Increasing Memory Bandwidth at Low Latency and Power
At this conference last year, Intel presented technical details of its eDRAM technology (eDRAM is DRAM built on a high-performance logic process). This technology enables the CPU and graphics processor to access large amounts of memory with very high bandwidth and low power. The eDRAM is manufactured on an SoC variant of Intel’s 22nm logic technology, featuring tri-gate transistors which deliver an unprecedented combination of performance improvement and power reduction. Intel is using this technology in select Core processors to increase memory bandwidth at low latency and power, thereby delivering high CPU and graphics performance.
This year, in a paper entitled “2nd Generation Embedded DRAM with 4X Lower Self Refresh Power in 22nm Tri-Gate CMOS Technology,” Intel is describing a much improved 1Gbit, 2GHz embedded DRAM, developed in 22nm tri-gate CMOS technology, with 4X lower self-refresh power compared to the prior generation. Retention time has been improved by 3X through process and design optimizations. Source-synchronous clocking is integrated in the design to reduce clock power without penalizing bandwidth. Charge pump power is reduced by 4X by employing comparator-based regulation. Temperature-controlled refresh enables minimum refresh power at all temperature conditions.
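The idea behind temperature-controlled refresh can be sketched with a toy model. It assumes that cell retention time roughly doubles for every 10 °C drop in temperature, a common rule of thumb rather than a figure from the paper, and that refresh power is proportional to refresh rate. Every number and function name below is hypothetical.

```python
def retention_ms(temp_c, retention_at_ref_ms=1.0, ref_temp_c=95.0, halving_step_c=10.0):
    """Toy model: retention doubles for every `halving_step_c` below `ref_temp_c`."""
    return retention_at_ref_ms * 2 ** ((ref_temp_c - temp_c) / halving_step_c)


def refresh_rate_hz(temp_c, guard_band=0.5):
    """Refresh at a fixed fraction of the modeled retention time."""
    interval_ms = guard_band * retention_ms(temp_c)
    return 1000.0 / interval_ms


def relative_refresh_power(temp_c, worst_case_temp_c=95.0):
    """Refresh power relative to always refreshing at the worst-case (hottest) rate."""
    return refresh_rate_hz(temp_c) / refresh_rate_hz(worst_case_temp_c)


for temp in (95, 75, 55):
    print(f"{temp} C: {relative_refresh_power(temp):.4f} of worst-case refresh power")
```

A design that always refreshes at the worst-case rate pays the hot-chip cost even when the chip is cool; tracking temperature lets the refresh rate, and with it the refresh power, drop whenever conditions allow.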
Critical Circuits Set New Records in Energy Efficiency and Compactness with 14nm Tri-Gate Technology
Serializer/Deserializer (SerDes) circuits are critical for getting data on and off logic chips. In the paper, “A 2GHz-to-7.5GHz Quadrature Clock Generator Using Digital Delay Locked Loops for Multi-Standard I/Os in 14nm CMOS,” Intel demonstrates the first building blocks for high-speed SerDes in Intel’s 14nm tri-gate technology. The 2.0-7.5GHz quadrature generator is designed to support next-generation SerDes IPs such as CEI-28G and 100Gb Ethernet. The voltage-regulated digital delay locked loop (DLL) architecture is designed to leverage the area and power benefits of Intel’s 14nm technology. The DLL is at least 4X smaller and 40% more energy efficient than the best previously published DLLs. Three levels of delay tuning, including a 1-bit delta-sigma modulator for fine delay control, provide nearly 4X tuning range along with extremely low jitter and quadrature error. The measured jitter at 7GHz is 176fs-rms, nearly 2X lower than that of previously reported quadrature generators.
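To put those figures in perspective, the sketch below works out the ideal edge positions of the four quadrature phases (0, 90, 180, and 270 degrees) and expresses the reported jitter as a fraction of the clock period. The function names are illustrative; the frequencies and the jitter value are the ones quoted above.

```python
def quadrature_edges_ps(freq_hz):
    """Ideal rising-edge times, in picoseconds, of the four quadrature phases in one period."""
    period_ps = 1e12 / freq_hz
    return {deg: period_ps * deg / 360 for deg in (0, 90, 180, 270)}


def jitter_fraction(jitter_fs_rms, freq_hz):
    """RMS jitter expressed as a fraction of one clock period."""
    period_fs = 1e15 / freq_hz
    return jitter_fs_rms / period_fs


edges = quadrature_edges_ps(7e9)
fraction = jitter_fraction(176, 7e9)
tuning_range = 7.5e9 / 2.0e9

print({deg: round(t, 2) for deg, t in edges.items()})
print(f"jitter is {fraction:.3%} of the period; tuning range is {tuning_range:.2f}x")
```

At 7GHz the phases sit only about 36 picoseconds apart, so jitter of 176 femtoseconds amounts to roughly a tenth of a percent of the period. The ratio of 7.5GHz to 2.0GHz is 3.75, which is the "nearly 4X" tuning range.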
Debating the Outlook and Challenges Ahead
Intel participated in two rump sessions. In one, called “Who gives up on scaling first: device and process technology engineers, circuit designers, or company executives? Which scaling ends first – memory or logic?” Intel Senior Fellow Mark Bohr argued that process engineers, circuit designers and company executives all share the responsibility for continuing scaling. Each has its part to play to ensure that scaling delivers expected improvements in performance, power and cost per transistor. But company executives will tend to pull R&D funding when either process or design engineers can no longer demonstrate a viable scaling path leading to manufacturable and profitable products.
Finally, in a second rump session, entitled “Lessons and Challenges for Future Mixed-Signal, RF, and Memory Circuits,” Intel Fellow and Vice President of the Technology and Manufacturing Group, Kevin Zhang, argued that the demand for larger and ever-faster static RAMs will continue to increase, and that innovation in technology and design will drive SRAM scaling well into the future.