Reinventing DRAM with the Hybrid Memory Cube

Today, Intel CTO Justin Rattner is demonstrating the Hybrid Memory Cube, the fastest and most efficient Dynamic Random Access Memory (DRAM) ever built. I want to give you some background on how and why we collaborated with Micron on this new memory technology.

One of my research passions is helping to design computers to be faster and more energy efficient. Over my career, a good portion of my creative energy has gone into improving the interconnect within computer systems so that communication between the microprocessor, DRAM, storage and peripherals gets faster and lower power with each successive generation. In other words, I’m an I/O guy.

One of the biggest impediments to scaling the performance of servers and data centers is the available bandwidth to memory and the associated cost. As the number of individual processing units (“cores”) on a microprocessor increases, the need to feed the cores with more memory data expands proportionally. Legacy DDR-style DRAM main memory isn’t going to cut it for many future high-end systems.

Being an I/O researcher, I initially focused my efforts to solve the memory bandwidth problem exclusively on the I/O: improving the circuits, connectors and wires that form the connection between the microprocessor and memory. In the past, our research team has demonstrated very low-power I/O connecting multiple microprocessors together at high rates. However, the process technology used to implement a CPU is dramatically different from that used for a DRAM, and it quickly became clear that there were severe limitations to achieving high speed and low power using a commodity DRAM process.

[Figure: HMC_Stack.jpg]

Don’t get me wrong, commodity DRAM processes are amazing feats of modern technology. DRAM process technologies pack massive memory capacity into a tiny piece of silicon, all at an amazingly low manufacturing cost. However, this exceptional memory density and cost structure comes with limits on logic density and I/O performance, areas in which logic process technology optimized for a CPU really shines. We knew that future high-speed memory would need to conquer a challenging set of tradeoffs and achieve low cost and power as well as high density and speed. We came to the conclusion that mating DRAM with a logic-process-based I/O buffer using 3D stacking could be the way to solve the dilemma. We also found that once we placed a multi-layer DRAM stack on top of a logic layer, we could solve another memory problem: the limited ability to efficiently transfer data from the DRAM memory cells to the corresponding I/O circuits.

Getting the data out of the memory cells to the I/O is analogous to navigating the streets of a crowded city. Placing the logic layer underneath the DRAM stack is like building a high-speed subway system underneath those streets: it bypasses encumbrances such as the DRAM process and the routing-restricted memory arrays. Additionally, the adjacent logic layer enables integration of intelligent control logic that hides the complexities of DRAM array access, allowing the microprocessor memory controller to employ much more straightforward access protocols than has been achievable in the past.

[Figure: HMC1.jpg]

This joint research project between Micron Technology and Intel has produced several key achievements. Last year, Intel designed and demonstrated an I/O prototype, optimized for this hybrid-stacked DRAM application, that achieved a record-breaking energy efficiency of 1.4 milliwatts per gigabit per second. The two companies worked together to develop and specify a high-bandwidth memory architecture and protocol for a prototype that was designed and manufactured this year by Micron. This hybrid-stacked DRAM prototype, known as the Hybrid Memory Cube (HMC), is the world’s highest-bandwidth DRAM device, with sustained transfer rates of 1 terabit per second (one trillion bits per second). On top of that, it is also the most energy-efficient DRAM ever built when measured in bits transferred per unit of energy consumed. This groundbreaking prototype has 10 times the bandwidth and 7 times the energy efficiency of even the most advanced DDR3 memory module available.
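To put those figures in more familiar units, here is a rough back-of-the-envelope sketch in Python. The constants are the numbers quoted above; the function names are made up for illustration. Two caveats: the 1.4 mW/Gbps figure was measured on the I/O prototype, so applying it to the full 1 terabit per second is a hypothetical "what if" rather than a measured HMC result, and the DDR3 bandwidth is only what the "10 times" claim implies, not a separately measured number.

```python
# Figures quoted in the post.
IO_EFFICIENCY_MW_PER_GBPS = 1.4   # I/O prototype energy efficiency
HMC_BANDWIDTH_TBPS = 1.0          # sustained HMC transfer rate
BANDWIDTH_RATIO_VS_DDR3 = 10      # "10 times the bandwidth"


def mw_per_gbps_to_pj_per_bit(mw_per_gbps):
    """Convert mW per Gbit/s to picojoules per bit.

    1 mW / (1 Gbit/s) = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit = 1 pJ/bit,
    so the two units are numerically identical.
    """
    return mw_per_gbps


def io_power_watts(mw_per_gbps, bandwidth_tbps):
    """I/O power in watts for a link running at the given efficiency.

    ASSUMPTION: the whole bandwidth is carried at this efficiency.
    """
    bandwidth_gbps = bandwidth_tbps * 1000.0
    return mw_per_gbps * bandwidth_gbps / 1000.0  # mW -> W


def tbps_to_gigabytes_per_second(tbps):
    """Convert terabits per second to gigabytes per second (8 bits per byte)."""
    return tbps * 1000.0 / 8.0


def implied_ddr3_gigabytes_per_second(hmc_tbps, ratio):
    """DDR3 module bandwidth implied by the quoted ratio (derived, not measured)."""
    return tbps_to_gigabytes_per_second(hmc_tbps) / ratio


if __name__ == "__main__":
    pj = mw_per_gbps_to_pj_per_bit(IO_EFFICIENCY_MW_PER_GBPS)
    watts = io_power_watts(IO_EFFICIENCY_MW_PER_GBPS, HMC_BANDWIDTH_TBPS)
    hmc_gbs = tbps_to_gigabytes_per_second(HMC_BANDWIDTH_TBPS)
    ddr3_gbs = implied_ddr3_gigabytes_per_second(
        HMC_BANDWIDTH_TBPS, BANDWIDTH_RATIO_VS_DDR3
    )
    print(f"I/O energy per bit:       {pj:.1f} pJ/bit")
    print(f"I/O power at full rate:   {watts:.1f} W (hypothetical)")
    print(f"HMC bandwidth:            {hmc_gbs:.1f} GB/s")
    print(f"Implied DDR3 module rate: {ddr3_gbs:.1f} GB/s")
```

In other words, 1.4 mW/Gbps is the same thing as 1.4 picojoules per bit, 1 terabit per second is 125 gigabytes per second, and the "10 times" comparison implies a DDR3 module delivering roughly 12.5 gigabytes per second.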

These developments will likely have a fundamental impact on data centers and supercomputers that thirst for low-power, high-bandwidth memory access. With this technology, next-generation systems formerly limited by memory performance will be able to scale dramatically while staying within strict power and form-factor budgets. Additionally, these developments may play a key role in optimizing the system architectures and memory hierarchies of future mainstream systems in the client and server markets.
