Reinventing DRAM with the Hybrid Memory Cube

Today, Intel CTO Justin Rattner is demonstrating the Hybrid Memory Cube, the fastest and most efficient Dynamic Random Access Memory (DRAM) ever built. I want to give you some background on how and why we collaborated with Micron on this new memory technology.

One of my research passions is helping to design computers that are faster and more energy efficient. Over my career, a portion of my creative energy has gone into improving the interconnect within computer systems, so that communication between the microprocessor, DRAM, storage, and peripherals gets faster and lower power with each successive generation. In other words, I'm an I/O guy.

One of the biggest impediments to scaling the performance of servers and data centers is the available bandwidth to memory and its associated cost. As the number of individual processing units ("cores") on a microprocessor increases, the need to feed those cores with memory data grows proportionally. Legacy DDR-style DRAM main memory isn't going to cut it for many future high-end systems.

Being an I/O researcher, my initial efforts to solve the memory bandwidth problem focused exclusively on the I/O: improving the circuits, connectors, and wires that form the connection between the microprocessor and memory. In the past, our research team demonstrated very low-power I/O connecting multiple microprocessors at high data rates. However, the process technology used to implement a CPU is dramatically different from that used for a DRAM, and it quickly became clear that there were severe limitations to achieving high speed and low power using a commodity DRAM process.

Don't get me wrong: commodity DRAM processes are amazing feats of modern technology. They pack massive memory capacity into a tiny piece of silicon at an amazingly low manufacturing cost. However, this exceptional memory density and cost structure brings limits on logic density and I/O performance, an area in which a logic process optimized for a CPU really shines. We knew that future high-speed memory would need to conquer a challenging set of tradeoffs, achieving low cost and power as well as high density and speed. We came to the conclusion that mating DRAM with a logic-process-based I/O buffer using 3D stacking could be the way to solve the dilemma. We also found that once we placed a multi-layer DRAM stack on top of a logic layer, we could solve another memory problem: the limited ability to efficiently transfer data from the DRAM memory cells to the corresponding I/O circuits.

Getting the data out of the memory cells to the I/O is analogous to navigating the streets of a crowded city. Placing the logic layer underneath the DRAM stack has an effect similar to building a high-speed subway system beneath those streets, bypassing encumbrances such as the slow DRAM process and the routing-restricted memory arrays. Additionally, the adjacent logic layer enables the integration of intelligent control logic that hides the complexities of DRAM array access, allowing the microprocessor's memory controller to employ a much more straightforward access protocol than has been achievable in the past.

This joint research project between Micron Technology and Intel has produced some key achievements. Last year, Intel designed and demonstrated an I/O prototype, optimized for this hybrid-stacked DRAM application, that achieved a record-breaking energy efficiency of 1.4 milliwatts per gigabit per second. The two companies then jointly developed and specified a high-bandwidth memory architecture and protocol for a prototype that was designed and manufactured this year by Micron. This hybrid-stacked DRAM prototype, known as the Hybrid Memory Cube (HMC), is the world's highest-bandwidth DRAM device, with sustained transfer rates of 1 terabit (trillion bits) per second. It is also the most energy-efficient DRAM ever built, measured in bits transferred per unit of energy consumed. This groundbreaking prototype has 10 times the bandwidth and 7 times the energy efficiency of even the most advanced DDR3 memory module available.
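As a quick back-of-the-envelope check of what those figures imply, the sketch below converts the quoted 1.4 mW per Gb/s into energy per bit, and estimates the I/O power at the sustained 1 Tb/s rate, assuming the efficiency figure holds at full rate (an illustrative assumption, not a published spec):

```python
# Illustrative arithmetic using the figures quoted above.
# 1.4 mW per Gb/s is the same as 1.4 picojoules per transferred bit:
mw_per_gbps = 1.4                              # quoted I/O energy efficiency
joules_per_bit = (mw_per_gbps * 1e-3) / 1e9    # W per (bit/s) == J per bit
print(f"{joules_per_bit * 1e12:.1f} pJ/bit")   # -> 1.4 pJ/bit

# At the sustained 1 Tb/s transfer rate, the implied I/O power draw:
link_rate_bps = 1e12
io_power_w = joules_per_bit * link_rate_bps
print(f"{io_power_w:.1f} W")                   # -> 1.4 W
```

In other words, at the quoted efficiency, moving a trillion bits every second costs on the order of a single watt of I/O power.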

These developments will likely have a fundamental impact on data centers and supercomputers that thirst for low-power high-bandwidth memory accesses. With this technology, next generation systems formerly limited by memory performance will be able to scale dramatically while maintaining strict power and form factor budgets. Additionally, these developments may play a key role in the optimization of system architectures and memory hierarchies of future mainstream systems in the client and server markets.

10 Responses to Reinventing DRAM with the Hybrid Memory Cube

  1. Sander says:

    This sounds like some really interesting technology, but what I'm really wondering about is what this will do to memory latency. In my line of work, game development, memory latency is a much bigger problem than memory bandwidth. Considering that another buffer is added between the memory and the CPU, it sounds like latency would increase. I'd love to be wrong on this, obviously.

  2. Steve says:

    @Sander: The short of it is that no, memory latency is not reduced. Other technology from Micron, such as RL-DRAM (reduced-latency DRAM), trades die area for a reduction in latency: in short, they chop the bitlines into shorter pieces, reducing RC delay and getting your data out faster. The HMC is all about low power and bandwidth. In many video applications, higher bandwidth is actually more important than reduced latency, especially as density increases. I imagine that a logic chip built in a logic-specific process would contribute very little latency to the system. Remember that transistors in a DRAM are incredibly slow due to the high-temperature steps in capacitor formation. A high-speed logic device will be capable of running at incredible frequencies, implying very minimal gate delays (and thus an insignificant increase in latency relative to the RC delay of a large bitline).

  3. Don says:

    One could use a staged memory setup / RAID-like process to enhance I/O and capitalize even more on the volume of data transported…

  4. Harrison says:

    I remember when I was young and the 100 MHz Intel processor was mentioned on CNBC. I thought that processor flew when I finally got one in a computer. Nowadays, my computer is 100 times faster. I cannot imagine what computing will be like in 15 years or so with this technology. Thank goodness for Intel.

  5. scott skillman says:

    The question of cost is also very important to the DRAM market. If you look at Rambus, it was superior but not adopted because its price/performance ratio was out of whack.
    If this technology is taken to JEDEC and made into a price-competitive commodity, then it could take off.
    Another issue is power: the faster you run, the more power you dissipate. If you stack DRAM, the power density increases as well.
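Steve's point in comment 2 about chopping bitlines can be made concrete: a bitline's resistance and capacitance each grow roughly linearly with its length, so the RC delay grows roughly with the square of the length. A minimal sketch of that scaling (the model is a simplification for illustration, not a circuit simulation):

```python
# Simplified model of Steve's bitline argument: if R ∝ length and
# C ∝ length, then RC delay ∝ length², so halving a bitline roughly
# quarters its RC delay.

def relative_rc_delay(length_fraction):
    """RC delay of a shortened bitline relative to the full-length one,
    assuming both R and C scale linearly with length."""
    return length_fraction ** 2

print(relative_rc_delay(1.0))   # full bitline      -> 1.0
print(relative_rc_delay(0.5))   # halved bitline    -> 0.25 (about 4x faster)
```

This is why RL-DRAM's shorter bitline segments pay off so strongly in latency, at the cost of the extra die area spent on additional sense amplifiers.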