By Jose Alvarez, Senior Director, Intel CTO Office, PSG
The following blog is adapted from a keynote speech that Jose Alvarez presented at the recent The Next FPGA Platform event, held in San Jose, California.
Professors John Hennessy and Dave Patterson gave a fantastic talk when they accepted their Turing Award from the Association for Computing Machinery (ACM) in 2018. The professors titled their speech, and the subsequent paper in the journal Communications of the ACM, “A New Golden Age for Computer Architecture.” I encourage you to read the paper. It’s fantastic. Professors Hennessy and Patterson believe that we are in a new golden age for computer architecture. They make the case that we need to look for additional ways to continue the semiconductor innovation fueled by Moore’s Law during the last half century, and they believe the path to this innovation is through domain-specific architectures (DSAs). They concluded that this new golden age of computer architecture, based on DSAs, will also produce domain-specific languages, all of which require reconfigurability.
I think we’re lucky to be in this new golden age. Gordon Moore published a paper in the proceedings of the 2003 IEEE International Solid-State Circuits Conference titled “No exponential is forever: but ‘Forever’ can be delayed!” In that paper, Moore explained how the performance of semiconductors might be improved using a variety of new technologies including then-new 3D transistors. That vision was realized as Intel 3D Tri-Gate transistors, which have brought us to the present – to the time we are living in now, to the new golden age discussed by Professors Hennessy and Patterson.
This leads to a question: What is the best way to implement these DSAs?
There are many ways to implement DSAs. You can design an ASIC for a specific application. The Google Tensor Processing Unit (TPU) is an example of such an ASIC. You can also use an FPGA. There are several alternatives.
At Intel, we believe that the best approach to creating DSAs in the data center – the way that Microsoft has implemented its cloud-based, deep-learning platform for real-time AI inference called Project Brainwave – is to use overlays. Microsoft’s Project Brainwave employs a soft Neural Processing Unit (NPU) implemented with a high-performance Intel® FPGA to accelerate deep neural network (DNN) inferencing. This NPU DSA has multiple applications in computer vision and natural language processing.
The term “overlays” has been around for a long time. It’s not new. Today’s FPGA technology permits you to create DSAs with your own, custom instruction set architecture (ISA). Once you have an ISA that is specifically tailored to your workload, you can implement the associated DSA using an FPGA, which transforms your workload problem from a hardware design problem – which requires time-consuming synthesis, placement, and routing – into a software-coding task that compiles quickly. You design the DSA once, download it into an FPGA, and now you have a workload-specific, software-centric engine. It’s all about productivity.
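To make the overlay idea concrete, here is a minimal, hypothetical sketch in Python: a toy three-instruction ISA for an imagined vector-accumulate overlay, with a tiny assembler and a software reference model of the overlay’s datapath. All names and encodings here are illustrative assumptions, not any real Intel or Microsoft ISA. The point is the workflow: once the overlay is loaded into the FPGA, retargeting it to a new workload means re-running a fast software step like `assemble`, not re-running synthesis, placement, and routing.

```python
# Hypothetical 3-instruction ISA for an imagined FPGA overlay
# (illustrative only -- not a real product's instruction set).
OPCODES = {"LOAD": 0x1, "MAC": 0x2, "STORE": 0x3}

def assemble(program):
    """Encode (mnemonic, operand) pairs into 16-bit instruction words.

    This is the fast "software compile" step that replaces hardware
    synthesis once the overlay bitstream is already in the FPGA.
    """
    words = []
    for mnemonic, operand in program:
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown instruction: {mnemonic}")
        words.append((OPCODES[mnemonic] << 12) | (operand & 0xFFF))
    return words

def simulate(words, memory):
    """Software reference model of the overlay's multiply-accumulate datapath."""
    acc, reg = 0, 0
    for word in words:
        opcode, operand = word >> 12, word & 0xFFF
        if opcode == OPCODES["LOAD"]:
            reg = memory[operand]          # load scalar into the input register
        elif opcode == OPCODES["MAC"]:
            acc += reg * memory[operand]   # multiply-accumulate
        elif opcode == OPCODES["STORE"]:
            memory[operand] = acc          # write the accumulator back
    return memory

# Dot product of memory[0..1] and memory[2..3], result written to memory[4].
mem = [2, 3, 10, 20, 0]
prog = [("LOAD", 0), ("MAC", 2), ("LOAD", 1), ("MAC", 3), ("STORE", 4)]
simulate(assemble(prog), mem)
print(mem[4])  # 2*10 + 3*20 = 80
```

A new workload for the same overlay is just a new instruction list fed to `assemble` – which is exactly the productivity argument the paragraph above makes.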
Many advanced workload problems in the data center are being solved in exactly this manner, using the same hardware overlay with a variety of programs compiled for a specific DSA. It’s convenient and it permits you to develop new workload software very, very quickly. Using hardware overlays changes the whole problem, and FPGA reconfigurability provides the flexibility needed to develop DSAs in nascent markets when things are in flux. Personally, I think we are very lucky to be in Professors Hennessy and Patterson’s new golden age for computer architecture.
Note: The Intel® FPGA Acceleration Hub is a good place to start learning about DSA implementations using FPGAs.
Legal Notices and Disclaimers:
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Results have been estimated or simulated using internal Intel analysis, architecture simulation and modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
Intel does not control or audit third-party data. You should review this content, consult other sources, and confirm whether referenced data are accurate.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings.
Circumstances will vary. Intel does not guarantee any costs or cost reduction.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.
Altera is a trademark of Intel Corporation or its subsidiaries.
Cyclone is a trademark of Intel Corporation or its subsidiaries.
Intel and Enpirion are trademarks of Intel Corporation or its subsidiaries.
Other names and brands may be claimed as the property of others.