What’s Needed to Fully Realize the Promise of Heterogeneous Computing in Datacenters?

By Hua Xue, Vice President and General Manager, PSG Software Engineering, Intel Corporation

 

The following guest blog is based on remarks by Hua Xue during a panel discussion on “FPGA Software Innovations” at The Next FPGA Platform event, held last month in San Jose, California.

 

Heterogeneous platforms are coming to the fore in datacenters, as there’s no such thing as one size fits all in this market. To be successful, you must bring together various computing architectures and support them with a common software framework. However, software tools for heterogeneous development are really just now starting to blossom. More innovation is needed to truly realize the full heterogeneous computing vision.

Let me comment on the state of heterogeneous software-development tools for FPGAs: high-level synthesis (HLS) is nothing new. It’s been around for 30 years. When it first appeared, HLS was mostly used as a tool for addressing developer productivity issues, certainly not heterogeneous programming issues. Everybody was trying to improve these HLS tools so that their performance and density results matched what could be achieved with RTL coding. Those same challenges remain today.

What’s changed is the audience and the expectations for these development tools. Today, these tools are moving away from exclusively serving RTL hardware designers, who already love FPGAs and know them inside and out, from the device architecture and clock structure all the way down to the LUTs.

Most of today’s programmers working on solving datacenter workload problems do not need to know what’s inside the FPGA or any other computing element for that matter. These people just want to understand the programming model. “Does the programming model allow me to write C code and compile it?” they ask. They want something that’s just like a standard C++ development environment and compiler: compile the program and load it, and then it just works.

From that perspective, I think our challenge and our audience have changed significantly, and our new challenges are quite different from the HLS challenges of the 1990s. For example, the traditional figures of merit for high-level synthesis are performance and density overhead. Even today, a high-level synthesis flow will likely add perhaps 10% to 30% or even 50% overhead to a benchmark RTL design. From an RTL programmer’s perspective, that’s too much.

However, our comparison reference point is very different today. RTL-coded designs are not the ultimate benchmark they once were. People today compare the performance of high-level-synthesized FPGA designs against C programs that are compiled and run on CPUs. This change creates an opportunity for Intel to demonstrate significant workload performance gains for FPGA-based computing compared to C programs compiled for CPUs. If the workloads compiled for FPGAs deliver significant performance gains, they’re a viable alternative solution for most applications.

At Intel, we believe that open approaches are one key to solving the heterogeneous development puzzle. The question is: How do we solve this puzzle while truly letting open concepts serve their intended purpose?

I think that you address this question with four fundamental elements:

The first element is the choice of language. If the targeted developer is a C++ programmer, then by definition, any language with too many vendor-specific extensions or required coding styles is not a truly open solution. You need to start with an industry standard, such as C++ or SYCL from the Khronos Group. Intel’s proposal, Data Parallel C++ (DPC++), builds on and extends SYCL.

Second, you need to have an integrated environment for the development tool. A standalone compiler that accepts a program in a standard language and spits out a vendor-specific FPGA bitstream is not good enough for a conventional C programmer. You need an integrated system that strikes the right balance between ease of use and performance. You cannot achieve maximum performance with complete ease of use, so you need an integrated design environment that includes a debugger, a performance profiler, advisors, and optimizers. Language integration into a comprehensive development environment is the second key element. The Intel® oneAPI unified programming model provides this integrated environment across multiple architectures, including CPUs, GPUs, FPGAs, and other accelerators.

The third key element is to have a truly open solution with an open community. You need to build an open ecosystem for an entire heterogeneous computing platform. The Intel approach here was to launch the open oneAPI Initiative and to invite everyone to participate.

The fourth key element is not purely related to software. At the architectural level, from the perspective of heterogeneous hardware integration, you need conscious decision making to help drive the acceleration of datacenter workloads. For example, the kind of memory interface you have and the interconnect you use among computing elements will fundamentally change the compiler’s behavior and its optimizations. Another aspect that will influence performance is the partitioning of code into accelerated kernels versus code that remains on the CPU host. We simply cannot think about software alone. We must also consider the hardware, including the computing elements and the interconnects that join them, that these workloads will target if we are to fully realize the vision of heterogeneous computing.

 

For more information about the oneAPI initiative and the Intel® oneAPI beta product, please click here.

Legal Notices and Disclaimers:

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.  Notice Revision #20110804

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

 

Steven Leibson

About Steven Leibson

Steve Leibson is a Senior Content Manager at Intel. He started his career as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He’s served as Editor in Chief of EDN Magazine and Microprocessor Report and was the founding editor of Wind River’s Embedded Developers Journal. He has extensive design and marketing experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.