Imagine a world where the Internet is entirely secure and attackers have no place to hide. A major step toward realizing this vision of world-wide security is making sure that all traffic exchanged between servers and clients is encrypted. This is a very difficult technical challenge, since networking speeds are very high (10-100 Gbps) whereas cryptographic algorithms consume millions of processor cycles to execute. Intel® is researching solutions that can accelerate secure Internet transactions by orders of magnitude. First, the latest Core™ micro-architecture (Nehalem) re-introduces Simultaneous Multi-threading Technology (SMT) into the CPU. SMT is ideal for hiding the cycles of compute-intensive public key encryption software under the stall times of network application memory lookups. Following Nehalem, Westmere adds new instructions for potentially speeding up symmetric encryption by a factor of 3 to 4. These instructions not only provide better performance but also protect applications against an important class of threats known as side channel attacks. Last, Intel® has developed superior integer arithmetic software that can speed up key exchange and establishment procedures by a factor of 2.

Simultaneous Multi-threading Technology

The most recent Core™ micro-architecture (Nehalem) developed by Intel® re-introduces hyper-threading (also referred to as Simultaneous Multi-threading Technology, SMT) into the CPU. This represents a major departure from the earlier Core™ micro-architecture, where each core was single-threaded. As part of our research we have demonstrated that simultaneous multi-threading can result in substantial performance improvement for a certain class of workloads: those associated with secure web transactions.
We propose a new programming model where one compute-intensive thread performs only RSA public key encryption operations and another thread performs memory access-intensive tasks. We have shown that RSA is an ideal companion thread for four representative memory access-intensive workloads when Simultaneous Multi-threading is used, resulting in potential performance gains ranging from 14% to 2X. The system benefits most when a thread performing dependent memory lookups is paired with an RSA thread. The throughput of the memory thread almost doubles, reaching the value it has if not paired with RSA. Another way to interpret the same result is that the RSA computation comes for free due to SMT. In reality, the RSA computation is hidden under the very long stall times of the memory thread.

We also observe that the throughput of a single memory thread increases by approximately 30% when SMT is switched on and the memory thread is multiplexed with another memory thread. The same throughput is almost doubled when the memory thread is paired with an RSA thread. These results indicate that RSA is a much better companion thread than a second memory thread, because one workload is memory access-intensive and the other is compute-intensive. The RSA performance also increases by 21%.

New Processor Instructions

In addition to SMT, the next generation of the Intel® Core™ micro-architecture following Nehalem (Westmere) will introduce a new set of instructions that enable high performance and secure symmetric encryption and decryption. These instructions are AESENC (AES round encryption), AESENCLAST (AES last round encryption), AESDEC (AES round decryption) and AESDECLAST (AES last round decryption). They accelerate the Advanced Encryption Standard (AES) by a factor of 3 to 4. AES is the United States Government standard for symmetric encryption, defined by FIPS Publication 197 (2001).
It is used in a large variety of applications where high throughput and security are required. Two additional instructions are also introduced for implementing the key schedule transformation: AESIMC and AESKEYGENASSIST. Together with the AES instructions, Intel will also introduce a new instruction supporting carry-less multiplication, named PCLMULQDQ. The PCLMULQDQ instruction performs carry-less multiplication of two 64-bit quadwords and can support high performance and secure implementations of encryption modes of operation suitable for high speed networking (e.g., AES-GCM).

Superior Key Establishment Software

Last, Intel has developed superior integer arithmetic software that can accelerate big number multiplication and modular reduction by at least 2X. Such routines are used not only in RSA public key encryption but also in Diffie-Hellman key exchange and Elliptic Curve Cryptography. Using our software we are able to accelerate RSA 1024 from approximately 1500 signatures per second (OpenSSL) to potentially 2900 signatures per second on a single Nehalem core. Similarly, we are able to accelerate other popular cryptographic schemes such as RSA 2048 and Elliptic Curve Diffie-Hellman based on the NIST B-233 curve.

In summary, Intel® is researching new technologies that offer orders of magnitude cryptographic algorithm acceleration. Our ultimate goal is to make general purpose processors capable of processing and forwarding encrypted traffic at very high speeds, so that the Internet can be gradually transformed into a completely secure information delivery infrastructure.

Authors: Michael E. Kounavis is a research scientist and Satyajit Grover is a network software engineer working on cryptographic algorithm acceleration in Intel’s Network Technology Lab.
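As a footnote to the PCLMULQDQ discussion above, readers unfamiliar with carry-less multiplication may find a small software model helpful. The sketch below is purely illustrative: the function name `clmul64` is hypothetical, it is not Intel's implementation, and it leaves out how the real instruction selects 64-bit halves of its 128-bit operands. It shows only the arithmetic: an ordinary shift-and-add multiplication in which every addition is replaced by XOR, so no carries propagate between bit positions.

```python
def clmul64(a: int, b: int) -> int:
    """Carry-less product of two unsigned 64-bit integers.

    Treats a and b as polynomials over GF(2) and multiplies them.
    The result fits in 127 bits. Hypothetical helper for illustration.
    """
    limit = 1 << 64
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must be unsigned 64-bit integers")

    result = 0
    while b:
        if b & 1:
            # XOR instead of ADD: partial products combine without carries.
            result ^= a
        a <<= 1   # next partial product is shifted one bit to the left
        b >>= 1   # move on to the next multiplier bit
    return result


if __name__ == "__main__":
    # (x + 1) * (x + 1) = x^2 + 1 over GF(2), whereas integer 3 * 3 = 9.
    print(bin(clmul64(0b11, 0b11)))
```

A software loop like this inspects the multiplier bit by bit, which is slow and makes its running time depend on the operand. That is one reason a dedicated instruction is attractive for modes such as AES-GCM.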