rENIAC’s FPGA-based Distributed Data Engine boosts Cassandra database performance by as much as 10X when deployed as a data proxy.

rENIAC’s Distributed Data Engine accelerates the Apache Cassandra NoSQL database by both increasing throughput and reducing latency. Facebook originally developed the Apache Cassandra distributed NoSQL database to power its inbox search feature and released it to the open-source community more than a decade ago, where it has continued to flourish. The distributed Cassandra database handles large amounts of data across many commodity servers while providing high availability. Cassandra’s distributed architecture eliminates single points of failure.

The rENIAC Data Engine employs FPGA technology, specifically the Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA, to improve the database’s throughput and reduce its latency degradation, which is caused by system bottlenecks and background data-compaction operations. The graphs in Figure 1 show how this performance degradation manifests.


Figure 1: Cassandra’s periodic data-compaction task is one of the operations that cause recurring throughput and latency degradation.


The four graphs in Figure 1 illustrate the effects of system bottlenecks related to compute, I/O, and compaction operations on Cassandra database performance. The graph in the upper left shows database throughput in transactions per second (TPS) as it degrades over time.

Continuous use of the database compounds the impact of these bottlenecks. For example, the database’s throughput slowly declines over time when using the standard Cassandra stress test that is part of the Cassandra distribution, with significant, periodic throughput loss every 200 seconds when the data-compaction algorithm initiates. Data compaction is one of the key factors that negatively impact the performance of log-structured merge databases. Each period of significant throughput loss due to data compaction lasts approximately ten to twenty seconds.
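To make the compaction problem concrete, here is a minimal toy sketch of the log-structured merge idea: writes land in an in-memory memtable, which is flushed to immutable on-disk segments ("SSTables"), and compaction periodically merges those segments. This is an illustrative simplification using plain dictionaries, not Cassandra's actual SSTable format or compaction strategy; the class and parameter names are invented for this example. The point is that the merge step touches every segment at once, which is why it competes with foreground queries for I/O and CPU.

```python
# Toy sketch of a log-structured merge store (NOT Cassandra's implementation).
# Writes go to an in-memory memtable; full memtables are flushed to immutable
# segments; compaction merges all segments into one. While compaction runs,
# it consumes I/O and CPU that would otherwise serve queries.

class ToyLSMStore:
    def __init__(self, memtable_limit=4, compaction_trigger=3):
        self.memtable = {}                     # recent writes, in memory
        self.sstables = []                     # flushed, immutable segments
        self.memtable_limit = memtable_limit
        self.compaction_trigger = compaction_trigger

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def get(self, key):
        if key in self.memtable:               # newest data wins
            return self.memtable[key]
        for table in reversed(self.sstables):  # then newest segment first
            if key in table:
                return table[key]
        return None

    def _flush(self):
        self.sstables.append(dict(self.memtable))
        self.memtable = {}
        if len(self.sstables) >= self.compaction_trigger:
            self._compact()                    # the periodic, expensive step

    def _compact(self):
        merged = {}
        for table in self.sstables:            # oldest first, so newer
            merged.update(table)               # values overwrite older ones
        self.sstables = [merged]

store = ToyLSMStore()
for i in range(10):
    store.put(f"k{i}", i)
store.put("k0", 99)                            # overwrite an old key
```

Because reads must consult the memtable plus every un-compacted segment, compaction is unavoidable: skipping it would make reads progressively slower, which is the trade-off the article describes.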

The other three graphs in Figure 1 show application latency in response to database queries, with the upper right graph showing mean latency for all query transactions, the lower left graph showing transaction latency at the 95th percentile, and the lower right graph showing transaction latency at the 99th percentile. All three latency graphs show large latency spikes occurring with the same 200-second periodicity, coinciding with the TPS declines in the throughput graph.
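For readers unfamiliar with percentile latency metrics, the sketch below shows how figures like these are typically computed from per-request latency samples, using a simple nearest-rank percentile. The sample values are made up for illustration; they are not measurements from the benchmark in the article.

```python
# Sketch of how mean and tail-percentile latencies (as plotted in Figure 1)
# are typically computed from per-request latency samples.
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample >= pct% of all samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Illustrative samples: mostly ~1.2 ms, with two compaction-era outliers.
latencies_ms = [1.2, 1.3, 1.1, 1.4, 1.2, 9.5, 1.3, 1.2, 48.0, 1.1]

mean = sum(latencies_ms) / len(latencies_ms)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

Note how a single 48 ms outlier dominates both tail percentiles even though most requests finish in about a millisecond; this is why compaction stalls show up so dramatically in the 95th- and 99th-percentile graphs.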

The Cassandra database is frequently used for transaction processing, and these large performance lapses can cause significant problems for real-time and retail applications, which require a more predictable level of performance. Yet the database compaction algorithm must run.

To combat this loss of performance, rENIAC has developed a technology platform that can accelerate data-centric workloads like Cassandra. This platform employs a data proxy built upon a Data Engine that rENIAC has implemented using an Intel PAC with Intel Arria 10 GX FPGA. The performance results for a Cassandra database accelerated by the rENIAC Data Engine, deployed as a data proxy, appear in Figure 2.



Figure 2: Accelerating the Cassandra database with the rENIAC Distributed Data Engine greatly reduces the periodic throughput and latency degradation due to data compaction and other I/O bottlenecks.


The four graphs in Figure 2 juxtapose the accelerated Cassandra database performance, shown in orange, with the unaccelerated performance, shown in blue. The first graph, appearing in the upper left of Figure 2, shows that accelerated database throughput increases by as much as 4X compared to the unaccelerated database. The other three graphs in the figure show that the large, periodic latency spikes afflicting the unaccelerated Cassandra database are gone when running the accelerated version.

The latency for the accelerated version of Cassandra is now low and predictable. In addition to the mean, 95th-, and 99th-percentile latencies shown in Figure 2, this effect is evident in the measured 99.9th-percentile latencies for both read-heavy and mixed workloads, shown in Figure 3. The read-heavy workload shows a 10X latency improvement, and the mixed workload shows a 7X improvement.


Figure 3: 99.9th percentile latencies for mixed and read-heavy workloads


These performance improvements are due to a combination of local data caching, which alone is responsible for a doubling of performance, along with FPGA acceleration, which further boosts performance by as much as 10X. Meanwhile, rENIAC is working on incorporating additional optimizations and expects to further improve Cassandra database performance in the future.
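The "local data caching" contribution can be illustrated with a read-through cache sketch: the proxy answers repeated reads from local memory and only forwards cache misses to the Cassandra backend. This is a hypothetical toy in plain Python, not rENIAC's proprietary implementation; the class, function names, and the `user:42` key are invented for the example.

```python
# Hypothetical read-through cache sketch, illustrating the "local data
# caching" idea behind a data proxy. Not rENIAC's actual implementation.

class ReadThroughCache:
    def __init__(self, backend_get):
        self.backend_get = backend_get     # slow path, e.g. a Cassandra read
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1                 # served locally, backend untouched
            return self.cache[key]
        self.misses += 1
        value = self.backend_get(key)      # fetch once from the backend
        self.cache[key] = value            # repeats are then served locally
        return value

backend_calls = []
def slow_backend(key):
    """Stand-in for a query to the Cassandra cluster."""
    backend_calls.append(key)
    return f"value-for-{key}"

proxy = ReadThroughCache(slow_backend)
for _ in range(3):
    proxy.get("user:42")                   # only the first read reaches the backend
```

Serving hot keys from the proxy also insulates clients from backend stalls: a read that never reaches Cassandra cannot be delayed by a compaction pass running there, which is consistent with the flattened latency curves in Figure 2.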


For more information about rENIAC’s Distributed Data Engine technology and FPGA-based acceleration in general, click here.



Legal Notice and Disclaimers

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at

Cost reduction scenarios described are intended as examples of how a given Intel- based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel, the Intel logo, Intel Xeon, Intel Arria, and Intel eASIC are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© Intel Corporation



Steven Leibson

About Steven Leibson

Steve Leibson is a Senior Content Manager at Intel. He started his career as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He’s served as Editor in Chief of EDN Magazine and Microprocessor Report and was the founding editor of Wind River’s Embedded Developers Journal. He has extensive design and marketing experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.