Today, at Strata + Hadoop World, I announced that Intel is investing in a broad initiative around Apache Spark, including new collaborations with Databricks and AMPLab, to empower developers, unlock insights, and drive faster time-to-knowledge. This is in addition to our ongoing engagement with Cloudera and the Apache* Hadoop* community to build the foundation of big data, with Hadoop as the enterprise data hub. We believe Spark’s efficient in-memory computation within the Hadoop enterprise data hub, combined with the performance of Intel® Architecture, enables advanced analytics with faster real-time decisions.
Watch the video demo of Analytics with Facial Recognition using Spark and Hadoop technologies on Intel platform: https://software.intel.com/en-us/videos/making-tomorrow-s-possibility-a-reality
We’re doing this because the world is moving at a new speed. Operating at a new frequency. Changing all the time. Business transactions take place online, collaboration takes place across the globe, and millions of bytes are transmitted every second through the digital atmosphere, in an ever-growing lattice of information.
The challenges are also big. Customers expect technology providers to fit their exact needs and provide flawless user experiences. If they don’t, those customers will find another product or service. This creates a fiercely competitive landscape for tech businesses, as they vie for market share. The Internet of Things is creating a world more connected than we ever imagined and is expanding the mind-blowing amount of information our society produces every second. The term Big Data is, itself, an acknowledgement of the wealth of insights to be discovered in the way people interact in the digital world, if you have the infrastructure to store, manage, process, and analyze it.
Data is the currency of the future, and the economy is booming.
To provide a high performance, secure, robust foundation for Big Data Analytics, Intel offers competitive advantages through Intel® Architecture including advanced silicon acceleration and security capabilities, software performance optimization, and seamless integration of our hardware and software. Upon this foundation lies open source software including Java*, Apache* Hadoop*, and Apache Spark*, for which Intel Software is leading innovations to provide the ecosystem with the tools to accelerate the deployment of meaningful solutions. This includes working closely with open source community leaders in Big Data and Analytics such as Cloudera, Databricks, AMPLab, and others to optimize big data and analytics on Intel Architecture.
What I announced today is a new initiative around Apache Spark and new collaborations with Databricks and AMPLab. Apache Spark is an open-source, in-memory cluster computing framework for Big Data designed for machine learning and originally developed in the AMPLab at UC Berkeley. It has been widely adopted by data scientists over the past year. We have engaged with Databricks, one of the pioneers of Apache Spark, to advance analytics capability for Spark on Intel Architecture platforms and to accelerate the development of Spark projects such as GraphX, MLlib, and Spark Streaming.
We are also collaborating with AMPLab to accelerate the development of Data Analytics technologies in real-world solutions through developments for SparkR (distributed statistical computing in R on top of Spark) and collaborations for utilizing Tachyon (an in-memory file system in BDAS) to address challenges in Internet-scale machine learning (e.g., Parameter Server, Big Model).
With Databricks, we plan to demonstrate the efficiency of running Spark-based analytics on Intel Architecture-based platforms to datacenter owners using benchmarks and technology education. In addition, we are accelerating analytics using in-memory techniques and enhancing data security using hardware and software mechanisms.
Encryption is a critical and commonly used function in big data. Intel Software recently completed a major IA optimization of the encryption feature by leveraging AES-NI, the new Intel security instruction set, which resulted in a 16X performance increase for encryption*. In other words, encryption overhead is now 94% lower than it used to be, down to 3% from 52%. See graphs below. Many industry partners expressed significant interest in this, including Cloudera, which incorporated this feature in its CDH 5.3 release.
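As a quick check on the overhead figures quoted above, the short Python sketch below works through the arithmetic behind the 94% number. The helper name `relative_reduction` is hypothetical and chosen only for illustration. The sketch uses just the two overhead percentages from this post (52% before, 3% after) and does not attempt to derive the separately measured 16X figure.

```python
def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Return the percentage by which a quantity fell from before_pct to after_pct."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    return (before_pct - after_pct) / before_pct * 100.0


# Overhead figures quoted in this post, in percent.
overhead_before = 52.0
overhead_after = 3.0

reduction = relative_reduction(overhead_before, overhead_after)
print(f"Overhead reduced by about {reduction:.0f}%")  # prints "Overhead reduced by about 94%"
```

The calculation is (52 - 3) / 52, or roughly 0.94, which is where the 94% reduction comes from.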
Intel is heavily engaged with the open source community to ensure that all developers have the tools needed to change the world with Apache Hadoop and Apache Spark and to deliver on the promise of big data. Intel’s vision is a horizontal, reusable, and extensible architectural framework around these two big data cornerstones that supports many big data domains, facilitates simple integration of data science innovation, and allows developers to focus on building the products and services that deliver the benefits and profits of big data analytics.
To learn more visit: http://software.intel.com/bigdata.
* No computer system can be absolutely secure. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.