I saw a conversation on Twitter today asking why we don’t just embed proper security into Hadoop instead of suggesting the API gateway approach to Hadoop security that my colleague Blake proposed. The same could be asked about any number of applications and services, but the bottom line is that we believe a two-pronged approach is best: strengthen security inside Hadoop itself, and put a gateway in front of its APIs.
Internally, we have dramatically improved Hadoop’s security capabilities via Project Rhino. This enables security best practices like encryption at rest, which can only be implemented inside Hadoop itself. We are also working to standardize the authorization framework and implement token-based authentication with single sign-on. These are all core capabilities that absolutely need to be added to Hadoop’s code base.
The gateway approach addresses something else: the API layer. While I agree that any application should protect itself against common attacks, consider the bigger picture. First, consider the number of different features that Hadoop adopters may require: tokenization, data field encryption, integration with Active Directory, mapping to OAuth for mobile applications, and so on. It would take a staggering number of engineering hours to implement all of these within Hadoop. Now consider the number of enterprise applications that expose APIs, and the investment required to duplicate those features within each of these application suites. Finally, consider the job of the poor sysadmin who has to enable these features selectively and consistently across everything in their domain, along with the one who comes along behind them to audit for compliance. Add to that the probability (or lack thereof) that all of these vendors implemented the features with common configuration processes…
Our façade proxy moves much of this functionality into an external system with an easy-to-use graphical interface. Common security policies can be implemented and inspected in one place across all APIs within the enterprise. More complex, custom workflows can be created and reused as well. Finally, the gateway complements the Project Rhino work, which provides a solid security foundation that the gateway can then extend in a standard fashion.
Part of the objection/confusion is shown here:
@alexis_gil my problem with it is standard – adding a totally new layer in between with completely new protocol doesn’t help adoption
— Alex Gorbachev (@alexgorbachev) May 14, 2013
I want to clarify that we are talking about standard protocols. In fact, the gateway pattern simply puts a secure front on the already standardized REST APIs found in the Hadoop ecosystem, such as WebHDFS and Stargate (the HBase REST interface). The gateway does not introduce a new protocol; clients keep calling the same APIs. The façade pattern helps with separation of concerns: it lets the data scientists worry about data and the security folks worry about security.
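To make the façade idea concrete, here is a minimal sketch in Python. It is not how our gateway or Hadoop is implemented; every name in it (Request, Response, Gateway, require_token, allow_methods, the stub backends, the "/hbase/" prefix, and the token value) is hypothetical and chosen only for illustration. It shows one set of policies applied in front of several APIs, with requests that pass the checks forwarded unchanged, so the client still speaks the same WebHDFS-style paths.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Request:
    method: str
    path: str  # e.g. "/webhdfs/v1/data/report.csv?op=OPEN"
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str = ""


# A policy returns None to let the request through, or a Response to reject it.
Policy = Callable[[Request], Optional[Response]]
Backend = Callable[[Request], Response]


def require_token(valid_tokens: Iterable[str]) -> Policy:
    """Hypothetical policy: reject requests without a known bearer token."""
    known = set(valid_tokens)

    def policy(req: Request) -> Optional[Response]:
        auth = req.headers.get("Authorization", "")
        token = auth[len("Bearer "):] if auth.startswith("Bearer ") else ""
        if token not in known:
            return Response(401, "unauthorized")
        return None

    return policy


def allow_methods(*methods: str) -> Policy:
    """Hypothetical policy: only let the listed HTTP methods through."""
    allowed = set(methods)

    def policy(req: Request) -> Optional[Response]:
        if req.method not in allowed:
            return Response(405, "method not allowed")
        return None

    return policy


class Gateway:
    """Façade: the same policies guard every backend API behind it."""

    def __init__(self, policies: Iterable[Policy], backends: Dict[str, Backend]):
        self.policies = list(policies)
        self.backends = dict(backends)  # path prefix -> backend handler

    def handle(self, req: Request) -> Response:
        # Security concerns live here, outside the backends.
        for policy in self.policies:
            rejection = policy(req)
            if rejection is not None:
                return rejection
        # The request is forwarded as-is: no new protocol is introduced.
        for prefix, backend in self.backends.items():
            if req.path.startswith(prefix):
                return backend(req)
        return Response(404, "no such API")


# Stub backends standing in for real services (assumptions, not real clients).
def fake_webhdfs(req: Request) -> Response:
    return Response(200, f"webhdfs saw {req.method} {req.path}")


def fake_hbase_rest(req: Request) -> Response:
    return Response(200, f"hbase-rest saw {req.method} {req.path}")


gateway = Gateway(
    policies=[require_token({"s3cr3t"}), allow_methods("GET")],
    backends={"/webhdfs/v1/": fake_webhdfs, "/hbase/": fake_hbase_rest},
)

if __name__ == "__main__":
    good = Request(
        "GET",
        "/webhdfs/v1/data/report.csv?op=OPEN",
        {"Authorization": "Bearer s3cr3t"},
    )
    print(gateway.handle(good))
    print(gateway.handle(Request("GET", "/webhdfs/v1/data/report.csv?op=OPEN")))
```

In this sketch, adding a new policy (say, field-level tokenization) would mean appending one function to the policies list rather than changing each backend, which is the separation of concerns described above.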