In an earlier post, “Is Amazon EC2 right for SaaS?”, I wrote briefly about whether it would be possible to keep proprietary database files in Amazon S3. Following a recent announcement by EnterpriseDB and some further research, I came across Elastra, a company that, if I understand correctly, provides a kind of storage virtualization service on top of Amazon S3. See their architecture.
This is cool technology because, in addition to an elastic compute cloud, it now becomes possible to have an elastic database storage cloud. The key phrase is “database storage”, as opposed to the ordinary key-value storage that plain S3 provides.
Elastra currently supports MySQL and PostgreSQL, the two most popular open-source databases. With this technology, large-scale database deployment suddenly becomes available to startups as a utility (or, as some are calling it, PaaS: platform as a service).
What does this mean for the big players in the database space, mainly Oracle, IBM and Microsoft? Since the Amazon EC2 platform is Linux-based, Oracle and IBM should soon be able to roll out their own database PaaS on Amazon Web Services infrastructure. Microsoft probably needs to figure out how to enter this space; and if it ends up acquiring Yahoo!, it will have to deal with a lot of BSD infrastructure as well!
You can’t call a system secure unless it really is secure. Any application that validates input on the client side, in the browser, can’t afford to skip that validation on the server side, because someone can always post invalid data programmatically. Long ago, when application servers and HTTP sessions were just emerging, a few e-commerce applications implemented shopping carts and item pricing using hidden HTML form fields and got hacked.
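To make the point concrete, here is a minimal Python sketch of server-side order pricing. It assumes a hypothetical in-memory catalog (`CATALOG`), a hypothetical handler `price_order` that receives the submitted form fields as a dict, and an arbitrary quantity limit of 100. The server looks up the price itself and re-checks the quantity, so a tampered hidden `price` field has no effect.

```python
# Hypothetical server-side catalog: SKU -> unit price in cents.
CATALOG = {"SKU-1": 1999, "SKU-2": 4500}


def price_order(form: dict) -> int:
    """Return the order total in cents, computed entirely on the server.

    Any client-supplied price (e.g. a hidden form field) is never read.
    """
    sku = form.get("sku", "")
    if sku not in CATALOG:
        raise ValueError("unknown item")
    try:
        qty = int(form.get("qty", ""))
    except ValueError:
        raise ValueError("quantity must be an integer") from None
    # Re-check the same bounds the browser form may already enforce
    # (the limit of 100 is an illustrative assumption).
    if not 1 <= qty <= 100:
        raise ValueError("quantity out of range")
    return CATALOG[sku] * qty
```

The design choice is simply to treat everything the browser sends as untrusted input: client-side checks improve the user experience, but only the server-side checks count for security.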
Anyway, I am trying to understand the security implications of using Amazon S3, Amazon’s Simple Storage Service. When a system stores a resource locally, it controls how the resource is served and can authenticate the user first. With Amazon S3, this is usually not the case. The main reason most people use Amazon S3 is scalability, and that is achieved only if the URL points directly at Amazon S3’s servers rather than at your own. If the URL is routed through your own server, you can authenticate the user, fetch the resource from Amazon S3 on the server side, and then send it. But that defeats the purpose completely, since your server becomes the bottleneck. And in that case, why store it in S3 rather than locally?
Amazon S3 offers a query-string authentication mechanism. It adds three URL parameters: your Amazon Web Services Access Key ID, an expiration time, and a signature, which is a keyed hash (HMAC-SHA1) of the resource URL and a few other details, computed with your secret key. This guarantees that a resource is not accessible without a valid signature, but the moment the URL is given to one user, anyone else who gets hold of it (which can happen in various ways) can also access the resource. So, essentially, there is no user-level security. The expiration field (which is also covered by the signature, so it cannot be changed without invalidating it) offers some defense, but may still not be good enough for certain classes of secure applications.
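The following Python sketch illustrates the legacy Signature Version 2 query-string scheme described above (AWS has since moved presigned URLs to Signature Version 4). The bucket, key, and credentials (`ACCESS_KEY`, `SECRET_KEY`) are hypothetical, `presign_url` and `is_valid` are illustrative helpers rather than an AWS SDK API, and `is_valid` only mimics the kind of check S3 performs. Note that the check depends only on the URL and the current time, never on who presents it.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical credentials; not real AWS values.
ACCESS_KEY = "AKIDEXAMPLE"
SECRET_KEY = b"example-secret-key"


def sign(bucket: str, key: str, expires: int) -> str:
    """Legacy S3 Signature Version 2 for a query-string-authenticated GET."""
    # Verb, empty Content-MD5, empty Content-Type, Expires, then the resource.
    string_to_sign = f"GET\n\n\n{expires}\n/{bucket}/{key}"
    digest = hmac.new(SECRET_KEY, string_to_sign.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def presign_url(bucket: str, key: str, ttl_seconds: int, now: int) -> str:
    """Build a time-limited URL; assumes URL-safe keys and dot-free bucket names."""
    expires = now + ttl_seconds
    params = {
        "AWSAccessKeyId": ACCESS_KEY,
        "Expires": str(expires),
        "Signature": sign(bucket, key, expires),
    }
    # urlencode percent-encodes the '+', '/' and '=' in the base64 signature.
    return f"https://{bucket}.s3.amazonaws.com/{key}?{urlencode(params)}"


def is_valid(url: str, now: int) -> bool:
    """Illustrative check: recompute the signature and test the expiry.

    Nothing here identifies the caller, so any holder of the URL passes.
    """
    parts = urlsplit(url)
    bucket = parts.netloc.split(".")[0]
    key = parts.path.lstrip("/")
    query = {name: values[0] for name, values in parse_qs(parts.query).items()}
    expires = int(query["Expires"])
    expected = sign(bucket, key, expires)
    return now <= expires and hmac.compare_digest(expected, query["Signature"])
```

For example, a URL from `presign_url("my-bucket", "reports/q1.pdf", 300, now=1_000_000)` passes `is_valid` for any caller until the time exceeds 1000300, after which it fails; editing the key or the expiry in the URL breaks the signature.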
So it’s important to understand the limitations of using an on-demand service to power your applications. Using Amazon S3 as a personal backup drive is a no-brainer. Similarly, using it for completely public content is no issue. Only applications that require tight security based on the user (and not just on the resource itself) can’t make use of S3. This issue is not specific to Amazon S3; any service offering operations as simple as those in the Amazon S3 web services API will have it.