Daily Archives: November 1, 2007

Amazon S3 Security

You can’t call a system secure unless it is secure end to end. Any application that validates input on the client side of the browser can’t afford to skip the same validation on the server side, because someone can always programmatically post invalid data. Long ago, when app servers and HTTP sessions were just coming up, a few e-commerce applications implemented shopping carts and item pricing using HTML hidden form fields and got hacked.

Anyway, I am trying to understand the security aspects of using Amazon S3, Amazon’s Simple Storage Service. When a system stores a resource locally, it has control over serving that resource and can authenticate the user first. With Amazon S3, however, this is usually not the case. The very reason most people want to use Amazon S3 is scalability, and that is achieved only if the URL points directly to the Amazon S3 servers rather than to your own server. If the URL is routed through your own server, you can fetch the resource from Amazon S3 server-side after authenticating the user and then send it along. But that defeats the purpose completely, since your server becomes a bottleneck. And in that case, why store it in S3 at all rather than locally?
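For concreteness, the proxy approach dismissed above would look roughly like this. This is a minimal sketch, not S3’s API: `is_authenticated` and `fetch_from_s3` are hypothetical placeholders (in a real deployment the latter would issue an HTTP GET to the S3 URL).

```python
def serve_resource(user, key, is_authenticated, fetch_from_s3):
    # Hypothetical proxy handler: every download flows through your server.
    # Authenticate first, then do the server-side fetch from S3.
    if not is_authenticated(user):
        raise PermissionError("user not authorized for this resource")
    return fetch_from_s3(key)  # server-side fetch -- this is the bottleneck
```

Every byte passes through your own server, so S3’s scalability is lost, which is exactly the trade-off described above.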

Amazon S3 offers a query-string authentication mechanism. It passes three additional URL parameters: your Amazon Web Services Access Key, an Expiration time, and a Signature, which is a signed (HMAC) digest of the resource URL and a few other details. This guarantees that a resource is not accessible without a valid signature, but the moment the URL is handed to one person, anyone else who gets hold of that URL (which can happen in various ways) can also access the resource. So, essentially, there is no user-level security. The Expiration field (which is part of the signed data, and hence tamper-proof) offers some defense, but may still not be good enough for certain classes of secure applications.
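Here is a minimal sketch of how such a signed URL is constructed, assuming the HMAC-SHA1 query-string signing scheme S3 documents (sign the verb, Content-MD5, Content-Type, expiry, and canonicalized resource with your secret key). The credentials and object names below are made up.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def signed_s3_url(access_key, secret_key, bucket, key, expires):
    # S3's string-to-sign: verb, Content-MD5, Content-Type (both empty for a
    # plain GET), the expiry timestamp, and the canonicalized resource path.
    string_to_sign = "GET\n\n\n%d\n/%s/%s" % (expires, bucket, key)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    # Base64-encode the HMAC and URL-encode it for use as a query parameter.
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return ("https://%s.s3.amazonaws.com/%s"
            "?AWSAccessKeyId=%s&Expires=%d&Signature=%s"
            % (bucket, key, access_key, expires, signature))
```

Note what the signature does and does not prove: anyone holding the resulting URL can fetch the object until the Expires time passes. The signature shows the URL was minted by someone with the secret key, not that the requester is a particular user.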

So, it’s important to understand the limitations of using an on-demand service for powering your applications. Amazon S3 as a personal backup drive is a no-brainer. Similarly, using it for completely public content is also no issue. Only those applications that require tight security based on the user (and not just on the resource itself) can’t make use of S3. And this issue is not specific to Amazon S3: any service with operations as simple as those in the Amazon S3 web services will have it.


Filed under Amazon S3, amazon web services, Security, Web Services

PageRank Preserving Page Redirection

I had a few CGI scripts written in Perl, some of which don’t even take any parameters and hence are ideal for Google indexing. These scripts have .pl as the file extension, and some of them already have a PageRank. Recently, however, I decided I wanted a more generic extension for my scripts, typically .do, so that I have the flexibility of switching them to .php or .jsp or .asp or .py or .rb or whatever technology it may be. In doing this, I wanted to make sure the PageRank is preserved. In addition, I don’t have control over the various pages from which I have inbound links to these scripts, so I still have to continue supporting the old links, but with the right PageRank (once Google starts recognizing the new pages).

From Giving search engine spiders direction with a 301 redirect I figured that preserving PageRank can be achieved through a permanent external redirect. With that, here is how I went about implementing the strategy.

Below is the .htaccess file:

RewriteEngine On
RewriteRule ^(.*)\.do$ $1.pl [NE]

RewriteCond %{REQUEST_URI} \.pl$
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteRule ^(.*)\.pl$ $1.do [R=301,NE]

Now, let me explain each of the above.

The first RewriteRule causes a request for a .do file to be internally rewritten to the corresponding .pl file, which is the actual file on the system. The NE (noescape) flag prevents special characters in the substituted URL from being escaped.

The second RewriteRule (along with its conditions) says: if the request URI uses the .pl extension, but the request is not an internal redirection coming from rule 1 (detected via REDIRECT_STATUS), then do an external redirect with a status code of 301 (permanent redirect). Note that both conditions are needed; otherwise, the rules would go into an infinite loop.

That’s it. I just implemented this strategy today, so I will wait and watch for the PageRank to transfer to the new URLs.

If you have a better or more efficient way, please post it in the comments section.

BTW, I also use a hosting solution that provides unlimited domains by mapping an internal sub-directory to each hosted domain. So, while having the .htaccess in the outermost directory works for all the sub-directories accessed from the same host, there is a problem with the virtual hosts mapped to sub-directories. It may be possible to come up with a more complex approach using RewriteBase, some regular expressions, and HTTP variables, but for now I just copied the exact same code to the sub-directory of each virtual host. I would be interested in a solution that handles virtual hosts without having to copy the rules into each virtual-host directory.


Filed under mod_rewrite, pagerank, SEO