
CGI HTML Compression

These days, with broadband connections everywhere, the size of the HTML being served shouldn't really matter, right? Wrong! Trust me, when you have a 200 KB file to serve, compressing it before serving it will definitely help your audience. And perhaps even you, if you are constrained by the bandwidth limits of your hosting plan. Some hosting plans may provide the compression by default. But if that is not the case, you may have to do it on your own, at the application layer as opposed to the webserver layer.

Here is some Perl code showing how that can be achieved.

use Compress::Zlib;

$compress = ($ENV{HTTP_ACCEPT_ENCODING} || '') =~ /gzip/;

if ($compress) {
  print "Content-Type: text/html\n";
  print "Content-Encoding: gzip\n\n";
  binmode STDOUT;
  $gz = gzopen(\*STDOUT, "wb") or die "gzopen failed";
  # gzwrite expects a single buffer, so join the arguments
  $printf = sub { $gz->gzwrite(join '', @_) };
} else {
  print "Content-Type: text/html\n\n";
  $printf = sub { print @_ };
}

# At the very end of the script, finalize the gzip stream:
#   $gz->gzclose() if $gz;

That's it. In the rest of your code, wherever you have something like

 print "Hello world";

change it to 

 &$printf("Hello world");

and you are done!

So, what does all the code above mean? First and foremost, not all browsers may support compression. Hence, we need to make sure that our strategy works for both types of user agents: those that understand gzip and those that don't. This information is available in the Accept-Encoding request header, which CGI exposes as the HTTP_ACCEPT_ENCODING environment variable. So, if it contains the pattern gzip, we know the client can accept the compressed form. Based on that, the response headers should indicate to the client whether we are serving the content in compressed mode or plain text mode.
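If you want the header check to be a little more robust, it can be pulled into a small helper. The function name wants_gzip below is my own invention for illustration; this sketch also rejects an explicit "gzip;q=0" opt-out, which the bare regex match would wrongly accept.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decide whether the client accepts gzip.
# A bare /gzip/ match is usually good enough for CGI work; this
# version additionally honors an explicit "gzip;q=0" opt-out.
sub wants_gzip {
    my ($accept) = @_;
    return 0 unless defined $accept;
    return 0 if $accept =~ /gzip\s*;\s*q=0(?:\.0+)?(?:\s*(?:,|$))/;
    return $accept =~ /gzip/ ? 1 : 0;
}

# Usage in the CGI script:
# my $compress = wants_gzip($ENV{HTTP_ACCEPT_ENCODING});
```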

Next, based on that, the code reference stored in $printf either prints plain text or streams compressed output using Compress::Zlib. That's pretty much all there is to it.
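Putting the pieces together, a complete minimal CGI script could look like the sketch below. One detail worth calling out: the gzclose call at the end finalizes the gzip stream, and without it the client may receive a truncated response.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Compress::Zlib;

# Does the client advertise gzip support?
my $compress = ($ENV{HTTP_ACCEPT_ENCODING} || '') =~ /gzip/;

my ($gz, $printf);
if ($compress) {
    print "Content-Type: text/html\n";
    print "Content-Encoding: gzip\n\n";
    binmode STDOUT;
    $gz = gzopen(\*STDOUT, "wb") or die "gzopen failed";
    # gzwrite takes a single buffer, so join the arguments first
    $printf = sub { $gz->gzwrite(join '', @_) };
} else {
    print "Content-Type: text/html\n\n";
    $printf = sub { print @_ };
}

&$printf("<html><body>Hello world</body></html>\n");

# Finalize the gzip stream; this writes the trailer and flushes.
$gz->gzclose() if $gz;
```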

Now, I believe that using advanced Perl concepts like tie, it's possible to get rid of the ugly &$printf statements and retain the original print statements as is. I need to learn some more Perl to get to that state. But since I needed to do this for only one existing Perl script, I just resorted to converting all the print statements to &$printf() statements. If I invest more time in learning tie and have some concrete code, I will post it some day.
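For the curious, here is a rough sketch of what that tie approach could look like. The package name Gzip::TieHandle is made up, and this is scaffolding under assumptions rather than production code: a TIEHANDLE class routes every print on the tied handle through gzwrite, so plain print statements stay untouched.

```perl
package Gzip::TieHandle;
use strict;
use warnings;
use Compress::Zlib;

# Tie interface: every print on the tied handle goes through gzwrite.
sub TIEHANDLE {
    my ($class, $fh) = @_;
    my $gz = gzopen($fh, "wb") or die "gzopen failed";
    return bless { gz => $gz }, $class;
}
sub PRINT   { my $self = shift; $self->{gz}->gzwrite(join '', @_) }
sub PRINTF  { my $self = shift; my $fmt = shift; $self->{gz}->gzwrite(sprintf $fmt, @_) }
sub CLOSE   { my $self = shift; my $gz = delete $self->{gz}; $gz->gzclose() if $gz }
sub DESTROY { $_[0]->CLOSE }

package main;

# Usage sketch for CGI: dup the real STDOUT before tying it, because
# every print after the tie goes through the compressor.
#
#   print "Content-Type: text/html\n";
#   print "Content-Encoding: gzip\n\n";
#   open(my $real, '>&', \*STDOUT) or die "dup failed: $!";
#   binmode $real;
#   tie *STDOUT, 'Gzip::TieHandle', $real;
#   print "Hello world";   # compressed transparently
#   untie *STDOUT;         # DESTROY flushes and finalizes the stream
```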

So, finally, for my specific use case, files of up to 180 KB are reduced to about 15 KB. That's about a 12-fold saving! I could see a noticeable difference in page rendering. One note though: the compression puts additional load on the server while it eases some of the network load.
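The exact ratio will of course vary with the content; HTML is highly repetitive, which is why it compresses so well. A quick way to see this for yourself, using the in-memory helpers from the same module (the generated table here is just a stand-in for real page content):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Compress::Zlib;

# Build some repetitive HTML, roughly like a long table.
my $html = "<html><body><table>\n";
$html .= "<tr><td>row $_</td><td>some cell text</td></tr>\n" for 1 .. 2000;
$html .= "</table></body></html>\n";

my $gzipped = memGzip($html);
printf "original: %d bytes, gzipped: %d bytes, ratio: %.1fx\n",
    length($html), length($gzipped), length($html) / length($gzipped);
```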

Note: Window-based dictionary compression like LZ77 (the heart of gzip's DEFLATE algorithm) is what makes this efficient, since it can compress content as it is being written onto the wire. Imagine if we only had Huffman coding.

