
Operating System, Database and The Web

I thought of using “Microsoft, Oracle and The Web” but settled for a more generic title.

Many people say that the web has made the operating system matter less. A typical user now spends much of their time on the web, and as long as a browser like Firefox works well on multiple operating systems, it doesn’t matter which box is used: the end user gets the same experience.

DHTML and AJAX have made web applications much more powerful, interactive and painless to use. Of course, they are no match for a well-designed desktop application (which is why many people prefer Thunderbird to web-based corporate email). But for many casual uses, today’s web applications are good enough.

One of the less explored ideas is that web applications have also made us rethink the way databases are used. Serving a large number of highly dynamic pages is no small feat, and relying entirely on a database, however powerful or popular, is not going to work. For example, as of this writing, AdBrite claims to serve “738 million impressions a day”. Now imagine tracking all of this in real time in a database. As time passes, AdBrite will cross the billion-page mark and move on. Similarly, the number of searches served by Google or Yahoo! is in the billions. Friendster, MySpace and YouTube are all churning out lots and lots of pages and media content.

All this is possible because of newer architectures that are quite different from run-of-the-mill ERP applications, which rely only on ERDs and third normal form. They require radically different thinking: massive parallelization followed by deferred aggregation. In this approach, transactions are first written to flat files and later aggregated into a database. So the mission-critical component is the file storage, not just the database.
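As a rough illustration of deferred aggregation, here is a minimal Python sketch. Everything in it is an assumption made for the example, not a description of how AdBrite or anyone else does it: the function names `log_impression` and `aggregate_logs`, the one-ad-id-per-line log format, and the `impressions` table are all hypothetical. Each web server appends to its own flat file on the hot path, and a batch job later folds the files into database totals.

```python
import sqlite3
from collections import Counter


def log_impression(log_path, ad_id):
    # Hot path: a single cheap append to a flat file, no database involved.
    # Each web server would write to its own file (hypothetical layout).
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(ad_id + "\n")


def aggregate_logs(log_paths, conn):
    # Deferred step, run periodically: count the lines in every log file
    # and add the per-ad totals to the database in one pass.
    counts = Counter()
    for path in log_paths:
        with open(path, encoding="utf-8") as f:
            counts.update(line.strip() for line in f if line.strip())

    conn.execute(
        "CREATE TABLE IF NOT EXISTS impressions "
        "(ad_id TEXT PRIMARY KEY, total INTEGER NOT NULL)"
    )
    for ad_id, n in counts.items():
        # Make sure the row exists, then add this batch to its total.
        conn.execute("INSERT OR IGNORE INTO impressions VALUES (?, 0)", (ad_id,))
        conn.execute(
            "UPDATE impressions SET total = total + ? WHERE ad_id = ?",
            (n, ad_id),
        )
    conn.commit()
    # A real system would rotate or archive the processed files here,
    # so that the next run does not count them twice.
    return counts
```

The database sees one small batch of updates per run instead of one write per impression, which is the whole point of the approach.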

Most out-of-the-box ERP systems, both open source and closed source, keep item catalogs in the database, for example. If you are as fortunate as Amazon, or have even 1/100th or 1/1000th of its traffic, pulling the item data from the database every time a user wants to view the item details is not going to work. So periodic pre-generation of content is another key mechanism that reduces database load and improves performance significantly.
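Periodic pre-generation could look something like the following Python sketch. The `items` table with `id`, `name` and `price` columns, the `item_<id>.html` file naming, and both function names are hypothetical, chosen only to keep the example small. A scheduled job renders every item page to a static file, and a page view then reads the file without touching the database.

```python
import html
import sqlite3
from pathlib import Path


def pregenerate_item_pages(conn, out_dir):
    # Batch job, run on a schedule: one query over the catalog,
    # one static HTML file written per item.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for item_id, name, price in conn.execute("SELECT id, name, price FROM items"):
        page = "<html><body><h1>{}</h1><p>Price: {:.2f}</p></body></html>".format(
            html.escape(name), price
        )
        path = out / "item_{}.html".format(item_id)
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written


def serve_item_page(out_dir, item_id):
    # Request path: read the pre-generated file, no database query.
    path = Path(out_dir) / "item_{}.html".format(item_id)
    return path.read_text(encoding="utf-8")
```

The trade-off is freshness: a page is only as current as the last run of the batch job, so this suits data that changes slowly compared to how often it is viewed.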


Filed under Scalability, Web Applications