Monthly Archives: July 2007

Operating System, Database and The Web

I thought of using “Microsoft, Oracle and The Web” but settled for a more generic title.

Most people say that the web has made the operating system less relevant. A typical user now spends much of their time on the web, and as long as there is a browser like Firefox that works fine on multiple operating systems, it doesn’t matter which box is used: the end user gets the same experience.

DHTML and AJAX have made web applications much more powerful, interactive and painless to use. Of course, they are no match for a well-designed desktop application (which is why many people prefer to use Thunderbird instead of web-based corporate email). But for many casual applications, the powerful web applications of today are good enough.

One of the less explored ideas is that web applications have also made us rethink the way databases are used. Serving a large number of highly dynamic pages is no small feat, and relying entirely on a database, however powerful or popular, is not going to work. For example, as of this writing, AdBrite touts serving “738 million impressions a day”. Now imagine tracking all of that in real time in a database. As time passes, AdBrite will cross the billion-impression mark and keep going. Similarly, the number of searches served by Google or Yahoo! is also in the billions. Friendster, MySpace and YouTube are all churning out lots and lots of page (and media) content.

All this is possible because of newer architectures that are quite different from the run-of-the-mill ERP applications that just rely on ERDs and third normal form. They require radically different thinking: massive parallelization followed by deferred aggregation. In this approach, transactions are initially stored in flat files and only later aggregated into a database. So the mission-critical role goes to the file storage, and not entirely to the database.
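To make the idea concrete, here is a minimal sketch of deferred aggregation in Node-style JavaScript. The log format, field names and the db.increment call are all my assumptions, not any particular site’s design: the hot path appends impressions to a flat file, and a periodic batch job rolls the log up into the database.

// Sketch only: the log format and the db API below are assumptions.
var fs = require('fs');

// Hot path: one cheap file append per impression; no database involved.
function logImpression(adId) {
  fs.appendFileSync('impressions.log', adId + '\t' + Date.now() + '\n');
}

// Batch path: run periodically (say, hourly) to turn millions of log
// lines into one database write per ad instead of one per impression.
function aggregate(db) {
  var counts = {};
  var lines = fs.readFileSync('impressions.log', 'utf8').split('\n');
  for (var i = 0; i < lines.length; i++) {
    var adId = lines[i].split('\t')[0];
    if (adId) counts[adId] = (counts[adId] || 0) + 1;
  }
  for (var id in counts) {
    db.increment('impression_count', id, counts[id]); // hypothetical call
  }
}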

Most out-of-the-box ERP systems, both open source and closed source, capture item catalogs in the database, for example. If you are as fortunate as Amazon, or even 1/100th or maybe 1/1000th of that, pulling the item data from the database every time a user wants to view the item details is not going to work. So periodic pre-generation of content is another key mechanism that reduces your database load and improves performance significantly.
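Here is an equally minimal sketch of pre-generation, again with made-up names (db.loadItems, the inline template, the static/ output path): a scheduled job renders each item’s detail page to a static file, so the web servers serve flat HTML and the database is read only once per cycle.

// Sketch only: db.loadItems() and the inline template are stand-ins.
var fs = require('fs');

function pregenerate(db) {
  var items = db.loadItems(); // hypothetical: one bulk read per cycle
  for (var i = 0; i < items.length; i++) {
    var html = '<html><body><h1>' + items[i].name + '</h1>' +
               '<p>' + items[i].description + '</p></body></html>';
    fs.writeFileSync('static/item-' + items[i].id + '.html', html);
  }
}
// Schedule from cron; the web servers then serve static/item-*.html directly.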


Filed under Scalability, Web Applications

Google Search Privacy

If you are a webmaster, you know that Google tells you the “top search queries” and also the “top search query clicks”, with the corresponding “average top position” for each of those queries. As a webmaster, I am sure you love this information. But what if you are the end user issuing those queries?

Let’s pause and see what these two types of information are. One gives the position of your web page in the searches that were conducted, while the other gives the position of your web page when the user actually clicked on it in the search results. The first piece of information is simple and straightforward. However, how is it possible to get the second piece? The only way is by tracking the users’ clicks.

Recently, Google’s search results also carry a “View and manage your web history” link at the top of the page. I personally can’t understand why people would want to keep track of what they have searched for in the past. No one really wants that, especially if they are concerned about privacy.

So, how is it possible to track the clicks? If you look at the search result links, they have the standard href=link syntax. The link itself points directly to the website, so when you place the cursor on top of the link, you do get the real link in the status bar. However, there is also an onmousedown event handler that routes the click through a function which hijacks the original link and replaces it with a redirect through Google. That’s how Google knows about the click.
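Schematically, the mechanism looks something like the following reconstruction. This is not Google’s actual code; the /url redirect path and the clk body are simplified assumptions based on the behavior described above.

// A reconstruction, not Google's actual code. Each result anchor carries
// onmousedown="return clk(this)". On mousedown, clk() swaps the clean
// href (the one the status bar showed on hover) for a tracking redirect,
// so the ensuing click navigates through Google.
function clk(anchor) {
  anchor.href = 'http://www.google.com/url?q=' + encodeURIComponent(anchor.href);
  return true; // allow the click to proceed to the rewritten link
}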

So, if you are extra cautious about privacy, what can you do? I searched userscripts.org for any GreaseMonkey user scripts that fix this issue. One seemed to fix it, but the way it did so was to register an additional event listener that sets the link back to the original. The reason that author had to do it that way is, perhaps, that from within GreaseMonkey scripts it’s not possible to directly alter the page’s event handlers. Instead, one has to use addEventListener to register an additional listener; it’s not possible to suppress the listener set by the original content.
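A sketch of that generic approach follows. This is my reconstruction of the idea, not the actual script from userscripts.org: register a second mousedown listener that runs after Google’s and restores the href.

// Reconstruction of the generic fix, not the original userscripts.org code.
var anchors = document.getElementsByTagName('a');
for (var i = 0; i < anchors.length; i++) {
  (function (a, cleanHref) {
    a.addEventListener('mousedown', function () {
      // Fires alongside the inline onmousedown handler that rewrote the
      // href; putting the clean link back means the subsequent click
      // navigates to the original destination.
      a.href = cleanHref;
    }, false);
  })(anchors[i], anchors[i].href);
}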

While the above way of resetting the link back to the original is fine, the way I addressed this problem is with a one-liner. It is


// Replace Google's click-tracking clk() with a do-nothing function.
unsafeWindow.clk = function() { };

That’s it. What this does is replace the window.clk function of the results document, which is called from the onmousedown event listener, with a different function that does nothing. Of course, this is specific to Google, and the earlier idea of resetting the link may work as a generic solution.


Filed under Google, Google Search Privacy

Multi-Table Insert

Today, I was trying to help a friend migrate data from one data model to another. The first data model had all the levels of the dimension in the same table, while the second data model had them in different tables. So, instead of writing some complex procedural logic, I tried using Oracle’s Multi-Table Insert, which essentially has the following syntax

insert all
  when <condition> then into <table> values (…)
  when <condition> then …
select … from <source> where … connect by … start with … order by level;

The data models in this specific architecture are auto-generated and have the foreign key constraints created automatically. As a result, I tried to order the data by the level of the dimension hierarchy so that a parent level is always inserted before its child level. However, the above SQL still gave a constraint violation error.

After a bit of research, it appears that Oracle doesn’t guarantee the order of inserts in spite of the explicit ordering in the select clause. Well, I fixed the problem in a quick and easy manner using a PL/SQL block that loops over the level from 1..n.
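A minimal sketch of that workaround, with made-up table and column names (dim_source, level1_tab and so on are illustrative, not from the actual model): run the multi-table insert once per level, so each pass only touches rows whose parents were inserted by an earlier pass.

-- Sketch with assumed names: dim_source holds the flattened hierarchy,
-- level1_tab..level3_tab are the per-level target tables.
begin
  for lvl in 1 .. 3 loop
    insert all
      when hier_level = 1 then into level1_tab (id, name) values (id, name)
      when hier_level = 2 then into level2_tab (id, parent_id, name) values (id, parent_id, name)
      when hier_level = 3 then into level3_tab (id, parent_id, name) values (id, parent_id, name)
    select id, parent_id, name, level as hier_level
      from dim_source
     where level = lvl                 -- only this pass's level
   connect by parent_id = prior id
     start with parent_id is null;
  end loop;
end;
/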

Anyway, I think the Multi-Table Insert is a cool feature, but given that it has been designed specifically with ETL in mind, it would be great if it could be enhanced to honor the ordering of the data.


Filed under Advanced SQL, data migration, ETL, Oracle

Is Compiere The First To Introduce Model-Driven Enterprise Software Architecture?

Compiere’s CEO recently wrote an article about the product in which he mentioned

“Compiere is more than an open source company. We are an innovative ERP and business solution provider. Our software utilizes a powerful model-based application platform. This enables Compiere to define all of the business logic of the application in a data dictionary. The platform allows customers to modify, extend or build on top of our system by simply specifying the business logic in our dictionary. It is rapid, productive, and results in higher-quality applications.”

Back in 1998, I interviewed with a company called TenFold, which had its glory days during the dot-com boom with the stock as high as $70 or so; today it’s a penny stock. While I didn’t join the company, I had a few friends who did, and that’s how I knew TenFold had a very powerful model-driven architecture called TenFoldDictionary. Check their patent.

There are various reasons for TenFold’s failure as a stock. However, my understanding is that they did have a very good framework. I even know they spent time and money making their architecture work with MySQL.

So, if you are purely interested in a model-driven framework, make sure to evaluate TenFold as well. But if you are looking for an out-of-the-box ERP, then Compiere may be the choice. I don’t see it on TenFold’s website now, but a while back they used to have a downloadable bundle with documentation touting how a business analyst can build a complete application like SalesForce.com in 1 hour (yes, 1 hour) using their software.

Whatever these claims may be, frankly, great ERP software doesn’t end with a good framework. The nitty-gritty details and the continuous evolution of business processes and new business models are what make it more challenging.


Filed under compiere, Model Driven Architecture, TenFold

Cloud of Clouders

A lot of people are using tag clouds or keyword clouds these days, and many websites provide free tools to create these clouds. ToCloud.com took its log of the websites that have been converted to keyword clouds, got the number of del.icio.us bookmarks for each of those websites, and then plotted the result as a cloud of clouders.

Is this “cloud of clouders” a tag cloud or a keyword cloud? I think it’s neither, because tag clouds are based on tagging, while keyword clouds are based on converting the text within a page, article or book. However, a cloud created from the names of entities, with some metric of those entities as the cloud frequency, can probably be called a “metric cloud” or “statistical cloud”.
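Building such a metric cloud is straightforward; here is a rough JavaScript sketch (the entity structure and the font-size scaling are my choices, not ToCloud’s):

// Rough sketch: map each entity's metric (e.g. del.icio.us bookmark
// count) to a font size, just as a tag cloud maps tag frequency.
function metricCloud(entities) { // e.g. [{name: 'example.com', metric: 420}, ...]
  var max = 0;
  for (var i = 0; i < entities.length; i++) {
    if (entities[i].metric > max) max = entities[i].metric;
  }
  var html = '';
  for (var j = 0; j < entities.length; j++) {
    var size = 10 + Math.round(20 * entities[j].metric / max); // 10px..30px
    html += '<span style="font-size:' + size + 'px">' + entities[j].name + '</span> ';
  }
  return html;
}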


Filed under keyword cloud, mashup, metric cloud, statistical cloud, tag cloud, word cloud

Thoughts On Net Neutrality

I have been thinking more about Net Neutrality these days. What follows is devil’s-advocate thinking, not finalized opinions.

I see two types of proponents of Net Neutrality: those who want neither the content providers nor the consumers to pay additional fees to the ISPs, and those who don’t want content providers to pay the ISPs but are fine with ISPs offering tiered pricing to consumers. I frankly don’t understand the first category, since installing and operating a network is not free and someone has to pay for that work. So, I want to explore whether the second type of proponent is correct.

Now, take YouTube for example, which generates a lot of bandwidth demand due to its video streaming. It’s free for end users, and the service makes money through ads. No good service can ever be free; someone has got to pay for it. A different model for YouTube would be to charge the consumers a minimal fee and not have ads at all. However, YouTube wouldn’t want to do this, because they know it’s possible to make far more money by making the advertisers bid for their ad space than by charging a flat rate to consumers.

Basically, everyone knows a B2B model is usually much more profitable, with higher margins, than a B2C model. So these very Net Neutrality proponents, who argue that ISPs should make the additional money needed to operate additional network bandwidth by charging consumers based on their usage, are essentially prescribing a B2C model for the ISPs while themselves going with a B2B model.

Think about it: Google could have chosen to make search a subscription-based service for consumers and let the various businesses put their ads in the search results for free, instead of making them bid for their position.

If content providers desire to make their content reach the end user without having to pay the ISPs, even for the bandwidth alone, let alone bidding for that bandwidth, wouldn’t every website have the same desire to reach consumers through the search engine for free?

In the above analogy,

Content Provider = Website
ISP = Search Engine
Consumer = Consumer
ISP Subscription = Search Service Subscription (note, the ISP’s price need not be the same as the search service’s price).

If the search service providers (SSPs) don’t want the websites to have a free ride on their precious page-view bandwidth, why would an ISP want content providers to get a free ride on its network bandwidth?

Let me know how the above thinking is flawed, or how it can be reinforced with tweaks.


Filed under AdWords, Google, ISP, MSN, Net Neutrality, SSP, Yahoo!, YouTube

Why Did Google Acquire Grand Central?

I don’t know the real answer, but I would like to pen down the main reason I can think of.

Let me first digress a bit. If you use LinkedIn, you know that it’s possible for LinkedIn to create a profile of you based on the people you are connected to, in addition to all the personal details you provide about yourself. However, personal information like school and work will not completely distinguish two people. As the saying goes, “a man is known by the company he keeps”: beyond the personal information, the LinkedIn connections give more information about a person.

The more accurate a profile a company has of a person, the better it can target its services; for Google, that typically means advertising. With a service like Grand Central, Google will be able to amass people’s relationships from their phone calls (A calls B). Currently, LinkedIn has no way to give weightage to a relationship: when two childhood buddies connect on LinkedIn, that’s no different from when a recruiter hooks up with a prospect. Given that, beyond the initial connection, the actual email communication happens outside LinkedIn, there is no good way for LinkedIn to assign additional weightage to each relationship.

On the other hand, the services offered by Grand Central allow it to track who is calling you, all the time. The more calls you receive from a number, the more weightage can be given to that connection.

In addition, say you are trying to buy a house (now is not the right time to do so in many parts of the US, but say you are one of those who is still thinking of buying one). If Grand Central figures out that you are working with some local real estate agent, based on the calls you have been constantly receiving, Google can start showing you mortgage-related ads, real estate ads and so on. Of course, it can do that based on what you are searching for as well. But based on what it knows about that particular realtor, it can target even more precisely.

In fact, Google has already been doing this with email. While Yahoo! and Hotmail choose not to put every address you send an email to into your address book by default, GMail does the opposite. It is essentially cataloging your entire network, and the more you keep using Gmail, the more it learns about you! By acquiring Grand Central, Google not only knows your email network, it also knows your phone network!


Filed under Google, Grand Central, linkedin