Category Archives: Web 2.0

CloudStore – Product Catalogs using Image Clouds

If you liked the tag cloud / keyword cloud concept using text, think of what can be achieved using images instead of text! That is exactly what CloudStore – Online Shopping using Image Clouds from ToCloud does. The Digital SLR Cameras Image Cloud displays all the Digital SLR Cameras from Amazon as an Image Cloud. The cameras are ordered from left-to-right and top-to-bottom using Amazon’s SalesRank, while the size of each image reflects the list price of the camera. So, the more expensive digital SLR cameras are shown big while the cheaper ones are shown small. Further, the images have a border rendered in different colors. Green indicates that Amazon marks the price as “too low to display”, orange indicates that the sales price on Amazon is less than the list price, and yellow indicates that the list and sales prices are the same.

As far as I know, this is the first instance where a Web 2.0 concept of tag clouds has been implemented for Product Catalogs. What’s cool about this is the fact that it makes use of html image maps to be able to show the user additional information about each product and clicking on a particular product takes the user to the product details page on Amazon.

I have noticed an Image Cloud listed at Wikipedia which seems to have multiple drawbacks: 1) there are no semantics to the ordering of the images, and 2) each image in the Image Cloud is requested separately, which results in several HTTP requests. But perhaps that website is the first to come up with the concept of Image Clouds, while ToCloud is perhaps the first to use Image Clouds for Product Catalogs.

Filed under Image Cloud, Procurement, Product Catalog, tag cloud, Web 2.0

University 2.0

What is University 2.0? This is just an idea I am mulling over. With the cost of education going up, people may not have the time, money and commitment to study beyond a Bachelor’s or Master’s all the way to a PhD. That doesn’t mean they can’t, at a later stage, continue to invest time and effort in something they are passionate about. If that something is in line with the work they are doing, that would be even better.

However, why should independent research be termed University 2.0? Well, that’s where I want to bring Web 2.0 ideas into this independent research. Perhaps there will be open, free (or inexpensive) collaboration to do research. For example, an open source research journal, willing volunteers ready to spend time mentoring research aspirants (this should be possible in just the same way people are ready to contribute to Wikipedia) and perhaps even virtual degrees (well, those who are interested in researching on the side perhaps don’t really care about the certificates, but hey, why not?). And as these days every social-network system seems to be interested in giving a number to everything (perhaps it all started with PageRank?), some kind of popularity ranking for each piece of research.

1 Comment

Filed under PhD, research, university research, Web 2.0

What about PermaPages?

As you keep writing more and more blog articles, the older articles disappear from the main page. However, they can still be accessed, using a unique url that never changes. These are called PermaLinks and they are important from a search engine perspective.

However, I would like to see the concept of PermaPages for blogs. Presently, on WordPress for example, the previous entries are accessed using /page/2/ and /page/3/ etc. Instead, they should be accessed as /page/<n>/, /page/<n-1>/ etc.

Why is this? This way, my first 10 blog articles will always get /page/1/ and the next 10 /page/2/ and so on. The latest is always /page/n/ or simply /page/ or even more simply just the homepage of the blog. With this, each set of 10 blog articles will collectively get indexed.
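The numbering scheme above can be sketched in a few lines of JavaScript. This is only an illustration under my own assumptions: the function names (permaPageOf, latestPage, permaPagePath), the 0-based article index counted from the oldest article, and the page size parameter are hypothetical and not part of WordPress.

```javascript
// Hypothetical helper: page number of an article under the PermaPages scheme.
// articleIndex is 0-based and counts from the OLDEST article, so the page
// number assigned to an article never changes as new articles are published.
function permaPageOf(articleIndex, pageSize) {
  return Math.floor(articleIndex / pageSize) + 1;
}

// Hypothetical helper: number of the newest (possibly partially filled) page.
function latestPage(totalArticles, pageSize) {
  return Math.max(1, Math.ceil(totalArticles / pageSize));
}

// Hypothetical helper: URL path for a given page number.
function permaPagePath(page) {
  return "/page/" + page + "/";
}
```

With a page size of 10, the first ten articles always map to /page/1/, the next ten to /page/2/, and only the newest page keeps changing until it fills up.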

So, if you are implementing a system that has a rolling log of content, try to use the above scheme to create PermaPages. This is especially good if you have AdSense, as each set of 10 (or X) articles is indexed together and remains constant.

Filed under AdSense, Web 2.0

Mashup: Tag Cloud + Amazon Products

The Tag Cloud mashup with Google Suggest has been extended to support exploring Amazon Products from the Tag Cloud. This gives bloggers an opportunity to quickly check what kind of products correspond to their blogging content. This is useful for people considering placing affiliate links to generate additional revenue.

Here is an example of a tag cloud: when you click on a link, you can choose either Google Suggest or Amazon to display the related content for that word in the cloud. When Amazon is chosen, it is also possible to pick the product category.

Filed under affiliates, keyword cloud, mashup, page cloud, tag cloud, Web 2.0, word cloud

Effects on a Tag/Keyword Cloud

One of the popular Web 2.0 JavaScript libraries goes with the theme of “it’s about user interface baby!”

And Tag Cloud is also a Web 2.0 concept.

So, what if we combine these two? You get effects on a Tag Cloud. That’s exactly what the ToCloud Keyword Cloud Generator has done. Here are a few examples.

Pulsate Effect on My Blog

Grow Effect on Amazon Homepage

Shake Effect on

BlindDown Effect on Yahoo!

Filed under DHTML, javascript, keyword cloud, tag cloud, Web 2.0, word cloud

News Cloud

How about converting news feeds into a keyword cloud? I did some research and this has already been done for quite some time. One notable entry creates a news cloud out of Google News. Another site does a news cloud not only for Google News, but also for other sites like Slashdot. A third also uses Google News to create the news cloud; however, only phrases are presented. The advantage of a cloud made of phrases over one made of keywords is that phrases give more context. This is especially important for a text source like news, which changes continuously and where a keyword’s importance is very short-lived.

Filed under news cloud, Web 2.0

Keyword/Tag Cloud Suggest – A mashup

What happens if a keyword cloud or a tag cloud is mixed with Google Suggest? You get a Cloud Suggest. The motivation is this: say you have a tag cloud of your blog. The cloud gives you and your readers a quick idea of the topics you mostly cover in your blog. But what if you or your readers want to know the popular searches related to those tags? That’s where you can make use of Google Suggest. One keyword cloud generator seems to be the first to have this Cloud Suggest idea: once a cloud is generated, clicking on any word or phrase opens a popup that fetches suggestions for it from Google Suggest.

Filed under keyword cloud, mashup, tag cloud, Web 2.0

Advanced Keyword Cloud Features

Creating a keyword cloud from a page shouldn’t be that hard as it just involves breaking up the text into words, then counting the frequency of the words and then finally displaying them as a cloud. That’s it. Right? Wrong!
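The naive version described above (split, count, display) might look like the following sketch. The function name countWords and the ASCII-only word pattern are my own assumptions for illustration; a real generator would also need stop words, Unicode handling and so on.

```javascript
// Naive keyword counting: lowercase the text, split it into words,
// and count how often each word occurs.
function countWords(text) {
  // Object.create(null) avoids clashes with inherited keys like "constructor".
  var counts = Object.create(null);
  // Assumption: a "word" is a run of ASCII letters or digits.
  var words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (var i = 0; i < words.length; i++) {
    var w = words[i];
    counts[w] = (counts[w] || 0) + 1;
  }
  return counts;
}
```

Note that this lowercases everything, which is exactly the kind of shortcut that the more advanced features have to undo.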

A keyword cloud can be made more sophisticated. Some of the features to keep in mind are

1. preserving case for abbreviations. For example, a cloud of a web page about SEO (Search Engine Optimization) should display the term as SEO, not seo. This is important because people are used to seeing abbreviations in uppercase rather than lowercase.

2. displaying the keywords in the order of their occurrence in the page. Wondering why this may be useful? Say you have a blog which shows the most recent posts at the top of the page. You may then want a cloud that shows the keywords of your recent articles first, followed by the keywords of older ones.

3. one of the most difficult parts of keyword cloud generation is the extraction of phrases. Extracting meaningful keyword phrases from a page is now possible, and so my blog’s keyword cloud has started showing keyword phrases (click the My Blog To Cloud link to see this in action).
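The first two features in the list above can be sketched as follows. This is a rough illustration and not how any particular generator does it: the function name buildKeywordList is hypothetical, and the rule "show a word in uppercase if it ever appears in all caps" is a heuristic I am assuming for the sake of the example. Phrase extraction is deliberately left out.

```javascript
// Returns keywords in order of first occurrence, with counts, showing a word
// in uppercase if it was seen in all caps anywhere in the text.
function buildKeywordList(text) {
  var entries = [];                   // preserves first-occurrence order
  var byKey = Object.create(null);    // lowercase word -> entry
  var words = text.match(/[A-Za-z0-9]+/g) || [];
  for (var i = 0; i < words.length; i++) {
    var word = words[i];
    var key = word.toLowerCase();
    var entry = byKey[key];
    if (!entry) {
      entry = { display: key, count: 0 };
      byKey[key] = entry;
      entries.push(entry);
    }
    entry.count++;
    // Assumed heuristic: a multi-letter all-caps form is an abbreviation.
    if (word.length > 1 && word === word.toUpperCase() && /[A-Z]/.test(word)) {
      entry.display = word;
    }
  }
  return entries;
}
```

Since the entries keep their first-occurrence order, a blog with the newest posts at the top naturally gets a cloud that leads with the keywords of recent articles.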

1 Comment

Filed under keyword cloud, Web 2.0

Skill Cloud – Making a resume Web 2.0 compliant?

Earlier I wrote about converting pages and tag summary pages to keyword clouds. A new tool on offer is a skill cloud creator, which takes a set of skills at different skill levels and converts them into a skill cloud. Would people like to present their skills as skill clouds?

Filed under Web 2.0

Tag Cloud Algorithm/Logic/Formula

I wanted to implement a very efficient tag cloud generator. Initially I thought it’s a simple task, but realized making it efficient is a bit challenging. I came up with a bunch of ideas on how to do that and then searched on the web to find if there are any articles related to it. I noticed that most of them talk about how to divide the data into buckets, using some sort of a formula including logarithms etc. There are bits and pieces of code here and there, but somehow nothing excited me. So, let me put together some of my thoughts on this.

A tag cloud requires a tag and a number associated with that tag. That number is usually a metric. What’s so special about a tag cloud? Typically, information in business applications is presented as a table which can then be sorted. So, at any time, the user can sort by the name of the entity in the report or by the metric of that entity, for example by customer name or by the dollar amount spent by the customer. What a tag cloud offers is the ability to see the ordering of both the entity and the metric in a single visual representation. This is done by laying out the data in the order of the entity but changing the size/color intensity of each entity based on its metric value. As a result, while the user can scan top to bottom (and left to right) for the alphabetical ordering of the entities, the user can also scan for font size/color intensity at the same time. So, an extra sort to get each ordering is avoided. Of course, for precise details, one has to sort explicitly by either the entity or the metric.

Now, the next question is how to vary this size/intensity with the metric. Is a linear interpolation sufficient? Does it have to be logarithmic? To a large extent this depends on the data distribution. If the difference between the highest and the lowest value of the metric is very large (several orders of magnitude, O(10^n)), then logarithmic interpolation may help. However, sometimes it may not be worth showing every entity in the tag cloud; just the top N entities are good enough. If we go with the top N approach, then the max and the min of the top N entities may not be that far apart, and in this case a linear interpolation should suffice.

One reason I would caution against using a logarithmic interpolation is that it’s expensive to compute and if you are doing it real-time and with huge volume, then that’s going to be CPU intensive. So, try using the topN and linear interpolation.
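To make the difference between the two interpolations concrete, here is a small sketch. The function names linearSize and logSize and their parameter lists are my own for illustration; the logarithmic variant assumes all metric values are positive.

```javascript
// Map a metric value onto [minSize, maxSize] linearly.
function linearSize(value, minValue, maxValue, minSize, maxSize) {
  if (maxValue === minValue) return minSize;   // avoid division by zero
  var t = (value - minValue) / (maxValue - minValue);
  return minSize + t * (maxSize - minSize);
}

// Same mapping on a logarithmic scale. Assumes value and minValue are > 0.
function logSize(value, minValue, maxValue, minSize, maxSize) {
  if (maxValue === minValue) return minSize;   // avoid division by zero
  var t = (Math.log(value) - Math.log(minValue)) /
          (Math.log(maxValue) - Math.log(minValue));
  return minSize + t * (maxSize - minSize);
}
```

For metrics ranging from 1 to 10000 and sizes from 80% to 280%, a tag with metric 100 comes out at about 82% with linearSize (almost indistinguishable from the smallest tag) but at 180% with logSize. When the top N values are close together, the two mappings give similar results and the cheaper linear one is enough.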

Next, in the linear interpolation, how do we set the min and max boundaries for the font size/color intensity? I notice that, for example, one site ranges its font sizes between 80% and 280%. So, the lowest tag in the cloud would get a font size of 80% and the highest tag 280%. I have decided to go with the following formula

fontSize = 150 * (1 + (1.5 * m - maxm/2) / maxm) %

where m is the tag’s metric and maxm is the largest metric in the cloud. This nicely gives a font size from 75% to 300% as the metric changes from a potential 0 to maxm. Check Tag Cloud Generator for this formula in action.
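As a quick check of the 75% to 300% range, the expression used in the processCloud snippet later in this post can be wrapped in a small function. The name fontSizePercent is hypothetical; the arithmetic is the same as in that snippet, and it simplifies to 75 + 225 * metric / maxMetric.

```javascript
// Font size in percent for a tag, using the same expression as processCloud.
// metric is the tag's value, maxMetric the largest value in the cloud (> 0).
function fontSizePercent(metric, maxMetric) {
  return 150.0 * (1.0 + (1.5 * metric - maxMetric / 2) / maxMetric);
}
```

With maxMetric = 200, a metric of 0 gives 75, a metric of 100 gives 187.5 and a metric of 200 gives 300.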

Ok, if we go with this topN approach, then the next question is how do we get this top N? For this, one has to invariably write a SQL statement. Something like

“select entity,metric from fact order by metric desc” which gives all the entities.

One can refine this to restrict only to the topN by doing the following

“select entity,metric from fact order by metric desc limit 0,<n>” where you can plug in a particular number suitable for your application.

Now, with the above SQL, we obtained the Top N entities. However, we want them in alphabetical order as that’s how we want to display the cloud. How do we do this? One approach is to fetch them all first and then do a sort in the middle tier. Depending on the size of N and the number of middle tiers you have, you have to choose between doing this in the middle tier and doing it in the database. Assuming you have a single middle-tier server, then doing it in the database (also a single server) may not be bad. So, the above SQL will refine to

“select * from (select entity,metric from fact order by metric desc limit 0,<n>) as topn order by entity” where topn is just an alias, which MySQL requires for a derived table.

In the above configuration of a single middle-tier and database server, choosing to do this in the database gives the advantage of not having to create an array of records in the middle tier for the sort, as the sort is done in the database itself (which I am assuming has more optimal sorting strategies). So, one can just loop through the result set and output the entities.
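For comparison, the middle-tier alternative mentioned above (fetch the rows, then sort in code) would look roughly like this. The function name topNAlphabetical and the row shape { entity, metric } are assumptions for illustration; this is the array-building approach that the nested SQL query avoids.

```javascript
// rows: array of { entity: string, metric: number }
// Returns the n rows with the highest metric, ordered alphabetically by entity.
function topNAlphabetical(rows, n) {
  return rows
    .slice()                                             // leave the input untouched
    .sort(function (a, b) { return b.metric - a.metric; }) // highest metric first
    .slice(0, n)                                         // keep only the top n
    .sort(function (a, b) {                              // then order by name
      return a.entity < b.entity ? -1 : a.entity > b.entity ? 1 : 0;
    });
}
```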

However, there is one small problem with this. By sorting the Top N alphabetically in the database itself, we no longer know the max metric value up front. Without the max metric value, how do we calculate the size/intensity? Does it mean I have to read the result set into an array first and then scan through it to get the max? That would defeat the purpose of doing the double sort in the database as mentioned above.

With Oracle, it’s possible to use analytic functions and get the max of the entire set as a column in the query. But hey, most people out there are using MySQL for their web apps, aren’t they? So, what next?

That’s when I thought of using JavaScript to do the font-size calculation on the client side! The idea is to loop through the result set and generate the HTML code, and in the process keep track of the max value and output it as a JavaScript variable that will be used in the client-side computation. When the tags are generated as links, make use of each link’s title attribute to carry the metric value. For example, the title may read “some description: ” followed by the metric value.
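A sketch of that generation step, written in JavaScript for consistency with the rest of the post (your middle tier may well be in another language). The names renderCloud and escapeHtml, the row shape { entity, metric }, the href="#" placeholder and the choice of using the entity name as the title’s description are all my assumptions. It also assumes the description itself contains no colon, since the client-side code takes whatever follows the first colon as the metric.

```javascript
// Minimal HTML escaping for text and attribute values.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// rows: { entity, metric } records already sorted alphabetically by the database.
// Emits one link per tag and tracks the max metric in the same single pass.
function renderCloud(rows, id) {
  var max = 0;
  var html = '<div id="' + escapeHtml(id) + '">';
  for (var i = 0; i < rows.length; i++) {
    var row = rows[i];
    if (row.metric > max) max = row.metric;
    // The title carries "description: metric" for the client-side script.
    html += '<a href="#" title="' + escapeHtml(row.entity) + ': ' +
            row.metric + '">' + escapeHtml(row.entity) + '</a> ';
  }
  html += '</div>';
  return { html: html, max: max };
}
```

The returned max is what gets written into the page as the JavaScript variable, so no array of records has to be kept around just to find it.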

Now, in the JavaScript, you can loop through each of the links, compute the font size, and set it on the link. A snippet of that function would look like

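function processCloud(id, max) {
  var cloud = getElement(id);            // utility that looks up the element by id
  if (!cloud) return;
  var tags = cloud.getElementsByTagName("a");
  for (var i = 0; i < tags.length; i++) {
    var tag = tags[i];
    var title = tag.getAttribute("title");
    if (!title) continue;                // skip links without a metric
    // the metric is whatever follows the colon in the title
    var f = parseFloat(title.substring(title.indexOf(":") + 1));
    // 75% when f is 0, 300% when f equals max
    var fontSize = (150.0 * (1.0 + (1.5 * f - max / 2) / max)) + "%";
    tag.style.fontSize = fontSize;
  }
}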

Here, getElement is a utility function that gets the element from the document based on a given id. So, your tag cloud can be placed in a div element with an id and that’s the id you pass to the processCloud function along with the max value that is computed as part of generating the html.

That’s it. This essentially achieves the following optimizations

1. Since we first sort by metric and limit only the top N elements, there is no need to bring in all the elements into the middle tier.
2. Since we then sort the data by name, there is no need to create an array in the middle tier and do the sort.
3. Finally, since the fontsize/intensity calculation is pushed to the client side, there is no need to create an array in the middle tier.

That’s all there is. Hope this helps in your application!


Filed under keyword cloud, tag cloud, tags, Tech - Tips, Web 2.0, word cloud