Monthly Archives: March 2008

aStore and AJAX

I am writing this article because someone reached my blog by searching for “amazon astore in ajax”. For those not familiar with it, Amazon aStore lets Amazon’s affiliates quickly build a storefront from their own selection of products, so that their visitors can benefit from carefully hand-picked products relevant to the website. Whether it is always used in that spirit is a different matter :).

The aStore that you build resides on Amazon’s own web server, and it is not possible to host it on your own website directly. You have to use a frame or iframe to make your aStore show up inline. This is perhaps what prompted some creative developer to think, “wait a minute, why don’t I use AJAX to display the aStore content inline, without frames?” It sounds like a good idea, and there may well be a solution, but first I want to talk about potential issues with this approach.

  1. AJAX-driven content is not SEO friendly.
  2. Even if the homepage of your aStore is fetched using AJAX and displayed inline, what about further navigation within the store? Those URLs point directly to the Amazon aStore website. Some extra JavaScript can probably route clicks on those links through the same AJAX process, but I am listing this so that you don’t forget it.
  3. The bigger issue is that, for security reasons, the browser’s same-origin policy does not let AJAX requests go to a different domain. Since your website’s domain and the aStore’s domain are different, how can AJAX help?

So, I did a bit of research and identified two potential solutions, one of which is not exactly AJAX. The solutions are:

  1. AJAX Cross Domain script: a script that lets you write cross-domain AJAX. Don’t get too excited thinking that some clever hacker found a way to beat the security restriction of AJAX. The script is a bit of magic 🙂 in that it creates the illusion of cross-domain AJAX, but in reality the call is routed through your own server.
  2. aStore Proxy: a script that does the same thing as the one above, but is tailored to aStore rather than being a generic cross-domain AJAX solution. The underlying principle is the same: the aStore content is retrieved on the server side, and all the links are rewritten to point back to your own server, so that when a user clicks a link within the aStore, the request goes to your server and not to Amazon’s.

So, you can use either of the above solutions. The first one needs some additional code to make everything work, and it still has a drawback for aStore SEO, since the aStore content is not part of your web pages but gets included on the fly using JavaScript. With the second script, the search engine crawler sees your aStore content within your web page and does not have to understand that additional content is fetched using JavaScript. Remember, most (or perhaps all) search engines don’t execute JavaScript to figure out your final page structure. They just parse the content as is.
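
To make the link-rewriting idea behind the proxy approach concrete, here is a minimal Python sketch. It is not the actual aStore Proxy script; the base URL in ASTORE_BASE and the local route in PROXY_PATH are assumptions made up for illustration, and a real proxy would also have to fetch the page and handle relative links, forms and images.

```python
import re
from urllib.parse import quote

# Assumed values for illustration only; adjust to your own store and site.
ASTORE_BASE = "http://astore.amazon.com/"   # assumed aStore base URL
PROXY_PATH = "/store?path="                 # hypothetical route on your own server


def rewrite_links(html, astore_base=ASTORE_BASE, proxy_path=PROXY_PATH):
    """Point every absolute aStore link in `html` back at the local proxy."""
    pattern = re.compile(r'href="' + re.escape(astore_base) + r'([^"]*)"')

    def to_local(match):
        # Keep the remote path as a URL-encoded query parameter.
        return 'href="' + proxy_path + quote(match.group(1), safe="") + '"'

    return pattern.sub(to_local, html)


page = '<a href="http://astore.amazon.com/mystore-20/detail/123">Item</a>'
print(rewrite_links(page))
```

Links that do not start with the aStore base URL are left untouched, so the rest of your page keeps working as before.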

Filed under AJAX, aStore

Software As A Service & Business Intelligence (SAAS-BI)

There has been a lot of buzz around SAAS-BI. As a person with a lot of enterprise level BI experience, I want to provide a practical view on SAAS-BI. The key question is whether SAAS-BI makes sense and if so, under what scenarios.

SAAS makes sense for providers as a way to deliver applications over the internet, and for customers as a way to get rid of upfront license costs and the in-house IT staff needed to run the applications. With more competition in the SAAS area, subscription costs are going to go down over time. This inevitably forces SAAS providers to cut costs, which means doing more with less. One easy way to achieve that is to host multiple customers on the same machine, and potentially the same database and middle tier, using techniques such as virtualization, data striping and VPD (Virtual Private Database).

When people talk about SAAS-BI, there are two scenarios.

  1. Providing SAAS-BI to an in-house hosted application.
  2. Providing SAAS-BI to an existing SAAS application.

Before evaluating the challenges for each of these, let me first talk about a few challenges common to business intelligence applications. While BI is possible directly on the same database as the OLTP application, SAAS-BI involves a separate database, which is no different from managing a separate data warehouse. So the following assumes separate source and target databases.

  1. BI is fundamentally a very CPU and IO intensive task.
  2. A BI solution affects performance on the source system. For companies with 24/7 operations, the timing of the extracts is critical. Cron-like scheduling may or may not work, because the times when the source system is lightly loaded can’t be determined in advance. Factors such as payroll processing, quarter-end and year-end financial closing, periodic appraisals and promotions, and other routine work make the off-peak window non-deterministic.
  3. Identifying delta changes efficiently is non-trivial, and is the most difficult part of any business intelligence solution. Sometimes this requires analyzing and understanding the load on the source database and taking the appropriate action. There are two common ways to identify delta changes:
    1. Creating an index on the table’s last-update-date column, if it has such a column for auditing transactions. For immutable transactions, it is also possible to use an index on the unique id of the transaction, and in most such cases that index already exists.
    2. Creating a snapshot log (materialized view log). Oracle also has a feature called Change Data Capture.

    In both cases, the techniques primarily add extra overhead to the source system and exist solely to support the business intelligence initiative. This matters because those responsible for providing high-performing OLTP applications that support thousands of transactions per hour have little interest in them.

  4. Most non-trivial intelligence solutions require joins across dozens of tables. Change detection across several joins is typically very inefficient. In spite of best efforts, it ends up doing several full-table joins even though the changes finally identified are only a small fraction of the total records. So the load that change detection puts on the source system is very high for non-trivial metrics.
  5. Further, certain transaction changes touch a single record on the transaction side but result in changes to hundreds or thousands of related metrics. A good example: setting the currency conversion rate for a given date results in updating the financial metrics for every monetary transaction on that date, for the entire company. Once again, this has a huge impact on change detection and extraction.
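
The first delta-detection technique from item 3 (an index on a last-update column) can be sketched in Python with the built-in sqlite3 module. This is only an illustration under stated assumptions: the orders table, its columns and the timestamps are all hypothetical, and a real source system would be a much larger OLTP database rather than an in-memory SQLite one.

```python
import sqlite3


def extract_delta(conn, last_extract_time):
    """Return rows changed since the previous extract, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, last_updated FROM orders "
        "WHERE last_updated > ? ORDER BY last_updated",
        (last_extract_time,),
    ).fetchall()
    # The newest timestamp seen becomes the starting point of the next extract.
    new_watermark = rows[-1][2] if rows else last_extract_time
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table with an audit column
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, last_updated TEXT);
    -- The extra index exists only to make delta extraction cheap
    CREATE INDEX orders_last_updated ON orders (last_updated);
    INSERT INTO orders VALUES (1, 10.0, '2008-03-01 09:00:00');
    INSERT INTO orders VALUES (2, 25.5, '2008-03-02 14:30:00');
    INSERT INTO orders VALUES (3, 40.0, '2008-03-03 08:15:00');
""")

delta, watermark = extract_delta(conn, "2008-03-01 23:59:59")
print(delta, watermark)
```

Note that the index is pure overhead from the OLTP side: every insert and update on the table now has to maintain it, which is exactly the cost described above.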

This list could easily be extended with several more use cases, but I just want to highlight a few key issues common to business intelligence, primarily on the source side, no matter what the software model is. After going through several of these complications, there were times I wondered whether the easiest and best thing would be to just do disk-level replication and a full refresh of the fact tables. Such is the pain of incremental maintenance of facts and dimensions, the building blocks of business intelligence.

So, once you understand and acknowledge that a business intelligence solution is not just about aggregating and reporting on the data warehouse side, but also about constant performance monitoring, tuning and, many times, inventing and adapting techniques and changing architectures to make the extraction process as efficient as possible, you will understand my current thinking that SAAS-BI doesn’t make sense for some scenarios.

So, when evaluating SAAS-BI, here is what you need to consider:

In-house application hosting scenario: The SAAS-BI provider pitches that you don’t have to maintain the extra hardware and IT department for aggregating and reporting. But who is going to take care of the performance of your source database and hardware? Even if the SAAS-BI provider is ready to send someone to your data center to troubleshoot, would they consider the holistic requirement of keeping both your OLTP transactions and the BI extractions optimal, or just care about the latter?

SAAS applications: Why would Salesforce or NetSuite or any other SAAS provider care about adding the extra indexes or materialized view logs just so that your 3rd party SAAS-BI provider can extract the data easily? Moreover, when the SAAS application provider uses a multi-tenant model to keep its costs low and competitive, extracting data for one client would adversely impact performance for the others. Why would the SAAS provider risk this for a 3rd party SAAS-BI provider? That may force you, as the customer, to go with a single-tenant SAAS solution just to also get a SAAS-BI solution.

So, I think, however much the SAAS-BI providers want to convince you that they have patents and IP involving database kernel level tweaking or hacking to make SAAS-BI possible, first and foremost it’s a problem created to support their business model, not yours! Of course, this is no different from building a custom in-house application that is very specific to your needs vs using an off-the-shelf product that is more generic. However, for applications, the performance impact is not as severe.

SAAS-BI Predictions: That leaves me with predicting the future of SAAS-BI. First, does this mean SAAS-BI doesn’t make sense at all? No. It does make sense in two cases.

  1. SAAS-BI provided by your own SAAS provider. Since they already have your data, it is much easier for them to provide you with the related intelligence.
  2. SAAS-BI provided by anyone when your applications are hosted in-house. However, I would seriously advise against this, because the ROI calculations may not have factored in the cost of monitoring and maintaining your source system for performance and scalability. Not to mention all the extra network bandwidth needed to encrypt and transfer the data from your data center to the SAAS-BI data center. That also means sizing your internet pipes for much higher peak bandwidth; otherwise, potential customers visiting your corporate website might have network problems and, worse, you might lose sales and loyal customers.

Filed under SAAS, SAAS BI

perl: String hashCode

I needed to create unique numbers out of strings, so I explored the option of using hash codes (which won’t be 100% unique, especially if they need to be represented in a limited number of bits). I couldn’t find a suitable function in perlfunc. After a bit of research on the web, and not wanting to spend too much time understanding the theory, I just looked at the java.lang.String source code, where the hashCode function looks like

hash = 0;
for (character in string) {
    hash = 31 * hash + character;
}

So far so good. But when I tried this in perl, I got a completely different answer. It turns out that this is because perl doesn’t use integer arithmetic by default. You can force it to do so with “use integer;” and, most importantly, the pragma is scoped to the enclosing block. So, here is my final perl String hashCode function, which gives the same result as Java’s String hashCode.

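sub hashCode {
    my ($string) = @_;
    my $hash = 0;
    {
        use integer;    # integer arithmetic, scoped to this block only
        foreach my $char (split //, $string) {
            # Mask to 32 bits so that 64-bit perls wrap the way Java's int does
            $hash = (31 * $hash + ord($char)) & 0xFFFFFFFF;
        }
    }
    # Reinterpret the unsigned 32-bit value as a signed Java int
    $hash -= 4294967296 if $hash > 2147483647;
    return $hash;
}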

That’s it.
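
As a cross-check, the same algorithm can be sketched in Python. Python integers never overflow, so the 32-bit wrap-around of Java's int has to be emulated with a mask; the function name java_string_hashcode is mine, and the sketch assumes the string only contains characters from the Basic Multilingual Plane (Java hashes UTF-16 code units, which differ from code points for characters outside it).

```python
def java_string_hashcode(text):
    """Emulate java.lang.String.hashCode() for BMP-only strings."""
    h = 0
    for ch in text:
        # Keep only the low 32 bits, as Java's int arithmetic would
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int
    return h - 0x100000000 if h >= 0x80000000 else h


print(java_string_hashcode("hello"))  # 99162322, same as "hello".hashCode() in Java
```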

Filed under perl

GMail Slow

One of the reasons I switched from Yahoo! and Hotmail to GMail is that it supports https not only at login but also for the rest of the session. One advantage is that even if you send personal emails from somewhere outside home, no one on the network can see what you are sending or receiving. It gives that extra level of privacy comfort, if nothing else. However, for the last few weeks I have been noticing performance problems with GMail. I have been using https for a few years now, so I never bothered to try the http version. Today I tried it and, to my surprise, it is much faster. Hopefully their https servers will get faster again; otherwise my incentive to use GMail is reduced, given that nowadays all the other major email providers like Yahoo! and Hotmail (would it become Yahmail in the future? :)) have also started giving GBs of storage.

Filed under Gmail

OutSAASing

SAAS by definition lets a company outsource its IT operations and use software as a service. Now, what happens if the SAAS vendor is from one of the countries that specialize in outsourcing? Well, I call it OutSAASing, just for the heck of it (I just googled and there is not a single result for this word, so I can probably say that I am the first to come up with it, even if it’s stupid :)).

But most importantly, this article on Zoho indicates what it means to have the option of OutSAASing:

“Marc Benioff, chief executive of Salesforce.com, has made an offer to buy Zoho for an undisclosed amount. Benioff seems appropriately nervous, since Salesforce.com’s sales and administration costs are high, eating up most of his earnings. Can he afford to compete if Zoho undercuts him at such a dramatic scale?”

A SAAS provider can typically expect to make decent margins by pooling resources to serve multiple customers. By pooling resources I mean, for a developer, using a single database and a single middle tier for multiple customers. For COOs and higher-ups, it could also mean using fewer DBAs, fewer network admins and fewer people overall to provide the service.

Recently Oracle announced a Single Tenant OnDemand CRM. For those familiar with web hosting models, this is like having your own server that you can reboot anytime and do whatever you want with. Essentially it comes with a separate database and middleware for a single customer. Oracle’s move into virtualization in 2006 was perhaps made with the intention of providing a single-tenant model through virtualization.

But anyway, coming back to OutSAASing, some recent developments in India regarding SAAS are what prompted me to write this article in the first place. The first is NIIT’s announcement of ProcureEasy, which can compete with other SAAS providers in this space such as Coupa. The next is the partnership announced by NIIT and Ramco Systems to provide OnDemand ERP, which competes with NetSuite, and maybe Compiere when it gets to that stage.

The thing is, while Oracle’s single-tenant system makes infrastructure resources “virtual private”, the OutSAASing companies can simply choose to add a dedicated DBA, a dedicated sysadmin and, if required, even a few developers to their overall offering for a few large customers, and still maintain their margins. Now, all of a sudden, instead of just “Software As A Service”, it becomes “IT Department As A Service”. Of course, large MNCs already have a big presence in India, China and wherever quality labor is available cheaply, and can follow suit. But with manufacturing having moved outside the US in the past decades and services replacing it, and with SAAS being an internet-oriented offering that can be delivered from anywhere, even SAAS is likely to slip away from the US. Of course, this is easier said than done. When even companies like Amazon have outages in popular platform web services such as S3, the key to the success of OutSAASing is precise execution, like what the Japanese did with quality and cars: better SLAs, quick response times and six-sigma precision.

On the other hand, in spite of throwing in additional cheaply available personnel, power cuts and even internet outages make it difficult to centralize the infrastructure in a single place. So the infrastructure should perhaps be distributed closer to the customers’ countries, while the operations can be centralized.

Filed under SAAS

SQLite: Some More SQL Tuning

In SQLite: Join And Group vs Group By And Join, I mentioned the need to rewrite SQL to make it perform better. In this post, I am going to discuss a few more rewrites that I did.

Note that when doing performance tuning, rewriting the SQL is one approach; the other is to create the appropriate indexes, if applicable. Restructuring queries can be needed on everything from small databases to large commercial ones. However, the big guys like Oracle are capable of handling a lot more scenarios on their own. Here are two cases that I had to rewrite:

1. correlated sub-query:

select … from p where … abc in (select xyz from ch where p.l = ch.m);

is changed to

select … from p,ch where … p.l = ch.m and abc = xyz;

2. transitive filter:

select … from p where … abc = (select id from c where n = ?) and xyz in (select lid from ch where id = abc);

Here, the index is on xyz. So, changing it to

select … from p,ch where … abc = id and id = (select id from c where n = ?) and xyz = lid;

worked because the plan would then use the index on id and index on xyz.

SQL tuning is all about understanding the access paths and helping the database a bit in case it is not smart enough to figure things out itself.
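
The first rewrite above (correlated sub-query to join) can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not the original schema: the column types, the index ch_m and the sample rows are assumptions, and only the table and column names p, ch, l, m, abc and xyz come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables shaped like the ones in the queries above
    CREATE TABLE p  (l INTEGER, abc INTEGER);
    CREATE TABLE ch (m INTEGER, xyz INTEGER);
    CREATE INDEX ch_m ON ch (m);
    INSERT INTO p  VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO ch VALUES (1, 10), (2, 99), (3, 30);
""")

subquery_sql = (
    "SELECT p.l FROM p "
    "WHERE p.abc IN (SELECT ch.xyz FROM ch WHERE p.l = ch.m) ORDER BY p.l"
)
join_sql = (
    "SELECT p.l FROM p, ch "
    "WHERE p.l = ch.m AND p.abc = ch.xyz ORDER BY p.l"
)

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(subquery_rows, join_rows)

# Show the access path SQLite picks for each form
for label, sql in (("sub-query", subquery_sql), ("join", join_sql)):
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(label, step)
```

One caveat worth keeping in mind: the two forms only return the same rows when each row of p matches at most one row of ch. If ch can contain several matching rows, the join repeats the p row once per match, so a DISTINCT (or a unique constraint on ch) is needed to keep the results identical.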

Filed under performance tuning, SQL Tuning, SQLITE