Category Archives: SAAS BI

Revisiting SAAS BI in the context of Oracle’s Exadata Storage Server

A while back I wrote about SAAS BI and why I thought it may not be the best strategy to avail BI as a SAAS. A few commenters didn’t agree with me.

Guess what? Today, at Open World, Larry Ellison, the guy who would have better understanding of where the databases are going within the next decade, more than most of us, has unveiled one of the coolest products of the decade in the BI space. No, it’s not yet another cool reporting tool or a sexy graph renderer or a user friendly pivoting or some meta data layer that allows analysts create ad-hoc queries without learning about outer joins and SQL aggregations. The innovation is at the very core layer, the hardware layer.

When you hear Larry talk about TPC-H, the challenges that large companies are facing today with tera-bytes of data, mind-you, inspite of a big database grid, you wonder how the SAAS BI guys can solve the problem of extracting those tera-bytes of data into their own hosted environments.

Exadata Storage Server is essentially a specialized storage hardware that has database level knowledge (think of it as a hard disk with part of the database logic put together) and so, when a query has to be executed, instead of having to return blocks of data from the storage server, only the results set is returned back. This is a big big big thing. If you have a large table and your query requires a full-table scan on it, in the traditional approach, those thousands of data-blocks have to be returned (even with optimizations such as multi-block fetch for FTS). But in the new approach of specialized storage solution with built-in database logic, it just needs to return the summary data. That’s a lot of bandwidth optimization between the storage grid and the database grid. Apparently, currently the pipes between the Exadata storage server and the database grid have so much high-bandwidth that they are no longer the bottleneck. Seems, now instead the hard-drives are the bottleneck! Well, if only the SSDs (Solid-State Disks) mature and become more price effective, then perhaps that equation would shift. But, atleast for now, Oracle’s Exadata server blows the competition away by a huge margin. Forget about SAAS, we are talking about raw processing power here!

The benchmarks shown in the presentation indicate that this solution performed much much better than the solutions like Teradata and Netezza.

Leave a comment

Filed under Exadata, Oracle, SAAS BI

Software As A Service & Businesss Intelligence (SAAS-BI)

There has been a lot of buzz around SAAS-BI. As a person with a lot of enterprise level BI experience, I want to provide a practical view on SAAS-BI. The key question is whether SAAS-BI makes sense and if so, under what scenarios.

SAAS makes sense to deliver applications over the internet for the SAAS providers and to get rid of upfront license costs and in-house IT staff to run the applications for the customers. With higher competition in the SAAS area, as time goes, the cost of the subscription is going to go down. This invariably forces the SAAS providers to cut costs and that means doing more with less. One easy way to achieve that is to host multiple customers on the same machine and potentially same database and middle-tier using such techniques as virtualization, data striping and VPD.

When people talk about SAAS-BI, there are two scenarios.

  1. Providing SAAS-BI to a in-house hosted application.
  2. Providing SAAS-BI to an existing SAAS application.

Before evaluating the challenges for each of these, let me first talk about a few challenges common to business intelligence applications. While BI is possible directly on the same database as the OLTP database, since SAAS-BI involves using a separate database, which is no different from managing a separate data warehouse, the following assumes using source and a target databases.

  1. BI is fundamentally a very CPU and IO intensive task.
  2. BI solution causes performance on source system. For companies that have 24/7 operations, timing of running the extracts is critical. A cron-job like scheduling may or may not work because, when the source system is not heavily loaded can’t be pre-determined. Several factors like pay-roll crunching, quarter/year end financial results crunching, periodic appraisals and promotions and other routine functionality makes the non-peak load time non-deterministic.
  3. Identifying delta-changes efficiently is a non-trivial and most difficult part of any Business Intelligence solution. Some times, this requires analyzing and understanding the load on the source database and taking the appropriate action. There are two common ways to identify delta changes
    1. Creating an index on the last update date of the table if there is such a column to audit the transactions. For immutable transactions, it is also possible to create an index on the unique id of the transaction, but in most such cases, there is already an index for the unique id.
    2. Creating snapshot log (materialized view log). Oracle also has a concept called Change Data Capture.

    In both the above cases, the related techniques primarily add extra over-head to the source system and exist solely for the purpose of supporting the business intelligence initiative. This is important because, this creates a lack of interest to those who are responsible for providing high-performing OLTP applications that can support thousands of transactions per hour.

  4. Most non-trivial intelligence solutions require joins to dozens of tables. Change detection when there are several joins is typically very inefficient. Inspite of the best efforts, this ends up in several full-table joins even though the final result of the identified change is only a small fraction of the total records. So, the load on the source system is very high in detecting changes for non-trivial metrics.
  5. Further, certain type of transaction changes, while result in a change to a single record on the transaction side, results in changes to hundreds and thousands of changes to the related metrics. A good example of this is, setting the currency conversion rate for a given date results in updating the related financial metrics for every monetary transaction on that date for the entire company. Once again a huge impact on the change detection and extraction side.

This list could be easily extended further with several use cases, but I just want to highlight a few key common issues of business intelligence primarily on the source side, no matter what the software model is. There were times I wondered, after going through several of these complications, if the most easiest and best thing is to just do a disk-level replication and do full-refresh of the fact tables. Such is the pain with incremental maintenance of facts and dimensions, the building blocks of business intelligence.

So, once you understand and acknowledge that the Business Intelligence solution is not just about aggregating and reporting on the data warehouse side, but also constant performance monitoring, tuning and many times fundamentally inventing and adapting techniques and changing architectures to make the extraction process as efficient as possible, then you would understand my current thinking that SAAS-BI doesn’t make sense for some scenarios.

So, when evaluating SAAS-BI, here is what you need to consider

In-house application hosting scenario So, while the SAAS-BI provider pitches you that you don’t have to maintain the extra hard-ware and IT department to deal with aggregating and reporting, who is going to take care of your source database and hardware for their performance? Even if the SAAS-BI provider is ready to send someone to your data center and trouble-shoot it, would they consider the holistic requirement of ensuring that both your OLTP transactions and the BI extractions are optimal or just care about the later?

SAAS applications Why would salesforce or netsuite or any other SAAS provider care about providing the extra indexes or materialized view logs just so that your 3rd party SAAS-BI provider can extract the data easily? That too, when the SAAS application provider is using a multi-tenant model to keep his costs lower and competitive, extracting data for one client would adversely impact the performance for the other. Why would SAAS provider risk this for a 3rd party SAAS-BI provider? That may force you as the customer to go with a single-tenant SAAS solution, just for also getting a SAAS-BI solution.

So, I think, however much the SAAS-BI providers want to convince you that they have patents and IP that involves database kernel level tweaking or hacking to make SAAS-BI possible, first and foremost, it’s a problem that is created to support their business model, not yours! Ofcourse, this is no different from creating a custom inhouse application that is very specific to your needs vs using a off-the-shelf product that is more generic. However, for applications, the performance is not as severe.

SAAS-BI Predictions So, that leaves me with predicting future outcome for SAAS-BI. First, does this mean SAAS-BI doesn’t make sense at all? No. It does make sense in two cases.

  1. SAAS-BI provided by your own SAAS provider. Yes, they already have your data, it makes it so much easier for them to provide you the related intelligence.
  2. SAAS-BI provided by anyone when your applications are hosted in-house. However, I would seriously advice you against this because the ROI calculations may not have factored in the related cost in monitoring and maintaining your source system for performance and scalability. Not to mention, all the extra network bandwidth needed to encrypt and transfer the data from your data center to the SAAS-BI data center. That also means factoring your internet pipes for much more peak-bandwidth else your potential customers visiting your corporate website might have network problems and worse, you might lose sales and loyal customers.


Filed under SAAS, SAAS BI