Category Archives: SOA

Nested Loops vs Hash Join, The SOA Limitation

These days, SOA is the next big thing. Middle-ware vendors are hyping it up to make their money. Applications vendors are cautiously jumping on to the same to show that they are innovating and adding value to justify newer licenses to existing customers who might otherwise be happy with what they have.

Some strategists and architects dream of applications where the data source can be anywhere and everywhere. By anywhere, I mean, it could be in the same database or a different system exposed as a web service. By everywhere I mean, the same application will support data from the database, from internal system A, customer B or vendor C.

While some of the intentions to service enable (as a producer or consumer) are good intentions, there are some fundamental limitations. Before I talk about that, let’s get to some database concepts.

There are 3 major types of table joins. The nested loops, hash joins and merge joins. Below is a brief description of each of them and when they are suitable

Nested Loops This is like a for loop within another for loop. When the data fetched from both the tables, after applying the filters is small, then the nested loops is a good option. Other times when nested loops are used, even when the other types of joins are better, is when one is interested in returning the results as soon as possible rather than as efficiently as possible.

Hash Join This is useful when one table returns relatively small results while the other returns a lot. In this case, the smaller data from one of the tables is organized as a hashmap and the larger data set from the second table is fetched and probed against the hashmap.

Merge Join This is useful when both tables return large sets of data. In this case, the data from each table is sorted and then merged.

As you can see, depending on the characteristics of the data returned in a query, the database can choose different algorithms to execute in the most optimal manner.

Pulling data from multiple databases and putting together a report is obviously more difficult than from a single table. Some databases like Oracle provide the concept of a database link, that is, accessing data in one database from another. The advantage of this is, it is possible to use the all familiar SQL statements to fetch data from tables across the databases as if all the tables are in the same database. Underneath, the database takes care of pulling the data from each of the databases and apply the same type of join algorithms. However, note that there are a few challenges with this approach since the execution plan optimizer of the database where the query is executed will not have a comprehensive information. So, the distributed database queries are likely to have some sub-optimal execution plans compared to a single database queries.

With SOA, data access is primarily through web services. Web services typically use XML encoding and tend to be verbose. While using the service oriented architecture provides the flexibility of having data in different systems, it limits the end user experience. This is because, say you have two entities A and B which are related and entity A in system 1 and entity B in system 2 and entity B is accessed as a web service. Now, if there is a report that displays a list of results from entity A table and you also want to show the related data from entity B. How would it be possible? Would you issue a web service request and fetch all the data from entity B table? That’s simply not possible. As a result, the user interfaces typically end up being a report with just the entity A attributes and a link to entity B data. And the user is expected to keep clicking the details link to see the attributes from entity B one at a time. The issues with this approach are

1) Not possible to search for entity A based on attributes of B
2) Not possible to sort the report of entity A based on attributes of B
3) Not possible to compare the data side-by-side.

So, while using SOA is not a bad thing, both technical and non-technical stakeholders should understand the limitations of the extra flexibility.

Leave a comment

Filed under SOA

SOA Doesn’t Change The Data Quality Equation

On 9/30, in the last 3 minutes of the market close, for some mysterious reasons Google stock price tanked while the rest of the market was going up especially after the 9%+ slide of NASDAQ yesterday, thanks to the bailout fallout. Being busy at work (no seriously), I missed out the opportunity to grab the stock at those lower prices. Ofcourse, it turned out that this is a big mistake and so NASDAQ is going to cancel many of those orders, so it wouldn’t really matter if I wasn’t busy at work :).

Anyway, the problem I want to mention is, companies like Yahoo! still are showing $322.99, almost 11 hours later inspite NASDAQ’s correction. So, I used one of the stock quotes web services that I am aware of and tried out online the stock quote for Google (thanks to Visual Web Service which provides online forms interface to submit web services with http binding). Here is what I got as the response.

Stock Quote Web Service

Stock Quote Web Service

So, essentially, if one popular web service goes wrong, all the business processes that make use of the web service would go wrong. The point is not to advocate that SOA is not good or anything, but just to point out that changing to SOA from traditional architectures will not solve certain types of problems.

Now, it beats me how NASDAQ’s system would let it go through even if the error originated at a different market center.

Leave a comment

Filed under SOA, Web Services