Nested Loops vs Hash Join, The SOA Limitation

These days, SOA is the next big thing. Middle-ware vendors are hyping it up to make their money. Applications vendors are cautiously jumping on to the same to show that they are innovating and adding value to justify newer licenses to existing customers who might otherwise be happy with what they have.

Some strategists and architects dream of applications where the data source can be anywhere and everywhere. By anywhere, I mean, it could be in the same database or a different system exposed as a web service. By everywhere I mean, the same application will support data from the database, from internal system A, customer B or vendor C.

While some of the intentions to service enable (as a producer or consumer) are good intentions, there are some fundamental limitations. Before I talk about that, let’s get to some database concepts.

There are 3 major types of table joins. The nested loops, hash joins and merge joins. Below is a brief description of each of them and when they are suitable

Nested Loops This is like a for loop within another for loop. When the data fetched from both the tables, after applying the filters is small, then the nested loops is a good option. Other times when nested loops are used, even when the other types of joins are better, is when one is interested in returning the results as soon as possible rather than as efficiently as possible.

Hash Join This is useful when one table returns relatively small results while the other returns a lot. In this case, the smaller data from one of the tables is organized as a hashmap and the larger data set from the second table is fetched and probed against the hashmap.

Merge Join This is useful when both tables return large sets of data. In this case, the data from each table is sorted and then merged.

As you can see, depending on the characteristics of the data returned in a query, the database can choose different algorithms to execute in the most optimal manner.

Pulling data from multiple databases and putting together a report is obviously more difficult than from a single table. Some databases like Oracle provide the concept of a database link, that is, accessing data in one database from another. The advantage of this is, it is possible to use the all familiar SQL statements to fetch data from tables across the databases as if all the tables are in the same database. Underneath, the database takes care of pulling the data from each of the databases and apply the same type of join algorithms. However, note that there are a few challenges with this approach since the execution plan optimizer of the database where the query is executed will not have a comprehensive information. So, the distributed database queries are likely to have some sub-optimal execution plans compared to a single database queries.

With SOA, data access is primarily through web services. Web services typically use XML encoding and tend to be verbose. While using the service oriented architecture provides the flexibility of having data in different systems, it limits the end user experience. This is because, say you have two entities A and B which are related and entity A in system 1 and entity B in system 2 and entity B is accessed as a web service. Now, if there is a report that displays a list of results from entity A table and you also want to show the related data from entity B. How would it be possible? Would you issue a web service request and fetch all the data from entity B table? That’s simply not possible. As a result, the user interfaces typically end up being a report with just the entity A attributes and a link to entity B data. And the user is expected to keep clicking the details link to see the attributes from entity B one at a time. The issues with this approach are

1) Not possible to search for entity A based on attributes of B
2) Not possible to sort the report of entity A based on attributes of B
3) Not possible to compare the data side-by-side.

So, while using SOA is not a bad thing, both technical and non-technical stakeholders should understand the limitations of the extra flexibility.

Advertisements

Leave a comment

Filed under SOA

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s