Category Archives: performance tuning

The Cost Of Performance, Or The Lack Thereof!

Recently I helped someone build a dropship-like solution. The product data from the provider was accessed through a web service. Initially I expected the number of items to be just a few hundred, carefully hand-picked. However, he chose to put in thousands of items. As a result, the cron job set up to pull the item data daily ran for a much longer duration and consumed far more CPU. His shared hosting plan couldn’t handle it, and the cron job kept getting killed due to the quota limitations.

I thought about ways to sync the product data without hitting the resource limits, but that meant a lot of redesign and would have cost him quite a bit upfront in development. So clearly, not designing a program with the resources it has to run under in mind has a cost.

But here is the kicker. He decided to go from shared to dedicated hosting, which cost him 20 times more per month than what he was paying! Yeah, that’s 20 times more! Had he been able to stay on the shared plan, the money spent optimizing the program for performance would probably have paid for itself within a year.

When the cost of performance tuning goes up, with hardware becoming cheaper and cheaper, it often becomes cost-effective to live with sub-optimal programs. However, in cases like SaaS or hosting solutions where the hardware is rented, the extra cost adds up over time.


Filed under performance, performance tuning

SQLite Performance – Table Ordering

Recently I developed a website whose user-generated data was growing steadily. The database used is SQLite3. Before putting it up on the internet, I made sure to verify that all the pages performed reasonably well (sub-second response times). Then, as time passed, one of the pages started slowing down, and it got worse with each passing day. After searching the web for SQLite performance tips, I noticed that the order of the tables matters! Yeah, coming from an Oracle CBO (Cost Based Optimizer) background, I wasn’t expecting to have to do that, but apparently the order matters. After changing the order, the reports are back to sub-second response times. Better still, given that my reports filter on a time interval, no matter how much the underlying tables grow, the report should take relatively constant time.

So, in the from clause, first place the table or sub-query that is very selective, and then the rest of the tables. The rest of the tables are typically there to resolve a few foreign key references of the main table and are usually not selective. For example, if userid is a foreign key and you need the username but the report is not selective by user, then it’s better to place the users table after the transaction table that gets filtered by the time interval or other parameters, as in the sketch below.
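
Here is a minimal sketch of the idea, with a hypothetical schema (txns and users stand in for my real tables):

-- txns shrinks to the time interval first, so it leads the from clause;
-- users only resolves the userid foreign key to a name.
select u.username, count(1)
from txns t, users u
where t.created between ? and ?
and u.userid = t.userid
group by u.username;

Listing users first invites the (older) SQLite planner to scan users and probe txns once per user instead, which keeps getting slower as both tables grow.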


Filed under performance tuning, SQL Tuning, SQLITE

Can we hibernate, seriously?

I come from what some may think of as the old school: a development environment where a lot of SQL is hand-coded and well tuned. So, to me, using a generic framework that understands the database schema from some meta-data and automatically translates a generic syntax into the target database’s is a bit concerning. I have done performance tuning on Oracle, MySQL, SQLite and Derby, and my experience has been that while abstracting the SQL generation so that the same SQL definition runs on all of these databases is probably not that difficult, for anything more serious, such as a complex reporting SQL, not all databases behave the same way given the same form of SQL adjusted only for syntactic differences. For example, see my articles MySQL COUNT DISTINCT vs DISTINCT and COUNT, SQLite Join and Group By vs Group By and Join and Experience with Derby for how each of these databases required restructuring the SQL statements so fundamentally differently that a generic ORM (object relational mapping) such as Hibernate will not be sufficient for complex SQLs.

Depending on the application, I would say 60 to 80% of the SQLs are very simple, mostly providing the basic CRUD services, and can be done using the generic frameworks. But the remaining 20 to 40% is where it may be desirable to hire database experts to hand-tune the SQLs.


Filed under Hibernate, MySQL, Oracle, ORM, performance tuning, SQL Tuning, SQLITE

SQLite: Some More SQL Tuning

In SQLite: Join And Group By vs Group By And Join, I mentioned the need to rewrite a SQL to make it perform better. In this post, I am going to discuss a few more rewrites that I did.

Note that when doing performance tuning, rewriting the SQL is one lever, while the other is creating the appropriate indexes where applicable. Restructuring queries is needed everywhere, from the smaller databases to the large commercial ones; the big guys like Oracle are just capable of handling a lot more scenarios on their own. Here are two of my cases where I had to rewrite:

1. Correlated sub-query:

select … from p where … abc in (select xyz from ch where p.l = ch.m);

is changed to

select … from p,ch where … p.l = ch.m and abc = xyz;

(This flattening is only safe when ch can’t return multiple matches per row of p; otherwise the join would duplicate rows where the IN wouldn’t.)

2. Transitive filter:

select … from p where … abc = (select id from c where n = ?) and xyz in (select lid from ch where id = abc);

Here, the index is on xyz. So, changing it to

select … from p,ch where … abc = lid and id = (select id from c where n = ?) and xyz = lid;

worked, because the plan would then use the index on id and the index on xyz.

SQL tuning is all about understanding the access paths and helping the database a bit when it is not smart enough to figure things out itself.
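
A handy way to confirm the access paths is EXPLAIN QUERY PLAN in the sqlite3 shell. Here is a self-contained sketch for case 1 above; the column types and the index are my assumptions, not the original schema:

create table p (l integer, abc integer, payload text);
create table ch (m integer, xyz integer);
create index idx_ch_m_xyz on ch(m, xyz); -- covers the correlated lookup

-- the original correlated form
explain query plan
select payload from p where abc in (select xyz from ch where p.l = ch.m);

-- the flattened join form
explain query plan
select payload from p, ch where p.l = ch.m and p.abc = ch.xyz;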


Filed under performance tuning, SQL Tuning, SQLITE

SQLITE: Join And Group By vs Group By And Join

A while back I wrote about a MySQL COUNT DISTINCT performance issue. Today I am going to discuss a performance issue I faced with SQLite.

First, the use case. I am experimenting a bit with online product catalogs. I have a products table and a categories table, and I wanted the list of categories along with the count of items in each category matching the search criteria. So, I wrote the query as

select c.name,count(1)
from products p, categories c
where p.category1 = c.category_id
and ...
group by c.name
order by 2 desc, 1 collate nocase;

The idea is to sort the categories so that those with a higher product count appear first, and if two categories have the same product count, they are displayed in alphabetical order (case insensitive).

This query took a lot of time. So, I changed it to

select c.name,g.cnt
from categories c
,(select category1,count(1) as cnt from products where ... group by category1) g
where c.category_id = g.category1
order by 2 desc, 1 collate nocase;

This worked pretty fast. The likely reason: grouping first collapses products down to one row per category, so the join only has to resolve a handful of grouped rows instead of joining every product to its category before aggregating. BTW, with the first query the CPU also maxed out at 90% and above; the latter query is much better.

BTW, this is using SQLite3 (3.3.12).
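
The other lever (which I didn’t need here) is an index, so that the join side gets cheaper; a sketch, assuming the same two tables:

create index idx_products_category1 on products(category1);
-- with this, the original join form at least probes an index instead of
-- scanning products per category; the group-first rewrite still keeps the
-- join input down to one row per category as products grows.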


Filed under performance tuning, SQL Tuning, SQLITE

MySQL: SELECT RANDOM ROW, Very Efficient

When living in a world of chaos, it shouldn’t be a surprise that there is a requirement to select a random row from a table. Randomness helps make an otherwise static web page a bit more dynamic, or rotate a banner, an ad, a forum post and so on.

I looked at the various methods of selecting random rows from a table and wanted to write down what I have done for the system I am working on.

I have a table with an ID field that is auto-incremented. In addition, I already have a need to fetch the latest ID, which is mostly MAX(id). However, I also have a status field that prevents rows with certain statuses from showing up.

Here is how I ended up fetching a random id.

1) First get the max id of the table. Something like

select id,... from tablex where status = 1 order by creation_date desc limit 0,1

Here, I have an index on creation_date. So, the cost of executing the above query is essentially traversing down a b-tree index from the maximum-value side and resolving rows in the table to filter by status until the first row is found. Assuming most of the recent rows have a status of 1, the number of rows resolved through the index should be one or just a few.

2) Then generate a random value with the above fetched id as the max. Keep doing this until the value is non-zero, because the IDs start from 1 in my case. It’s also possible to add other variations, such as requiring the ID to be more than x% of the max ID; that helps fetch a more recent random row if that’s what is desired.

3) Now, just do

select id,... from tablex where status = 1 and id = ?

and bind the ID with the random ID generated in the previous step. There are two reasons why this query may not return a row: 1) the status of that row is not appropriate, or 2) the id doesn’t exist in the table (maybe it was deleted). Assuming the chances of these are almost negligible, a row is found almost always immediately. The cost of the above SQL is nothing but fetching a row using a unique index, which is quite efficient. Just to keep it generic, loop through steps 2 and 3 until a row is identified. In large data sets a row will likely be found quickly, but just in case, have a contingency for not finding one after, say, N (perhaps 5) iterations; a single-statement variant is sketched below.

That’s pretty much it. Given that I already have a need to fetch the latest row in the table, the cost of step 1 is not a factor for my random row selection; that just leaves issuing one more unique-index-based SQL. For a homepage that has to display both the latest row(s) and a random row, this technique is quite efficient.


Filed under MySQL, performance tuning

JavaScript Performance Tuning

Most server-side programming languages have tools for performance tuning applications written in them. With the popularity of AJAX-based applications and other rich-client HTML applications on the rise, more and more code is being written in JavaScript. So, how the heck do you go about troubleshooting performance issues in these types of client-side-heavy applications written in JavaScript?

I faced this issue recently in two different applications where the JavaScript library was not performing fast enough, causing a sluggish user experience. I found that Firebug, an extension for Firefox, can be used to do the performance tuning. The way this works is:

a) install Firebug (and restart the Firefox browser)
b) enable Firebug if it is not currently enabled
c) load the page that needs performance tuning
d) open up Firebug and, in the console, click on Profile
e) exercise the UI controls on the page that have the performance problem
f) click on Profile again and see a table of all the function calls ordered by the amount of time spent

This gives you an idea of which functions are slow. The granularity of this report is only per function (though it would have been nice to have line-level granularity for some hot-spot functions to pin-point the line number). From this, it’s possible to figure out what’s going on and go about optimizing. In my specific cases, the two things I had to do, one in each app, are

1) Remove regular expression object creation and matching from an inner loop
2) Reduce the number of Math.round calls

The performance gain in case of 1) was significant.


Filed under Firebug, javascript, JavaScript Performance, performance tuning