Recently I developed a website whose user-generated data grew steadily. The database used is SQLite3. Before putting it up on the internet, I made sure that all the pages performed reasonably well (sub-second response times). Then, as time passed, one of the pages started slowing down, and with each passing day it got worse. After searching the web for SQLite performance tips, I noticed that the order of the tables matters! Yeah, coming from an Oracle CBO (Cost Based Optimizer) background, I wasn't expecting to have to worry about that, but apparently the order matters. After changing the order, the reports are now back to sub-second response times. What's better, since my reports are filtered by a time interval, the report is likely to take roughly constant time no matter how much the underlying tables grow.
So, make sure that in the from clause you place the most selective table or sub-query first, followed by the rest of the tables. The rest are typically used to resolve a few foreign key references of the main table and are usually not selective. For example, if userid is a foreign key and you need the username but the report is not selective by user, then it's better to place the users table after the transactions table that gets filtered by the time interval or other parameters.
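To illustrate, here is a minimal sketch using Python's built-in sqlite3 module; the schema and data are hypothetical stand-ins for the transactions/users scenario above. Note that recent SQLite versions reorder join tables themselves, so on a modern build both forms may get the same plan; on older releases such as the 3.3.x series the FROM-clause order was significant.

```python
import sqlite3

# Hypothetical schema: a transactions table filtered by a time
# interval, and a users lookup table resolved via a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE transactions (
        txid   INTEGER PRIMARY KEY,
        userid INTEGER REFERENCES users(userid),
        txtime INTEGER,
        amount REAL
    );
    CREATE INDEX idx_tx_time ON transactions(txtime);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(100)])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)",
                 [(i, i % 100, i, 1.0) for i in range(10000)])

# Selective table first: transactions is narrowed by the time
# interval, then users just resolves the foreign key.
fast = conn.execute("""
    SELECT u.username, t.amount
    FROM transactions t, users u
    WHERE t.txtime BETWEEN 9990 AND 9999
      AND u.userid = t.userid
""").fetchall()

# Lookup table first: on an old SQLite this put the unselective
# users table in the outer loop of the nested-loop join.
slow = conn.execute("""
    SELECT u.username, t.amount
    FROM users u, transactions t
    WHERE t.txtime BETWEEN 9990 AND 9999
      AND u.userid = t.userid
""").fetchall()

assert sorted(fast) == sorted(slow)  # same rows either way
```

Either ordering returns the same result set; only the access path differs.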
I come from what some may consider an old-school development environment, where a lot of SQL is hand-coded and well tuned. So, to me, using a generic framework that understands the database schema from some meta-data and automatically translates a generic syntax into the target database's SQL is a bit concerning. I have done performance tuning on Oracle, MySQL, SQLite and Derby, and my experience has been that, while abstracting the SQL generation so that the same SQL definition can run on all of these databases is probably not that difficult, for anything more serious, such as a complex reporting SQL, not all databases behave the same way given the same form of SQL adjusted for their syntactic differences. For example, see my articles MySQL COUNT DISTINCT vs DISTINCT and COUNT, SQLite Join and Group By vs Group By and Join, and Experience with Derby for how each of these databases required restructuring the SQL statements so fundamentally differently that a generic ORM (object-relational mapping) such as Hibernate will not be sufficient for complex SQLs.
Depending on the application, I would say 60 to 80% of the SQLs can be very simple, mostly providing basic CRUD services, and can be handled by generic frameworks. But the remaining 20 to 40% is where it may be desirable to hire database experts to hand-tune the SQLs.
In SQLite: Join And Group vs Group By And Join, I mentioned the need to rewrite a SQL statement to make it perform better. In this post, I am going to discuss a few more rewrites that I did.
Note that when doing performance tuning, rewriting the SQL is one approach; the other is to create the appropriate indexes where applicable. Restructuring queries is needed on everything from small embedded databases to large commercial ones, though the big guys like Oracle are capable of handling a lot more scenarios on their own. Here are two of my cases where I had to rewrite:
1. correlated sub-query:
select … from p where … abc in (select xyz from ch where p.l = ch.m);
is changed to
select … from p,ch where … p.l = ch.m and abc = xyz;
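A small runnable sketch of this rewrite, using hypothetical p/ch tables in Python's sqlite3. Note the rewrite is only equivalent when the correlation key (ch.m here) is unique, so the join cannot duplicate rows of p:

```python
import sqlite3

# Hypothetical tables mirroring the p/ch shapes above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p  (id INTEGER PRIMARY KEY, l INTEGER, abc INTEGER);
    CREATE TABLE ch (m INTEGER PRIMARY KEY, xyz INTEGER);
""")
conn.executemany("INSERT INTO p VALUES (?, ?, ?)",
                 [(i, i % 50, i % 7) for i in range(1000)])
conn.executemany("INSERT INTO ch VALUES (?, ?)",
                 [(i, i % 7) for i in range(50)])

# Correlated form: the sub-query is conceptually re-evaluated
# for every candidate row of p.
correlated = conn.execute("""
    SELECT id FROM p
    WHERE abc IN (SELECT xyz FROM ch WHERE p.l = ch.m)
""").fetchall()

# Flattened join form: one pass, letting the planner drive the
# join off the key columns.
joined = conn.execute("""
    SELECT p.id FROM p, ch
    WHERE p.l = ch.m AND abc = xyz
""").fetchall()

assert sorted(correlated) == sorted(joined)
```

Because ch.m is a primary key, each p row matches at most one ch row, so both forms return exactly the same ids.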
2. transitive filter:
select … from p where … abc = (select id from c where n = ?) and xyz in (select lid from ch where id = abc);
Here, the index is on xyz. So, changing it to
select … from p,ch where … abc = lid and id = (select id from c where n = ?) and xyz = lid;
worked because the plan would then use the index on id and index on xyz.
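Here is a runnable sketch of the transitive-filter idea on invented schemas (all table and column names are hypothetical): the constant produced by the scalar sub-query is propagated through the equality chain so the planner can drive off the indexed columns. The join is written as abc = ch.id, the equality the correlation implies, rather than copying the snippet above verbatim; a unique constraint on (id, lid) keeps the join from duplicating rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c  (id INTEGER PRIMARY KEY, n TEXT);
    CREATE TABLE ch (id INTEGER, lid INTEGER, UNIQUE (id, lid));
    CREATE TABLE p  (pid INTEGER PRIMARY KEY, abc INTEGER, xyz INTEGER);
    CREATE INDEX idx_p_xyz ON p(xyz);
""")
conn.executemany("INSERT INTO c VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(10)])
conn.executemany("INSERT INTO ch(id, lid) VALUES (?, ?)",
                 [(i, j * 10 + i) for i in range(10) for j in range(3)])
conn.executemany("INSERT INTO p VALUES (?, ?, ?)",
                 [(i, i % 10, i % 100) for i in range(1000)])

# Original shape: a scalar sub-query plus a correlated IN.
nested = conn.execute("""
    SELECT pid FROM p
    WHERE abc = (SELECT id FROM c WHERE n = ?)
      AND xyz IN (SELECT lid FROM ch WHERE id = abc)
""", ("name3",)).fetchall()

# Rewritten: the filter is pushed through the equalities so the
# planner can use the index on ch.id and the index on p.xyz.
flattened = conn.execute("""
    SELECT pid FROM p, ch
    WHERE abc = ch.id
      AND ch.id = (SELECT id FROM c WHERE n = ?)
      AND xyz = lid
""", ("name3",)).fetchall()

assert sorted(nested) == sorted(flattened)
```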
SQL tuning is all about understanding the access paths and helping the database a bit when it is not smart enough to figure things out itself.
A while back I wrote about a MySQL count distinct performance issue. Today I am going to discuss a performance issue I faced with SQLite.
First, the use case. I am experimenting a bit with online product catalogs. So, I have a products table and a categories table. I wanted to get the list of categories along with the count of items in each category that match the search criteria. So, I wrote the query as
select c.name, count(1)
from products p, categories c
where p.category1 = c.category_id
group by c.name
order by 2 desc, 1 collate nocase;
The idea is to sort them such that categories with higher product count appear first and if there are two categories with the same product count, they are displayed in alphabetical order (case insensitive).
This query took a lot of time. So, I changed it to
select c.name, g.cnt
from categories c
,(select category1, count(1) cnt from products where ... group by category1) g
where c.category_id = g.category1
order by 2 desc, 1 collate nocase;
This worked pretty fast. BTW, in the first case, the CPU also maxed out at 90% and above. The latter case is much better.
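For completeness, here is a self-contained Python sqlite3 sketch of both forms on invented catalog data (the search-criteria filter is omitted), confirming that the aggregate-then-join rewrite returns the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        category1  INTEGER REFERENCES categories(category_id),
        title      TEXT
    );
""")
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(i, f"Category {i}") for i in range(20)])
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(i, i % 20, f"product {i}") for i in range(2000)])

# Join first, then group: the whole join result is grouped.
join_then_group = conn.execute("""
    SELECT c.name, count(1)
    FROM products p, categories c
    WHERE p.category1 = c.category_id
    GROUP BY c.name
    ORDER BY 2 DESC, 1 COLLATE NOCASE
""").fetchall()

# Aggregate first in a sub-query, then join the small grouped
# result to categories just to pick up the names.
group_then_join = conn.execute("""
    SELECT c.name, g.cnt
    FROM categories c,
         (SELECT category1, count(1) cnt
          FROM products GROUP BY category1) g
    WHERE c.category_id = g.category1
    ORDER BY 2 DESC, 1 COLLATE NOCASE
""").fetchall()

assert join_then_group == group_then_join
```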
BTW, this is using SQLite3 (3.3.12).