Monthly Archives: December 2007

MySQL: SELECT RANDOM ROW, Very Efficient

When living in a world of chaos, it shouldn’t be a surprise on the requirement to select a random row from a table. Randomness helps to make an otherwise static web page a bit more dynamic. Or it helps to rotate a banner or an ad or a forum post and so on.

I looked at the various methods of selecting random rows from a table and wanted to write down what I have done for the system I am working on.

I have a table where there is an ID field that is auto incremented. In addition, I already have a need to fetch the latest ID. This is mostly the MAX(id). However, I also have a status field that prevents rows with a certain status to not show up.

Here is how I ended up with fetching a random id.

1) First get the max id of the table. Something like

select id,... from tablex where status = 1 order by creation_date desc limit 0,1

Here, I have an index on creation_date. So, the cost of essentially executing the above query is traversing down a b-tree index from the maximum value side and resolving the rows in the table to filter by status and get the first row. Assuming most of the recent rows have a status of 1, the number of rows resolved using an index should be 1 or just a few.

2) Then generate a random value with the above fetched id as the max. Keep doing this till the value is != 0. This is because, the ID starts from 1 in my case. Also, it’s possible to have other variations such as the ID being more than x% of the max ID. This typically helps in fetching a more recent random row if that’s what is desired.

3) Now, just do
select id,* from tablex where status = 1 and ID = ? and bind the ID with the random ID generated in the previous step. There are two reasons why this
query may not result in a row. 1) the status of this row is not appropriate 2) the id
perhaps doesn’t exist in the table (may be it’s deleted). Assuming the chances of
these is almost negligible, it’s possible to find a row almost always immediately. The
cost of the above SQL is nothing but fetching a row using a unique index which is
quite efficient. Just to keep it generic, loop through steps 2 and 3, till a row is identified. In large data sets, it’s likely to always find a row eventually, but just in case, have contingency for not finding a row after, say N (perhaps 5) iterations.

That’s pretty much it. So, given that I already have a need to fetch the latest row
in the table, the cost of step 1 is not a factor for my random row selection. This just
left with issuing just another unique index based SQL. For a homepage that has to
display both the latest row(s) and a random row, this technique is quite efficient.

6 Comments

Filed under MySQL, performance tuning

The Prime Ministers: A Mom And Son, A Dad And Daughter

Wikipedia comes handy to do research when major events like yesterday’s (PST time, not WordPress’s GMT) Benazir Bhutto’s assasination happen. With this recent sad event comes a commonality between India and Pakistan.

India’s Prime Minister Indira Gandhi and her son Rajiv Gandhi who was also India’s Prime Minister were both assassinated.

And now,

Pakistan’s Prime Minister Zulfikar Ali Bhutto and his daughter Benazir Bhutto who was also Pakistan’s Prime Minister were executed (through a controversial trial) and assassinated respectively.

Some key facts, thanks to the collaborative, community driven and copyleft license from Wikipedia

Fact The Gandhis (India) The Bhuttos (Pakistan)
Indira Gandhi Rajiv Gandhi Zulfikar Ali Bhutto Benazir Bhutto
Picture
Birth Date Nov 19th, 1917 Aug 20th, 1944 Jan 5th, 1928 Jun 21st, 1953
Died Oct 31, 1984 66yrs May 21st, 1991 46yrs Apr 4th, 1979 51yrs Dec 27th, 2007 54yrs
Cause of Death Assassinated Assassinated Executed Assassinated
Prime Minister# 3rd and 6th 7th 10th ???
PM during 66/01/19-77/03/24
80/01/15-84/10/31
84/10/31-89/12/02 73/08/14-77/07/05 88/12/02-90/08/06
93/10/19-96/11/05
Age at First PM term 48yrs 40yrs 45yrs 35yrs
Higher Education   Imperial College London, part of the University of London, and at Trinity College, University of Cambridge   Lady Margaret Hall, Oxford, Radcliffe College, Harvard University

It took me a while to assemble this information inspite it’s readily available on Wikipedia.
Now if only all the Web 2.0 startups in the people “data mining” area such as Spock.com could do this automatically? Or may be the Freebase.com from Metaweb.com, the guys that are trying to create a semantic web.

Leave a comment

Filed under Benazir Bhutto, Indira Gandhi, Prime Minister, Rajiv Gandhi, Zulfikar Ali Bhutto

URL ReWrite & Relative URIs

Have you noticed how the permalink urls of wordpress contain the date as yyyy/mm/dd/post-title? And if you just type https://poeticcode.wordpress.com/ you get all the posts in that year and if you use // then all the posts within that month? This can be done using what’s called URL Rewriting. So, a URL such as

https://poeticcode.wordpress.com/2007/12/ gets rewritten to something like

https://poeticcode.wordpress.com/posts.do?year=2007&month=12 (this is not the exact link that wordpress uses).

This is usually a good approach, especially since it’s considered as a SEO friendly format. However, with this approach, one issue is that the generated HTML assumes the base of the document to be all the way up to /2007/12, when in reality, your base is actually at /. So, this can be easily fixed by using the BASE tag of the HTML.

If you are like me, who maintains a separate test site and a production site, then it gets a bit tricky. This is because of two reasons. One is that the name of the test and production servers is different. Second, the test server may not be setup to be starting from the root directory (/). For the first problem, it may be tempted to write the base tag with href pointing to just ‘/’. However, as per the BASE tag reference at w3.org, “This attribute specifies an absolute URI that acts as the base URI for resolving relative URIs.” That is, the base tag’s href has to be absolute.

So, the following perl code help to fix up this base tag.

$server = $ENV{SERVER_NAME};
$script = $ENV{SCRIPT_NAME};
$script =~ s/[^\/]+$//; # knock off the name and retain the directory
$base = "http://$server$script";

then, you can out put

Here, there is no need to use the $script if the test setup is also at the root directory level. Then, it would just be

$base = “http://$server/”;

I am sure this technique can be extended to take care of the http scheme (http vs https) and the non-default port.

7 Comments

Filed under URL Rewrite

TEXTAREA HTML Editor, Open Source

I needed a textarea html editor so that I could copy and paste some HTML content from one page into the system. After a bit of a research, I first tried out Yahoo! User Interface Library (YUI). But then I had to give up that as it was both slow and also got confused when there are multiple textareas in the same page.

After a bit more research, I came across TinyMCE which is fast, worked with multiple textareas, easy to integrate and best of all, it’s licensed under LGPL.

BTW, I use Firefox and so one of the criteria is to use a solution that works in Firefox. There was another application I tried out before TinyMCE but abandoned it as it works only with IE. Also, my specific application is used only by me (the admin) and so I don’t care for cross-browser compatibility. Just that it works in IE. So, I came across Mozilla’s own rich text editing solution but gave up when I realized in that solution that it works with iframes. My criteria is to be able to easily integrate into existing html form that’s already using the textarea elements.

Leave a comment

Filed under rich text editor, textarea html editor

Where is Google In WordPress’s Referrers Report?

WordPress gives Referrer details so that you know where your visitors are coming from. I have seen answers.com, dogpile.com and a plethora of other websites but I haven’t seen Google, Yahoo!, MSN/Live. I know I do have traffic from these websites. When you get more than 200 hits to a single page on a Christmas day, you know there got to be a few of them coming from these search engines. So, that’s when I looked at the past history and noticed that these top search engines are never shown in referrer links. Ofcourse, WordPress shows the “Search Engine Terms” statistics. But I would like to know how many of those are from Google vs Yahoo! vs MSN/Live vs AOL vs Ask.com.

1 Comment

Filed under Wordpress

Childzilla

Most people know Mozilla and most geeks may know Bugzilla. But what about Childzilla? Yes, the there is a website called Childzilla that tracks the chores and reward points of kids. Is the name of this domain chosen by the owners to indicate that it’s an application appropriate for kids who are well, as scary as the Mozilla logo or it’s a tracking system similar to Bugzilla? Besides the name, I think it’s a cool online system and best of all it’s for free. So, why not all you parents out there give it a try?

Leave a comment

Filed under Childzilla

Shared WebHosting, Database Security

It is important to realize that even a small website, even it is not profiting, should give importance to the security. Many users use the same password or a little variation of it when registering at many websites. So, a compromise to the password at your small website can lead to a bigger problem at a more serious website for your user.

Here are a few tips on how to ensure there is a decent level of security on your site

1. Always ensure to one-way encrypt user passwords. You only compare the encrypted password with the encrypted user-input at the login time to see if they match. With this model, there is no password retrieval mechanism. User has to request for resetting the password and you just send an email with a url containing a large key that’s hard to guess and when the user clicks the url, you can let the user reset the password.

2. A lot of code on the web shows hard-coded password when connecting to the database. For example, in perl it would be something like

$con = DBI->connect(‘dbi:mysql:your-database:localhost’,’your-db-name’,’plain-text-password’);

As there is no connection pooling or an application level connection management with simple cgi-scripts, the tendency is to have the above piece of code in each cgi-script. This essentially means that the database password has been written in plain text in multiple files. One choice is to put the connect statement in a file and include it into all the scripts that require database connection. With this, it’s easy to change the database details as necessary in a central place. In addition, ensuring proper read file permissions to this single file is sufficient to a large extent.

1 Comment

Filed under database security