I wanted a routine that, given a hostname, returns its “main domain” component. For example, if I give it “poeticcode.wordpress.com”, it should return “wordpress.com”. Simple, right? Wrong! Check out how complex this could get if you were to address all possible TLDs and sub-TLDs.
Now, imagine some project manager budgeting time for this routine. Perhaps 10 minutes? But it could turn out to take a day, or maybe even more, if you have to test it thoroughly.
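To give a feel for where the complexity comes from, here is a minimal sketch of such a routine in Python. The function name main_domain and the tiny MULTI_LABEL_SUFFIXES set are my own assumptions for illustration; the set is deliberately incomplete, and a thorough version would need the full list of TLDs and sub-TLDs rather than a handful of hard-coded entries.

```python
# Hypothetical, deliberately incomplete set of two-label suffixes ("sub-TLDs").
# A thorough implementation would load a complete, maintained list instead.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.in", "co.jp"}


def main_domain(hostname: str) -> str:
    """Return the registrable part of a hostname, e.g. 'wordpress.com'."""
    # Normalize case and drop a trailing dot (as in 'example.com.').
    labels = hostname.lower().rstrip(".").split(".")

    # Nothing to strip from 'localhost' or 'wordpress.com'.
    if len(labels) <= 2:
        return ".".join(labels)

    last_two = ".".join(labels[-2:])
    if last_two in MULTI_LABEL_SUFFIXES:
        # The suffix itself has two labels, so keep one more label.
        return ".".join(labels[-3:])
    return last_two


print(main_domain("poeticcode.wordpress.com"))  # wordpress.com
print(main_domain("www.example.co.uk"))         # example.co.uk
```

Even this toy version needs a special case for suffixes like co.uk, and it still ignores IP addresses and internationalized names, which is exactly why the 10 minute estimate falls apart.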
Recently I asked a person certified as a Java Web Component Developer what the various components of a URL are. The answer didn’t go that well, so let me summarize them here.
1) Protocol (http://, https://, ftp://)
2) Domain Name / IP Address (poeticcode.wordpress.com)
3) Port Number (80 is the default for http and 443 for https, so it need not be specified when the server listens on the default port)
4) Resource Path (/2006/08/27/the-components-of-a-url)
All of the above together make https://poeticcode.wordpress.com/2006/08/27/the-components-of-a-url
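As a rough sketch of how these four pieces fit together, the snippet below assembles a URL with Python's standard urllib.parse.urlunsplit and leaves the port out when it is the default for the protocol. The helper name build_url and the DEFAULT_PORTS table are assumptions of mine, not part of any library.

```python
from urllib.parse import urlunsplit

# Well-known default ports for the protocols listed above.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def build_url(scheme: str, host: str, port: int, path: str) -> str:
    """Join protocol, host, port and resource path into one URL."""
    # Only spell out the port when it differs from the protocol's default.
    if port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # The last two fields (query and fragment) are left empty here.
    return urlunsplit((scheme, netloc, path, "", ""))


path = "/2006/08/27/the-components-of-a-url"
print(build_url("https", "poeticcode.wordpress.com", 443, path))
print(build_url("https", "poeticcode.wordpress.com", 8443, path))
```

The first call prints the URL without a port, and the second keeps :8443 because it is not the https default.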
In addition to the above 4 components, there are 2 more components.
5) The parameters (?abc=xyz&def=ijk). These carry values either entered by the user or passed along when navigating to another page with additional context
6) The anchor (#link2). This positions the browser at a particular section of the page. It relies on the HTML A element: the name attribute creates an anchor, and the href attribute references it.
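Going the other way, the six components can be pulled back out of a URL. Here is a small sketch using Python's standard urllib.parse module. The sample URL is simply stitched together from the example pieces in the list above (with the port written out explicitly), and the helper name url_components is my own.

```python
from urllib.parse import urlsplit, parse_qs


def url_components(url: str) -> dict:
    """Split a URL into the six components described above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not spell it out
        "path": parts.path,
        "parameters": parse_qs(parts.query),  # each value is a list
        "anchor": parts.fragment,
    }


# Assembled from the example pieces in the list, for illustration only.
sample = ("https://poeticcode.wordpress.com:443"
          "/2006/08/27/the-components-of-a-url?abc=xyz&def=ijk#link2")

for name, value in url_components(sample).items():
    print(f"{name}: {value}")
```

Note that the anchor is handled entirely by the browser; it is normally not sent to the server along with the rest of the request.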
So, these 6 components together make the complete URL. All put together, there can be a URL like
The above URL doesn’t really exist, but it roughly illustrates the idea: it would show all the spam comments received for the blogs in the last 3 months, which could be listed below the non-spam comments. Of course, the excellent thing about wordpress.com is that we can knock out spam using Akismet! But you get the point.
Books on web technology should cover some of these basic concepts.