Search Friendly URLs or No-Question-Marks-Or-Equals URLs

Sun, 30th Dec, '07


Prettifying the address bar is one of the finishing touches of the modern-day Internet, and the rule is always the same:

Good: http://www.whatasite.com/dude/djjo/music

Bad: http://www.whatasite.com/user.php?cat=dude&profile=djjo&page=music

The name for this is search-friendly URLs. Just how search-friendly they really are, nobody knows (SEO people ‘can’ tell), but they are certainly friendly to users, among other things. So if your question is “how do I make search-friendly URLs, or URLs without question marks and equals signs?”, here is how I do it.

This article is based on the Apache web server. If you are running Windows, you can install WAMP or XAMPP to get Apache. Once you have it running, open httpd.conf in your pink-n-blue text editor and look for a line like this:

#LoadModule rewrite_module modules/mod_rewrite.so

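If the line has no leading ‘#’, you are lucky and the module is already enabled; otherwise remove the ‘#’ from the front (uncommenting the directive) and restart Apache. Apache will also only honour rewrite rules in .htaccess files if the AllowOverride setting for your web root allows it (AllowOverride All, or at least FileInfo), so check that while you have httpd.conf open.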

Once this is done, check that you are ready to begin and that mod_rewrite is really active: create a file in your web root directory, let’s say test.txt, with some text in it. Now over to the magic. Locate the .htaccess file in your web root (create it if there isn’t one), open it in your pink-n-yeahyeah editor again and add the following lines:

RewriteEngine On
RewriteRule ^magic\.txt$ test.txt

Now request magic.txt in your browser and you will definitely see some magic text: the contents of test.txt, exactly as if you had asked for test.txt, even though the address bar still says magic.txt. Let’s explain those two lines to ourselves. URL prettifying needs a little knowledge of regular expressions; you can learn about them here, here & here. The first line sets the RewriteEngine rolling and the second line is a rewrite rule. The first part of the rule:

^magic\.txt$
^ = the beginning
magic, txt = plain text, matched literally
\. = a literal ‘.’ character. The backslash escapes it, because a plain ‘.’ in a regex means any single character.
$ = and this is the end
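To see what each symbol buys you, here is a small Python sketch (an illustration only, nothing Apache runs) that tests the pattern against a few made-up request paths. It assumes, as in an .htaccess file, that the path is matched without its leading slash, and it relies on Python’s re module treating these simple patterns the same way mod_rewrite does.

```python
import re

# The pattern from the rule above, plus a version without the backslash.
ESCAPED = r"^magic\.txt$"
UNESCAPED = r"^magic.txt$"  # here '.' matches any single character

def matches(pattern, path):
    """Return True if the pattern matches the requested path."""
    return re.search(pattern, path) is not None

# Hypothetical request paths, as seen from the web root (no leading slash).
for path in ["magic.txt", "magicXtxt", "old/magic.txt", "magic.txt.bak"]:
    print(path, matches(ESCAPED, path), matches(UNESCAPED, path))
```

The escaped pattern accepts only magic.txt itself; dropping the backslash also lets magicXtxt through, while the ^ and $ anchors reject paths with anything before or after the name.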

The second part is the substitution: the file or URL that the server serves in place of what was asked for. So in our case, asking for ‘magic.txt’ gets you test.txt. Basically, a RewriteRule has the form:

RewriteRule <what the browser asks> <what you serve>
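As a rough mental model, a single rule behaves like the toy Python function below (apply_rule is a hypothetical name, not an Apache API). It ignores everything else mod_rewrite does, such as flags, query-string handling and redirects, and assumes every $N in the substitution refers to a group that exists in the pattern.

```python
import re

def apply_rule(pattern, substitution, path):
    """Mimic one RewriteRule: if pattern matches path, return the substitution
    with $1..$9 filled in from the captured groups; otherwise return None."""
    m = re.search(pattern, path)
    if m is None:
        return None  # the rule does not apply to this request
    # Replace each $N in the substitution with the text captured by group N.
    return re.sub(r"\$([1-9])", lambda ref: m.group(int(ref.group(1))) or "", substitution)

print(apply_rule(r"^magic\.txt$", "test.txt", "magic.txt"))   # test.txt
print(apply_rule(r"^magic\.txt$", "test.txt", "index.html"))  # None
```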

Please note that all rules must come after the ‘RewriteEngine On‘ directive; without that line, not a single rule will be interpreted. And now for a funnier version, try this rule instead of the earlier one:

RewriteRule ^magic\.txt$ http://www.google.com

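Because the substitution is a full URL on another host, Apache sends the browser a redirect, so this time the address bar changes too. Before you get ideas, let me finish what we began here. Let’s say we have a website that generates a URL like this one for an album page: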

http://www.whatasite.com/album.php?user=djjo&page=7

Not only is that cumbersome to remember, search engines have been known to steer clear of such dynamic pages: endless combinations of query parameters can lead a crawler round in circles through pages that point back at one another, wasting a lot of its time. Bad. We want the crawler to see the page, so we make the URL look like a regular page address with no scary or ‘special’ characters:

http://www.whatasite.com/album/djjo/7

And the rule to do the above is:

RewriteRule ^album/([a-z]+)/([0-9]+)$ album.php?user=$1&page=$2

Just a gentle reminder: these rules go in the file named .htaccess in your web root directory (typically named ‘www’). Now let’s break the above rule down:

^album/([a-z]+)/([0-9]+)$

^album/, from the start, match the literal string ‘album’ followed by the first slash.

([a-z]+), after the first slash, look for a group of characters (the allowed characters are ‘a’ to ‘z’, and the ‘+’ sign means ‘one or more’), which we know to be the user name. A group enclosed in parentheses is captured, much like a variable. Since this is the first group (the username), it is called $1, and we will use it in the second part of the rule.

/([0-9]+)$, after the second slash, look for another group made of the digits 0 to 9, one or more of them. As this is the second group (the page number), it is $2. The final $ means the path must end right here, so it works like a terminator.
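A quick way to check what ends up in $1 and $2 is to run the same pattern through Python’s re module, as in this sketch. The paths are made-up examples; Python, like mod_rewrite without the [NC] flag, matches case-sensitively here.

```python
import re

# The pattern from the album rule above.
ALBUM = r"^album/([a-z]+)/([0-9]+)$"

def captures(path):
    """Return the captured groups ($1, $2) for a path, or None if it doesn't match."""
    m = re.search(ALBUM, path)
    return m.groups() if m else None

print(captures("album/djjo/7"))      # ('djjo', '7')
print(captures("album/djjo/seven"))  # None: [0-9]+ needs digits
print(captures("album/DJJO/7"))      # None: [a-z] is lowercase only
print(captures("album/djjo/7/x"))    # None: $ requires the path to end after the digits
```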

album.php?user=$1&page=$2

We have $1 (username) and $2 (page number) from the first part of the rule, so we simply substitute these values into the URL where they were originally supplied, which gives the second part you see here.
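The whole rule, pattern plus substitution, can be mimicked with Python’s re.sub, as in this illustrative sketch. Note that Python writes the back-references as \1 and \2 where mod_rewrite uses $1 and $2, and rewrite is a hypothetical helper name.

```python
import re

PATTERN = r"^album/([a-z]+)/([0-9]+)$"
# Python's equivalent of mod_rewrite's album.php?user=$1&page=$2
SUBSTITUTION = r"album.php?user=\1&page=\2"

def rewrite(path):
    """Return the internal URL for a pretty album path; paths that don't
    match come back unchanged, much as a non-matching rule leaves the request alone."""
    return re.sub(PATTERN, SUBSTITUTION, path)

print(rewrite("album/djjo/7"))  # album.php?user=djjo&page=7
print(rewrite("about.html"))    # about.html
```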

I guess that pretty much does the job. And oh yeah, if your user names contain digits as well as letters, use something like [a-z0-9]+ instead of [a-z]+; if they may contain uppercase letters, use [A-Za-z0-9]+ or add the [NC] (no case) flag at the end of the rule. A range of other uses of URL rewriting has been written about here. Should you have any queries, doubts or clarifications, just leave a comment and I will be good enough to notice it.
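The effect of widening the character class can be checked the same way. This Python sketch uses a hypothetical user name, djjo2007, and re.IGNORECASE as a rough stand-in for case-insensitive matching.

```python
import re

LETTERS_ONLY = r"^album/([a-z]+)/([0-9]+)$"
LETTERS_AND_DIGITS = r"^album/([a-z0-9]+)/([0-9]+)$"

PATH = "album/djjo2007/7"  # hypothetical user name containing digits

print(re.search(LETTERS_ONLY, PATH))                 # None
print(re.search(LETTERS_AND_DIGITS, PATH).groups())  # ('djjo2007', '7')

# Case-insensitive matching, comparable to mod_rewrite's [NC] flag.
print(re.search(LETTERS_AND_DIGITS, "album/DJJO/7", re.IGNORECASE).groups())  # ('DJJO', '7')
```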


3 Responses to “Search Friendly URLs or No-Question-Marks-Or-Equals URLs”

  1. yellens Says:

    But does this mean we will have to do a rewrite for every url with obscure parameters?? I mean we are talking about thousands of pages. Just a thought.

  2. krahulg Says:

    Not for every URL, but for every URL type: a user page is one type, his blog is another type, and so on, so you don’t write rules for every user. $1, $2, … are variables in the regex; use them as far as your creativity takes you.

  3. JayVee Says:

    You have got some very interesting articles here. I’ve been spending some time getting ideas on this site! Thanks for sharing your tricks.

