PHP and Smart 404's



Post by Mike » Wed Oct 07, 2009 8:37 am

Using custom 404 handlers isn't new. Most decent web development companies will, as a matter of course, use the layout of a website design to implement a 404 page that maintains the look and feel.

One trick to take the 404 to the next level is building in some smarts. In a recent exercise, I utilized a very little-known function in PHP to give a 404 some useful functionality.

At the heart of these smarts are the sitemap.xml file and PHP's levenshtein function.

http://www.php.net/levenshtein

The levenshtein function measures how close two strings are by determining the minimum number of insert, remove and replace operations needed to transform the first string in the arguments into the second.

For example:

If you have the strings contact_us.php and contact-us.php, these strings would have a levenshtein distance of 1. That is, it would take one operation, the replacement of the underscore with the hyphen, to transform the first string into the second.
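
As a quick check of the example above, a minimal sketch that calls PHP's built-in levenshtein() on the two file names (the variable names are just for illustration):

```php
<?php
// The old and new file names from the example above.
$old = 'contact_us.php';
$new = 'contact-us.php';

// One replacement (underscore to hyphen) turns $old into $new.
$distance = levenshtein($old, $new);

echo $distance, PHP_EOL; // prints 1
```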

How is this handy?

Let's say you had a file on your site called contact_us.php and you realized that a hyphen was better. Sure, you'd add the 301 in there for a while, but eventually you'd have to remove it; otherwise you'd have to keep every iteration of your website on the server, which would become cluttered pretty quickly. Once the file was removed, someone who had the old file bookmarked would get the 404 page. In the 404 page you have the name of the requested file available to you. Using the sitemap.xml, your function can walk every URL on your website and calculate the levenshtein distance between the requested page and each page on your site. In our example above, contact-us.php would be at the top of the heap and worthy of a "did you mean" suggestion. If no other page were even close, you might even consider using a header to forward the visitor to the new page.
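
A rough sketch of that lookup, assuming the sitemap follows the standard sitemaps.org layout of url/loc entries. The function name rankSitemapMatches and the example.com URLs are made up for illustration; this is one possible shape, not the exact code behind the post.

```php
<?php
/**
 * Rank sitemap URLs by Levenshtein distance to the requested path.
 * $sitemapXml is the raw contents of sitemap.xml; $requestedPath is the
 * path that triggered the 404 (for example the path part of
 * $_SERVER['REQUEST_URI']). Returns path => distance, closest first.
 */
function rankSitemapMatches(string $sitemapXml, string $requestedPath): array
{
    $xml = simplexml_load_string($sitemapXml);
    if ($xml === false) {
        return []; // unreadable sitemap: nothing to suggest
    }

    $matches = [];
    foreach ($xml->url as $entry) {
        // Compare paths only, so scheme and host do not inflate the distance.
        $path = parse_url(trim((string) $entry->loc), PHP_URL_PATH);
        if (!is_string($path) || $path === '') {
            continue;
        }
        // Note: before PHP 8.0, levenshtein() returns -1 for strings
        // longer than 255 characters.
        $matches[$path] = levenshtein($requestedPath, $path);
    }

    asort($matches); // smallest distance first
    return $matches;
}

// Hypothetical two-page sitemap for the contact_us.php example.
$sitemap = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/about.php</loc></url>
  <url><loc>http://www.example.com/contact-us.php</loc></url>
</urlset>
XML;

$ranked = rankSitemapMatches($sitemap, '/contact_us.php');
print_r($ranked); // '/contact-us.php' comes first, with distance 1
```

The first key of the returned array is the natural candidate for a "did you mean" link.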

404's don't have to be just pretty. They can and should be functional. Not a lot of work, plenty of wow factor.



Mike
 

Re: PHP and Smart 404's

Post by latha » Thu Oct 08, 2009 12:01 am

Mike, excellent.

We have to use this. This is good stuff.
Can you post a snippet of the sitemap example?

Need help.
We figure out that contact-us.php is the closest match using levenshtein($sitemap, $requestedURL).
Then what are you doing? Redirecting to the closest match, contact-us.php in this case?

How do you handle a situation of no match? Maybe we can display probable matches for those having a distance more than 1.

Anyways I am totally blown away by this method. When we know what a customer wants or is looking for
and it isn't exactly there on our site, we could suggest similar pages to browse through.

For example if we had an internal search box on our site:
Visitor searched for "Personalized Favors" - not there on site
But we could suggest that they take a look at "Personal Favors".
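
A small sketch of that search-box idea, assuming the site keeps a plain list of page or product titles. The function name suggestTitle and the sample titles other than the two above are hypothetical.

```php
<?php
/**
 * Return the known title closest to the visitor's query, or null if the
 * list is empty. The comparison is case-insensitive.
 */
function suggestTitle(string $query, array $titles): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;

    foreach ($titles as $title) {
        $distance = levenshtein(strtolower($query), strtolower($title));
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $title;
        }
    }

    return $best;
}

$titles = ['Wedding Invitations', 'Personal Favors', 'Gift Baskets'];
echo suggestTitle('Personalized Favors', $titles), PHP_EOL; // prints Personal Favors
```

Here the distance is 4, since dropping "ized" is all it takes to get from the query to the existing title.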

I wonder if this is used a lot.
latha
 

Re: PHP and Smart 404's

Post by jay » Thu Oct 08, 2009 1:42 am

Indeed a great find Mike, this function is definitely going to help us!

This will let us list possible alternative pages when a page-not-found error occurs, or redirect straight to the page if the return value of levenshtein() is 1 or 2.

The total number of characters in the page name can be compared against the return value of levenshtein(), and the candidates can be listed with priority given to the more accurate matches.

If the URL typed is about-my-dream-car (total chars: 18) and levenshtein() returns 10 after checking the sitemap links, then it's better not to suggest such a link in the 404 page, but if it's below 8 then it can probably be shown.
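
One way to sketch that length-based cutoff is below. The function name isCloseEnough and the 0.4 ratio are assumptions chosen so that an 18-character name accepts distances below 8; tune the ratio for your own site.

```php
<?php
/**
 * Decide whether a candidate page is close enough to suggest, by comparing
 * the Levenshtein distance with the length of the requested name.
 * $maxRatio = 0.4 is an assumed cutoff, not a recommended constant.
 */
function isCloseEnough(string $requested, string $candidate, float $maxRatio = 0.4): bool
{
    $length = strlen($requested);
    if ($length === 0) {
        return false; // nothing was requested, so nothing can match
    }

    return levenshtein($requested, $candidate) / $length <= $maxRatio;
}

var_dump(isCloseEnough('about-my-dream-car', 'about-my-dream-cars')); // bool(true)
var_dump(isCloseEnough('about-my-dream-car', 'contact-us'));          // bool(false)
```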
Jay M
Write Less, Do More
jay
 
Posts: 475
Joined: Wed Nov 22, 2006 12:05 am
Location: Cochin, India.

Re: PHP and Smart 404's

Post by Mike » Thu Oct 08, 2009 8:36 am

There are a number of different approaches that can be taken once the comparison function is finished doing its levenshtein() thing. I usually run it and just output the results for testing purposes, just to get a sense of what the best approach is.

Sometimes what I'll do is keep track of the best match and the second best match. If the gap between their distances is significant then I usually propose the best match to the user in a "Did you mean...". I would follow that up with the next best 5-10 matches. If the site is small I'd display the best match with the "Did you mean..." then just show the rest of the site map.
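
The best-versus-second-best check could be sketched like this, assuming the matches are already held in a path => distance array. The function name pickClearWinner and the default gap of 3 are illustrative assumptions; what counts as "significant" depends on the site.

```php
<?php
/**
 * Return the best-matching path if it beats the runner-up by at least
 * $minGap edits, otherwise null. $ranked maps path => distance.
 */
function pickClearWinner(array $ranked, int $minGap = 3): ?string
{
    asort($ranked); // smallest distance first
    $paths = array_keys($ranked);
    $distances = array_values($ranked);

    if (count($paths) === 0) {
        return null;
    }
    if (count($paths) === 1) {
        return $paths[0]; // only one candidate, so it wins by default
    }

    return ($distances[1] - $distances[0] >= $minGap) ? $paths[0] : null;
}

$winner = pickClearWinner(['/about.php' => 9, '/contact-us.php' => 1]);
echo $winner, PHP_EOL; // prints /contact-us.php

// No clear winner: the two candidates are only one edit apart.
var_dump(pickClearWinner(['/a.php' => 2, '/b.php' => 3])); // NULL
```

A non-null result is the case where a single "Did you mean..." link, or even a header('Location: ...') forward, is reasonable; a null result suggests showing a list instead.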

If a site is really large I would take the top 10 matches and propose those in a "Did you mean...", then take the next 10 matches and offer them as a "You can also see...". Depending on how big it is you could display the rest of the sitemap thereafter.
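
Splitting the ranked list into those two groups might look like the sketch below. The function name groupSuggestions and the array keys didYouMean and alsoSee are hypothetical; the group size defaults to the 10 mentioned above.

```php
<?php
/**
 * Split a path => distance array into a "Did you mean..." group and a
 * "You can also see..." group of $perGroup entries each.
 */
function groupSuggestions(array $ranked, int $perGroup = 10): array
{
    asort($ranked); // smallest distance first

    return [
        'didYouMean' => array_slice($ranked, 0, $perGroup, true),
        'alsoSee'    => array_slice($ranked, $perGroup, $perGroup, true),
    ];
}

// Small made-up example with a group size of 2 to keep the output short.
$groups = groupSuggestions(
    ['/e.php' => 9, '/a.php' => 1, '/c.php' => 4, '/b.php' => 2, '/d.php' => 7],
    2
);
print_r($groups);
```

With the sample data, /a.php and /b.php land in the first group, /c.php and /d.php in the second, and /e.php is left for the general sitemap listing.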

Another thing to consider is a rudimentary search function for much larger sites. I've built these basically by crawling the sitemap, fopen()'ing all the pages and sucking the strip_tags()'d contents into a fulltext field in a database with the matching URL. The 404 could then also do a fulltext search (with or without boolean mode) on the resulting table and display matches ordered by their relevance, even with a brief synopsis of the content pulled from the description and other metas.
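
The per-page indexing step could be sketched roughly as follows. This only prepares one row; fetching the pages, the database table and the fulltext query are left out. The function name buildIndexRecord and the array keys are assumptions for illustration, and the DOM extension is assumed to be available.

```php
<?php
/**
 * Build one search-index row from a page's HTML: the URL, the meta
 * description (used for the synopsis) and the tag-stripped text.
 */
function buildIndexRecord(string $url, string $html): array
{
    $description = '';

    if ($html !== '') {
        $doc = new DOMDocument();
        // @ hides the warnings that imperfect real-world markup triggers.
        if (@$doc->loadHTML($html)) {
            foreach ($doc->getElementsByTagName('meta') as $meta) {
                if (strtolower($meta->getAttribute('name')) === 'description') {
                    $description = trim($meta->getAttribute('content'));
                    break;
                }
            }
        }
    }

    // Pad each tag with a space so adjacent words do not run together,
    // strip the tags, then collapse the leftover whitespace.
    // Note: strip_tags() keeps the text inside <script> and <style> blocks.
    $text = strip_tags(str_replace('<', ' <', $html));
    $content = trim((string) preg_replace('/\s+/', ' ', $text));

    return ['url' => $url, 'description' => $description, 'content' => $content];
}

$html = '<html><head><title>Contact</title>'
      . '<meta name="description" content="How to reach us"></head>'
      . '<body><h1>Contact us</h1><p>Call or email.</p></body></html>';

$record = buildIndexRecord('http://www.example.com/contact-us.php', $html);
print_r($record);
```

Each record would then be inserted into a table with a fulltext index on the content column, so the 404 page can query it.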

Mike
 

Re: PHP and Smart 404's

Post by jay » Mon Jun 14, 2010 6:19 am

Tips on creating useful 404 pages from Google Webmaster Central!
Jay M
Write Less, Do More
jay
 

