Know .htaccess better !

Programming for Search Engines 101. An area for avid PHP and .NET developers to chat about Programming techniques and how to make better use of search engines.



Postby jay » Wed Jun 18, 2008 10:46 am

An htaccess file is a plain-text (ASCII) file, the kind you would create with a text editor such as Notepad or SimpleText.

.htaccess is the whole file name, not a file extension. It is not file.htaccess or somepage.htaccess; it is simply named .htaccess (the leading dot makes it a hidden file on Unix-like servers).

To create the file, open a text editor and save an empty page as .htaccess (or type in one character, as some editors will not let you save an empty page). Chances are your editor will append its default extension to the name (Notepad, for example, will call the file .htaccess.txt). You need to remove the .txt (or other) extension before you can get yourself htaccessing.

You may need to CHMOD the htaccess file to 644 (rw-r--r--). This lets you, the owner, edit the file and lets the server read it. File permissions alone do not stop a browser from fetching it, though; that is the job of the server configuration (the stock Apache configuration refuses requests for files whose names start with .ht). This matters for security. For example, if you have password-protected directories and a browser can read the htaccess file, anyone can learn the location of the password file, download it, and try to crack the password hashes in it to gain full access to any portion you had protected. There are different ways to prevent this: one is to place all your authentication files above the root directory so that they are not web accessible, and the other is a series of htaccess commands that prevents the file itself from being accessed by a browser (more on that later).
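The two protections described above might look roughly like this in an htaccess file. This is a sketch, assuming Apache 2.2 or 2.4 and a host whose AllowOverride setting permits AuthConfig (plus Limit on 2.2). The password file path /home/youraccount/.htpasswd and the realm name "Members Only" are hypothetical placeholders; substitute a real path that sits outside your web root.

```apache
# 1) Refuse to serve .htaccess, .htpasswd and any other file whose name starts with ".ht".
<Files ".ht*">
    # Apache 2.4 and later
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
    # Apache 2.2 and earlier
    <IfModule !mod_authz_core.c>
        Order allow,deny
        Deny from all
    </IfModule>
</Files>

# 2) Password-protect this directory, keeping the password file ABOVE the web root.
#    Hypothetical path: replace it with the real location on your account.
AuthType Basic
AuthName "Members Only"
AuthUserFile /home/youraccount/.htpasswd
Require valid-user
```

Once this is uploaded, requesting yoursite.com/.htaccess in a browser should return 403 Forbidden rather than the file's contents.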

htaccess is an Apache thing, and it only works if the main server configuration allows it through the AllowOverride directive (with AllowOverride None, Apache ignores htaccess files entirely). An htaccess file affects the directory it is placed in and all of its sub-directories; that is, an htaccess file located in your root directory (yoursite.com) affects yoursite.com/content, yoursite.com/content/contents, and so on. If you do not want certain htaccess commands to affect a specific directory, you can place a new htaccess file within that directory. Note that Apache does not simply use the nearest htaccess file: for most directives it merges the files from the root down, so the sub-directory's file can add settings or override the parent's, but a command you merely leave out is still inherited. To switch a setting off in that directory, you must explicitly override it there. If the only htaccess file is your global one located in your root, it affects every single directory in your entire site.
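A minimal sketch of that override idea, using SSI (covered in the next post) as the example setting. The directory /content/plain is hypothetical, and the server must allow Options in htaccess files (AllowOverride Options).

```apache
# --- yoursite.com/.htaccess (root) ---
# Turn SSI on for the whole site; every sub-directory inherits this.
Options +Includes

# --- yoursite.com/content/plain/.htaccess (hypothetical sub-directory) ---
# Simply leaving the Options line out of this file would NOT undo the root setting.
# Explicitly switch SSI off here (and in this directory's own sub-directories).
Options -Includes
```

Using the + and - forms merges with the inherited options instead of replacing the whole set, so anything else the root file switched on stays on.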

Handling Error Documents

In order to specify your own Error Documents, you need to be slightly familiar with the server returned error codes.

1. Successful Client Requests

200 OK
201 Created
202 Accepted
203 Non-Authoritative Information
204 No Content
205 Reset Content
206 Partial Content

2. Client Request Redirected

300 Multiple Choices
301 Moved Permanently
302 Found (originally "Moved Temporarily")
303 See Other
304 Not Modified
305 Use Proxy

3. Client Request Errors

400 Bad Request
401 Unauthorized (Apache's default page calls it "Authorization Required")
402 Payment Required (reserved for future use)
403 Forbidden
404 Not Found
405 Method Not Allowed
406 Not Acceptable (content negotiation)
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
410 Gone
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request-URI Too Long
415 Unsupported Media Type

4. Server Errors

500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported

To specify your own customized error documents, add a command of the following form, on one line, to your htaccess file:

ErrorDocument code /directory/filename.ext

Example: ErrorDocument 404 /errors/notfound.html
This would cause any request that results in a 404 (Not Found) error to be answered with the page at yoursite.com/errors/notfound.html.

Likewise with:
ErrorDocument 500 /errors/internalerror.html

You can name the pages anything you want and place them anywhere within your site, so long as they are web-accessible (through a URL). The initial slash in the path represents the root directory of your site (the document root), where the default page for your domain is located. It is best to keep them in a separate directory, both for easier maintenance and so that you can stop spiders from indexing them with a robots.txt file.

If you were to add an error document for the most common error codes, the htaccess file would look like the following (note that each command is on its own line):

ErrorDocument 400 /errors/badrequest.html
ErrorDocument 401 /errors/authreqd.html
ErrorDocument 403 /errors/forbid.html
ErrorDocument 404 /errors/notfound.html
ErrorDocument 500 /errors/serverr.html

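You can also specify a full URL rather than a local path in the ErrorDocument string (http://yoursite.com/errors/notfound.html vs. /errors/notfound.html), but this is not the preferred method. With a full URL, Apache sends the visitor a redirect to the error page instead of serving it directly, so the browser (and any search engine spider) never receives the original error code: a missing page looks like a redirect rather than a 404. For ErrorDocument 401 you must use a local path, otherwise the browser never gets the 401 status and will not prompt for a password.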

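You can also specify the message itself, HTML included, believe it or not! On Apache 2.x, put it all on one line inside a pair of double quotes, and use single quotes for any attribute values within it so they don't end the message early:

ErrorDocument 401 "<body bgcolor=#ffffff><h1>You have to actually <b>BE</b> a <a href='#'>member</a> to view this page!</h1></body>"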
Jay M
Write Less, Do More
jay
 
Posts: 475
Joined: Wed Nov 22, 2006 12:05 am
Location: Cochin, India.

Enabling SSI Via htaccess

Postby jay » Fri Jun 20, 2008 4:49 am

2. Enabling SSI Via htaccess

Many people want to use SSI (Server Side Includes) but don't seem to have the ability to do so with their current web host. You can change that with htaccess, provided your host's AllowOverride setting permits these directives. A note of caution first: definitely ask your host for permission before you do this, as it can be considered 'hacking' or a violation of your host's TOS, so be safe rather than sorry:

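AddType text/html .shtml
AddHandler server-parsed .shtml
Options +Includes

The first line tells the server to send pages with a .shtml extension (for Server-parsed HTML) to the browser as ordinary HTML. The second line adds a handler, the actual SSI bit, for all files ending in .shtml: it tells the server that any such file should be parsed for server-side commands before it is sent. The last line turns on the Includes option, without which SSI will not run at all. (Some recipes use Options Indexes FollowSymLinks Includes instead; that also switches on directory listings and symlink following, and because it has no + or - signs it replaces any options inherited from parent directories, so only use it if you want those too.)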
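On Apache 2.x, the mod_include documentation configures SSI with the INCLUDES output filter; the older server-parsed handler still works there for backwards compatibility. A sketch of the same setup written that way, assuming Apache 2.x and an AllowOverride setting that permits FileInfo and Options:

```apache
# Serve .shtml files as ordinary HTML...
AddType text/html .shtml
# ...but run them through mod_include's SSI filter first.
AddOutputFilter INCLUDES .shtml
# SSI does nothing unless the Includes option is on for this directory.
Options +Includes
```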

And that's it, you should have SSI enabled. But wait... don't feel like renaming all of your pages to .shtml in order to take advantage of this neat little toy? Me neither! Just add this line to the fragment above:

AddHandler server-parsed .html

A note of caution on that one too, however. This forces the server to parse every page ending in .html for SSI commands, even pages that contain none. If you use SSI only sparingly on your site, that is more server load than you can justify: SSI does slow a server down because it does extra work before serving a page, although in human terms the delay is virtually unnoticeable. On the other hand, some people prefer to allow SSI in .html pages so that nobody looking at the page extension can tell they are using SSI, which makes it a little harder to target the server with SSI exploits (these are possible). Whichever way you choose, you now know how to set it up.
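If parsing every .html page feels wasteful, mod_include also offers a middle ground: the XBitHack directive. With it switched on, only text/html files whose owner-execute permission bit is set are parsed for SSI; the rest are served untouched. This is a sketch, assuming a Unix-like host and an AllowOverride setting that permits Options; check that your host supports it before relying on it.

```apache
# Parse a .html file for SSI only if its user-execute bit is set
# (for example after running: chmod u+x page.html). Other .html files are sent as-is.
XBitHack on
# The Includes option must still be enabled for SSI to run.
Options +Includes
```

Pages that use SSI would then get permissions like 744, while ordinary pages keep 644.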

If, however, you are going to keep SSI pages with the .shtml extension, and you want to use SSI on your index pages, you need to add the following line to your htaccess:

DirectoryIndex index.shtml index.html

This allows a page named index.shtml to be your default page, and if that is not found, index.html is loaded.
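One thing to watch: if neither file exists in a directory and directory listings are enabled (the Indexes option), Apache will show visitors a list of that directory's files. A sketch that keeps the same fallback order and turns listings off, so a missing index page produces a 403 Forbidden error instead:

```apache
# Try index.shtml first, then index.html.
DirectoryIndex index.shtml index.html
# If neither exists, refuse to list the directory's contents (403 Forbidden).
Options -Indexes
```

That 403 can then be caught by an ErrorDocument 403 page like the ones shown earlier.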

Re: Know .htaccess better !

Postby DJ » Fri Jun 20, 2008 9:49 am

This is fantastic research on .htaccess files guys! I hope we are taking notes!
The Deej
|
DJ
Site Admin
 
Posts: 1022
Joined: Thu May 04, 2006 4:47 pm

Re: Know .htaccess better !

Postby C-Note » Tue Jun 24, 2008 1:07 am

nice jay!

but how about php includes?
- then we can do some fancy, dancy, crazy php coding stuff :P
C-Note
 

