
Use of Robots.txt in SEO

PostPosted: Wed Aug 22, 2007 6:43 am
by Sathish


robots.txt is a useful file that keeps search engine spiders away from pages you do not want indexed. Why would you not want a page indexed by a search engine? Perhaps you want to display a page that shows an example of spamming the search engines. Such a page might include repeated keywords, hidden tags stuffed with keywords, and other things that could get a page or an entire site banned from a search engine.

The robots.txt file is a good way to prevent this page from getting indexed. However, not every site can use it. The only robots.txt file the spiders will read is the one in the root directory of your domain, which means you can only use it if you run your own domain. The spiders will look for the file at locations like these:
http://www.pageresource.com/robots.txt
http://www.javascriptcity.com/robots.txt
http://www.mysite.com/robots.txt


A robots.txt file in any other location will not be read by a search engine spider, so locations like the ones below will be ignored:
http://www.pageresource.com/html/robots.txt
http://members.someplace.com/you/robots.txt
http://someisp.net/~you/robots.txt


Now, if you have your own domain, you know where to place the file. So let's take a look at exactly what needs to go into robots.txt to tell the spiders what you want done.

If you want to allow all the spiders to index your entire site, you would write the following (an empty Disallow line means nothing is blocked):
User-agent: *
Disallow:



If you want to exclude all the search engine spiders from your entire domain, you would write just the following into the robots.txt file:
User-agent: *
Disallow: /


If you want to exclude all the spiders from a certain directory within your site, you would write the following:
User-agent: *
Disallow: /aboutme/


If you want to do this for multiple directories, you add on more Disallow lines:
User-agent: *
Disallow: /aboutme/
Disallow: /stats/


If you want to exclude certain files, then type in the rest of the path to the files you want to exclude:
User-agent: *
Disallow: /aboutme/album.html
Disallow: /stats/refer.htm


If you are curious, here is what I used to keep an article from getting indexed:
User-agent: *
Disallow: /zine/article002.htm


If you want to keep a specific search engine spider from indexing your site, do this:
User-agent: Robot_Name
Disallow: /


You'll need to know the name of the search engine spider or robot and place it where Robot_Name appears above. You can find these names on the web sites of the various search engines.
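
For example, Google's main crawler identifies itself as Googlebot, so a rule that keeps only Google's spider away from the entire site would look like this:
User-agent: Googlebot
Disallow: /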
So, if you need to exclude something from search engine indexing, robots.txt is the most widely recognized tool for the job. Use it to keep the spiders out of any part of your site you want them to avoid.

Comments

PostPosted: Wed Aug 22, 2007 7:00 am
by prasanth
Great information Sathish! Thank you for sharing this with all!

Re: Use of Robots.txt in SEO

PostPosted: Sun Nov 28, 2010 10:54 pm
by jackson
Google officially released the robots.txt specification last week. You can read more on the details of robots.txt in Google's documentation.

Re: Use of Robots.txt in SEO

PostPosted: Wed Jan 12, 2011 11:52 pm
by jithin
The Allow: and Sitemap: directives are very useful and are accepted by many major search engines.
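
For instance, a robots.txt can block a directory while still allowing one file inside it, and point the spiders to a sitemap. The directory, file, and sitemap URL below are only placeholders:
User-agent: *
Disallow: /stats/
Allow: /stats/public.html
Sitemap: http://www.mysite.com/sitemap.xml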

Re: Use of Robots.txt in SEO

PostPosted: Wed Apr 06, 2011 10:45 pm
by BryanF
By defining a few rules in this text file, you can instruct robots not to crawl and index certain files or directories within your site, or not to crawl the site at all. A combined example follows below.
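
Putting those cases together, a single robots.txt can hold several groups of rules. In this sketch, BadBot is a made-up robot name and the directory and file paths are only examples:
# keep one particular robot out of the whole site
User-agent: BadBot
Disallow: /

# keep every other robot out of one directory and one file
User-agent: *
Disallow: /stats/
Disallow: /aboutme/album.html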

Re: Use of Robots.txt in SEO

PostPosted: Tue Jun 26, 2012 11:16 pm
by Steve Smith
The robots.txt file is a form of communication between your site and the visiting robots (spiders) that index the content of your web pages. Every search engine has a spider: Google has one, and so do Yahoo!, MSN, and Ask. A well-written robots.txt file can improve your chances of ranking in the search engines.