
Use and Advantages of Robots.txt With Example

"robots.txt" file can protect private content from appearing online, save bandwidth, and lower load on your server. A missing "robots.txt" file also generates additional errors in your apache log whenever robots request one.
In order to pass this test you must create and properly install a robots.txt file.
To create it, you can use any program that produces a plain text file, or you can use an
online tool (Google Webmaster Tools has this feature).
Remember to use all lower case for the filename: robots.txt, not ROBOTS.TXT.
A simple robots.txt file looks like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /pages/thankyou.html
This would block all search engine robots from visiting the "/cgi-bin/" and "/images/" directories and the page "http://www.yoursite.com/pages/thankyou.html".

TIPS:
  • You need a separate Disallow line for every URL prefix you want to exclude
  • You may not have blank lines in a record because they are used to delimit
    multiple records
  • Notice that before the Disallow commands, you have the command: User-agent: *.
    The User-agent: part specifies which robot the record applies to; * means all robots. Major known crawlers are: Googlebot (Google), Googlebot-Image (Google Image Search), Baiduspider (Baidu), Bingbot (Bing). See the example after this list for records that target specific crawlers.
  • One important thing to know if you are creating your own robots.txt file is that although the wildcard (*) is used in the User-agent line (meaning "any robot"), it is not allowed in the Disallow line.
  • Regular expressions are not supported in either the User-agent or Disallow lines
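
For example, here is a sketch of a robots.txt file with separate records for specific crawlers (the directories shown are only placeholders); note the blank line that delimits each record:

User-agent: Googlebot-Image
Disallow: /images/

User-agent: Baiduspider
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/

Here the Google image crawler is kept out of "/images/", Baiduspider is kept out of "/private/", and every other robot is kept out of "/cgi-bin/".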

Once you have your robots.txt file, upload it to the top-level directory of your web server. After that, make sure you set the permissions on the file so that visitors (like search engines) can read it.
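
A quick way to confirm that the uploaded file is readable and that its rules behave as intended is Python's standard urllib.robotparser module. This is only a sketch; the site URL and paths are placeholders taken from the example above.

# Minimal check of a live robots.txt with Python's standard library.
# The URL and paths are placeholders -- substitute your own site.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("http://www.yoursite.com/robots.txt")
parser.read()  # fetch and parse the robots.txt file

# With the example rules above, a generic crawler ("*") should be
# blocked from the thank-you page but allowed on the home page.
print(parser.can_fetch("*", "http://www.yoursite.com/pages/thankyou.html"))  # expected: False
print(parser.can_fetch("*", "http://www.yoursite.com/index.html"))           # expected: True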
