"robots.txt"
file can protect private content from appearing online, save
bandwidth, and lower load on your server. A missing "robots.txt"
file also generates additional errors in your apache log whenever
robots request one.
To pass this test you must create and properly install a robots.txt file.
For this, you can use any program that produces a text file, or you can use an online tool (Google Webmaster Tools has this feature).
Remember to use all lower case for the filename: robots.txt, not ROBOTS.TXT.
A simple robots.txt file looks like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /pages/thankyou.html
This would block all search engine robots from visiting the "cgi-bin" and "images" directories and the page "http://www.yoursite.com/pages/thankyou.html".
TIPS:
- You need a separate Disallow line for every URL prefix you want to exclude
- You may not have blank lines within a record, because blank lines are used to delimit multiple records (see the example after these tips)
- Notice that before the Disallow command, you have the command User-agent: *. The User-agent: part specifies which robot you want to block. Major known crawlers are: Googlebot (Google), Googlebot-Image (Google Image Search), Baiduspider (Baidu), Bingbot (Bing)
- One important thing to know if you are creating your own robots.txt file is that although the wildcard (*) is used in the User-agent line (meaning "any robot"), it is not allowed in the Disallow line.
- Regular expressions are not supported in either the User-agent or Disallow lines
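Putting these tips together, a file with two records might look like the sketch below. Note the blank line separating the records, the separate Disallow line for each path, and the User-agent line naming a specific crawler; the paths are only examples:

User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Disallow: /cgi-bin/
Disallow: /pages/thankyou.html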
Once you have your robots.txt file, upload it to the top-level directory of your web server. After that, make sure you set the permissions on the file so that visitors (like search engines) can read it.
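One simple way to confirm the installation worked is to request the file the same way a robot would. Here is a minimal sketch in Python, assuming your site is www.yoursite.com as in the example above:

from urllib import request

# Fetch robots.txt exactly as a crawler would.
# www.yoursite.com is a placeholder; use your own domain.
with request.urlopen("http://www.yoursite.com/robots.txt") as resp:
    print(resp.status)           # 200 means the file exists and is readable
    print(resp.read().decode())  # the rules robots will actually see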