What is a robots.txt file?
The robots exclusion protocol (REP), implemented through a robots.txt file, is a plain-text
file that webmasters create to instruct robots (typically search engine crawlers) how to
crawl and index pages on their website. In short, site owners place a /robots.txt file at
the root of their site to give instructions to web robots; this convention is called the
Robots Exclusion Protocol.
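For example, a minimal robots.txt that asks every robot to stay out of a hypothetical /private/ directory looks like this:

User-agent: *
Disallow: /private/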
How to create a robots.txt file for your website?
Step 1: Go to the following website:
http://tools.seobook.com/robots-txt/generator/
You will see the generator's input screen.
Step 2: Suppose you don't want robots to have access to the about-us page of your
website. Copy the /about-us.html part of that page's URL.
Paste that path into the "Files or directories" field and click Add; your robots.txt will
be generated as shown below. Copy the generated code into Notepad and save it as
robots.txt in your site's main (root) folder. This last point is very important: the file
must be named exactly robots.txt and sit at the root of the site.
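For this example, the generated file would contain something like the following (assuming /about-us.html is the only path you added):

User-agent: *
Disallow: /about-us.html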
Step 3: Upload the robots.txt file to the root of your website, for example with the FileZilla FTP client.
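If you prefer a script to a GUI client, here is a minimal upload sketch using Python's standard ftplib module; the host name and credentials below are placeholders you would replace with your own FTP details:

from ftplib import FTP

# Placeholder FTP details; replace with your own host and credentials.
HOST = "ftp.example.com"
USER = "username"
PASSWORD = "password"

ftp = FTP(HOST)            # Connect to the FTP server.
ftp.login(USER, PASSWORD)

# Upload the local robots.txt into the server's web root.
with open("robots.txt", "rb") as f:
    ftp.storbinary("STOR robots.txt", f)

ftp.quit()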
Step 4: Check whether the file was uploaded by typing /robots.txt at the end of your
website's URL in the browser. If the file's contents appear, your robots.txt was uploaded
to your website successfully.
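You can also verify the rules programmatically. The sketch below uses Python's standard urllib.robotparser module; the example.com URLs are placeholders for your own domain:

from urllib.robotparser import RobotFileParser

# Placeholder URLs; replace example.com with your own domain.
robots_url = "http://www.example.com/robots.txt"
page_url = "http://www.example.com/about-us.html"

parser = RobotFileParser()
parser.set_url(robots_url)
parser.read()  # Fetch and parse the live robots.txt file.

# Prints False if the file disallows all robots ("*") from that page.
print(parser.can_fetch("*", page_url))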
If you don't want to use the online tool, you can also write the directives directly in
Notepad and save the file as robots.txt.
Important note:
The "/robots.txt" file is a text file with one or more records. It usually contains a
single record that looks like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
In this example, three directories are excluded.
Note that you need a separate "Disallow" line for every URL prefix you want to exclude;
you cannot write "Disallow: /cgi-bin/ /tmp/" on a single line. Also, you may not have
blank lines within a record, because blank lines are used to delimit multiple records, as
in the example below.
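For example, the following file holds two records separated by the required blank line (ExampleBot is a hypothetical robot name used only for illustration):

User-agent: ExampleBot
Disallow: /tmp/

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/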
Note also that globbing and regular expressions are not supported in either the User-agent
or the Disallow lines. The '*' in the User-agent field is a special value meaning "any
robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*",
or "Disallow: *.gif".
What you want to exclude depends on your server. Everything not explicitly disallowed is
considered fair game to retrieve. Some examples:
1) To exclude all robots from the entire server:

User-agent: *
Disallow: /

2) To allow all robots complete access:

User-agent: *
Disallow:

(Or just create an empty "/robots.txt" file, or don't use one at all.)

3) To exclude all robots from part of the server:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
4) To exclude a single robot:

User-agent: BadBot
Disallow: /

5) To allow a single robot:

User-agent: Google
Disallow:

User-agent: *
Disallow: /

6) To exclude all files except one:
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed
into a separate directory, say "stuff", and leave the one file in the level above this directory:

User-agent: *
Disallow: /~joe/stuff/

7) Alternatively, you can explicitly disallow all disallowed pages:

User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html