2. • Robots.txt is a plain text file that you upload to the root
directory of your site. When web spiders (also called ants, bots,
or indexers) arrive to index your pages, they look for that file
first and process it. Put differently, robots.txt tells the spider
which pages it may crawl and which it should skip.
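Because the file always sits in the root directory, a spider can work out where to look from any page address. A minimal sketch of that lookup, assuming Python's standard library; the helper name robots_url and the example.com address are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address, used only for illustration.
print(robots_url("https://example.com/FAQ/pricing.html?page=2"))
```

Whatever page the spider starts from, the result points at the same file in the site root.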
4. THE SIMPLEST VERSION OF A ROBOTS.TXT FILE IS:
USER-AGENT:*
DISALLOW:
THE FIRST LINE NAMES THE USER AGENT. THE ASTERISK INDICATES
THAT THE FOLLOWING LINES APPLY TO ALL AGENTS. THE EMPTY VALUE
AFTER "DISALLOW:" MEANS THAT NOTHING IS RESTRICTED. THIS ROBOTS.TXT
FILE HAS NO EFFECT: IT ALLOWS USER AGENTS TO SEE EVERYTHING
ON THE SITE.
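One way to check that an empty "Disallow:" really blocks nothing is to feed the two lines to a parser. A small sketch, assuming Python's standard urllib.robotparser module; the agent name ExampleBot and the example.com address are placeholders, not real crawlers or sites:

```python
from urllib.robotparser import RobotFileParser

# The simplest robots.txt: applies to every agent, restricts nothing.
rules = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# ExampleBot is a hypothetical user agent name.
may_crawl = parser.can_fetch("ExampleBot", "https://example.com/any/page.html")
print(may_crawl)
```

With these rules the parser should report that any path on the site may be crawled.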
5. NOW LET'S MAKE IT A LITTLE MORE COMPLICATED. THIS TIME
WE DO NOT WANT SPIDERS TO CRAWL OUR /FAQ
DIRECTORY:
USER-AGENT:*
DISALLOW: /FAQ/
IT IS RELATIVELY EASY. THE TRAILING SLASH INDICATES THAT THIS IS A
DIRECTORY. IF YOU OMIT THE TRAILING SLASH, NOT ONLY THE /FAQ
DIRECTORY BUT ALSO EVERY FILE WHOSE PATH BEGINS WITH
"/FAQ" (SUCH AS /FAQ.HTML) WILL BE BANNED. YOU CAN ALSO ADD
MORE DIRECTORIES TO YOUR BANNED LIST, ONE "DISALLOW:" LINE EACH.
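The effect of the trailing slash is easiest to see by testing a few paths against each variant. The sketch below again assumes Python's standard urllib.robotparser module, which matches rules by path prefix; the helper allowed, the agent name ExampleBot, the example.com host, and the /private/ directory are all invented for illustration:

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, path: str) -> bool:
    """Report whether a hypothetical agent may fetch path under rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("ExampleBot", "https://example.com" + path)


with_slash = "User-agent: *\nDisallow: /FAQ/\n"
without_slash = "User-agent: *\nDisallow: /FAQ\n"
# Several banned directories: one Disallow line per directory.
several = "User-agent: *\nDisallow: /FAQ/\nDisallow: /private/\n"

for label, rules in [("with slash", with_slash),
                     ("without slash", without_slash),
                     ("several", several)]:
    for path in ["/FAQ/pricing.html", "/FAQ.html", "/private/notes.txt"]:
        print(label, path, allowed(rules, path))
```

With the slash, only paths inside the /FAQ/ directory are banned; without it, /FAQ.html is banned as well, because the rule is a plain prefix match. Paths are case-sensitive, so /FAQ/ and /faq/ are different rules.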