
What is a Robots.txt File?
If you have pages or directories that you do not want indexed
by search engines, you can list them in a robots.txt file and
place that file on your server. When a search engine spider
visits your site, it reads the file and follows its instructions.
Well-behaved spiders obey these instructions, but compliance is voluntary.
A robots.txt file is optional, but if you use one it must be
named "robots.txt" and saved as plain ASCII text.
It must be placed in the root directory of the web site (for
example, http://www.example.com/robots.txt), because spiders
will not look for it anywhere else.
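To illustrate the root-directory rule, here is a minimal Python sketch of how a spider can work out where to look for robots.txt from any page URL on a site. The function name `robots_txt_url` and the example.com URLs are hypothetical; the only library call is the standard `urllib.parse.urljoin`.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url: str) -> str:
    """Return the URL a spider requests for robots.txt for page_url.

    Joining with the absolute path "/robots.txt" replaces the page's
    path and query string, so the result is always the root of the host.
    """
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    for page in ("http://www.example.com/",
                 "http://www.example.com/products/widgets/index.html?id=3",
                 "https://shop.example.com:8080/misc/sitestats/"):
        print(page, "->", robots_txt_url(page))
```

Every page on a host maps to the same single URL, which is why a robots.txt file placed in a subdirectory such as /products/ is never requested.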
Note:
If you do not have a robots.txt file in the root directory
of your web site, you may see a large number of 404 (Not Found)
errors in your web statistics. This is because bots and spiders
requested the file and it was not available.
To create a robots.txt file:
Create a plain text file using a text editor or HTML editor,
entering the directives as in the examples below. (If you use a
word processor, save the file as plain text.)
Save the file as robots.txt.
Upload the robots.txt file to the root directory of your web site
using your FTP software in ASCII mode.
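The creation steps can also be scripted. This is a minimal sketch assuming Python 3 and a hypothetical helper `write_robots_txt`; it encodes the rules as ASCII before writing, so a stray non-ASCII character (such as a curly quote pasted from a word processor) raises an error instead of ending up in the file. Uploading is still left to your FTP software.

```python
import tempfile
from pathlib import Path

# Hypothetical rules; replace them with the directives your site needs.
RULES = (
    "User-agent: *\n"
    "Disallow: /cgi-bin/\n"
    "Disallow: /misc/sitestats/\n"
)


def write_robots_txt(directory: str, rules: str = RULES) -> Path:
    """Save rules as plain ASCII text in a file named robots.txt."""
    # encode() raises UnicodeEncodeError before anything is written
    # if the rules contain a non-ASCII character.
    data = rules.encode("ascii")
    path = Path(directory) / "robots.txt"
    path.write_bytes(data)
    return path


if __name__ == "__main__":
    # A throwaway directory stands in for your site's root directory.
    with tempfile.TemporaryDirectory() as site_root:
        saved = write_robots_txt(site_root)
        print(saved.read_text(encoding="ascii"))
```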
Examples
To exclude all robots from parts of the server:
User-agent: *
Disallow: /cgi-bin/
Disallow: /misc/sitestats/
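One way to check what a set of rules actually blocks is Python's standard `urllib.robotparser` module. The sketch below feeds it the example above directly (a real crawler would call `set_url()` and `read()` to fetch the live file); the host www.example.com and the robot name "AnySpider" are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The "exclude all robots from parts of the server" example.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /misc/sitestats/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so nothing is downloaded here.
parser.parse(RULES.splitlines())

SITE = "http://www.example.com"  # placeholder host
for path in ("/index.html", "/cgi-bin/search.pl",
             "/misc/sitestats/today.html", "/misc/about.html"):
    allowed = parser.can_fetch("AnySpider", SITE + path)
    print(f"{path:28} {'allowed' if allowed else 'disallowed'}")
```

Each Disallow value is a path prefix: /cgi-bin/search.pl and /misc/sitestats/today.html are blocked, while /misc/about.html is still allowed because it does not begin with /misc/sitestats/.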
To exclude a specific spider from parts of the server:
User-agent: Slurp
Disallow: /cgi-bin/
Disallow: /secure/
Disallow: /products/
Disallow: /misc/sitestats/
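The same module shows that a record naming one spider leaves all others alone. This sketch assumes the record names the robot as plain `Slurp` (no trailing slash or version). Because the file has no `User-agent: *` record, robots that are not named may fetch everything; the page URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The "exclude a specific spider" example, naming the robot as "Slurp".
RULES = """\
User-agent: Slurp
Disallow: /cgi-bin/
Disallow: /secure/
Disallow: /products/
Disallow: /misc/sitestats/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

URL = "http://www.example.com/products/catalogue.html"  # placeholder page
for robot in ("Slurp", "Slurp.so/1.0", "Googlebot", "Scooter"):
    # Only the Slurp record restricts anything; other robots match no record.
    print(f"{robot:14} {parser.can_fetch(robot, URL)}")
```

Python's parser drops the version from the visiting robot's name and ignores case, so "Slurp.so/1.0" is caught by the `Slurp` record as well.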
To allow all robots complete access (an empty Disallow line
means nothing is disallowed, so spiders can follow all links):
User-agent: *
Disallow:
To allow a single robot complete access and exclude all others
(name the robot without a version number):
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
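Here is a sketch of how `urllib.robotparser` resolves the "single robot" example: a robot with its own record uses that record, and every other robot falls back to the `User-agent: *` record. It also exercises the two special values used in these examples: an empty `Disallow:` allows everything, and `Disallow: /` blocks the whole site. The record is assumed to name the robot as plain `Googlebot`, and the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The "single robot gets complete access" example.
RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

URL = "http://www.example.com/index.html"  # placeholder page
for robot in ("Googlebot/2.1", "Scooter", "Slurp"):
    # Googlebot matches its own record (empty Disallow: nothing blocked);
    # every other robot falls back to the '*' record (Disallow: / blocks all).
    print(f"{robot:14} {parser.can_fetch(robot, URL)}")
```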
To prevent your entire web site from being indexed:
User-agent: *
Disallow: /
Spider User-agents
Alta Vista : Scooter
Infoseek : InfoSeek Sidewinder, Ultraseek, Mozilla
Lycos : Lycos_Spider_(T-Rex)
Google : Googlebot/1.0
Inktomi : Slurp, Slurp.so
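The names above are matched loosely. The original robots exclusion standard recommends a case-insensitive substring match on the robot's name with version information removed. Writing a record as the bare name (Googlebot rather than Googlebot/1.0) is the safer choice, because not every parser strips the version from the record itself. A minimal sketch of the rule follows; `name_token` and `record_applies` are hypothetical helper names, this version strips the version from both sides, and real spiders may implement matching differently.

```python
def name_token(user_agent: str) -> str:
    """Drop version information and case: 'Googlebot/2.1' -> 'googlebot'."""
    return user_agent.split("/", 1)[0].strip().lower()


def record_applies(record_agent: str, robot_user_agent: str) -> bool:
    """Return True if a 'User-agent:' value applies to the visiting robot.

    '*' applies to every robot. Otherwise the record's name must appear,
    ignoring case and version, inside the robot's own name.
    """
    if record_agent.strip() == "*":
        return True
    token = name_token(record_agent)
    # An empty value names no robot, so it must not match everything.
    return bool(token) and token in name_token(robot_user_agent)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "Googlebot/2.1"),
        ("Googlebot/1.0", "Googlebot/2.1"),
        ("Slurp", "Slurp.so/1.0"),
        ("Scooter", "Googlebot/2.1"),
        ("*", "Lycos_Spider_(T-Rex)"),
    ]
    for record, robot in checks:
        print(f"{record!r:16} vs {robot!r:24} -> {record_applies(record, robot)}")
```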
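Reasons for excluding files from some or all spiders include
privacy, log files, and pages optimised for one particular
search engine that you would not want indexed by other
search engines. Bear in mind, though, that robots.txt is itself
publicly readable, so listing a directory in it does not keep
that directory private.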
You can also add a robots meta tag to the head section of an
individual web page to tell spiders whether to index that page
and whether to follow its links:
<html>
<head>
<title>What Is A Robots text File</title>
<meta name="description" content="If you
have pages or directories which you do not want to be indexed
by the search engines you can add this information to the
robots.txt file and place the file on the server">
<meta name="robots" content="index, follow">
</head>
<body>
...
</body>
</html>
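If you want to check which robots meta tag a page actually carries, a short Python sketch using the standard `html.parser` module can pull it out of the page source. The class name `RobotsMetaFinder` is hypothetical, and the sample page is a cut-down version of the one above.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value is None for a bare attribute.
        if tag != "meta":
            return
        values = dict(attrs)
        if (values.get("name") or "").lower() == "robots":
            self.directives.append(values.get("content") or "")


PAGE = """<html>
<head>
<title>What Is A Robots text File</title>
<meta name="robots" content="index, follow">
</head>
<body>
</body>
</html>"""

finder = RobotsMetaFinder()
finder.feed(PAGE)
finder.close()
print(finder.directives)  # ['index, follow']
```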
The robots meta tag has the following options:
Indexes the page and follows links
<meta name="robots" content="index, follow">
Does not index the page, but follows links
<meta name="robots" content="noindex, follow">
Indexes the page, but does not follow links
<meta name="robots" content="index, nofollow">
Does not index the page or follow links
<meta name="robots" content="noindex, nofollow">
Use whichever of these tags suits each page; different pages
on the same site can use different tags.
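The four combinations can be read back into two yes/no answers. This sketch assumes comma-separated, case-insensitive values and treats anything not explicitly forbidden as allowed, so a page with no robots meta tag behaves like "index, follow". `interpret_robots_meta` is a hypothetical helper name.

```python
def interpret_robots_meta(content: str) -> tuple[bool, bool]:
    """Return (may_index, may_follow) for a robots meta tag's content.

    Values are comma-separated and compared case-insensitively; anything
    not explicitly forbidden is allowed.
    """
    tokens = {part.strip().lower() for part in content.split(",")}
    may_index = "noindex" not in tokens
    may_follow = "nofollow" not in tokens
    return may_index, may_follow


if __name__ == "__main__":
    for content in ("index, follow", "noindex, follow",
                    "index, nofollow", "noindex, nofollow"):
        index, follow = interpret_robots_meta(content)
        print(f"{content:20} index={index} follow={follow}")
```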
See Also
Could I design my own web site
Can I have pictures on my web site
Can I sell things over the web
Can I make alterations to my web site
How will people find my web site
What are the benefits of having a web site
What is the Title of a Web Page
What is the Description of a Web Page
What are keywords
What is a Meta Tag
What is Cloaking
What is ALT Text
What is a Favorite Icon
Which is best, "Frames or No Frames"