Matthew East wrote:
> Hi,
>
> On Fri, Feb 29, 2008 at 10:15 PM, Jim Campbell <email address hidden> wrote:
>> You can do this through a robots.txt file, through the meta tags on your
>> site... I think you can even do it through modifications to htaccess.
>>
>> ubuntu.com already has a robots.txt file in place, but I'm not sure how
>> robots.txt files apply to subdomains. I also do not know what kind of
>> control we have over the meta tags in the draft documentation. Are the meta
>> tags auto-generated as part of the page creation process?
>
> Yes, although no doubt it is possible to customise them if necessary.
> http://www.sagehill.net/docbookxsl/HtmlHead.html looks like it has the
> relevant instructions and I could take care of that aspect of it. But
> I'm not familiar with robots.txt files.
Either:
doc.ubuntu.com should have a file called 'robots.txt' in the site root
containing the following two lines:
User-agent: *
Disallow: /
(disallow all bots access to all pages)
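To check what those two lines do, and to answer the subdomain question, you can use Python's standard urllib.robotparser. Crawlers request robots.txt from the root of each host, so doc.ubuntu.com needs its own file. The one on ubuntu.com does not cover it. The file is also only advisory: it restrains crawlers that choose to honour it. This is a minimal sketch. The page URL is hypothetical, and robots_url_for and is_blocked are illustrative helper names.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# The two-line robots.txt proposed above: all user agents, all paths disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a crawler consults for page_url.

    robots.txt is looked up per host, so pages on doc.ubuntu.com are governed
    by doc.ubuntu.com/robots.txt, not by the file on www.ubuntu.com.
    """
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def is_blocked(robots_txt: str, user_agent: str, page_url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    # Hypothetical draft-documentation URL, used only for illustration.
    page = "https://doc.ubuntu.com/some/draft/page.html"
    print(robots_url_for(page))                       # https://doc.ubuntu.com/robots.txt
    print(is_blocked(ROBOTS_TXT, "Googlebot", page))  # True
```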
Or:
The HTML head tag needs to contain a meta tag like so:
<meta name="robots" content="noindex, nofollow">
(noindex means don't index this page, and nofollow means don't crawl any
links on this page)
This should be added to every html page.
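If customising the DocBook XSL head output turns out to be awkward, a post-processing pass over the generated files could add the tag instead. Below is a rough Python sketch. It assumes the built pages are UTF-8 .html files under a single directory. add_robots_meta and process_tree are hypothetical helper names and are not part of any existing build.

```python
import re
from pathlib import Path

# The tag from the message above: don't index this page, don't follow its links.
ROBOTS_META = '<meta name="robots" content="noindex, nofollow">'

# Opening <head> tag, with or without attributes (does not match <header>).
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)

# Any existing robots meta tag, so a page is never patched twice.
EXISTING_ROBOTS = re.compile(r"""<meta\s+name=["']?robots""", re.IGNORECASE)


def add_robots_meta(html: str) -> str:
    """Return html with ROBOTS_META inserted just after the opening <head> tag.

    Pages that already contain a robots meta tag are returned unchanged;
    pages without a <head> tag raise ValueError instead of being skipped silently.
    """
    if EXISTING_ROBOTS.search(html):
        return html
    match = HEAD_OPEN.search(html)
    if match is None:
        raise ValueError("no <head> tag found")
    end = match.end()
    return html[:end] + "\n" + ROBOTS_META + html[end:]


def process_tree(root: Path) -> int:
    """Patch every .html file under root in place; return how many changed."""
    changed = 0
    for path in sorted(root.rglob("*.html")):
        original = path.read_text(encoding="utf-8")
        updated = add_robots_meta(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

For example, process_tree(Path("build/html")) would patch every page under a hypothetical build/html directory. The pass has to be rerun after each rebuild, which is the main reason the XSL route is tidier.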
http://www.robotstxt.org is a good resource.