Mailing List Archive


Re: [tlug] Website Design: Blocking Robots: Use robots.txt



Two recurring themes are: 

1. Use standards. 
2. K.I.S.S.

"Daniel Son[g]" wrote:

> I really don't want robots to go any further than the front page, ...

There's a standard for that. Use robots.txt: 

   http://en.wikipedia.org/wiki/Robots.txt

You don't seem to have one yet: 

   [jep@example.com ~]$  lynx -dump http://meta-for.org/robots.txt

                                      Not Found

      The requested URL /robots.txt was not found on this server.
   [jep@example.com ~]$ 

That's yet another use for lynx. I usually use wget for this sort of check:

   [jep@example.com ~]$ wget -O - http://meta-for.org/robots.txt
   --13:57:55--  http://meta-for.org/robots.txt
   Resolving meta-for.org... 208.111.34.112
   Connecting to meta-for.org|208.111.34.112|:80... connected.
   HTTP request sent, awaiting response... 404 Not Found
   13:57:55 ERROR 404: Not Found.

   [jep@example.com ~]$ 

To see what a robots.txt file looks like, try the following: 

   wget -O - http://colug.net/robots.txt
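To see how a well-behaved crawler interprets such a file, you can use
Python's standard-library robots.txt parser. A small sketch (the host
and paths here are made up for illustration):

```python
# Sketch: how a conforming crawler applies robots.txt rules, using
# Python's standard-library parser. Host and paths are hypothetical.
from urllib.robotparser import RobotFileParser

# parse() accepts the file's lines; normally you'd use set_url()/read()
# to fetch the real file from a site.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "http://example.com/"))           # front page: True
print(rp.can_fetch("*", "http://example.com/private/x"))  # blocked: False
```

Any robot that honors the standard makes exactly this kind of check
before fetching a URL.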

> ... blocking them implicitly, by requiring extra data to be passed 
> in the request body. 

Keep it simple. Use the robots.txt standard. 
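
For your stated goal (front page only), note that the original standard
defines only Disallow, so something like the following relies on the
Allow directive and the $ end-anchor, which are later extensions honored
by the major crawlers but not part of the original standard:

   User-agent: *
   Allow: /$
   Disallow: /

If you want to stick strictly to the original standard, list the
directories to keep robots out of instead (hypothetical paths):

   User-agent: *
   Disallow: /private/
   Disallow: /cgi-bin/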
