Mailing List Archive



[tlug] Blocking unknown and unclear bots



TLUG,

My site keeps a log of the various bots and user agents that visit it.

I have an .htaccess file that blocks known bad bots, and a PHP file that
lists known good bots. If a bot is known to be bad, the .htaccess file
blocks it. If it is on my known-good list, nothing happens.

If the bot is not known as good or bad, then it gets logged.
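
For illustration, the three-way check described above could be sketched
roughly as follows. This is a Python sketch rather than the actual PHP and
.htaccess setup, and it assumes case-insensitive substring matching. The
function names and the example list entries are hypothetical.

```python
from typing import Iterable, List


def classify(user_agent: str, good: Iterable[str], bad: Iterable[str]) -> str:
    """Return "bad", "good", or "unknown" for a user agent string.

    Assumption: a list entry matches if it appears anywhere in the
    user agent, ignoring case. The bad list is checked first, mirroring
    the .htaccess block running before the PHP check.
    """
    ua = user_agent.lower()
    if any(entry.lower() in ua for entry in bad):
        return "bad"
    if any(entry.lower() in ua for entry in good):
        return "good"
    return "unknown"


def collect_unknowns(user_agents: Iterable[str],
                     good: Iterable[str],
                     bad: Iterable[str]) -> List[str]:
    """Return each unidentified user agent once, in first-seen order."""
    good, bad = list(good), list(bad)
    unknowns: List[str] = []
    for ua in user_agents:
        if classify(ua, good, bad) == "unknown" and ua not in unknowns:
            unknowns.append(ua)
    return unknowns


if __name__ == "__main__":
    # Hypothetical list entries, for illustration only.
    good_bots = ["googlebot"]
    bad_bots = ["evilscraper"]
    hits = ["Googlebot/2.1", "web-tools", "EvilScraper 1.0", "web-tools"]
    print(collect_unknowns(hits, good_bots, bad_bots))
```

Only the entries returned by collect_unknowns would need to be looked up by
hand later.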

Every now and again, I go through the log file, look up the bots that
went unidentified, and add each one to either the good or the bad list.

However, many of them are hard to identify.

For example, I just looked up a new user agent called "web-tools". The
problem is that the name is so generic that a search turns up all sorts
of unrelated web tool applications, so I'm not sure what it is.

Also, sometimes I look up a user agent string and simply don't find
any explanation at all.

I'm wondering what to do with unknowns. Should I assume they're good and
list them, or assume they're bad and block them?

As an added issue, some of the user agent strings have funky characters,
like exclamation points and apostrophes. I worry that they might cause
trouble in my .htaccess file. In PHP I know what characters I can allow,
but are there any characters that will cause my .htaccess file to be
unhappy?
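
One way to sidestep the question is to escape each string before pasting
it into the file. The sketch below makes several assumptions: the blocking
rule is a mod_setenvif `SetEnvIf User-Agent` line, the pattern is a
PCRE-style regular expression, and it is wrapped in double quotes so that
spaces do not split it. The helper names and the `bad_bot` variable are
hypothetical. Under those assumptions, apostrophes and exclamation points
are not regex metacharacters and pass through unchanged, though a leading
`!` does negate the pattern in a mod_rewrite `RewriteCond`.

```python
# Characters with special meaning in a PCRE-style pattern
# (outside a character class).
REGEX_META = set(".^$*+?()[]{}|")


def escape_for_htaccess(user_agent: str) -> str:
    """Escape a literal user agent for use inside a double-quoted regex.

    Backslashes and double quotes are written as hex escapes so the
    directive never contains a raw quote or a doubled backslash.
    """
    out = []
    for ch in user_agent:
        if ch == "\\":
            out.append(r"\x5c")      # literal backslash
        elif ch == '"':
            out.append(r"\x22")      # literal double quote
        elif ch in REGEX_META:
            out.append("\\" + ch)    # neutralize the metacharacter
        else:
            out.append(ch)           # !, ', /, space, etc. are literal
    return "".join(out)


def block_line(user_agent: str, env_var: str = "bad_bot") -> str:
    """Build a SetEnvIf line that flags an exact user agent match."""
    return f'SetEnvIf User-Agent "^{escape_for_htaccess(user_agent)}$" {env_var}'


if __name__ == "__main__":
    for ua in ["web-tools", "Yahoo! Slurp", "Mozilla/5.0 (compatible)"]:
        print(block_line(ua))
```

The generated lines only set an environment variable. Denying requests that
carry it would need a separate rule, which is not shown here.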

Any advice would be much appreciated.

-- 
Dave M G

