Mailing List Archive


Re: tlug: Watching servers: information overload



>>>>> "Darren" == Darren Cook <darren@example.com> writes:

    Darren> Various cron jobs (backups, log rotation, etc.) send emails
    Darren> to say they're done. The problem with this is that if you
    Darren> get the email everything's okay, and if you don't there's a
    Darren> problem. But noticing you've not got an email is hard.

    Darren> So I want to set up two machines (A and B) to watch each
    Darren> other. My plan so far is: A runs its cron job to do the
    Darren> backup. It makes a report file.  B runs a cron job an hour
    Darren> later, to ftp in to A and get that file. If the file is
    Darren> not there, or the date stamp is not today, or the file
    Darren> size is very different from yesterday's file, it sends an
    Darren> email. Otherwise it writes one line in a log file to say
    Darren> everything okay.

    Darren> I want to expand this to checking servers are running
    Darren> every hour, checking disk space is above a certain amount,
    Darren> etc.

    Darren> Does anyone see any flaw in this plan? Do you know of any
    Darren> scripts/programs that can save me some work? Ideally
    Darren> something that will work on both NT and unix.

Darren,

	Sounds very much like one of our products :) In our case we
have a daemon process that runs on each machine and reads in a
number of rules that specify what it should do. These rules get data
from various sources:

	regexes matched against tailed log files
	number of instances of a given process running, memory in use
	various system stats, IO, swap
	file system info
	....

The daemon takes the latest updates and applies them to rules. If the
rule fires, then some action is taken, such as starting a process,
sending mail, paging someone etc. The main idea is that this daemon
handles all the system admin for the machine.
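
To make the rule idea concrete, here is a rough sketch of applying the
latest stats to a set of rules. This is not how our product is written;
the names Rule and apply_rules and the stat keys are made up for
illustration, and I have used Python simply because it also runs on both
NT and unix.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A hypothetical rule: a name, a test on the stats, and an action."""
    name: str
    condition: Callable[[Dict[str, float]], bool]
    action: Callable[[str], None]


def apply_rules(rules: List[Rule], stats: Dict[str, float]) -> List[str]:
    """Run every rule against the latest stats.

    When a rule's condition is true, the rule "fires": its action is
    called with the rule name. Returns the names of the rules that fired.
    """
    fired = []
    for rule in rules:
        if rule.condition(stats):
            rule.action(rule.name)
            fired.append(rule.name)
    return fired
```

An action here could be anything that takes the rule name: send mail,
restart a process, page someone.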

In addition, you can send out a query to all the machines to say "who
is running process x", or "show me all the file systems that are > 90%
full" (most of ours!)
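
If you collect the file system figures from each machine yourself, the
"> 90% full" query is only a filter over what came back. A small sketch,
assuming the reports arrive as a dict of host name to a list of
(mount point, percent used) pairs; that layout and the function name
are my invention:

```python
def filesystems_over(reports, threshold_pct=90.0):
    """Return (host, mount, used_pct) for every file system above the threshold.

    reports is assumed to map host name -> list of (mount, used_pct) pairs.
    Hosts are visited in sorted order so the output is stable.
    """
    hits = []
    for host, filesystems in sorted(reports.items()):
        for mount, used_pct in filesystems:
            if used_pct > threshold_pct:
                hits.append((host, mount, used_pct))
    return hits
```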

The rules can also have attached schedules, so they are only in effect
during certain hours.
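
A schedule check for a home-grown script can be very small. The sketch
below assumes a rule carries a start and end time of day; the function
name is hypothetical. The only subtle part is a window that wraps past
midnight, such as 22:00 to 06:00.

```python
from datetime import time


def in_schedule(now: time, start: time, end: time) -> bool:
    """True if now falls inside the window [start, end).

    If start is later than end, the window is taken to wrap past
    midnight (for example 22:00 to 06:00).
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end
```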

This is a commercial product, and as such, is not free. I would be
happy if you want to buy it, but the point here was to give you some
more ideas for putting some scripts together. 

For portability, you might want to write your scripts in Perl, as it
runs on NT and should give you some access to NT stats. I'm
rather ignorant about NT, so I can't help you much there.
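
As a starting point for the checks you describe, here is a sketch of the
report-file test (missing, not dated today, size very different from
yesterday's) plus a disk space test. It is in Python rather than Perl,
only because that is also portable across NT and unix. It assumes B has
already fetched the file to a local path; the function names, the 50%
size tolerance, and the message wording are placeholders of mine, not
anything standard.

```python
import os
import shutil
from datetime import date


def check_report(path, yesterday_size, tolerance=0.5, today=None):
    """Return a list of problems with the report file (empty list = okay).

    tolerance is the allowed fractional change in size compared with
    yesterday's file; 0.5 (50%) is an arbitrary placeholder.
    """
    today = today or date.today()
    if not os.path.isfile(path):
        return ["report file missing: " + path]

    problems = []
    info = os.stat(path)
    # The "date stamp" is taken to be the file's modification time.
    if date.fromtimestamp(info.st_mtime) != today:
        problems.append("report file is not dated today")
    if yesterday_size > 0:
        change = abs(info.st_size - yesterday_size)
        if change > tolerance * yesterday_size:
            problems.append(
                "report size changed from %d to %d bytes"
                % (yesterday_size, info.st_size)
            )
    return problems


def low_disk(path, min_free_bytes):
    """True if the file system holding path has less free space than wanted."""
    return shutil.disk_usage(path).free < min_free_bytes
```

If check_report returns anything, mail the list of problems; otherwise
write the one "everything okay" line to the log.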

Hope this helps,

	Andy
---------------------------------------------------------------
Next Nomikai: 20 November, 19:30 Tengu TokyoEkiMae 03-3275-3691
Next Meeting: 12 December, 12:30 Tokyo Station Yaesu central gate
---------------------------------------------------------------
Sponsor: PHT, makers of TurboLinux http://www.pht.co.jp

