TRIUMF Grid System Management Documentation

TRIUMF Grid Routine System Maintenance

Besides patch management, other areas of routine maintenance include monitoring and log file management.

Monitoring

Some monitoring pages are available on the web.  The ganglia setup on lcfg tracks cluster performance.  The Grid functional test site monitors LCG sites worldwide; well-behaved sites show up in green.

Log File Management

Log files are normally rotated by logrotate (installed from the logrotate rpm), which cron runs early Sunday morning (4h00) according to the default schedule configured in /etc/logrotate.conf.

Currently, logs rotated by logrotate are kept for 12 weeks.  Log files are also permanently archived; see the section on Back-ups.
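
One way to confirm that the global schedule matches the stated retention is to read the top-level directives out of the logrotate configuration. The following is a minimal sketch, not a full logrotate parser: the function name global_rotation_settings is hypothetical, and the sample text is an assumed illustration of a weekly schedule with 12 rotations, not a copy of the actual /etc/logrotate.conf on these machines.

```python
FREQUENCIES = ("daily", "weekly", "monthly", "yearly")


def global_rotation_settings(conf_text):
    """Return (frequency, rotate_count) from the global section of a
    logrotate configuration, ignoring per-file blocks in braces."""
    frequency = None
    keep = None
    depth = 0  # greater than zero while inside a "{ ... }" block
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            depth += 1
            continue
        if line == "}":
            depth -= 1
            continue
        if depth:
            continue  # per-file override, not part of the global schedule
        words = line.split()
        if words[0] in FREQUENCIES:
            frequency = words[0]
        elif words[0] == "rotate" and len(words) == 2 and words[1].isdigit():
            keep = int(words[1])
    return frequency, keep


if __name__ == "__main__":
    # Assumed sample content, for illustration only.
    sample = (
        "# global defaults\n"
        "weekly\n"
        "rotate 12\n"
        "/var/log/wtmp {\n"
        "    monthly\n"
        "    rotate 1\n"
        "}\n"
    )
    print(global_rotation_settings(sample))
```

On a real machine the function could be given the contents of /etc/logrotate.conf. Files pulled in through an include directive are not followed by this sketch.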

Logwatch

The logwatch utility runs daily via cron.  It tries to pick out the important events from the logs and mails them to root.  User 'trteam@lcfg' also receives a copy of all of root's mail from each machine.  I have been reading the logwatch reports for security concerns and system problems using a locally configured pine setup on lcfg.  This is not a good long-term solution.
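
As one possible step away from reading every report by hand, the reports could be pre-filtered for phrases that usually deserve attention. The sketch below is only an illustration of the idea: the function name flag_reports, the keyword list, and the sample messages are all assumptions, not part of the current setup.

```python
from email.message import EmailMessage

# Hypothetical phrases worth a closer look; adjust to local needs.
KEYWORDS = ("failed password", "authentication failure", "segfault")


def flag_reports(messages, keywords=KEYWORDS):
    """Return (subject, matched_keywords) for each single-part message
    whose body contains at least one keyword (case-insensitive)."""
    flagged = []
    for msg in messages:
        payload = msg.get_payload(decode=True)
        if payload is None:
            continue  # multipart message; not handled in this sketch
        text = payload.decode("utf-8", errors="replace").lower()
        hits = [k for k in keywords if k in text]
        if hits:
            flagged.append((str(msg.get("Subject", "")), hits))
    return flagged


if __name__ == "__main__":
    # Made-up sample reports, for illustration only.
    noisy = EmailMessage()
    noisy["Subject"] = "LogWatch for node1"
    noisy.set_content("sshd: Failed password for root from 10.0.0.1")

    quiet = EmailMessage()
    quiet["Subject"] = "LogWatch for node2"
    quiet.set_content("No unusual activity.")

    for subject, hits in flag_reports([noisy, quiet]):
        print(subject, hits)
```

With a real mailbox, the same function could be fed the messages from mailbox.mbox(path) in the Python standard library; the path depends on how mail for trteam is stored on lcfg.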