RRDTool

The intention is to monitor our Postfix installation, which also uses amavisd-new, SpamAssassin and ClamAV, and then to display that information as graphs on a web page. We ended up using a combination of rrdtool, mailgraph and cacti. Most of the testing that resulted in this page was performed on Ubuntu and then ported to Gentoo.


RRDTool Usage


For this particular project, knowing how to use rrdtool is not critical. Mailgraph and cacti will handle that for you. However, a brief overview is certainly useful.

Create an rrd

  1. $ rrdtool create datafile.rrd --step 900 \
  2.     DS:packets:ABSOLUTE:900:0:10000000 \
  3.     RRA:AVERAGE:0.5:1:9600 \
  4.     RRA:AVERAGE:0.5:4:9600 \
  5.     RRA:AVERAGE:0.5:24:6000

Where:
  1. Create an rrd called "datafile.rrd". "--step 900" sets the base interval between data points to 900 seconds (15 minutes); without it rrdtool defaults to a 300-second step.
  2. This rrd has one Data Source (DS), called "packets", but additional data sources could be specified.
    "ABSOLUTE" is the data source type. COMPUTE, COUNTER, DERIVE and GAUGE are the other options.
    "900" sets the heartbeat: the maximum number of seconds allowed between updates before the value is recorded as unknown.
    "0:10000000" defines the minimum and maximum valid values that can be entered.
  3. This is the first of three Round Robin Archives (RRA) inside the rrd.
    "AVERAGE" stores the average of the data points consolidated into each row. MIN, MAX and LAST are also options.
    "0.5" defines the xfiles factor (xff): the fraction of data points in a consolidation interval that may be unknown while the consolidated value still counts as known. It can be set from 0 to just under 1.
    "1" sets the steps to 1, so each row is the actual data point taken every 15 minutes.
    "9600" sets the number of rows to store. 9600 rows at 15-minute intervals is 100 days of data.
  4. The second RRA in our rrd. It stores the average of 4 reporting periods (1 hour) in 9600 rows, for 400 days of data at 1-hour resolution.
  5. The third and final RRA in our rrd. It stores the average of 24 reporting periods (6 hours) in 6000 rows, for 1500 days of data at 6-hour resolution.
For more information see 'man rrdcreate'.
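The retention figures quoted for each RRA come from multiplying the step, the steps per row and the row count. A minimal shell sketch of that arithmetic, assuming a 900-second step; the helper name rra_days is made up for illustration and is not part of rrdtool:

```shell
# rra_days STEP_SECONDS STEPS_PER_ROW ROWS
# Prints how many whole days of history one RRA can hold.
# rra_days is a hypothetical helper, not an rrdtool command.
rra_days() {
    echo $(( $1 * $2 * $3 / 86400 ))
}

rra_days 900 1 9600    # first RRA:  15-minute rows
rra_days 900 4 9600    # second RRA: 1-hour rows
rra_days 900 24 6000   # third RRA:  6-hour rows
```

Changing the first argument shows how the same RRA definitions would cover less history if the rrd were created with a shorter step.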

Update your rrd

$ rrdtool update datafile.rrd N:123456

Values must always be entered in chronological order. The letter "N" stands for "now" and timestamps the entry with the current time, but a Unix time (seconds since 1 January 1970 UTC) could also be used. Following the timestamp is the data, "123456", which is entered into the first data source in the rrd file (packets). If we had created additional data sources in our rrd, their values would follow, delimited by colons (:). For more information see 'man rrdupdate'.
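As a rough illustration of the colon-delimited update argument, the sketch below builds one from an explicit Unix timestamp and two values. The helper rrd_update_arg, the timestamp and the second data source are all assumptions made for the example; the rrd created above has only "packets".

```shell
# rrd_update_arg TIMESTAMP VALUE...
# Joins a timestamp and one value per data source with colons, in the
# order the DS entries were given to "rrdtool create".
# rrd_update_arg is a hypothetical helper, not part of rrdtool.
rrd_update_arg() {
    arg=$1
    shift
    for value in "$@"; do
        arg="$arg:$value"
    done
    printf '%s\n' "$arg"
}

# Explicit Unix timestamp, then the value for "packets" and for a second,
# hypothetical data source. Only the command line is printed, not run.
echo "rrdtool update datafile.rrd $(rrd_update_arg 1300000000 123456 789)"
```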

Create a graph from your rrd

  1. $ rrdtool graph graph.png \
  2.     DEF:pkt=datafile.rrd:packets:AVERAGE \
  3.     LINE1:pkt#ff0000:Packets

Where:
  1. Create a graph file named "graph.png".
  2. "DEF" defines a variable named "pkt" to be plotted on the graph. "datafile.rrd:packets" names the source rrd file and the data source inside it. "AVERAGE" is the consolidation method; MIN, MAX and LAST are also options. If using multiple rrd files as sources, simply include more DEF lines.
  3. "LINE1" is the type of graph to be drawn. Other options are LINE2, LINE3, AREA, STACK, etcetera.
    "pkt" is the variable in the graph to be plotted, as defined in the DEF statement.
    "#ff0000" is the color to use (red).
    "Packets" is the name to use in the graph's legend.
The time range can also be specified with the "--start" option ("--start -1w" gives a graph of the last week, for example); by default rrdtool graphs the last day ("1d"). For more information see 'man rrdgraph'.

View rrd data/info

The rrdtool fetch command prints the stored values for a time range:

$ rrdtool fetch /var/lib/mailgraph/mailgraph.rrd -s now-1year -e now AVERAGE

For more information see 'man rrdfetch'.

The rrdtool info command will print the header information of an rrd:

$ rrdtool info datafile.rrd

The first command below dumps the contents of the rrd in XML format. The second restores the rrd from the XML file created by the first:

$ rrdtool dump /var/lib/mailgraph/mailgraph.rrd > mailgraph.xml
$ rrdtool restore mailgraph.xml /var/lib/mailgraph/mailgraph.rrd -r

dsreport is a Perl script that prints DS values from an RRDTool database in tabular format instead of the XML above. Also see noc.hep.wisc.edu.
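rrdtool fetch prints one "timestamp: value ..." row per interval, with "nan" where nothing was recorded. A rough sketch of filtering that output down to the rows that hold data, using awk on a made-up sample; the column names, timestamps, values and the known_rows helper are illustrative and not taken from a real mailgraph.rrd:

```shell
# Made-up sample in the general shape of "rrdtool fetch" output.
sample='                           sent                recv

 1300000000: 1.2000000000e+00 3.4000000000e+00
 1300000300: nan nan
 1300000600: 5.0000000000e-01 nan'

# known_rows: keep data rows in which at least one value is not "nan".
# Hypothetical helper; the header and blank lines are dropped as well.
known_rows() {
    awk '$1 ~ /^[0-9]+:$/ { for (i = 2; i <= NF; i++) if ($i !~ /nan/) { print; next } }'
}

printf '%s\n' "$sample" | known_rows
```

With a real rrd, the fetch command itself would be piped into known_rows in place of the sample.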



MailGraph


Mailgraph is a simple mail statistics RRDtool frontend in perl for Postfix and Sendmail that produces daily, weekly, monthly and yearly graphs of received/sent and bounced/rejected mail.

Install mailgraph with the first command (1) below on Ubuntu (Debian, etc.) and the second (2) on Gentoo:
(1) apt-get install mailgraph
(2) emerge -av mailgraph

If you have integrated a content filter like amavisd (for spam and virus scanning) into Postfix like we do, then answer "No" to the "Count incoming mail as outgoing mail?" question during installation. This will prevent Mailgraph from counting email twice (because Postfix delivers emails to amavisd which then - after successful scanning - delivers the mails back to Postfix). If you don't use a content filter, then answer "Yes".

To reconfigure this, run:

# dpkg-reconfigure mailgraph

Also see the process_line function in /usr/sbin/mailgraph for mail log line filtering.

Because we use a central loghost for all servers, the path to the log files had to be updated in /etc/default/mailgraph. /usr/lib/cgi-bin/mailgraph.cgi also needed to be updated with the correct hostname.

Gentoo uses different paths, which are described in the table below.

        Ubuntu                                   Gentoo
  conf  /etc/default/mailgraph                   /etc/conf.d/mailgraph
  cgi   /usr/lib/cgi-bin/mailgraph.cgi           /usr/share/webapps/mailgraph/1.14/hostroot/cgi-bin/mailgraph.cgi
  exe   /usr/sbin/mailgraph                      /usr/bin/mailgraph
  init  /etc/init.d/mailgraph
  rrd   /var/lib/mailgraph/mailgraph.rrd
        /var/lib/mailgraph/mailgraph_virus.rrd

Because I was testing on a VM instead of on the log server, I initially rsynced (1) the logs over periodically. Later I mounted the log directory with sshfs (2), which gave much better results in cacti:
(1) rsync -avz -e ssh user@server:/var/log/mail.log /var/log/server/
(2) sshfs server:/var/log /var/log/server
(unmount with: fusermount -u /var/log/server)

Mailgraph was almost perfect, and the web page showing the mail graphs, /usr/lib/cgi-bin/mailgraph.cgi, could very easily have been customized to blend in with our existing management pages. For versatility, however, we decided to show this data in cacti.

Mailgraph with Multiple Log File Support

Based largely on Jacob Emcken's modifications to support multiple log files in mailgraph.

  1. Add a VIRUS_LOG definition to /etc/default/mailgraph:
    VIRUS_LOG=/var/logs/saffron/mail.log
    Note: The variable definitions are all in /etc/conf.d/mailgraph on Gentoo, as opposed to being split between the conf file and the init script on Ubuntu.

  2. Update the mailgraph init script at /etc/init.d/mailgraph so that it starts a second daemon when a virus log is configured:
    #!/bin/sh
    MAILGRAPH_CONFIG="/etc/default/mailgraph"
    NAME="mailgraph"
    DAEMON="/usr/sbin/mailgraph"
    PID_FILE="/var/run/mailgraph.pid"
    PID_VIRUS_FILE="/var/run/mailgraph_virus.pid"
    RRD_DIR="/var/lib/mailgraph"
    IGNORE_OPTION=""
    [...]
    case "$1" in
      start)
        echo -n "Starting Postfix Mail Statistics: $NAME"
        # Quoted so that an unset VIRUS_LOG fails the test instead of passing it
        if [ -f "$VIRUS_LOG" ]; then
          # One daemon per log file: one for the mail rrd, one for the virus rrd
          start-stop-daemon -S -q -b -p $PID_FILE -x $DAEMON -- --daemon-pid=$PID_FILE \
            --only-mail-rrd -l $MAIL_LOG -d --daemon_rrd=$RRD_DIR $IGNORE_OPTION
          start-stop-daemon -S -q -b -p $PID_VIRUS_FILE -x $DAEMON -- --daemon-pid=$PID_VIRUS_FILE \
            --only-virus-rrd -l $VIRUS_LOG -d --daemon_rrd=$RRD_DIR $IGNORE_OPTION
        else
          start-stop-daemon -S -q -b -p $PID_FILE -x $DAEMON -- -l $MAIL_LOG -d \
            --daemon_rrd=$RRD_DIR $IGNORE_OPTION
        fi
        echo "."
        ;;
      stop)
        echo -n "Stopping Postfix Mail Statistics: $NAME"
        if [ -f $PID_FILE ]; then
          kill `cat $PID_FILE`
          rm $PID_FILE
        fi
        if [ -f $PID_VIRUS_FILE ]; then
          kill `cat $PID_VIRUS_FILE`
          rm $PID_VIRUS_FILE
        fi
        echo "."
        ;;
    [...]
  3. Edit the process_line function in /usr/sbin/mailgraph to support clamav lines:
    elsif($prog eq 'clamd') {
        if($text =~ /.* FOUND$/) {
            event($time, 'virus');
        }
    }
  4. I also added some tweaks for amavisd-mx, amavisd-milter[-d], sophie, etc.
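
Before patching process_line, it can help to check how many lines in a log the clamd rule would actually count. The sketch below applies the same " FOUND" end-of-line test with grep to a few made-up log lines; the sample text is an assumption about the general shape of such lines, and real clamd output may differ.

```shell
# Made-up log lines in the shape the regex expects; real clamd output may differ.
sample_log='Apr 14 10:00:01 mail clamd[1234]: /var/spool/amavis/tmp/p001: Eicar-Test-Signature FOUND
Apr 14 10:00:02 mail clamd[1234]: /var/spool/amavis/tmp/p002: OK
Apr 14 10:00:03 mail postfix/smtpd[2345]: connect from unknown[192.0.2.1]'

# Count the clamd lines that the /.* FOUND$/ test would treat as a virus event.
virus_count=$(printf '%s\n' "$sample_log" | grep 'clamd\[' | grep -c ' FOUND$')
echo "virus events: $virus_count"
```

Pointing the same pipeline at the real VIRUS_LOG file gives a quick sanity check against the numbers mailgraph later reports.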

For details on setting up mailgraph specific to Gentoo click here.



Cacti


Cacti is a complete network graphing solution designed to harness the power of RRDTool's data storage and graphing functionality. We would like the mailgraph data to be available from cacti.

  1. SNMP Method to add Mailgraph data to Cacti
  2. Configure Cacti to read Mailgraph RRDs Directly
  3. Cacti Links

1. Importing Mailgraph data into Cacti via SNMP

Initially, the easiest way I found to get Mailgraph data into cacti was Curu Wong's Cacti graph template for mail server monitoring via SNMP. Configuration instructions are here, but these are the highlights:

# wget http://www.pineapple.com.hk/blog/curu/wp-content/uploads/2010/12/mailstat_cacti_template_v1.1.zip
# unzip mailstat_cacti_template_v1.1.zip
# patch -b /usr/sbin/mailgraph mailgraph.patch
# mv -i mail_stat.pl /usr/share/cacti/site/scripts/
# /etc/init.d/mailgraph restart

After some time, depending on the size of your maillog, /var/tmp/mailstat should be created.

Then add the following two lines to /etc/snmp/snmpd.conf, making sure that this is the first exec rule in the file:

#export mail statistics info from mailgraph
exec mailstat /bin/cat /var/tmp/mailstat

Uncomment the "com2sec readonly default public" line, save and restart snmpd.

Note: You need snmpd and Net-SNMP perl module for this:

# apt-get install snmpd libnet-snmp-perl

Finally, import the cacti_graph_template_postfix_[...].xml templates via Console > Import Templates on the cacti web interface.

Adding Multiple Log File Support to SNMP Method

If you enabled multiple log file support in mailgraph (above), you'll see that your cacti graphs start to show a lot of hard peaks and valleys while the mailgraph page appears correct. This is because you are running two instances of the mailgraph daemon, one for virus data and one for mail data, and with Curu Wong's modifications each instance rewrites /var/tmp/mailstat with only the data from its own rrd.

There are a number of approaches I could take to fix this:
  1. Edit /usr/share/cacti/site/scripts/mail_stat.pl to simply read the existing mailgraph rrd files and report the correct data. As all this is running on the local machine, it would also allow me to bypass SNMP completely.
  2. Edit the update_stat function in /usr/sbin/mailgraph to correctly handle the data (line 910-ish). Here's a quick and dirty example:
    sub update_stat()
    {
        my $stat_file = '/var/tmp/mailstat';
        open (my $stath, "+<", $stat_file) or die "unable to open $stat_file to write mail statistic $!";
        flock($stath, 2);    # 2 = LOCK_EX
        my $line=<$stath>;
        chomp($line);
        my %old_stat = ();
        %old_stat =split /:| /, $line;
        # Carry over the counters that the other daemon instance owns
        if ( $opt{'only-virus-rrd'} ) {
            $sum_stat{sent} = $old_stat{sent};
            $sum_stat{received} = $old_stat{received};
            $sum_stat{bounced} = $old_stat{bounced};
            $sum_stat{rejected} = $old_stat{rejected};
        }
        if ( $opt{'only-mail-rrd'} ) {
            $sum_stat{virus} = $old_stat{virus};
            $sum_stat{spam} = $old_stat{spam};
        }
        seek($stath, 0, SEEK_SET); # Move to beginning of file (SEEK_SET needs "use Fcntl qw(:seek);")
        print $stath "sent:$sum_stat{sent} received:$sum_stat{received} bounced:$sum_stat{bounced} [...]\n";
        #truncate($stath, newSizeHere);
        flock($stath, 8);    # 8 = LOCK_UN
        close($stath);
        return 1;
    }
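
The update_stat example relies on the stat file being a single line of "key:value" pairs separated by spaces. A small shell sketch of reading one counter back out of such a line, which is handy when checking what each daemon instance last wrote; the counter values are made up, the fields after "bounced" are assumed from the hash keys used in the Perl code, and stat_value is a hypothetical helper:

```shell
# Made-up counters in the "key:value key:value" layout that update_stat writes.
line='sent:10 received:25 bounced:1 rejected:7 virus:2 spam:4'

# stat_value KEY: print the counter stored under KEY, splitting on ":" and " "
# the same way the Perl split /:| / does.
stat_value() {
    printf '%s\n' "$line" | awk -F'[: ]' -v key="$1" \
        '{ for (i = 1; i < NF; i += 2) if ($i == key) print $(i + 1) }'
}

stat_value received
stat_value virus
```

On a live system, line would be read from /var/tmp/mailstat instead of being set by hand.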

2. Importing Mailgraph RRD Directly into Cacti

The SNMP method described above has some uses, but I would prefer to use the mailgraph rrds directly in Cacti. A method to do so is described in detail here. This method requires much less hacking and is more stable on my system.

3. Cacti Links


Collectd


collectd is a daemon which collects system performance statistics periodically and provides mechanisms to store the values in a variety of ways, for example in RRD files.

2. Collectd Links


Woes


Date 4/13/2011
Problem Cannot log on to cacti 0.8.7e from the web interface on the first attempt. Do not know the username or password.
Resolution The default administrative logon is "admin" for both the username and the password. The first time you log in, it will ask you to change the password.

To reset the admin account password back to the default of "admin", connect to your Cacti database from the command line and update the user_auth table:

host# mysql -u root -p -D cacti
mysql> update user_auth set password=md5('admin') where username='admin';
 
Date 4/13/2011
Problem Running snmpwalk while following the instructions to install Curu Wong's Postfix/Mailgraph modifications for Cacti 0.8.7e returned ".1.3.6.1.4.1.2021.8 = No more variables left in this MIB View (It is past the end of the MIB tree)"
Resolution Edit /etc/snmp/snmpd.conf to change com2sec from paranoid to readonly (or change the group definitions) and restart snmpd:
#com2sec paranoid default public
com2sec readonly default public
#com2sec readwrite default private
Running "snmpwalk -On -v2c -c public localhost ucdavis.extTable" then produced the expected results.
 
Date 4/14/2011
Problem Importing Curu Wong's postfix templates into Cacti 0.8.7e produces an "Error: XML: Hash version does not exist." error.
Resolution Cacti refuses to import templates that were exported from a newer version into an older version, for compatibility.

The version of the template is indicated by the "<hash_100021[...]" string, where the first 2 digits indicate the type of template, the next 4 indicate the cacti version it was created on and the remaining 32 are a random number. The template was created on Cacti 0.8.7g, as shown below:

"0.8.7e" => "0019"
"0.8.7f" => "0020"
"0.8.7g" => "0021"
Replacing the "0021" with "0019" in all hash tags allowed the template to be successfully imported.
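The replacement can be done in one pass with sed rather than by hand. This is a minimal sketch on two made-up hash tags; the 32-character ids are invented, and the pattern assumes the version field always sits directly after the 2-digit type, as described above.

```shell
# Two made-up hash tags in the 0.8.7g layout: 2-digit type, 4-digit version, 32-character id.
template='<hash_1000210123456789abcdef0123456789abcdef>
<hash_0800210123456789abcdef0123456789abcdef>'

# Rewrite only the 4-digit version field that follows the 2-digit type.
downgraded=$(printf '%s\n' "$template" | sed 's/\(hash_[0-9][0-9]\)0021/\10019/g')
printf '%s\n' "$downgraded"
```

For a real template, the same sed expression would be run against a copy of the exported XML file, keeping the original in case the import still fails.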
Links How to determine a Cacti template version
 
Date 4/14/2011
Problem rrd files are not being created in /var/lib/cacti/rra/, graphs are empty and /var/log/cacti/cacti.log (or System Utilities > View Cacti Log File) shows the following log entries (date and time removed) in Cacti 0.8.7e with Curu Wong's modifications (DS[14]):
POLLER: Poller[0] WARNING: Poller Output Table not Empty. Issues Found: 1, Data Sources: (DS[14])
CMDPHP: Poller[0] Host[1] DS[14] WARNING: Result from CMD not valid. Partial Result: ERROR: Unknown or in
Note: Slightly more detail is available by executing: php /usr/share/cacti/site/poller.php
Resolution Attempts/Steps Taken:
  1. Executing "perl /usr/share/cacti/site/scripts/mail_stat.pl localhost 161 1 public:username:password:auth_proto:priv_passwd:priv_proto" produced expected results. Added these parameters to Cacti via Data Input Methods > SNMP - Get Mail Statistics and Data Templates > SNMP - Mail Statistics with no improvement.
  2. Manually executed the rrdtool command shown in Data Sources > Localhost - SNMP mailstat and an empty graph appeared in the Cacti web interface (the rrdtool command in Graph Management > Localhost - Mail Statistics - msgs/min produces the .png image). No improvement otherwise: the rrd file is never updated.
  3. Replaced the original Input String with the modified string below in Data Input Methods > SNMP - Get Mail Statistics:
    Original:
    Modified:
    The mail_stat.pl script was then executed, the mtime of localhost_received_14.rrd was updated and, after a few minutes, the graph started showing the expected data.
Links I just installed Cacti and all of my graphs appear as broken images.
How can you debug a Data Input Method script in Cacti?
 
Date 7/8/2011
Problem Mailgraph reports stop updating. The mailgraph process will not respond to "kill -9". Attempting to remount the log mount point via sshfs returns:
fuse: bad mount point `/var/logs/': Transport endpoint is not connected
Resolution Normally, when the mail graphs in cacti stop updating, a simple mailgraph restart is all that is required. This situation was slightly more complicated. The following is not pretty, but it resolved the issue:
# kill <sshfs_pid>
# umount -l /var/logs
umount2: Device or resource busy
[...]
# lsof | grep '/var/logs'
# kill <pid_from_lsof>
# ps aux | grep mailgraph
# kill <pid_from_ps>
# sshfs root@kaylee:/var/logs/ /var/logs/
# /etc/init.d/mailgraph start
 
Date 10/21/2011
Problem Mailgraph is running on server_1. After mounting the mailgraph rrds onto server_2 via sshfs, mailgraph.cgi on server_1 could no longer access the rrds. Here are some sample lines from /var/log/apache2/error_log on server_1:
[date] [error] [client ip_address] Premature end of script headers: mailgraph.cgi, referer: http://server_1/cgi-bin/mailgraph.cgi
[date] [error] [client ip_address] ERROR: opening '/var/lib/mailgraph/mailgraph.rrd': Permission denied, referer: http://server_1/cgi-bin/mailgraph.cgi
Resolution It appears that the permissions were changed while mounted to server_2. Fixed on server_1:
# chown apache:apache /var/lib/mailgraph