Tuesday, November 15, 2011

How do I receive an email when my sensor stops receiving traffic?

Recently, I logged into Sguil and noticed that a normally busy sensor had no current alerts.  I looked at the full packet capture logs for the sensor and determined that it hadn't received any traffic from the tap in a while.  We resolved the issue with the tap and started seeing traffic again, but I also resolved to create an automated notification for the next time this happens.

Snort is already writing bandwidth statistics to /nsm/sensor_data/$SENSOR/snort.stats and we are going to use OSSEC to monitor the file and send email when the bandwidth drops to 0.  We could possibly write an OSSEC decoder to have it parse snort.stats directly, but let's instead use OSSEC's process monitoring feature so that we can perhaps extend this in the future to use the Linux kernel's built-in packet counters.  For now, we're going to rely on snort.stats.
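As a rough idea of what the kernel-counter variant mentioned above might look like, here is a minimal sketch. It assumes the standard Linux sysfs counter at /sys/class/net/<interface>/statistics/rx_packets; the function name rx_packets_delta and the STATS_ROOT override are hypothetical names used only for illustration. A result of 0 over a long enough interval would be the rough equivalent of seeing 0.000 in snort.stats.

```shell
#!/bin/sh
# Sketch: print how many packets an interface received during an interval,
# based on the kernel's rx_packets counter under /sys/class/net.
# rx_packets_delta and STATS_ROOT are illustrative names, not part of
# Security Onion or OSSEC.
rx_packets_delta() {
    iface="$1"
    interval="${2:-5}"
    root="${STATS_ROOT:-/sys/class/net}"
    counter="$root/$iface/statistics/rx_packets"

    # Bail out if the interface (or its counter) does not exist.
    [ -r "$counter" ] || { echo "cannot read $counter" >&2; return 1; }

    before=$(cat "$counter")
    sleep "$interval"
    after=$(cat "$counter")

    # Counter wrap-around is not handled in this sketch.
    echo $((after - before))
}
```

For example, "rx_packets_delta eth1 60" would print the number of packets seen on eth1 during the next minute.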

The first thing we need to do is obtain the full path to the snort.stats file(s) by determining the interfaces that are being monitored by Sguil.  We do this by searching /etc/nsm/sensortab for any lines that are not commented out and piping to awk to print just the first column:
grep -v "^#" /etc/nsm/sensortab |awk '{print $1}'
For each of the sensors in the output of the previous command, we want to look at the most recent bandwidth statistics, so we pipe to a while-loop and use "tail -1" on the respective snort.stats file:
grep -v "^#" /etc/nsm/sensortab |awk '{print $1}' |while read SENSOR; do tail -1 /nsm/sensor_data/$SENSOR/snort.stats; done
snort.stats is a CSV file and we only want the third column of data, which holds the wire throughput in Mbits/sec, so we pipe the previous command to cut and tell it the delimiter is a comma and to output the third field:
grep -v "^#" /etc/nsm/sensortab |awk '{print $1}' |while read SENSOR; do tail -1 /nsm/sensor_data/$SENSOR/snort.stats; done |cut -d\, -f3
Here's some sample output for a sensor with two monitored interfaces:
3.481
0.089
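For readability, the same pipeline can also be written as a small shell function. This is only a sketch: the function name last_bandwidth and its two optional arguments are hypothetical, and it adds two small safeguards the one-liner does not have (blank lines in sensortab are ignored, and sensors without a readable snort.stats are skipped).

```shell
#!/bin/sh
# Sketch: print the most recent bandwidth value (third CSV column of
# snort.stats) for each uncommented entry in a sensortab-style file.
# last_bandwidth and its arguments are illustrative names, not part of
# Security Onion or OSSEC.
last_bandwidth() {
    sensortab="${1:-/etc/nsm/sensortab}"
    data_dir="${2:-/nsm/sensor_data}"

    # NF skips blank lines; $1 is the sensor name in the first column.
    grep -v "^#" "$sensortab" | awk 'NF {print $1}' | while read -r SENSOR; do
        stats="$data_dir/$SENSOR/snort.stats"
        # Skip sensors that have not written a stats file yet.
        [ -r "$stats" ] || continue
        tail -n 1 "$stats" | cut -d, -f3
    done
}
```

Called with no arguments, it reads the same paths as the one-liner above.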
We now have a nice single command that OSSEC can run periodically to retrieve the bandwidth of our monitored interfaces.  We add this as a "command" in /var/ossec/etc/ossec.conf and give it an alias of "bandwidth":
  <localfile>
    <log_format>command</log_format>
    <command>grep -v "^#" /etc/nsm/sensortab |awk '{print $1}' |while read SENSOR; do tail -1 /nsm/sensor_data/$SENSOR/snort.stats; done |cut -d\, -f3</command>
    <alias>bandwidth</alias>
  </localfile>
Upon restart, OSSEC will periodically run the command, but won't do anything with the output until we add a rule to tell it what to do.  We add the following rule to /var/ossec/rules/local_rules.xml to alert when the bandwidth value has gone down to 0.000.  The ignore="3600" attribute suppresses repeat alerts from this rule for an hour (3600 seconds) after it fires:
  <rule id="100001" level="7" ignore="3600">
    <if_sid>530</if_sid>
    <match>ossec: output: 'bandwidth':</match>
    <regex>0.000</regex>
    <description>Bandwidth down to 0.000.  Please check interface, cabling, and tap/span!</description>
  </rule>
If we didn't already have OSSEC configured to send email, we could do so by adding the following to the <global> section of /var/ossec/etc/ossec.conf:
    <email_notification>yes</email_notification>
    <email_to>YOUR.USERNAME@YOUR-DOMAIN.COM</email_to>
    <smtp_server>YOUR-SMTP-RELAY.YOUR-DOMAIN.COM</smtp_server>
    <email_from>OSSEC@YOUR-DOMAIN.COM</email_from>
Next, we restart OSSEC to activate the new configuration:
sudo service ossec restart
Finally, we simulate traffic loss and receive an email like the following:
OSSEC HIDS Notification.
2011 Nov 15 06:47:45
Received From: securityonion->bandwidth
Rule: 100001 fired (level 7) -> "Bandwidth down to 0.000.  Please check interface, cabling, and tap/span!"
Portion of the log(s):
ossec: output: 'bandwidth': 0.000
Update: A question over on Google+ prompted the following clarification:
Security Onion has Snort's perfmonitor configured for 300-second intervals by default, which means that the value we're inspecting would be the average traffic for 5 minutes. My deployments have enough constant traffic that 0.000 for 5 minutes is a pretty good indicator of failure. YMMV! 
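
If a link is quiet enough that a single 5-minute average of 0.000 is not a reliable sign of failure, one possible variation is to require several consecutive zero samples before alerting. The following is a sketch only; the function name all_recent_zero and its arguments are hypothetical, and it assumes each line of snort.stats is one perfmonitor sample with the bandwidth in the third column, as above.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) only if the last N samples in a
# snort.stats-style CSV file all report zero in the third column.
# all_recent_zero and its arguments are illustrative names.
all_recent_zero() {
    stats="$1"
    samples="${2:-3}"

    tail -n "$samples" "$stats" | cut -d, -f3 | awk -v want="$samples" '
        # Count values that are exactly zero, such as 0 or 0.000.
        /^0(\.0+)?$/ { zeros++ }
        END {
            # Require a full window of samples, all of them zero.
            if (NR == want && zeros == want) exit 0
            exit 1
        }
    '
}
```

With the default 300-second interval, "all_recent_zero /nsm/sensor_data/$SENSOR/snort.stats 3" would succeed only after roughly 15 minutes without traffic.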
