Nagios Log File Monitoring: Monitoring log files with Nagios can be just as difficult as it is with any other monitoring application. However, once you have a log monitoring script or tool that can monitor a specific log file the way you want it monitored, Nagios can be relied upon to handle the rest. This versatility is what makes Nagios one of the most popular and user-friendly monitoring applications available. It can be used to monitor virtually anything a script can check. Personally, I love it. It has no equal!
My name is Jacob Bowman and I work as a Nagios Monitoring specialist. Given the number of requests I receive at my job to monitor log files, I've come to realize that log file monitoring is a big deal. IT departments have an ongoing need to monitor their UNIX log files so that application or system issues can be caught in time. When issues are caught early, unplanned outages can often be avoided.
But a common question is: what monitoring application is available that can effectively monitor a log file? The plain answer to this question is NONE! The log monitoring applications that do exist require far too much configuration, which in effect renders them not worth considering.
Log monitoring should allow for pluggable arguments on the command line (instead of in separate config files) and should be very easy for the average UNIX user to understand and use. Most log monitoring tools are not like this. They are often complex and require time to get familiar with (through reading endless pages of installation setups). In my opinion, this is unnecessary trouble that can and should be avoided.
Again, I strongly believe, in order to be efficient, one must be able to run a program directly from the command line without needing to go elsewhere to edit config files.
So the best solution, in most cases, is to either write a log monitoring tool for your particular needs or download a log monitoring program that has already been written for your type of UNIX environment.
Once you have that log monitoring tool, you can hand it to Nagios, and Nagios will schedule it to run at regular intervals. If a run finds the issues, patterns, or strings that you told it to watch for, Nagios will raise an alert and send notifications to whomever you want them sent to.
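To make that hand-off concrete, here is a minimal sketch of the kind of script Nagios can schedule. It is written in Python purely for illustration and is not logrobot; the names check_log and main are hypothetical. It relies only on the standard Nagios plugin convention that the exit code signals the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and that the printed line is the status text.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_log(path, patterns):
    """Return (exit_code, message) for a hypothetical log check.

    Assumption: a plain, case-sensitive substring match is enough.
    """
    try:
        with open(path, errors="replace") as handle:
            hits = sum(1 for line in handle if any(p in line for p in patterns))
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN - cannot read {path}: {exc}"
    if hits:
        return CRITICAL, f"CRITICAL - {hits} matching line(s) in {path}"
    return OK, f"OK - no matching lines in {path}"


def main(argv):
    """Parse LOGFILE PATTERN [PATTERN ...], print the status line, return the exit code."""
    if len(argv) < 2:
        print("UNKNOWN - usage: check_log.py LOGFILE PATTERN [PATTERN ...]")
        return UNKNOWN
    code, message = check_log(argv[0], argv[1:])
    print(message)
    return code


# When saved as a standalone script, the entry point would be:
#     sys.exit(main(sys.argv[1:]))
```

Nagios only needs the printed line and the exit code, so anything that follows this convention, in any language, can be scheduled the same way.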
But then you wonder, what type of log monitoring tool should you write or download for your environment?
The log monitoring program that you obtain to monitor your production log files should be as simple as the example below while still remaining powerfully versatile:
Example: logrobot /var/log/messages 60 'error' 'panic' 5 10 -foundn
Output: 2 — 1380 — 352 — ATWF — (Mar/1) – (16:15) — (Mar/1) – (17:15:00)
The "-foundn" option searches /var/log/messages for the strings "error" and "panic". Depending on what it finds, it exits with 0 (for OK), 1 (for WARNING), or 2 (for CRITICAL). Each time you run that command, it prints a one-line statistics report similar to the Output above. The fields are delimited by "—".
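The command line above carries two numbers, 5 and 10, that this article does not explain. If we assume they are the warning and critical thresholds for the number of matches (an assumption, not something documented here), the exit code could be derived roughly like this. The function name status_for_count is hypothetical.

```python
def status_for_count(count, warning, critical):
    """Map a match count to a Nagios exit code.

    Assumption: 'warning' and 'critical' are minimum match counts,
    which is one plausible reading of the 5 and 10 on the command line.
    """
    if count >= critical:
        return 2  # CRITICAL
    if count >= warning:
        return 1  # WARNING
    return 0      # OK


print(status_for_count(352, 5, 10))  # prints 2
```

Under that reading, 352 matches against thresholds of 5 and 10 gives 2, which agrees with the first field of the sample Output.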
1st field is 2 = the status, which here means CRITICAL.
2nd field is 1380 = the number of seconds since the strings you specified last occurred in the log.
3rd field is 352 = there were 352 occurrences of the strings "error" and "panic" found in the log within the last 60 minutes.
4th field is ATWF = do not worry about this for now; it is irrelevant here.
5th and 6th fields = the log file was searched from (Mar/1) – (16:15) to (Mar/1) – (17:15:00). Within that time frame, 352 occurrences of "error" and "panic" were found.
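If you want to feed that one-line report into another script, the field layout described above can be split apart with a few lines of Python. This is only a sketch: the Report type, its field names, and parse_report are hypothetical, and the delimiter is taken from the sample Output as printed in this article, so adjust it if your copy of the tool prints something different.

```python
from typing import NamedTuple

# Field separator as printed in the sample Output above (an assumption
# about the real tool; change it if your output differs).
DELIMITER = "\u2014"


class Report(NamedTuple):
    status: int                    # 0 OK, 1 WARNING, 2 CRITICAL
    seconds_since_last_match: int  # 2nd field
    match_count: int               # 3rd field
    flag: str                      # 4th field, left uninterpreted
    window_start: str              # 5th field, kept as raw text
    window_end: str                # 6th field, kept as raw text


def parse_report(line, delimiter=DELIMITER):
    """Split a six-field report line into a Report."""
    fields = [field.strip() for field in line.split(delimiter)]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    status, age, count, flag, start, end = fields
    return Report(int(status), int(age), int(count), flag, start, end)


# Rebuild the sample Output line from its fields and parse it back.
sample = f" {DELIMITER} ".join(
    ["2", "1380", "352", "ATWF", "(Mar/1) \u2013 (16:15)", "(Mar/1) \u2013 (17:15:00)"]
)
report = parse_report(sample)
print(report.status, report.match_count)  # prints: 2 352
```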
If you would actually like to see all 352 occurrences, you can run the below command and pass the "-show" option to the logrobot tool. This will output to the screen all matching lines in the log that contain the strings you specified and that were written to the log within the last 60 minutes.
Example: logrobot /var/log/messages 60 'error' 'panic' 5 10 -show
The "-show" option prints every line in the log file that contains the "error" and "panic" strings and falls within the 60-minute time frame you specified. Of course, you can always change the parameters to fit your particular needs.
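For readers who would rather write their own tool, the idea behind "-show" (keep only recent lines that contain one of the strings) can be sketched as follows. This is not how logrobot is implemented. It assumes the classic syslog timestamp at the start of each line (for example "Mar  1 16:30:00"), an English locale for month names, and a case-sensitive match on any one of the strings. The name recent_matches is hypothetical.

```python
from datetime import datetime, timedelta


def recent_matches(lines, patterns, minutes, now):
    """Return lines stamped within the last `minutes` that contain any pattern."""
    cutoff = now - timedelta(minutes=minutes)
    found = []
    for line in lines:
        try:
            # Syslog timestamps carry no year, so borrow the current one.
            stamp = datetime.strptime(
                f"{now.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
            if stamp > now:
                # A "future" stamp most likely belongs to the previous year.
                stamp = stamp.replace(year=now.year - 1)
        except ValueError:
            continue  # no usable timestamp at the start of the line
        if stamp >= cutoff and any(p in line for p in patterns):
            found.append(line.rstrip("\n"))
    return found


log_lines = [
    "Mar  1 15:00:00 host kernel: panic, but too old",
    "Mar  1 16:30:00 host app: error opening file",
    "Mar  1 17:00:00 host app: all fine",
]
now = datetime(2024, 3, 1, 17, 15, 0)
for hit in recent_matches(log_lines, ["error", "panic"], 60, now):
    print(hit)  # prints only the 16:30:00 line
```

Passing `now` in explicitly keeps the function easy to test; a real script would use `datetime.now()` and read the lines from the log file.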
With this Nagios log monitoring tool (logrobot), you can do things that the big-name monitoring applications cannot come close to doing.
Once you write or download a log monitoring script or tool like the one above, you can have Nagios or cron run it on a regular basis, which in turn lets you keep a bird's eye view of all the logged activity on your important servers.
Do you have to use Nagios to run it on a regular basis? Absolutely not. You can use whatever you want.