Nagios XI 2014-R1.0RC3 brief installation testing May 2014
This report contains details of testing Nagios XI 2014R1.0RC3 on a small network in Australia. Since it was written, the full Nagios XI release has been announced, and our testing of it was published here: Nagios XI 2014-R1 brief installation testing May 2014
Test network scenario for a customer:
- Main server is CentOS 6.5 on the x86_64 platform.
- Installed Nagios XI 2014R1.0RC3 with a Standard Edition license.
- The server runs four KVM virtual machines: Solaris 11.1, OpenSUSE 13.1, Oracle Linux 6.5 and Ubuntu 14.04.
- Memory footprint: for simple monitoring of four small virtual machines (each with one network interface and 1 GB of RAM) and one host server, Nagios XI initially seemed to grab a lot of memory. Further investigation showed that Nagios itself uses quite little memory. This was good news, as scalability in large networks is one of my evaluation goals.
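One way to put a number on "quite little" is to add up the resident set size (RSS) of the processes owned by the nagios user (npcd, for example, runs as nagios). The sketch below assumes the procps `ps` on CentOS 6, whose `-o rss=` column is in kilobytes. The function name `rss_total_mb` is hypothetical. PostgreSQL and Apache run under their own accounts, so they would need separate counts.

```shell
#!/usr/bin/env bash
# rss_total_mb: read RSS values in kilobytes (one per line, as printed by
# "ps -o rss=") on stdin and print their sum in megabytes, one decimal place.
# Hypothetical helper, not part of Nagios XI.
rss_total_mb() {
    awk '{ kb += $1 } END { printf "%.1f\n", kb / 1024 }'
}

# Usage on the CentOS 6 host (procps ps):
#   ps -u nagios -o rss= | rss_total_mb
```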
The following non-standard services and processes were enabled on the host server when Nagios XI was installed:
    nagios, nagiosxi, ndo2db, nrpe, postgresql, mrtg
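On CentOS 6, `chkconfig --list` shows which of these SysV services start at boot. MRTG will not appear there because it runs from a cron entry rather than as an init service. The sketch below filters that listing for one runlevel. It assumes the usual layout, in which a service name is followed by seven N:on/N:off columns. The function name `enabled_in_runlevel` is hypothetical.

```shell
#!/usr/bin/env bash
# enabled_in_runlevel LEVEL: read "chkconfig --list" output on stdin and print
# the names of SysV services switched on in runlevel LEVEL.
# Header lines and xinetd-based entries are skipped because they do not have
# the eight columns of a SysV service line. Hypothetical helper.
enabled_in_runlevel() {
    awk -v want="$1:on" '
        NF == 8 {
            for (i = 2; i <= 8; i++)
                if ($i == want) { print $1; break }
        }'
}

# Usage: chkconfig --list | enabled_in_runlevel 3
```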
- Straight after the installation, I started getting a lot of emails from the MRTG cron job. They contained the following warnings:
    Subroutine SNMP_Session::pack_sockaddr_in6 redefined at /usr/local/share/perl5/Exporter.pm line 66.
        at /usr/bin/../lib/mrtg2/SNMP_Session.pm line 149.
    Subroutine SNMPv1_Session::pack_sockaddr_in6 redefined at /usr/local/share/perl5/Exporter.pm line 66.
        at /usr/bin/../lib/mrtg2/SNMP_Session.pm line 604.
To eliminate them, I modified the cron job to send its standard error to /dev/null:
    */5 * * * * root LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok 2>/dev/null
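Redirecting 2>/dev/null also hides any genuine MRTG errors. A narrower alternative is to drop only the known Perl redefinition warnings and let everything else through to the cron mail. The sketch below assumes that each warning consists of the "Subroutine ... redefined" line, optionally followed by an indented "at .../SNMP_Session.pm line N." continuation line. The function name `filter_mrtg_noise` is hypothetical.

```shell
#!/usr/bin/env bash
# filter_mrtg_noise: copy stdin to stdout, dropping only the
# "pack_sockaddr_in6 redefined" warnings and their "at ... SNMP_Session.pm"
# continuation lines. Any other message (e.g. a real MRTG error) is kept.
# Hypothetical helper.
filter_mrtg_noise() {
    awk '
        /Subroutine .*pack_sockaddr_in6 redefined/ { skip = 1; next }
        skip && /^[ \t]*at .*SNMP_Session\.pm line [0-9]+\.?$/ { skip = 0; next }
        { skip = 0; print }
    '
}
```

To use it from the /etc/cron.d entry, the function would have to live in a small wrapper script (hypothetical) that runs `/usr/bin/mrtg "$@" 2>&1 | filter_mrtg_noise`. Both stdout and stderr then pass through the filter, and cron mails only what is left.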
- The Nagios XI installation completely replaced /etc/sysconfig/iptables on the Linux server. It was necessary to revert to the original file and add the lines needed to support Nagios:
    # SNMP
    -A INPUT -m state --state NEW -m udp -p udp --dport 161 -j ACCEPT
    # SNMP Traps
    -A INPUT -m state --state NEW -m udp -p udp --dport 162 -j ACCEPT
    # NRPE
    -A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT
    # NSCA
    -A INPUT -m state --state NEW -m tcp -p tcp --dport 5667 -j ACCEPT
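The installer overwrote this file once, so it may be worth re-checking after upgrades that these four rules are still present. They must also sit before the default CentOS catch-all "-A INPUT -j REJECT" rule, because rules after it never match. The sketch below assumes the iptables-save layout used by /etc/sysconfig/iptables. The function name `check_nagios_ports` is hypothetical.

```shell
#!/usr/bin/env bash
# check_nagios_ports FILE
# Hypothetical helper: for each Nagios-related port (SNMP, SNMP traps, NRPE,
# NSCA), report it if FILE (iptables-save format) has no matching
# "-A INPUT ... -j ACCEPT" rule, or if that rule comes after the first
# catch-all "-A INPUT -j REJECT" line. Returns 1 if anything is wrong.
check_nagios_ports() {
    local file=$1 rc=0 spec proto port rule_line reject_line
    # Line number of the first catch-all REJECT in INPUT (empty if none).
    reject_line=$(awk '/^-A INPUT -j REJECT/ { print NR; exit }' "$file")
    for spec in udp:161 udp:162 tcp:5666 tcp:5667; do
        proto=${spec%%:*}
        port=${spec##*:}
        # Line number of the first ACCEPT rule for this protocol and port.
        rule_line=$(awk -v re="^-A INPUT .*-p $proto .*--dport $port .*-j ACCEPT" \
            '$0 ~ re { print NR; exit }' "$file")
        if [ -z "$rule_line" ]; then
            echo "missing: $proto/$port"
            rc=1
        elif [ -n "$reject_line" ] && [ "$rule_line" -gt "$reject_line" ]; then
            echo "after REJECT: $proto/$port"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_nagios_ports /etc/sysconfig/iptables && echo "Nagios ports OK"
```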
- Nagios XI system component status reports "NPCD not running" for the performance grapher.
This message seems misleading, because the process listing and the log file clearly show that npcd is running:
    # ps -elf | grep npc[d]
    5 S nagios 3454 1 0 80 0 - 92222 hrtime May15 ? 00:00:01 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
The GUI also fails to restart it through the "Action" menu.
- NRPE agent install failed:
    ubuntu14.04 is not currently supported.
    oracleserver6.5 is not currently supported.
Note that Oracle Linux 6.5 is almost identical to RHEL 6.5. In fact, Oracle Linux has two files that describe it:
    # cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 6.5 (Santiago)
    # cat /etc/oracle-release
    Oracle Linux Server release 6.5
Likewise, Ubuntu is derived from Debian.
- Within "Monitoring Wizard: Solaris Server", the installation menu reported:
Only Solaris 10 supported.
Solaris 11 has been around for almost three years, and it is a pity that it is not supported yet. It is still one of the main commercial Unices.
- When I attempted the Solaris 10 agent install on a Solaris 11 server, it hung at this point:
    =================================
    Nagios XI Solaris Agent Installer
    =================================
    This script will do a complete install of the Nagios XI Solaris agent by executing all necessary sub-scripts.
    IMPORTANT: This script should only be used on a Solaris system. Do NOT use this on a system running any other operating system.
    ./fullinstall[18]: /usr/ucb/echo: not found [No such file or directory]
On Solaris 11, the /usr/ucb directory tree is deprecated. There is a workaround, installing the legacy UCB compatibility package, but it should be avoided:
    # pkg install compatibility/ucb
Therefore, the Nagios XI agent installer should drop its dependency on /usr/ucb commands on Solaris 11. The fix for the echo problem is easy. Here is the modification in the fullinstall script:
    #/usr/ucb/echo -n "Do you want to continue? [Y/n] "
    printf "Do you want to continue? [Y/n] "
Unfortunately, the agent installation on Solaris 11 still failed because the platform was not supported:
    =================================
    Nagios XI Solaris Agent Installer
    =================================
    This script will do a complete install of the Nagios XI Solaris agent by executing all necessary sub-scripts.
    IMPORTANT: This script should only be used on a Solaris system. Do NOT use this on a system running any other operating system.
    Do you want to continue? [Y/n] Y
    Proceeding with installation...
    This platform () is not currently supported.
- Within "Monitoring Wizard: Linux Server", no agent was offered for OpenSUSE 13.1 (and no Debian agent was available for download).
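The messages "ubuntu14.04 is not currently supported" and "oracleserver6.5 is not currently supported" suggest that the agent installer matches an exact distribution name and version. A more tolerant check could map derivatives to their parent family using the traditional release files. Oracle Linux carries /etc/redhat-release (as shown above), and Ubuntu carries /etc/debian_version. The sketch also assumes that openSUSE 13.1 still ships /etc/SuSE-release. The function `detect_family` and its optional root-directory argument, which exists only to make it testable, are hypothetical.

```shell
#!/usr/bin/env bash
# detect_family [ROOT]: print "rhel", "suse", "debian" or "unknown" depending
# on which traditional release file exists under ROOT (default: the real /).
# Hypothetical helper, not part of the Nagios XI agent installer.
detect_family() {
    local root=${1:-}
    if [ -f "$root/etc/redhat-release" ]; then
        echo rhel      # RHEL, CentOS, Oracle Linux
    elif [ -f "$root/etc/SuSE-release" ]; then
        echo suse      # SUSE / openSUSE (assumed release file)
    elif [ -f "$root/etc/debian_version" ]; then
        echo debian    # Debian, Ubuntu
    else
        echo unknown
    fi
}

# Usage: detect_family    # prints "rhel" on the Oracle Linux 6.5 guest
```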
- The Auto-Discovery Wizard required its setup script to be run manually from the CLI:
    # chmod +x /usr/local/nagiosxi/html/includes/components/autodiscovery/setup.sh
    # cd /usr/local/nagiosxi/html/includes/components/autodiscovery/
    # ./setup.sh
- My changes to improve NRPE Agent installation on Solaris 11:
    # ln -s nagios-plugins-1.4.16-sol10-i386-local \
        nagios-plugins-1.4.16-sol11-i386-local
    # ln -s nrpe-2.14-sol10-i386-local nrpe-2.14-sol11-i386-local
    # ln -s top-3.6.1-sol10-x86-local top-3.6.1-sol11-x86-local
Edit init.sh and add Solaris 11 support:
    if [ "$os" = "SunOS" ] ; then
        if [ "$ver" = "5.10" ] ; then
            platform="sol10"
        fi
        if [ "$ver" = "5.11" ] ; then
            platform="sol11"
        fi
    fi
Edit fullinstall and add support for Solaris 11:
    # Check platform and architecture
    case "$platform" in
        "sol10"|"sol11")
            ...
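The three symlinks above follow one pattern: the sol10 package directory name with "sol10" replaced by "sol11". They could therefore be created in one pass. The sketch below assumes that the unpacked agent directory contains the *-sol10-* package directories that need a sol11 alias. It uses bash, which is also available on Solaris 11. The function name `link_sol11` is hypothetical.

```shell
#!/usr/bin/env bash
# link_sol11 DIR: in DIR, create a "-sol11-" symlink next to every "-sol10-"
# entry that does not already have one. Safe to run more than once.
# Hypothetical helper.
link_sol11() {
    (
        cd "$1" || exit 1
        for d in *-sol10-*; do
            [ -e "$d" ] || continue             # glob matched nothing
            new=${d/-sol10-/-sol11-}
            # Skip if the sol11 name already exists (as a file or a link).
            [ -e "$new" ] || [ -L "$new" ] || ln -s "$d" "$new"
        done
    )
}

# Usage: link_sol11 /path/to/unpacked/agent   (path is a placeholder)
```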
The installation then completed, but additional errors appeared (the installer logs were summarised in logs-all-steps-20140511-170345.tar.gz):
    SUNW-MSG-ID: SMF-8000-YX, TYPE: defect, VER: 1, SEVERITY: major
    EVENT-TIME: Sun May 11 17:03:16 EST 2014
    PLATFORM: KVM, CSN: unknown, HOSTNAME: sol11-vm2.circlingcycle.com.au
    SOURCE: software-diagnosis, REV: 0.1
    EVENT-ID: bf04b56a-9e55-ed1e-a381-c2a4c7b47fef
    DESC: A service failed - a method is failing in a retryable manner but too often.
    AUTO-RESPONSE: The service has been placed into the maintenance state.
    IMPACT: svc:/application/nagios/nrpe:default is unavailable.
    REC-ACTION: Run 'svcs -xv svc:/application/nagios/nrpe:default' to determine the generic reason why the service failed, the location of any logfiles, and a list of other services impacted. Please refer to the associated reference document at http://support.oracle.com/msg/SMF-8000-YX for the latest service procedures and policies regarding this diagnosis.
SMF could not start the NRPE service for two reasons. One was an obsolete OpenSSL library. The other was the service's dependency on the physical network interface, while this Solaris 11 server was just a KVM virtual machine:
    # cat /var/svc/log/application-nagios-nrpe:default.log
    [ May 11 17:01:46 Disabled. ]
    [ May 11 17:01:53 Rereading configuration. ]
    [ May 11 17:03:13 Enabled. ]
    [ May 11 17:03:13 Executing start method ("/lib/svc/method/nrpe start"). ]
    ld.so.1: nrpe: fatal: libssl.so.0.9.7: open failed: No such file or directory
    /lib/svc/method/nrpe: line 10: 13937: Killed
    [ May 11 17:03:15 Method "start" exited with status 9. ]
    [ May 11 17:03:15 Executing start method ("/lib/svc/method/nrpe start"). ]
    ld.so.1: nrpe: fatal: libssl.so.0.9.7: open failed: No such file or directory
    /lib/svc/method/nrpe: line 10: 13939: Killed
    [ May 11 17:03:15 Method "start" exited with status 9. ]
    [ May 11 17:03:15 Executing start method ("/lib/svc/method/nrpe start"). ]
    ld.so.1: nrpe: fatal: libssl.so.0.9.7: open failed: No such file or directory
    /lib/svc/method/nrpe: line 10: 13941: Killed
    [ May 11 17:03:15 Method "start" exited with status 9. ]
    # svcadm enable -r nrpe
    svcadm: svc:/application/nagios/nrpe:default depends on svc:/network/physical, which has multiple instances.
    # svcs
    ...
    maintenance    17:03:15 svc:/application/nagios/nrpe:default
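The "libssl.so.0.9.7: open failed" lines mean that the NRPE binary was linked against an OpenSSL 0.9.7 library that is not present on this Solaris 11 system. Running ldd on the binary lists every unresolved dependency at once, and a small filter makes them easy to spot. The sketch below assumes that ldd marks unresolved entries with "(file not found)" on Solaris or "not found" on Linux. The function name `missing_libs` is hypothetical, and the binary path in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# missing_libs: read ldd output on stdin and print the name of every shared
# library that ldd could not resolve. Hypothetical helper.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Usage (replace the placeholder with the installed nrpe binary):
#   ldd /path/to/nrpe | missing_libs
```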
- The Monitoring Wizard seems to take a long time to apply submitted changes during configuration verification. This needs further investigation.
In spite of these initial glitches, there is no doubt that Nagios XI is a great product with plenty of features. A small snapshot of how it looks on the network I am currently testing: