PacNOG 5: Network Management Workshop
Nagios Exercises
PART I
-----------------------------------------------------------------------------
1. Install Nagios version 3
   
    Do this as root.
    # apt-get install nagios3
2. Create the Web user password file:
    # htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin
    New password:         
    Re-type new password: 
   Please use the class password.
2. You should already have a working Nagios!
    - Open a browser, and go to
    http://localhost/nagios3/
    - At the login prompt, login as:
        user: nagiosadmin
        pass: 
3. Let's look at the interface together...
    # cd /etc/nagios3/
    # ls -l 
    -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
    -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
    -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
    -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
    -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
    -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
    drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets
    
    # ls -l conf.d/
    -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
    -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
    -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
    -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
    -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
    -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
    -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
    -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
    -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg
PART II
-----------------------------------------------------------------------------
1. According to what we saw in class, let's add a host to monitor. 
    - Pick any PC in the room, maybe your neighbor's PC.
    $ su -
    # cd /etc/nagios3/conf.d/
    # vi pcNNN.cfg          
define host {
    use         generic-host
    host_name   pcNNN
    alias       PC NNN at APRICOT2009 
    address     _______________       [pcNNN's IP address here]
}
    ... Save and quit. 
2. Let's create a new hostgroup for the occasion, and add our host
   to it
    - Edit the file hostgroups_nagios2.cfg and add a new group. Do
      this at the bottom of the file:
    # vi hostgroups_nagios2.cfg
define hostgroup {
    hostgroup_name  servers
    alias           PacNOG5 PCs
    members         pcNNN
}
3. Now let's associate some services to that host
    # vi services_nagios2.cfg
    - Find the section called "check that ssh services are running",
      and change the line:
hostgroup_name                  ssh-servers
    to
hostgroup_name                  ssh-servers, servers
4. Verify that your configuration file is OK:
    # nagios3 -v /etc/nagios3/nagios.cfg 
    ... You should get :
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the check.
5. Reload/Restart Nagios
    # /etc/init.d/nagios3 restart
6. Go to the web interface (http://localhost/nagios3) and check the host
   you just added
7. Add ALL the PCs in the room!
    - Add all the PCs in the room to the config
    - Check HTTP for all PCs in the room
    - Remember to verify the configuration file!
    - I suggest that you create a single configuration file to do this.
      (i.e., pcs.cfg, or servers.cfg, etc.).
PART III
-----------------------------------------------------------------------------
1.  Create a parent-child relationship in Nagios. Your PC has a
    parent which is the switch it is attached to. The switch has
    a parent, which is the router it is attached to.
    Let's create this relationship.
2.  Create a file to define the configuration for your switches.
    Maybe "/etc/nagios3/conf.d/switches.cfg". We'll start by
    just entering information for the switch to which your PC is
    attached:
    # cd /etc/nagios3/conf.d
    # touch switches.cfg
    # vi switches.cfg
    Your switch is either mgmt-sw1 or mgmt-sw2
define host {
	use		generic-host
	host_name	mgmt-swN
	alias		switch for 192.168.2.N/25
	address		192.168.2.NNN
	parents		bb-gwN
}
    be sure that you enter the correct values for "N". You can refer
    to our classroom Network Diagram to figure these out:
    http://192.168.1.224/trac/wiki/network
2.  Create a file to define the configuration for the router
    Maybe "/etc/nagios3/conf.d/routers.cfg".
    The router you use is the parent of the switch above...
define host {
	use		generic-host
	host_name  	bb-gwN	
	alias		router for for 192.168.2.N/25
	address		192.168.2.NNN
}
    Save and exit from the file.
3.  Edit the file pcNNN.cfg and add a parent entry to this file:
    # vi pcNNN.cfg
define host {
    use         generic-host
    host_name   pcNNN
    alias       PC NNN at APRICOT2009 
    address     192.168.2.NNN
    parents     mgmt-swN
}
    Save and quit. Eventually we'll need to update the parent for
    your localhost as well, but we'll do this later.
4.  In preparation for putting in multiple pc, switch and
    router entries let's create the initial hostgroups for
    each of these. Edit the file
    /etc/nagios3/conf.d/hostgroups_nagios2.cfg and at the bottom
    of the file add the following:
    # vi hostgroups_nagios2.cfg
    Go to the bottom of the file. You should already have an
    entry for "servers" for pcNNN:
##
## Our local hostgroup definitions
##
define hostgroup {
	hostgroup_name	servers
	alias		PacNOG5 PCs
	members		pcNNN
}
define hostgroup {
	hostgroup_name switches
        alias		PacNOG5 switches
	members		mgmt-swN
}
define hostgroup {
        hostgroup_name  routers 
        alias           PacNOG5 routers 
        members         bb-gwN
}
   For now we won't add services to our switches or routers, but
   later on you may wish to check to see if SSH is running on these
   devices.
5. Verify that your configuration is OK:
    # nagios3 -v /etc/nagios3/nagios.cfg 
    ... You should get :
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the check.
6. Reload/Restart Nagios
    # /etc/init.d/nagios3 restart
7. Go to the web interface (http://localhost/nagios3) and check the
   see how things look.
7. See if the parent-child relationship seems reasonable by clicking
   on the Status Map link on the left-hand side of the Nagios page.
   In Status Map choose "Balanced Tree" from the Layout Method
   drop-down menu and click the Update button. Do you see a
   parent-child relationship as you might expect?
 
   
  
PART III
-----------------------------------------------------------------------------
1. Create a complete Nagios configuration for our classroom network.
   NOTES:
   - This requires more planning. You have switches, routers, and
     the NOC (if you wish to add it). In addition, the IP addresses 
     that you use are for your network router, the classroom router, 
     and the other network's router depend on your position in the 
     network.
   - You want to use internal IP address for your network's router,
     and the gateway router.
 
   - Note that the switches are not running Telnet, they are
     using ssh. So you should do either an ssh check on them or
     a standard ping check (the Nagios default).
   - It is important that you properly define the parent for
     devices. Some examples are given below. Devices can have
     more than one parent, and in our classroom this is true. The
     two switches lan1-lan2-sw and lan3-lan4-sw have two parents
     since they have a single administrative interface, but they
     are connected by two routers each.
3.  Complete the switches configuration
    (/etc/nagios3/conf.d/switches.cfg). There should be thre entries
    in this file, the 2 switches for 192.168.2.0/25 and for
    192.168.2.128/25, and the backbone switch.
4.  Complete the routers configuration
    (/etc/nagios3/conf.d/routers.cfg). There should be two entries
    in this file for router bb-gw1 and bb-gw2.
5.  Complete the PCs configuration. Perhaps change the filename:
    /etc/nagios3/conf.d/pcNNN.cfg to:
    /etc/nagios3/conf.d/servers.cfg
    # cd /etc/nagios3/conf.d
    # mv pcNNN.cfg servers.cfg
    This file should have a entries for each classroom pc. Remember
    to choose the correct parent for each one, including the NOC
    box.
6.  In the file "/etc/nagios3/conf.d/hostgroups_nagios2.cfg"
    complete the  hostgroups for all the routers, switches and
    pcs in the classroom.
    Sample entry:
# hostgroup definition for APRICOT 2009 Network Management Workshop
define hostgroup {
        hostgroup_name routers
        alias          Cisco Routers at APRICOT 2009
        members        
}
define hostgroup {
	hostgroup_name	servers
	alias		PacNOG5 PCs
	members		pc10N, pc20N, ...
}
7.  In the file "/etc/nagios3/conf.d/services_nagios2.cfg" you
    define what groups (not individual devices) will have what
    service checks run on them.
    Sample entry:
# check that ping-only hosts are up
define service {
        hostgroup_name                  routers,switches,servers
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
}
7.  The file "/etc/nagios2/conf.d/extinfo_nagios2.cfg" defines
    details for each device defined. Feel free to take a look
    at the extinfo_nagios2.cfg file we are already using for our
    classroom to get a feel for what you can do. Compare your status
    map to the one on the classroom NOC machine. Notice the
    difference, maybe, with icons?
    You can view the NOC's extinfo_nagios2.cfg file here:
    http://192.168.1.224/configs/nagios/conf.d/extinfo_nagios2.cfg
  
9.  You might consider changing the file
    /etc/nagios3/conf.d/localhost_nagios2.cfg.
10. Naturally you can get entire set of Nagios configuration
    files for this network that will only need a few changes
    for your machine from the NOC web server if you wish.
    http://192.168.1.224/configs/nagios/conf.d/
11. You sill need to update a few files. Including:
    /etc/nagios3/conf.d/routers.cfg
    /etc/nagios3/conf.d/servers.cfg
    You should make sure that you have the correct IP
    addresses defined in routers.cfg for your network view,
    and you will want to comment out your pcs entry in
    the file pcs.cfg
    You may have to make additional changes and to troubleshoot
    this using the "Nagios pre-flight check":
    # nagios3 -v /etc/nagios3/nagios.cfg
    Remember to restart Nagios for changes to take affect.
    # /etc/init.d/nagios3 restart
PART IV
----------------------------------------------------------------
1.  Allow for "guest" user access to view your Nagios web pages.
    # cd /etc/nagios3
    # vi cgi.cfg
    Find these two lines (they are together)
    authorized_for_all_services=nagiosadmin
    authorized_for_all_hosts=nagiosadmin
    And change them to read:
    authorized_for_all_services=nagiosadmin,guest
    authorized_for_all_hosts=nagiosadmin,guest
2.  Save the file.
3.  Now you must create the guest user and password.
4.  # htpasswd htpasswd.users guest
    New password: 
    Re-type new password:
    You can use any password you want. It's pretty typical if "guest"
    only has "view" access to the Nagios web pages to pick a very
    simple password, like "guest".
5.  Restart Nagios for the changes to take affect:
    # /etc/init.d/nagios3 restart
6.  Next time you are asked for a password to view the Nagios web
    pages you can use "guest/guest" (if you chose "guest" as a
    password) to view them instead of using "nagiosadmin".
PART V
----------------------------------------------------------------
NOT TO BE DONE.
This session is for reference only.
1.) Here we will tie in the ability of Nagios and Trac to work
    together to help document your network. The concept if
    quite simple. First, go to your local Trac project install
    page at:
    http://localhost/trac/netmanage
    Log in as the admin user so that you can edit the Trac
    wiki.
2.) Create an entry for your PC in the wiki. You can do this by
    clicking on the "Edit this page" button and entering in a
    link like this (example for PC1, use your PC number instead):
    [wiki:PC1 PC1] : '''169.223.140.1'''
    Save the page.
    Alternately, have a look at the main classroom wiki to see
    what has been done:
    http://noc.mgmt.conference.apricot.net/
3.) Click on the PC1 item that's grey with a question mark. Now
    create this page. Enter in some text about your PC and save
    the page.
4.) In Nagios you need to edit the file:
    /etc/nagios3/conf.d/extinfo_nagios2.cfg
   and update your PCs entry in this file with a line like this:
   notes_url       http://localhost/trac/netmanage/wiki/PC1
   You can place this on a line after the "host_name" entry.
   Remember to change "PC1" to your PCs number.
5.) Restart Nagios. 
6.) If you look in your Nagios Service Detail view there should now be
    a new icon next to your machine's entry. This looks like a folder.
    Click on this and the URL you entered for the notes_url entry in
    the extinfo_nagios2.cfg file will open. You can, also, click on
    the machines' icon in the graph views, then click again and this
    page will open.
PART V (OPTIONAL)
-----------------------------------------------------------------------------
1.) Now we will create a plug-in for Nagios. This plug-in will do the
    following:
    * Ping a set of (external) servers.
    * If one server is down a warning will be generated.
    * If two servers are down a critical state will be generated.
    This will be part of our scripting session. The instructions for
    doing this are here:
    http://ws.edu.isoc.org/workshops/2008/ait-net-manage/presos/scripting/bash.html
    These were written for Nagios version 2, but are fine for version 3. Just
    replace occurrences of "/etc/nagios2" with "/etc/nagios3".
    
PART VI
-----------------------------------------------------------------------------
1.) We will update our Nagios contacts definion,
    "/etc/nagios3/conf.d/contacts_nagios3.cfg" to add a local user to
    that will receive alerts for certain condition.
2.) Next we will add another user for our RT ticketing system so
    that a ticket is automatically generated for specific events.
3.) Edit the file "/etc/nagios3/contacts_nagios3.cfg":
    # vi /etc/nagios3/contacts_nagios3.cfg
    In a web browser open up the sample contacts_nagios3.cfg file
    and adapt this to work with what you have. Basically, just 
    replace yours with this one.
    Go to:
    http://noc.mgmt.conference.apricot.net/configs/etc/nagios3/conf.d/ \  
           contacts_nagios3.cfg
4.) Once the files is updated you might have noticed the two lines that read:
        service_notification_commands   notify-service-ticket-by-email
        host_notification_commands      notify-host-ticket-by-email
    The "notify-service-ticket-by-email" and "notify-host-ticket-by-email"
    commands are new. You need to create these in the file
    /etc/nagios3/commands.cfg.
    This is not strictly necessary. For purposes of this exercise you can 
    replace these two commands with:
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-ticket-by-email
    and skip skip part "4a" if you wish.
4a) These two commands are set aside so that if you wish you can adjust the
    formatting of the email that Nagios sends to be more user friendly to 
    the RT ticketing system. This is up to you. To create these two commands
    we simply copy the original commands and renamve them in 
    /etc/nagios3/commands.cfg.
    The easiest way to see this is to open a web browser and go to:
    http://noc.mgmt.conference.apricot.net/configs/etc/nagios3/commands.cfg
    and then you can copy and past the new items in to your commands.cfg file
    on your machine. Note that you could change the names of these if you wish
    as long as you match the new name to what is in the 
    /etc/nagios3/contacts_nagios3.cfg file.
5.) Once you have updated your contacts_nagios3.cfg file, then run the 
    Nagios pre-flight check:
    # nagios3 -v /etc/nagios3/nagios.cfg
    If it all looks good, then restart Nagios:
    # /etc/init.d/nagios3 restart
    Or, less intrusive is:
    # /etc/init.d/nagios3 reload
6.) Now we need to create a proper alias in our /etc/aliases file using
    the rt-mailgate program to pipe email from Nagios to RT and to the 
    correct queue. 
    Edit the file /etc/aliases:
    # vi /etc/aliases
    And add the following lines to the bottom of the file:
    alerts:         "|/usr/bin/rt-mailgate --queue 'Network Management' --action correspond --url http://localhost/rt"
    alerts-comment: "|/usr/bin/rt-mailgate --queue 'Network Management' --action comment --url http://localhost/rt"
    Make note in the file and verify that there is a line that, also, reads:
    root: netmanage
    This tells the mail system to deliver all mail sent to root@localhost to the
    netmanage account instead.    
    Save the file and quit. In reality we'll only be using the "alerts" alias 
    at this time.
    After you've saved and exited from the /etc/aliases file run:
    # newaliases
    which lets the Postfix MTA know about changes to /etc/aliases. If you run
    in to any problems with errors about rt-mailgate, verify that it 
    is installed by doing:
 
    # apt-get install rt3.6-clients
    this should have been done when you first installed RT.
7.) Now you should go to your RT instance installed on your machine.
    http://localhost/rt
    log in as "root".
    Click on the "Configuration" link, "Queues", "New queue": Be sure that you
    fill in the "Queue Name" field with "Network Management" - including the 
    upper-case 'N' and 'M'.
    You only need to fill in Queue Name and Description. Click the "Save Changes"
    button on the lower right of the screen. 
    Now click the "User Rights" link. You'll see that the 'root' user has no
    rights on this queue. Give your 'root' user enough rights on this queue to
    at least see tickets in the queue and see the queue itself. If you want you
    can be lazy and highlight all the rights and assign 'root' everything. You have
    to press "Modify User Rights" to do this.
    At this point log out of RT and log back in. You should see the Network 
    Management queue listed on the right of the page.
    Now you need to generate a Nagios alert so that a ticket is created in RT. If
    you noticed in the /etc/nagios3/conf.d/contacts_nagios3.cfg the Nagios "alerts"
    queue only sends notifications if a service is in the "c" or "critical" state,
    or if a host is "d" or "down". In addition in the file 
    /etc/nagios3/conf.d/generic-service_nagios2.cfg there is a line that reads:
    notification_interval           0
    This ensures that Nagios will only send one (1) email per critical or down
    state. If this is set to something else, then you will generate multiple
    tickets, which is not good. 
    Try to generate an alert from Nagios, which should generate a ticket in RT
    by doing something. You could check for a service on your neighbor's PC that
    does not exist. You could pull the network cable on your neighbor's PC so that
    it appears to be down. Otherwise, your instructor will come up with something
    as well.
Last update 22 February 2009 by HA