# Man page created with:
#
# pod2man -s 8 -r "`./check_openmanage -V | head -n 1`" -c 'Nagios plugin' check_openmanage.pod check_openmanage.8
#
# $Id$

=head1 NAME

check_openmanage - Nagios plugin for checking the hardware status on
Dell servers running OpenManage

=head1 SYNOPSIS

check_openmanage [I<OPTION>]...

check_openmanage -H I<hostname> [I<OPTION>]...

=head1 DESCRIPTION

check_openmanage is a plugin for Nagios which checks the hardware
health of Dell servers running OpenManage Server Administrator
(OMSA). The plugin checks the health of the storage subsystem, power
supplies, memory modules, temperature probes etc., and gives an alert
if any of the components are faulty or operate outside normal
parameters.

check_openmanage is designed to be used either locally (using NRPE
or similar) or remotely (using SNMP). In either mode, the output is
(nearly) the same. Note that checking the alert log is not supported
in SNMP mode.

=head1 GENERAL OPTIONS

=over 4

=item -t, --timeout I<SECONDS>

The number of seconds after which the plugin will abort. Default
timeout is 30 seconds if the option is not present.

=item -p, --perfdata [I<multiline>]

Collect performance data. Performance data collected includes
temperatures (in Celsius) and fan speeds (in rpm). On systems that
support it, power consumption is also collected (in Watts).

If given the argument C<multiline>, the plugin will output the
performance data on multiple lines, for Nagios 3.x and above.

=item -w, --warning I<STRING> or I<FILE>

Override the machine-default temperature warning thresholds. Syntax is
C<id1=max[/min],id2=max[/min],...>. The following example sets warning
limits to max 50C for probe 0, and max 45C and min 10C for probe 1:

 check_openmanage -w 0=50,1=45/10

The minimum limit can be omitted, if desired. Most often, you are only
interested in setting the maximum thresholds.

This parameter can be either a string with the limits, or a file
containing the limits string. The option can be specified multiple
times.

NOTE: This option should only be used to narrow the range of OK
temperatures with respect to the OMSA defaults. To expand the range of
OK temperatures, increase the OMSA thresholds. See the plugin web page
for more information.
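
The threshold syntax above can be illustrated with a short Python sketch. This is not the plugin's own code (the plugin is written in Perl); the function name C<parse_thresholds> and the returned dictionary layout are assumptions made for illustration only.

```python
def parse_thresholds(spec):
    """Parse 'id1=max[/min],id2=max[/min],...' into {probe_id: (max, min)}.

    Hypothetical helper: 'min' is None when the minimum limit is omitted.
    """
    limits = {}
    for entry in spec.split(","):
        probe, _, values = entry.strip().partition("=")
        high, _, low = values.partition("/")
        limits[int(probe)] = (float(high), float(low) if low else None)
    return limits


# The example from the text: max 50C for probe 0, max 45C / min 10C for probe 1.
print(parse_thresholds("0=50,1=45/10"))
```

The sketch only shows how the string breaks down into per-probe limits; reading the string from a file or merging several C<-w> options is left out.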

=item -c, --critical I<STRING> or I<FILE>

Override the machine-default temperature critical thresholds. Syntax
and behaviour are the same as for warning thresholds described above.

=item -o, --ok-info I<NUMBER>

This option lets you define how much output you want the plugin to
give when everything is OK, i.e. the verbosity level. The default
value is 0 (one line of output). The output levels are cumulative.

=over 4

=item B<0>

- Only one line (default)

=item B<1>

- BIOS and firmware info on a separate line

=item B<2>

- Storage controller and enclosure info on separate lines

=item B<3>

- OMSA version on separate line

=back

The reason that OMSA version is separated from the rest is that
finding it requires running a really slow omreport command, when the
plugin is run locally via NRPE.

=item --omreport I<OMREPORT PATH>

Specify full path to omreport, if it is not installed in any of the
regular places. Usually this option is only needed on Windows, if
omreport is not installed on the C: drive.

=item -i, --info

Prefix any alerts with the service tag.

=item -e, --extinfo

Display a short summary of system information (model and service tag)
in case of an alert.

=item -I, --htmlinfo [I<CODE>]

Using this option will make the service tag and model name into
clickable HTML links in the output. The model name link will point to
the official Dell documentation for that model, while the service tag
link will point to a website containing support info for that
particular server.

This option takes an optional argument, which should be your country
code or C<me> for the Middle East. If the country code is omitted the
service tag link will still work, but it will not be specific to your
country or area. Example for Germany:

 check_openmanage --htmlinfo de

This option is particularly useful together with either the
I<--extinfo> or I<--info> options. Only the most common country codes
are supported at this time.

=item --postmsg I<STRING> or I<FILE>

User specified post message. Useful for displaying arbitrary or
various system information at the end of alerts. The argument is
either a string with the message, or a file containing that
string. You can control the format with the following interpreted
sequences:

=over 4

=item B<%m>

System model

=item B<%s>

Service tag

=item B<%b>

BIOS version

=item B<%d>

BIOS release date

=item B<%o>

Operating system name

=item B<%r>

Operating system release

=item B<%p>

Number of physical drives

=item B<%l>

Number of logical drives

=item B<%n>

Line break. Will be a regular line break if run from a TTY, else an
HTML line break.

=item B<%%>

A literal C<%>

=back
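
How the interpreted sequences listed above expand can be sketched in Python. This is an illustration only, not the plugin's implementation: the function C<expand_postmsg>, the sample values, and the choice to leave unknown sequences untouched are all assumptions.

```python
import re

# Hypothetical sample values; the real plugin obtains these from OMSA.
SAMPLE = {
    "m": "PowerEdge R710",  # system model
    "s": "ABC1234",         # service tag
    "b": "1.2.3",           # BIOS version
    "d": "01/01/2010",      # BIOS release date
    "o": "Linux",           # operating system name
    "r": "2.6.18",          # operating system release
    "p": "4",               # number of physical drives
    "l": "2",               # number of logical drives
}


def expand_postmsg(template, values, is_tty=True):
    """Expand %-sequences in a single pass so that '%%' is handled safely."""
    def substitute(match):
        key = match.group(1)
        if key == "%":
            return "%"
        if key == "n":
            return "\n" if is_tty else "<br/>"
        # Assumption: sequences not listed in the table are left unchanged.
        return values.get(key, match.group(0))

    return re.sub(r"%(.)", substitute, template)


print(expand_postmsg("Model: %m, tag: %s, disks: %p (100%%)", SAMPLE))
```

A single regular-expression pass is used so that a literal C<%%> is never re-interpreted as the start of another sequence.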

=item -s, --state

Prefix each alert with its corresponding service state (i.e. warning,
critical etc.). This is useful in case of several alerts from the same
monitored system.

=item -S, --short-state

Same as the B<--state> option above, except that the state is
abbreviated to a single letter (W=warning, C=critical etc.).

=item --linebreak I<STRING>

check_openmanage will sometimes report more than one line, e.g. if
there are several alerts. If the script has a TTY, it will use regular
linebreaks. If not (which is the case with NRPE) it will use HTML
linebreaks. Sometimes it can be useful to control what the plugin uses
as a line separator, and this option provides that control.

The argument is the exact string to be used as the line
separator. There are two exceptions, i.e. two keywords that translate
to the following:

=over 4

=item B<REG>

Regular linebreaks, i.e. "\n".

=item B<HTML>

HTML linebreaks, i.e. "<br/>".

=back
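
The separator selection described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the plugin's code; the function name C<line_separator> and its parameters are hypothetical.

```python
import sys


def line_separator(option=None, is_tty=None):
    """Return the line separator for a given --linebreak argument.

    option is None when --linebreak was not given on the command line.
    """
    if is_tty is None:
        is_tty = sys.stdout.isatty()
    if option is None:
        # Default: regular linebreaks on a TTY, HTML linebreaks otherwise.
        return "\n" if is_tty else "<br/>"
    if option == "REG":
        return "\n"
    if option == "HTML":
        return "<br/>"
    # Any other argument is used verbatim as the separator.
    return option


alerts = ["Fan 3 is critical", "Power supply 0 is missing"]
print(line_separator("HTML").join(alerts))
```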

This is a rather special option that is normally not needed. The
default behaviour should be sufficient for most users.

=item -d, --debug

Debug output. Will report status on everything, even if status is
ok. Blacklisted or unchecked components are ignored (i.e. no output).

NOTE: This option is intended for diagnostics and debugging purposes
only. Do not use this option from within Nagios, i.e. in the Nagios
config.

=item -h, --help

Display help text.

=item -V, --version

Display version info.

=back

=head1 SNMP OPTIONS

=over 4

=item -H, --hostname I<HOSTNAME>

The transport address of the destination SNMP device. Using this
option triggers SNMP mode.

=item -P, --protocol I<PROTOCOL>

SNMP protocol version. This option is optional and expects a digit
(i.e. C<1>, C<2> or C<3>) to define the SNMP version. The default is
C<2>, i.e. SNMP version 2c.

=item -C, --community I<COMMUNITY>

This option expects a string that is to be used as the SNMP community
name when using SNMP version 1 or 2c. By default the community name
is set to C<public> if the option is not present.

=item --port I<PORT>

SNMP port of the remote (monitored) system. Defaults to the well-known
SNMP port 161.

=item -U, --username I<SECURITYNAME>

[SNMPv3] The User-based Security Model (USM) used by SNMPv3 requires
that a securityName be specified. This option is required when using
SNMP version 3, and expects a string 1 to 32 octets in length.

=item --authpassword I<PASSWORD>, --authkey I<KEY>

[SNMPv3] By default a securityLevel of C<noAuthNoPriv> is assumed. If
the --authpassword option is specified, the securityLevel becomes
C<authNoPriv>. The --authpassword option expects a string which is at
least 1 octet in length as argument.

Optionally, instead of the --authpassword option, the --authkey option
can be used so that a plain text password does not have to be
specified in a script. The --authkey option expects a hexadecimal
string produced by localizing the password with the
authoritativeEngineID for the specific destination device. The
C<snmpkey> utility included with the Net::SNMP distribution can be
used to create the hexadecimal string (see L<snmpkey>).

=item --authprotocol I<ALGORITHM>

[SNMPv3] Two different hash algorithms are defined by SNMPv3 which can
be used by the Security Model for authentication. These algorithms are
HMAC-MD5-96 C<MD5> (RFC 1321) and HMAC-SHA-96 C<SHA-1> (NIST FIPS PUB
180-1). The default algorithm used by the plugin is HMAC-MD5-96. This
behaviour can be changed by using this option. The option expects
either the string C<md5> or C<sha> to be passed as argument to modify
the hash algorithm.

=item --privpassword I<PASSWORD>, --privkey I<KEY>

[SNMPv3] By specifying the options --privkey or --privpassword, the
securityLevel associated with the object becomes
C<authPriv>. According to SNMPv3, privacy requires the use of
authentication. Therefore, if either of these two options is present
and the --authkey or --authpassword arguments are missing, the
creation of the object fails. The --privkey and --privpassword
options expect the same input as the --authkey and --authpassword
options respectively.

=item --privprotocol I<ALGORITHM>

[SNMPv3] The User-based Security Model described in RFC 3414 defines a
single encryption protocol to be used for privacy. This protocol,
CBC-DES C<DES> (NIST FIPS PUB 46-1), is used by default or if the
string C<des> is passed to the --privprotocol option. The Net::SNMP
module also supports RFC 3826 which describes the use of
CFB128-AES-128 C<AES> (NIST FIPS PUB 197) in the USM. The AES
encryption protocol can be selected by passing C<aes> or C<aes128> to
the --privprotocol option.

One of the following arguments is required: C<des>, C<aes>,
C<aes128>, C<3des>, C<3desde>.

=item --use-get_table

This option exists as a workaround when using check_openmanage with
SNMPv3 on Windows with net-snmp. Using this option will make
check_openmanage use the Net::SNMP function get_table() instead of
get_entries() while fetching values via SNMP. The latter is faster and
is the default.

=back
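
The way the SNMPv3 options combine can be sketched in Python as a helper that assembles a command line from the options documented above. The helper name C<snmpv3_args>, its parameters, and the host name and passwords in the usage example are hypothetical; only the command-line flags themselves come from this manual. The helper enforces the rule that privacy requires authentication.

```python
def snmpv3_args(hostname, username, auth_password=None, auth_protocol=None,
                priv_password=None, priv_protocol=None):
    """Build an argument list for an SNMPv3 check (illustrative only)."""
    if priv_password and not auth_password:
        # authPriv is impossible without authentication credentials.
        raise ValueError("--privpassword requires --authpassword")
    args = ["check_openmanage", "-H", hostname, "-P", "3", "-U", username]
    if auth_password:
        args += ["--authpassword", auth_password]   # authNoPriv
        if auth_protocol:
            args += ["--authprotocol", auth_protocol]
    if priv_password:
        args += ["--privpassword", priv_password]   # authPriv
        if priv_protocol:
            args += ["--privprotocol", priv_protocol]
    return args


# Hypothetical host, user and passwords.
print(" ".join(snmpv3_args("server.example.com", "monitor",
                           auth_password="authsecret", auth_protocol="sha",
                           priv_password="privsecret", priv_protocol="aes")))
```

With only a user name the resulting command line corresponds to C<noAuthNoPriv>, adding the authentication options gives C<authNoPriv>, and adding the privacy options gives C<authPriv>.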

=head1 BLACKLISTING

=over 4

=item -b, --blacklist I<STRING> or I<FILE>

Blacklist missing and/or failed components, if you do not plan to fix
them. The parameter is either the blacklist string, or a file (that
may or may not exist) containing the string. The blacklist string
contains component names with component IDs separated by slash
(/). Blacklisted components are left unchecked.

TIP: Use the option C<-d> (or C<--debug>) to get the blacklist ID for
devices. The ID is listed in a separate column in the debug output.

NOTE: If blacklisting is in effect, the global health of the system is
not checked.

=over 9

=item B<Syntax:>

 component1=id1[,id2,...]/component2=id1[,id2,...]/...

The ID part can also be C<all>, in which case all components of that
type are blacklisted.

=item B<Example:>

 check_openmanage -b ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all

=back

In the example we blacklist power supply 0, fans 3 and 5, physical disk
1:0:0:1, and warnings about out-of-date drivers for all
controllers. Legal component names include:

=over 8

=item B<ctrl>

Storage controller. Note that if a controller is blacklisted, all
components on that controller (such as physical and logical drives)
are blacklisted as well.

=item B<ctrl_fw>

Suppress the special warning message about old controller
firmware. Use this if you cannot or will not upgrade the firmware.

=item B<ctrl_driver>

Suppress the special warning message about old controller driver.
Particularly useful on systems where you cannot upgrade the driver.

=item B<ctrl_stdr>

Suppress the special warning message about old Storport driver on
Windows.

=item B<ctrl_pdisk>

This blacklisting keyword exists as a possible workaround for physical
drives with bad firmware which makes OpenManage choke. It takes the
controller number as argument. Use this option to blacklist all
physical drives on a specific controller. This blacklisting keyword is
only available in local mode, i.e. not with SNMP.

=item B<pdisk>

Physical disk.

=item B<vdisk>

Logical drive (virtual disk)

=item B<bat>

Controller cache battery

=item B<bat_charge>

Ignore warnings related to the controller cache battery charging
cycle, which happens approximately every 40 days on Dell servers. Note
that using this blacklist keyword makes check_openmanage ignore
non-critical cache battery errors.

=item B<conn>

Connector (channel)

=item B<encl>

Enclosure

=item B<encl_fan>

Enclosure fan

=item B<encl_ps>

Enclosure power supply

=item B<encl_temp>

Enclosure temperature probe

=item B<encl_emm>

Enclosure management module (EMM)

=item B<dimm>

Memory module

=item B<fan>

Fan

=item B<ps>

Power supply

=item B<temp>

Temperature sensor

=item B<cpu>

Processor (CPU)

=item B<volt>

Voltage probe

=item B<bp>

System battery

=item B<amp>

Amperage probe (power consumption monitoring)

=item B<intr>

Intrusion sensor

=back
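
The blacklist syntax described above can be sketched in Python. This is an illustration rather than the plugin's own parser; the function names C<parse_blacklist> and C<is_blacklisted> and the dictionary layout are assumptions.

```python
def parse_blacklist(spec):
    """Split 'comp1=id1[,id2,...]/comp2=...' into {component: [ids]}."""
    blacklist = {}
    for part in spec.split("/"):
        if not part:
            continue
        component, _, ids = part.partition("=")
        blacklist.setdefault(component, []).extend(ids.split(","))
    return blacklist


def is_blacklisted(blacklist, component, component_id):
    """True if the ID is listed, or if the whole type is listed as 'all'."""
    ids = blacklist.get(component, [])
    return "all" in ids or component_id in ids


bl = parse_blacklist("ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all")
print(bl)
print(is_blacklisted(bl, "fan", "5"), is_blacklisted(bl, "fan", "4"))
```

Component IDs are kept as strings because some of them, such as the physical disk ID C<1:0:0:1>, are not plain numbers.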

=back

=head1 CHECK CONTROL

=over 4

=item --only I<KEYWORD>

This option can be specified once and expects a keyword. The different
keywords and the behaviour of check_openmanage are described below.

=over 4

=item B<critical>

Print only critical alerts. With this option any warning alerts are
suppressed.

=item B<warning>

Print only warning alerts. With this option any critical alerts are
suppressed.

=item B<chassis>

Check all chassis components and nothing else.

=item B<storage>

Only check storage

=item B<memory>

Only check memory modules

=item B<fans>

Only check fans

=item B<power>

Only check power supplies

=item B<temp>

Only check temperatures

=item B<cpu>

Only check processors

=item B<voltage>

Only check voltage probes

=item B<batteries>

Only check batteries

=item B<amperage>

Only check power usage

=item B<intrusion>

Only check chassis intrusion

=item B<esmhealth>

Only check ESM log overall health, i.e. fill grade

=item B<esmlog>

Only check the event log (ESM) content

=item B<alertlog>

Only check the alert log content

=back

=item --check I<STRING> or I<FILE>

This parameter allows you to adjust which components should be
checked at all. This is a rougher approach than blacklisting, which
requires that you specify component id or index. The parameter should
be either a string containing the adjustments, or a file containing
the string. No errors are raised if the file does not exist.

Note: This option is ignored with alternate basenames.

=over 9

=item B<Example:>

 check_openmanage --check storage=0,intrusion=1

=back

Legal values are described below, along with the default value.

=over 4

=item B<storage>

Check storage subsystem (controllers, disks etc.). Default: ON

=item B<memory>

Check memory (dimms). Default: ON

=item B<fans>

Check chassis fans. Default: ON

=item B<power>

Check power supplies. Default: ON

=item B<temp>

Check temperature sensors. Default: ON

=item B<cpu>

Check CPUs. Default: ON

=item B<voltage>

Check voltage sensors. Default: ON

=item B<batteries>

Check system batteries. Default: ON

=item B<amperage>

Check amperage probes. Default: ON

=item B<intrusion>

Check chassis intrusion. Default: ON

=item B<esmhealth>

Check the ESM log health, i.e. fill grade. Default: ON

=item B<esmlog>

Check the ESM log content. Default: OFF

=item B<alertlog>

Check the alert log content. Default: OFF

=back
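
How a C<--check> string adjusts the defaults listed above can be sketched in Python. The defaults are taken from the list; the function name C<parse_check> and the decision to reject unknown keywords are assumptions, and the real plugin may behave differently.

```python
# Defaults as documented in the list above (ON = True, OFF = False).
DEFAULTS = {
    "storage": True, "memory": True, "fans": True, "power": True,
    "temp": True, "cpu": True, "voltage": True, "batteries": True,
    "amperage": True, "intrusion": True, "esmhealth": True,
    "esmlog": False, "alertlog": False,
}


def parse_check(spec, defaults=DEFAULTS):
    """Apply 'name=0|1,name=0|1,...' on top of the default check set."""
    checks = dict(defaults)
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, value = entry.partition("=")
        if name not in checks:
            # Assumption: unknown keywords are treated as an error here.
            raise ValueError(f"unknown check keyword: {name!r}")
        checks[name] = value == "1"
    return checks


checks = parse_check("storage=0,esmlog=1")
print(checks["storage"], checks["esmlog"], checks["fans"])
```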

=back

=head1 DIAGNOSTICS

The option C<--debug> (or C<-d>) can be specified to display all
monitored components.

=head1 DEPENDENCIES

If SNMP is requested, the perl module Net::SNMP is
required. Otherwise, only a regular perl distribution is required to
run the script. On the target (monitored) system, Dell OpenManage
Server Administrator (OMSA) must be installed and running.

=head1 EXIT STATUS

If no errors are discovered, a value of 0 (OK) is returned. An exit
value of 1 (WARNING) signifies one or more non-critical errors, while
2 (CRITICAL) signifies one or more critical errors.

The exit value 3 (UNKNOWN) is reserved for errors within the script,
or errors getting values from Dell OMSA.
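
A wrapper script can translate these exit values back into state names. The following Python sketch shows one way to do this; the helper names C<state_name> and C<run_check> are hypothetical, and mapping any other exit value to UNKNOWN is an assumption rather than documented plugin behaviour. A stand-in command is used so that the sketch runs without check_openmanage installed.

```python
import subprocess
import sys

# Exit values as documented above.
EXIT_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def state_name(exit_code):
    # Assumption: any value outside 0-3 is reported as UNKNOWN.
    return EXIT_STATES.get(exit_code, "UNKNOWN")


def run_check(argv):
    """Run a plugin command line and return (state name, output text)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return state_name(result.returncode), result.stdout.strip()


# Stand-in for the plugin: prints one line and exits with 2 (CRITICAL).
fake_plugin = [sys.executable, "-c",
               "import sys; print('demo alert'); sys.exit(2)"]
print(run_check(fake_plugin))
```

To use it against the real plugin, replace C<fake_plugin> with an argument list such as C<["check_openmanage"]>.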

=head1 AUTHOR

Written by Trond H. Amundsen <t.h.amundsen@usit.uio.no>

=head1 BUGS AND LIMITATIONS

Storage info is not collected or checked on very old PowerEdge models
and/or old OMSA versions, due to limitations in OMSA. The overall
support on those models/versions by this plugin is not well tested.

=head1 INCOMPATIBILITIES

The plugin should work with the Nagios embedded perl interpreter
(ePN). However, this is not thoroughly tested.

=head1 REPORTING BUGS

Report bugs to <t.h.amundsen@usit.uio.no>

=head1 LICENSE AND COPYRIGHT

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see L<http://www.gnu.org/licenses/>.

=head1 SEE ALSO

L<http://folk.uio.no/trondham/software/check_openmanage.html>

=cut