# Man page created with:
#
# pod2man -s 8 -r "`./check_openmanage -V | head -n 1`" -c 'Nagios plugin' check_openmanage.pod check_openmanage.8
#
# $Id$

=head1 NAME

check_openmanage - Nagios plugin for checking the hardware status on
Dell servers running OpenManage

=head1 SYNOPSIS

check_openmanage [I<OPTION>]...

check_openmanage -H I<hostname> [I<OPTION>]...

=head1 DESCRIPTION

check_openmanage is a plugin for Nagios which checks the hardware
health of Dell servers running OpenManage Server Administrator
(OMSA). The plugin checks the health of the storage subsystem, power
supplies, memory modules, temperature probes etc., and gives an alert
if any of the components are faulty or operate outside normal
parameters.

check_openmanage is designed to be used either locally (using NRPE
or similar) or remotely (using SNMP). In either mode, the output is
(nearly) the same. Note that checking the alert log is not supported
in SNMP mode.

=head1 GENERAL OPTIONS

=over 4

=item -f, --config I<FILE>

Specify a configuration file. For reference on the config file syntax
and options, consult the L<check_openmanage.conf(5)> manual page.

=item -t, --timeout I<SECONDS>

The number of seconds after which the plugin will abort. Default
timeout is 30 seconds if the option is not present.

=item -p, --perfdata [I<multiline> or I<minimal>]

Collect performance data. Performance data collected include
temperatures (in Celsius) and fan speeds (in rpm). On systems that
support it, power consumption is also collected (in Watts). This
option takes an optional argument, which can be either C<minimal> or
C<multiline>.

If the argument C<minimal> is specified, the plugin will use shorter
names for the performance data labels, e.g. C<t0> instead of
C<temp_0_system_board_ambient>. This can be used as a workaround in
cases where the plugin output needs shortening, for example if the
1024 character limit of NRPE is reached.

If given the argument C<multiline>, the plugin will output the
performance data on multiple lines, for Nagios 3.x and above.

=item --legacy-perfdata

With version 3.7.0, the performance data output changed, and the new
format is not compatible with the old one. Users who wish to postpone
switching to the new performance data format may set this option.

=item -w, --warning I<STRING> or I<FILE>

Override the machine-default temperature warning thresholds. Syntax is
C<id1=max[/min],id2=max[/min],...>. The following example sets warning
limits to max 50C for probe 0, and max 45C and min 10C for probe 1:

 check_openmanage -w 0=50,1=45/10

The minimum limit can be omitted, if desired. Most often, you are only
interested in setting the maximum thresholds.

This parameter can be either a string with the limits, or a file
containing the limits string. The option can be specified multiple
times.

NOTE: This option should only be used to narrow the field of OK
temperatures with respect to the OMSA defaults. To expand the field of
OK temperatures, increase the OMSA thresholds. See the plugin web page
for more information.
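
The limits syntax can be illustrated with a small Python sketch. It is not the plugin's own (Perl) code: C<parse_limits> and C<in_range> are hypothetical helpers that turn a string such as C<0=50,1=45/10> into per-probe (max, min) pairs and test a reading against them. Treating a reading exactly equal to a limit as OK is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple

Limit = Tuple[float, Optional[float]]  # (max, min or None)


def parse_limits(spec: str) -> Dict[str, Limit]:
    """Parse "id1=max[/min],id2=max[/min],..." into {probe_id: (max, min)}."""
    limits: Dict[str, Limit] = {}
    for part in spec.split(","):
        probe, _, bounds = part.strip().partition("=")
        if not probe or not bounds:
            raise ValueError(f"malformed limit: {part!r}")
        high, _, low = bounds.partition("/")
        limits[probe] = (float(high), float(low) if low else None)
    return limits


def in_range(reading: float, limit: Limit) -> bool:
    """True if the reading does not exceed max and is not below min (if set)."""
    high, low = limit
    return reading <= high and (low is None or reading >= low)


limits = parse_limits("0=50,1=45/10")
print(limits)                      # {'0': (50.0, None), '1': (45.0, 10.0)}
print(in_range(8.0, limits["1"]))  # False: below the 10C minimum
```

The minimum is stored as C<None> when omitted, mirroring the note that only the maximum is usually set.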

=item -c, --critical I<STRING> or I<FILE>

Override the machine-default temperature critical thresholds. Syntax
and behaviour are the same as for the warning thresholds described above.

=item -F, --fahrenheit

Set Fahrenheit as unit for all temperatures. This option will override
the C<--tempunit> option, if used simultaneously.

=item --tempunit I<CHAR>

Set temperature unit. Legal values are C<F> for Fahrenheit, C<C> for
Celsius, C<K> for Kelvin and C<R> for Rankine.
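
For reference, the standard formulas for converting a Celsius reading into the other supported units are sketched below in Python. The function name C<convert_celsius> is hypothetical, and the sketch assumes the reading starts out in Celsius; it does not show how the plugin itself performs the conversion.

```python
def convert_celsius(celsius: float, unit: str) -> float:
    """Convert a Celsius reading to C, F, K or R (Rankine)."""
    unit = unit.upper()
    if unit == "C":
        return celsius
    if unit == "F":
        return celsius * 9 / 5 + 32
    if unit == "K":
        return celsius + 273.15
    if unit == "R":
        return (celsius + 273.15) * 9 / 5  # Rankine: Kelvin scaled by 9/5
    raise ValueError(f"unknown temperature unit: {unit!r}")


for u in "CFKR":
    print(u, round(convert_celsius(25.0, u), 2))
```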

=item -o, --ok-info I<NUMBER>

This option lets you define how much output you want the plugin to
give when everything is OK, i.e. the verbosity level. The default
value is 0 (one line of output). The output levels are cumulative.

=over 4

=item B<0>

- Only one line (default)

=item B<1>

- BIOS and firmware info on a separate line

=item B<2>

- Storage controller and enclosure info on separate lines

=item B<3>

- OMSA version on a separate line

=back

The OMSA version is listed separately from the rest because finding
it requires running a very slow omreport command when the plugin is
run locally via NRPE.

=item -B, --show-blacklist

If used together with blacklisting, this option will make the plugin
output all blacklistings that are being used. The output will have the
correct blacklisting syntax, which makes it easy to keep track of
which blacklistings are used for each server, as they can all be
viewed from Nagios.

When blacklisting is not used, this option has no effect.

=item --omreport I<OMREPORT PATH>

Specify the full path to omreport, if it is not installed in any of
the regular places. Usually this option is only needed on Windows, if
omreport is not installed on the C: drive.

=item -i, --info

Prefix any alerts with the service tag.

=item -e, --extinfo

Display a short summary of system information (model and service tag)
in case of an alert.

=item -I, --htmlinfo [I<CODE>]

Using this option will turn the service tag and model name into
clickable HTML links in the output. The model name link will point to
the official Dell documentation for that model, while the service tag
link will point to a website containing support info for that
particular server.

This option takes an optional argument, which should be your country
code or C<me> for the Middle East. If the country code is omitted, the
service tag link will still work, but it will not be specific to your
country or area. Example for Germany:

 check_openmanage --htmlinfo de

This option is particularly useful together with the I<--extinfo> or
I<--info> options. Only the most common country codes are supported at
this time.

=item --postmsg I<STRING> or I<FILE>

User-specified post message. Useful for displaying arbitrary system
information at the end of alerts. The argument is either a string
with the message, or a file containing that string. You can control
the format with the following interpreted sequences:

=over 4

=item B<%m>

System model

=item B<%s>

Service tag

=item B<%b>

BIOS version

=item B<%d>

BIOS release date

=item B<%o>

Operating system name

=item B<%r>

Operating system release

=item B<%p>

Number of physical drives

=item B<%l>

Number of logical drives

=item B<%n>

Line break. This will be a regular line break if run from a TTY, and
an HTML line break otherwise.

=item B<%%>

A literal C<%>

=back
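
The substitution of these sequences can be sketched in Python as follows. C<expand_postmsg> is a hypothetical helper and the system values are placeholders; the real plugin fetches them from OMSA. Leaving unrecognized sequences untouched is an assumption of this sketch.

```python
import re
import sys
from typing import Dict


def expand_postmsg(template: str, facts: Dict[str, str], tty: bool) -> str:
    """Expand %m, %s, %b, %d, %o, %r, %p, %l, %n and %% in a post message."""
    def substitute(match: "re.Match[str]") -> str:
        code = match.group(1)
        if code == "%":
            return "%"                          # %% -> literal percent sign
        if code == "n":
            return "\n" if tty else "<br/>"     # line break depends on TTY
        return facts.get(code, match.group(0))  # unknown: left unchanged
    return re.sub(r"%(.)", substitute, template)


# Placeholder values; the plugin would take these from OMSA.
facts = {"m": "EXAMPLE-MODEL", "s": "TAG0000", "b": "1.0.0",
         "d": "01/01/2000", "o": "ExampleOS", "r": "1.0",
         "p": "4", "l": "2"}
print(expand_postmsg("%m (%s)%nBIOS %b, 100%%", facts, sys.stdout.isatty()))
```

Matching C<%> plus one character in a single pass means C<%%> is consumed as a pair, so a literal percent sign is never mistaken for the start of another sequence.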

=item -s, --state

Prefix each alert with its corresponding service state (e.g. warning,
critical). This is useful in case of several alerts from the same
monitored system.

=item -S, --short-state

Same as the B<--state> option above, except that the state is
abbreviated to a single letter (W=warning, C=critical etc.).

=item --linebreak I<STRING>

check_openmanage will sometimes report more than one line, e.g. if
there are several alerts. If the script has a TTY, it will use regular
linebreaks. If not (which is the case with NRPE), it will use HTML
linebreaks. Sometimes it can be useful to control what the plugin uses
as a line separator, and this option provides that control.

The argument is the exact string to be used as the line
separator. There are two exceptions: the following keywords are
translated as shown.

=over 4

=item B<REG>

Regular linebreaks, i.e. "\n".

=item B<HTML>

HTML linebreaks, i.e. "<br/>".

=back

This is a rather special option that is normally not needed. The
default behaviour should be sufficient for most users.
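
The separator selection described above can be sketched in Python. C<line_separator> is a hypothetical name, the alert strings are placeholders, and the keywords are assumed to be case-sensitive.

```python
import sys
from typing import Optional


def line_separator(option: Optional[str], is_tty: bool) -> str:
    """Return the line separator for --linebreak (None = option not given)."""
    if option is None:
        return "\n" if is_tty else "<br/>"  # default: TTY decides
    if option == "REG":
        return "\n"
    if option == "HTML":
        return "<br/>"
    return option  # any other string is used verbatim


alerts = ["first alert", "second alert"]  # placeholder alert texts
print(line_separator(None, sys.stdout.isatty()).join(alerts))
```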

=item -d, --debug

Debug output. Reports the status of every component, even if the
status is OK. Blacklisted or unchecked components are ignored
(i.e. not shown).

NOTE: This option is intended for diagnostics and debugging purposes
only. Do not use this option from within Nagios, i.e. in the Nagios
config.

=item -h, --help

Display help text.

=item -V, --version

Display version info.

=back

=head1 SNMP OPTIONS

=over 4

=item -H, --hostname I<HOSTNAME>

The transport address of the destination SNMP device. Using this
option triggers SNMP mode.

=item -P, --protocol I<PROTOCOL>

SNMP protocol version. This option expects a digit (C<1>, C<2> or
C<3>) that defines the SNMP version. The default is C<2>, i.e. SNMP
version 2c.

=item -C, --community I<COMMUNITY>

This option expects a string that is to be used as the SNMP community
name when using SNMP version 1 or 2c. By default the community name
is set to C<public> if the option is not present.

=item --port I<PORT>

SNMP port of the remote (monitored) system. Defaults to the well-known
SNMP port 161.

=item -6, --ipv6

This option will cause the plugin to use IPv6. The default is IPv4 if
the option is not present.

=item --tcp

This option will cause the plugin to use TCP as transport
protocol. The default is UDP if the option is not present.

=item -U, --username I<SECURITYNAME>

[SNMPv3] The User-based Security Model (USM) used by SNMPv3 requires
that a securityName be specified. This option is required when using
SNMP version 3, and expects a string 1 to 32 octets in length.

=item --authpassword I<PASSWORD>, --authkey I<KEY>

[SNMPv3] By default a securityLevel of C<noAuthNoPriv> is assumed. If
the --authpassword (or --authkey) option is specified, the
securityLevel becomes C<authNoPriv>. The --authpassword option
expects a string which is at least 1 octet in length as argument.

Optionally, instead of the --authpassword option, the --authkey option
can be used so that a plain text password does not have to be
specified in a script. The --authkey option expects a hexadecimal
string produced by localizing the password with the
authoritativeEngineID for the specific destination device. The
C<snmpkey> utility included with the Net::SNMP distribution can be
used to create the hexadecimal string (see L<snmpkey>).
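
The key localization referred to above is defined in RFC 3414 (Appendix A.2): the password is repeated to fill 1,048,576 octets and hashed to give Ku, and Ku is then localized as H(Ku || engineID || Ku). The Python sketch below implements that algorithm for illustration only. The function names are hypothetical, it is not part of check_openmanage or Net::SNMP, and C<snmpkey> remains the documented way to generate the key. Use the hash name C<sha1> instead of C<md5> for the C<sha> authentication protocol.

```python
import hashlib


def password_to_key(password: bytes, hash_name: str = "md5") -> bytes:
    """RFC 3414 A.2: hash 1 MiB of the repeated password to get Ku."""
    if not password:
        raise ValueError("password must be at least 1 octet long")
    total = 1024 * 1024
    repeated = password * (total // len(password) + 1)
    return hashlib.new(hash_name, repeated[:total]).digest()


def localize_key(ku: bytes, engine_id: bytes, hash_name: str = "md5") -> bytes:
    """RFC 3414 A.2: Kul = H(Ku || authoritativeEngineID || Ku)."""
    return hashlib.new(hash_name, ku + engine_id + ku).digest()


# Test vector from RFC 3414 A.3.1 (MD5, password "maplesyrup").
engine_id = bytes.fromhex("000000000000000000000002")
kul = localize_key(password_to_key(b"maplesyrup"), engine_id)
print(kul.hex())  # 526f5eed9fcce26f8964c2930787d82b
```

Because the localized key depends on the engine ID, the same password yields a different key for every destination device.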

=item --authprotocol I<ALGORITHM>

[SNMPv3] Two different hash algorithms are defined by SNMPv3 which can
be used by the Security Model for authentication. These algorithms are
HMAC-MD5-96 C<MD5> (RFC 1321) and HMAC-SHA-96 C<SHA-1> (NIST FIPS PUB
180-1). The default algorithm used by the plugin is HMAC-MD5-96. To
select the hash algorithm explicitly, pass either the string C<md5>
or C<sha> as the argument to this option.

=item --privpassword I<PASSWORD>, --privkey I<KEY>

[SNMPv3] By specifying the options --privkey or --privpassword, the
securityLevel becomes C<authPriv>. According to SNMPv3, privacy
requires the use of authentication. Therefore, if either of these two
options is present and the --authkey or --authpassword arguments are
missing, the creation of the SNMP session fails. The --privkey and
--privpassword options expect the same input as the --authkey and
--authpassword options respectively.
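
The securityLevel rules for the authentication and privacy options can be summarized in a short Python sketch. C<security_level> is a hypothetical helper, not part of the plugin or Net::SNMP; each argument stands for whichever of the password or key options was given.

```python
from typing import Optional


def security_level(auth: Optional[str], priv: Optional[str]) -> str:
    """Derive the SNMPv3 securityLevel from the credentials supplied.

    auth: value of --authpassword or --authkey (None if neither is given)
    priv: value of --privpassword or --privkey (None if neither is given)
    """
    if priv is not None and auth is None:
        raise ValueError("privacy requires authentication")
    if auth is None:
        return "noAuthNoPriv"
    if priv is None:
        return "authNoPriv"
    return "authPriv"


print(security_level(None, None))           # noAuthNoPriv
print(security_level("secret", None))       # authNoPriv
print(security_level("secret", "secret2"))  # authPriv
```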

=item --privprotocol I<ALGORITHM>

[SNMPv3] The User-based Security Model described in RFC 3414 defines a
single encryption protocol to be used for privacy. This protocol,
CBC-DES C<DES> (NIST FIPS PUB 46-1), is used by default or if the
string C<des> is passed to the --privprotocol option. The Net::SNMP
module also supports RFC 3826, which describes the use of
CFB128-AES-128 C<AES> (NIST FIPS PUB 197) in the USM. The AES
encryption protocol can be selected by passing C<aes> or C<aes128> to
the --privprotocol option.

The argument must be one of the following: C<des>, C<aes>, C<aes128>,
C<3des>, C<3desde>.

=item --use-get_table

This option exists as a workaround when using check_openmanage with
SNMPv3 on Windows with net-snmp. Using this option will make
check_openmanage use the Net::SNMP function C<get_table()> instead of
C<get_entries()> while fetching values via SNMP. The latter is faster
and is the default.

=back

=head1 BLACKLISTING

=over 4

=item -b, --blacklist I<STRING> or I<FILE>

Blacklist missing and/or failed components, if you do not plan to fix
them. The parameter is either the blacklist string, or a file (that
may or may not exist) containing the string. The blacklist string
consists of component names with their component IDs, and the entries
for each component are separated by a slash (/). Blacklisted
components are left unchecked.

TIP: Use the option C<-d> (or C<--debug>) to get the blacklist ID for
devices. The ID is listed in a separate column in the debug output.

NOTE: If blacklisting is in effect, the global health of the system is
not checked.

=over 9

=item B<Syntax:>

 component1=id1[,id2,...]/component2=id1[,id2,...]/...

The ID part can also be C<all>, in which case all components of that
type are blacklisted.

=item B<Example:>

 check_openmanage -b ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all

=back

In the example we blacklist power supply 0, fans 3 and 5, physical disk
1:0:0:1, and warnings about out-of-date drivers for all
controllers. Legal component names include:

=over 8

=item B<ctrl>

0b6ba9c9 430Storage controller. Note that if a controller is blacklisted, all
431components on that controller (such as physical and logical drives)
432are blacklisted as well.
669797e1 433
434=item B<ctrl_fw>
435
436Suppress the special warning message about old controller
437firmware. Use this if you can not or will not upgrade the firmware.
438
439=item B<ctrl_driver>
440
441Suppress the special warning message about old controller driver.
442Particularly useful on systems where you can not upgrade the driver.
443
8dd8083c 444=item B<ctrl_stdr>
445
446Suppress the special warning message about old Storport driver on
447Windows.
448
d27881e0 449=item B<ctrl_pdisk>
450
451This blacklisting keyword exists as a possible workaround for physical
452drives with bad firmware which makes Openmanage choke. It takes the
453controller number as argument. Use this option to blacklist all
454physical drives on a specific controller. This blacklisting keyword is
455only available in local mode, i.e. not with SNMP.
456
669797e1 457=item B<pdisk>
458
459Physical disk.
460
b17cf22e 461=item B<pdisk_cert>
462
463Suppress warning message about non-certified physical disk.
464
669797e1 465=item B<vdisk>
466
467Logical drive (virtual disk)
468
469=item B<bat>
470
471Controller cache battery
472
7b02bc55 473=item B<bat_charge>
474
475Ignore warnings related to the controller cache battery charging
7031b02a 476cycle, which happens approximately every 40 days on Dell servers. Note
477that using this blacklist keyword makes check_openmanage ignore
478non-critical cache battery errors.

=item B<conn>

Connector (channel)

=item B<encl>

Enclosure

=item B<encl_fan>

Enclosure fan

=item B<encl_ps>

Enclosure power supply

=item B<encl_temp>

Enclosure temperature probe

=item B<encl_emm>

Enclosure management module (EMM)

=item B<dimm>

Memory module

=item B<fan>

Fan

=item B<ps>

Power supply

=item B<temp>

Temperature sensor

=item B<cpu>

Processor (CPU)

=item B<volt>

Voltage probe

=item B<bp>

System battery

=item B<amp>

Amperage probe (power consumption monitoring)

=item B<intr>

Intrusion sensor

=item B<sd>

SD card

=back

=back
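
The blacklist string format can be illustrated with a Python sketch that parses it into a mapping from component name to IDs and checks whether a given component is blacklisted. The helper names are hypothetical, this is not the plugin's Perl implementation, and the sketch ignores the implied blacklisting of everything attached to a blacklisted controller.

```python
from typing import Dict, List


def parse_blacklist(spec: str) -> Dict[str, List[str]]:
    """Parse "comp1=id1[,id2,...]/comp2=..." into {component: [ids]}."""
    blacklist: Dict[str, List[str]] = {}
    for entry in spec.split("/"):
        component, _, ids = entry.partition("=")
        if not component or not ids:
            raise ValueError(f"malformed blacklist entry: {entry!r}")
        blacklist.setdefault(component, []).extend(ids.split(","))
    return blacklist


def is_blacklisted(blacklist: Dict[str, List[str]], component: str, ident: str) -> bool:
    """True if this component ID, or 'all' of its type, is blacklisted."""
    ids = blacklist.get(component, [])
    return "all" in ids or ident in ids


bl = parse_blacklist("ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all")
print(is_blacklisted(bl, "fan", "5"))          # True
print(is_blacklisted(bl, "fan", "1"))          # False
print(is_blacklisted(bl, "ctrl_driver", "2"))  # True, via "all"
```

Splitting on the slash first works because physical disk IDs such as C<1:0:0:1> use colons, not slashes.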

=head1 CHECK CONTROL

=over 4

=item --no-storage

Turn off storage checking. This is an alias for C<--check storage=0>.

=item --only I<KEYWORD>

This option can be specified once and expects a keyword. The available
keywords and the corresponding behaviour of check_openmanage are
described below.

=over 4

=item B<critical>

Print only critical alerts. With this option any warning alerts are
suppressed.

=item B<warning>

Print only warning alerts. With this option any critical alerts are
suppressed.

=item B<chassis>

Check all chassis components and nothing else.

=item B<storage>

Only check storage

=item B<memory>

Only check memory modules

=item B<fans>

Only check fans

=item B<power>

Only check power supplies

=item B<temp>

Only check temperatures

=item B<cpu>

Only check processors

=item B<voltage>

Only check voltage probes

=item B<batteries>

Only check batteries

=item B<amperage>

Only check power usage

=item B<intrusion>

Only check chassis intrusion

=item B<sdcard>

Only check SD cards

=item B<esmhealth>

Only check ESM log overall health, i.e. fill grade

=item B<esmlog>

Only check the event log (ESM) content

=item B<alertlog>

Only check the alert log content

=back

=item --check I<STRING> or I<FILE>

This parameter allows you to adjust which components should be
checked at all. This is a rougher approach than blacklisting, which
requires that you specify component ID or index. The parameter should
be either a string containing the adjustments, or a file containing
the string. No errors are raised if the file does not exist.

Note: This option is ignored with alternate basenames.

=over 9

=item B<Example:>

 check_openmanage --check storage=0,intrusion=1

=back

Legal values are described below, along with the default value.

=over 4

=item B<storage>

Check storage subsystem (controllers, disks etc.). Default: ON

=item B<memory>

Check memory (DIMMs). Default: ON

=item B<fans>

Check chassis fans. Default: ON

=item B<power>

Check power supplies. Default: ON

=item B<temp>

Check temperature sensors. Default: ON

=item B<cpu>

Check CPUs. Default: ON

=item B<voltage>

Check voltage sensors. Default: ON

=item B<batteries>

Check system batteries. Default: ON

=item B<amperage>

Check amperage probes. Default: ON

=item B<intrusion>

Check chassis intrusion. Default: ON

=item B<sdcard>

Check SD cards. Default: ON

=item B<esmhealth>

Check the ESM log health, i.e. fill grade. Default: ON

=item B<esmlog>

Check the ESM log content. Default: OFF

=item B<alertlog>

Check the alert log content. Default: OFF

=back
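
As a sketch of how a C<--check> string adjusts the defaults listed above, the following Python code starts from those defaults and applies each C<name=value> pair. C<apply_check> is a hypothetical helper, not the plugin's code, and accepting only C<0> and C<1> as values (as in the example) is an assumption.

```python
from typing import Dict

# Defaults from the list above: everything ON except esmlog and alertlog.
DEFAULTS: Dict[str, bool] = {
    "storage": True, "memory": True, "fans": True, "power": True,
    "temp": True, "cpu": True, "voltage": True, "batteries": True,
    "amperage": True, "intrusion": True, "sdcard": True,
    "esmhealth": True, "esmlog": False, "alertlog": False,
}


def apply_check(spec: str) -> Dict[str, bool]:
    """Apply a --check string such as "storage=0,intrusion=1"."""
    checks = dict(DEFAULTS)
    for item in (s.strip() for s in spec.split(",")):
        if not item:
            continue
        name, _, value = item.partition("=")
        if name not in checks or value not in ("0", "1"):
            raise ValueError(f"invalid --check item: {item!r}")
        checks[name] = value == "1"
    return checks


enabled = apply_check("storage=0,alertlog=1")
print(sorted(name for name, on in enabled.items() if on))
```

Under this model, C<--no-storage> corresponds to C<apply_check("storage=0")>.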

=back

=head1 DIAGNOSTICS

The option C<--debug> (or C<-d>) can be specified to display all
monitored components.

=head1 DEPENDENCIES

If SNMP is requested, the Perl module Net::SNMP is
required. Otherwise, only a regular Perl distribution is required to
run the script. On the target (monitored) system, Dell OpenManage
Server Administrator (OMSA) must be installed and running.

=head1 EXIT STATUS

If no errors are discovered, a value of 0 (OK) is returned. An exit
value of 1 (WARNING) signifies one or more non-critical errors, while
2 (CRITICAL) signifies one or more critical errors.

The exit value 3 (UNKNOWN) is reserved for errors within the script,
or errors getting values from Dell OMSA.
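
A minimal Python sketch of how these exit values relate to the alerts found: the most severe state wins, so a single critical error gives 2 even when warnings are also present. The constant and function names are hypothetical and do not come from the plugin.

```python
from typing import Iterable

# Exit values as documented above.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def exit_status(alert_levels: Iterable[int], script_error: bool = False) -> int:
    """Return the exit value for the alerts found (WARNING or CRITICAL each).

    An empty collection means no errors were found. UNKNOWN is reserved
    for errors in the script itself or when OMSA cannot be queried.
    """
    if script_error:
        return UNKNOWN
    return max(alert_levels, default=OK)  # the most severe state wins


print(exit_status([]))                     # 0
print(exit_status([WARNING, WARNING]))     # 1
print(exit_status([WARNING, CRITICAL]))    # 2
print(exit_status([], script_error=True))  # 3
```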

=head1 AUTHOR

Written by Trond H. Amundsen <t.h.amundsen@usit.uio.no>

=head1 BUGS AND LIMITATIONS

Storage info is not collected or checked on very old PowerEdge models
and/or old OMSA versions, due to limitations in OMSA. Overall, the
plugin's support for those models/versions is not well tested.

=head1 INCOMPATIBILITIES

The plugin should work with the Nagios embedded Perl interpreter
(ePN). However, this is not thoroughly tested.

=head1 REPORTING BUGS

Report bugs to <t.h.amundsen@usit.uio.no>

=head1 LICENSE AND COPYRIGHT

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see L<http://www.gnu.org/licenses/>.

=head1 SEE ALSO

L<check_openmanage.conf(5)>
L<http://folk.uio.no/trondham/software/check_openmanage.html>

=cut