# Man page created with:
#
# pod2man -s 8 -r "`./check_openmanage -V | head -n 1`" -c 'Nagios plugin' check_openmanage.pod check_openmanage.8
#
# $Id$

=head1 NAME

check_openmanage - Nagios plugin for checking the hardware status on
Dell servers running OpenManage

=head1 SYNOPSIS

check_openmanage [I<OPTION>]...

check_openmanage -H I<hostname> [I<OPTION>]...

=head1 DESCRIPTION

check_openmanage is a plugin for Nagios which checks the hardware
health of Dell servers running OpenManage Server Administrator
(OMSA). The plugin checks the health of the storage subsystem, power
supplies, memory modules, temperature probes etc., and gives an alert
if any of the components are faulty or operate outside normal
parameters.

check_openmanage is designed to be used either locally (using NRPE
or similar) or remotely (using SNMP). In either mode, the output is
(nearly) the same. Note that checking the alert log is not supported
in SNMP mode.

=head1 GENERAL OPTIONS

=over 4

=item -f, --config I<FILE>

Specify a configuration file. For reference on the config file syntax
and options, consult the L<check_openmanage.conf(5)> manual page.

=item -t, --timeout I<SECONDS>

The number of seconds after which the plugin will abort. Default
timeout is 30 seconds if the option is not present.

=item -p, --perfdata [I<multiline> or I<minimal>]

Collect performance data. The performance data collected includes
temperatures (in Celsius) and fan speeds (in rpm). On systems that
support it, power consumption is also collected (in Watts). This
option takes an optional argument, which is one of two keywords.

If the argument C<minimal> is specified, the plugin will use shorter
names for the performance data labels, e.g. C<t0> instead of
C<temp_0_system_board_ambient>. This can be used as a workaround in
cases where the plugin output needs shortening, for example if the
1024 character limit of NRPE is reached.

If given the argument C<multiline>, the plugin will output the
performance data on multiple lines, for Nagios 3.x and above.

=item --legacy-perfdata

With version 3.7.0, performance data output changed. The new format is
not compatible with the old format. Users who wish to postpone
switching to the new performance data API may set this option.

=item -w, --warning I<STRING> or I<FILE>

Override the machine-default temperature warning thresholds. Syntax is
C<id1=max[/min],id2=max[/min],...>. The following example sets warning
limits to max 50C for probe 0, and max 45C and min 10C for probe 1:

  check_openmanage -w 0=50,1=45/10

The minimum limit can be omitted, if desired. Most often, you are only
interested in setting the maximum thresholds.

This parameter can be either a string with the limits, or a file
containing the limits string. The option can be specified multiple
times.

NOTE: This option should only be used to narrow the field of OK
temperatures with respect to the OMSA defaults. To expand the field of
OK temperatures, increase the OMSA thresholds. See the plugin web page
for more information.
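
The threshold syntax C<id1=max[/min],id2=max[/min],...> can be
illustrated with a short Python sketch. This is not the plugin's own
parser; the function name C<parse_thresholds> and the returned data
layout are assumptions made for illustration only.

```python
def parse_thresholds(spec):
    """Parse 'id=max[/min],...' into {probe_id: (max, min_or_None)}.

    Hypothetical helper: name and return layout are illustrative.
    """
    limits = {}
    for entry in spec.split(","):
        # Split "1=45/10" into probe id "1" and bounds "45/10".
        probe, _, bounds = entry.strip().partition("=")
        # The minimum is optional, so "lower" may be an empty string.
        upper, _, lower = bounds.partition("/")
        limits[int(probe)] = (float(upper), float(lower) if lower else None)
    return limits


# The example from the text: max 50C for probe 0, max 45C / min 10C for probe 1.
print(parse_thresholds("0=50,1=45/10"))
# → {0: (50.0, None), 1: (45.0, 10.0)}
```

A probe without a minimum simply maps to C<None> for the lower bound,
which mirrors the statement that the minimum limit can be omitted.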

=item -c, --critical I<STRING> or I<FILE>

Override the machine-default temperature critical thresholds. Syntax
and behaviour are the same as for warning thresholds described above.

=item -F, --fahrenheit

Set Fahrenheit as unit for all temperatures. This option will override
the C<--tempunit> option, if used simultaneously.

=item --tempunit I<CHAR>

Set temperature unit. Legal values are C<F> for Fahrenheit, C<C> for
Celsius, C<K> for Kelvin and C<R> for Rankine.
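
For reference, the four units accepted by C<--tempunit> are related by
the standard conversion formulas. The sketch below assumes that
readings start out in Celsius; the function name C<convert_celsius> is
hypothetical, and how the plugin itself rounds converted values is not
stated here.

```python
def convert_celsius(celsius, unit):
    """Convert a Celsius reading to the unit letter used by --tempunit."""
    unit = unit.upper()
    if unit == "C":
        return celsius
    if unit == "F":
        return celsius * 9 / 5 + 32          # Fahrenheit
    if unit == "K":
        return celsius + 273.15              # Kelvin
    if unit == "R":
        return (celsius + 273.15) * 9 / 5    # Rankine
    raise ValueError(f"unknown temperature unit: {unit!r}")


print(convert_celsius(100, "F"))
# → 212.0
```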

=item -o, --ok-info I<NUMBER>

This option lets you define how much output you want the plugin to
give when everything is OK, i.e. the verbosity level. The default
value is 0 (one line of output). The output levels are cumulative.

=over 4

=item B<0>

- Only one line (default)

=item B<1>

- BIOS and firmware info on a separate line

=item B<2>

- Storage controller and enclosure info on separate lines

=item B<3>

- OMSA version on separate line

=back

The OMSA version is separated from the rest because finding it
requires running a very slow omreport command when the plugin is run
locally via NRPE.

=item -B, --show-blacklist

If used together with blacklisting, this option will make the plugin
output all blacklistings that are being used. The output will have the
correct blacklisting syntax, and makes it easy to keep track of which
blacklistings are used for each server, as any blacklistings can be
viewed from Nagios.

When blacklisting is not used, this option has no effect.

=item --omreport I<OMREPORT PATH>

Specify full path to omreport, if it is not installed in any of the
regular places. Usually this option is only needed on Windows, if
omreport is not installed on the C: drive.

=item -i, --info

Prefix any alerts with the service tag.

=item -e, --extinfo

Display a short summary of system information (model and service tag)
in case of an alert.

=item -I, --htmlinfo [I<CODE>]

Using this option will make the servicetag and model name into
clickable HTML links in the output. The model name link will point to
the official Dell documentation for that model, while the servicetag
link will point to a website containing support info for that
particular server.

This option takes an optional argument, which should be your country
code or C<me> for the middle east. If the country code is omitted the
servicetag link will still work, but it will not be specific to your
country or area. Example for Germany:

  check_openmanage --htmlinfo de

This option is particularly useful together with either the
I<--extinfo> or I<--info> options. Only the most common country codes
are supported at this time.

=item --postmsg I<STRING> or I<FILE>

User-specified post message. Useful for displaying arbitrary or
various system information at the end of alerts. The argument is
either a string with the message, or a file containing that
string. You can control the format with the following interpreted
sequences:

=over 4

=item B<%m>

System model

=item B<%s>

Service tag

=item B<%b>

BIOS version

=item B<%d>

BIOS release date

=item B<%o>

Operating system name

=item B<%r>

Operating system release

=item B<%p>

Number of physical drives

=item B<%l>

Number of logical drives

=item B<%n>

Line break. Will be a regular line break if run from a TTY, else an
HTML line break.

=item B<%%>

A literal C<%>

=back
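
The interpreted sequences listed above behave like a small template
language. The following Python sketch shows one way such an expansion
could work; it is not taken from the plugin source. The function name
C<expand_postmsg>, the sample model and service tag, and the choice to
leave unknown sequences untouched are all assumptions.

```python
import re


def expand_postmsg(template, info, tty=True):
    """Expand %-sequences in a post message (illustrative sketch only).

    `info` maps sequence letters (m, s, b, d, o, r, p, l) to values.
    """
    table = dict(info)
    # %n is a regular line break on a TTY, otherwise an HTML line break.
    table["n"] = "\n" if tty else "<br/>"
    # %% yields a literal percent sign.
    table["%"] = "%"
    # Assumption: sequences without a known value are left as they are.
    return re.sub(r"%(.)",
                  lambda m: str(table.get(m.group(1), m.group(0))),
                  template)


# Hypothetical sample values, not real plugin output.
sample = {"m": "PowerEdge R710", "s": "ABC1234"}
print(expand_postmsg("%m (%s)%n100%%", sample, tty=False))
# → PowerEdge R710 (ABC1234)<br/>100%
```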

=item -s, --state

Prefix each alert with its corresponding service state (i.e. warning,
critical etc.). This is useful in case of several alerts from the same
monitored system.

=item -S, --short-state

Same as the B<--state> option above, except that the state is
abbreviated to a single letter (W=warning, C=critical etc.).

=item --hide-servicetag

This option will replace the servicetag (serial number) in the output
with C<XXXXXXX>. Use this option to suppress or censor the servicetag
in the plugin output.

=item --linebreak I<STRING>

check_openmanage will sometimes report more than one line, e.g. if
there are several alerts. If the script has a TTY, it will use regular
linebreaks. If not (which is the case with NRPE) it will use HTML
linebreaks. Sometimes it can be useful to control what the plugin uses
as a line separator, and this option provides that control.

The argument is the exact string to be used as the line
separator. There are two exceptions, i.e. two keywords that translate
as follows:

=over 4

=item B<REG>

Regular linebreaks, i.e. "\n".

=item B<HTML>

HTML linebreaks, i.e. "<br/>".

=back

This is a rather special option that is normally not needed. The
default behaviour should be sufficient for most users.
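
The separator selection described for C<--linebreak> can be summarised
in a few lines of Python. This is a sketch of the documented behaviour
under stated assumptions, not the plugin's implementation; the
function name C<line_separator> and its parameters are hypothetical.

```python
import io
import sys


def line_separator(option=None, stream=sys.stdout):
    """Pick a line separator following the documented --linebreak rules."""
    if option == "REG":
        return "\n"        # keyword: regular linebreak
    if option == "HTML":
        return "<br/>"     # keyword: HTML linebreak
    if option is not None:
        return option      # any other argument is used verbatim
    # No option given: regular linebreaks on a TTY, HTML otherwise.
    return "\n" if stream.isatty() else "<br/>"


# An in-memory stream is never a TTY, much like output captured by NRPE.
print(line_separator(stream=io.StringIO()))
# → <br/>
```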

=item -d, --debug

Debug output. Will report status on everything, even if status is
ok. Blacklisted or unchecked components are ignored (i.e. no output).

NOTE: This option is intended for diagnostics and debugging purposes
only. Do not use this option from within Nagios, i.e. in the Nagios
config.

=item -h, --help

Display help text.

=item -V, --version

Display version info.

=back

=head1 SNMP OPTIONS

=over 4

=item -H, --hostname I<HOSTNAME>

The transport address of the destination SNMP device. Using this
option triggers SNMP mode.

=item -P, --protocol I<PROTOCOL>

SNMP protocol version. This option is optional and expects a digit
(i.e. C<1>, C<2> or C<3>) to define the SNMP version. The default is
C<2>, i.e. SNMP version 2c.

=item -C, --community I<COMMUNITY>

This option expects a string that is to be used as the SNMP community
name when using SNMP version 1 or 2c. By default the community name
is set to C<public> if the option is not present.

=item --port I<PORT>

SNMP port of the remote (monitored) system. Defaults to the well-known
SNMP port 161.

=item -6, --ipv6

This option will cause the plugin to use IPv6. The default is IPv4 if
the option is not present.

=item --tcp

This option will cause the plugin to use TCP as transport
protocol. The default is UDP if the option is not present.

=item -U, --username I<SECURITYNAME>

[SNMPv3] The User-based Security Model (USM) used by SNMPv3 requires
that a securityName be specified. This option is required when using
SNMP version 3, and expects a string 1 to 32 octets in length.

=item --authpassword I<PASSWORD>, --authkey I<KEY>

[SNMPv3] By default a securityLevel of C<noAuthNoPriv> is assumed. If
the --authpassword option is specified, the securityLevel becomes
C<authNoPriv>. The --authpassword option expects a string which is at
least 1 octet in length as argument.

Optionally, instead of the --authpassword option, the --authkey option
can be used so that a plain text password does not have to be
specified in a script. The --authkey option expects a hexadecimal
string produced by localizing the password with the
authoritativeEngineID for the specific destination device. The
C<snmpkey> utility included with the Net::SNMP distribution can be
used to create the hexadecimal string (see L<snmpkey>).

=item --authprotocol I<ALGORITHM>

[SNMPv3] Two different hash algorithms are defined by SNMPv3 which can
be used by the Security Model for authentication. These algorithms are
HMAC-MD5-96 C<MD5> (RFC 1321) and HMAC-SHA-96 C<SHA-1> (NIST FIPS PUB
180-1). The default algorithm used by the plugin is HMAC-MD5-96. This
behavior can be changed by using this option. The option expects
either the string C<md5> or C<sha> to be passed as argument to modify
the hash algorithm.

=item --privpassword I<PASSWORD>, --privkey I<KEY>

[SNMPv3] By specifying the options --privkey or --privpassword, the
securityLevel associated with the object becomes
C<authPriv>. According to SNMPv3, privacy requires the use of
authentication. Therefore, if either of these two options is present
and the --authkey or --authpassword arguments are missing, the
creation of the object fails. The --privkey and --privpassword
options expect the same input as the --authkey and --authpassword
options respectively.

=item --privprotocol I<ALGORITHM>

[SNMPv3] The User-based Security Model described in RFC 3414 defines a
single encryption protocol to be used for privacy. This protocol,
CBC-DES C<DES> (NIST FIPS PUB 46-1), is used by default or if the
string C<des> is passed to the --privprotocol option. The Net::SNMP
module also supports RFC 3826 which describes the use of
CFB128-AES-128 C<AES> (NIST FIPS PUB 197) in the USM. The AES
encryption protocol can be selected by passing C<aes> or C<aes128> to
the --privprotocol option.

One of the following arguments is required: des, aes, aes128, 3des,
3desde.

=item --use-get_table

This option exists as a workaround when using check_openmanage with
SNMPv3 on Windows with net-snmp. Using this option will make
check_openmanage use the Net::SNMP function get_table() instead of
get_entries() while fetching values via SNMP. The latter is faster and
is the default.

=back
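
The relationship between the SNMPv3 authentication and privacy options
and the resulting securityLevel, as described under C<--authpassword>
and C<--privpassword> above, can be sketched as follows. The function
name C<security_level> and its boolean parameters are hypothetical;
the sketch only restates the documented rules and does not call
Net::SNMP.

```python
def security_level(auth=False, priv=False):
    """Derive the SNMPv3 securityLevel from which credentials are given.

    auth: an --authpassword or --authkey was supplied
    priv: a --privpassword or --privkey was supplied
    """
    if priv and not auth:
        # Privacy requires authentication, so this combination fails.
        raise ValueError("privacy requires authentication")
    if priv:
        return "authPriv"
    if auth:
        return "authNoPriv"
    return "noAuthNoPriv"


print(security_level(auth=True, priv=True))
# → authPriv
```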

=head1 BLACKLISTING

=over 4

=item -b, --blacklist I<STRING> or I<FILE>

Blacklist missing and/or failed components, if you do not plan to fix
them. The parameter is either the blacklist string, or a file (that
may or may not exist) containing the string. The blacklist string
contains component names with component IDs separated by slash
(/). Blacklisted components are left unchecked.

TIP: Use the option C<-d> (or C<--debug>) to get the blacklist ID for
devices. The ID is listed in a separate column in the debug output.

NOTE: If blacklisting is in effect, the global health of the system is
not checked.

=over 9

=item B<Syntax:>

  component1=id1[,id2,...]/component2=id1[,id2,...]/...

The ID part can also be C<all>, in which case all components of that
type are blacklisted.

=item B<Example:>

  check_openmanage -b ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all

=back

In the example we blacklist power supply 0, fans 3 and 5, physical disk
1:0:0:1, and warnings about out-of-date drivers for all
controllers. Legal component names include:

=over 8

=item B<ctrl>

Storage controller. Note that if a controller is blacklisted, all
components on that controller (such as physical and logical drives)
are blacklisted as well.

=item B<ctrl_fw>

Suppress the special warning message about old controller
firmware. Use this if you cannot or will not upgrade the firmware.

=item B<ctrl_driver>

Suppress the special warning message about old controller driver.
Particularly useful on systems where you cannot upgrade the driver.

=item B<ctrl_stdr>

Suppress the special warning message about old Storport driver on
Windows.

=item B<ctrl_pdisk>

This blacklisting keyword exists as a possible workaround for physical
drives with bad firmware that makes OpenManage choke. It takes the
controller number as argument. Use this option to blacklist all
physical drives on a specific controller. This blacklisting keyword is
only available in local mode, i.e. not with SNMP.

=item B<pdisk>

Physical disk.

=item B<pdisk_cert>

Suppress warning message about non-certified physical disk.

=item B<pdisk_foreign>

Suppress warning message about foreign physical disk.

=item B<vdisk>

Logical drive (virtual disk)

=item B<bat>

Controller cache battery

=item B<bat_charge>

Ignore warnings related to the controller cache battery charging
cycle, which happens approximately every 40 days on Dell servers. Note
that using this blacklist keyword makes check_openmanage ignore
non-critical cache battery errors.

=item B<conn>

Connector (channel)

=item B<encl>

Enclosure

=item B<encl_fan>

Enclosure fan

=item B<encl_ps>

Enclosure power supply

=item B<encl_temp>

Enclosure temperature probe

=item B<encl_emm>

Enclosure management module (EMM)

=item B<dimm>

Memory module

=item B<fan>

Fan

=item B<ps>

Power supply

=item B<temp>

Temperature sensor

=item B<cpu>

Processor (CPU)

=item B<volt>

Voltage probe

=item B<bp>

System battery

=item B<amp>

Amperage probe (power consumption monitoring)

=item B<intr>

Intrusion sensor

=item B<sd>

SD card

=back

=back
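
To make the blacklist string syntax more concrete, here is a minimal
Python sketch that splits a blacklist string into components and IDs
and then answers whether a given component is blacklisted. It is an
illustration of the documented syntax only; the names
C<parse_blacklist> and C<is_blacklisted> are hypothetical and do not
come from the plugin.

```python
def parse_blacklist(spec):
    """Parse 'comp1=id1[,id2,...]/comp2=...' into {component: [ids]}."""
    blacklist = {}
    for part in spec.split("/"):
        component, _, ids = part.partition("=")
        # IDs stay strings because some contain colons, e.g. "1:0:0:1".
        blacklist.setdefault(component, []).extend(ids.split(","))
    return blacklist


def is_blacklisted(blacklist, component, ident):
    """True if the ID is listed, or if 'all' was given for the component."""
    ids = blacklist.get(component, [])
    return "all" in ids or str(ident) in ids


bl = parse_blacklist("ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all")
print(bl)
# → {'ps': ['0'], 'fan': ['3', '5'], 'pdisk': ['1:0:0:1'], 'ctrl_driver': ['all']}
```

With this layout, C<is_blacklisted(bl, "fan", 5)> is true while
C<is_blacklisted(bl, "fan", 4)> is false, and any controller ID
matches for C<ctrl_driver>.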

=head1 CHECK CONTROL

=over 4

=item --no-storage

Turn off storage checking. This is an alias for C<--check storage=0>.

=item --only I<KEYWORD>

This option can be specified once and expects a keyword. The keywords
and the resulting behaviour of check_openmanage are described below.

=over 4

=item B<critical>

Print only critical alerts. With this option any warning alerts are
suppressed.

=item B<warning>

Print only warning alerts. With this option any critical alerts are
suppressed.

=item B<chassis>

Check all chassis components and nothing else.

=item B<storage>

Only check storage

=item B<memory>

Only check memory modules

=item B<fans>

Only check fans

=item B<power>

Only check power supplies

=item B<temp>

Only check temperatures

=item B<cpu>

Only check processors

=item B<voltage>

Only check voltage probes

=item B<batteries>

Only check batteries

=item B<amperage>

Only check power usage

=item B<intrusion>

Only check chassis intrusion

=item B<sdcard>

Only check SD cards

=item B<esmhealth>

Only check ESM log overall health, i.e. fill grade

=item B<esmlog>

Only check the event log (ESM) content

=item B<alertlog>

Only check the alert log content

=back

=item --check I<STRING> or I<FILE>

This parameter allows you to adjust which components are checked at
all. This is a rougher approach than blacklisting, which requires that
you specify a component ID or index. The parameter should be either a
string containing the adjustments, or a file containing the string. No
errors are raised if the file does not exist.

Note: This option is ignored with alternate basenames.

=over 9

=item B<Example:>

  check_openmanage --check storage=0,intrusion=1

=back

Legal values are described below, along with the default value.

=over 4

=item B<storage>

Check storage subsystem (controllers, disks etc.). Default: ON

=item B<memory>

Check memory (dimms). Default: ON

=item B<fans>

Check chassis fans. Default: ON

=item B<power>

Check power supplies. Default: ON

=item B<temp>

Check temperature sensors. Default: ON

=item B<cpu>

Check CPUs. Default: ON

=item B<voltage>

Check voltage sensors. Default: ON

=item B<batteries>

Check system batteries. Default: ON

=item B<amperage>

Check amperage probes. Default: ON

=item B<intrusion>

Check chassis intrusion. Default: ON

=item B<sdcard>

Check SD cards. Default: ON

=item B<esmhealth>

Check the ESM log health, i.e. fill grade. Default: ON

=item B<esmlog>

Check the ESM log content. Default: OFF

=item B<alertlog>

Check the alert log content. Default: OFF

=back

=back
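
The way a C<--check> string adjusts the documented defaults can be
sketched in Python as follows. The default values are taken from the
list above; the function name C<apply_check> and the decision to
reject unknown keywords are assumptions, and the plugin may behave
differently in that case.

```python
# Defaults as documented for --check: everything ON except the two logs.
CHECK_DEFAULTS = {
    "storage": True, "memory": True, "fans": True, "power": True,
    "temp": True, "cpu": True, "voltage": True, "batteries": True,
    "amperage": True, "intrusion": True, "sdcard": True,
    "esmhealth": True, "esmlog": False, "alertlog": False,
}


def apply_check(spec, defaults=CHECK_DEFAULTS):
    """Apply 'keyword=0|1,...' adjustments on top of the defaults."""
    checks = dict(defaults)
    for entry in spec.split(","):
        keyword, _, value = entry.strip().partition("=")
        if keyword not in checks:
            # Assumption: unknown keywords are treated as an error here.
            raise ValueError(f"unknown check keyword: {keyword!r}")
        checks[keyword] = value == "1"
    return checks


checks = apply_check("storage=0,intrusion=1")
print(checks["storage"], checks["intrusion"], checks["esmlog"])
# → False True False
```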

=head1 DIAGNOSTICS

The option C<--debug> (or C<-d>) can be specified to display all
monitored components.

=head1 DEPENDENCIES

If SNMP is requested, the Perl module Net::SNMP is
required. Otherwise, only a regular Perl distribution is required to
run the script. On the target (monitored) system, Dell OpenManage
Server Administrator (OMSA) must be installed and running.

=head1 EXIT STATUS

If no errors are discovered, a value of 0 (OK) is returned. An exit
value of 1 (WARNING) signifies one or more non-critical errors, while
2 (CRITICAL) signifies one or more critical errors.

The exit value 3 (UNKNOWN) is reserved for errors within the script,
or errors getting values from Dell OMSA.
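
The exit values above can be expressed as a small lookup, shown here
as a Python sketch. The function names are hypothetical. Two details
are assumptions rather than documented behaviour: that a critical
error takes precedence when both kinds of errors are present, and that
an unexpected exit value is reported as UNKNOWN.

```python
# Exit values as documented in this section.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def state_name(exit_code):
    """Translate an exit value to its state name."""
    # Assumption: anything outside 0..3 is reported as UNKNOWN.
    return STATES.get(exit_code, "UNKNOWN")


def status_from_counts(non_critical, critical):
    """Pick an exit value from the number of errors of each kind."""
    if critical > 0:
        return 2  # assumption: critical errors outrank non-critical ones
    if non_critical > 0:
        return 1
    return 0


print(state_name(status_from_counts(non_critical=2, critical=0)))
# → WARNING
```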

=head1 AUTHOR

Written by Trond H. Amundsen <t.h.amundsen@usit.uio.no>

=head1 BUGS AND LIMITATIONS

Storage info is not collected or checked on very old PowerEdge models
and/or old OMSA versions, due to limitations in OMSA. The overall
support on those models/versions by this plugin is not well tested.

=head1 INCOMPATIBILITIES

The plugin should work with the Nagios embedded perl interpreter
(ePN). However, this is not thoroughly tested.

=head1 REPORTING BUGS

Report bugs to <t.h.amundsen@usit.uio.no>

=head1 LICENSE AND COPYRIGHT

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see L<http://www.gnu.org/licenses/>.

=head1 SEE ALSO

L<check_openmanage.conf(5)>
L<http://folk.uio.no/trondham/software/check_openmanage.html>

=cut