* version 3.7.0-beta4
# Man page created with:
#
# pod2man -s 8 -r "`./check_openmanage -V | head -n 1`" -c 'Nagios plugin' check_openmanage.pod check_openmanage.8
#
# $Id$

=head1 NAME

check_openmanage - Nagios plugin for checking the hardware status on
Dell servers running OpenManage

=head1 SYNOPSIS

check_openmanage [I<OPTION>]...

check_openmanage -H I<hostname> [I<OPTION>]...

=head1 DESCRIPTION

check_openmanage is a plugin for Nagios which checks the hardware
health of Dell servers running OpenManage Server Administrator
(OMSA). The plugin checks the health of the storage subsystem, power
supplies, memory modules, temperature probes etc., and gives an alert
if any of the components are faulty or operate outside normal
parameters.

check_openmanage is designed to be used either locally (using NRPE
or similar) or remotely (using SNMP). In either mode, the output is
(nearly) the same. Note that checking the alert log is not supported
in SNMP mode.

=head1 GENERAL OPTIONS

=over 4

=item -f, --configfile I<FILE>

Specify a configuration file. For reference on the config file syntax
and options, consult the L<check_openmanage.conf(5)> manual page.

=item -t, --timeout I<SECONDS>

The number of seconds after which the plugin will abort. Default
timeout is 30 seconds if the option is not present.

=item -p, --perfdata [I<multiline> or I<minimal>]

Collect performance data. Performance data collected include
temperatures (in Celsius) and fan speeds (in rpm). On systems that
support it, power consumption is also collected (in Watts). This
option takes an optional argument, either C<minimal> or C<multiline>.

If the argument C<minimal> is specified, the plugin will use shorter
names for the performance data labels, e.g. C<t0> instead of
C<temp_0_system_board_ambient>. This can be used as a workaround in
cases where the plugin output needs shortening, for example if the
1024 character limit of NRPE is reached.

If given the argument C<multiline>, the plugin will output the
performance data on multiple lines, for Nagios 3.x and above.

=item -w, --warning I<STRING> or I<FILE>

Override the machine-default temperature warning thresholds. Syntax is
C<id1=max[/min],id2=max[/min],...>. The following example sets warning
limits to max 50C for probe 0, and max 45C and min 10C for probe 1:

check_openmanage -w 0=50,1=45/10

The minimum limit can be omitted, if desired. Most often, you are only
interested in setting the maximum thresholds.

This parameter can be either a string with the limits, or a file
containing the limits string. The option can be specified multiple
times.

NOTE: This option should only be used to narrow the field of OK
temperatures with respect to the OMSA defaults. To expand the field of
OK temperatures, increase the OMSA thresholds. See the plugin web page
for more information.
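
As an illustration of the threshold syntax above, the following Python
sketch parses an C<id=max[/min]> string into per-probe limits. It is not
the plugin's own code (the plugin is written in Perl). The function name
C<parse_thresholds> and the assumption that probe IDs are plain integers
are made up for this example.

```python
def parse_thresholds(spec):
    """Parse 'id1=max[/min],id2=max[/min],...' into {probe_id: (max, min)}.

    A missing minimum is returned as None. Integer probe IDs are an
    assumption of this sketch, based on the example in the text.
    """
    limits = {}
    for item in spec.split(","):
        probe, _, values = item.strip().partition("=")
        if not probe.isdigit() or not values:
            raise ValueError(f"bad threshold entry: {item!r}")
        maximum, _, minimum = values.partition("/")
        limits[int(probe)] = (float(maximum), float(minimum) if minimum else None)
    return limits


# The example string from the text: probe 0 has only a maximum,
# probe 1 has both a maximum and a minimum.
print(parse_thresholds("0=50,1=45/10"))
```

Returning C<None> for an omitted minimum keeps "no lower limit" distinct
from a lower limit of zero degrees.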

=item -c, --critical I<STRING> or I<FILE>

Override the machine-default temperature critical thresholds. Syntax
and behaviour are the same as for warning thresholds described above.

=item -o, --ok-info I<NUMBER>

This option lets you define how much output you want the plugin to
give when everything is OK, i.e. the verbosity level. The default
value is 0 (one line of output). The output levels are cumulative.

=over 4

=item B<0>

- Only one line (default)

=item B<1>

- BIOS and firmware info on a separate line

=item B<2>

- Storage controller and enclosure info on separate lines

=item B<3>

- OMSA version on a separate line

=back

The OMSA version is separated from the rest because finding it
requires running a very slow omreport command when the plugin is run
locally via NRPE.

=item -B, --show-blacklist

If used together with blacklisting, this option will make the plugin
output all blacklistings that are being used. The output will have the
correct blacklisting syntax, and will make it easy to maintain control
over which blacklistings are used for each server, as any
blacklistings can be viewed from Nagios.

When blacklisting is not used, this option has no effect.

=item --omreport I<OMREPORT PATH>

Specify full path to omreport, if it is not installed in any of the
regular places. Usually this option is only needed on Windows, if
omreport is not installed on the C: drive.

=item -i, --info

Prefix any alerts with the service tag.

=item -e, --extinfo

Display a short summary of system information (model and service tag)
in case of an alert.

=item -I, --htmlinfo [I<CODE>]

Using this option will make the service tag and model name into
clickable HTML links in the output. The model name link will point to
the official Dell documentation for that model, while the service tag
link will point to a website containing support info for that
particular server.

This option takes an optional argument, which should be your country
code or C<me> for the Middle East. If the country code is omitted the
service tag link will still work, but it will not be specific to your
country or area. Example for Germany:

 check_openmanage --htmlinfo de

This option is particularly useful together with either the
I<--extinfo> or I<--info> options. Only the most common country codes
are supported at this time.

=item --postmsg I<STRING> or I<FILE>

User-specified post message. Useful for displaying arbitrary or
various system information at the end of alerts. The argument is
either a string with the message, or a file containing that
string. You can control the format with the following interpreted
sequences:

=over 4

=item B<%m>

System model

=item B<%s>

Service tag

=item B<%b>

BIOS version

=item B<%d>

BIOS release date

=item B<%o>

Operating system name

=item B<%r>

Operating system release

=item B<%p>

Number of physical drives

=item B<%l>

Number of logical drives

=item B<%n>

Line break. Will be a regular line break if run from a TTY, else an
HTML line break.

=item B<%%>

A literal C<%>

=back
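
To make the sequences above concrete, here is a small Python sketch of
how such a template could be expanded. It is only an approximation for
illustration, not the plugin's Perl implementation. The function name
C<expand_postmsg>, the dictionary keys, the sample values, and the
choice to leave unknown sequences untouched are all assumptions.

```python
def expand_postmsg(template, info, is_tty=True):
    """Expand the documented %-sequences in a post message template.

    'info' is a hypothetical dict holding the system values. Unknown
    sequences are left as-is, which is a choice made for this sketch.
    """
    fields = {"m": "model", "s": "servicetag", "b": "bios", "d": "biosdate",
              "o": "os", "r": "osrelease", "p": "pdisks", "l": "vdisks"}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            code = template[i + 1]
            if code == "%":
                out.append("%")                      # literal percent sign
            elif code == "n":
                out.append("\n" if is_tty else "<br/>")
            elif code in fields:
                out.append(str(info[fields[code]]))
            else:
                out.append("%" + code)               # unknown: keep unchanged
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Made-up sample values, for demonstration only.
sample = {"model": "PowerEdge R710", "servicetag": "ABC1234", "bios": "6.4.0",
          "biosdate": "07/23/2013", "os": "Linux", "osrelease": "6.5",
          "pdisks": 4, "vdisks": 1}
print(expand_postmsg("%m (%s), BIOS %b, %p disks", sample))
```

The C<is_tty> flag mirrors the documented behaviour of C<%n>: a regular
line break on a TTY, an HTML line break otherwise.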

=item -s, --state

Prefix each alert with its corresponding service state (i.e. warning,
critical etc.). This is useful in case of several alerts from the same
monitored system.

=item -S, --short-state

Same as the B<--state> option above, except that the state is
abbreviated to a single letter (W=warning, C=critical etc.).

=item --linebreak I<STRING>

check_openmanage will sometimes report more than one line, e.g. if
there are several alerts. If the script has a TTY, it will use regular
linebreaks. If not (which is the case with NRPE) it will use HTML
linebreaks. Sometimes it can be useful to control what the plugin uses
as a line separator, and this option provides that control.

The argument is the exact string to be used as the line
separator. There are two exceptions, i.e. two keywords that translate
to the following:

=over 4

=item B<REG>

Regular linebreaks, i.e. "\n".

=item B<HTML>

HTML linebreaks, i.e. "<br/>".

=back

This is a rather special option that is normally not needed. The
default behaviour should be sufficient for most users.
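
The selection rule described for C<--linebreak> can be summarised in a
few lines of Python. This is a sketch of the documented behaviour only.
The function name C<line_separator> and its parameters are invented for
the example.

```python
import sys


def line_separator(option=None, is_tty=None):
    """Return the line separator for a given --linebreak argument.

    option=None models the option being absent. is_tty can be passed
    explicitly; otherwise the current stdout is inspected.
    """
    if is_tty is None:
        is_tty = sys.stdout.isatty()
    if option is None:
        # Default: regular linebreaks on a TTY, HTML linebreaks otherwise.
        return "\n" if is_tty else "<br/>"
    if option == "REG":
        return "\n"
    if option == "HTML":
        return "<br/>"
    # Any other argument is used verbatim as the separator.
    return option


print(repr(line_separator("HTML")))
print(repr(line_separator(" | ")))
```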

=item -d, --debug

Debug output. Will report status on everything, even if status is
ok. Blacklisted or unchecked components are ignored (i.e. no output).

NOTE: This option is intended for diagnostics and debugging purposes
only. Do not use this option from within Nagios, i.e. in the Nagios
config.

=item -h, --help

Display help text.

=item -V, --version

Display version info.

=back

=head1 SNMP OPTIONS

=over 4

=item -H, --hostname I<HOSTNAME>

The transport address of the destination SNMP device. Using this
option triggers SNMP mode.

=item -P, --protocol I<PROTOCOL>

SNMP protocol version. This option is optional and expects a digit
(i.e. C<1>, C<2> or C<3>) to define the SNMP version. The default is
C<2>, i.e. SNMP version 2c.

=item -C, --community I<COMMUNITY>

This option expects a string that is to be used as the SNMP community
name when using SNMP version 1 or 2c. By default the community name
is set to C<public> if the option is not present.

=item --port I<PORT>

SNMP port of the remote (monitored) system. Defaults to the well-known
SNMP port 161.

=item -6, --ipv6

This option will cause the plugin to use IPv6. The default is IPv4 if
the option is not present.

=item --tcp

This option will cause the plugin to use TCP as transport
protocol. The default is UDP if the option is not present.

=item -U, --username I<SECURITYNAME>

[SNMPv3] The User-based Security Model (USM) used by SNMPv3 requires
that a securityName be specified. This option is required when using
SNMP version 3, and expects a string 1 to 32 octets in length.

=item --authpassword I<PASSWORD>, --authkey I<KEY>

[SNMPv3] By default a securityLevel of C<noAuthNoPriv> is assumed. If
the --authpassword option is specified, the securityLevel becomes
C<authNoPriv>. The --authpassword option expects a string which is at
least 1 octet in length as argument.

Optionally, instead of the --authpassword option, the --authkey option
can be used so that a plain text password does not have to be
specified in a script. The --authkey option expects a hexadecimal
string produced by localizing the password with the
authoritativeEngineID for the specific destination device. The
C<snmpkey> utility included with the Net::SNMP distribution can be
used to create the hexadecimal string (see L<snmpkey>).

=item --authprotocol I<ALGORITHM>

[SNMPv3] Two different hash algorithms are defined by SNMPv3 which can
be used by the Security Model for authentication. These algorithms are
HMAC-MD5-96 C<MD5> (RFC 1321) and HMAC-SHA-96 C<SHA-1> (NIST FIPS PUB
180-1). The default algorithm used by the plugin is HMAC-MD5-96. This
behaviour can be changed by using this option. The option expects
either the string C<md5> or C<sha> to be passed as argument to modify
the hash algorithm.

=item --privpassword I<PASSWORD>, --privkey I<KEY>

[SNMPv3] By specifying the options --privkey or --privpassword, the
securityLevel associated with the object becomes
C<authPriv>. According to SNMPv3, privacy requires the use of
authentication. Therefore, if either of these two options is present
and the --authkey or --authpassword arguments are missing, the
creation of the object fails. The --privkey and --privpassword
options expect the same input as the --authkey and --authpassword
options respectively.
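
The way the authentication and privacy options determine the SNMPv3
securityLevel can be sketched as follows. This Python fragment only
restates the rules given above. The function name C<security_level> and
the use of a plain dictionary for the options are assumptions made for
illustration.

```python
def security_level(opts):
    """Derive the SNMPv3 securityLevel from a dict of given options.

    Keys mirror the option names without the leading dashes.
    """
    auth = bool(opts.get("authpassword") or opts.get("authkey"))
    priv = bool(opts.get("privpassword") or opts.get("privkey"))
    if priv and not auth:
        # Privacy requires authentication, so this combination fails.
        raise ValueError("--privpassword/--privkey needs --authpassword or --authkey")
    if priv:
        return "authPriv"
    if auth:
        return "authNoPriv"
    return "noAuthNoPriv"


print(security_level({}))
print(security_level({"authpassword": "secret"}))
print(security_level({"authpassword": "secret", "privpassword": "secret2"}))
```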

=item --privprotocol I<ALGORITHM>

[SNMPv3] The User-based Security Model described in RFC 3414 defines a
single encryption protocol to be used for privacy. This protocol,
CBC-DES C<DES> (NIST FIPS PUB 46-1), is used by default or if the
string C<des> is passed to the --privprotocol option. The Net::SNMP
module also supports RFC 3826 which describes the use of
CFB128-AES-128 C<AES> (NIST FIPS PUB 197) in the USM. The AES
encryption protocol can be selected by passing C<aes> or C<aes128> to
the --privprotocol option.

One of the following arguments is required: des, aes, aes128, 3des,
3desde

=item --use-get_table

This option exists as a workaround when using check_openmanage with
SNMPv3 on Windows with net-snmp. Using this option will make
check_openmanage use the Net::SNMP function get_table() instead of
get_entries() while fetching values via SNMP. The latter is faster and
is the default.

=back

=head1 BLACKLISTING

=over 4

=item -b, --blacklist I<STRING> or I<FILE>

Blacklist missing and/or failed components, if you do not plan to fix
them. The parameter is either the blacklist string, or a file (that
may or may not exist) containing the string. The blacklist string
contains component names with component IDs separated by slash
(/). Blacklisted components are left unchecked.

TIP: Use the option C<-d> (or C<--debug>) to get the blacklist ID for
devices. The ID is listed in a separate column in the debug output.

NOTE: If blacklisting is in effect, the global health of the system is
not checked.

=over 9

=item B<Syntax:>

component1=id1[,id2,...]/component2=id1[,id2,...]/...

The ID part can also be C<all>, in which case all components of that
type are blacklisted.

=item B<Example:>

check_openmanage -b ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all

=back

In the example we blacklist power supply 0, fans 3 and 5, physical disk
1:0:0:1, and warnings about out-of-date drivers for all
controllers. Legal component names include:

=over 8

=item B<ctrl>

Storage controller. Note that if a controller is blacklisted, all
components on that controller (such as physical and logical drives)
are blacklisted as well.

=item B<ctrl_fw>

Suppress the special warning message about old controller
firmware. Use this if you cannot or will not upgrade the firmware.

=item B<ctrl_driver>

Suppress the special warning message about old controller driver.
Particularly useful on systems where you cannot upgrade the driver.

=item B<ctrl_stdr>

Suppress the special warning message about old Storport driver on
Windows.

=item B<ctrl_pdisk>

This blacklisting keyword exists as a possible workaround for physical
drives with bad firmware which makes OpenManage choke. It takes the
controller number as argument. Use this option to blacklist all
physical drives on a specific controller. This blacklisting keyword is
only available in local mode, i.e. not with SNMP.

=item B<pdisk>

Physical disk.

=item B<pdisk_cert>

Suppress warning message about non-certified physical disk.

=item B<vdisk>

Logical drive (virtual disk)

=item B<bat>

Controller cache battery

=item B<bat_charge>

Ignore warnings related to the controller cache battery charging
cycle, which happens approximately every 40 days on Dell servers. Note
that using this blacklist keyword makes check_openmanage ignore
non-critical cache battery errors.

=item B<conn>

Connector (channel)

=item B<encl>

Enclosure

=item B<encl_fan>

Enclosure fan

=item B<encl_ps>

Enclosure power supply

=item B<encl_temp>

Enclosure temperature probe

=item B<encl_emm>

Enclosure management module (EMM)

=item B<dimm>

Memory module

=item B<fan>

Fan

=item B<ps>

Power supply

=item B<temp>

Temperature sensor

=item B<cpu>

Processor (CPU)

=item B<volt>

Voltage probe

=item B<bp>

System battery

=item B<amp>

Amperage probe (power consumption monitoring)

=item B<intr>

Intrusion sensor

=item B<sd>

SD card

=back

=back
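
The blacklist syntax described in this section splits first on slash
and then on comma. The Python sketch below shows one way to take such a
string apart. It is an illustration under that reading of the syntax,
not the plugin's actual parser, and the name C<parse_blacklist> is made
up.

```python
def parse_blacklist(spec):
    """Split 'comp1=id1[,id2,...]/comp2=...' into {component: [ids]}.

    IDs are kept as strings because they may be 'all' or contain
    colons, as in the physical disk ID 1:0:0:1.
    """
    result = {}
    for part in spec.split("/"):
        if not part:
            continue  # tolerate a stray leading or trailing slash
        component, sep, ids = part.partition("=")
        if not sep or not component or not ids:
            raise ValueError(f"bad blacklist entry: {part!r}")
        result.setdefault(component, []).extend(ids.split(","))
    return result


# The example string from this section.
print(parse_blacklist("ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all"))
```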

=head1 CHECK CONTROL

=over 4

=item --no-storage

Turn off storage checking. This is an alias for C<--check storage=0>.

=item --only I<KEYWORD>

This option can be specified once and expects a keyword. The different
keywords and the behaviour of check_openmanage are described below.

=over 4

=item B<critical>

Print only critical alerts. With this option any warning alerts are
suppressed.

=item B<warning>

Print only warning alerts. With this option any critical alerts are
suppressed.

=item B<chassis>

Check all chassis components and nothing else.

=item B<storage>

Only check storage

=item B<memory>

Only check memory modules

=item B<fans>

Only check fans

=item B<power>

Only check power supplies

=item B<temp>

Only check temperatures

=item B<cpu>

Only check processors

=item B<voltage>

Only check voltage probes

=item B<batteries>

Only check batteries

=item B<amperage>

Only check power usage

=item B<intrusion>

Only check chassis intrusion

=item B<sdcard>

Only check SD cards

=item B<esmhealth>

Only check ESM log overall health, i.e. fill grade

=item B<esmlog>

Only check the event log (ESM) content

=item B<alertlog>

Only check the alert log content

=back

=item --check I<STRING> or I<FILE>

This parameter allows you to adjust which components are checked at
all. This is a coarser approach than blacklisting, which requires that
you specify the component ID or index. The parameter should
be either a string containing the adjustments, or a file containing
the string. No errors are raised if the file does not exist.

Note: This option is ignored with alternate basenames.

=over 9

=item B<Example:>

check_openmanage --check storage=0,intrusion=1

=back

Legal values are described below, along with the default value.

=over 4

=item B<storage>

Check storage subsystem (controllers, disks etc.). Default: ON

=item B<memory>

Check memory (dimms). Default: ON

=item B<fans>

Check chassis fans. Default: ON

=item B<power>

Check power supplies. Default: ON

=item B<temp>

Check temperature sensors. Default: ON

=item B<cpu>

Check CPUs. Default: ON

=item B<voltage>

Check voltage sensors. Default: ON

=item B<batteries>

Check system batteries. Default: ON

=item B<amperage>

Check amperage probes. Default: ON

=item B<intrusion>

Check chassis intrusion. Default: ON

=item B<sdcard>

Check SD cards. Default: ON

=item B<esmhealth>

Check the ESM log health, i.e. fill grade. Default: ON

=item B<esmlog>

Check the ESM log content. Default: OFF

=item B<alertlog>

Check the alert log content. Default: OFF

=back

=back
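
How a C<--check> string interacts with the defaults listed in this
section can be sketched in Python as below. The defaults are copied
from the list above. The function name C<apply_check> and the
restriction of values to C<0> and C<1> (the only values the example
uses) are assumptions of this sketch.

```python
# Defaults as documented in this section: everything ON except the
# ESM log content and alert log content checks.
DEFAULT_CHECKS = {
    "storage": True, "memory": True, "fans": True, "power": True,
    "temp": True, "cpu": True, "voltage": True, "batteries": True,
    "amperage": True, "intrusion": True, "sdcard": True,
    "esmhealth": True, "esmlog": False, "alertlog": False,
}


def apply_check(spec, defaults=DEFAULT_CHECKS):
    """Apply 'name=0|1,name=0|1,...' on top of the default settings."""
    checks = dict(defaults)  # copy, so the defaults stay untouched
    for item in spec.split(","):
        name, _, value = item.strip().partition("=")
        if name not in checks or value not in ("0", "1"):
            raise ValueError(f"bad check entry: {item!r}")
        checks[name] = value == "1"
    return checks


# The example string from this section.
result = apply_check("storage=0,intrusion=1")
print(sorted(name for name, enabled in result.items() if not enabled))
```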

=head1 DIAGNOSTICS

The option C<--debug> (or C<-d>) can be specified to display all
monitored components.

=head1 DEPENDENCIES

If SNMP is requested, the perl module Net::SNMP is
required. Otherwise, only a regular perl distribution is required to
run the script. On the target (monitored) system, Dell OpenManage
Server Administrator (OMSA) must be installed and running.

=head1 EXIT STATUS

If no errors are discovered, a value of 0 (OK) is returned. An exit
value of 1 (WARNING) signifies one or more non-critical errors, while
2 (CRITICAL) signifies one or more critical errors.

The exit value 3 (UNKNOWN) is reserved for errors within the script,
or errors getting values from Dell OMSA.
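
For scripts that call the plugin outside Nagios, the exit values above
can be mapped back to state names. The following Python sketch shows
the idea. The helper name C<run_plugin> and the host name C<myhost> in
the comment are hypothetical, and mapping codes outside 0-3 to UNKNOWN
is a choice made for this example, not documented behaviour.

```python
import subprocess
import sys

# Exit values documented above, mapped to their state names.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(command):
    """Run a plugin command line and return (exit_code, state_name)."""
    code = subprocess.run(command, stdout=subprocess.DEVNULL).returncode
    # Codes outside 0-3 are reported as UNKNOWN (assumption of this sketch).
    return code, STATES.get(code, "UNKNOWN")


# Stand-in command that simply exits with status 2. On a monitoring
# host this would be something like ["check_openmanage", "-H", "myhost"].
print(run_plugin([sys.executable, "-c", "import sys; sys.exit(2)"]))
```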

=head1 AUTHOR

Written by Trond H. Amundsen <t.h.amundsen@usit.uio.no>

=head1 BUGS AND LIMITATIONS

Storage info is not collected or checked on very old PowerEdge models
and/or old OMSA versions, due to limitations in OMSA. The overall
support on those models/versions by this plugin is not well tested.

=head1 INCOMPATIBILITIES

The plugin should work with the Nagios embedded perl interpreter
(ePN). However, this is not thoroughly tested.

=head1 REPORTING BUGS

Report bugs to <t.h.amundsen@usit.uio.no>

=head1 LICENSE AND COPYRIGHT

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see L<http://www.gnu.org/licenses/>.

=head1 SEE ALSO

L<check_openmanage.conf(5)>
L<http://folk.uio.no/trondham/software/check_openmanage.html>

=cut