# Man page created with:
#
# pod2man -s 8 -r "`./check_openmanage -V | head -n 1`" -c 'Nagios plugin' check_openmanage.pod check_openmanage.8
#
# $Id$

=head1 NAME

check_openmanage - Nagios plugin for checking the hardware status on
Dell servers running OpenManage

=head1 SYNOPSIS

check_openmanage [I<OPTION>]...

check_openmanage -H I<hostname> [I<OPTION>]...

=head1 DESCRIPTION

check_openmanage is a plugin for Nagios which checks the hardware
health of Dell servers running OpenManage Server Administrator
(OMSA). The plugin checks the health of the storage subsystem, power
supplies, memory modules, temperature probes etc., and gives an alert
if any of the components are faulty or operate outside normal
parameters.

check_openmanage is designed to be used either locally (using NRPE
or similar) or remotely (using SNMP). In either mode, the output is
(nearly) the same. Note that checking the alert log is not supported
in SNMP mode.

=head1 GENERAL OPTIONS

=over 4

=item -t, --timeout I<SECONDS>

The number of seconds after which the plugin will abort. Default
timeout is 30 seconds if the option is not present.

=item -p, --perfdata [I<multiline> or I<minimal>]

Collect performance data. The performance data collected includes
temperatures (in Celsius) and fan speeds (in rpm). On systems that
support it, power consumption is also collected (in Watts). This
option takes an optional argument, either C<minimal> or
C<multiline>.

If the argument C<minimal> is specified, the plugin will use shorter
names for the performance data labels, e.g. C<t0> instead of
C<temp_0_system_board_ambient>. This can be used as a workaround in
cases where the plugin output needs shortening, for example if the
1024 character limit of NRPE is reached.

If given the argument C<multiline>, the plugin will output the
performance data on multiple lines, for Nagios 3.x and above.

=item -w, --warning I<STRING> or I<FILE>

Override the machine-default temperature warning thresholds. Syntax is
C<id1=max[/min],id2=max[/min],...>. The following example sets warning
limits to max 50C for probe 0, and max 45C and min 10C for probe 1:

check_openmanage -w 0=50,1=45/10

The minimum limit can be omitted, if desired. Most often, you are only
interested in setting the maximum thresholds.

This parameter can be either a string with the limits, or a file
containing the limits string. The option can be specified multiple
times.

NOTE: This option should only be used to narrow the field of OK
temperatures relative to the OMSA defaults. To expand the field of OK
temperatures, increase the OMSA thresholds. See the plugin web page
for more information.
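
As an illustration of the C<id=max[/min]> syntax only, the following
Python sketch splits a limits string into per-probe values. It is not
the plugin's own parser; the function name C<parse_thresholds> and the
choice to represent a missing minimum as C<None> are assumptions made
for this example.

```python
def parse_thresholds(spec):
    """Parse 'id=max[/min],...' into {probe_id: (max, min_or_None)}.

    Hypothetical helper for illustration; not taken from check_openmanage.
    """
    limits = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        probe, _, values = entry.partition("=")
        if not values:
            raise ValueError(f"missing limits in {entry!r}")
        # The minimum after the slash is optional.
        max_part, _, min_part = values.partition("/")
        minimum = float(min_part) if min_part else None
        limits[int(probe)] = (float(max_part), minimum)
    return limits


print(parse_thresholds("0=50,1=45/10"))
```

Probe 0 gets only a maximum, while probe 1 gets both a maximum and a
minimum, matching the command line example above.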

=item -c, --critical I<STRING> or I<FILE>

Override the machine-default temperature critical thresholds. Syntax
and behaviour are the same as for warning thresholds described above.

=item -o, --ok-info I<NUMBER>

This option lets you define how much output you want the plugin to
give when everything is OK, i.e. the verbosity level. The default
value is 0 (one line of output). The output levels are cumulative.

=over 4

=item B<0>

- Only one line (default)

=item B<1>

- BIOS and firmware info on a separate line

=item B<2>

- Storage controller and enclosure info on separate lines

=item B<3>

- OMSA version on separate line

=back

The reason that OMSA version is separated from the rest is that
finding it requires running a really slow omreport command, when the
plugin is run locally via NRPE.

=item --omreport I<OMREPORT PATH>

Specify full path to omreport, if it is not installed in any of the
regular places. Usually this option is only needed on Windows, if
omreport is not installed on the C: drive.

=item -i, --info

Prefix any alerts with the service tag.

=item -e, --extinfo

Display a short summary of system information (model and service tag)
in case of an alert.

=item -I, --htmlinfo [I<CODE>]

Using this option will make the servicetag and model name into
clickable HTML links in the output. The model name link will point to
the official Dell documentation for that model, while the servicetag
link will point to a website containing support info for that
particular server.

This option takes an optional argument, which should be your country
code or C<me> for the middle east. If the country code is omitted the
servicetag link will still work, but it will not be specific for your
country or area. Example for Germany:

  check_openmanage --htmlinfo de

This option is particularly useful together with either the
I<--extinfo> or I<--info> options. Only the most common country codes
are supported at this time.

=item --postmsg I<STRING> or I<FILE>

User specified post message. Useful for displaying arbitrary or
various system information at the end of alerts. The argument is
either a string with the message, or a file containing that
string. You can control the format with the following interpreted
sequences:

=over 4

=item B<%m>

System model

=item B<%s>

Service tag

=item B<%b>

BIOS version

=item B<%d>

BIOS release date

=item B<%o>

Operating system name

=item B<%r>

Operating system release

=item B<%p>

Number of physical drives

=item B<%l>

Number of logical drives

=item B<%n>

Line break. Will be a regular line break if run from a TTY, else an
HTML line break.

=item B<%%>

A literal C<%>

=back
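
The interpreted sequences listed above behave like a small template
language. The Python sketch below shows one way such an expansion
could work; it is an illustration, not the plugin's code. The function
name C<expand_postmsg>, the C<info> dictionary keyed by sequence
letter, and the decision to leave unknown sequences untouched are all
assumptions, and the model and service tag values are made up.

```python
import re


def expand_postmsg(template, info, tty=True):
    """Expand %-sequences in a post message (hypothetical helper).

    info maps a sequence letter ('m', 's', 'b', ...) to its value.
    """
    def substitute(match):
        key = match.group(1)
        if key == "%":
            return "%"                      # %% is a literal percent sign
        if key == "n":
            return "\n" if tty else "<br/>"  # line break depends on TTY
        # Assumption: unknown sequences are passed through unchanged.
        return str(info.get(key, match.group(0)))

    return re.sub(r"%(.)", substitute, template)


sample = {"m": "PowerEdge R710", "s": "ABC1234"}  # made-up values
print(expand_postmsg("%m (%s)%n100%%", sample, tty=False))
```

With C<tty=False> the C<%n> sequence becomes an HTML line break, which
mirrors the behaviour described for output that does not go to a TTY.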

=item -s, --state

Prefix each alert with its corresponding service state (i.e. warning,
critical etc.). This is useful in case of several alerts from the same
monitored system.

=item -S, --short-state

Same as the B<--state> option above, except that the state is
abbreviated to a single letter (W=warning, C=critical etc.).

=item --linebreak I<STRING>

check_openmanage will sometimes report more than one line, e.g. if
there are several alerts. If the script has a TTY, it will use regular
linebreaks. If not (which is the case with NRPE) it will use HTML
linebreaks. Sometimes it can be useful to control what the plugin uses
as a line separator, and this option provides that control.

The argument is the exact string to be used as the line
separator. There are two exceptions, i.e. two keywords that translate
to the following:

=over 4

=item B<REG>

Regular linebreaks, i.e. "\n".

=item B<HTML>

HTML linebreaks, i.e. "<br/>".

=back

This is a rather special option that is normally not needed. The
default behaviour should be sufficient for most users.

=item -d, --debug

Debug output. Will report status on everything, even if status is
ok. Blacklisted or unchecked components are ignored (i.e. no output).

NOTE: This option is intended for diagnostics and debugging purposes
only. Do not use this option from within Nagios, i.e. in the Nagios
config.

=item -h, --help

Display help text.

=item -V, --version

Display version info.

=back

=head1 SNMP OPTIONS

=over 4

=item -H, --hostname I<HOSTNAME>

The transport address of the destination SNMP device. Using this
option triggers SNMP mode.

=item -P, --protocol I<PROTOCOL>

SNMP protocol version. This option is optional and expects a digit
(i.e. C<1>, C<2> or C<3>) to define the SNMP version. The default is
C<2>, i.e. SNMP version 2c.

=item -C, --community I<COMMUNITY>

This option expects a string that is to be used as the SNMP community
name when using SNMP version 1 or 2c. By default the community name
is set to C<public> if the option is not present.

=item --port I<PORT>

SNMP port of the remote (monitored) system. Defaults to the well-known
SNMP port 161.

=item -U, --username I<SECURITYNAME>

[SNMPv3] The User-based Security Model (USM) used by SNMPv3 requires
that a securityName be specified. This option is required when using
SNMP version 3, and expects a string 1 to 32 octets in length.

=item --authpassword I<PASSWORD>, --authkey I<KEY>

[SNMPv3] By default a securityLevel of C<noAuthNoPriv> is assumed. If
the --authpassword option is specified, the securityLevel becomes
C<authNoPriv>. The --authpassword option expects a string which is at
least 1 octet in length as argument.

Optionally, instead of the --authpassword option, the --authkey option
can be used so that a plain text password does not have to be
specified in a script. The --authkey option expects a hexadecimal
string produced by localizing the password with the
authoritativeEngineID for the specific destination device. The
C<snmpkey> utility included with the Net::SNMP distribution can be
used to create the hexadecimal string (see L<snmpkey>).

=item --authprotocol I<ALGORITHM>

[SNMPv3] Two different hash algorithms are defined by SNMPv3 which can
be used by the Security Model for authentication. These algorithms are
HMAC-MD5-96 C<MD5> (RFC 1321) and HMAC-SHA-96 C<SHA-1> (NIST FIPS PUB
180-1). The default algorithm used by the plugin is HMAC-MD5-96. This
behavior can be changed by using this option. The option expects
either the string C<md5> or C<sha> to be passed as argument to modify
the hash algorithm.

=item --privpassword I<PASSWORD>, --privkey I<KEY>

[SNMPv3] By specifying the options --privkey or --privpassword, the
securityLevel associated with the object becomes
C<authPriv>. According to SNMPv3, privacy requires the use of
authentication. Therefore, if either of these two options is present
and the --authkey or --authpassword arguments are missing, the
creation of the object fails. The --privkey and --privpassword
options expect the same input as the --authkey and --authpassword
options respectively.
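
The relationship between the authentication and privacy options and
the resulting securityLevel can be summarised in a few lines of
Python. This is only a sketch of the rules stated above; the function
name C<security_level> and its two parameters are hypothetical and do
not correspond to anything in the plugin or in Net::SNMP.

```python
def security_level(auth=None, priv=None):
    """Return the SNMPv3 securityLevel implied by the given credentials.

    auth: authentication password or key, or None if not given.
    priv: privacy password or key, or None if not given.
    Hypothetical helper for illustration only.
    """
    if priv and not auth:
        # Privacy without authentication is not allowed.
        raise ValueError("privacy requires authentication")
    if priv:
        return "authPriv"
    if auth:
        return "authNoPriv"
    return "noAuthNoPriv"


for creds in [(None, None), ("authsecret", None), ("authsecret", "privsecret")]:
    print(creds, "->", security_level(*creds))
```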

=item --privprotocol I<ALGORITHM>

[SNMPv3] The User-based Security Model described in RFC 3414 defines a
single encryption protocol to be used for privacy. This protocol,
CBC-DES C<DES> (NIST FIPS PUB 46-1), is used by default or if the
string C<des> is passed to the --privprotocol option. The Net::SNMP
module also supports RFC 3826 which describes the use of
CFB128-AES-128 C<AES> (NIST FIPS PUB 197) in the USM. The AES
encryption protocol can be selected by passing C<aes> or C<aes128> to
the --privprotocol option.

One of the following arguments is required: des, aes, aes128, 3des,
3desde

=item --use-get_table

This option exists as a workaround when using check_openmanage with
SNMPv3 on Windows with net-snmp. Using this option will make
check_openmanage use the Net::SNMP function get_table() instead of
get_entries() while fetching values via SNMP. The latter is faster and
is the default.

=back

=head1 BLACKLISTING

=over 4

=item -b, --blacklist I<STRING> or I<FILE>

Blacklist missing and/or failed components, if you do not plan to fix
them. The parameter is either the blacklist string, or a file (that
may or may not exist) containing the string. The blacklist string
contains component names with component IDs separated by slash
(/). Blacklisted components are left unchecked.

TIP: Use the option C<-d> (or C<--debug>) to get the blacklist ID for
devices. The ID is listed in a separate column in the debug output.

NOTE: If blacklisting is in effect, the global health of the system is
not checked.

=over 9

=item B<Syntax:>

component1=id1[,id2,...]/component2=id1[,id2,...]/...

The ID part can also be C<all>, in which case all components of that
type are blacklisted.

=item B<Example:>

check_openmanage -b ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all

=back

In the example we blacklist powersupply 0, fans 3 and 5, physical disk
1:0:0:1, and warnings about out-of-date drivers for all
controllers. Legal component names include:

=over 8

=item B<ctrl>

Storage controller. Note that if a controller is blacklisted, all
components on that controller (such as physical and logical drives)
are blacklisted as well.

=item B<ctrl_fw>

Suppress the special warning message about old controller
firmware. Use this if you cannot or will not upgrade the firmware.

=item B<ctrl_driver>

Suppress the special warning message about old controller driver.
Particularly useful on systems where you cannot upgrade the driver.

=item B<ctrl_stdr>

Suppress the special warning message about old Storport driver on
Windows.

=item B<ctrl_pdisk>

This blacklisting keyword exists as a possible workaround for physical
drives with bad firmware which makes OpenManage choke. It takes the
controller number as argument. Use this option to blacklist all
physical drives on a specific controller. This blacklisting keyword is
only available in local mode, i.e. not with SNMP.

=item B<pdisk>

Physical disk.

=item B<vdisk>

Logical drive (virtual disk)

=item B<bat>

Controller cache battery

=item B<bat_charge>

Ignore warnings related to the controller cache battery charging
cycle, which happens approximately every 40 days on Dell servers. Note
that using this blacklist keyword makes check_openmanage ignore
non-critical cache battery errors.

=item B<conn>

Connector (channel)

=item B<encl>

Enclosure

=item B<encl_fan>

Enclosure fan

=item B<encl_ps>

Enclosure power supply

=item B<encl_temp>

Enclosure temperature probe

=item B<encl_emm>

Enclosure management module (EMM)

=item B<dimm>

Memory module

=item B<fan>

Fan

=item B<ps>

Powersupply

=item B<temp>

Temperature sensor

=item B<cpu>

Processor (CPU)

=item B<volt>

Voltage probe

=item B<bp>

System battery

=item B<amp>

Amperage probe (power consumption monitoring)

=item B<intr>

Intrusion sensor

=item B<sd>

SD card

=back
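
To make the blacklist syntax concrete, the Python sketch below breaks
a blacklist string into its components and IDs and shows how the
C<all> keyword could be honoured in a lookup. It illustrates the
syntax only and is not how the plugin itself is implemented; the names
C<parse_blacklist> and C<is_blacklisted> are hypothetical.

```python
def parse_blacklist(spec):
    """Parse 'comp=id[,id...]/comp=...' into {component: [ids]}.

    Hypothetical helper for illustration; IDs are kept as strings
    because some (e.g. physical disks) contain colons.
    """
    blacklist = {}
    for part in spec.split("/"):
        if not part:
            continue
        component, sep, ids = part.partition("=")
        if not sep or not ids:
            raise ValueError(f"malformed blacklist entry: {part!r}")
        blacklist.setdefault(component, []).extend(
            item for item in ids.split(",") if item
        )
    return blacklist


def is_blacklisted(blacklist, component, ident):
    """True if this ID, or 'all', is listed for the component."""
    ids = blacklist.get(component, [])
    return "all" in ids or str(ident) in ids


bl = parse_blacklist("ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=all")
print(bl)
print(is_blacklisted(bl, "fan", 5), is_blacklisted(bl, "fan", 4))
```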

=back

=head1 CHECK CONTROL

=over 4

=item --no-storage

Turn off storage checking. This is an alias for C<--check storage=0>.

=item --only I<KEYWORD>

This option can be specified once and expects a keyword. The different
keywords and the behaviour of check_openmanage are described below.

=over 4

=item B<critical>

Print only critical alerts. With this option any warning alerts are
suppressed.

=item B<warning>

Print only warning alerts. With this option any critical alerts are
suppressed.

=item B<chassis>

Check all chassis components and nothing else.

=item B<storage>

Only check storage

=item B<memory>

Only check memory modules

=item B<fans>

Only check fans

=item B<power>

Only check power supplies

=item B<temp>

Only check temperatures

=item B<cpu>

Only check processors

=item B<voltage>

Only check voltage probes

=item B<batteries>

Only check batteries

=item B<amperage>

Only check power usage

=item B<intrusion>

Only check chassis intrusion

=item B<sdcard>

Only check SD cards

=item B<esmhealth>

Only check ESM log overall health, i.e. fill grade

=item B<esmlog>

Only check the event log (ESM) content

=item B<alertlog>

Only check the alert log content

=back

=item --check I<STRING> or I<FILE>

This parameter allows you to adjust which components are checked at
all. This is a rougher approach than blacklisting, which requires
that you specify component id or index. The parameter should
be either a string containing the adjustments, or a file containing
the string. No errors are raised if the file does not exist.

Note: This option is ignored with alternate basenames.

=over 9

=item B<Example:>

check_openmanage --check storage=0,intrusion=1

=back

Legal values are described below, along with the default value.

=over 4

=item B<storage>

Check storage subsystem (controllers, disks etc.). Default: ON

=item B<memory>

Check memory (dimms). Default: ON

=item B<fans>

Check chassis fans. Default: ON

=item B<power>

Check power supplies. Default: ON

=item B<temp>

Check temperature sensors. Default: ON

=item B<cpu>

Check CPUs. Default: ON

=item B<voltage>

Check voltage sensors. Default: ON

=item B<batteries>

Check system batteries. Default: ON

=item B<amperage>

Check amperage probes. Default: ON

=item B<intrusion>

Check chassis intrusion. Default: ON

=item B<sdcard>

Check SD cards. Default: ON

=item B<esmhealth>

Check the ESM log health, i.e. fill grade. Default: ON

=item B<esmlog>

Check the ESM log content. Default: OFF

=item B<alertlog>

Check the alert log content. Default: OFF

=back
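
The way a C<--check> string adjusts the defaults listed above can be
sketched in Python as follows. The default table is copied from the
list above; everything else is an assumption made for illustration,
including the function name C<parse_check> and the choice to reject
unknown keywords and values other than C<0> and C<1>. The plugin's
actual handling of such input may differ.

```python
# Defaults as documented above: everything ON except the two log checks.
DEFAULT_CHECKS = {
    "storage": True, "memory": True, "fans": True, "power": True,
    "temp": True, "cpu": True, "voltage": True, "batteries": True,
    "amperage": True, "intrusion": True, "sdcard": True,
    "esmhealth": True, "esmlog": False, "alertlog": False,
}


def parse_check(spec, defaults=DEFAULT_CHECKS):
    """Apply 'name=0|1,...' adjustments on top of the defaults.

    Hypothetical helper for illustration only.
    """
    checks = dict(defaults)
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, value = entry.partition("=")
        # Assumption: unknown names and non-0/1 values are errors.
        if name not in checks or value not in ("0", "1"):
            raise ValueError(f"invalid check adjustment: {entry!r}")
        checks[name] = value == "1"
    return checks


checks = parse_check("storage=0,esmlog=1")
print(checks["storage"], checks["esmlog"], checks["fans"])
```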

=back

=head1 DIAGNOSTICS

The option C<--debug> (or C<-d>) can be specified to display all
monitored components.

=head1 DEPENDENCIES

If SNMP is requested, the perl module Net::SNMP is
required. Otherwise, only a regular perl distribution is required to
run the script. On the target (monitored) system, Dell OpenManage
Server Administrator (OMSA) must be installed and running.

=head1 EXIT STATUS

If no errors are discovered, a value of 0 (OK) is returned. An exit
value of 1 (WARNING) signifies one or more non-critical errors, while
2 (CRITICAL) signifies one or more critical errors.

The exit value 3 (UNKNOWN) is reserved for errors within the script,
or errors getting values from Dell OMSA.
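
A wrapper script that calls the plugin might translate the exit value
back into a state name. The Python sketch below shows the mapping
described above; the name C<state_name> is hypothetical, and treating
any value outside 0 to 3 as UNKNOWN is an assumption of this example
rather than documented plugin behaviour.

```python
# Exit values and their meaning, as described in this section.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def state_name(exit_code):
    """Map an exit value to its state name (hypothetical helper)."""
    # Assumption: anything unexpected is reported as UNKNOWN.
    return STATES.get(exit_code, "UNKNOWN")


for code in (0, 1, 2, 3):
    print(code, state_name(code))
```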

=head1 AUTHOR

Written by Trond H. Amundsen <t.h.amundsen@usit.uio.no>

=head1 BUGS AND LIMITATIONS

Storage info is not collected or checked on very old PowerEdge models
and/or old OMSA versions, due to limitations in OMSA. The overall
support on those models/versions by this plugin is not well tested.

=head1 INCOMPATIBILITIES

The plugin should work with the Nagios embedded perl interpreter
(ePN). However, this is not thoroughly tested.

=head1 REPORTING BUGS

Report bugs to <t.h.amundsen@usit.uio.no>

=head1 LICENSE AND COPYRIGHT

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see L<http://www.gnu.org/licenses/>.

=head1 SEE ALSO

L<http://folk.uio.no/trondham/software/check_openmanage.html>

=cut