# Man page created with:
#
# pod2man -s 8 -r "`./check_openmanage -V | head -n 1`" -c 'Nagios plugin' check_openmanage.pod check_openmanage.8
#
# $Id$

=head1 NAME

check_openmanage - Nagios plugin for checking the hardware status on
Dell servers running OpenManage

=head1 SYNOPSIS

check_openmanage [I<OPTION>]...

=head1 DESCRIPTION

check_openmanage is a plugin for Nagios which checks the hardware
health of Dell servers running OpenManage Server Administrator
(OMSA). The plugin checks the health of the storage subsystem, power
supplies, memory modules, temperature probes etc., and gives an alert
if any of the components are faulty or operate outside normal
parameters.

check_openmanage is designed to be used either locally (using NRPE
or similar) or remotely (using SNMP). In either mode, the output is
(nearly) the same. Note that checking the alert log is not supported
in SNMP mode.

=head1 GENERAL OPTIONS

=over 4

=item -t, --timeout I<SECONDS>

The number of seconds after which the plugin will abort. The default
timeout is 30 seconds if the option is not present.

=item -p, --perfdata [I<multiline>]

Collect performance data. The performance data collected includes
temperatures (in Celsius) and fan speeds (in RPM). On systems that
support it, power consumption is also collected (in Watts).

If given the argument C<multiline>, the plugin will output the
performance data on multiple lines, for Nagios 3.x and above.

=item -w, --warning I<STRING> or I<FILE>

Override the machine-default temperature warning thresholds. Syntax is
C<id1=max[/min],id2=max[/min],...>. The following example sets warning
limits to max 50C for probe 0, and max 45C and min 10C for probe 1:

  check_openmanage -w 0=50,1=45/10

The minimum limit can be omitted, if desired. Most often, you are only
interested in setting the maximum thresholds.

This parameter can be either a string with the limits, or a file
containing the limits string. The option can be specified multiple
times.

=item -c, --critical I<STRING> or I<FILE>

Override the machine-default temperature critical thresholds. Syntax
and behaviour are the same as for the warning thresholds described above.
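
To make the threshold syntax concrete, the following Python sketch
parses a limits string into per-probe maximum and minimum values. It
only illustrates the C<id=max[/min]> format described above. The
function name C<parse_thresholds> is hypothetical, and this is not the
plugin's own (Perl) implementation.

```python
def parse_thresholds(spec):
    """Parse 'id=max[/min],...' into {probe_id: (max, min_or_None)}."""
    limits = {}
    for entry in spec.split(","):
        probe, _, values = entry.partition("=")
        maximum, _, minimum = values.partition("/")
        # The minimum is optional; None means "no minimum limit given".
        limits[int(probe)] = (
            float(maximum),
            float(minimum) if minimum else None,
        )
    return limits


print(parse_thresholds("0=50,1=45/10"))
```

For the example string, probe 0 gets only a maximum limit, while
probe 1 gets both a maximum and a minimum.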

=item -o, --ok-info I<NUMBER>

This option lets you define how much output you want the plugin to
give when everything is OK, i.e. the verbosity level. The default
value is 0 (one line of output). The output levels are cumulative.

=over 4

=item B<0>

- Only one line (default)

=item B<1>

- BIOS and firmware info on a separate line

=item B<2>

- Storage controller and enclosure info on separate lines

=item B<3>

- OMSA version on separate line

=back

The reason that OMSA version is separated from the rest is that
finding it requires running a really slow omreport command, when the
plugin is run locally via NRPE.

=item --omreport I<OMREPORT PATH>

Specify full path to omreport, if it is not installed in any of the
regular places. Usually this option is only needed on Windows, if
omreport is not installed on the C: drive.

=item -i, --info

Prefix any alerts with the service tag.

=item -e, --extinfo

Display a short summary of system information (model and service tag)
in case of an alert.

=item --htmlinfo [I<CODE>]

Using this option will make the service tag and model name into
clickable HTML links in the output. The model name link will point to
the official Dell documentation for that model, while the service tag
link will point to a website containing support info for that
particular server.

This option takes an optional argument, which should be your country
code or C<me> for the Middle East. If the country code is omitted the
service tag link will still work, but it will not be specific to your
country or area. Example for Germany:

  check_openmanage --htmlinfo de

This option is particularly useful together with either the
I<--extinfo> or I<--info> option. Only the most common country codes
are supported at this time.

=item --postmsg I<STRING> or I<FILE>

User specified post message. Useful for displaying arbitrary or
various system information at the end of alerts. The argument is
either a string with the message, or a file containing that
string. You can control the format with the following interpreted
sequences:

=over 4

=item B<%m>

System model

=item B<%s>

Service tag

=item B<%b>

BIOS version

=item B<%d>

BIOS release date

=item B<%o>

Operating system name

=item B<%r>

Operating system release

=item B<%p>

Number of physical drives

=item B<%l>

Number of logical drives

=item B<%n>

Line break. Will be a regular line break if run from a TTY, else an
HTML line break.

=item B<%%>

A literal C<%>

=back
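
As a rough illustration of how the interpreted sequences above could
be expanded, here is a small Python sketch. The function name
C<expand_postmsg>, the C<info> dictionary and the sample values are
all made up for this example; the plugin itself is written in Perl and
may handle unknown sequences differently.

```python
import re


def expand_postmsg(template, info, linebreak="\n"):
    """Expand %-sequences; 'info' maps a sequence letter to its value."""

    def substitute(match):
        letter = match.group(1)
        if letter == "%":
            return "%"  # %% is a literal percent sign
        if letter == "n":
            return linebreak  # %n is a line break
        # Assumption of this sketch: unknown sequences are left untouched.
        return str(info.get(letter, match.group(0)))

    return re.sub(r"%(.)", substitute, template)


# Placeholder values, not real system data.
sample = {"m": "EXAMPLE-MODEL", "s": "XXXXXXX"}
print(expand_postmsg("Model: %m, tag: %s, 100%%", sample))
```

Passing C<< linebreak="<br/>" >> mimics the non-TTY case, where C<%n>
becomes an HTML line break.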

=item -s, --state

Prefix each alert with its corresponding service state (i.e. warning,
critical etc.). This is useful in case of several alerts from the same
monitored system.

=item --short-state

Same as the B<--state> option above, except that the state is
abbreviated to a single letter (W=warning, C=critical etc.).

=item --linebreak I<STRING>

check_openmanage will sometimes report more than one line, e.g. if
there are several alerts. If the script has a TTY, it will use regular
linebreaks. If not (which is the case with NRPE) it will use HTML
linebreaks. Sometimes it can be useful to control what the plugin uses
as a line separator, and this option provides that control.

The argument is the exact string to be used as the line
separator. There are two exceptions, i.e. two keywords that translate
to the following:

=over 4

=item B<REG>

Regular linebreaks, i.e. "\n".

=item B<HTML>

HTML linebreaks, i.e. "<br/>".

=back

This is a rather special option that is normally not needed. The
default behaviour should be sufficient for most users.
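
The selection rules described for C<--linebreak> can be summarised in
a few lines of Python. This is only a sketch of the documented
behaviour: the function name C<line_separator> is hypothetical, and
exact, case-sensitive matching of the two keywords is an assumption.

```python
import sys


def line_separator(option=None, is_tty=None):
    """Return the separator to use between output lines."""
    if is_tty is None:
        is_tty = sys.stdout.isatty()
    if option is None:
        # Default: regular linebreak on a TTY, HTML linebreak otherwise.
        return "\n" if is_tty else "<br/>"
    # The two keywords map to fixed strings; anything else is used as-is.
    keywords = {"REG": "\n", "HTML": "<br/>"}
    return keywords.get(option, option)


print(repr(line_separator("HTML")))
print(repr(line_separator(" | ")))
```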

=item -d, --debug

Debug output. Will report status on everything, even if status is
ok. Blacklisted or unchecked components are ignored (i.e. no output).

NOTE: This option is intended for diagnostics and debugging purposes
only. Do not use this option from within Nagios, i.e. in the Nagios
config.

=item -h, --help

Display help text.

=item -V, --version

Display version info.

=back

=head1 SNMP OPTIONS

=over 4

=item -H, --hostname I<HOSTNAME>

The transport address of the destination SNMP device. Using this
option triggers SNMP mode.

=item -P, --protocol I<PROTOCOL>

SNMP protocol version. This option is optional and expects a digit
(i.e. C<1>, C<2> or C<3>) to define the SNMP version. The default is
C<2>, i.e. SNMP version 2c.

=item -C, --community I<COMMUNITY>

This option expects a string that is to be used as the SNMP community
name when using SNMP version 1 or 2c. By default the community name
is set to C<public> if the option is not present.

=item --port I<PORT>

SNMP port of the remote (monitored) system. Defaults to the well-known
SNMP port 161.

=item -U, --username I<SECURITYNAME>

[SNMPv3] The User-based Security Model (USM) used by SNMPv3 requires
that a securityName be specified. This option is required when using
SNMP version 3, and expects a string 1 to 32 octets in length.

=item --authpassword I<PASSWORD>, --authkey I<KEY>

[SNMPv3] By default a securityLevel of C<noAuthNoPriv> is assumed. If
the --authpassword option is specified, the securityLevel becomes
C<authNoPriv>. The --authpassword option expects a string which is at
least 1 octet in length as argument.

Optionally, instead of the --authpassword option, the --authkey option
can be used so that a plain text password does not have to be
specified in a script. The --authkey option expects a hexadecimal
string produced by localizing the password with the
authoritativeEngineID for the specific destination device. The
C<snmpkey> utility included with the Net::SNMP distribution can be
used to create the hexadecimal string (see L<snmpkey>).

=item --authprotocol I<ALGORITHM>

[SNMPv3] Two different hash algorithms are defined by SNMPv3 which can
be used by the Security Model for authentication. These algorithms are
HMAC-MD5-96 C<MD5> (RFC 1321) and HMAC-SHA-96 C<SHA-1> (NIST FIPS PUB
180-1). The default algorithm used by the plugin is HMAC-MD5-96. This
behavior can be changed by using this option. The option expects
either the string C<md5> or C<sha> to be passed as argument to modify
the hash algorithm.

=item --privpassword I<PASSWORD>, --privkey I<KEY>

[SNMPv3] By specifying the options --privkey or --privpassword, the
securityLevel associated with the object becomes
C<authPriv>. According to SNMPv3, privacy requires the use of
authentication. Therefore, if either of these two options is present
and the --authkey or --authpassword arguments are missing, the
creation of the object fails. The --privkey and --privpassword
options expect the same input as the --authkey and --authpassword
options respectively.
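
The relationship between the authentication and privacy options and
the resulting securityLevel can be sketched as follows. This Python
snippet only restates the rules given above; the function name
C<security_level> and its boolean parameters are hypothetical and do
not correspond to any Net::SNMP call.

```python
def security_level(auth=False, priv=False):
    """Derive the SNMPv3 securityLevel from which credentials are given.

    auth: an --authpassword or --authkey value was supplied
    priv: a --privpassword or --privkey value was supplied
    """
    if priv and not auth:
        # Privacy requires authentication, so this combination fails.
        raise ValueError("privacy requires authentication")
    if priv:
        return "authPriv"
    if auth:
        return "authNoPriv"
    return "noAuthNoPriv"


print(security_level())
print(security_level(auth=True))
print(security_level(auth=True, priv=True))
```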

=item --privprotocol I<ALGORITHM>

[SNMPv3] The User-based Security Model described in RFC 3414 defines a
single encryption protocol to be used for privacy. This protocol,
CBC-DES C<DES> (NIST FIPS PUB 46-1), is used by default or if the
string C<des> is passed to the --privprotocol option. The Net::SNMP
module also supports RFC 3826 which describes the use of
CFB128-AES-128 C<AES> (NIST FIPS PUB 197) in the USM. The AES
encryption protocol can be selected by passing C<aes> or C<aes128> to
the --privprotocol option.

One of the following arguments is required: des, aes, aes128, 3des,
3desde

=back

=head1 BLACKLISTING

=over 4

=item -b, --blacklist I<STRING> or I<FILE>

Blacklist missing and/or failed components, if you do not plan to fix
them. The parameter is either the blacklist string, or a file (that
may or may not exist) containing the string. The blacklist string
contains component names with component IDs separated by slash
(/). Blacklisted components are left unchecked.

TIP: Use the option C<-d> (or C<--debug>) to get the blacklist ID for
devices. The ID is listed in a separate column in the debug output.

NOTE: If blacklisting is in effect, the global health of the system is
not checked.

=over 9

=item B<Syntax:>

  component1=id1[,id2,...]/component2=id1[,id2,...]/...

The ID part can also be C<ALL>, in which case all components of that
type are blacklisted.

=item B<Example:>

  check_openmanage -b ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=ALL

=back

In the example we blacklist power supply 0, fans 3 and 5, physical disk
1:0:0:1, and warnings about out-of-date drivers for all
controllers. Legal component names include:

=over 8

=item B<ctrl>

Storage controller. Note that if a controller is blacklisted, all
components on that controller (such as physical and logical drives)
are blacklisted as well.

=item B<ctrl_fw>

Suppress the special warning message about old controller
firmware. Use this if you cannot or will not upgrade the firmware.

=item B<ctrl_driver>

Suppress the special warning message about old controller driver.
Particularly useful on systems where you cannot upgrade the driver.

=item B<pdisk>

Physical disk.

=item B<vdisk>

Logical drive (virtual disk)

=item B<bat>

Controller cache battery

=item B<bat_charge>

Ignore warnings related to the controller cache battery charging
cycle, which happens approximately every 40 days on Dell servers. Note
that using this blacklist keyword makes check_openmanage ignore
non-critical cache battery errors.

=item B<conn>

Connector (channel)

=item B<encl>

Enclosure

=item B<encl_fan>

Enclosure fan

=item B<encl_ps>

Enclosure power supply

=item B<encl_temp>

Enclosure temperature probe

=item B<encl_emm>

Enclosure management module (EMM)

=item B<dimm>

Memory module

=item B<fan>

Fan

=item B<ps>

Power supply

=item B<temp>

Temperature sensor

=item B<cpu>

Processor (CPU)

=item B<volt>

Voltage probe

=item B<bp>

System battery

=item B<pm>

Amperage probe (power consumption monitoring)

=item B<intr>

Intrusion sensor

=back

=back
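
The blacklist syntax described above can be illustrated with a short
Python sketch that splits a blacklist string into components and IDs.
The helper names C<parse_blacklist> and C<is_blacklisted> are
hypothetical and only mirror the documented format; they are not taken
from the plugin's Perl source.

```python
def parse_blacklist(spec):
    """Parse 'comp=id1[,id2,...]/comp=...' into {component: [ids]}."""
    blacklist = {}
    for part in spec.split("/"):
        component, _, ids = part.partition("=")
        # IDs stay strings, since they can look like '1:0:0:1' or 'ALL'.
        blacklist.setdefault(component, []).extend(ids.split(","))
    return blacklist


def is_blacklisted(blacklist, component, component_id):
    """True if the ID is listed, or if ALL is given for the component."""
    ids = blacklist.get(component, [])
    return "ALL" in ids or str(component_id) in ids


bl = parse_blacklist("ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=ALL")
print(bl)
print(is_blacklisted(bl, "fan", 5))
print(is_blacklisted(bl, "ctrl_driver", 0))
```

Both lookups in the example are true: fan 5 is listed explicitly,
while controller 0 is covered by C<ctrl_driver=ALL>.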

=head1 CHECK CONTROL

=over 4

=item --only I<KEYWORD>

This option can be specified once and expects a keyword. The different
keywords and the behaviour of check_openmanage are described below.

=over 4

=item B<critical>

Print only critical alerts. With this option any warning alerts are
suppressed.

=item B<warning>

Print only warning alerts. With this option any critical alerts are
suppressed.

=item B<chassis>

Check all chassis components and nothing else.

=item B<storage>

Only check storage

=item B<memory>

Only check memory modules

=item B<fans>

Only check fans

=item B<power>

Only check power supplies

=item B<temp>

Only check temperatures

=item B<cpu>

Only check processors

=item B<voltage>

Only check voltage probes

=item B<batteries>

Only check batteries

=item B<amperage>

Only check power usage

=item B<intrusion>

Only check chassis intrusion

=item B<esmhealth>

Only check ESM log overall health, i.e. fill grade

=item B<esmlog>

Only check the event log (ESM) content

=item B<alertlog>

Only check the alert log content

=back

=item --check I<STRING> or I<FILE>

This parameter allows you to adjust which components are checked at
all. This is a coarser approach than blacklisting, which requires
that you specify the component ID or index. The parameter should
be either a string containing the adjustments, or a file containing
the string. No errors are raised if the file does not exist.

Note: This option is ignored with alternate basenames.

=over 9

=item B<Example:>

  check_openmanage --check storage=0,intrusion=1

=back

Legal values are described below, along with the default value.

=over 4

=item B<storage>

Check storage subsystem (controllers, disks etc.). Default: ON

=item B<memory>

Check memory (DIMMs). Default: ON

=item B<fans>

Check chassis fans. Default: ON

=item B<power>

Check power supplies. Default: ON

=item B<temp>

Check temperature sensors. Default: ON

=item B<cpu>

Check CPUs. Default: ON

=item B<voltage>

Check voltage sensors. Default: ON

=item B<batteries>

Check system batteries. Default: ON

=item B<amperage>

Check amperage probes. Default: ON

=item B<intrusion>

Check chassis intrusion. Default: ON

=item B<esmhealth>

Check the ESM log health, i.e. fill grade. Default: ON

=item B<esmlog>

Check the ESM log content. Default: OFF

=item B<alertlog>

Check the alert log content. Default: OFF

=back

=back
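
How a C<--check> string combines with the defaults listed above can be
sketched in Python as follows. The names C<DEFAULT_CHECKS> and
C<apply_check> are hypothetical, and the sketch assumes that values
are always C<0> or C<1>, as in the example; it is not the plugin's own
parsing code.

```python
# Defaults as listed in this section (True = ON, False = OFF).
DEFAULT_CHECKS = {
    "storage": True, "memory": True, "fans": True, "power": True,
    "temp": True, "cpu": True, "voltage": True, "batteries": True,
    "amperage": True, "intrusion": True, "esmhealth": True,
    "esmlog": False, "alertlog": False,
}


def apply_check(spec, defaults=DEFAULT_CHECKS):
    """Apply 'name=0|1,...' adjustments on top of the defaults."""
    checks = dict(defaults)
    for entry in spec.split(","):
        name, _, value = entry.partition("=")
        if name not in checks:
            raise ValueError("unknown check: " + name)
        checks[name] = bool(int(value))
    return checks


result = apply_check("storage=0,intrusion=1")
print(result["storage"], result["intrusion"], result["esmlog"])
```

With the example string, the storage check is switched off, the
intrusion check stays on, and everything else keeps its default.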

=head1 DIAGNOSTICS

The option C<--debug> (or C<-d>) can be specified to display all
monitored components.

=head1 DEPENDENCIES

If SNMP is requested, the Perl module Net::SNMP is
required. Otherwise, only a regular Perl distribution is required to
run the script. On the target (monitored) system, Dell OpenManage
Server Administrator (OMSA) must be installed and running.

=head1 EXIT STATUS

If no errors are discovered, a value of 0 (OK) is returned. An exit
value of 1 (WARNING) signifies one or more non-critical errors, while
2 (CRITICAL) signifies one or more critical errors.

The exit value 3 (UNKNOWN) is reserved for errors within the script,
or errors getting values from Dell OMSA.
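
For scripts that wrap the plugin, the exit values above can be mapped
to state names as in this Python sketch. The names C<STATES>,
C<state_name> and C<overall_status> are hypothetical. That a critical
error takes precedence when both kinds of error are present is an
assumption of this sketch, not something stated above.

```python
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def state_name(exit_value):
    """Translate an exit value into its state name."""
    # Anything outside 0..3 is treated as UNKNOWN in this sketch.
    return STATES.get(exit_value, "UNKNOWN")


def overall_status(critical_errors, noncritical_errors):
    """Pick the exit value from the number of errors of each kind."""
    if critical_errors > 0:
        return 2  # assumed to outrank non-critical errors
    if noncritical_errors > 0:
        return 1
    return 0


print(state_name(overall_status(0, 0)))
print(state_name(overall_status(0, 2)))
print(state_name(overall_status(1, 2)))
```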

=head1 AUTHOR

Written by Trond H. Amundsen <t.h.amundsen@usit.uio.no>

=head1 BUGS AND LIMITATIONS

Storage info is not collected or checked on very old PowerEdge models
and/or old OMSA versions, due to limitations in OMSA. The overall
support on those models/versions by this plugin is not well tested.

=head1 INCOMPATIBILITIES

The plugin should work with the Nagios embedded perl interpreter
(ePN). However, this is not thoroughly tested.

=head1 REPORTING BUGS

Report bugs to <t.h.amundsen@usit.uio.no>

=head1 LICENSE AND COPYRIGHT

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see L<http://www.gnu.org/licenses/>.

=head1 SEE ALSO

L<http://folk.uio.no/trondham/software/check_openmanage.html>

=cut