Commit | Line | Data |
---|---|---|
669797e1 | 1 | # Man page created with: |
2 | # | |
b53ed7ea | 3 | # pod2man -s 8 -r "`./check_openmanage -V | head -n 1`" -c 'Nagios plugin' check_openmanage.pod check_openmanage.8 |
669797e1 | 4 | # |
5 | # $Id$ | |
6 | ||
7 | =head1 NAME | |
8 | ||
9 | check_openmanage - Nagios plugin for checking the hardware status on | |
10 | Dell servers running OpenManage | |
11 | ||
12 | =head1 SYNOPSIS | |
13 | ||
14 | check_openmanage [I<OPTION>]... | |
15 | ||
16 | =head1 DESCRIPTION | |
17 | ||
18 | check_openmanage is a plugin for Nagios which checks the hardware | |
19 | health of Dell servers running OpenManage Server Administrator | |
20 | (OMSA). The plugin checks the health of the storage subsystem, power | |
21 | supplies, memory modules, temperature probes etc., and gives an alert | |
22 | if any of the components are faulty or operate outside normal | |
23 | parameters. | |
24 | ||
25 | check_openmanage is designed to be used either locally (using NRPE | |
26 | or similar) or remotely (using SNMP). In either mode, the output is | |
27 | (nearly) the same. Note that checking the alert log is not supported | |
28 | in SNMP mode. | |
29 | ||
30 | =head1 GENERAL OPTIONS | |
31 | ||
32 | =over 4 | |
33 | ||
34 | =item -t, --timeout I<SECONDS> | |
35 | ||
36 | The number of seconds after which the plugin will abort. Default | |
37 | timeout is 30 seconds if the option is not present. | |
38 | ||
39 | =item -p, --perfdata [I<multiline>] | |
40 | ||
41 | Collect performance data. Performance data collected includes | |
42 | temperatures (in Celsius) and fan speeds (in rpm). On systems that | |
43 | support it, power consumption is also collected (in Watts). | |
44 | ||
45 | If given the argument C<multiline>, the plugin will output the | |
46 | performance data on multiple lines, for Nagios 3.x and above. | |
47 | ||
48 | =item -w, --warning I<STRING> or I<FILE> | |
49 | ||
50 | Override the machine-default temperature warning thresholds. Syntax is | |
51 | C<id1=max[/min],id2=max[/min],...>. The following example sets warning | |
52 | limits to max 50C for probe 0, and max 45C and min 10C for probe 1: | |
53 | ||
54 | check_openmanage -w 0=50,1=45/10 | |
55 | ||
56 | The minimum limit can be omitted, if desired. Most often, you are only | |
57 | interested in setting the maximum thresholds. | |
58 | ||
59 | This parameter can be either a string with the limits, or a file | |
60 | containing the limits string. The option can be specified multiple | |
61 | times. | |
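
The threshold syntax above can be illustrated with a small parser. This is a minimal Python sketch of the C<id=max[/min]> format only, not the plugin's own (Perl) implementation; the function name C<parse_thresholds> is hypothetical, and reading the limits from a file is left out.

```python
def parse_thresholds(spec):
    """Parse 'id1=max[/min],id2=max[/min],...' into {probe_id: (max, min)}.

    The minimum is None when it is omitted from the string.
    """
    limits = {}
    for item in spec.split(","):
        probe, _, values = item.partition("=")
        max_str, _, min_str = values.partition("/")
        minimum = float(min_str) if min_str else None
        limits[int(probe)] = (float(max_str), minimum)
    return limits


# The example from the text: max 50C for probe 0, max 45C / min 10C for probe 1
print(parse_thresholds("0=50,1=45/10"))
```
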
62 | ||
63 | =item -c, --critical I<STRING> or I<FILE> | |
64 | ||
65 | Override the machine-default temperature critical thresholds. Syntax | |
66 | and behaviour are the same as for warning thresholds described above. | |
67 | ||
68 | =item -o, --ok-info I<NUMBER> | |
69 | ||
70 | This option lets you define how much output you want the plugin to | |
71 | give when everything is OK, i.e. the verbosity level. The default | |
72 | value is 0 (one line of output). The output levels are cumulative. | |
73 | ||
74 | =over 4 | |
75 | ||
76 | =item B<0> | |
77 | ||
78 | - Only one line (default) | |
79 | ||
80 | =item B<1> | |
81 | ||
82 | - BIOS and firmware info on a separate line | |
83 | ||
84 | =item B<2> | |
85 | ||
86 | - Storage controller and enclosure info on separate lines | |
87 | ||
88 | =item B<3> | |
89 | ||
90 | - OMSA version on separate line | |
91 | ||
92 | =back | |
93 | ||
94 | The reason that OMSA version is separated from the rest is that | |
95 | finding it requires running a really slow omreport command, when the | |
96 | plugin is run locally via NRPE. | |
97 | ||
71d7d930 | 98 | =item --omreport I<OMREPORT PATH> |
99 | ||
100 | Specify full path to omreport, if it is not installed in any of the | |
101 | regular places. Usually this option is only needed on Windows, if | |
102 | omreport is not installed on the C: drive. | |
103 | ||
669797e1 | 104 | =item -i, --info |
105 | ||
106 | Prefix any alerts with the service tag. | |
107 | ||
108 | =item -e, --extinfo | |
109 | ||
110 | Display a short summary of system information (model and service tag) | |
111 | in case of an alert. | |
112 | ||
113 | =item --htmlinfo [I<CODE>] | |
114 | ||
115 | Using this option will make the servicetag and model name into | |
116 | clickable HTML links in the output. The model name link will point to | |
117 | the official Dell documentation for that model, while the servicetag | |
118 | link will point to a website containing support info for that | |
119 | particular server. | |
120 | ||
121 | This option takes an optional argument, which should be your country | |
122 | code or C<me> for the middle east. If the country code is omitted the | |
123 | servicetag link will still work, but it will not be specific for your | |
124 | country or area. Example for Germany: | |
125 | ||
126 | check_openmanage --htmlinfo de | |
127 | ||
128 | This option is particularly useful together with either the | |
129 | I<--extinfo> or I<--info> options. Only the most common country | |
130 | codes are supported at this time. | |
131 | ||
132 | =item --postmsg I<STRING> or I<FILE> | |
133 | ||
134 | User-specified post message. Useful for displaying arbitrary or | |
135 | various system information at the end of alerts. The argument is | |
136 | either a string with the message, or a file containing that | |
137 | string. You can control the format with the following interpreted | |
138 | sequences: | |
139 | ||
140 | =over 4 | |
141 | ||
142 | =item B<%m> | |
143 | ||
144 | System model | |
145 | ||
146 | =item B<%s> | |
147 | ||
148 | Service tag | |
149 | ||
150 | =item B<%b> | |
151 | ||
152 | BIOS version | |
153 | ||
154 | =item B<%d> | |
155 | ||
156 | BIOS release date | |
157 | ||
158 | =item B<%o> | |
159 | ||
160 | Operating system name | |
161 | ||
162 | =item B<%r> | |
163 | ||
164 | Operating system release | |
165 | ||
166 | =item B<%p> | |
167 | ||
168 | Number of physical drives | |
169 | ||
170 | =item B<%l> | |
171 | ||
172 | Number of logical drives | |
173 | ||
174 | =item B<%n> | |
175 | ||
176 | Line break. Will be a regular line break if run from a TTY, else an | |
177 | HTML line break. | |
178 | ||
179 | =item B<%%> | |
180 | ||
181 | A literal C<%> | |
182 | ||
183 | =back | |
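
As an illustration of how the interpreted sequences listed above could be expanded, here is a minimal Python sketch. It is not taken from the plugin: the function C<expand_postmsg>, the dictionary keys, and the sample model and service tag are all hypothetical. Unknown sequences are left untouched, which is an assumption.

```python
import re

# Map each sequence letter to a (hypothetical) key in the system info dict.
FIELDS = {
    "m": "model", "s": "servicetag", "b": "bios_version", "d": "bios_date",
    "o": "os_name", "r": "os_release", "p": "pdisks", "l": "vdisks",
}


def expand_postmsg(template, info, linebreak="\n"):
    """Expand %m, %s, ... in template; %n becomes linebreak, %% a literal %."""
    def replace(match):
        key = match.group(1)
        if key == "%":
            return "%"
        if key == "n":
            return linebreak
        if key in FIELDS:
            return str(info.get(FIELDS[key], ""))
        return match.group(0)  # assumption: unknown sequences are kept as-is

    return re.sub(r"%(.)", replace, template)


# Made-up sample data for illustration only
info = {"model": "PowerEdge R710", "servicetag": "ABC1234"}
print(expand_postmsg("%m (%s)%n100%%", info))
```
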
184 | ||
185 | =item -s, --state | |
186 | ||
187 | Prefix each alert with its corresponding service state (i.e. warning, | |
188 | critical etc.). This is useful in case of several alerts from the same | |
189 | monitored system. | |
190 | ||
191 | =item --short-state | |
192 | ||
193 | Same as the B<--state> option above, except that the state is | |
194 | abbreviated to a single letter (W=warning, C=critical etc.). | |
195 | ||
fb90e271 | 196 | =item --linebreak I<STRING> |
669797e1 | 197 | |
198 | check_openmanage will sometimes report more than one line, e.g. if | |
199 | there are several alerts. If the script has a TTY, it will use regular | |
200 | linebreaks. If not (which is the case with NRPE) it will use HTML | |
201 | linebreaks. Sometimes it can be useful to control what the plugin uses | |
202 | as a line separator, and this option provides that control. | |
203 | ||
204 | The argument is the exact string to be used as the line | |
205 | separator. There are two exceptions, i.e. two keywords that translate | |
206 | to the following: | |
207 | ||
208 | =over 4 | |
209 | ||
210 | =item B<REG> | |
211 | ||
212 | Regular linebreaks, i.e. "\n". | |
213 | ||
214 | =item B<HTML> | |
215 | ||
216 | HTML linebreaks, i.e. "<br/>". | |
217 | ||
218 | =back | |
219 | ||
220 | This is a rather special option that is normally not needed. The | |
221 | default behaviour should be sufficient for most users. | |
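
The selection rule described for this option can be summarised in a few lines. The following Python sketch only mirrors the behaviour stated above (TTY detection, the C<REG> and C<HTML> keywords, and literal strings otherwise); the function name C<resolve_linebreak> is hypothetical and the plugin itself is written in Perl.

```python
def resolve_linebreak(option, has_tty):
    """Return the line separator for a given --linebreak argument.

    option is None when --linebreak was not given on the command line.
    """
    if option is None:
        # Default behaviour: regular linebreak on a TTY, HTML otherwise
        return "\n" if has_tty else "<br/>"
    if option == "REG":
        return "\n"
    if option == "HTML":
        return "<br/>"
    return option  # any other argument is used verbatim


print(repr(resolve_linebreak(None, has_tty=False)))
```
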
222 | ||
223 | =item -d, --debug | |
224 | ||
225 | Debug output. Will report status on everything, even if status is | |
226 | ok. Blacklisted or unchecked components are ignored (i.e. no output). | |
227 | ||
228 | NOTE: This option is intended for diagnostics and debugging purposes | |
229 | only. Do not use this option from within Nagios, i.e. in the Nagios | |
230 | config. | |
231 | ||
232 | =item -h, --help | |
233 | ||
234 | Display help text. | |
235 | ||
236 | =item -V, --version | |
237 | ||
238 | Display version info. | |
239 | ||
240 | =back | |
241 | ||
242 | =head1 SNMP OPTIONS | |
243 | ||
244 | =over 4 | |
245 | ||
246 | =item -H, --hostname I<HOSTNAME> | |
247 | ||
248 | The transport address of the destination SNMP device. Using this | |
249 | option triggers SNMP mode. | |
250 | ||
251 | =item -P, --protocol I<PROTOCOL> | |
252 | ||
253 | SNMP protocol version. This option is optional and expects a digit | |
254 | (i.e. C<1>, C<2> or C<3>) to define the SNMP version. The default is | |
255 | C<2>, i.e. SNMP version 2c. | |
256 | ||
257 | =item -C, --community I<COMMUNITY> | |
258 | ||
259 | This option expects a string that is to be used as the SNMP community | |
260 | name when using SNMP version 1 or 2c. By default the community name | |
261 | is set to C<public> if the option is not present. | |
262 | ||
263 | =item --port I<PORT> | |
264 | ||
265 | SNMP port of the remote (monitored) system. Defaults to the well-known | |
266 | SNMP port 161. | |
267 | ||
268 | =item -U, --username I<SECURITYNAME> | |
269 | ||
270 | [SNMPv3] The User-based Security Model (USM) used by SNMPv3 requires | |
271 | that a securityName be specified. This option is required when using | |
272 | SNMP version 3, and expects a string 1 to 32 octets in length. | |
273 | ||
274 | =item --authpassword I<PASSWORD>, --authkey I<KEY> | |
275 | ||
276 | [SNMPv3] By default a securityLevel of C<noAuthNoPriv> is assumed. If | |
277 | the --authpassword option is specified, the securityLevel becomes | |
278 | C<authNoPriv>. The --authpassword option expects a string which is at | |
279 | least 1 octet in length as argument. | |
280 | ||
281 | Optionally, instead of the --authpassword option, the --authkey option | |
282 | can be used so that a plain text password does not have to be | |
283 | specified in a script. The --authkey option expects a hexadecimal | |
284 | string produced by localizing the password with the | |
285 | authoritativeEngineID for the specific destination device. The | |
286 | C<snmpkey> utility included with the Net::SNMP distribution can be | |
287 | used to create the hexadecimal string (see L<snmpkey>). | |
288 | ||
289 | =item --authprotocol I<ALGORITHM> | |
290 | ||
291 | [SNMPv3] Two different hash algorithms are defined by SNMPv3 which can | |
292 | be used by the Security Model for authentication. These algorithms are | |
293 | HMAC-MD5-96 C<MD5> (RFC 1321) and HMAC-SHA-96 C<SHA-1> (NIST FIPS PUB | |
294 | 180-1). The default algorithm used by the plugin is HMAC-MD5-96. This | |
295 | behavior can be changed by using this option. The option expects | |
296 | either the string C<md5> or C<sha> to be passed as argument to modify | |
297 | the hash algorithm. | |
298 | ||
299 | =item --privpassword I<PASSWORD>, --privkey I<KEY> | |
300 | ||
301 | [SNMPv3] By specifying the options --privkey or --privpassword, the | |
302 | securityLevel associated with the object becomes | |
303 | C<authPriv>. According to SNMPv3, privacy requires the use of | |
304 | authentication. Therefore, if either of these two options is present | |
305 | and the --authkey or --authpassword arguments are missing, the | |
306 | creation of the object fails. The --privkey and --privpassword | |
307 | options expect the same input as the --authkey and --authpassword | |
308 | options respectively. | |
309 | ||
310 | =item --privprotocol I<ALGORITHM> | |
311 | ||
312 | [SNMPv3] The User-based Security Model described in RFC 3414 defines a | |
313 | single encryption protocol to be used for privacy. This protocol, | |
314 | CBC-DES C<DES> (NIST FIPS PUB 46-1), is used by default or if the | |
315 | string C<des> is passed to the --privprotocol option. The Net::SNMP | |
316 | module also supports RFC 3826 which describes the use of | |
317 | CFB128-AES-128 C<AES> (NIST FIPS PUB 197) in the USM. The AES | |
318 | encryption protocol can be selected by passing C<aes> or C<aes128> to | |
319 | the --privprotocol option. | |
320 | ||
321 | One of the following arguments is required: des, aes, aes128, 3des, | |
322 | 3desde | |
323 | ||
324 | =back | |
325 | ||
326 | =head1 BLACKLISTING | |
327 | ||
328 | =over 4 | |
329 | ||
330 | =item -b, --blacklist I<STRING> or I<FILE> | |
331 | ||
332 | Blacklist missing and/or failed components, if you do not plan to fix | |
333 | them. The parameter is either the blacklist string, or a file (that | |
334 | may or may not exist) containing the string. The blacklist string | |
335 | contains component names with component IDs separated by slash | |
336 | (/). Blacklisted components are left unchecked. | |
337 | ||
338 | TIP: Use the option C<-d> (or C<--debug>) to get the blacklist ID for | |
339 | devices. The ID is listed in a separate column in the debug output. | |
340 | ||
0b6ba9c9 | 341 | NOTE: If blacklisting is in effect, the global health of the system is |
342 | not checked. | |
669797e1 | 343 | |
344 | =over 9 | |
345 | ||
346 | =item B<Syntax:> | |
347 | ||
348 | component1=id1[,id2,...]/component2=id1[,id2,...]/... | |
349 | ||
0b6ba9c9 | 350 | The ID part can also be C<ALL>, in which case all components of that |
351 | type are blacklisted. | |
352 | ||
669797e1 | 353 | =item B<Example:> |
354 | ||
0b6ba9c9 | 355 | check_openmanage -b ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=ALL |
669797e1 | 356 | |
357 | =back | |
358 | ||
0b6ba9c9 | 359 | In the example we blacklist power supply 0, fans 3 and 5, physical disk |
360 | 1:0:0:1, and warnings about out-of-date drivers for all | |
361 | controllers. Legal component names include: | |
669797e1 | 362 | |
363 | =over 8 | |
364 | ||
365 | =item B<ctrl> | |
366 | ||
0b6ba9c9 | 367 | Storage controller. Note that if a controller is blacklisted, all |
368 | components on that controller (such as physical and logical drives) | |
369 | are blacklisted as well. | |
669797e1 | 370 | |
371 | =item B<ctrl_fw> | |
372 | ||
373 | Suppress the special warning message about old controller | |
374 | firmware. Use this if you cannot or will not upgrade the firmware. | |
375 | ||
376 | =item B<ctrl_driver> | |
377 | ||
378 | Suppress the special warning message about old controller driver. | |
379 | Particularly useful on systems where you cannot upgrade the driver. | |
380 | ||
381 | =item B<pdisk> | |
382 | ||
383 | Physical disk. | |
384 | ||
385 | =item B<vdisk> | |
386 | ||
387 | Logical drive (virtual disk) | |
388 | ||
389 | =item B<bat> | |
390 | ||
391 | Controller cache battery | |
392 | ||
7b02bc55 | 393 | =item B<bat_charge> |
394 | ||
395 | Ignore warnings related to the controller cache battery charging | |
7031b02a | 396 | cycle, which happens approximately every 40 days on Dell servers. Note |
397 | that using this blacklist keyword makes check_openmanage ignore | |
398 | non-critical cache battery errors. | |
7b02bc55 | 399 | |
669797e1 | 400 | =item B<conn> |
401 | ||
402 | Connector (channel) | |
403 | ||
404 | =item B<encl> | |
405 | ||
406 | Enclosure | |
407 | ||
408 | =item B<encl_fan> | |
409 | ||
410 | Enclosure fan | |
411 | ||
412 | =item B<encl_ps> | |
413 | ||
414 | Enclosure power supply | |
415 | ||
416 | =item B<encl_temp> | |
417 | ||
418 | Enclosure temperature probe | |
419 | ||
420 | =item B<encl_emm> | |
421 | ||
422 | Enclosure management module (EMM) | |
423 | ||
424 | =item B<dimm> | |
425 | ||
426 | Memory module | |
427 | ||
428 | =item B<fan> | |
429 | ||
430 | Fan | |
431 | ||
432 | =item B<ps> | |
433 | ||
434 | Power supply | |
435 | ||
436 | =item B<temp> | |
437 | ||
438 | Temperature sensor | |
439 | ||
440 | =item B<cpu> | |
441 | ||
442 | Processor (CPU) | |
443 | ||
444 | =item B<volt> | |
445 | ||
446 | Voltage probe | |
447 | ||
448 | =item B<bp> | |
449 | ||
450 | System battery | |
451 | ||
452 | =item B<pm> | |
453 | ||
454 | Amperage probe (power consumption monitoring) | |
455 | ||
456 | =item B<intr> | |
457 | ||
458 | Intrusion sensor | |
459 | ||
460 | =back | |
461 | ||
462 | =back | |
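
To make the blacklist syntax concrete, the sketch below parses a blacklist string and answers whether a given component is blacklisted. It is a minimal Python illustration of the documented format, not the plugin's own code; the names C<parse_blacklist> and C<is_blacklisted> are hypothetical. IDs are kept as strings because physical disk IDs such as C<1:0:0:1> are not plain numbers.

```python
def parse_blacklist(spec):
    """Parse 'comp1=id1[,id2,...]/comp2=id1[,id2,...]/...' into a dict."""
    blacklist = {}
    for entry in spec.split("/"):
        component, _, ids = entry.partition("=")
        blacklist.setdefault(component, []).extend(ids.split(","))
    return blacklist


def is_blacklisted(blacklist, component, ident):
    """True if ident is listed for component, or the component is ALL."""
    ids = blacklist.get(component, [])
    return "ALL" in ids or ident in ids


# The example string from the text
bl = parse_blacklist("ps=0/fan=3,5/pdisk=1:0:0:1/ctrl_driver=ALL")
print(bl)
print(is_blacklisted(bl, "fan", "5"), is_blacklisted(bl, "fan", "4"))
```
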
463 | ||
464 | =head1 CHECK CONTROL | |
465 | ||
466 | =over 4 | |
467 | ||
468 | =item --only I<KEYWORD> | |
469 | ||
470 | This option can be specified once and expects a keyword. The different | |
471 | keywords and the behaviour of check_openmanage are described below. | |
472 | ||
473 | =over 4 | |
474 | ||
475 | =item B<critical> | |
476 | ||
477 | Print only critical alerts. With this option any warning alerts are | |
478 | suppressed. | |
479 | ||
480 | =item B<warning> | |
481 | ||
482 | Print only warning alerts. With this option any critical alerts are | |
483 | suppressed. | |
484 | ||
485 | =item B<chassis> | |
486 | ||
487 | Check all chassis components and nothing else. | |
488 | ||
489 | =item B<storage> | |
490 | ||
491 | Only check storage | |
492 | ||
493 | =item B<memory> | |
494 | ||
495 | Only check memory modules | |
496 | ||
497 | =item B<fans> | |
498 | ||
499 | Only check fans | |
500 | ||
501 | =item B<power> | |
502 | ||
503 | Only check power supplies | |
504 | ||
505 | =item B<temp> | |
506 | ||
507 | Only check temperatures | |
508 | ||
509 | =item B<cpu> | |
510 | ||
511 | Only check processors | |
512 | ||
513 | =item B<voltage> | |
514 | ||
515 | Only check voltage probes | |
516 | ||
517 | =item B<batteries> | |
518 | ||
519 | Only check batteries | |
520 | ||
521 | =item B<amperage> | |
522 | ||
523 | Only check power usage | |
524 | ||
525 | =item B<intrusion> | |
526 | ||
527 | Only check chassis intrusion | |
528 | ||
529 | =item B<esmhealth> | |
530 | ||
531 | Only check ESM log overall health, i.e. fill grade | |
532 | ||
533 | =item B<esmlog> | |
534 | ||
535 | Only check the event log (ESM) content | |
536 | ||
537 | =item B<alertlog> | |
538 | ||
539 | Only check the alert log content | |
540 | ||
541 | =back | |
542 | ||
543 | =item --check I<STRING> or I<FILE> | |
544 | ||
545 | This parameter allows you to adjust which components should be | |
546 | checked at all. This is a coarser approach than blacklisting, which | |
547 | requires that you specify component id or index. The parameter should | |
548 | be either a string containing the adjustments, or a file containing | |
549 | the string. No errors are raised if the file does not exist. | |
550 | ||
551 | Note: This option is ignored with alternate basenames. | |
552 | ||
553 | =over 9 | |
554 | ||
555 | =item B<Example:> | |
556 | ||
557 | check_openmanage --check storage=0,intrusion=1 | |
558 | ||
559 | =back | |
560 | ||
561 | Legal values are described below, along with the default value. | |
562 | ||
563 | =over 4 | |
564 | ||
565 | =item B<storage> | |
566 | ||
567 | Check storage subsystem (controllers, disks etc.). Default: ON | |
568 | ||
569 | =item B<memory> | |
570 | ||
571 | Check memory (dimms). Default: ON | |
572 | ||
573 | =item B<fans> | |
574 | ||
575 | Check chassis fans. Default: ON | |
576 | ||
577 | =item B<power> | |
578 | ||
579 | Check power supplies. Default: ON | |
580 | ||
581 | =item B<temp> | |
582 | ||
583 | Check temperature sensors. Default: ON | |
584 | ||
585 | =item B<cpu> | |
586 | ||
587 | Check CPUs. Default: ON | |
588 | ||
589 | =item B<voltage> | |
590 | ||
591 | Check voltage sensors. Default: ON | |
592 | ||
593 | =item B<batteries> | |
594 | ||
595 | Check system batteries. Default: ON | |
596 | ||
597 | =item B<amperage> | |
598 | ||
599 | Check amperage probes. Default: ON | |
600 | ||
601 | =item B<intrusion> | |
602 | ||
603 | Check chassis intrusion. Default: ON | |
604 | ||
605 | =item B<esmhealth> | |
606 | ||
607 | Check the ESM log health, i.e. fill grade. Default: ON | |
608 | ||
609 | =item B<esmlog> | |
610 | ||
611 | Check the ESM log content. Default: OFF | |
612 | ||
613 | =item B<alertlog> | |
614 | ||
615 | Check the alert log content. Default: OFF | |
616 | ||
617 | =back | |
618 | ||
619 | =back | |
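
The way a C<--check> string adjusts the defaults listed above can be sketched as follows. This is a hedged Python illustration, not the plugin's implementation: the function name C<apply_check_option> is hypothetical, and accepting only C<0> and C<1> as values is an assumption based on the example in the text.

```python
# Defaults as documented above: everything ON except the two log content checks
DEFAULT_CHECKS = {
    "storage": True, "memory": True, "fans": True, "power": True,
    "temp": True, "cpu": True, "voltage": True, "batteries": True,
    "amperage": True, "intrusion": True, "esmhealth": True,
    "esmlog": False, "alertlog": False,
}


def apply_check_option(spec, defaults=DEFAULT_CHECKS):
    """Return a copy of defaults adjusted by a 'name=0|1,name=0|1,...' string."""
    checks = dict(defaults)
    for item in spec.split(","):
        name, _, value = item.partition("=")
        if name not in checks:
            raise ValueError(f"unknown check: {name!r}")
        if value not in ("0", "1"):  # assumption: only 0 and 1 are valid
            raise ValueError(f"invalid value for {name}: {value!r}")
        checks[name] = value == "1"
    return checks


# The example from the text: turn storage checking off, keep intrusion on
checks = apply_check_option("storage=0,intrusion=1")
print(checks["storage"], checks["intrusion"], checks["esmlog"])
```
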
620 | ||
621 | =head1 DIAGNOSTICS | |
622 | ||
623 | The option C<--debug> (or C<-d>) can be specified to display all | |
624 | monitored components. | |
625 | ||
626 | =head1 DEPENDENCIES | |
627 | ||
628 | If SNMP is requested, the perl module Net::SNMP is | |
629 | required. Otherwise, only a regular perl distribution is required to | |
630 | run the script. On the target (monitored) system, Dell OpenManage | |
631 | Server Administrator (OMSA) must be installed and running. | |
632 | ||
633 | =head1 EXIT STATUS | |
634 | ||
635 | If no errors are discovered, a value of 0 (OK) is returned. An exit | |
636 | value of 1 (WARNING) signifies one or more non-critical errors, while | |
637 | 2 (CRITICAL) signifies one or more critical errors. | |
638 | ||
639 | The exit value 3 (UNKNOWN) is reserved for errors within the script, | |
640 | or errors getting values from Dell OMSA. | |
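
The exit value rules above amount to reporting the most severe state found. The Python sketch below shows one way to express that; it is not the plugin's code, the function name C<plugin_exit_code> is hypothetical, and letting an internal error take precedence over any alerts is an assumption that the text does not state.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def plugin_exit_code(warnings, criticals, internal_error=False):
    """Map alert counts to a Nagios exit value as described above."""
    if internal_error:
        # Assumption: script/OMSA errors win over any component alerts
        return UNKNOWN
    if criticals > 0:
        return CRITICAL
    if warnings > 0:
        return WARNING
    return OK


print(plugin_exit_code(warnings=2, criticals=0))
```
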
641 | ||
642 | =head1 AUTHOR | |
643 | ||
644 | Written by Trond H. Amundsen <t.h.amundsen@usit.uio.no> | |
645 | ||
646 | =head1 BUGS AND LIMITATIONS | |
647 | ||
648 | Storage info is not collected or checked on very old PowerEdge models | |
649 | and/or old OMSA versions, due to limitations in OMSA. The overall | |
650 | support on those models/versions by this plugin is not well tested. | |
651 | ||
652 | =head1 INCOMPATIBILITIES | |
653 | ||
654 | The plugin should work with the Nagios embedded perl interpreter | |
655 | (ePN). However, this is not thoroughly tested. | |
656 | ||
657 | =head1 REPORTING BUGS | |
658 | ||
659 | Report bugs to <t.h.amundsen@usit.uio.no> | |
660 | ||
661 | =head1 LICENSE AND COPYRIGHT | |
662 | ||
663 | This program is free software: you can redistribute it and/or modify | |
664 | it under the terms of the GNU General Public License as published by | |
665 | the Free Software Foundation, either version 3 of the License, or (at | |
666 | your option) any later version. | |
667 | ||
668 | This program is distributed in the hope that it will be useful, but | |
669 | WITHOUT ANY WARRANTY; without even the implied warranty of | |
670 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | |
671 | General Public License for more details. | |
672 | ||
673 | You should have received a copy of the GNU General Public License | |
674 | along with this program. If not, see L<http://www.gnu.org/licenses/>. | |
675 | ||
676 | =head1 SEE ALSO | |
677 | ||
678 | L<http://folk.uio.no/trondham/software/check_openmanage.html> | |
679 | ||
680 | =cut |