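Trying out nagira, which gives API access to Nagios

Introduction

I monitor several servers with Nagios, but the monitoring configuration is maintained by hand. While looking for a way to automate it, I came across nagira, a wrapper that apparently lets you work with Nagios through an API, so I am giving it a try. For reference, this is the workflow I have in mind for after a server is built*1.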

f:id:inokara:20140210005251p:plain

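I am trying nagira in the hope that it can be one of the tools for making that picture real. If you use NagiosQL there is a tool like this one, and the NCPA client tool also seems to provide an API, but I will leave those for next time.

References

Preparation

What is nagira?

A lightweight RESTful API environment for Nagios, implemented with Sinatra
Accesses the Nagios cache files for hosts, services and so on, and returns the values as JSON
Reads the Nagios configuration files and likewise returns the values as JSON

Essentially, its main job seems to be reading files such as status.dat and objects.cache and returning their contents as JSON.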
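
To get a feel for what "read status.dat and return JSON" means, here is a rough awk sketch of the idea. It is not how nagira itself is implemented (nagira is Ruby on Sinatra). The function name status_to_json is made up for this post, it only handles hoststatus blocks, and it assumes the values contain no double quotes or backslashes.

```shell
#!/bin/sh
# status_to_json is a hypothetical helper, not part of nagira or Nagios.
# It prints one JSON object per "hoststatus { key=value ... }" block
# found in the status.dat-style file given as $1.
# Assumption: values contain no double quotes or backslashes.
status_to_json() {
  awk '
    # Start of a host block: open a JSON object
    /^hoststatus[ \t]*[{]/ { inblock = 1; sep = ""; printf "{"; next }
    # End of the block: close the object
    inblock && /^[ \t]*[}]/ { inblock = 0; print "}"; next }
    # key=value line inside the block (split on the first "=" only)
    inblock {
      line = $0
      sub(/^[ \t]+/, "", line)
      eq = index(line, "=")
      if (eq == 0) next
      key = substr(line, 1, eq - 1)
      val = substr(line, eq + 1)
      printf "%s\"%s\":\"%s\"", sep, key, val
      sep = ","
    }
  ' "$1"
}
```

It would be called as status_to_json /path/to/status.dat; the actual location is whatever status_file points to in nagios.cfg. All values come out as strings, which matches the style of the nagira output shown later in this post.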

Environment

Amazon EC2 t1.micro
Ubuntu 13.10 amd64
Ruby 2.0 already installed via apt-get
I have also uploaded a cookbook here, so see that for the detailed steps...*2

Installing Nagios

Very easy: a single apt-get does it.

sudo apt-get install nagios3

Just note that the package name is nagios3, not nagios.

Installing nagira

nagira is installed as a gem.

sudo gem install nagira --no-ri --no-rdoc -V

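An install that does not finish cleanly is something I am used to*3, so I deal with it as needed. This time I stumbled a little on installing the following gems...

In this case the gems failed when installed together with the others through gem install nagira, yet installed fine when installed one at a time. I wonder why...

A small fix

After installing nagira, starting it fails with the following output.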

Starting Sinatra Nagira services:    /etc/init.d/nagira: line 68: /var/log/nagios/nagira.log: No such file or directory
[FAIL]

Apparently /var/log/nagios/ does not exist, so I modify the init script as follows.

--- /tmp/nagira 2014-02-09 03:46:51.891958000 +0000
+++ nagira      2014-02-09 15:07:20.675958000 +0000
@@ -43,7 +43,7 @@
 RACK_ENV=${RACK_ENV:-production}
 NAGIRA_USER=${NAGIRA_USER:-nagios}
 RVM_STRING=${RVM_STRING:-"true"} # i.e. do nothing special
-NAGIRA_LOG=${NAGIRA_LOG:-/var/log/nagios/nagira.log}
+NAGIRA_LOG=${NAGIRA_LOG:-/var/log/nagios3/nagira.log}
 # There's no default for NAGIOS_CFG_FILE. Default is actually to have
 # it unset, Nagira would be able to find nagios config if it is in one
 # of the standard locations (/etc or /usr/local). If your Nagios
@@ -57,8 +57,8 @@
 get_pid () {
 # ps -U ${
 # ps -C ruby -o pid,comm,cmd
-     #echo $(ps axo pid,command | awk '$2 ~ /ruby/ && $3 ~ /nagira/ {print $1}')
-     echo $(ps -C ruby -o pid,cmd | awk '$2 ~ /^\/usr.*bin\/nagira *$/ {print $1}')
+     echo $(ps axo pid,command | awk '$2 ~ /ruby/ && $3 ~ /nagira/ {print $1}')
+     #echo $(ps -C ruby -o pid,cmd | awk '$2 ~ /^\/usr.*bin\/nagira *$/ {print $1}')
 }

Also, the script could not obtain nagira's PID, so the diff above switches get_pid to a ps invocation that can find it.
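
Since the init script sets NAGIRA_LOG with a ${VAR:-default} expansion, the path can be checked before starting the service. The following is a minimal sketch of such a pre-flight check. check_log_path is a name I made up, and the default path is copied from the unmodified init script shown in the diff.

```shell
#!/bin/sh
# check_log_path is a hypothetical helper, not part of nagira.
# It resolves the log path the same way the init script does
# (NAGIRA_LOG if set, otherwise the script's default) and reports
# whether the parent directory exists.
check_log_path() {
  log_path=${NAGIRA_LOG:-/var/log/nagios/nagira.log}
  log_dir=$(dirname "$log_path")
  if [ -d "$log_dir" ]; then
    echo "ok: $log_path"
  else
    echo "missing directory: $log_dir"
    return 1
  fi
}
```

On the environment in this post, running check_log_path with NAGIRA_LOG unset should report the missing /var/log/nagios directory, which is the cause of the [FAIL] above. Whether the init script picks up NAGIRA_LOG from the environment on your system is something I have not verified, which is why I edited the script directly.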

Checking that it works

For these checks, and really as something that is becoming essential these days, I recommend having jq installed.

_api

You can see which features are available with the following command.

curl -s http://localhost:4567/_api | jq .

The output looks like this.

{
  "PUT": [
    "/_status/:host_name/_services",
    "/_status/:host_name/_services/:service_description",
    "/_status/:host_name/_services/:service_description/_return_code/:return_code/_plugin_output/:plugin_output",
    "/_status",
    "/_status/:host_name",
    "/_host_status/:host_name"
  ],
  "HEAD": [
    "/_config",
    "/_objects",
    "/_objects/:type",
    "/_objects/:type/:name",
    "/_status/:hostname/_services/:service_name",
    "/_status/(?<hostname>w([w-.]+)?w)/_(?<service>(services|hostcomments|servicecomments))",
    "/_status(/_hosts)?",
    "/_status/(?<hostname>w([w-.]+)?w)",
    "/_api",
    "/_runtime",
    "/"
  ],
  "GET": [
    "/_config",
    "/_objects",
    "/_objects/:type",
    "/_objects/:type/:name",
    "/_status/:hostname/_services/:service_name",
    "/_status/(?<hostname>w([w-.]+)?w)/_(?<service>(services|hostcomments|servicecomments))",
    "/_status(/_hosts)?",
    "/_status/(?<hostname>w([w-.]+)?w)",
    "/_api",
    "/_runtime",
    "/"
  ]
}
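
To pull out only the routes for one HTTP method, a jq filter along these lines should do. This is a minimal sketch: list_routes is my own name, not part of nagira, and it assumes the _api response has the shape shown above (an object keyed by HTTP method, each holding an array of routes).

```shell
#!/bin/sh
# list_routes is a hypothetical helper, not part of nagira.
# It reads an _api response on stdin and prints the routes for the
# HTTP method given as $1, one per line. An unknown method prints nothing.
list_routes() {
  jq -r --arg method "$1" '.[$method][]?'
}

# Usage against a running nagira (same URL as used in this post):
#   curl -s http://localhost:4567/_api | list_routes PUT
```

Passing the method through --arg keeps the filter itself fixed, so the same function works for GET, HEAD and PUT.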

Below I try out the basic features.

_status/${host}

Running the following shows the status of the monitored host ${host}.

curl -s http://localhost:4567/_status/localhost | jq .

The output looks like this.

{
  "scheduled_downtime_depth": "0",
  "percent_state_change": "0.00",
  "is_flapping": "0",
  "last_update": "1391960823",
  "obsess_over_host": "1",
  "process_performance_data": "1",
  "failure_prediction_enabled": "1",
  "flap_detection_enabled": "1",
  "event_handler_enabled": "1",
  "passive_checks_enabled": "1",
  "active_checks_enabled": "1",
  "acknowledgement_type": "0",
  "problem_has_been_acknowledged": "0",
  "notifications_enabled": "1",
  "current_notification_id": "0",
  "current_notification_number": "0",
  "no_more_notifications": "0",
  "current_problem_id": "0",
  "current_event_id": "0",
  "last_event_id": "0",
  "last_hard_state": "0",
  "current_state": "0",
  "check_type": "0",
  "check_latency": "0.139",
  "check_execution_time": "0.010",
  "host_name": "localhost",
  "modified_attributes": "0",
  "check_command": "check-host-alive",
  "notification_period": "24x7",
  "check_interval": "5.000000",
  "retry_interval": "1.000000",
  "has_been_checked": "1",
  "should_be_scheduled": "1",
  "last_problem_id": "0",
  "plugin_output": "PING OK - Packet loss = 0%, RTA = 0.05 ms",
  "performance_data": "rta=0.054000ms;5000.000000;5000.000000;0.000000 pl=0%;100;100;0",
  "last_check": "1391960733",
  "next_check": "1391961043",
  "check_options": "0",
  "current_attempt": "1",
  "max_attempts": "10",
  "state_type": "1",
  "last_state_change": "1391912113",
  "last_hard_state_change": "1391912113",
  "last_time_up": "1391960743",
  "last_time_down": "0",
  "last_time_unreachable": "0",
  "last_notification": "0",
  "next_notification": "0"
}
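
Every value in this output is a string, including numeric codes such as current_state. As a small sketch of how a script might use it, the helper below turns the host current_state code into a readable name. host_state_name is a name I made up. The mapping (0 = UP, 1 = DOWN, 2 = UNREACHABLE) is the usual Nagios host state numbering, and the URL in the usage comment assumes the per-host route listed in the _api output.

```shell
#!/bin/sh
# host_state_name is a hypothetical helper, not part of nagira.
# It maps a Nagios host current_state code to its state name.
host_state_name() {
  case "$1" in
    0) echo "UP" ;;
    1) echo "DOWN" ;;
    2) echo "UNREACHABLE" ;;
    *) echo "UNKNOWN($1)" ;;
  esac
}

# Usage with a running nagira and jq (assumes the host is named localhost):
#   state=$(curl -s http://localhost:4567/_status/localhost | jq -r '.current_state')
#   host_state_name "$state"
```

Note that this numbering applies to hosts only. Service states use a different set of codes, so a service check would need its own mapping.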

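I will keep trying things out and add to this post.

Closing

There are some slightly dubious spots in the gem dependencies, but being able to work with Nagios this easily is great!
With API access, automating the monitoring setup after a server is built feels achievable, which makes me happy
Of the monitoring tools I have used, Nagios is the one I am most comfortable with, though these days it is probably Zabbix that people reach for

*1: A blueprint

*2: Currently being tested

*3: Not really, though...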
