ようへいの日々精進XP

よかろうもん

fluent-plugin-datacounter を使って日本語を含んだログメッセージをカウントしたいと思った時に僕が選んだ苦肉の策

きっかけ

日本語を含んでいるログメッセージを fluent-plugin-datacounter でカウント出来ないかなって思ったので試行錯誤してみた。


参考

github.com

recipe.kc-cloud.jp


で、どうやったか

最初は...

以下のように設定して...

<source>
  type forward
  port 24224
  bind 0.0.0.0
</source>

<match fish.log>
   type copy
   <store>
     type datacounter
     unit hour
     aggregate all
     count_key fish
     pattern1 pattern01 鮪
     pattern2 pattern02 秋刀魚
     pattern3 pattrrn03 鮭
     tag datacount.log
   </store>
   <store>
     type stdout
   </store>
</match>

<match datacount.log>
  type stdout
</match>

以下のようにやってみたけど...

echo '{"fish":"鮪"}' | fluent-cat fish.log
echo '{"fish":"秋刀魚"}' | fluent-cat fish.log
echo '{"fish":"鮭"}' | fluent-cat fish.log

以下のようなエラーが出たので断念。

2015-06-22 15:06:23 +0000 [warn]: emit transaction failed  error_class=Encoding::CompatibilityError error=#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluent-plugin-datacounter-0.4.4/lib/fluent/plugin/out_datacounter.rb:262:in `match'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluent-plugin-datacounter-0.4.4/lib/fluent/plugin/out_datacounter.rb:262:in `block (2 levels) in emit'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluent-plugin-datacounter-0.4.4/lib/fluent/plugin/out_datacounter.rb:261:in `each'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluent-plugin-datacounter-0.4.4/lib/fluent/plugin/out_datacounter.rb:261:in `block in emit'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/event.rb:129:in `call'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/event.rb:129:in `block in each'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/event.rb:128:in `each'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/event.rb:128:in `each'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluent-plugin-datacounter-0.4.4/lib/fluent/plugin/out_datacounter.rb:255:in `emit'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/match.rb:36:in `emit'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/engine.rb:160:in `emit_stream'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/plugin/in_forward.rb:160:in `on_message'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/plugin/in_forward.rb:235:in `call'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/plugin/in_forward.rb:235:in `block in on_read_msgpack'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/plugin/in_forward.rb:234:in `feed_each'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/plugin/in_forward.rb:234:in `on_read_msgpack'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/plugin/in_forward.rb:220:in `call'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/plugin/in_forward.rb:220:in `on_read'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/cool.io-1.2.4/lib/cool.io/io.rb:128:in `on_readable'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/cool.io-1.2.4/lib/cool.io/io.rb:191:in `on_readable'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/cool.io-1.2.4/lib/cool.io/loop.rb:88:in `run_once'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/cool.io-1.2.4/lib/cool.io/loop.rb:88:in `run'
  2015-06-22 15:06:23 +0000 [warn]: /var/lib/gems/2.1.0/gems/fluentd-0.10.58/lib/fluent/plugin/in_forward.rb:93:in `run'

当然、プラグインを弄ったりする能力なんて無いので困った、困った、こまどり姉妹

ということで、僕が選んだのは...

  • exec_filter を利用してスクリプトでログメッセージを解析して、ログメッセージに基づいたコードを付与
  • コードが付与されたログメッセージの付与されたコードをカウントする

という苦肉の策。

exec_filter で使うスクリプト

# encoding: utf-8

require 'json'
require 'msgpack'

codes = {
  "" => "fish01",
  "秋刀魚" => "fish02",
  "" => "fish03",
}

while str = STDIN.gets
  str.chomp!
  s = JSON.parse(str)
  codes.each do |k,v|
    if s["fish"].include?("#{k}")
      s.store("fish_code", "#{v}")
      print MessagePack.pack(s)
      STDOUT.flush
    end
  end
end

上記のように exec_filter で利用するスクリプトで以下のように魚の種類に基づいたコードを付与する。

fish fish_code
fish01
秋刀魚 fish02
fish03

fluentd の設定

事前に fluent-plugin-datacounter はインストールしておく。

<source>
  type forward
  port 24224
  bind 0.0.0.0
</source>

<match fish.log>
  type       exec_filter
  command    /usr/bin/ruby /path/to/log-filter.rb
  in_format  json
  out_format msgpack
  tag        fish.filterd.log
  flush_interval 1s
</match>

<match fish.filterd.log>
   type copy
   <store>
     type datacounter
     unit minute
     aggregate all
     count_key fish_code
     pattern1 maguro fish01
     pattern2 sanam fish02
     pattern3 shake fish03
     tag datacount.log
   </store>
   <store>
      type stdout
   </store>
</match>

<match datacount.log>
  type stdout
</match>

試す

改めて...

echo '{"fish":"鮪"}' | fluent-cat fish.log
echo '{"fish":"秋刀魚"}' | fluent-cat fish.log
echo '{"fish":"鮭"}' | fluent-cat fish.log
echo '{"fish":"鮪"}' | fluent-cat fish.log
echo '{"fish":"秋刀魚"}' | fluent-cat fish.log
echo '{"fish":"鮭"}' | fluent-cat fish.log

ログを確認すると...exec_filter を介して fish_code というキーと fish に該当するコードが付与されている。うっしっし。

2015-06-23 14:13:40 +0000 fish.filterd.log: {"fish":"鮪","fish_code":"fish01"}
2015-06-23 14:13:40 +0000 fish.filterd.log: {"fish":"秋刀魚","fish_code":"fish02"}
2015-06-23 14:13:40 +0000 fish.filterd.log: {"fish":"鮭","fish_code":"fish03"}
2015-06-23 14:13:42 +0000 fish.filterd.log: {"fish":"鮪","fish_code":"fish01"}
2015-06-23 14:13:42 +0000 fish.filterd.log: {"fish":"秋刀魚","fish_code":"fish02"}
2015-06-23 14:13:44 +0000 fish.filterd.log: {"fish":"鮭","fish_code":"fish03"}

更にログを確認。

2015-06-23 14:27:23 +0000 datacount.log: {"unmatched_count":0,"unmatched_rate":0.0,"unmatched_percentage":0.0,"maguro_count":2,"maguro_rate":0.03,"maguro_percentage":33.333333333333336,"sanma_count":2,"sanma_rate":0.03,"sanma_percentage":33.333333333333336,"shake_count":2,"shake_rate":0.03,"shake_percentage":33.333333333333336}

おお、ちゃんとカウントされてるし、Datadog に飛ばせば以下のように。

f:id:inokara:20150624002616p:plain


ということで

苦肉の策でした。 も少しシンプルな方法が無いかを考えたい。