Fluentd: why tuning the output buffer matters



Today it is hard to imagine a Kubernetes-based project without the ELK stack, which stores the logs of both the applications and the system components of the cluster. In our practice, we use the EFK stack, with Fluentd instead of Logstash.



Fluentd is a log collector and a Cloud Native Computing Foundation project, which is why its development is geared toward use together with Kubernetes.



Using Fluentd instead of Logstash does not change the overall nature of the stack, but Fluentd has its own specific nuances.



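For example, when we started using EFK in one project, we found that some messages were displayed in Kibana several times. This article explains why that happens and how to solve the problem.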





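In one of our projects, Fluentd was deployed as a DaemonSet (one instance running on each node of the Kubernetes cluster) and tailed the containers' stdout logs in /var/log/containers. After collection and processing, the logs were sent as JSON documents to ElasticSearch, deployed either as a cluster or standalone, depending on the project. Kibana served as the graphical interface.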



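When using Fluentd with a buffered output plugin, we ran into a situation where some documents in ElasticSearch had exactly the same content and differed only in their identifier. An Nginx log makes it easy to confirm that these are repeats of one message. In the log file, the message exists in a single copy: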



127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777 "-" "Opera/12.0" -


In ElasticSearch, however, there are several documents containing this message:



{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "HgGl_nIBR8C-2_33RlQV",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}

{
  "_index": "test-custom-prod-example-2020.01.02",
  "_type": "_doc",
  "_id": "IgGm_nIBR8C-2_33e2ST",
  "_version": 1,
  "_score": 0,
  "_source": {
    "service": "test-custom-prod-example",
    "container_name": "nginx",
    "namespace": "test-prod",
    "@timestamp": "2020-01-14T05:29:47.599052886+00:00",
    "log": "127.0.0.1 192.168.0.1 - [28/Feb/2013:12:00:00 +0900] \"GET / HTTP/1.1\" 200 777 \"-\" \"Opera/12.0\" -",
    "tag": "custom-log"
  }
}


, .



At the same time, the Fluentd logs contain warnings like this one:



2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4 next_retry_seconds=2020-01-16 01:46:53 +0000 chunk="59c37fc3fb320608692c352802b973ce" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"elasticsearch\", :port=>9200, :scheme=>\"http\", :user=>\"elastic\", :password=>\"obfuscated\"}): read timeout reached"


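These warnings appear when ElasticSearch cannot respond to a request within the time set by the request_timeout parameter, so the buffer chunk being sent cannot be cleared. Fluentd then tries to send the chunk to ElasticSearch again, and after some number of attempts the operation succeeds: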



2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fc3fb320608692c352802b973ce" 
2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded. chunk_id="59c37fad241ab300518b936e27200747" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fc11f7ab707ca5de72a88321cc2" 
2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded. chunk_id="59c37fb5adb70c06e649d8c108318c9b" 
2020-01-16 01:47:15 +0000 [warn]: [kube-system] retry succeeded. chunk_id="59c37f63a9046e6dff7e9987729be66f"


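However, ElasticSearch treats each resent buffer chunk as new data and assigns its documents unique _id values during indexing. That is how the copies of messages appear.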



In Kibana, it looks like this: [screenshot not preserved]

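There are several ways to solve this problem. One of them is the mechanism built into the fluent-plugin-elasticsearch plugin that generates a unique hash for each document. With this mechanism, ElasticSearch recognizes repeats when they are resent and does not duplicate documents. However, this approach treats the symptom and does not remove the timeout error itself, so we decided not to use it.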
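To make the idea of content-derived identifiers concrete, here is a minimal Python sketch. It is not the plugin's actual implementation: the helper names `content_id` and `send_chunk`, and the dictionaries standing in for an index, are assumptions made for illustration. It only shows why a resent chunk creates extra documents when identifiers are generated fresh on each attempt, and why it does not when the identifier is a hash of the record.

```python
import hashlib
import json
import uuid


def content_id(record: dict) -> str:
    """Deterministic ID: the same record always hashes to the same value."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def send_chunk(index: dict, chunk: list, make_id) -> None:
    """Toy stand-in for a bulk request: store each record under make_id(record)."""
    for record in chunk:
        index[make_id(record)] = record


chunk = [{"log": "GET / HTTP/1.1 200", "tag": "custom-log"}]

# Fresh IDs on every attempt: resending the chunk creates a second document.
random_ids = {}
send_chunk(random_ids, chunk, lambda _record: uuid.uuid4().hex)
send_chunk(random_ids, chunk, lambda _record: uuid.uuid4().hex)  # retry

# Content-derived IDs: the retry overwrites the same document.
hashed_ids = {}
send_chunk(hashed_ids, chunk, content_id)
send_chunk(hashed_ids, chunk, content_id)  # retry

print(len(random_ids), len(hashed_ids))  # prints "2 1"
```

One caveat of this sketch: two genuinely distinct log lines with identical fields would also collapse into one document unless the record carries something that distinguishes them, such as a precise timestamp.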



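We use a buffered plugin at the Fluentd output so that logs are not lost. If for some reason ElasticSearch cannot write a document to the index immediately, the document goes into a queue stored on disk. So, in our case, eliminating the source of the error described above means setting buffering parameters under which the Fluentd output buffer is large enough and can still be flushed within the allotted time.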



, , , , : , , . , , , , , Fluentd .



At the time the problem was recorded, the buffer configuration looked like this:



<buffer>
  @type file
  path /var/log/fluentd-buffers/kubernetes.test.buffer
  flush_mode interval
  retry_type exponential_backoff
  flush_thread_count 2
  flush_interval 5s
  retry_forever
  retry_max_interval 30
  chunk_limit_size 8M
  queue_limit_length 8
  overflow_action block
</buffer>


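While solving the problem, we tuned the following parameters:

  • chunk_limit_size: the size of the chunks into which buffered messages are split.
  • flush_interval: the interval at which the buffer is flushed.
  • queue_limit_length: the maximum number of chunks in the queue.
  • request_timeout: the timeout for requests that Fluentd sends to ElasticSearch.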


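The total buffer size can be calculated by multiplying queue_limit_length by chunk_limit_size, which can be read as "the maximum number of chunks in the queue, each of a given size". If the buffer is too small, the following warning appears in the logs: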



2020-01-21 10:22:57 +0000 [warn]: [test-prod] failed to write data into buffer by buffer overflow action=:block
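As a worked example of the product of queue_limit_length and chunk_limit_size, the Python sketch below computes the buffer capacity for the values in the configuration shown earlier. The helper names `parse_size` and `total_buffer_bytes` are hypothetical; the only outside fact relied on is that Fluentd size values use binary suffixes (k, m, g, t as powers of 1024).

```python
# Multipliers for Fluentd-style size suffixes (binary units).
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(value: str) -> int:
    """Convert a size such as '8M' or '512k' to bytes; bare numbers are bytes."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def total_buffer_bytes(chunk_limit_size: str, queue_limit_length: int) -> int:
    """Upper bound on queued data: number of chunks times the size of each."""
    return parse_size(chunk_limit_size) * queue_limit_length


total = total_buffer_bytes("8M", 8)
print(total // 1024**2, "MiB")  # prints "64 MiB"
```

With chunk_limit_size 8M and queue_limit_length 8, the queue can hold at most 64 MiB before overflow_action takes effect.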


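It means that the buffer is not flushed quickly enough, and data arriving while the buffer is full is blocked.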



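The buffer can be enlarged in two ways: by increasing either the size of each chunk in the queue or the number of chunks the queue can hold.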



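If chunk_limit_size is set above 32 MB, ElasticSearch will not accept the chunk, because the incoming request becomes too large. So if the buffer needs to grow further, it is better to increase the maximum queue length, queue_limit_length.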



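Once the buffer stops overflowing and only the timeout message remains, you can start increasing request_timeout. However, if it is set above 20 seconds, warnings like the following begin to appear in the Fluentd logs: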



2020-01-21 09:55:33 +0000 [warn]: [test-dev] buffer flush took longer time than slow_flush_log_threshold: elapsed_time=20.85753920301795 slow_flush_log_threshold=20.0 plugin_id="postgresql-dev" 


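This message does not affect the system in any way; it means that flushing the buffer took longer than the slow_flush_log_threshold parameter allows. It is debugging information, and we use it to choose the value of request_timeout.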



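The general selection procedure is as follows:



  1. Set request_timeout to a value that is certainly larger than needed (hundreds of seconds). During tuning, the main sign that it is high enough is that the timeout warnings disappear.
  2. Wait for messages about exceeding slow_flush_log_threshold. The elapsed_time field of the warning shows the actual time the buffer flush took.
  3. Set request_timeout higher than the largest elapsed_time seen during the observation period. We calculate request_timeout as elapsed_time + 50%.
  4. To remove the slow-flush warnings from the log, raise slow_flush_log_threshold. We calculate this value as elapsed_time + 25%.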
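The arithmetic in steps 3 and 4 can be sketched in Python as follows. This is an illustrative helper, not part of Fluentd: the function names are hypothetical, rounding up to whole seconds is an added assumption, and the second sample line is invented to show that the largest elapsed_time wins.

```python
import math
import re

# Matches the elapsed_time field of a slow-flush warning.
ELAPSED_RE = re.compile(r"elapsed_time=(\d+(?:\.\d+)?)")


def max_elapsed(log_lines) -> float:
    """Largest elapsed_time (in seconds) found in the given log lines."""
    values = [
        float(match.group(1))
        for line in log_lines
        for match in ELAPSED_RE.finditer(line)
    ]
    if not values:
        raise ValueError("no elapsed_time values found")
    return max(values)


def suggest_timeouts(log_lines) -> dict:
    """Apply the +50% / +25% margins, rounded up to whole seconds."""
    peak = max_elapsed(log_lines)
    return {
        "request_timeout": math.ceil(peak * 1.5),
        "slow_flush_log_threshold": math.ceil(peak * 1.25),
    }


sample = [
    # Taken from the warning quoted above.
    "buffer flush took longer time than slow_flush_log_threshold: "
    "elapsed_time=20.85753920301795 slow_flush_log_threshold=20.0",
    # Hypothetical second observation.
    "buffer flush took longer time than slow_flush_log_threshold: "
    "elapsed_time=20.1 slow_flush_log_threshold=20.0",
]

print(suggest_timeouts(sample))
# prints {'request_timeout': 32, 'slow_flush_log_threshold': 27}
```

The longer the observation period that feeds such a calculation, the more likely the peak reflects real load rather than a quiet hour.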


, , . , , .



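The table below shows how the number of errors that lead to duplicate messages changed as the parameters described above were tuned (each cell is before/after):



                             node-1    node-2    node-3    node-4
failed to flush the buffer   1749/2    694/2     47/0      1121/2
retry succeeded              410/2     205/1     24/0      241/2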
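Counts like these can be gathered by scanning each node's Fluentd log for the two messages. The article does not say how its numbers were collected, so the Python sketch below is only one possible way; the function name `count_warnings` is hypothetical, and the sample lines are shortened from the logs quoted earlier.

```python
from collections import Counter

# The two messages tracked in the table.
MARKERS = ("failed to flush the buffer", "retry succeeded")


def count_warnings(log_lines) -> Counter:
    """Count how many lines contain each tracked message."""
    counts = Counter({marker: 0 for marker in MARKERS})
    for line in log_lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


sample = [
    "2020-01-16 01:46:46 +0000 [warn]: [test-prod] failed to flush the buffer. retry_time=4",
    "2020-01-16 01:47:05 +0000 [warn]: [test-prod] retry succeeded.",
    "2020-01-16 01:47:05 +0000 [warn]: [test-dev] retry succeeded.",
]

counts = count_warnings(sample)
print(counts["failed to flush the buffer"], counts["retry succeeded"])  # prints "1 2"
```

Running the same count before and after a configuration change gives a before/after pair for each node.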


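Note that these settings can become outdated as the project grows and the volume of logs increases. The main sign that the timeout is no longer sufficient is the return of slow-flush messages in the Fluentd log, that is, exceeding slow_flush_log_threshold. From that point there is still a small margin before request_timeout is exceeded, so it is important to react to these messages in time and repeat the tuning procedure described above.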





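Fine-tuning the Fluentd output buffer is one of the main steps in configuring the EFK stack, and it determines how stable the stack is. With the procedure described here, you can be confident that all logs are written to the ElasticSearch index without repeats or losses.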


