Tags: docker, ubuntu, docker-compose, telegraf, influxdb-2

Telegraf is sending incorrect data to influxdb2?


I'm new to telegraf / influxdb2 :)

I've made a quick Docker setup to monitor a cloud Ubuntu VM. To do so, I've created a "default-like" docker-compose file with Telegraf, InfluxDB 2 and Grafana. It works just fine on my Ubuntu laptop, but when I run it on the Ubuntu cloud VM, I get very strange numbers for my metrics.

For example:

What is very strange to me is that the docker input is working fine :/

This is driving me crazy :) Has someone already encountered the same kind of thing? And again, it works just fine on my laptop :)
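To show the difference, I can compare what the VM itself reports with what Telegraf actually wrote to InfluxDB. This is only a rough sketch, reusing the org, bucket and token from the compose file below and running the influx CLI inside the influxdb container:

# Total memory as the VM itself sees it (bytes)
free -b | awk '/^Mem:/ {print $2}'

# Last "mem total" value that Telegraf reported to InfluxDB
docker exec influxdb influx query --org myorg --token my-token \
  'from(bucket: "mybucket")
     |> range(start: -15m)
     |> filter(fn: (r) => r._measurement == "mem" and r._field == "total")
     |> last()'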

docker-compose.yml

version: '3.1'
services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    restart: unless-stopped
    depends_on:
      - telegraf
    volumes:
      - ./grafana/provisioning/:/etc/grafana/provisioning/
      - ./grafana/dashboards/:/var/lib/grafana/dashboards/
      - ./grafana/grafana.ini:/etc/grafana/grafana.ini
    ports:
      - 3000:3000
  influxdb:
    image: influxdb:2.5.1
    container_name: influxdb
    restart: unless-stopped
    ports:
      - 8086:8086
    environment:
      - DOCKER_INFLUXDB_INIT_USERNAME=xxxx
      - DOCKER_INFLUXDB_INIT_PASSWORD=yyyyy
      - DOCKER_INFLUXDB_INIT_ORG=myorg
      - DOCKER_INFLUXDB_INIT_BUCKET=mybucket
      - DOCKER_INFLUXDB_INIT_RETENTION=3w
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=my-token
    volumes:
      - data-influx:/var/lib/influxdb2 
  telegraf:
    image: telegraf:1.24.3-alpine
    container_name: telegraf
    restart: unless-stopped
    depends_on:
      - influxdb
    volumes:
      - ./telegraf/etc/telegraf.conf:/etc/telegraf/telegraf.conf:ro
      - /var/run/docker.sock:/var/run/docker.sock
      - /sys:/rootfs/sys:ro
      - /proc:/rootfs/proc:ro
      - /etc:/rootfs/etc:ro
    user: telegraf:999
      
volumes:
  data-influx:

telegraf.conf

[global_tags]

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  flush_buffer_when_full = true
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  debug = false
  quiet = false
  hostname = "LoubVM"

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "my-token"
  organization = "myorg"
  bucket = "mybucket"

[[inputs.statsd]]
  protocol = "udp"
  max_tcp_connections = 250
  tcp_keep_alive = false
  service_address = ":8125"
  delete_gauges = true
  delete_counters = true
  delete_sets = true
  delete_timings = true
  percentiles = [90]
  metric_separator = "_"
  parse_data_dog_tags = false
  allowed_pending_messages = 10000
  percentile_limit = 1000

[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.disk]]
  mount_points = ["/"]

[[inputs.diskio]]

[[inputs.kernel]]

[[inputs.mem]]

[[inputs.processes]]

[[inputs.swap]]

[[inputs.system]]

[[inputs.net]]

[[inputs.netstat]]

[[inputs.interrupts]]

[[inputs.linux_sysctl_fs]]

[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  gather_services = false
  source_tag = false
  container_name_include = []
  container_name_exclude = []
  timeout = "5s"
  total = false
  docker_label_include = []
  docker_label_exclude = []


Solution

  • According to my cloud provider, it's because my Ubuntu VM is a VPS (Virtual Private Server): Telegraf catches some of the hypervisor's metrics, which ends up as wrong data in InfluxDB/Grafana.

    My workaround is to create new metrics with custom scripts, scheduled by cron (a sample crontab entry follows the script), that send the metrics to statsd (Telegraf's statsd input plugin, which you can use to push data into Telegraf). My monitoring script looks like this:

    #!/bin/bash

    # Push one value to Telegraf's statsd input (UDP port 8125, see telegraf.conf).
    # The payload uses the statsd wire format "name:value|c" (a counter).
    function SendToStatsd {
        measurement=$1
        value=$2
        echo "${measurement}:${value}|c" | nc -w10 -u 127.0.0.1 8125
    }

    ## Uptime # Threshold in grafana 2419200 (4 weeks)
    myUptime=$(awk '{print $1}' /proc/uptime)
    SendToStatsd myUptime "$myUptime"

    ## Swap
    mySwap=$(free | grep Swap)
    mySwapTotal=$(echo "$mySwap" | awk '{print $2}')
    SendToStatsd mySwapTotal "$mySwapTotal"

    # etc etc ...
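
    Scheduling is then just a cron entry. A minimal crontab sketch, assuming the script above is saved as /usr/local/bin/vm-metrics.sh (placeholder path) and run once a minute:

    # m h dom mon dow  command
    * * * * * /usr/local/bin/vm-metrics.sh >/dev/null 2>&1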