docker grafana docker-swarm grafana-loki portainer

Grafana cannot retrieve logs (and labels) from Loki

I have a Docker Swarm environment with several nodes, one of them called gateway. The Docker plugin grafana/loki-docker-driver:2.9.1 is installed on all nodes instead of Promtail. I used the Portainer monitoring template to deploy a stack that includes Grafana 9.5.2; I'll call it stack A for simplicity. I also have stack B with Loki 2.9.1 and mingrammer/flog as a log producer to check that everything works. Stack A and stack B use the same Docker network, net. The placement constraints are the same, so both stacks run on the same node, gateway:

- node.role == manager
- node.labels.monitoring == true

This is stack B:

version: '3.8'

x-logging: &logging
  logging:
    driver: loki
    options:
      loki-url: "http://host.docker.internal:3100/loki/api/v1/push"

services:
  loki:
    image: grafana/loki:2.9.1
    <<: *logging
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role == manager
          - node.labels.monitoring == true
    ports:
      - target: 3100
        published: 3100
        protocol: tcp
        mode: ingress
    volumes:
      - loki-data:/loki
    configs:
      - source: loki_config
        target: /etc/loki/local-config.yaml
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - net
  
  log-generator:
    image: mingrammer/flog
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role == manager
          - node.labels.monitoring == true
    command:
      - --loop
      - --format=json
      - --number=10 # number of log lines to generate per second
      - --delay=100ms # delay between log lines
      - --output=/var/log/generated-logs.txt
      - --overwrite
      - --type=log
    volumes:
      - loki-data:/var/log/
    networks:
      - net

volumes:
  loki-data:

networks:
  net:
    name: monitoring_net
    external: true

configs:
  loki_config:
    external: true

And this is the config for Loki:

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

frontend:
    address: 0.0.0.0

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 48h

storage_config:
  boltdb:
    directory: /loki/index
  filesystem:
    directory: /loki/chunks

This is stack A:

version: "3.8"

services:
  grafana:
    image: portainer/template-swarm-monitoring:grafana-9.5.2
    ports:
      - target: 3000
        published: 3000
        protocol: tcp
        mode: ingress
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role == manager
          - node.labels.monitoring == true
    volumes:
      - type: volume
        source: grafana-data
        target: /var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=${GRAFANA_USER}
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
      - GF_USERS_ALLOW_SIGN_UP=false
    networks:
      - net    
...

volumes:
  grafana-data:
  prometheus-data:

networks:
  net:
    driver: overlay

configs:
  prometheus_conf:
    external:
      name: prometheus_config

In the Grafana UI I tried to add a new Loki connection, and at first it kept saying:

Unable to fetch labels from Loki (Failed to call resource), please check the server logs for more details

Then I realized that the address was wrong and set it to http://loki:3100. Now it says:

Data source connected, but no labels received. Verify that Loki and Promtail is configured properly.

Logs from Grafana on this request:

logger=context userId=1 orgId=1 uname=admin t=2023-09-27T09:46:06.21413999Z level=error msg="Datasource has already been updated by someone else. Please reload and try again" error="trying to update old version of datasource" remote_addr=10.0.0.2 traceID=
logger=context userId=1 orgId=1 uname=admin t=2023-09-27T09:46:06.21431287Z level=info msg="Request Completed" method=PUT path=/api/datasources/uid/a4c05563-ea62-4f76-b2a7-b6f964314b88 status=409 remote_addr=10.0.0.2 time_ms=1 duration=1.141585ms size=107 referer=http://<dns>:3000/connections/your-connections/datasources/edit/a4c05563-ea62-4f76-b2a7-b6f964314b88 handler=/api/datasources/uid/:uid
logger=context userId=1 orgId=1 uname=admin t=2023-09-27T09:53:33.342223169Z level=info msg="Request Completed" method=GET path=/api/live/ws status=-1 remote_addr=10.0.0.2 time_ms=0 duration=746.46µs size=0 referer= handler=/api/live/ws

At that moment Loki doesn't show any log entries that could relate to the request, only "save checkpoint wal".

I have no idea what the problem is. I've been stuck on this for about 4 days in a row. Could you suggest a solution or any ideas for getting past it?

Solution

There could be several problems:

1. mingrammer/flog writes logs to a file instead of stdout

This means the Loki Docker driver gets nothing to pass to Loki, because a logging driver only sees what a container writes to stdout/stderr. In this case, just change the line

- --output=/var/log/generated-logs.txt

to

- --output=/dev/stdout

2. Docker plugins can work only(!) in bridge and host network modes

Hence, the driver simply cannot resolve a host named loki (or any other service name or alias that exists only inside a Docker network).

3. host.docker.internal depends on the OS

Also, the URL http://host.docker.internal:3100/loki/api/v1/push is wrong if you deploy on Linux: host.docker.internal is available out of the box only on Windows and Mac (Docker Desktop). On Linux you have to add it manually, e.g. as described here: https://stackoverflow.com/a/67158212/7502538.

As a workaround, you can use a static IP address (added manually to the hosts file):

loki-url: http://172.0.0.15:3100/loki/api/v1/push

Or simply deploy a proxy server such as nginx and set up redirection. Finally (and this is what works for my project), set something like:

loki-url: http://loki-example-dns.com/loki/api/v1/push

In other words, the Loki Docker driver must be able to look up the Loki service from the host network the way you normally would (e.g. with the ping command).
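
Putting points 1 to 3 together, the logging section and log generator of stack B might look roughly like the sketch below. It assumes the loki driver is not already the daemon-wide default, so the logging anchor is attached to log-generator as well (the stack in the question attaches it only to loki). The hostname is the placeholder used above; replace it with a name or IP that the Docker host itself can resolve.

```yaml
# Sketch only: fragment of stack B; other keys (deploy, networks, ...) are omitted.
x-logging: &logging
  logging:
    driver: loki
    options:
      # Placeholder address: must be resolvable from the host, not from the overlay network.
      loki-url: "http://loki-example-dns.com/loki/api/v1/push"

services:
  log-generator:
    image: mingrammer/flog
    # Assumption: attach the driver explicitly; not needed if loki is the default log driver.
    <<: *logging
    command:
      - --loop
      - --format=json
      - --number=10
      - --delay=100ms
      - --output=/dev/stdout  # write to stdout so the logging driver can pick the lines up
      - --overwrite
      - --type=log
```

flog's README also lists stdout as an output type (its default), so dropping --output, --overwrite and --type may work as well; that variant is untested here.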

4. Wrong access rights

For this configuration it's fine to leave Loki publicly accessible. But if something goes wrong, make sure that either no authentication/authorization is enabled or that you are sending the right credentials. I ran into this in my project, where Basic Auth was handled by nginx.
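
If the proxy in front of Loki enforces Basic Auth, the driver needs the credentials too. One way to pass them, shown here as a sketch with a placeholder user, password and hostname, is to embed them in loki-url:

```yaml
# Sketch only: <user>, <password> and the hostname are placeholders.
x-logging: &logging
  logging:
    driver: loki
    options:
      # Replace <user> and <password> with the Basic Auth credentials the proxy expects.
      loki-url: "http://<user>:<password>@loki-example-dns.com/loki/api/v1/push"
```

Keep in mind that credentials written this way end up in the stack file and in the container's inspect output, so HTTPS and a dedicated low-privilege user are advisable.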

Setting up Grafana

In Grafana Data Sources, if you want to add Loki and both services are attached to the same Docker network (in swarm mode, as in the question), then, and only then, can you use the service alias:

http://loki:3100/
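
The same data source can also be provisioned from a file instead of the UI. This is a minimal sketch, assuming the image keeps Grafana's default provisioning path, that the file is delivered there (for example via a Swarm config), and that Grafana and Loki share the overlay network:

```yaml
# Grafana data source provisioning file, e.g. /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy          # Grafana's backend makes the requests, so the swarm alias resolves
    url: http://loki:3100  # service name from stack B, reachable only on the shared network
```

The file name loki.yaml is only an illustration; Grafana reads the YAML files it finds in that directory at startup.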

Additions

Several links that helped me figure this out:

Lastly, this command was really helpful for analyzing logs: sudo journalctl -u docker.service -f