I have a Docker Swarm environment with several nodes, one of which is called gateway. The Docker plugin grafana/loki-docker-driver:2.9.1 is installed on all nodes instead of Promtail. I used the Portainer monitoring template to deploy a stack that includes Grafana 9.5.2; I'll call it stack A for simplicity. I also have stack B with Loki 2.9.1 and mingrammer/flog as a log producer to check that everything works. Stack A and stack B use the same net Docker network. The placement constraints are the same, so both stacks run on the same node, gateway:
- node.role == manager
- node.labels.monitoring == true
version: '3.8'
x-logging: &logging
  logging:
    driver: loki
    options:
      loki-url: "http://host.docker.internal:3100/loki/api/v1/push"
services:
  loki:
    image: grafana/loki:2.9.1
    <<: *logging
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role == manager
          - node.labels.monitoring == true
    ports:
      - target: 3100
        published: 3100
        protocol: tcp
        mode: ingress
    volumes:
      - loki-data:/loki
    configs:
      - source: loki_config
        target: /etc/loki/local-config.yaml
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - net
  log-generator:
    image: mingrammer/flog
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role == manager
          - node.labels.monitoring == true
    command:
      - --loop
      - --format=json
      - --number=10 # number of log lines to generate per second
      - --delay=100ms # delay between log lines
      - --output=/var/log/generated-logs.txt
      - --overwrite
      - --type=log
    volumes:
      - loki-data:/var/log/
    networks:
      - net
volumes:
  loki-data:
networks:
  net:
    name: monitoring_net
    external: true
configs:
  loki_config:
    external: true
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
frontend:
  address: 0.0.0.0
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 48h
storage_config:
  boltdb:
    directory: /loki/index
  filesystem:
    directory: /loki/chunks
version: "3.8"
services:
  grafana:
    image: portainer/template-swarm-monitoring:grafana-9.5.2
    ports:
      - target: 3000
        published: 3000
        protocol: tcp
        mode: ingress
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role == manager
          - node.labels.monitoring == true
    volumes:
      - type: volume
        source: grafana-data
        target: /var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=${GRAFANA_USER}
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
      - GF_USERS_ALLOW_SIGN_UP=false
    networks:
      - net
  ...
volumes:
  grafana-data:
  prometheus-data:
networks:
  net:
    driver: overlay
configs:
  prometheus_conf:
    external:
      name: prometheus_config
In the Grafana UI I tried to add a new connection, and at first it kept reporting:
Unable to fetch labels from Loki (Failed to call resource), please check the server logs for more details
Then I realized that the address was wrong and set it to http://loki:3100, and now it says:
Data source connected, but no labels received. Verify that Loki and Promtail is configured properly.
Logs from Grafana on this request:
logger=context userId=1 orgId=1 uname=admin t=2023-09-27T09:46:06.21413999Z level=error msg="Datasource has already been updated by someone else. Please reload and try again" error="trying to update old version of datasource" remote_addr=10.0.0.2 traceID=
logger=context userId=1 orgId=1 uname=admin t=2023-09-27T09:46:06.21431287Z level=info msg="Request Completed" method=PUT path=/api/datasources/uid/a4c05563-ea62-4f76-b2a7-b6f964314b88 status=409 remote_addr=10.0.0.2 time_ms=1 duration=1.141585ms size=107 referer=http://<dns>:3000/connections/your-connections/datasources/edit/a4c05563-ea62-4f76-b2a7-b6f964314b88 handler=/api/datasources/uid/:uid
logger=context userId=1 orgId=1 uname=admin t=2023-09-27T09:53:33.342223169Z level=info msg="Request Completed" method=GET path=/api/live/ws status=-1 remote_addr=10.0.0.2 time_ms=0 duration=746.46µs size=0 referer= handler=/api/live/ws
At that moment Loki doesn't show any logs that could relate to the request, only "save checkpoint wal".
I have no idea what the problem is here. I have been stuck on this for around 4 days in a row. Could you provide a solution or any ideas to overcome it?
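So, there could be several problems:
flog was writing its output to a file instead of stdout. A Docker logging driver only sees what a container writes to stdout and stderr, which means that docker-loki-driver couldn't get any info to pass to Loki. So, in this case just change the line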
- --output=/var/log/generated-logs.txt
to
- --output=/dev/stdout
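Put together, the log producer could look roughly like the sketch below. It is only an illustration under a few assumptions: the logging driver is set directly on the service (the original log-generator service does not set one, so it may have relied on a daemon-wide default), the loki-url is the example address used further down in this answer, and the volume is dropped because nothing is written to a file anymore.

```yaml
services:
  log-generator:
    image: mingrammer/flog
    # Assumption: set the loki driver explicitly on this service.
    logging:
      driver: loki
      options:
        # Example address from this answer; it must resolve from the host.
        loki-url: "http://loki-example-dns.com/loki/api/v1/push"
    command:
      - --loop
      - --format=json
      - --number=10
      - --delay=100ms
      - --output=/dev/stdout   # write to stdout so the driver can pick it up
      - --overwrite
      - --type=log
    networks:
      - net
```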
The logging driver runs on the host, not inside the stack's Docker network. Hence, it simply couldn't find any host with the name loki (or any other Docker network name or alias).
Also, the http://host.docker.internal:3100/loki/api/v1/push line is wrong if you deploy on Linux, because host.docker.internal is a Windows- and Mac-specific feature. On Linux you have to add it manually, e.g. as mentioned here: https://stackoverflow.com/a/67158212/7502538.
Thus, as a workaround you can use a static IP address (added manually to the hosts file):
loki-url: http://172.0.0.15:3100/loki/api/v1/push
Or simply deploy a proxy server such as nginx and set up redirection. Finally (and this is what works for my project), set something like:
loki-url: http://loki-example-dns.com/loki/api/v1/push
That is, docker-loki-driver must be able to look up the Loki service from the host network, the same way you would normally check it yourself (e.g. with the ping command).
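As a rough sketch, the shared logging anchor could then look like this. The address is the example one from above. The extra options are ones I recall from the driver's documentation for limiting how long the driver keeps retrying when Loki is unreachable; the values are illustrative assumptions, not something tuned for this project.

```yaml
x-logging: &logging
  logging:
    driver: loki
    options:
      # Must be resolvable and reachable from the Docker host itself.
      loki-url: "http://loki-example-dns.com/loki/api/v1/push"
      # Illustrative values (assumptions): fail fast if Loki is unreachable.
      loki-retries: "2"
      loki-max-backoff: "800ms"
      loki-timeout: "1s"

services:
  log-generator:
    image: mingrammer/flog
    <<: *logging
```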
For this configuration it's acceptable to leave Loki publicly accessible. But if something goes wrong, make sure that you either haven't enabled any kind of authentication/authorization or are sending the right credentials. I ran into this during this project, where Basic Auth was handled by nginx.
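If a proxy in front of Loki enforces Basic Auth, the credentials can be embedded in the push URL, as far as I know from the driver documentation. A minimal sketch; `user` and `secret` are hypothetical placeholders, and the host is the example address from above:

```yaml
x-logging: &logging
  logging:
    driver: loki
    options:
      # Hypothetical credentials checked by the nginx proxy in front of Loki.
      loki-url: "http://user:secret@loki-example-dns.com/loki/api/v1/push"
```

Keep in mind that anyone who can read the stack file or inspect the container can see these credentials.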
In Grafana's Data Sources, if you want to add Loki and both services are in the same Docker network (in swarm mode, as in the question), then and only then can you use the service alias:
http://loki:3100/
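The same data source can also be provisioned from a file instead of the UI. A minimal sketch, assuming Grafana and Loki share the overlay network so that the loki alias resolves; the data source name and the file location (for example under /etc/grafana/provisioning/datasources/) are assumptions and not part of the original setup:

```yaml
apiVersion: 1
datasources:
  - name: Loki              # hypothetical display name
    type: loki
    access: proxy           # Grafana's backend calls Loki, so the alias resolves
    url: http://loki:3100
```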
Lastly, this command was really helpful for analyzing the Docker daemon logs:
sudo journalctl -u docker.service -f