amazon-web-services, amazon-s3, amazon-ec2, prometheus, thanos

Thanos for Prometheus long-term storage doesn't store anything in the S3 bucket


I'm building a monitoring infrastructure and trying to deal with how much storage Prometheus needs. I want to use Thanos to store all the data in an S3 bucket. (Everything is dockerized.)

Here's my docker-compose:

version: '3.9'

services:

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: on-failure:5
    network_mode: host
    volumes:
      - ./prometheus/:/etc/prometheus/
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=1d"
    user: "0"

  thanos-sidecar:
    image: thanosio/thanos:main-2024-02-12-f28680c
    container_name: thanos-sidecar
    restart: unless-stopped
    depends_on: 
      - prometheus
    extra_hosts:
      - "host.docker.internal:host-gateway"
    command:
      - "sidecar"
      - "--http-address=0.0.0.0:19090"
      - "--grpc-address=0.0.0.0:19190"
      - "--prometheus.url=http://host.docker.internal:9090"

  thanos-store:
    image: thanosio/thanos:main-2024-02-12-f28680c
    container_name: thanos-store
    restart: unless-stopped
    depends_on: 
      - prometheus
    expose:
      - "10901"
    command:
      - "store"
      - "--grpc-address=0.0.0.0:10901"
      - "--data-dir=/thanos-data"
      - "--objstore.config-file=/etc/thanos/s3.yaml"
      - "--sync-block-duration=0.5m"
    volumes:
      - ./thanos:/etc/thanos
      - ./thanos/thanos-data:/thanos-data

  thanos-query:
    image: thanosio/thanos:main-2024-02-12-f28680c
    container_name: thanos-query
    restart: unless-stopped
    ports: 
      - "10902:10902"
    expose: 
      - "10902"
    depends_on: 
      - prometheus
    command:
      - "query"
      - "--http-address=0.0.0.0:10902"
      - "--grpc-address=0.0.0.0:10901"
      - "--store=thanos-store:10901"

And here's my s3.yaml file:

type: S3
config:
  bucket: prometheus-storaging
  endpoint: s3.eu-west-1.amazonaws.com 
  access_key: <user-access-key>
  secret_key: <user-secret-key>
  insecure: false
  region: eu-west-1

My user has permission to get, put, list, and delete objects in my bucket, prometheus-storaging.

I've inspected all three Thanos containers and found nothing broken. I've also changed --sync-block-duration from 4h to 0.5m to make syncing more frequent, and the logs look fine:

ts=2024-02-12T14:58:47.108623799Z caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=26.465216ms duration_ms=26 cached=0 returned=0 partial=0
ts=2024-02-12T14:59:17.110948605Z caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=28.802605ms duration_ms=28 cached=0 returned=0 partial=0
ts=2024-02-12T14:59:47.109934224Z caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=27.379893ms duration_ms=27 cached=0 returned=0 partial=0
ts=2024-02-12T15:00:17.113818181Z caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=31.517016ms duration_ms=31 cached=0 returned=0 partial=0
ts=2024-02-12T15:00:47.11444593Z caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=31.438836ms duration_ms=31 cached=0 returned=0 partial=0
ts=2024-02-12T15:01:17.113290234Z caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=31.118912ms duration_ms=31 cached=0 returned=0 partial=0
ts=2024-02-12T15:01:47.111825861Z caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=29.215559ms duration_ms=29 cached=0 returned=0 partial=0
ts=2024-02-12T15:02:17.112276432Z caller=fetcher.go:557 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=29.431009ms duration_ms=29 cached=0 returned=0 partial=0

Any suggestions?


Solution

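  • The Prometheus server and the Thanos sidecar should share the same data volume. The store gateway only reads blocks that are already in the bucket, which is why its logs report returned=0; uploading is the sidecar's job. To upload, the sidecar must be able to read the Prometheus TSDB directory (--tsdb.path) and must be given the bucket configuration (--objstore.config-file or --objstore.config). The sidecar in the question has neither, so nothing is ever uploaded.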

    Something like:

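    version: '3.9'
    services:
      prometheus:
        image: prom/prometheus:latest
        container_name: prometheus
        user: root
        volumes:
          - ./prometheus:/etc/config/
          # TSDB directory, shared with the sidecar below
          - ./data/prometheus:/data
        command:
          - '--config.file=/etc/config/prometheus.yml'
          - '--storage.tsdb.path=/data'
          - '--storage.tsdb.retention.time=2h'
          - '--web.enable-lifecycle'
          - '--web.enable-admin-api'
          # Equal min/max block durations disable local compaction;
          # 5m is short so that blocks show up quickly while testing
          - '--storage.tsdb.min-block-duration=5m'
          - '--storage.tsdb.max-block-duration=5m'
          # [... other options ...]
        restart: unless-stopped
        expose:
          - 9001
        ports:
          - "9001:9001"

      thanos_sidecar:
        image: thanosio/thanos:main-2024-02-12-f28680c
        container_name: thanos_sidecar
        volumes:
          - ./prometheus:/etc/config/
          # Same host directory as the Prometheus TSDB path
          - ./data/prometheus:/data
        command:
          - "sidecar"
          - "--log.level=debug"
          - "--tsdb.path=/data"
          # 9001 assumes Prometheus is told to listen there in the
          # elided options (its default port is 9090)
          - "--prometheus.url=http://prometheus:9001"
          # The bucket config (--objstore.config-file or
          # --objstore.config) belongs with the options elided here
          # [... other options ...]
        expose:
          - 10902
          - 10901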
    

    Full example: https://github.com/thanos-community/thanos-docker-compose/tree/master
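
Applied to the compose file from the question, the change might look roughly like the sketch below. It is untested and rests on a few assumptions: the host directory ./data/prometheus is a hypothetical path, the 2h block durations are the values usually recommended for Thanos rather than anything taken from the question, and s3.yaml is the same file already mounted for thanos-store. The thanos-store and thanos-query services stay as they are.

```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: on-failure:5
    network_mode: host
    user: "0"
    volumes:
      - ./prometheus/:/etc/prometheus/
      # Hypothetical host path: the TSDB directory shared with the sidecar
      - ./data/prometheus:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=1d"
      # Assumption: equal min/max durations so Prometheus does not
      # compact blocks locally before the sidecar uploads them
      - "--storage.tsdb.min-block-duration=2h"
      - "--storage.tsdb.max-block-duration=2h"

  thanos-sidecar:
    image: thanosio/thanos:main-2024-02-12-f28680c
    container_name: thanos-sidecar
    restart: unless-stopped
    depends_on:
      - prometheus
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      # Same host directory as the Prometheus TSDB path above
      - ./data/prometheus:/prometheus
      # Reuses the s3.yaml that thanos-store already reads
      - ./thanos:/etc/thanos
    command:
      - "sidecar"
      - "--http-address=0.0.0.0:19090"
      - "--grpc-address=0.0.0.0:19190"
      - "--prometheus.url=http://host.docker.internal:9090"
      # Where the sidecar looks for finished blocks
      - "--tsdb.path=/prometheus"
      # Without a bucket config the sidecar does not upload anything
      - "--objstore.config-file=/etc/thanos/s3.yaml"
```

With 2h blocks, the first upload can take a couple of hours to appear; the 5m values from the example above shorten that while testing. Once a block is in the bucket, the "successfully synchronized block metadata" line from thanos-store should report a non-zero returned count. The sidecar also expects Prometheus to have external_labels set under global in prometheus.yml; that file is not shown in the question, so it is worth checking.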