Tags: postgresql, docker, greenplum

Greenplum Database running in Docker consumes excessive disk space


I have a Greenplum database instance running in Docker. The tables and indexes hold very little data (approx. 550 MB). I checked the size of all tables using the query below:

SELECT *, pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS INDEX
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS TABLE
  FROM (
  -- table_bytes: what remains after removing index and TOAST storage
  SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
      SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              -- subtract TOAST here as well, so it is not counted as index space
              , pg_total_relation_size(c.oid) - pg_relation_size(c.oid)
                  - COALESCE(pg_total_relation_size(reltoastrelid), 0) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r'
  ) a
) a
ORDER BY total_bytes DESC;

The Docker image is 4.7 GB, so I would expect the container to use approximately 4.7 + 0.5 = 5.2 GB. However, the container consumes 13 GB of disk space.

Disk usage inside the container is shown below:

[gpadmin@mdw ~]$ df -h
Filesystem                           Size  Used Avail Use% Mounted on
overlay                               17G   13G  4.7G  73% /
tmpfs                                2.0G     0  2.0G   0% /dev
tmpfs                                2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/mapper/centos_greenplum01-root   17G   13G  4.7G  73% /etc/hosts
shm                                   64M     0   64M   0% /dev/shm
tmpfs                                2.0G     0  2.0G   0% /proc/acpi
tmpfs                                2.0G     0  2.0G   0% /proc/scsi
tmpfs                                2.0G     0  2.0G   0% /sys/firmware

The host machine and the Docker container both run CentOS.

As part of testing my application, I stop and start the Docker container multiple times throughout the day.


Solution

  • Debug steps to determine whether the root cause was Docker or Greenplum.

    Log in to the container and list the size of each top-level directory:

    cd /
    du -schk *

    Repeat the command inside the largest directory at each level until the space is accounted for.

    The cause of the issue was huge log files in /data/primary/gpseg1/pg_log.

    I removed all logs older than 2 days.
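
The two manual steps above (finding the largest directories, then deleting old logs) could be scripted roughly as follows. This is only a sketch: the function names largest_entries and purge_old_logs are made up for illustration, the *.csv pattern assumes the segment logs are the usual gpdb-*.csv files, and -delete relies on GNU or BSD find. Check what the -print output lists before trusting it on a real data directory.

```shell
#!/bin/sh

# largest_entries DIR [COUNT]
# Hypothetical helper: list the COUNT largest entries directly under DIR,
# sizes in KB, biggest first. Hidden entries are not matched by the glob.
largest_entries() {
    dir=${1:-/}
    count=${2:-10}
    # -x keeps du from descending into other filesystems mounted below an entry
    du -xsk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# purge_old_logs LOG_DIR [DAYS]
# Hypothetical helper: delete *.csv log files under LOG_DIR by age.
# find truncates a file's age to whole days, so "-mtime +N" only matches
# files that are at least N+1 days old.
purge_old_logs() {
    log_dir=$1
    days=${2:-2}
    # -print lists each file as it is removed
    find "$log_dir" -type f -name '*.csv' -mtime +"$days" -print -delete
}

# Example calls (commented out so that sourcing this file changes nothing):
# largest_entries / 10
# largest_entries /data/primary/gpseg1 5
# purge_old_logs /data/primary/gpseg1/pg_log 2
```

Running purge_old_logs from a daily cron job inside the container, once per segment log directory, would keep the logs from growing back; whether that suits a given setup depends on how long the logs need to be kept.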