dockerterraformamazon-rdsapache-superset

Set postgres RDS as metadata backup for Superset


I'm trying to store the metadata of my superset dashboard running on a EC2 in a RDS.

My rds:

resource "aws_db_instance" "default" {
  allocated_storage    = 10
  identifier           = var.db_identifier
  db_name              = var.db_name
  engine               = var.engine
  engine_version       = var.engine_version
  instance_class       = "db.t4g.micro"  #free tier
  username             = var.username
  password             = var.password
  parameter_group_name = var.parameter_group_name
  skip_final_snapshot  = true
}

resource "aws_security_group" "sg_rds" {
    name        = "sg_rds"
    description = "Allow RDS inbound traffic"
    ingress {
        from_port   = 5432
        to_port     = 5432
        protocol    = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
}
}

module "rds" {
    source = "../../modules/rds"
    
    db_identifier = "dbsuperset"
    db_name = "superset_backup"
    engine = "postgres"
    engine_version = "17.2"
    username = "superset_user"
    password = "*********"
    parameter_group_name = "default.postgres17"
}

The SQLALCHEMY_DATABASE_URI I set in my superset_config.py according to the superset doc :

...
SQLALCHEMY_DATABASE_URI='postgresql://superset_user:*******@dbsuperset.cqx****tct.eu-west-1.rds.amazonaws.com:5432/dbsuperset'
...

My Dockerfile:

FROM python:3.10-slim

WORKDIR /app

RUN apt-get update && apt-get install -y \
    build-essential \         
    libpq-dev \               
    libssl-dev \             
    libffi-dev \             
    && rm -rf /var/lib/apt/lists/*  

RUN pip install --upgrade pip
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY superset_config.py /app/superset_config.py

ENV PYTHONPATH=/app

ENV FLASK_APP=superset.app:create_app

RUN mkdir -p /app && chmod -R 755 /app

EXPOSE 8088

RUN superset fab create-admin \
    --username admin \
    --password admin \
    --firstname Admin \
    --lastname User \
    --email admin@example.com && \
    superset db upgrade && \
    superset init

#RUN superset load_examples

CMD ["gunicorn", "-b", "0.0.0.0:8088", "superset.app:create_app()"]

When I launch the image building on an EC2 in AWS the config of the URI doesn't seem to work, what should i change ?

 > [9/9] RUN superset fab create-admin     --username admin     --password admin     --firstname Admin     --lastname User     --email admin@example.com &&     superset db upgrade &&     superset init:
4.745 Loaded your LOCAL configuration at [/app/superset_config.py]
5.020 2025-04-30 06:12:57,112:INFO:superset.initialization:Setting database isolation level to READ COMMITTED
139.3 2025-04-30 06:15:11,365:ERROR:flask_appbuilder.security.sqla.manager:DB Creation and initialization failed: (psycopg2.OperationalError) connection to server at "dbsuperset.c******otct.eu-west-1.rds.amazonaws.com" (172.31.41.7), port 5432 failed: Connection timed out
139.3   Is the server running on that host and accepting TCP/IP connections?
139.3 
139.3 (Background on this error at: https://sqlalche.me/e/14/e3q8)
------
Dockerfile:47
--------------------
  46 |     # 3. Initialize Superset with `superset init`
  47 | >>> RUN superset fab create-admin \
  48 | >>>     --username admin \
  49 | >>>     --password admin \
  50 | >>>     --firstname Admin \
  51 | >>>     --lastname User \
  52 | >>>     --email admin@example.com && \
  53 | >>>     superset db upgrade && \
  54 | >>>     superset init
  55 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c superset fab create-admin     --username admin     --password admin     --firstname Admin     --lastname User     --email admin@example.com &&     superset db upgrade &&     superset init" did not complete successfully: exit code: 1
REPOSITORY   TAG       IMAGE ID   CREATED   SIZE
Cloud-init v. 22.2.2 finished at Wed, 30 Apr 2025 06:15:12 +0000. Datasource DataSourceEc2.  Up 306.13 seconds

The logs and events in my RDS indicate that a connection attempt was made :

2025-04-30 08:02:03 UTC::@:[899]:LOG: checkpoint starting: time
2025-04-30 08:02:04 UTC::@:[899]:LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.203 s, sync=0.002 s, total=0.208 s; sync files=3, longest=0.002 s, average=0.001 s; distance=65536 kB, estimate=65536 kB; lsn=3/740003A0, redo lsn=3/74000380
2025-04-30 08:07:03 UTC::@:[899]:LOG: checkpoint starting: time
2025-04-30 08:07:03 UTC::@:[899]:LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.203 s, sync=0.012 s, total=0.225 s; sync files=3, longest=0.008 s, average=0.004 s; distance=65536 kB, estimate=65536 kB; lsn=3/780003B0, redo lsn=3/78000390
2025-04-30 08:12:03 UTC::@:[899]:LOG: checkpoint starting: time
2025-04-30 08:12:03 UTC::@:[899]:LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.203 s, sync=0.002 s, total=0.209 s; sync files=3, longest=0.002 s, average=0.001 s; distance=65536 kB, estimate=65536 kB; lsn=3/7C0003C0, redo lsn=3/7C0003A0
2025-04-30 08:17:03 UTC::@:[899]:LOG: checkpoint starting: time
2025-04-30 08:17:03 UTC::@:[899]:LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.203 s, sync=0.002 s, total=0.209 s; sync files=3, longest=0.002 s, average=0.001 s; distance=65536 kB, estimate=65536 kB; lsn=3/800003C8, redo lsn=3/800003A8
----------------------- END OF LOG ----------------------

The IAM policy and security group of the EC2 instance creating my docker image:

resource "aws_security_group" "allow_ssh" {
  name        = "allow_ssh_ec2_builder"
  description = "Allow SSH inbound traffic"
  #ssh
  ingress { 
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # RDS access (PostgreSQL)
  ingress {
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] 
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"  
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_iam_role" "ec2_role" {
  name = "${var.instance_name}-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })
}

...

# Attach RDS full access policy
resource "aws_iam_role_policy_attachment" "rds_full_access" {
  role       = aws_iam_role.ec2_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonRDSFullAccess"
}

# Create an Instance Profile
resource "aws_iam_instance_profile" "ec2_instance_profile" {
  name = "${var.instance_name}-instance-profile"
  role = aws_iam_role.ec2_role.name
}

Solution

  • I wasn't using the correct info in the DATABASE_URI, instead of i was using the <db_identifier>. My db URI then should looks like this :

    SQLALCHEMY_DATABASE_URI = 'postgresql://superset:superset_password@dbsuperset.c*****tct.eu-west-1.rds.amazonaws.com:5432/superset_backup'

    Now data are saved on my rds:

    [ec2-user@ip-172-31-40-62 ~]$ psql -h lbissdbsuperset.cqx3cbuvotct.eu-west-1.rds.amazonaws.com -U superset -d lbiss_superset_backup -p 5432
    Password for user superset: 
    psql (15.12, server 17.2)
    WARNING: psql major version 15, server major version 17.
             Some psql features might not work.
    SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
    Type "help" for help.
    
    lbiss_superset_backup=> \l
    ERROR:  column d.daticulocale does not exist
    LINE 6:        d.daticulocale as "ICU Locale",
                   ^
    HINT:  Perhaps you meant to reference the column "d.datlocale".
    lbiss_superset_backup=> \d
                             List of relations
     Schema |                Name                |   Type   |  Owner   
    --------+------------------------------------+----------+----------
     public | ab_group                           | table    | superset
     public | ab_group_id_seq                    | sequence | superset
     public | ab_group_role                      | table    | superset
    ....
    

    I can select the users :

    lbiss_superset_backup=> select * from ab_user;
     id | first_name | last_name | username |                                            
                                      password                                           
                                       | active |       email       | last_login | login_
    count | fail_login_count |         created_on         |         changed_on         | 
    created_by_fk | changed_by_fk 
    ----+------------+-----------+----------+--------------------------------------------
    -------------------------------------------------------------------------------------
    -----------------------------------+--------+-------------------+------------+-------
    ------+------------------+----------------------------+----------------------------+-
    --------------+---------------
      1 | Admin      | User      | admin    | scrypt:32768:8:1$LOtt53rv8Jt0XGuv$108888bfa
    4fa45d8e0e465a0e25f9d9**********ec4f906e6fa916b31908144e1e13ad2a850
    40c109c1db47eb4e3029cd77bf76bee115 | t      | admin@example.com |            |       
          |                  | 2025-04-30 09:38:47.558746 | 2025-04-30 09:38:47.558753 | 
                  |              
    (1 row)
    

    Only Admin is created, as intended