databasepostgresqlpsqldatabase-restore

How do I partly restore data from the pg_data_dump?


I have an issue. I've got a backup functionality in my application Here is how I do dumps:

schema_command = (
            f'PGPASSWORD={os.environ["POSTGRES_PASSWORD"]} '
            f'pg_dump -U {os.environ["POSTGRES_USER"]} '
            f'--schema-only -h {os.environ["DB_HOST"]} {os.environ["DB_NAME"]} > {schema_backup_file}'
        )
        subprocess.run(schema_command, shell=True, check=True)

        data_command = (
            f'PGPASSWORD={os.environ["POSTGRES_PASSWORD"]} '
            f'pg_dump -U {os.environ["POSTGRES_USER"]} '
            f'--data-only -h {os.environ["DB_HOST"]} {os.environ["DB_NAME"]} > {data_backup_file}'
        )
        subprocess.run(data_command, shell=True, check=True)

I also have a restore functionality:

schema_restore_command = (
            f'PGPASSWORD={os.environ["POSTGRES_PASSWORD"]} '
            f'psql -U {os.environ["POSTGRES_USER"]} '
            f'-h {os.environ["DB_HOST"]} -d {os.environ["DB_NAME"]} < {schema_backup_file}'
        )
        print(schema_restore_command)
        subprocess.run(schema_restore_command, shell=True, check=True)

        data_restore_command = (
            f'PGPASSWORD={os.environ["POSTGRES_PASSWORD"]} '
            f'psql -U {os.environ["POSTGRES_USER"]} '
            f'-h {os.environ["DB_HOST"]} -d {os.environ["DB_NAME"]} < {data_backup_file}'
        )
        print(data_restore_command)
        subprocess.run(data_restore_command, shell=True, check=True)

The problem is I need to partly restore data from dump. For example - I have accidentally deleted some database rows, but my table is still full of content. It doesn't work. It work only when the table is fully empty.

I have tried --inserts flag when creating a data dump I have tried to clean my database before restoring, it works, but thats not how it supposed to work according to my task description


Solution

  • If you're trying to restore unique values and that violates a constraint, you can combine --inserts with --on-conflict-do-nothing:

    data_command = (
      f'PGPASSWORD={os.environ["POSTGRES_PASSWORD"]} '
      f'pg_dump -U {os.environ["POSTGRES_USER"]} --inserts --on-conflict-do-nothing '
      f'--data-only -h {os.environ["DB_HOST"]} {os.environ["DB_NAME"]} > {data_backup_file}'
    )
    

    That way the values you still had in the table will stay, their incoming duplicates will get ignored and only the missing ones will be inserted.

    You can narrow down which objects you want to dump by specifying --schema and --table:

    data_command = (
      f'PGPASSWORD={os.environ["POSTGRES_PASSWORD"]} '
      f'pg_dump -U {os.environ["POSTGRES_USER"]} --inserts --on-conflict-do-nothing '
      f'--data-only --schema {target_schema} --table {target_schema}.{target_table} '
      f'-h {os.environ["DB_HOST"]} {os.environ["DB_NAME"]} > {data_backup_file}'
    )
    

    It's possible to dump everything and narrow things down while restoring, by reading the backup with pg_restore and piping that to psql, or letting it handle the restore directly.

    data_restore_command = (
      f'PGPASSWORD={os.environ["POSTGRES_PASSWORD"]} '
      f'pg_restore -U {os.environ["POSTGRES_USER"]} '
      f'--data-only --schema {target_schema} --table {target_schema}.{target_table} '
      f'-h {os.environ["DB_HOST"]} -d {os.environ["DB_NAME"]} {data_backup_file}'
    )
    

    This also simplifies the process of dropping, recreating and repopulating objects, because you can pick and choose what to restore, rather than nuke and rebuild the whole thing.

    On top of that, pg_restore can read optionally compressed, custom-format dumps (created with pg_dump -Fc) that are way lighter and faster to process. It also enables multithreaded restore if you use the --jobs setting.