
How to verify the integrity of files using a digest file in Python (SHA256SUMS)


I have a set of files and a SHA256SUMS digest file that contains a SHA-256 hash for each of the files. What's the best way to verify the integrity of my files with Python?

For example, here's how I would download the Debian 10 net installer SHA256SUMS digest file, then download and verify its MANIFEST file, in Bash:

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
--2020-08-25 02:11:20--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75295 (74K)
Saving to: ‘SHA256SUMS’

SHA256SUMS          100%[===================>]  73.53K  71.7KB/s    in 1.0s    

2020-08-25 02:11:22 (71.7 KB/s) - ‘SHA256SUMS’ saved [75295/75295]

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
--2020-08-25 02:11:27--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1709 (1.7K)
Saving to: ‘MANIFEST’

MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      

2020-08-25 02:11:28 (128 MB/s) - ‘MANIFEST’ saved [1709/1709]

user@host:~$ sha256sum --check --ignore-missing SHA256SUMS 
./MANIFEST: OK
user@host:~$ 

What is the best way to do this same operation (download and verify the integrity of the Debian 10 MANIFEST file using the SHA256SUMS file) in Python?


Solution

  • The following Python script implements a function named integrity_is_ok() that takes the path to a SHA256SUMS file and a list of files to be verified. It returns False if any of the files couldn't be verified and True otherwise.

    #!/usr/bin/env python3
    from hashlib import sha256
    import os
    
    # Takes the path (as a string) to a SHA256SUMS file and a list of paths to
    # local files. Returns True only if all files' checksums are present in the
    # SHA256SUMS file and their checksums match
    def integrity_is_ok( sha256sums_filepath, local_filepaths ):
    
        # first we parse the SHA256SUMS file and convert it into a dictionary
        sha256sums = dict()
        with open( sha256sums_filepath ) as fd:
            for line in fd:
                # sha256 hashes are exactly 64 characters long
                checksum = line[0:64]
    
                # one space and one mode character (' ' or '*') separate the
                # checksum from the filename in `sha256sum` output; only the
                # base name is kept, so same-named files in different
                # directories overwrite each other in this dictionary
                filename = os.path.split( line[66:] )[1].strip()
                sha256sums[filename] = checksum
    
        # now loop through each file that we were asked to check and confirm its
        # checksum matches what was listed in the SHA256SUMS file
        for local_file in local_filepaths:
    
            local_filename = os.path.split( local_file )[1]
    
            sha256sum = sha256()
            with open( local_file, 'rb' ) as fd:
                data_chunk = fd.read(1024)
                while data_chunk:
                    sha256sum.update(data_chunk)
                    data_chunk = fd.read(1024)
    
            checksum = sha256sum.hexdigest()
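            # .get() returns None for a file that is not listed in SHA256SUMS,
            # so it fails verification instead of raising KeyError
            if checksum != sha256sums.get( local_filename ):
                return False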
    
        return True
    
    if __name__ == '__main__':
    
        script_dir = os.path.dirname( os.path.realpath(__file__) )
        sha256sums_filepath = os.path.join( script_dir, 'SHA256SUMS' )
        local_filepaths = [ os.path.join( script_dir, 'MANIFEST' ) ]
    
        if integrity_is_ok( sha256sums_filepath, local_filepaths ):
            print( "INFO: Checksum OK" )
        else:
            print( "ERROR: Checksum Invalid" )
    

    Here is an example execution:

    user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
    --2020-08-25 22:40:16--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
    Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
    Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 75295 (74K)
    Saving to: ‘SHA256SUMS’
    
    SHA256SUMS          100%[===================>]  73.53K   201KB/s    in 0.4s    
    
    2020-08-25 22:40:17 (201 KB/s) - ‘SHA256SUMS’ saved [75295/75295]
    
    user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
    --2020-08-25 22:40:32--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
    Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
    Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 1709 (1.7K)
    Saving to: ‘MANIFEST’
    
    MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      
    
    2020-08-25 22:40:32 (13.0 MB/s) - ‘MANIFEST’ saved [1709/1709]
    
    user@host:~$ ./sha256sums_python.py 
    INFO: Checksum OK
    user@host:~$ 
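
A variation on the script above, offered only as a rough sketch: it keys the digest entries by their full relative path rather than by base name, skips listed files that are not present locally (similar in spirit to `--ignore-missing`), and reports a result per file. The function names `download`, `parse_sha256sums`, `file_sha256` and `check_ignore_missing` are hypothetical, and the sketch assumes paths in the digest file are relative to the directory containing it (the `sha256sum` command resolves them against the current working directory instead).

```python
#!/usr/bin/env python3
import hashlib
import os
import shutil
import urllib.request


def download(url, dest_path):
    """Save the resource at `url` to `dest_path` (hypothetical helper)."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)


def parse_sha256sums(sums_path):
    """Map each normalized relative path in the digest file to its checksum."""
    expected = {}
    with open(sums_path) as fd:
        for line in fd:
            line = line.rstrip("\n")
            # layout: 64 hex digits, one space, one mode character, file name
            if len(line) < 67:
                continue  # skip blank or malformed lines
            checksum, name = line[:64], line[66:]
            expected[os.path.normpath(name)] = checksum.lower()
    return expected


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fd:
        for chunk in iter(lambda: fd.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_ignore_missing(sums_path):
    """Return {relative_path: bool} for listed files that exist locally."""
    # assumption: listed paths are relative to the digest file's directory
    base_dir = os.path.dirname(os.path.abspath(sums_path))
    results = {}
    for rel_path, checksum in parse_sha256sums(sums_path).items():
        local_path = os.path.join(base_dir, rel_path)
        if not os.path.isfile(local_path):
            continue  # not downloaded, so nothing to verify
        results[rel_path] = file_sha256(local_path) == checksum
    return results
```

For the Debian example, one would call `download()` for the SHA256SUMS and MANIFEST URLs into the same directory and then pass the saved SHA256SUMS path to `check_ignore_missing()`. A `False` value for any entry means that file did not match its listed checksum.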
    

    Parts of the above code were adapted from the following answer on Ask Ubuntu: