Tags: python, synchronization, sftp, data-synchronization, pysftp

How to sync only the changed files from the remote directory using pysftp?


I am using the pysftp library's get_r function (https://pysftp.readthedocs.io/en/release_0.2.9/pysftp.html#pysftp.Connection.get_r) to get a local copy of a directory structure from an SFTP server.

Is that the correct approach for a situation where the contents of the remote directory have changed and I would like to get only the files that have changed since the last time the script was run?

The script should sync the remote directory recursively and mirror its state: any new files and changes to existing files should be fetched, and a parameter should control whether outdated local files (those no longer present on the remote server) are removed.

My current approach is here.

Example usage:

    from sftp_sync import sync_dir

    sync_dir('/remote/path/', '/local/path/')

Solution

  • Use pysftp.Connection.listdir_attr to get the file listing with attributes (including the file timestamps).

    Then, iterate the list and compare against local files.

    import os
    import stat

    import pysftp

    remote_path = "/remote/path"
    local_path = "/local/path"

    with pysftp.Connection('example.com', username='user', password='pass') as sftp:
        # Work relative to the remote directory.
        sftp.cwd(remote_path)
        for f in sftp.listdir_attr():
            # Skip subdirectories; this simple version handles a flat listing only.
            if not stat.S_ISDIR(f.st_mode):
                print("Checking %s..." % f.filename)
                local_file_path = os.path.join(local_path, f.filename)
                # Fetch the file if it is missing locally or the remote copy is newer.
                if ((not os.path.isfile(local_file_path)) or
                        (f.st_mtime > os.path.getmtime(local_file_path))):
                    print("Downloading %s..." % f.filename)
                    sftp.get(f.filename, local_file_path)
    
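    The snippet above handles only a flat remote directory. To cover the recursive mirroring the question asks for, the sketch below uses pysftp's walktree; it borrows the sync_dir name from the question, but the helper, its signature, and the remove_stale parameter are illustrative assumptions, not the asker's actual code:

    import os
    import posixpath

    def sync_dir(sftp, remote_path, local_path, remove_stale=False):
        # Hypothetical helper: mirror remote_path into local_path over an
        # open pysftp.Connection. Not part of pysftp itself.
        remote_files = set()

        def handle_file(remote_file):
            relative = posixpath.relpath(remote_file, remote_path)
            remote_files.add(relative)
            local_file = os.path.join(local_path, relative)
            os.makedirs(os.path.dirname(local_file), exist_ok=True)
            # Fetch the file if it is missing locally or the remote copy is newer.
            if (not os.path.isfile(local_file)
                    or sftp.stat(remote_file).st_mtime > os.path.getmtime(local_file)):
                print("Downloading %s..." % remote_file)
                sftp.get(remote_file, local_file)

        # walktree invokes the callbacks for every file, directory and
        # unknown entry found under remote_path.
        sftp.walktree(remote_path, fcallback=handle_file,
                      dcallback=lambda path: None, ucallback=lambda path: None)

        if remove_stale:
            # Delete local files that no longer exist on the remote server.
            for root, _, files in os.walk(local_path):
                for name in files:
                    local_file = os.path.join(root, name)
                    relative = os.path.relpath(local_file, local_path)
                    if relative.replace(os.sep, "/") not in remote_files:
                        print("Removing stale %s..." % local_file)
                        os.remove(local_file)

    Inside the with block above, this would be called as sync_dir(sftp, remote_path, local_path, remove_stale=True).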

    Though these days you should not use pysftp, as the project is no longer maintained. Use Paramiko directly instead. See pysftp vs. Paramiko. The above code works with Paramiko too, using its SFTPClient.listdir_attr.
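
    For completeness, here is a minimal sketch of the same check using Paramiko alone. It assumes the same placeholder host and credentials as above, and it reduces host key handling to load_system_host_keys for brevity; a real script should verify host keys explicitly:

    import os
    import stat

    import paramiko

    remote_path = "/remote/path"
    local_path = "/local/path"

    ssh = paramiko.SSHClient()
    # Trust the keys already present in the system's known_hosts file.
    ssh.load_system_host_keys()
    ssh.connect('example.com', username='user', password='pass')

    sftp = ssh.open_sftp()
    try:
        for f in sftp.listdir_attr(remote_path):
            # Skip subdirectories, as in the pysftp version above.
            if not stat.S_ISDIR(f.st_mode):
                remote_file = remote_path + "/" + f.filename
                local_file = os.path.join(local_path, f.filename)
                # Fetch the file if it is missing locally or the remote copy is newer.
                if (not os.path.isfile(local_file)
                        or f.st_mtime > os.path.getmtime(local_file)):
                    print("Downloading %s..." % f.filename)
                    sftp.get(remote_file, local_file)
    finally:
        sftp.close()
        ssh.close()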