python · recursion · ftp · ftputil

Python script to recursively search FTP for files matching a specific filename pattern and newer than 24 hours


Our storage area ran into trouble with SMB connections, and we are now forced to use FTP to access files on a regular basis. So rather than using Bash, I am trying to use Python, but I am running into a few problems. The script needs to recursively search the FTP directory tree, find all files matching "*1700_m30.mp4" that are newer than 24 hours, and then copy all of those files locally.

This is what I have so far, but I can't seem to get the script to download the files or to get the stats that tell me whether they are newer than 24 hours.

#!/usr/bin/env python
# encoding: utf-8

import sys
import os
import ftplib
import ftputil
import fnmatch
import time

dir_dest = '/Volumes/VoigtKampff/Temp/TEST1/' # Directory where the files need to be downloaded to
pattern = '*1700_m30.mp4' #filename pattern for what the script is looking for 
print 'Looking for this pattern :', pattern # print pattern


print "logging into GSP" # print 
host = ftputil.FTPHost('xxx.xxx','xxx','xxxxx') # ftp host info
recursive = host.walk("/GSPstor/xxxxx/xxx/xxx/xxx/xxxx",topdown=True,onerror=None) # recursive search 
for root,dirs,files in recursive:
    for name in files:
        print 'Files   :', files # print all files it finds
        video_list = fnmatch.filter(files, pattern)
        print 'Files to be moved :', video_list # print list of files to be moved 
        if host.path.isfile(video_list): # check whether the file is valid 
            host.download(video_list, video_list, 'b') # download file list 



host.close  

Here is the modified script based on the excellent recommendations from ottomeister (thank you!!). The last remaining issue is that it keeps re-downloading the files and overwriting the existing local copies:

import sys
import os
import ftplib
import ftputil
import fnmatch
import time
from time import mktime
import datetime
import os.path, time 
from ftplib import FTP


dir_dest = '/Volumes/VoigtKampff/Temp/TEST1/' # Directory where the files need to be downloaded to
pattern = '*1700_m30.mp4' #filename pattern for what the script is looking for 
print 'Looking for this pattern :', pattern # print pattern
utc_datetime_less24H = datetime.datetime.utcnow()-datetime.timedelta(seconds=86400) #UTC time minus 24 hours in seconds
print 'UTC time less than 24 Hours is: ', utc_datetime_less24H.strftime("%Y-%m-%d %H:%M:%S") # print UTC time minus 24 hours in seconds
print "logging into GSP FTP" # print 


with ftputil.FTPHost('xxxxxxxx','xxxxxx','xxxxxx') as host: # ftp host info
    recursive = host.walk("/GSPstor/xxxx/com/xxxx/xxxx/xxxxxx",topdown=True,onerror=None) # recursive search 
    for root,dirs,files in recursive:
        for name in files:
            print 'Files   :', files # print all files it finds
            video_list = fnmatch.filter(files, pattern) # collect all files that match pattern into variable:video_list
            statinfo = host.stat(root, video_list) # get the stats from files in variable:video_list
            file_mtime = datetime.datetime.utcfromtimestamp(statinfo.st_mtime) 
            print 'Files with pattern: %s and epoch mtime is: %s ' % (video_list, statinfo.st_mtime)
            print 'Last Modified: %s' % datetime.datetime.utcfromtimestamp(statinfo.st_mtime) 
            if file_mtime >= utc_datetime_less24H: 
                for fname in video_list:
                    fpath = host.path.join(root, fname)
                    if host.path.isfile(fpath):
                        host.download_if_newer(fpath, os.path.join(dir_dest, fname), 'b') 

host.close()

Solution

  • This line:

        video_list = fnmatch.filter(files, pattern)
    

    gets you a list of filenames that match your glob pattern. But this line:

        if host.path.isfile(video_list): # check whether the file is valid 
    

    is bogus, because host.path.isfile() does not want a list of filenames as its argument. It wants a single pathname. So you need to iterate over video_list constructing one pathname at a time, passing each of those pathnames to host.path.isfile(), and then possibly downloading that particular file. Something like this:

        import os.path
    
        for fname in video_list:
            fpath = host.path.join(root, fname)
            if host.path.isfile(fpath):
                host.download(fpath, os.path.join(dir_dest, fname), 'b')
    

    Note that I'm using host.path.join() to manage remote pathnames and os.path.join() to manage local pathnames. Also note that this puts all of the downloaded files into a single directory. If you want to put them into a directory hierarchy that mirrors the remote layout (you'll have to do something like that if the filenames in different remote directories can clash) then you'll need to construct a different destination path, and you'll probably have to create the local destination directory hierarchy too.
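
    For example, here is a minimal sketch of that idea, reusing host, pattern, dir_dest and fnmatch from your script; the remote starting directory is the same placeholder path you used, and posixpath is used because remote FTP paths are POSIX-style:

        import os
        import posixpath

        remote_root = "/GSPstor/xxxxx/xxx/xxx/xxx/xxxx"  # remote starting directory (placeholder)
        for root, dirs, files in host.walk(remote_root):
            for fname in fnmatch.filter(files, pattern):
                fpath = host.path.join(root, fname)
                # Rebuild the remote sub-path underneath the local destination directory.
                rel_dir = posixpath.relpath(root, remote_root)
                local_dir = dir_dest if rel_dir == '.' else os.path.join(dir_dest, *rel_dir.split('/'))
                if not os.path.isdir(local_dir):
                    os.makedirs(local_dir)  # create the local hierarchy as needed
                host.download(fpath, os.path.join(local_dir, fname), 'b')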

    To get timestamp information use host.lstat() or host.stat() depending on how you want to handle symlinks.
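
    For instance, here is a minimal sketch of that check, reusing video_list, dir_dest and the 24-hour cutoff from your script; note that host.stat() takes a single remote pathname, not a list:

        import datetime

        cutoff = datetime.datetime.utcnow() - datetime.timedelta(hours=24)
        for fname in video_list:
            fpath = host.path.join(root, fname)
            if not host.path.isfile(fpath):
                continue
            statinfo = host.stat(fpath)  # or host.lstat(fpath) if you don't want symlinks followed
            file_mtime = datetime.datetime.utcfromtimestamp(statinfo.st_mtime)
            if file_mtime >= cutoff:  # modified within the last 24 hours
                host.download(fpath, os.path.join(dir_dest, fname), 'b')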

    And yes, that should be host.close(). Without it the connection will be closed after the host variable goes out of scope and is garbage-collected, but it's better to close it explicitly. Even better, use a with clause to ensure that the connection gets closed even if an exception causes this code to be abandoned before it reaches the host.close() call, like this:

        with ftputil.FTPHost('xxx.xxx','xxx','xxxxx') as host: # ftp host info
            recursive = host.walk(...)
            ...