Our storage area ran into trouble with SMB connections, so we have been forced to use FTP to access files on a regular basis. Rather than using Bash, I am trying to use Python, but I am running into a few problems. The script needs to search recursively through the FTP directory, find all files matching "*1700_m30.mp4" that are newer than 24 hours, and then copy them locally.
This is what I have so far, but I can't seem to get the script to download the files or read the stats that tell me whether they are newer than 24 hours.
#!/usr/bin/env python
# encoding: utf-8

import sys
import os
import ftplib
import ftputil
import fnmatch
import time

dir_dest = '/Volumes/VoigtKampff/Temp/TEST1/' # Directory where the files need to be downloaded to
pattern = '*1700_m30.mp4' # filename pattern for what the script is looking for

print 'Looking for this pattern :', pattern # print pattern
print "logging into GSP" # print

host = ftputil.FTPHost('xxx.xxx','xxx','xxxxx') # ftp host info
recursive = host.walk("/GSPstor/xxxxx/xxx/xxx/xxx/xxxx", topdown=True, onerror=None) # recursive search

for root, dirs, files in recursive:
    for name in files:
        print 'Files :', files # print all files it finds
    video_list = fnmatch.filter(files, pattern)
    print 'Files to be moved :', video_list # print list of files to be moved
    if host.path.isfile(video_list): # check whether the file is valid
        host.download(video_list, video_list, 'b') # download file list

host.close
Here is the modified script, based on the excellent recommendations from ottomeister (thank you!!). The last issue now is that it downloads, but it keeps re-downloading the files and overwriting the existing copies:
import sys
import os
import ftplib
import ftputil
import fnmatch
import time
from time import mktime
import datetime
import os.path, time
from ftplib import FTP

dir_dest = '/Volumes/VoigtKampff/Temp/TEST1/' # Directory where the files need to be downloaded to
pattern = '*1700_m30.mp4' # filename pattern for what the script is looking for

print 'Looking for this pattern :', pattern # print pattern

utc_datetime_less24H = datetime.datetime.utcnow() - datetime.timedelta(seconds=86400) # UTC time minus 24 hours
print 'UTC time less than 24 Hours is: ', utc_datetime_less24H.strftime("%Y-%m-%d %H:%M:%S") # print UTC time minus 24 hours

print "logging into GSP FTP" # print

with ftputil.FTPHost('xxxxxxxx','xxxxxx','xxxxxx') as host: # ftp host info
    recursive = host.walk("/GSPstor/xxxx/com/xxxx/xxxx/xxxxxx", topdown=True, onerror=None) # recursive search
    for root, dirs, files in recursive:
        for name in files:
            print 'Files :', files # print all files it finds
        video_list = fnmatch.filter(files, pattern) # collect all files that match pattern into variable:video_list
        statinfo = host.stat(root, video_list) # get the stats from files in variable:video_list
        file_mtime = datetime.datetime.utcfromtimestamp(statinfo.st_mtime)
        print 'Files with pattern: %s and epoch mtime is: %s ' % (video_list, statinfo.st_mtime)
        print 'Last Modified: %s' % datetime.datetime.utcfromtimestamp(statinfo.st_mtime)
        if file_mtime >= utc_datetime_less24H:
            for fname in video_list:
                fpath = host.path.join(root, fname)
                if host.path.isfile(fpath):
                    host.download_if_newer(fpath, os.path.join(dir_dest, fname), 'b')
    host.close()
This line:
video_list = fnmatch.filter(files, pattern)
gets you a list of filenames that match your glob pattern. But this line:
if host.path.isfile(video_list): # check whether the file is valid
is bogus, because host.path.isfile() does not want a list of filenames as its argument. It wants a single pathname. So you need to iterate over video_list, constructing one pathname at a time, passing each of those pathnames to host.path.isfile(), and then possibly downloading that particular file. Something like this:
import os.path

for fname in video_list:
    fpath = host.path.join(root, fname)
    if host.path.isfile(fpath):
        host.download(fpath, os.path.join(dir_dest, fname), 'b')
Note that I'm using host.path.join() to manage remote pathnames and os.path.join() to manage local pathnames. Also note that this puts all of the downloaded files into a single directory. If you want to put them into a directory hierarchy that mirrors the remote layout (you'll have to do something like that if the filenames in different remote directories can clash) then you'll need to construct a different destination path, and you'll probably have to create the local destination directory hierarchy too.
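For example, a rough sketch of that mirroring idea, reusing the host, pattern and dir_dest variables from the script above (the remote_base value here is only a placeholder), might look like this:

import fnmatch
import os
import posixpath

remote_base = "/GSPstor/xxxxx/xxx/xxx/xxx/xxxx"  # placeholder: the same directory passed to host.walk()

for root, dirs, files in host.walk(remote_base, topdown=True, onerror=None):
    for fname in fnmatch.filter(files, pattern):
        fpath = host.path.join(root, fname)             # remote pathname
        rel_dir = posixpath.relpath(root, remote_base)   # this directory's path relative to the base
        local_dir = os.path.join(dir_dest, rel_dir)      # mirror it under the local destination
        if not os.path.isdir(local_dir):
            os.makedirs(local_dir)                       # create the local hierarchy as needed
        if host.path.isfile(fpath):
            host.download(fpath, os.path.join(local_dir, fname), 'b')

posixpath.relpath() is used for the remote side because FTP paths are POSIX-style regardless of the local platform.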
To get timestamp information use host.lstat() or host.stat(), depending on how you want to handle symlinks.
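A minimal sketch of the 24-hour check, assuming the same host, root, video_list and dir_dest as above and stat-ing one remote pathname at a time (clock or timezone skew between the client and the FTP server may still need to be accounted for):

import os
import time

max_age_seconds = 24 * 60 * 60  # 24 hours

for fname in video_list:
    fpath = host.path.join(root, fname)
    if not host.path.isfile(fpath):
        continue
    statinfo = host.stat(fpath)  # stat a single remote pathname, not a list
    if time.time() - statinfo.st_mtime <= max_age_seconds:  # modified within the last 24 hours
        host.download(fpath, os.path.join(dir_dest, fname), 'b')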
And yes, that should be host.close(). Without it the connection will be closed after the host variable goes out of scope and is garbage-collected, but it's better to close it explicitly. Even better, use a with clause to ensure that the connection gets closed even if an exception causes this code to be abandoned before it reaches the host.close() call, like this:
with ftputil.FTPHost('xxx.xxx','xxx','xxxxx') as host: # ftp host info
    recursive = host.walk(...)
    ...
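Putting those pieces together, a rough end-to-end sketch (the host details, base path and destination are placeholders; download_if_newer() is used so files that already have an up-to-date local copy are skipped rather than re-downloaded and overwritten):

import fnmatch
import os
import posixpath
import time
import ftputil

dir_dest = '/Volumes/VoigtKampff/Temp/TEST1/'    # local destination (placeholder)
pattern = '*1700_m30.mp4'                        # filename pattern to match
remote_base = '/GSPstor/xxxxx/xxx/xxx/xxx/xxxx'  # placeholder remote base directory
cutoff = time.time() - 24 * 60 * 60              # only files modified in the last 24 hours

with ftputil.FTPHost('xxx.xxx', 'xxx', 'xxxxx') as host:  # placeholder credentials
    for root, dirs, files in host.walk(remote_base, topdown=True, onerror=None):
        for fname in fnmatch.filter(files, pattern):
            fpath = host.path.join(root, fname)
            if not host.path.isfile(fpath):
                continue
            if host.stat(fpath).st_mtime < cutoff:
                continue                                  # older than 24 hours, skip it
            rel_dir = posixpath.relpath(root, remote_base)
            local_dir = os.path.join(dir_dest, rel_dir)   # mirror the remote layout locally
            if not os.path.isdir(local_dir):
                os.makedirs(local_dir)
            host.download_if_newer(fpath, os.path.join(local_dir, fname), 'b')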