
How can I copy different file types in Python while keeping their directory structure?


I'm new to using Spyder 2.3.0 and Python 3.4.1

I have a directory structure with sub directories.

Unlike other examples on the web, I want to select multiple file types and copy them across together with the directory structure. I have tried the code below and it works, but it handles only one file type at a time and runs “copytree” once per type, so it is going to be very slow.

Is there a way to streamline this and make it faster?

What I was thinking of doing was:

Make a comprehensive list of file types and locations by walking through the directory structure, for example keeping every file whose name ends with one of

fileExt = [".txt", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".m", ".xmcd", ".pdf"]

Then, with that list, simply call “shutil.copytree”.

Any advice greatly appreciated.

  srcDir  = 'c:/a/src/dir/'
  dirName = 'c:/a/dest/dir/'


import os
import shutil

##################################################################################

dstDir = os.path.abspath(dirName)

def ignore_list(path, files):

    filesToIgnore = []

    for fileName in files:

        fullFileName = os.path.join(os.path.normpath(path), fileName)

        if not os.path.isdir(fullFileName) and not fileName.endswith('.txt') :

            filesToIgnore.append(fileName)

    return filesToIgnore

# start of script

shutil.copytree(srcDir, dstDir, ignore=ignore_list)
##################################################################################

dstDir = os.path.abspath(dirName)

def ignore_list(path, files):

    filesToIgnore = []

    for fileName in files:

        fullFileName = os.path.join(os.path.normpath(path), fileName)

        if not os.path.isdir(fullFileName) and not fileName.endswith('.docx') :


            filesToIgnore.append(fileName)

    return filesToIgnore

# start of script

shutil.copytree(srcDir, dstDir, ignore=ignore_list)

####################################################

...and so on, copying and pasting the block and changing the extension in “endswith('.docx')” each time.


Solution

  • We can make your "ignore_list" function a little more general, so that it checks every extension in a single pass:

    import os.path as op
    import shutil
    import timeit

    # Extensions to copy (lower case, without the leading dot)
    valid_formats = {"txt", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "m", "xmcd", "pdf"}

    def ignore_list(path, files):
        # copytree expects the bare names to skip, not full paths
        return [f for f in files
                if not op.isdir(op.join(path, f))
                and op.splitext(f)[1][1:].lower() not in valid_formats]

    # srcDir and dstDir as defined in the question; dstDir must not exist yet
    start = timeit.default_timer()
    shutil.copytree(srcDir, dstDir, ignore=ignore_list)
    print("Time: {0}".format(timeit.default_timer() - start))
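
  • Alternatively, the "walk the tree and build a list" idea from the question can be sketched with os.walk and shutil.copy2 instead of copytree. This is only a minimal sketch: the name copy_selected and its parameters are made up for illustration, and it assumes the destination is not inside the source tree. Unlike copytree, it can be run again on an existing destination, and it only creates directories that actually receive a file.

```python
import os
import shutil

# Extensions to keep, lower case, with the leading dot (taken from the question)
FILE_EXT = (".txt", ".doc", ".docx", ".xls", ".xlsx",
            ".ppt", ".pptx", ".m", ".xmcd", ".pdf")

def copy_selected(src_dir, dst_dir, extensions=FILE_EXT):
    """Hypothetical helper: copy matching files, keeping sub directories."""
    extensions = tuple(extensions)  # str.endswith needs a tuple, not a list
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            if not name.lower().endswith(extensions):
                continue
            # Rebuild the same relative path under the destination
            rel_dir = os.path.relpath(root, src_dir)
            target_dir = os.path.normpath(os.path.join(dst_dir, rel_dir))
            os.makedirs(target_dir, exist_ok=True)
            shutil.copy2(os.path.join(root, name), target_dir)
            copied.append(os.path.join(target_dir, name))
    return copied
```

    Calling copy_selected(srcDir, dstDir) with the paths from the question makes a single pass over the source tree and returns the list of files it copied.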