I have a folder containing many .gz files. The files inside them have various extensions: some are .txt, some .csv, some .xml, and some have other extensions.
For example, the .gz files in the folder are listed below, with the name of the original file inside each one in parentheses:
C:\Xiang\filename1.txt.gz (filename1.txt)
C:\Xiang\filename2.txt.gz (filename2.txt)
C:\Xiang\some_prefix_filename3.txt.gz (filename3.txt)
...
C:\Xiang\xmlfile1.xml_some_postfix.gz (xmlfile1.xml)
C:\Xiang\yyyymmddxmlfile2.xml.gz (xmlfile2.xml)
...
C:\Xiang\someotherName.csv.gz (someotherName.csv)
C:\Xiang\possiblePrefixsomeotherfile1.someotherExtension.gz (someotherfile1.someotherExtension)
C:\Xiang\someotherfile2.someotherExtensionPossiblePostfix.gz (someotherfile2.someotherExtension)
...
How can I simply unzip all the .gz files under the folder C:\Xiang in Python on Windows 10 and save them into the folder C:\UnZipGz, honoring the original filenames, with the result as follows:
C:\UnZipGz\filename1.txt
C:\UnZipGz\filename2.txt
C:\UnZipGz\filename3.txt
...
C:\UnZipGz\xmlfile1.xml
C:\UnZipGz\xmlfile2.xml
...
C:\UnZipGz\someotherName.csv
C:\UnZipGz\someotherfile1.someotherExtension
C:\UnZipGz\someotherfile2.someotherExtension
...
Generally, the names of the .gz files are consistent with the names of the files inside them, but this is not always the case. Some of the .gz files were renamed in the past, so a .gz file name does not necessarily match the name of the file it contains.
How can I extract all the .gz files and keep the original filenames and extensions? That is, regardless of how the .gz files are named, I want each extracted file saved under its original name, in the form filename.fileExtension, in the C:\UnZipGz folder.
import gzip
import os
import shutil

# Raw strings keep the backslashes literal; 'C:\UnZipGz' is a syntax error
# in Python 3 because \U starts a Unicode escape.
INPUT_DIRECTORY = r'C:\Xiang'
OUTPUT_DIRECTORY = r'C:\UnZipGz'
GZIP_EXTENSION = '.gz'


def make_output_path(output_directory, zipped_name):
    """ Generate a path to write the unzipped file to.

    :param str output_directory: Directory to place the file in
    :param str zipped_name: Name of the zipped file
    :return str:
    """
    name_without_gzip_extension = zipped_name[:-len(GZIP_EXTENSION)]
    return os.path.join(output_directory, name_without_gzip_extension)


# Create the output folder if it does not exist yet.
os.makedirs(OUTPUT_DIRECTORY, exist_ok=True)

for entry in os.scandir(INPUT_DIRECTORY):
    if not entry.name.lower().endswith(GZIP_EXTENSION):
        continue

    output_path = make_output_path(OUTPUT_DIRECTORY, entry.name)
    print('Decompressing', entry.path, 'to', output_path)

    # Stream in chunks so large files are not loaded into memory at once.
    with gzip.open(entry.path, 'rb') as zipped_file:
        with open(output_path, 'wb') as output_file:
            shutil.copyfileobj(zipped_file, output_file)
Explanation:
To retrieve the original file name, you can use gzinfo: https://github.com/PierreSelim/gzinfo
>>> import gzinfo
>>> info = gzinfo.read_gz_info('bar.txt.gz')
>>> info.fname
'foo.txt'
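If installing a third-party package is not an option, the same name can probably be read with the standard library alone, because the gzip format (RFC 1952) stores the original file name as an optional, zero-terminated field in the header. The following is only a rough sketch of that idea combined with the extraction step. The helper names read_original_name and extract_all are made up for this illustration and are not part of gzinfo or of Python's gzip module. It assumes each .gz holds a single member and falls back to stripping the .gz suffix when no name was stored (for example, when the file was created with gzip -n).

```python
import gzip
import os
import shutil
import struct

FEXTRA_FLAG = 0x04  # header has an optional "extra" field
FNAME_FLAG = 0x08   # header has the original file name


def read_original_name(gz_path):
    """Return the file name stored in the gzip header, or None if absent.

    Hypothetical helper for illustration; parses the header per RFC 1952.
    """
    with open(gz_path, 'rb') as f:
        header = f.read(10)  # fixed part: magic, method, flags, mtime, xfl, os
        if len(header) < 10 or header[:2] != b'\x1f\x8b':
            raise ValueError('not a gzip file: %s' % gz_path)
        flags = header[3]
        if flags & FEXTRA_FLAG:
            # Skip the extra field: 2-byte little-endian length, then the data.
            (extra_len,) = struct.unpack('<H', f.read(2))
            f.seek(extra_len, os.SEEK_CUR)
        if not flags & FNAME_FLAG:
            return None
        # The name is a zero-terminated Latin-1 string.
        name = bytearray()
        while True:
            byte = f.read(1)
            if not byte or byte == b'\x00':
                break
            name += byte
    # Drop any directory part so the result is a plain file name.
    text = name.decode('latin-1').replace('\\', '/')
    return os.path.basename(text) or None


def extract_all(input_dir, output_dir):
    """Decompress every .gz in input_dir into output_dir; return paths written."""
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for entry in os.scandir(input_dir):
        if not entry.is_file() or not entry.name.lower().endswith('.gz'):
            continue
        # Prefer the name from the header; otherwise strip the .gz suffix.
        name = read_original_name(entry.path) or entry.name[:-len('.gz')]
        target = os.path.join(output_dir, name)
        with gzip.open(entry.path, 'rb') as src, open(target, 'wb') as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

For the folders in the question, this would be called as extract_all(r'C:\Xiang', r'C:\UnZipGz'). Note that two archives whose headers carry the same original name would overwrite each other in the output folder, so a collision check may be worth adding.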
References to extract original file name: