I have a zip-file that I would like to unzip on Ubuntu with the correct filenames (they contain æ,ø,å).
What I have tried:
Extracting with unrar on Windows: everything works as expected and the filenames are correct.
unzip file.zip
The characters æ, ø and å are missing from the filenames; for example, 'æ' has been replaced with 'C'.
I attempt to detect the encoding of the zip-file, but it doesn't seem to tell me anything.
file file.zip
I attempt to unpack the file using various encodings that are often used for æ,ø,å-containing texts.
unzip -O UTF-8 file.zip
unzip -O ISO-8859-1 file.zip
unzip -O windows-1257 file.zip
None work...
It has been suggested that 7zip may fix the problem, but it does not.
7z x file.zip
It has been suggested that I change the Ubuntu language settings and then try again.
saveLang=$LANG
export LANG=da_DK
7z x file.zip
export LANG=$saveLang
This also does not work.
The unzip works correctly if I use Python3 for the purpose, but there must be an easier way?
import zipfile
with zipfile.ZipFile('file.zip', "r") as z:
    z.extractall("/home/xxxx/")
I am considering finding a list of "ALL" encodings, and then just extracting the filenames and going through them manually. Something along the line of this...
while IFS= read -r p; do
    echo "$p"
    unzip -j -O "$p" file.zip
done < encodings.txt
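The same brute-force idea can be sketched in Python without extracting anything. This is only a rough sketch: the function name `candidate_names` and the list of candidate encodings are my own assumptions, not part of any tool. It relies on the fact that Python's zipfile decodes filenames as cp437 when the UTF-8 flag (general-purpose bit 11) is not set, so re-encoding with cp437 recovers the raw bytes stored in the archive.

```python
import zipfile

# Illustrative list of encodings to try; adjust to taste (an assumption).
CANDIDATES = ["cp437", "cp850", "cp865", "cp1252", "iso-8859-1"]

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def candidate_names(zip_file, encodings=CANDIDATES):
    """Return {encoding: [decoded names]} without extracting any member."""
    results = {enc: [] for enc in encodings}
    with zipfile.ZipFile(zip_file) as z:
        for info in z.infolist():
            if info.flag_bits & UTF8_FLAG:
                # The name is flagged as UTF-8, so there is nothing to guess.
                for enc in encodings:
                    results[enc].append(info.filename)
                continue
            # zipfile decoded the stored bytes as cp437; undo that step.
            raw = info.orig_filename.encode("cp437")
            for enc in encodings:
                results[enc].append(raw.decode(enc, errors="replace"))
    return results
```

Calling `candidate_names("file.zip")` and printing the result shows each filename under every candidate encoding, so the right one can be picked by eye.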
Windows and Python3 seem to have some MAGIC under the hood that I cannot replicate. Do you guys have any suggestions as to what this "MAGIC" is?
The key piece of information you provided was that unrar on Windows was able to create the filenames correctly. So unless unrar is doing some encoding detection under the hood, there is a good chance that the encoding used in the zip file matches the default code page used on your Windows setup.
Using chcp on Windows you see that your code page is:
Active code page: 850
It's then a simple matter of telling unzip that the encoding used in the zip file is CP850:
unzip -O CP850 file.zip
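If you would rather stay in Python, something along these lines should do the same job. Treat it as a minimal sketch: `extract_with_encoding` is a hypothetical helper name, and it assumes that entries without the UTF-8 flag are all stored in the encoding you pass in. It rewrites each name from the cp437 decoding that zipfile applies by default into the requested encoding before extracting.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def extract_with_encoding(zip_file, dest_dir, encoding="cp850"):
    """Extract all members, decoding unflagged names with `encoding`.

    Returns the list of paths that were written.
    """
    extracted = []
    with zipfile.ZipFile(zip_file) as z:
        for info in z.infolist():
            if not info.flag_bits & UTF8_FLAG:
                # Recover the stored bytes, then decode them properly.
                raw = info.orig_filename.encode("cp437")
                info.filename = raw.decode(encoding)
            extracted.append(z.extract(info, dest_dir))
    return extracted
```

For example, `extract_with_encoding("file.zip", "/home/xxxx/", "cp850")`. On Python 3.11 or newer, `zipfile.ZipFile("file.zip", metadata_encoding="cp850")` followed by `extractall()` should give the same result without the manual re-decoding.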