I am trying to load a zip file and save it in the virtual file system for further processing with PyScript. In this example, I aim to open it and list its contents.
Here is how far I have got. See the self-contained HTML code below, adapted from tutorials (with thanks to the author, btw).
It loads PyScript, lets the user select a file, and loads it (although not in the right format, it seems). It also creates a dummy zip file, saves it to the virtual file system, and lists its contents. All of this works up front, and if I point the process_file function to that dummy zip file, it opens and lists it as expected.
The part that is NOT working is when I select any valid zip file from the local file system via the button/file selector. When the file is loaded into data, it is text (UTF-8), and I get this error:
File "/lib/python3.10/zipfile.py", line 1353, in _RealGetContents
raise BadZipFile("Bad magic number for central directory")
zipfile.BadZipFile: Bad magic number for central directory
I have tried saving to a file and loading it instead of using BytesIO, and I have tried variations using an ArrayBuffer or a Stream. I have also tried creating a FileReader and using readAsBinaryString() or readAsText() with various transformations, all with the same result: either it fails to recognise the "magic number" or I get "not a zip file". When feeding it some streams or an arrayBuffer, I get variations of:
TypeError: a bytes-like object is required, not 'pyodide.JsProxy'
At this point I suspect there is something embarrassingly obvious that I am nevertheless unable to see, so any fresh pair of eyes and advice on the best/simplest way to load a file is much appreciated :) Many thanks in advance.
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
<title>Example</title>
</head>
<body>
<p>Example</p>
<br />
<label for="myfile">Select a file:</label>
<input type="file" id="myfile" name="myfile">
<br />
<br />
<div id="print_output"></div>
<br />
<p>File Content:</p>
<div style="border:2px inset #AAA;cursor:text;height:120px;overflow:auto;width:600px; resize:both">
<div id="content">
</div>
</div>
<py-script output="print_output">
import asyncio
import zipfile
from js import document, FileReader
from pyodide import create_proxy
import io

async def process_file(event):
    fileList = event.target.files.to_py()
    for f in fileList:
        data = await f.text()
        mf = io.BytesIO(bytes(data, 'utf-8'))
        with zipfile.ZipFile(mf, "r") as zf:
            nl = zf.namelist()
            nlf = " _ ".join(nl)
            document.getElementById("content").innerHTML = nlf

def main():
    # Create a Python proxy for the callback function
    # process_file() is your function to process events from FileReader
    file_event = create_proxy(process_file)
    # Set the listener to the callback
    e = document.getElementById("myfile")
    e.addEventListener("change", file_event, False)

    mf = io.BytesIO()
    with zipfile.ZipFile(mf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr('file1.txt', b"hi")
        zf.writestr('file2.txt', str.encode("hi"))
        zf.writestr('file3.txt', str.encode("hi", 'utf-8'))
    with open("a.txt.zip", "wb") as f:  # use `wb` mode
        f.write(mf.getvalue())
    with zipfile.ZipFile("a.txt.zip", "r") as zf:
        nl = zf.namelist()
        nlf = " ".join(nl)
        document.getElementById("content").innerHTML = nlf

main()
</py-script>
</body>
</html>
You were very close with your code. The problem was in converting the file data to the correct data type. The requirement is to convert the arrayBuffer to a Uint8Array and then to a bytearray.
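To see why the text route fails, here is a minimal sketch in plain Python that runs outside the browser. It assumes that `await f.text()` followed by `bytes(data, 'utf-8')` behaves roughly like a UTF-8 decode with replacement characters followed by a re-encode; the file name, contents, and the `can_open` helper are made up for illustration.

```python
import io
import zipfile

# Build a small zip archive in memory (hypothetical file name and contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("file1.txt", b"hi")
raw = buf.getvalue()

# Approximation of `await f.text()` plus bytes(data, 'utf-8'):
# decode as UTF-8 (invalid sequences become U+FFFD), then encode again.
round_tripped = raw.decode("utf-8", errors="replace").encode("utf-8")


def can_open(data: bytes) -> bool:
    """Return True if zipfile can open the archive held in data."""
    try:
        with zipfile.ZipFile(io.BytesIO(data), "r") as zf:
            zf.namelist()
        return True
    except zipfile.BadZipFile:
        return False


print(can_open(raw))            # the untouched bytes
print(can_open(round_tripped))  # the bytes after the text round trip
print(len(raw), len(round_tripped))
```

A zip archive is binary data, so some of its bytes are not valid UTF-8. Each of those is replaced during decoding, which changes the length of the data and shifts the offsets that zipfile relies on to find the central directory.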
Import the required class:
from js import Uint8Array
Read the file data into an arrayBuffer and copy it to a new Uint8Array:
data = Uint8Array.new(await f.arrayBuffer())
Convert the Uint8Array to a bytearray, which BytesIO accepts:
mf = io.BytesIO(bytearray(data))
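Putting the steps together, the zip handling can be kept in a small helper that only needs a bytes-like object. This is a sketch, not the only way to structure it: `list_zip_names` is a hypothetical name, and the in-memory archive at the bottom is a stand-in for an uploaded file so that the code runs outside the browser.

```python
import io
import zipfile


def list_zip_names(raw) -> list[str]:
    """Return the member names of a zip archive given as a bytes-like object."""
    with zipfile.ZipFile(io.BytesIO(raw), "r") as zf:
        return zf.namelist()


# Inside the PyScript handler, the helper would be called per file as:
#     data = Uint8Array.new(await f.arrayBuffer())
#     names = list_zip_names(bytearray(data))

# Stand-in for an uploaded file (hypothetical names and contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("file1.txt", b"hi")
    zf.writestr("file2.txt", b"hi")

names = list_zip_names(bytearray(buf.getvalue()))
print(" _ ".join(names))
```

Keeping the browser-specific conversion in the handler and the zip logic in a plain function makes the latter easy to check with ordinary Python before loading it into the page.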