One of my requirements is to decode a byte array using the cp1047
code page. A sample of my code:
ebcdic_str = input_bytes.decode('cp1047')
The code above works correctly in plain Python, but when I execute the same operation as part of PySpark code (by creating a UDF wrapping it), I get the following error:
ebcdic_str = input_bytes.decode('cp1047')
LookupError: unknown encoding: cp1047
I have successfully done the same operation in PySpark using the code page cp037, but faced other issues with that code page. Per a suggestion from IBM, the code page was changed to cp1047. Unfortunately, the code is now failing.
Can anybody suggest a reason for the failure?
The issue occurs because Python's standard library does not include the cp1047 codec; it is provided by the third-party Python package ebcdic, which registers the codec when imported. Once that package is imported in our code, the issue is resolved.
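The fix works because importing the ebcdic package registers its codecs with Python's codec machinery. Here is a minimal, illustrative sketch of that registration mechanism; the alias `my_cp1047` and its mapping to the stdlib cp037 codec are assumptions for demonstration only, not the real cp1047 table:

```python
import codecs

# Sketch of the mechanism the ebcdic package relies on: importing it
# calls codecs.register(...), which is how 'cp1047' becomes resolvable.

def _search(name: str):
    # Codec names are normalized to lowercase with '-' replaced by '_'.
    if name == "my_cp1047":
        # Stand-in for a real cp1047 table; cp037 ships with CPython.
        return codecs.lookup("cp037")
    return None

codecs.register(_search)

# "Hello" in EBCDIC (the letters share the same code points in
# cp037 and cp1047):
print(b"\xc8\x85\x93\x93\x96".decode("my_cp1047"))  # -> Hello
```

This also explains why the import must happen on the worker that executes the UDF, not just on the driver: the registration is per Python process.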
As a side note, since the ebcdic package is not widely used, it might not be pre-installed on all your worker/edge nodes. You should validate that the package is available on each node; otherwise you may receive a ModuleNotFoundError: No module named 'ebcdic' error.
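One way to validate this is a small helper that probes the codec registry on whichever node runs it. This is a sketch; running it across executors (for example via `spark.sparkContext.parallelize(range(n)).map(lambda _: cp1047_ready()).collect()`) is an assumed usage pattern, not taken from the original post:

```python
import codecs

def codec_available(name: str) -> bool:
    """Return True if the named codec can be resolved on this node."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False

def cp1047_ready() -> bool:
    # Importing ebcdic registers cp1047 and related code pages; swallow
    # the ImportError so the probe reports False instead of crashing.
    try:
        import ebcdic  # noqa: F401
    except ImportError:
        return False
    return codec_available("cp1047")

# cp037 ships with CPython's standard library, so this is always True:
print(codec_available("cp037"))  # -> True
```

Collecting `cp1047_ready()` from every executor tells you which nodes still need the package installed.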