I have a list of WARC records. Every item in the list is created like this:
header = warc.WARCHeader({
"WARC-Type": "response",
"WARC-Target-URI": "www.somelink.com",
}, defaults=True)
data = "Some string"
record = warc.WARCRecord(header, data.encode('utf-8','replace'))
Now, I am using *.warc.gz to store my records like this:
output_file = warc.open("my_file.warc.gz", 'wb')
And write records like this:
output_file.write_record(record) # type of record is WARCRecord
But how can I compress with lzma as *.warc.xz? I have tried replacing gz with xz when calling warc.open, but warc in Python 3 does not support this format. I have found this approach, but I was not able to save a WARCRecord with it:
output_file = lzma.open("my_file.warc.xz", 'ab', preset=9)
header = warc.WARCHeader({
"WARC-Type": "response",
"WARC-Target-URI": "www.somelink.com",
}, defaults=True)
data = "Some string"
record = warc.WARCRecord(header, data.encode('utf-8','replace'))
output_file.write(record)
The error message is:
TypeError: a bytes-like object is required, not 'WARCRecord'
Thanks for any help.
The WARCRecord class has a write_to method, which writes the record to a file object. You could use that to write records to a file created with lzma.open(), that is, call record.write_to(output_file) instead of output_file.write(record).
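A minimal sketch of that idea follows. So that it runs without the warc package installed, it uses a hypothetical StandInRecord class that only mimics the write_to(f) method; with the real library you would pass WARCRecord objects instead. The helper name write_records_xz and the temporary file path are assumptions made for illustration, not part of warc.

```python
import lzma
import os
import tempfile


class StandInRecord:
    """Hypothetical stand-in for warc.WARCRecord; it only mimics write_to(f)."""

    def __init__(self, payload: bytes):
        self.payload = payload

    def write_to(self, f):
        # A real WARCRecord would also write its header here; this stand-in
        # writes only the payload followed by a blank-line separator.
        f.write(self.payload)
        f.write(b"\r\n\r\n")


def write_records_xz(path, records, preset=9):
    """Write each record into an xz-compressed file via its write_to method."""
    # lzma.open in binary mode returns a file object that accepts bytes,
    # so the record serializes itself instead of being passed to write().
    with lzma.open(path, "wb", preset=preset) as output_file:
        for record in records:
            record.write_to(output_file)


# Illustrative usage: write two records, then read the file back.
path = os.path.join(tempfile.mkdtemp(), "my_file.warc.xz")
write_records_xz(path, [StandInRecord(b"Some string"), StandInRecord(b"Another string")])

with lzma.open(path, "rb") as f:
    roundtrip = f.read()
```

The point of the design is that lzma only ever sees bytes: the record object decides how to serialize itself, which avoids the TypeError raised by passing a WARCRecord directly to write().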