Assuming I have:
How can I get the HTML content of that page?
Thanks for your time and attention.
With warcio, this is simply:
warcio extract --payload <file.warc.gz> <offset>
Alternatively, fetch only the WARC record with an HTTP range request, then extract the payload at offset 0 of the downloaded file:
curl -s -r331727487-$((331727487+6613-1)) \
https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2020-40/segments/1600400203096.42/warc/CC-MAIN-20200922031902-20200922061902-00310.warc.gz \
>warc_temp.warc.gz
warcio extract --payload warc_temp.warc.gz 0
The byte range is inclusive: it starts at offset and ends at offset+length-1. See also getting WARC file
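If you would rather do the same thing from Python, here is a minimal sketch of the range-request approach. It assumes the `requests` and `warcio` packages are installed; the helper names `byte_range` and `fetch_payload` are made up for illustration and are not part of either library.

```python
def byte_range(offset, length):
    """Build an inclusive HTTP Range header value for one WARC record."""
    # The last byte is offset+length-1 because HTTP ranges are inclusive.
    return f"bytes={offset}-{offset + length - 1}"


def fetch_payload(warc_url, offset, length):
    """Fetch a single WARC record and return its HTTP payload as bytes.

    Hypothetical helper: returns None if the range holds no response record.
    """
    # Third-party imports are kept local so byte_range() works without them.
    import io
    import requests
    from warcio.archiveiterator import ArchiveIterator

    resp = requests.get(
        warc_url,
        headers={"Range": byte_range(offset, length)},
        timeout=60,
    )
    resp.raise_for_status()

    # The downloaded bytes are one gzip member holding one WARC record,
    # so the record sits at offset 0 of this in-memory "file".
    for record in ArchiveIterator(io.BytesIO(resp.content)):
        if record.rec_type == "response":
            # content_stream() yields the body without the HTTP headers.
            return record.content_stream().read()
    return None
```

Calling `fetch_payload(url, 331727487, 6613)` with the WARC URL from the curl command above should return the page body as bytes; decoding it to text depends on the page's charset, which this sketch does not try to detect.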