
Reading Tarfile from URL


I'm trying to read a tar file directly from a URL.

This is mostly about scraping data from a website. I also tried opening the file with gzip, but that produces the same error. Please suggest a solution.

import tarfile
from io import BytesIO
import urllib.request as urllib2

rt = urllib2.urlopen("https://opentender.eu/data/files/CY_ocds_data.json.tar.gz").read()
csvzip = tarfile.open(BytesIO(rt),mode='r:gz')

This produces a TypeError:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-2ed9e3f5bdd6> in <module>()
      4 import urllib.request as urllib2
      5 rt = urllib2.urlopen("https://opentender.eu/data/files/CY_ocds_data.json.tar.gz").read()
----> 6 csvzip = tarfile.open(BytesIO(rt),mode='r:gz')
      7 # csvzip.printdir()

2 frames
/usr/lib/python3.7/gzip.py in __init__(self, filename, mode, compresslevel, fileobj, mtime)
    166             mode += 'b'
    167         if fileobj is None:
--> 168             fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
    169         if filename is None:
    170             filename = getattr(fileobj, 'name', '')

TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO

Solution

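  • tarfile.open treats its first positional argument as name, a filesystem path. The BytesIO object therefore ends up being passed to builtins.open (as the traceback shows), which raises the TypeError. Pass the in-memory buffer with the fileobj keyword argument instead:

    csvzip = tarfile.open(fileobj=BytesIO(rt), mode='r:gz')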
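
For reference, here is a slightly fuller sketch of the same fix. The helper names (open_tar_from_bytes, fetch_tar, make_sample_archive) are made up for illustration and are not part of tarfile. The sample archive is built in memory only so the snippet can be tried without network access; it says nothing about the contents of the real file at the URL in the question.

```python
import io
import tarfile
import urllib.request


def open_tar_from_bytes(data, mode="r:gz"):
    """Open a tar archive held in memory (hypothetical helper).

    The buffer must go in through the fileobj keyword; passed
    positionally it would be interpreted as a path name.
    """
    return tarfile.open(fileobj=io.BytesIO(data), mode=mode)


def fetch_tar(url):
    """Download the whole archive into memory, then open it (hypothetical helper)."""
    with urllib.request.urlopen(url) as resp:
        return open_tar_from_bytes(resp.read())


def make_sample_archive():
    """Build a tiny .tar.gz in memory so the helper can be tried offline."""
    payload = b'{"hello": "world"}'
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="sample.json")
        info.size = len(payload)  # size must be set before addfile
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    # Offline demonstration using the in-memory sample archive.
    with open_tar_from_bytes(make_sample_archive()) as tar:
        for member in tar.getmembers():
            print(member.name, member.size)
```

With network access, the URL from the question should work the same way, e.g. fetch_tar("https://opentender.eu/data/files/CY_ocds_data.json.tar.gz"), after which getnames() and extractfile() can be used to list and read the members.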