import shapefile
data = shapefile.Reader("data_file.shp")
shapes = data.shapes()
My problem is that getting the shapes from the Shapefile reader raises a MemoryError exception
when using PyShp.
The .shp
file is quite large, at 1.2 GB. But I am using only 3% of my machine's 32 GB of memory, so I don't understand it.
Is there any other approach that I can take? Can I process the file in chunks in Python? Or use some tool to split the file into chunks, then process each of them individually?
Although I haven't been able to test it, PyShp should be able to read the file regardless of its size or your memory limits. Creating the Reader
instance doesn't load the entire file, only the header information.
It seems the problem here is that you used the shapes()
method, which reads all shape information into memory at once. That usually isn't a problem, but with files this big it is. As a general rule, you should instead use the iterShapes()
method, which reads the shapes one at a time.
import shapefile
data = shapefile.Reader("data_file.shp")
for shape in data.iterShapes():
    # do something with each shape...
    pass
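If you specifically want to work in chunks rather than one shape at a time, you can batch any iterator yourself with itertools.islice; there is no need to split the file on disk. Below is a minimal sketch of that batching pattern. The chunked() helper is hypothetical (not part of PyShp), and a plain range iterator stands in for data.iterShapes() so the example is self-contained:

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterator."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Stand-in for data.iterShapes(); with PyShp you would pass the real iterator.
fake_shapes = iter(range(10))
sizes = [len(batch) for batch in chunked(fake_shapes, 4)]
print(sizes)  # three batches: two full ones and a final partial one
```

With the real reader you would write `for batch in chunked(data.iterShapes(), 10000): ...`, which keeps at most one batch of shapes in memory at a time.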