Hi, I'm trying to run a pipeline to process a very large file (about 4 million records). Every time it reaches around 270,000 records, it fails, stops processing any more records, and returns this error:
'/FileLocation/FiLeNAME..DAT' at position '93167616': com.streamsets.pipeline.lib.dirspooler.BadSpoolFileException: com.streamsets.pipeline.api.ext.io.OverrunException: Reader exceeded the read limit '131072'.
If anyone else has experienced a similar issue, please help. Thank you.
I have checked the lines where the pipeline stops, but there seems to be nothing obvious there. I tried another file and it is still not working.
'/FileLocation/FiLeNAME..DAT' at position '93167616': com.streamsets.pipeline.lib.dirspooler.BadSpoolFileException: com.streamsets.pipeline.api.ext.io.OverrunException: Reader exceeded the read limit '131072'.
Looks like you're hitting the maximum record size. This limit is in place to guard against badly formatted data causing 'out of memory' errors.
Check your data format configuration and increase Max Record Length, Maximum Object Length, Max Line Length, etc., depending on the data format you are using.
See the Directory Origin documentation for more detail. Note in particular that you may have to edit sdc.properties if the records you are parsing are bigger than the system-wide limit of 1048576 bytes.
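As a rough sketch of that system-wide change, assuming the parser.limit property documented for Data Collector and a 5 MB limit chosen purely as an example (the exact path to sdc.properties depends on your install, and Data Collector needs a restart after editing):

    # sdc.properties (location varies by installation)
    # Raises the maximum record size any origin can parse.
    # Default is 1048576 bytes (1 MB); this example allows 5 MB.
    parser.limit=5242880

With that in place, the stage-level setting (Max Record Length / Max Line Length / Max Object Length on the origin's Data Format tab) can then be raised above 1 MB to match your largest records.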