I´m using Apache NiFi to ingest and preprocess some CSV files, but when runing during a long time, it always fails. The error is always the same:
FlowFile Repository failed to update
Searching at logs, I see this error always:
2018-07-11 22:42:49,913 ERROR [Timer-Driven Process Thread-10] o.a.n.p.attributes.UpdateAttribute UpdateAttribute[id=c7f45dc9-ee12-31b0-8dee-6f1746b3c544] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update: org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update
org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update
at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:405)
at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:336)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:28)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: **Cannot update journal file ./flowfile_repository/journals/8772495.journal because this journal has already been closed**
at org.apache.nifi.wali.LengthDelimitedJournal.checkState(LengthDelimitedJournal.java:223)
at org.apache.nifi.wali.LengthDelimitedJournal.update(LengthDelimitedJournal.java:178)
at org.apache.nifi.wali.SequentialAccessWriteAheadLog.update(SequentialAccessWriteAheadLog.java:121)
at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:300)
at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:257)
What makes me believe that the root cause is that Nifi Cannot update journal file ./flowfile_repository/journals/8772495.journal because this journal has already been closed**, as seen on logs file.
How can I solve this issue?
Thanks!
If NiFi is having issues writing to the journal file there are a few things to check.
Are you reading in fields from your CSV that are large (greater than 64kb) and attempting to assign them to attributes? You may want to consider processing that particular field in your CSV as a separate flowfile and matching it with attributes later. See this mailing list discussion for more information.
Have you checked NiFi's configuration against the best practices listed in the administration guide? I also recommend understanding each of the Flowfile repository settings. It will allow you to ask more targeted questions.
It is likely worth updating your JVM settings to allow for larger file processing. Check out this post on Hortonworks detailing best practices for high performance systems.
In order to solve the problem you may need to tweak a couple of things. Does the flow handle the CSV in an efficient manner? Does NiFi have enough memory to do what it needs to with the data? Would it be more appropriate to handle the CSV files as records? If that concept is unfamiliar check out this post that introduces record processing in NiFi. I hope some of these resources help you get a little closer to a solution. If you have a follow up question let me know.