I am using Apache NiFi 1.28 and trying to build a minimal data flow that generates data and ingests it into HDFS on HDP (Hortonworks Data Platform) 2.5.0. I am getting the following error:
2024-10-31 12:19:20,860 ERROR [Timer-Driven Process Thread-10] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=d2b7ad77-0192-1000-e937-9d1797c4e0fd] Failed to write to HDFS
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configurable
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1023)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:579)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:579)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:579)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:579)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:529)
at java.base/java.lang.Class.forName(Class.java:508)
at org.apache.nifi.processors.hadoop.ExtendedConfiguration.getClassByNameOrNull(ExtendedConfiguration.java:70)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2617)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:182)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.getCompressionCodec(AbstractHadoopProcessor.java:605)
at org.apache.nifi.processors.hadoop.PutHDFS$1.run(PutHDFS.java:341)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:400)
at java.base/javax.security.auth.Subject.doAs(Subject.java:453)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1930)
at org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:328)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1361)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:247)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:102)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configurable
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
... 37 common frames omitted
2024-10-31 12:19:21,002 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@2cb95c6c checkpointed with 2 Records and 0 Swap Files in 49 milliseconds (Stop-the-world time = 21 milliseconds, Clear Edit Logs time = 14 millis), max Transaction ID 105
2024-10-31 12:19:24,899 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:19:24,908 INFO [pool-7-thread-1] o.a.n.wali.SequentialAccessWriteAheadLog Checkpointed Write-Ahead Log with 4 Records and 0 Swap Files in 9 milliseconds (Stop-the-world time = 4 milliseconds), max Transaction ID 292
2024-10-31 12:19:24,909 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 9 milliseconds
2024-10-31 12:19:29,755 INFO [NiFi Web Server-41] o.a.n.c.s.StandardProcessScheduler Stopping PutHDFS[id=d2b7ad77-0192-1000-e937-9d1797c4e0fd]
2024-10-31 12:19:29,755 INFO [NiFi Web Server-41] o.a.n.controller.StandardProcessorNode Stopping processor: PutHDFS[id=d2b7ad77-0192-1000-e937-9d1797c4e0fd]
2024-10-31 12:19:29,757 INFO [Timer-Driven Process Thread-10] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutHDFS[id=d2b7ad77-0192-1000-e937-9d1797c4e0fd] to run
2024-10-31 12:19:29,757 INFO [NiFi Web Server-41] o.a.n.c.s.StandardProcessScheduler Stopping GenerateFlowFile[id=d2b75046-0192-1000-10ed-4952d26261bc]
2024-10-31 12:19:29,757 INFO [NiFi Web Server-41] o.a.n.controller.StandardProcessorNode Stopping processor: GenerateFlowFile[id=d2b75046-0192-1000-10ed-4952d26261bc]
2024-10-31 12:19:29,757 INFO [Timer-Driven Process Thread-3] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling GenerateFlowFile[id=d2b75046-0192-1000-10ed-4952d26261bc] to run
2024-10-31 12:19:29,758 INFO [Timer-Driven Process Thread-3] o.a.n.controller.StandardProcessorNode GenerateFlowFile[id=d2b75046-0192-1000-10ed-4952d26261bc] has completely stopped. Completing any associated Futures.
2024-10-31 12:19:29,762 WARN [org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner] org.apache.hadoop.fs.FileSystem Cleaner thread interrupted, will stop
java.lang.InterruptedException: null
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1722)
at java.base/java.lang.ref.ReferenceQueue.await(ReferenceQueue.java:67)
at java.base/java.lang.ref.ReferenceQueue.remove0(ReferenceQueue.java:158)
at java.base/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:234)
at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:4157)
at java.base/java.lang.Thread.run(Thread.java:1570)
2024-10-31 12:19:29,763 INFO [Timer-Driven Process Thread-10] o.a.n.controller.StandardProcessorNode PutHDFS[id=d2b7ad77-0192-1000-e937-9d1797c4e0fd] has completely stopped. Completing any associated Futures.
2024-10-31 12:19:30,327 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@7e46c2ea // Another save pending = false
2024-10-31 12:19:44,914 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:19:44,914 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 0 milliseconds
2024-10-31 12:20:04,921 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:20:04,921 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 0 milliseconds
2024-10-31 12:20:20,880 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Successfully deleted 0 files (0 bytes) from archive
2024-10-31 12:20:20,881 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Archive cleanup completed for container default; will now allow writing to this container. Bytes used = 184.4 GB, bytes free = 1.02 GB, capacity = 185.42 GB
2024-10-31 12:20:24,934 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:20:24,934 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 0 milliseconds
2024-10-31 12:20:44,938 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:20:44,938 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 0 milliseconds
2024-10-31 12:21:04,949 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:21:04,950 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 0 milliseconds
2024-10-31 12:21:20,921 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Successfully deleted 0 files (0 bytes) from archive
2024-10-31 12:21:20,922 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Archive cleanup completed for container default; will now allow writing to
Below is the PutHDFS processor configuration.
Below is the whole data flow in the process group, along with the error.
I am also adding a minimal core-site.xml, exported from HDP 2.5, which I placed in the /conf directory and whose path I configured (along with hdfs-site.xml) in the PutHDFS processor:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.136.131:8020</value>
<final>true</final>
</property>
<property>
<name>fs.trash.interval</name>
<value>360</value>
</property>
<property>
<name>ha.failover-controller.active-standby-elector.zk.op.retries</name>
<value>120</value>
</property>
<property>
<name>hadoop.http.authentication.simple.anonymous.allowed</name>
<value>true</value>
</property>
<property>
<name>hadoop.proxyuser.falcon.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.falcon.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hbase.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hbase.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hcat.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hcat.hosts</name>
<value>sandbox.hortonworks.com</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>sandbox.hortonworks.com</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.livy.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.livy.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>sandbox.hortonworks.com</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>sandbox.hortonworks.com</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>DEFAULT</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>false</value>
</property>
<property>
<name>hadoop.security.key.provider.path</name>
<value></value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>io.serializations</name>
<value>org.apache.hadoop.io.serializer.WritableSerialization</value>
</property>
<property>
<name>ipc.client.connect.max.retries</name>
<value>50</value>
</property>
<property>
<name>ipc.client.connection.maxidletime</name>
<value>30000</value>
</property>
<property>
<name>ipc.client.idlethreshold</name>
<value>8000</value>
</property>
<property>
<name>ipc.server.tcpnodelay</name>
<value>true</value>
</property>
<property>
<name>mapreduce.jobtracker.webinterface.trusted</name>
<value>false</value>
</property>
<property>
<name>net.topology.script.file.name</name>
<value>/etc/hadoop/conf/topology_script.py</value>
</property>
<property>
<name>hadoop.proxyuser.nifi.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.nifi.hosts</name>
<value>*</value>
</property>
</configuration>
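These files are wired into the flow through the PutHDFS processor's Hadoop Configuration Resources property. As a rough sketch (the file paths and the target directory below are illustrative examples, not copied from my canvas), the relevant processor properties look like this:

Hadoop Configuration Resources: ./conf/core-site.xml,./conf/hdfs-site.xml
Directory: /user/nifi/data   (example target path)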
I would be grateful if someone could point out where I am going wrong.
I have resolved all these issues by switching from HDP 2.5 to a standalone Hadoop installation on my Windows 11 machine. HDP 2.5 is no longer supported, which is why I was unable to resolve the internal IPs of the datanodes during replication. I have placed the following hdfs-site.xml and core-site.xml in the Apache NiFi installation's conf directory, i.e. C:\nifi-1.28.0-bin\nifi-1.28.0\conf.
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9820</value>
</property>
</configuration>
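Note that fs.default.name is the legacy key for this setting; newer Hadoop releases report it as deprecated in favor of fs.defaultFS. Assuming the same NameNode host and port as above, the equivalent property would be:

<property>
  <name>fs.defaultFS</name>
  <!-- same host/port as the fs.default.name value above -->
  <value>hdfs://localhost:9820</value>
</property>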
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///C:/hadoopsetup/hadoop-3.2.4/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///C:/hadoopsetup/hadoop-3.2.4/data/dfs/datanode</value>
</property>
</configuration>
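With both files in the NiFi conf directory, the PutHDFS Hadoop Configuration Resources property on the Windows setup points at them, roughly as follows (using the installation path mentioned above):

Hadoop Configuration Resources: C:\nifi-1.28.0-bin\nifi-1.28.0\conf\core-site.xml,C:\nifi-1.28.0-bin\nifi-1.28.0\conf\hdfs-site.xml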
For the Hadoop installation I followed this tutorial:
https://apsaggu.wordpress.com/2023/06/29/installation-of-apache-hadoop-on-windows-11/