Tags: tcp, apache-nifi, firewall, gateway, flowfile

Connect one NiFi cluster to another over a unidirectional network / data diode / restricted network


I need to connect two Apache NiFi instances / clusters in a restricted network environment. The whole exchange must work over a unidirectional network / data diode that only supports TCP packets from client to server and ACK packets from the server. Both FlowFile attributes and content should be transferred as is.

Apache NiFi provides the site-to-site feature to connect one instance / cluster to another, which is exactly what I'm looking for. It provides multiple protocol implementations, such as raw sockets or HTTP(S). However, as it also exchanges data from server to client, it sadly does NOT work over a unidirectional network / data diode.

Besides site-to-site, NiFi provides processors to transfer data using TCP, namely the PutTCP and ListenTCP processor pair. However, PutTCP only transfers the FlowFile content and NOT the FlowFile attributes. Moreover, ListenTCP does NOT turn the content of each sent FlowFile into exactly one FlowFile with the same content; instead, it splits the incoming data on a configured delimiter.
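
To illustrate the delimiter problem, here is a minimal sketch of delimiter-based framing in general. It is not ListenTCP's actual code; the function name `split_on_delimiter` and the newline delimiter are assumptions made for the example.

```python
def split_on_delimiter(stream_bytes, delimiter=b"\n"):
    """Delimiter-based framing: every delimiter in the stream ends a message."""
    return [part for part in stream_bytes.split(delimiter) if part]


# One FlowFile whose content happens to contain the delimiter...
content = b"first line\nsecond line"

# ...arrives as two separate messages on the receiving side.
messages = split_on_delimiter(content + b"\n")
```

Any content that contains the delimiter byte sequence is cut into several pieces, so arbitrary (for example binary) content cannot be carried through unchanged this way.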

Is there an easy way to transfer FlowFiles, that is, both attributes and content, from one NiFi instance / cluster to another over a unidirectional network / data diode?


Solution

  • As of NiFi 1.21.0 there seems to be no built-in solution for the problem outlined above. The remaining options are custom implementations: either use one of the processors that can execute arbitrary code, such as the ExecuteScript processor, or write a custom processor.

    A processor pair inspired by ListenTCP / PutTCP that transfers both FlowFile attributes and content as is can solve the problem outlined above. There is an existing but stalled community project that aims to solve exactly this problem. I ended up forking that project to create "nifi-flow-over-tcp", which has evolved since then. Feel free to use / contribute to the project if you face a similar problem.

    It provides a PutFlowToTCP and ListenFlowFromTCP processor pair, which uses a simple TCP-based protocol to transfer both FlowFile attributes and content over plain TCP sockets.

    It basically encodes the FlowFile into a format that starts with the byte length of the attributes and the byte length of the content, followed by the attributes (as UTF-8 JSON) and the content bytes. On the receiving side, the data is decoded and a new FlowFile containing all attributes and content is created. Note, however, that due to implementation restrictions of the NiFi processor API, the uuid of the FlowFile is NOT retained.
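
The length-prefixed encoding described above can be sketched roughly as follows. This is only an illustration of the idea, not the project's actual wire format: the header field widths (4 bytes for the attributes length, 8 bytes for the content length), the big-endian byte order, and the names `encode_flowfile`, `decode_flowfile` and `read_exact` are all assumptions made for this example.

```python
import io
import json
import struct

# Assumed header layout (hypothetical, not taken from nifi-flow-over-tcp):
# 4-byte big-endian attributes length, then 8-byte big-endian content length.
HEADER = struct.Struct(">IQ")


def encode_flowfile(attributes, content):
    """Build one frame: header, attributes as UTF-8 JSON, then content bytes."""
    attr_bytes = json.dumps(attributes).encode("utf-8")
    return HEADER.pack(len(attr_bytes), len(content)) + attr_bytes + content


def read_exact(stream, n):
    """Read exactly n bytes, since a stream may return fewer bytes per read."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended in the middle of a frame")
        buf.extend(chunk)
    return bytes(buf)


def decode_flowfile(stream):
    """Read one frame from a binary stream; return (attributes, content)."""
    attr_len, content_len = HEADER.unpack(read_exact(stream, HEADER.size))
    attributes = json.loads(read_exact(stream, attr_len).decode("utf-8"))
    content = read_exact(stream, content_len)
    return attributes, content


# Round trip through an in-memory stream standing in for the TCP socket.
frame = encode_flowfile({"filename": "a.txt"}, b"line 1\nline 2")
attrs, body = decode_flowfile(io.BytesIO(frame))
```

Because the receiver knows both lengths up front, it never has to scan for a delimiter, so content containing newlines or arbitrary binary data survives unchanged. The receiver also never needs to send application data back, which is what makes this kind of framing suitable for a one-way link.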