Tags: pdf, ms-word, activemq-classic, opennlp

Can I use a message broker to stream PDF or MS Word document content as XML?


I am trying to send the content of MS Word documents and PDFs to Apache OpenNLP. I am wondering whether I can use ActiveMQ to read the MS Word document so that I can trigger a process in Apache Kafka to process the stream.

Any suggestions for streaming PDF or Word content by means other than ActiveMQ are welcome.


Solution

  • If you use ActiveMQ "Classic" (i.e. any 5.x version), you'll have problems moving large messages because there's no real support for that use case. However, ActiveMQ Artemis (i.e. ActiveMQ's next-generation broker) supports arbitrarily large messages, which would suit your use case. The nice thing about having large message support in the broker is that you don't have to involve some other kind of storage mechanism in your solution, which makes developing and maintaining your application and environment a bit simpler.
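
To make the trade-off concrete, here is a minimal sketch of the extra work an application might take on when the broker cannot carry a whole document in one message: splitting the document into chunks on the producer side and reassembling them on the consumer side. This is only an illustration under stated assumptions. `DocumentChunker`, `split`, and `join` are hypothetical names, the code uses only the Java standard library (Java 9+ for `readNBytes`), and it does not call any ActiveMQ, Artemis, or Kafka API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of ActiveMQ, Artemis, or Kafka.
final class DocumentChunker {

    private DocumentChunker() {}

    // Producer side: reads the document stream in pieces of at most chunkSize
    // bytes, so each piece could travel as the body of one ordinary message.
    static List<byte[]> split(InputStream in, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        int filled;
        // readNBytes fills the buffer unless the stream ends first;
        // it returns 0 once the stream is exhausted.
        while ((filled = in.readNBytes(buffer, 0, chunkSize)) > 0) {
            chunks.add(Arrays.copyOf(buffer, filled));
        }
        return chunks;
    }

    // Consumer side: concatenates the chunks, which are assumed to arrive
    // complete and in their original order.
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

In a real system each chunk would be sent as its own message, and the application would also have to carry ordering and completeness information (for example a document ID, a sequence index, and a total count) and handle lost or duplicated chunks. Broker-side large message support, as described above for Artemis, is what spares you that bookkeeping.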