The Erlang Interoperability guide discusses several interoperability mechanisms. Here are my conclusions:
Ports and Erl_Interface programs: scheduled by the OS, which limits scalability.
Port Drivers: dangerous because a crash in the port driver brings the emulator down too.
C Nodes: the node server needs to scale as well as the Erlang application does, otherwise scalability is sacrificed.
NIFs: Loic sums them up well.
Some advocate the use of OpenCL, basically delegating resource-hungry computations to the GPU while letting the Erlang emulator own the CPU. This sounds fantastic, but it means your servers are required to have a suitable GPU.
Using JInterface and communicating with a Java process that spawns a thread for every request might be an option.
So has anyone come across a solution that has been tested in practice and turned out to work well?
Actually, all of these solutions have their place. Having worked closely with some of them, I can say the following:
Ports are safe, but port communication is slow. If the port crashes, the VM keeps working. If you do not communicate with your port extensively, or you do not trust the code running in the port, this is your choice.
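To make concrete what port communication involves: an external port program exchanges messages with the VM over its standard input and output. Below is a minimal sketch of the C side, assuming the port is opened with the `{packet, 2}` option so that every message is preceded by a 2-byte big-endian length. The function names (`read_packet`, `write_packet`, `serve`) are made up for this illustration, and the echo behaviour is only a placeholder for real work.

```c
#include <stdio.h>
#include <stddef.h>

/* Read one frame: a 2-byte big-endian length followed by the payload.
   Returns the payload length, or -1 on EOF, short read or oversized frame. */
long read_packet(FILE *in, unsigned char *buf, size_t cap)
{
    unsigned char hdr[2];
    if (fread(hdr, 1, 2, in) != 2)
        return -1;
    size_t len = ((size_t)hdr[0] << 8) | hdr[1];
    if (len > cap || fread(buf, 1, len, in) != len)
        return -1;
    return (long)len;
}

/* Write one frame with the same 2-byte length header, then flush so the
   peer sees the reply immediately. Returns 0 on success, -1 on error. */
int write_packet(FILE *out, const unsigned char *buf, size_t len)
{
    if (len > 0xFFFF)
        return -1; /* does not fit in a 2-byte header */
    unsigned char hdr[2] = { (unsigned char)(len >> 8),
                             (unsigned char)(len & 0xFF) };
    if (fwrite(hdr, 1, 2, out) != 2 || fwrite(buf, 1, len, out) != len)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}

/* Placeholder main loop: echo every frame back until the input closes. */
int serve(FILE *in, FILE *out)
{
    unsigned char buf[65535];
    long n;
    while ((n = read_packet(in, buf, sizeof buf)) >= 0) {
        if (write_packet(out, buf, (size_t)n) != 0)
            return 1;
    }
    return 0;
}
```

A program built around this would call `serve(stdin, stdout)` from `main`, and the Erlang side would start it with something like `open_port({spawn, "./echo_port"}, [{packet, 2}, binary])` (the executable name is hypothetical). Every message is copied through a pipe and crosses a process boundary, which is where the cost of port communication comes from, and also why a crash in this program cannot take the VM down.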
NIFs are extremely fast. If you move a lot of data back and forth, you should use them. Of course they are unsafe, so you have to write the NIF library carefully, and you had better learn some C (a step that most NIF authors skip). The scheduling problems are easily overcome with a specific pattern: as soon as the NIF receives its data from Erlang, it starts a new C thread that does the actual work, detaching the processing from the Erlang scheduler thread. The NIF function therefore returns to Erlang very quickly, and the Erlang process then waits for a message from the C code.
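The shape of that hand-off can be sketched outside the VM. The code below is not a NIF: it is plain C with POSIX threads, and a small mailbox stands in for the Erlang process waiting on a message. All names (`submit_job`, `mailbox_t`, `heavy_work`) are hypothetical. In an actual NIF the reply step would be done with `enif_send` from `erl_nif.h`, and any data the worker needs would have to be copied out of the NIF arguments before returning, since terms in the calling environment are only valid until the NIF returns.

```c
#include <pthread.h>
#include <stdlib.h>

/* Stand-in for "send a message to the Erlang process". */
typedef void (*reply_fn)(void *ctx, long result);

typedef struct {
    long input;
    reply_fn reply;
    void *ctx;
} job_t;

/* Placeholder for the expensive computation: sum of 1..n. */
static long heavy_work(long n)
{
    long sum = 0;
    for (long i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Runs on its own thread, so the caller is never blocked by the work. */
static void *worker(void *arg)
{
    job_t *job = arg;
    job->reply(job->ctx, heavy_work(job->input));
    free(job);
    return NULL;
}

/* Plays the role of the NIF function: copy the input, start a detached
   thread, return at once. Returns 0 if the job was handed off, -1 if not. */
int submit_job(long input, reply_fn reply, void *ctx)
{
    job_t *job = malloc(sizeof *job);
    if (job == NULL)
        return -1;
    job->input = input;
    job->reply = reply;
    job->ctx = ctx;
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, job) != 0) {
        free(job);
        return -1;
    }
    pthread_detach(tid);
    return 0;
}

/* One-slot mailbox playing the role of the Erlang process's receive. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t ready;
    int has_value;
    long value;
} mailbox_t;

/* Matches reply_fn: store the result and wake up the waiting side. */
void mailbox_put(void *ctx, long result)
{
    mailbox_t *mb = ctx;
    pthread_mutex_lock(&mb->lock);
    mb->value = result;
    mb->has_value = 1;
    pthread_cond_signal(&mb->ready);
    pthread_mutex_unlock(&mb->lock);
}

/* Block until a result has been delivered, then return it. */
long mailbox_take(mailbox_t *mb)
{
    pthread_mutex_lock(&mb->lock);
    while (!mb->has_value)
        pthread_cond_wait(&mb->ready, &mb->lock);
    mb->has_value = 0;
    long v = mb->value;
    pthread_mutex_unlock(&mb->lock);
    return v;
}
```

Build it with `-pthread`. The point to notice is that `submit_job` does nothing except copy its input and start the thread, so the calling side regains control almost immediately and collects the result later, much as the Erlang process would in a `receive`.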
Java nodes or C nodes are for tasks that can be moved to the node completely. These are typically long, heavy jobs.
Bearing the above considerations in mind, choose the approach that fits your task best.