Multi-threaded publish with zmq

What is the best way to publish from many threads to a single subscriber in ZMQ? I have a web server, written in C++, that needs to send a ZMQ message to a subscriber for each incoming connection. These servers can have hundreds to thousands of concurrent connections, so I am curious what the most efficient way to write to this subscriber is.

I can think of three different ways:

Shared connection

I do not believe this will work, but the idea is to have a single connection pointer and pass it to all threads. I am assuming (I could not find much information when searching the ZMQ site) that this will cause a race condition, as I don't think zmq_msg_send() is thread safe. I could put a mutex around the send to solve this problem, but I am worried about speed, as my server needs to be as fast as possible.

Shared queue

This is similar to the shared connection, but instead of putting a mutex around zmq_msg_send(), I put a mutex around a shared vector of messages to send and have a dedicated write thread process all of these messages. I believe this will be faster than the previous method, since writing to a vector is probably much quicker than doing a zmq_msg_send(). However, if possible I would like to avoid waiting at all.
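
As a rough illustration of this option, here is a minimal sketch using only the standard library. The names `MessageQueue` and `runWriter` are made up for the example, and the `send` callback stands in for the real ZMQ send, which would run on a socket owned by the writer thread alone.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hand-off queue: many request threads push, one writer thread drains.
class MessageQueue {
public:
    // Called by request threads; the lock is held only for the push itself.
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(msg));
        }
        ready_.notify_one();
    }

    // Called by the writer thread; blocks until there is work or close() was called.
    // Hands back every pending message at once, so the lock is taken once per batch.
    std::vector<std::string> drain() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !pending_.empty() || closed_; });
        std::vector<std::string> batch;
        batch.swap(pending_);
        return batch;
    }

    // Wakes the writer so it can exit once the queue is empty.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::vector<std::string> pending_;
    bool closed_ = false;
};

// Writer loop: `send` is a placeholder for the real send call.
// Returns the number of messages passed to `send`.
template <typename Send>
std::size_t runWriter(MessageQueue& queue, Send send) {
    std::size_t sent = 0;
    for (;;) {
        std::vector<std::string> batch = queue.drain();
        if (batch.empty()) break;  // an empty batch only happens after close()
        for (const std::string& msg : batch) {
            send(msg);
            ++sent;
        }
    }
    return sent;
}
```

Request threads would call `push()` and carry on; a single writer thread would call `runWriter(queue, ...)` with a callable that does the actual send.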

Connection per thread

The only way I can think of to avoid waiting on a mutex is to open a ZMQ connection per thread (which means one per incoming connection, as I process one user connection per thread). This may be doable, although I do not know how zmq_connect works. Does it block until the connection is established? Ideally my flow would go something like this:
```
user_connection()
{
    createZMQConnection();
    doWork();
    sendData();        
}
```
However, if creating a connection blocks, it may be better to use the shared queue.

Has anyone made a similar application, or does anyone know what the recommended pattern would be?

Edit:

As for my infrastructure, I have many PUBs to a single SUB. The PUBs connect and the SUB binds.

I have multiple load-balanced boxes running multiple threads per box. Overall I am seeing about 30,000 messages/sec. Per box, I will probably see about 2,000-4,000 messages/sec. My usual processing time per request is about 40 ms, so I will often have about 100-200 concurrent instances open. I wasn't sure if opening 200 concurrent socket objects would be a problem, or if this would be the ZMQ way of doing things. From the sounds of things, I should be passing a zmq::context to each thread and creating the sockets there.

This is kind of how I imagine the code will be flowing, let me know if this looks right:

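```
#include <string>
#include "zhelpers.hpp"  // assumed source of s_send(): the zguide helper header, which pulls in zmq.hpp

std::string doWork();  // application-specific work, defined elsewhere

class doWorkClass
{
public:
    // Each worker owns its own PUB socket, created from the shared context.
    explicit doWorkClass(zmq::context_t &context) : socket(context, ZMQ_PUB)
    {
        // socket.connect(...) to the subscriber's endpoint would go here.
    }
    void run();

private:
    zmq::socket_t socket;
};

void doWorkClass::run()
{
    std::string sendString = doWork();
    s_send(socket, sendString);
}

// Runs on the thread that handles one user connection. The context is
// created once for the whole process and passed in, not created per call.
void receiveConnection(zmq::context_t &context)
{
    doWorkClass c(context);
    c.run();
}
```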

So I use one context for all sockets, and create one socket per thread, do my work, and send my message.

Solution

ZeroMQ way of thinking is a bit different

As seen from your ideas, you are trying to optimise resource use and to avoid race conditions when accessing those resources.

ZeroMQ may look "similar", but it works a bit differently. Forget about pointers, forget about blocking and similar issues.

ZeroMQ is rather an abstraction layer to rely on

This means you may create your many threads as PUB sides and have one SUB ( or several, in a load-balanced design ) that consumes the whole incoming event stream.

With the information given, the setup that seems to be needed is one where:

a central process ( with a known ip:port address, if distributed ) creates a SUB archetype, .bind()s it to that "central point" and sets its own subscription to "" ( so that nothing is filtered out )
each ad-hoc created thread instantiates its own PUB archetype and .connect()s it to the "central node"
you may want to add further signalling / state-control / own-protocol-handshaking messaging archetypes, operated in parallel, so as to meet your system-wide requirements.

Zero-sharing principle

ZeroMQ discourages sharing access point(s) to ZeroMQ socket(s) between threads. Doing so is not thread safe and should be avoided on principle.

Zero-blocking principle

ZeroMQ also discourages any use of "low-level" system signalling or blocking.

If needed, you can create another layer for thread-to-thread soft signalling and state control.

All your message flow can be designed as non-blocking ( and can deliver devilishly good performance, indeed ). As you indicated, your servers need speed.

To be a bit quantitative: what are your peak and sustained rates, in messages/msec and MB/msec?

You may find reasonable performance tests for the ZeroMQ layer, cited both here on Stack Overflow and on the ZeroMQ website, to have a base for comparison.