I am working on a program where two or more (GStreamer) boost::threads and the same number of boost::threads of a dummy application simultaneously use a queue. The queue is used for synchronization between each GStreamer thread's tasks and its corresponding dummy-application thread.
The queue is an EVENT queue, where EVENT is a structure:
typedef struct EVENT {
    EVENT_TYPE Ev_Type; // EVENT_TYPE is an enum of events
    EVENT_DATA Ev_Data; // EVENT_DATA is a union of the data stored for that event
} Event_;
While googling, I came across these two options for the queue: boost::lockfree::queue and boost::lockfree::spsc_queue, with suggestions that lockfree queues are what to use in multithreaded applications.
CONFUSION: Why the name lockFREE? Is it suggesting that the queue cannot be (mutex) locked?
Also, see this example; it says "boost::lockfree::queue is not lockfree". Mind = blown...
Well, I then tried to follow the example (linked above) and implement this queue:
class Foo {
protected:
    boost::lockfree::queue<EVENT> mqSttEventQueue;
public:
    unsigned int SetEventIntoQueue(EVENT *psEvent);
};
And its definition:
unsigned int Foo::SetEventIntoQueue(EVENT *psEvent) {
    if (mqSttEventQueue.push(*psEvent)) {
        // notify that the event is in the queue
    }
    return 0;
}
This compiles successfully, but I am completely running in the dark here.
QUESTIONS:
Why does the example declare the queue as

boost::lockfree::queue<int> queue(128);

What is that 128 there for? Is it saying the queue size is 128 (bytes? items?)? Does queue<int> declare the type of the data in the queue?
Why didn't it work for my program?

boost::lockfree::queue<EVENT> mqSttEventQueue(128);

If I declare it like this, I get a compilation error:

error: expected identifier before numeric constant
     boost::lockfree::queue<EVENT> mqSttEventQueue(128);
                                                   ^~~
PS: I am really not sure what title to put here... Edit it if you can.
Why the name lockFREE? Is it suggesting it cannot be (mutex) locked?
Of course anything can be locked; you put the mutex outside the data structure and have every thread that touches the data structure use it.
boost::lockfree::queue provides unsynchronized_pop and unsynchronized_push, for use in cases where you've ensured that only one thread can be accessing the queue.
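For example, during a single-threaded setup or drain phase (a minimal sketch; the setup-phase framing is mine, but unsynchronized_push and unsynchronized_pop are real member functions):

#include <boost/lockfree/queue.hpp>

int main() {
    boost::lockfree::queue<int> q(128);

    // Only this thread touches q here, so the cheaper unsynchronized
    // variants are safe to use.
    for (int i = 0; i < 100; ++i)
        q.unsynchronized_push(i);   // returns false if the queue is full

    int value;
    while (q.unsynchronized_pop(value)) {
        // drain, still single-threaded
    }
}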
But the main purpose of lockfree::queue, and of lockless algorithms / data structures in general, is that they don't need to be locked: multiple threads can safely write and/or read at the same time.
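A minimal sketch of that concurrent use, with one producer and one consumer thread sharing a queue (the element count and the done flag are just illustration):

#include <boost/lockfree/queue.hpp>
#include <atomic>
#include <iostream>
#include <thread>

boost::lockfree::queue<int> q(128);
std::atomic<bool> done{false};

void producer() {
    for (int i = 0; i < 1000; ++i)
        while (!q.push(i)) {}   // retry if push fails (e.g. queue full)
    done = true;
}

void consumer() {
    int value, count = 0;
    while (!done.load())
        while (q.pop(value)) ++count;
    while (q.pop(value)) ++count;  // drain items pushed before `done` was set
    std::cout << "consumed " << count << " items\n";
}

int main() {
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
}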
"lock free" has 2 meanings in programming, leading to potentially confusing but true statements like "this lockless algorithm is not lock-free".
Casual usage: a synonym for lockless - implemented without mutexes, using atomic loads, stores, and RMW operations like CAS or std::atomic::fetch_add. See for example An Introduction to Lock-Free Programming (Jeff Preshing), and maybe What every systems programmer should know about concurrency.
std::shared_ptr uses lockless atomic operations to manage its control block, and C++11 std::atomic<> provides lockless primitives for custom algorithms; see stdatomic. Normally in C++11, unsynchronized access to the same variable by multiple threads is undefined behaviour (unless the accesses are all read-only). But std::atomic gives you well-defined behaviour, with your choice of sequential-consistency, acquire/release, or relaxed memory ordering.
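For instance, here's the standard release/acquire publication pattern with std::atomic (a textbook sketch, not tied to boost::lockfree):

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> data{0};
std::atomic<bool> ready{false};

void writer() {
    data.store(42, std::memory_order_relaxed);
    ready.store(true, std::memory_order_release);      // publish
}

void reader() {
    while (!ready.load(std::memory_order_acquire)) {}  // wait for publish
    // acquire pairs with the release store above, so the write to
    // `data` is guaranteed visible here
    assert(data.load(std::memory_order_relaxed) == 42);
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}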
Technical computer-science meaning: a thread sleeping forever or being killed won't block the rest of the threads; i.e. guaranteed forward progress for the program as a whole (at least one thread). (Wait-free is the stronger guarantee that threads never even have to retry.) See https://en.wikipedia.org/wiki/Non-blocking_algorithm. A CAS retry loop is a classic example of lock-free but not wait-free. Wait-free is stuff like RCU (read-copy-update) read threads, or, depending on definitions, an atomic fetch_add on hardware that implements it as a single primitive (e.g. x86 xadd), not in terms of an LL/SC or CAS retry loop.
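Here's what such a CAS retry loop looks like, using an atomic-max operation as a stand-in example (the function name is mine). It's lock-free because a CAS only fails when some other thread's CAS succeeded, so the program as a whole makes progress; it's not wait-free because any single thread may keep retrying under contention:

#include <atomic>

// Lock-free but not wait-free: some thread always makes progress,
// but this particular thread may loop indefinitely under contention.
void atomic_max(std::atomic<int>& target, int value) {
    int current = target.load(std::memory_order_relaxed);
    while (current < value &&
           !target.compare_exchange_weak(current, value)) {
        // on failure, compare_exchange_weak reloads `current`; try again
    }
}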
Most lockless multi-reader / multi-writer queues are not lock-free in the technical sense. Usually they use a circular buffer, and a writer will "claim" an entry somehow (fixing its order in the queue), but the entry can't be read until the writer finishes writing to it.
See Lock-free Progress Guarantees for an example with analysis of its possible blocking behaviour: a writer atomically increments a write-index, then writes data into the array entry. If the writer sleeps between those two steps, other writers can fill up the later entries while readers are stuck waiting for that claimed-but-not-yet-written entry. (I haven't looked at boost::lockfree::queue<T>, but presumably it's similar¹.) In practice performance is excellent, with very low contention between writers and readers, but in theory a writer could block at just the wrong moment and stall the whole queue.
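To make the claim-then-publish hazard concrete, here's an illustrative ring buffer (emphatically not boost's actual implementation; a single consumer is assumed and full-queue handling is omitted):

#include <atomic>
#include <cstddef>

struct Slot {
    std::atomic<bool> ready{false};
    int data;
};

constexpr std::size_t N = 128;
Slot ring[N];
std::atomic<std::size_t> write_idx{0};
std::size_t read_idx = 0;   // single consumer assumed

void push(int value) {
    std::size_t i = write_idx.fetch_add(1) % N;  // step 1: claim a slot
    // if this thread sleeps here, the slot stays claimed but unwritten
    ring[i].data = value;                        // step 2: write the data
    ring[i].ready.store(true, std::memory_order_release);  // publish
}

bool pop(int& out) {
    Slot& s = ring[read_idx % N];
    if (!s.ready.load(std::memory_order_acquire))
        return false;   // stuck until the claiming writer finishes
    out = s.data;
    s.ready.store(false, std::memory_order_relaxed);
    ++read_idx;
    return true;
}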
Footnote 1: The other plausible option for a queue is a linked list. In that case you can fully construct a new node and then attempt to CAS it into the list, so if you succeed at adding it, other threads can read it right away because its pointers are already set correctly.
But the reclaim problem (safely freeing memory that other threads might still be reading) is extremely thorny outside of garbage-collected languages / environments (e.g. Java).
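For contrast, here's a sketch of the push side of the linked-list approach, in the style of a Treiber stack (the names are mine); pop is omitted precisely because deciding when a removed node can be freed is the hard part:

#include <atomic>

struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> head{nullptr};

// The node is fully constructed before the CAS, so the moment the CAS
// succeeds, other threads can safely read it.
void push(int value) {
    Node* n = new Node{value, head.load(std::memory_order_relaxed)};
    while (!head.compare_exchange_weak(n->next, n,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
        // on failure, n->next was reloaded with the current head; retry
    }
}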
boost::lockfree::queue<int> queue(128);
Why 128?
That's the queue's (maximum) size, in entries - entries of int in this case, because you used queue<int>. As mentioned above, most lockless queues use a fixed-size circular buffer. It can't realloc-and-copy like std::vector when it needs to grow, because other threads can be reading it simultaneously.
As documented in the manual (the first Google hit for boost::lockfree::queue), the explicit queue(size_type) constructor takes a size.
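That also explains the compile error in the question: a default member initializer inside a class can't use parentheses, so either use braces or pass the size in a constructor's member-init list. A sketch (the stand-in EVENT definitions are just to make it self-contained):

#include <boost/lockfree/queue.hpp>

// Minimal stand-ins for the question's types:
enum EVENT_TYPE { EV_NONE };
union EVENT_DATA { int i; };
struct EVENT {
    EVENT_TYPE Ev_Type;
    EVENT_DATA Ev_Data;
};

class Foo {
protected:
    // braces work where parentheses don't for an in-class initializer:
    boost::lockfree::queue<EVENT> mqSttEventQueue{128};
public:
    // alternatively: Foo() : mqSttEventQueue(128) {}
};

int main() { Foo f; }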
You could also bake the capacity into the type, by using it as a template parameter. (So the capacity becomes a compile-time constant everywhere that uses the queue, not just in places that can do constant-propagation from the constructor call.)
The class apparently doesn't enforce / require a power-of-2 size, so a template size parameter could maybe optimize significantly better, by letting % capacity operations compile into an AND with a mask instead of a division.
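With Boost, baking in the capacity looks like this (boost::lockfree::capacity is the policy for it; a queue declared this way is default-constructed, with no runtime size argument):

#include <boost/lockfree/queue.hpp>
#include <boost/lockfree/policies.hpp>

// Capacity is now a compile-time constant of the type itself:
boost::lockfree::queue<int, boost::lockfree::capacity<128>> fixed_q;

int main() {
    fixed_q.push(42);   // same push/pop interface as the runtime-sized queue
    int v;
    fixed_q.pop(v);
}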