"pthread_join" doesn't return on a just cancelled thread (with "pthread_cancel")

I have a pool of threads (QueueWorkers class) in my program that are released using this logic:

int QueueWorkers::stop()
{
  for (unsigned int ix = 0; ix < threadIds.size(); ++ix)
  {
    pthread_cancel(threadIds[ix]);
    pthread_join(threadIds[ix], NULL);
  }

  return 0;
}

where threadIds is a class variable of type std::vector<pthread_t>.

This logic works most of the time, but testing shows that it occasionally fails. In particular, sometimes after pthread_cancel executes, the pthread_join statement on the next line never returns and my program hangs.

As far as I understand, calling pthread_join on a cancelled thread should always return. Are there any circumstances that could prevent this, or any way of debugging what is going on here? Is my approach to releasing threads upon termination the right one?

Additional information: Threads have a cancellation handler (registered using pthread_cleanup_push) which frees dynamic memory used by the thread to avoid leaks. Under normal circumstances, the handler is called upon pthread_cancel and works fine, but whenever pthread_join fails to return I have checked that the cancellation handler is not invoked.

Thanks in advance!

EDIT: as suggested in the question comments, I have modified my code to check the return value of pthread_cancel. It is always 0, whether or not the subsequent pthread_join returns as expected.

EDIT2: as requested in a comment on this question, here is more detail on how it works.

The pool of threads is initialized by the start() method:

int QueueWorkers::start()
{
  // numberOfThreads and pQueue are class variables
  for (int i = 0; i < numberOfThreads; ++i)
  {
    pthread_t  tid;
    pthread_create(&tid, NULL, workerFunc, pQueue);  
    threadIds.push_back(tid);
  }

  return 0;
}

The thread start routine workerFunc() is as follows (simplified):

static void* workerFunc(void* pQueue)
{
  // Initialize some dynamic objects (Foo for simplification)
  Foo* foo = initFoo();

  // Set pthread_cancel handler
  pthread_cleanup_push(workerFinishes, foo);

  // Loop forever
  for (;;)
  {
    // Wait for new item to process on pQueue
    ... paramsV = ((Queue*) pQueue)->pop();

    // Then process it
    ...
  }

  // The next statement never executes, but compilation breaks without it. See this note in pthread.h:
  // "pthread_cleanup_push and pthread_cleanup_pop are macros and must always be used in
  // matching pairs at the same nesting level of braces".
  pthread_cleanup_pop(0);
}

Note the pthread_cleanup_push() statement before starting the eternal loop. This is done to implement the cleanup logic upon cancellation for the Foo object:

static void workerFinishes(void* curl)
{
  freeFoo((Foo*) curl);
}

I hope I have not over-simplified the code. In any case, you can see the original version here.

Solution

Are you sure the thread ever reaches a cancellation point, or that its cancellation type is asynchronous?

From the pthread_cancel man page:

A thread's cancellation type, determined by pthread_setcanceltype(3), may be either asynchronous or deferred (the default for new threads). Asynchronous cancelability means that the thread can be canceled at any time (usually immediately, but the system does not guarantee this). Deferred cancelability means that cancellation will be delayed until the thread next calls a function that is a cancellation point. A list of functions that are or may be cancellation points is provided in pthreads(7).
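
To illustrate the deferred case, here is a minimal sketch. The names busyWorker and cancelBusyWorker are hypothetical and not taken from the question's code. The worker never blocks, so the only place it can act on a pending cancellation request is the explicit pthread_testcancel() call. Without that call, pthread_cancel would still return 0 but pthread_join would wait forever, which matches the symptom described. Whether Queue::pop() can block somewhere that is not a cancellation point (pthread_mutex_lock, for instance, is not one) is an assumption to check, since the Queue code is not shown.

```cpp
#include <pthread.h>
#include <atomic>

// Hypothetical counter, only here to give the loop some work to do.
static std::atomic<unsigned long> iterations{0};

// Hypothetical CPU-bound worker: it never calls a blocking function.
static void* busyWorker(void*)
{
  for (;;)
  {
    ++iterations;          // pure computation, not a cancellation point
    pthread_testcancel();  // explicit cancellation point for deferred cancellation
  }
}

// Creates the worker, cancels it and joins it.
// Returns true if the thread was joined and reported as cancelled.
static bool cancelBusyWorker()
{
  pthread_t tid;
  if (pthread_create(&tid, NULL, busyWorker, NULL) != 0)
    return false;

  if (pthread_cancel(tid) != 0)
    return false;

  void* result = NULL;
  if (pthread_join(tid, &result) != 0)
    return false;

  // A cancelled thread's exit status is PTHREAD_CANCELED.
  return result == PTHREAD_CANCELED;
}
```

Passing a non-NULL second argument to pthread_join, as above, also lets you confirm that the thread really ended through cancellation rather than a normal return.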

I don't think cancelling threads is the best way to make sure that a thread will finish. Instead, you could send the thread a message telling it to stop, and make sure the thread receives and handles that message.
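
A rough sketch of that "stop message" idea, under several assumptions: StoppableQueue, cooperativeWorker and runAndStop are hypothetical names, the queue holds plain int items for simplicity, and the real Queue class from the question may look quite different. The point is only that pop() reports when the queue has been stopped, so each worker leaves its loop and returns normally.

```cpp
#include <pthread.h>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-in for the question's Queue class.
class StoppableQueue
{
public:
  StoppableQueue() : stopped(false)
  {
    pthread_mutex_init(&mtx, NULL);
    pthread_cond_init(&cond, NULL);
  }
  ~StoppableQueue()
  {
    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mtx);
  }

  void push(int item)
  {
    pthread_mutex_lock(&mtx);
    items.push(item);
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mtx);
  }

  // Blocks until an item is available or stop() has been called.
  // Returns false once the queue is stopped and fully drained.
  bool pop(int* item)
  {
    pthread_mutex_lock(&mtx);
    while (items.empty() && !stopped)
      pthread_cond_wait(&cond, &mtx);
    bool gotItem = !items.empty();
    if (gotItem)
    {
      *item = items.front();
      items.pop();
    }
    pthread_mutex_unlock(&mtx);
    return gotItem;
  }

  // The "stop message": wakes every waiting worker.
  void stop()
  {
    pthread_mutex_lock(&mtx);
    stopped = true;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mtx);
  }

private:
  pthread_mutex_t mtx;
  pthread_cond_t cond;
  std::queue<int> items;
  bool stopped;
};

static void* cooperativeWorker(void* arg)
{
  StoppableQueue* queue = static_cast<StoppableQueue*>(arg);
  std::intptr_t processed = 0;
  int item;
  while (queue->pop(&item))
    ++processed;  // placeholder for real processing
  // Normal return: per-thread resources can be freed right here.
  return reinterpret_cast<void*>(processed);
}

// Starts the workers, pushes items, stops the queue and joins.
// Returns the total number of items the workers processed.
static long runAndStop(int numThreads, int numItems)
{
  StoppableQueue queue;
  std::vector<pthread_t> threadIds;
  for (int i = 0; i < numThreads; ++i)
  {
    pthread_t tid;
    if (pthread_create(&tid, NULL, cooperativeWorker, &queue) == 0)
      threadIds.push_back(tid);
  }
  for (int i = 0; i < numItems; ++i)
    queue.push(i);

  queue.stop();  // no pthread_cancel needed

  long total = 0;
  for (unsigned int ix = 0; ix < threadIds.size(); ++ix)
  {
    void* result = NULL;
    pthread_join(threadIds[ix], &result);
    total += static_cast<long>(reinterpret_cast<std::intptr_t>(result));
  }
  return total;
}
```

For example, runAndStop(4, 100) starts four workers, queues 100 items and then stops the queue. In this sketch stop() lets the workers drain whatever is still queued before they exit; discarding pending items instead would be an equally valid design choice.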