I have an infrequent but fairly consistent crash in my Qt 5.2.0 application which I am having a heck of a time diagnosing, but believe to be related to QSharedData
. The application is highly multi-threaded, which is presumably part of the problem.
The class in question is here:
class RouteData : public QSharedData
{
public:
RouteData() :
m_destAddress(0),
m_valid(false),
m_movingAverage(ROUTE_INITIAL_QUALITY)
{ }
RouteData(const RouteData &other) :
QSharedData(other),
m_destAddress(other.m_destAddress),
m_addresses(other.m_addresses),
m_valid(other.m_valid),
m_movingAverage(other.m_movingAverage),
m_lastSuccess(other.m_lastSuccess),
m_lastFailure(other.m_lastFailure)
{ }
~RouteData() { }
quint16 m_destAddress;
QList<quint16> m_addresses;
bool m_valid;
double m_movingAverage;
QDateTime m_lastSuccess;
QDateTime m_lastFailure;
};
class Route
{
public:
Route()
{
d = new RouteData;
}
Route(quint16 destAddress)
{
d = new RouteData;
d->m_destAddress = destAddress;
}
Route(const Route &other) : d(other.d) {}
QString toString() const;
bool isValid() const { return d->m_valid; }
quint16 destAddress() const { return d->m_destAddress; }
QList<quint16> addressList() const { return d->m_addresses; }
quint32 length() const { return d->m_addresses.length(); }
double quality() const { return d->m_movingAverage; }
quint64 msecsSinceLastFailure() const;
void setDestination(quint16 dest) { d->m_destAddress = dest; }
void setAddressList(const QList<quint16> &addressList);
void markResult(bool success);
bool operator<(const Route& other) const;
bool operator==(const Route& other) const;
bool operator!=(const Route& other) const;
private:
QSharedDataPointer<RouteData> d;
};
Q_DECLARE_TYPEINFO(Route, Q_MOVABLE_TYPE);
The crash occurs here, inserting a Route into a QMap
:
SendResult EmberGateway::ezspSendUnicast(quint16 indexOrDestination, ..., const Route& route)
{
...
if (route.isValid()) {
m_lastRouteMap.insert(indexOrDestination, route);
where m_lastRouteMap is a QMap<quint16, Route>
.
And the stack trace looks like this:
(gdb) where
#0 0x00007fa297ced9a8 in QTimeZone::~QTimeZone() () from /usr/local/Trolltech/Qt-5.2.0/lib/libQt5Core.so.5
#1 0x00007fa297c96de5 in QDateTime::~QDateTime() () from /usr/local/Trolltech/Qt-5.2.0/lib/libQt5Core.so.5
#2 0x00000000004644fb in RouteData::~RouteData (this=0x7fa28c00b150, __in_chrg=<optimized out>) at ../libILS/libILS/Route.h:33
#3 0x0000000000600805 in QSharedDataPointer<RouteData>::operator= (this=0x7fa28c0e6420, o=...)
at /usr/local/Trolltech/Qt-5.2.0/include/QtCore/qshareddata.h:98
#4 0x00000000005ff55b in Route::operator= (this=0x7fa28c0e6420) at ./libILS/Route.h:43
#5 0x0000000000652f8e in QMap<unsigned short, Route>::insert (this=0x7fa28c0880e8, akey=@0x7fa17c6feb44: 25504, avalue=...)
at /usr/local/Trolltech/Qt-5.2.0/include/QtCore/qmap.h:682
#6 0x0000000000641b4b in ils::EmberGateway::ezspSendUnicast (this=0x7fa28c088090, indexOrDestination=25504, apsFrame=...,
data=..., route=...) at libILS/Gateways/EmberGateway.cpp:909
#7 0x00000000006371d5 in ils::EmberGateway::sendUnicast (this=0x7fa28c088090, destAddress=25504, array=..., route=...)
at libILS/Gateways/EmberGateway.cpp:474
#8 0x00000000005fadc4 in NetworkController::sendMessageViaGateway (this=0x7fa28c03e9b0, message=...)
at libILS/Controllers/NetworkController.cpp:1668
#9 0x00000000005ed8f4 in NetworkController::dispatchMessage (this=0x7fa28c03e9b0, pendingMessagePair=...)
at libILS/Controllers/NetworkController.cpp:913
#10 0x0000000000604e2f in QtConcurrent::VoidStoredMemberFunctionPointerCall1<void, NetworkController, QPair<QSharedPointer<Message>, QTimer*>, QPair<QSharedPointer<Message>, QTimer*> >::runFunctor (this=0x7fa23804ac60)
at /usr/local/Trolltech/Qt-5.2.0/include/QtConcurrent/qtconcurrentstoredfunctioncall.h:410
#11 0x00000000005ff41e in QtConcurrent::RunFunctionTask<void>::run (this=0x7fa23804ac60)
at /usr/local/Trolltech/Qt-5.2.0/include/QtConcurrent/qtconcurrentrunbase.h:132
#12 0x00007fa297c55e52 in ?? () from /usr/local/Trolltech/Qt-5.2.0/lib/libQt5Core.so.5
#13 0x00007fa297c591c2 in ?? () from /usr/local/Trolltech/Qt-5.2.0/lib/libQt5Core.so.5
#14 0x00007fa297746f3a in start_thread () from /lib64/libpthread.so.0
#15 0x00007fa296c698fd in clone () from /lib64/libc.so.6
So at #5 we're doing the QMap::insert, and at #4 creating a copy (via Route "=" operator). At #3 the Qt code in question looks like this:
inline QSharedDataPointer<T> & operator=(const QSharedDataPointer<T> &o) {
if (o.d != d) {
if (o.d)
o.d->ref.ref();
T *old = d;
d = o.d;
if (old && !old->ref.deref())
delete old;
}
return *this;
}
We're hitting the "delete old" and seg faulting in the dtor for one of the QDateTime
's (actually it's private QTimeZone
member).
My ezspSendUnicast()
method can run fine for hundreds of thousands of iterations before crashing. I don't think I'm leaking memory (according to valgrind). I believe that the Route object I'm passing to ezspSendUnicast() is properly mutex protected, but it's possible I've missed something (but I thought the QSharedData
was thread-safe for copy-on-write, in any case).
Any insight on how to tackle this problem would be greatly appreciated!
The instances of QSharedData
, when accessed via a separate instance of QSharedDataPointer
, can be, in fact, accessed from multiple threads at once, and it's safe. The shared data pointer will atomically take a copy of the referenced data, as needed.
So, as long as you're working on your own instance of the Route
object, everything is fine. What you can't do is use a reference to an element held by a container. You can use a const-reference to a temporary object constructed from the one held by the container - that is, to a new instance.
Per the documentation:
Proper locking should be used when sharing an instance of an implicitly shared class between threads.
It is never correct to access a reference to a shared instance of an implicitly shared class without holding the lock.
You must always have a new instance. You can copy construct a temporary instance and then pass it via a const reference, for example as an argument in a function call. Such const-references-to-temporary-instances delay the destruction of the referenced temporary object until they are not needed anymore.
This applies to all implicitly shared classes in Qt 4 and Qt 5 -- whether coming from Qt proper (all containers!), or of your own design that leverages QSharedDataPointer
.
So, this would be correct - you keep your own instance.
Route r(...);
QMutexLocker l(&routeMapMutex);
routeMap.insert(key, r);
l.unlock();
r.setDestination(...);
QMutexLocker l(&routeMapMutex);
routeMap.insert(key, r);
l.unlock();
As is this - again, you have your own instance.
QMutexLocker l(&routeMapMutex);
Route r = routeMap[key];
l.unlock();
if (r.isValid()) ...
But this is certainly incorrect, even though the reference is const.
QMutexLocker l(&routeMapMutex);
const Route & r = routeMap[key];
l.unlock();
if (r.isValid()) ...
But this is correct, since it is a const reference to a temporary instance, whose lifetime is prolonged as needed. So, you reference a new instance that nobody but you have access to.
QMutexLocker l(&routeMapMutex);
const Route & r = Route(routeMap[key]);
l.unlock();
if (r.isValid()) ...
This is also correct, since the call is guarded by the mutex:
void fun(const Route &);
QMutexLocker l(&routeMapMutex);
fun(routeMap[key]);
l.unlock();
My hunch is that somewhere in your code you access the reference to the value in the map (as opposed to a const-ref to a temporary) without holding the mutex.
It's also conceivable that you are having some other memory or threading bug that manifests itself in an unrelated place.