For some testing purposes I have written a piece of code for measuring execution times of several fast operations in my real-time video processing code. And things are working fine. I am getting very realistic results, but i noticed one interesting peculiarity.
I am using a POSIX function clock_gettime
with CLOCK_MONOTONIC
attribute. So i am getting timespecs with nanosecond precision (1/1000000000sec) and it is said that getting a timespec value in that manner takes only several processor ticks.
Here are two functions that i am using for saving timespecs. I also added definitions of datastructures that are being used:
QVector<long> timeMemory;
QVector<std::string> procMemory;
timespec moment;
void VisionTime::markBegin(const std::string& action) {
if(measure){
clock_gettime(CLOCK_MONOTONIC, &moment);
procMemory.append(action + ";b");
timeMemory.append(moment.tv_nsec);
}
}
void VisionTime::markEnd(const std::string& action) {
if(measure){
clock_gettime(CLOCK_MONOTONIC, &moment);
procMemory.append(action + ";e");
timeMemory.append(moment.tv_nsec);
}
}
I am collecting the results into a couple of QVectors that are used later.
I noticed that when these two functions are executed for the first time(right after each other, having nothing between them), the difference between two saved timespecs is ~34000ns. Next time the difference is about 2 times smaller. And so on. If i execute them for hundreds of times then the average difference is ~2000ns. So an average recurrent execution of these functions takes about 17000x less time than the first one. As i am taking hundreds of measurements in a row, it does not really matter to me that some first executions last a little bit longer. But anyway it just interests me, why is it that way?
I have various experience in Java, but i am quite new to c++. I do not know much how things work here.
I am using O3 flag for optimization level.
My QMake conf:
QMAKE_CXXFLAGS += -O3 -march=native
So, can anyone tell, which part of this little code gets faster at runtime, how and why? I doubt appending to QVector. Does optimization affect this somehow?
It's my first question here on stackoverflow, hope it's not too long :) Many thanks for all your responses!
There are quite a few potential first-time costs in your measurement code, here's a couple and how you can test for them.
Memory allocation: Those QVectors won't have any memory allocated on the heap until the first time you use them.
Also, the vector will most likely start out by allocating a small amount of memory, then allocate exponentially more as you add more data (a standard compromise for containers like this). Therefore, you will have many memory allocations towards the beginning of your runtime, then the frequency will decrease over time.
You can verify that this is happening by looking at the return value of QVector::capacity()
, and tune the behavior by QVector::reserve(int)
- e.g. if you do timeMemory.reserve(10000);
, procMemory.reserve(10000);
, you can reserve enough space for the first ten thousand measurements before your measurements begin.
Lazy symbol binding: the dynamic linker by default won't resolve symbols from Qt (or other shared libraries) until they are needed. So, if these measuring functions are the first place in your code where some QVector
or std::string
functions are called, the dynamic linker will need to do some one-time work to resolve those functions, which takes time.
If this is indeed the case, you can disable the lazy loading by setting the environment variable LD_BIND_NOW=1
on Linux or DYLD_BIND_AT_LAUNCH=1
on Mac.