I am currently working on a pipeline that loads and transforms many images at once (1440), so the memory footprint is quite heavy. I therefore tried to implement a memory management system based on setrlimit. However, it doesn't seem to affect the spawned threads (std::thread): they happily ignore the limit, which I know from calls to getrlimit() in the threaded functions, and eventually cause my program to be killed. Here is the code I use for setting the limit:
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <sys/resource.h>

void setMemoryLimit(std::uint64_t bytes)
{
    struct rlimit limit;
    getrlimit(RLIMIT_AS, &limit);
    if(bytes <= limit.rlim_max)
    {
        limit.rlim_cur = bytes;
        std::cout << "New memory limit: " << limit.rlim_cur << " bytes" << std::endl;
    }
    else
    {
        // the requested value exceeds the hard limit, so fall back to the hard limit
        limit.rlim_cur = limit.rlim_max;
        std::cout << "WARNING: Memory limit couldn't be set to " << bytes << " bytes" << std::endl;
        std::cout << "New memory limit: " << limit.rlim_cur << " bytes" << std::endl;
    }
    // perror appends ": <error text>" itself, so the message carries no trailing colon
    if(setrlimit(RLIMIT_AS, &limit) != 0)
        std::perror("WARNING: memory limit couldn't be set");
    // included for debugging purposes
    struct rlimit tmp;
    getrlimit(RLIMIT_AS, &tmp);
    std::cout << "Tmp limit: " << tmp.rlim_cur << " bytes" << std::endl; // prints the correct limit
}
I'm using Linux. The man page states that setrlimit affects the whole process so I'm kind of clueless why the threads don't seem to be affected.
Edit: By the way, the function above is called at the very beginning of main().
The problem was quite hard to find as it consisted of two entirely independent components:
First, my executable was compiled with -fomit-frame-pointer. In my tests this resulted in the limit being reset, as the following example shows:
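/* rlimit.cpp */
#include <iostream>
#include <thread>
#include <vector>
#include <sys/resource.h>

class A
{
public:
    // Print the soft address-space limit as seen by the calling thread.
    void foo()
    {
        struct rlimit limit;
        getrlimit(RLIMIT_AS, &limit);
        std::cout << "Limit: " << limit.rlim_cur << std::endl;
    }
};

int main()
{
    struct rlimit limit;
    // NOTE: only rlim_cur is assigned; rlim_max is left uninitialized and the
    // return value of setrlimit is not checked.
    limit.rlim_cur = 500 * 1024 * 1024;
    setrlimit(RLIMIT_AS, &limit);
    // Prints the requested value from the local struct, not a value read back.
    std::cout << "Limit: " << limit.rlim_cur << std::endl;

    A a; // must outlive the threads that call a.foo()
    std::vector<std::thread> t;
    for(int i = 0; i < 5; i++)
        t.push_back(std::thread(&A::foo, &a));
    for(auto& thread : t) // std::thread is not copyable, so iterate by reference
        thread.join();
    return 0;
}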
Outputs:
> g++ -std=c++11 -pthread -fomit-frame-pointer rlimit.cpp -o limit
> ./limit
Limit: 524288000
Limit: 18446744073709551615
Limit: 18446744073709551615
Limit: 18446744073709551615
Limit: 18446744073709551615
Limit: 18446744073709551615
> g++ -std=c++11 -pthread rlimit.cpp -o limit
> ./limit
Limit: 524288000
Limit: 524288000
Limit: 524288000
Limit: 524288000
Limit: 524288000
Limit: 524288000
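When reproducing a test like this, it may help to rule out a silently failing setrlimit call. setrlimit returns -1 with EINVAL when rlim_cur is greater than rlim_max, and the first "Limit:" line above comes from the local struct, not from the kernel. The following is only a minimal sketch under those assumptions: setSoftLimit and softLimitSeenByThread are hypothetical helper names that do not appear in the original code. They fill both fields via getrlimit before changing rlim_cur, report errors, and read the limit back from inside a std::thread.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <iostream>
#include <thread>
#include <sys/resource.h>

// Hypothetical helper: set the soft limit of `resource` to `requested`,
// clamped to the current hard limit. Returns true on success.
bool setSoftLimit(int resource, rlim_t requested)
{
    struct rlimit limit;
    // Read both fields first so rlim_max is never passed uninitialized.
    if (getrlimit(resource, &limit) != 0) {
        std::cerr << "getrlimit failed: " << std::strerror(errno) << std::endl;
        return false;
    }
    limit.rlim_cur = (requested <= limit.rlim_max) ? requested : limit.rlim_max;
    assert(limit.rlim_cur <= limit.rlim_max); // holds after clamping
    if (setrlimit(resource, &limit) != 0) {
        std::cerr << "setrlimit failed: " << std::strerror(errno) << std::endl;
        return false;
    }
    return true;
}

// Hypothetical helper: query the soft limit from inside a new std::thread.
// Returns 0 if getrlimit fails in the thread.
rlim_t softLimitSeenByThread(int resource)
{
    rlim_t seen = 0;
    std::thread worker([&seen, resource] {
        struct rlimit limit;
        if (getrlimit(resource, &limit) == 0)
            seen = limit.rlim_cur;
    });
    worker.join(); // join before reading `seen`
    return seen;
}
```

Calling setSoftLimit(RLIMIT_AS, 500 * 1024 * 1024) at the start of main() and then comparing softLimitSeenByThread(RLIMIT_AS) against the requested value shows whether the limit was actually applied before any thread was started.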
Second, for the image processing part I work with OpenCL. Apparently NVIDIA's implementation calls setrlimit and raises the soft limit (rlim_cur) to rlim_max.
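If a library changes the limit behind the program's back like this, one possible workaround is to check the limit again after the library has been initialized and set it back if it differs. This is a sketch, not a confirmed fix: reapplySoftLimit is a hypothetical helper name, and which OpenCL call triggers the change is not stated here, so where to call the helper is an assumption that would need to be verified.

```cpp
#include <cassert>
#include <sys/resource.h>

// Hypothetical helper: if the soft limit of `resource` no longer equals
// `expected`, set it back. Returns true if the soft limit equals `expected`
// when the function returns.
bool reapplySoftLimit(int resource, rlim_t expected)
{
    struct rlimit limit;
    if (getrlimit(resource, &limit) != 0)
        return false;
    if (limit.rlim_cur == expected)
        return true;   // nothing was changed
    if (expected > limit.rlim_max)
        return false;  // a soft limit cannot exceed the hard limit
    assert(expected <= limit.rlim_max);
    limit.rlim_cur = expected;
    return setrlimit(resource, &limit) == 0;
}
```

A call such as reapplySoftLimit(RLIMIT_AS, bytes) could be placed right after the OpenCL setup code, with bytes being the value originally passed to setMemoryLimit. Lowering a soft limit needs no special privileges, so this works for an unprivileged process as long as the expected value does not exceed the hard limit.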