c++ windows memory malloc

How much memory does a process on Windows really use?


Platform characteristics:

Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz 1.80 GHz
8GB RAM
Windows 10 Visual Studio
MSVC compiler

I wrote the following C++ code, ran the executable through the Microsoft .NET API (class System.Diagnostics.Process), and printed statistics about the process at regular intervals.

#include <iostream>
#include <chrono>
#include <cstdlib>   // malloc, free, rand, abs
#include <thread>

int main()
{
   long long n = 1'000'000'000;
   std::cout << "part 1 started" << std::endl;
   char* arr = static_cast<char*>(malloc(n));
   std::this_thread::sleep_for(std::chrono::milliseconds(500));
   arr[n - 1] = rand();
   std::this_thread::sleep_for(std::chrono::milliseconds(500));
   std::cout << "part2 started" << std::endl;
   for (long long i = n / 5.0 * 4.0; i < n; i++)
      arr[i] = rand();
   std::this_thread::sleep_for(std::chrono::milliseconds(500));
   std::cout << "part3 started" << std::endl;
   for (long long i = n/5; i < n; i++)
      arr[i] = rand();
   std::this_thread::sleep_for(std::chrono::milliseconds(500));
   long long tmpll = 0;
   for (int i = 0; i < 100000000; i++)
   {
      tmpll = 0;
      if (rand() % 2)
         tmpll = 2'400'000'000;
      if (rand() % 2)
         tmpll += 2'400'000'000;
      if (rand() % 2)
         tmpll = 2'400'000'000;
      tmpll += rand();
      tmpll += rand();
      if (tmpll < 0) tmpll = abs(tmpll);
      tmpll %= n;
      arr[tmpll] = rand();
   }
   std::cout << arr[rand() % n] << std::endl;
   std::this_thread::sleep_for(std::chrono::milliseconds(2000));
   std::cout << "ended" << std::endl;
   free(arr);
   std::cout << "freed" << std::endl;
}

I printed the PrivateMemorySize64, WorkingSet64, and PeakWorkingSet64 properties. PrivateMemorySize64 matches the amount I asked to allocate, but PeakWorkingSet64 is always smaller (usually no more than 80% of the allocated space).

I read that Windows automatically moves rarely used pages to disk, so that would be fine. But I have now disabled the page file (malloc now fails for 2 GB, whereas with the page file enabled it succeeded for 8 GB on a system with 3 GB of free RAM, and the code also ran correctly with a 3 GB PeakWorkingSet) and I still get this result:

PrivateMemorySize64 = 1002602496
WorkingSet64 = 803287040
PeakWorkingSet64 = 803291136

How is that possible? I cannot imagine an optimization that avoids actually storing 1 GB of memory when I write to random places in the array for a long time. I thought that even if I did not access the whole array and only printed a random element, all 1 GB had to be stored.

Please tell me what Windows 10 really does here.

I rebooted the computer after disabling the page file. Then I turned the page file back on and got the same result:

PrivateMemorySize64 = 1002516480
WorkingSet64 = 803246080
PeakWorkingSet64 = 803299328

Solution

  • If you want to understand the numbers, read Mark Russinovich's book Windows Internals. Part 1 describes processes and memory management.

    You need to distinguish between virtual memory, which is what your process has access to, and physical memory, which is the RAM of your PC.

    Some of the virtual memory must also be in RAM, because the CPU needs to work with it. This part is called the working set. The rest of the virtual memory is either swapped to disk, or may even be non-existent yet (reserved only).
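
    To illustrate why the working set can stay below the allocation size: a page of freshly allocated memory generally enters the working set only once it has been touched. The sketch below is an estimate, not a measurement. It counts the pages covered by the largest sequential write loop in the question, the byte range [n/5, n), assuming a 4 KB page size. The helper name pagesTouched is made up for this illustration.

    ```cpp
    #include <cstdint>
    #include <iostream>

    // Number of pages that contain at least one byte of the half-open
    // byte range [begin, end), for the given page size.
    std::uint64_t pagesTouched(std::uint64_t begin, std::uint64_t end,
                               std::uint64_t pageSize)
    {
        if (begin >= end) return 0;
        const std::uint64_t firstPage = begin / pageSize;
        const std::uint64_t lastPage = (end - 1) / pageSize;
        return lastPage - firstPage + 1;
    }

    int main()
    {
        constexpr std::uint64_t n = 1'000'000'000;  // size passed to malloc in the question
        constexpr std::uint64_t pageSize = 4096;    // assumed; the real value is dwPageSize

        // "Part 3" of the question writes every byte in [n/5, n).
        const std::uint64_t pages = pagesTouched(n / 5, n, pageSize);
        std::cout << "pages touched: " << pages << '\n';
        std::cout << "estimated resident bytes: " << pages * pageSize << '\n';
    }
    ```

    Under these assumptions the estimate is 195313 pages, or 800002048 bytes, which is in the same range as the WorkingSet64 values in the question. It ignores the random writes in the last loop as well as the process's own code, stacks, and heap bookkeeping.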

    Address space is reserved at a certain allocation granularity (typically 64 KB), while memory is committed and swapped to disk in units of a page (typically 4 KB).
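
    As a small worked example of what these units mean for a request like the one in the question, the following sketch rounds 1,000,000,000 bytes up to whole pages and to whole granularity units. The 4 KB and 64 KB values are assumed typical values; on a real system they can be read from the dwPageSize and dwAllocationGranularity members filled in by GetSystemInfo. The helper name roundUp is made up for this illustration.

    ```cpp
    #include <cstdint>
    #include <iostream>

    // Round size up to the next multiple of unit (unit must be non-zero).
    std::uint64_t roundUp(std::uint64_t size, std::uint64_t unit)
    {
        return (size + unit - 1) / unit * unit;
    }

    int main()
    {
        constexpr std::uint64_t request = 1'000'000'000;
        constexpr std::uint64_t pageSize = 4 * 1024;      // assumed typical page size
        constexpr std::uint64_t granularity = 64 * 1024;  // assumed typical granularity

        std::cout << "rounded to pages:       " << roundUp(request, pageSize) << '\n';
        std::cout << "rounded to granularity: " << roundUp(request, granularity) << '\n';
    }
    ```

    With these values the request occupies 1000001536 bytes when rounded to pages and 1000013824 bytes when rounded to the granularity, so the reported sizes are never expected to match the requested size exactly.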

    For your C++ code, it matters

    So, from the code only, it's not possible to tell how much memory it will use.

    As for the 2 GB allocation of a single block, this may fail due to address space fragmentation. E.g. a 32-bit application may access 4 GB of virtual memory at most. Now consider a DLL being loaded at exactly the 2 GB boundary. This means that there's less than 2 GB below the DLL and less than 2 GB above the DLL. Thus, you cannot allocate a contiguous block of 2 GB.
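
    The fragmentation argument can be sketched as a search for the largest free gap in an address space. The layout below is hypothetical: a 4 GiB space, an assumed 64 KiB region at the bottom that cannot be allocated, and an assumed 1 MiB DLL loaded at exactly the 2 GiB boundary. The names Region and largestFreeBlock are made up for this illustration and are not part of any Windows API.

    ```cpp
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct Region { std::uint64_t begin; std::uint64_t end; };  // half-open [begin, end)

    // Largest contiguous free range in [0, spaceSize), given the occupied regions.
    std::uint64_t largestFreeBlock(std::vector<Region> used, std::uint64_t spaceSize)
    {
        std::sort(used.begin(), used.end(),
                  [](const Region& a, const Region& b) { return a.begin < b.begin; });
        std::uint64_t best = 0;
        std::uint64_t cursor = 0;  // end of the highest occupied address seen so far
        for (const Region& r : used) {
            if (r.begin > cursor) best = std::max(best, r.begin - cursor);
            cursor = std::max(cursor, r.end);
        }
        if (spaceSize > cursor) best = std::max(best, spaceSize - cursor);
        return best;
    }

    int main()
    {
        constexpr std::uint64_t KiB = 1ull << 10;
        constexpr std::uint64_t MiB = 1ull << 20;
        constexpr std::uint64_t GiB = 1ull << 30;

        // Hypothetical layout, see the assumptions above.
        const std::vector<Region> used = {
            {0, 64 * KiB},               // assumed unusable low region
            {2 * GiB, 2 * GiB + MiB},    // assumed DLL at the 2 GiB boundary
        };
        const std::uint64_t best = largestFreeBlock(used, 4 * GiB);
        std::cout << "largest free block: " << best << " bytes\n";
        std::cout << "fits a 2 GiB block: " << (best >= 2 * GiB ? "yes" : "no") << '\n';
    }
    ```

    In this layout the largest gap is 2 GiB minus 64 KiB, so a single 2 GiB block does not fit even though almost 4 GiB are free in total.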

    Some notes on the code:

    In order to fill a complete array, this code may be used:

    #include <memory>
    #include <cstdlib>
    #include <Windows.h>
    #include <Psapi.h>
    #include <iostream>
    int main()
    {
        constexpr auto SIZE = 3'000'000'000ull;
        auto arr = std::unique_ptr<char[]>(new char[SIZE]);
    
        // Find the page size
        SYSTEM_INFO sysInfo;
        GetSystemInfo(&sysInfo);
        auto pageSize = sysInfo.dwPageSize;
    
        // Fill the array
        // Accessing one byte per page is enough
        for (auto i = 0ull; i < SIZE; i+=pageSize)
            arr[i] = static_cast<char>(rand());
    
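        // Get memory statistics (may need Psapi.lib at link time, depending on PSAPI_VERSION)
        PROCESS_MEMORY_COUNTERS pmc{};
        if (!GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
        {
            std::cerr << "GetProcessMemoryInfo failed\n";
            return 1;
        }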
        std::cout << "Working Set Size: " << pmc.WorkingSetSize << '\n';
        std::cout << "Peak Working Set Size: " << pmc.PeakWorkingSetSize << '\n';
        std::cout << "Page File Usage: " << pmc.PagefileUsage << '\n';
        std::cout << "Peak Page File Usage: " << pmc.PeakPagefileUsage << '\n';
    }
    

    Output on my machine (compiled as C++20, 64-bit release build, run without a debugger, page file disabled and machine rebooted):

    Working Set Size: 3003518976
    Peak Working Set Size: 3003523072
    Page File Usage: 3008671744
    Peak Page File Usage: 3008692224
    

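    Looks good to me. Everything is in physical RAM, so the working set size is 3 GB. Interestingly, it also reports a page file usage of 3 GB (despite its name, the PagefileUsage member is the commit charge of the process, not bytes written to a page file), although the page file size is set to 0 MB across all drives: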

    [Screenshot: 0 MB page file size]