Platform characteristics:
Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz 1.80 GHz
8GB RAM
Windows 10 Visual Studio
MSVC compiler
I wrote the following code in C++, then ran the executable through the Microsoft .NET API (class System.Diagnostics.Process) and printed statistics about the process at regular intervals.
#include <iostream>
#include <chrono>
#include <thread>

int main()
{
    long long n = 1'000'000'000;
    std::cout << "part 1 started" << std::endl;
    char* arr = static_cast<char*>(malloc(n));
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    arr[n - 1] = rand();
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    std::cout << "part2 started" << std::endl;
    for (long long i = n / 5.0 * 4.0; i < n; i++)
        arr[i] = rand();
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    std::cout << "part3 started" << std::endl;
    for (long long i = n/5; i < n; i++)
        arr[i] = rand();
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    long long tmpll = 0;
    for (int i = 0; i < 100000000; i++)
    {
        tmpll = 0;
        if (rand() % 2)
            tmpll = 2'400'000'000;
        if (rand() % 2)
            tmpll += 2'400'000'000;
        if (rand() % 2)
            tmpll = 2'400'000'000;
        tmpll += rand();
        tmpll += rand();
        if (tmpll < 0) tmpll = abs(tmpll);
        tmpll %= n;
        arr[tmpll] = rand();
    }
    std::cout << arr[rand() % n] << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(2000));
    std::cout << "ended" << std::endl;
    free(arr);
    std::cout << "freed" << std::endl;
}
I printed the PrivateMemorySize64, WorkingSet64, and PeakWorkingSet64 fields. PrivateMemorySize64 shows the amount I asked to allocate, but PeakWorkingSet64 is always smaller (usually no more than 80% of the allocated space).
I read that Windows automatically moves rarely used pages to disk, so that would be OK. But I have now disabled the page file (malloc now fails for 2 GB, whereas with the page file enabled it succeeded for 8 GB on a system with 3 GB of free RAM, and the code also ran correctly with a 3 GB PeakWorkingSet) and I still get this result:
PrivateMemorySize64 = 1002602496
WorkingSet64 = 803287040
PeakWorkingSet64 = 803291136
How is that possible? I cannot imagine an optimization that avoids actually storing the full 1 GB, given that I write to random places in the array for a long time. I thought that even if I did not access the whole array and just printed a random element, all 1 GB would have to be stored.
Please tell me what Windows 10 actually does here.
I rebooted the computer after disabling the page file. Then I turned the page file back on and got the same result:
PrivateMemorySize64 = 1002516480
WorkingSet64 = 803246080
PeakWorkingSet64 = 803299328
If you want to understand the numbers, read Mark Russinovich's book Windows Internals. Part 1 describes processes and memory management.
You need to distinguish between virtual memory, which is what your process has access to, and physical memory, which is the RAM of your PC.
Some of the virtual memory must also be in RAM, because the CPU needs to work with it. This is called the working set. The rest of the virtual memory is either swapped to disk, or may not even exist yet (reserved only).
Memory can be allocated (committed or reserved) at a certain granularity (typically 64 KB) and swapped to disk in units of a page (typically 4 KB).
For your C++ code, it matters
So, from the code only, it's not possible to tell how much memory it will use.
As for the 2 GB allocation of a single block, this may fail due to memory fragmentation. E.g. a 32-bit application may access 4 GB of virtual memory at most. Now consider a DLL being loaded at exactly the 2 GB boundary. This means that there's less than 2 GB below the DLL and less than 2 GB above the DLL. Thus, you cannot allocate a contiguous block of 2 GB.
Some notes on the code:
your code uses malloc() but does not #include <cstdlib>, so technically the code is not correct. Same for abs() and rand().
why use malloc() and free() in a C++ program, when we have new[] and delete[]?
if (tmpll < 0) tmpll = abs(tmpll); uses the C version of abs(), which takes an int parameter, but you use it with a long long variable. The behavior is implementation-defined.
abs(tmpll) returns an int, so you write only to 2 GB of memory when this is invoked. That's ok for the given 1 GB of memory, but may not be ok when you test with more memory.
tmpll += ...; may overflow and signed integer overflow is undefined behavior.
In order to fill a complete array, this code may be used:
#include <memory>
#include <cstdlib>
#include <Windows.h>
#include <Psapi.h>
#include <iostream>

int main()
{
    constexpr auto SIZE = 3'000'000'000ull;
    auto arr = std::unique_ptr<char[]>(new char[SIZE]);

    // Find the page size
    SYSTEM_INFO sysInfo;
    GetSystemInfo(&sysInfo);
    auto pageSize = sysInfo.dwPageSize;

    // Fill the array
    // Accessing one byte per page is enough
    for (auto i = 0ull; i < SIZE; i+=pageSize)
        arr[i] = static_cast<char>(rand());

    // Get Memory statistics
    PROCESS_MEMORY_COUNTERS pmc;
    GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc));
    std::cout << "Working Set Size: " << pmc.WorkingSetSize << '\n';
    std::cout << "Peak Working Set Size: " << pmc.PeakWorkingSetSize << '\n';
    std::cout << "Page File Usage: " << pmc.PagefileUsage << '\n';
    std::cout << "Peak Page File Usage: " << pmc.PeakPagefileUsage << '\n';
}
Output on my machine, compiled as C++20, 64 bit, release build, no debugger, page file disabled and rebooted:
Working Set Size: 3003518976
Peak Working Set Size: 3003523072
Page File Usage: 3008671744
Peak Page File Usage: 3008692224
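Looks good to me. Everything is in physical RAM, so the working set size is 3 GB. Interestingly, it also reports a page file usage of 3 GB, although the page file size is set to 0 MB across all drives. (The PagefileUsage field reports the process's commit charge, not bytes actually written to a page file.)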