matlabmatlab-figure

Optimization and acceleration of large-scale calculations in the process of wave simulation


There is a task: to simulate waves on the ocean surface 3D case. I faced the problem that when simulating waves in a large area of the ocean (1 km x 1 km), the program takes a very long time to execute. And in fact it is not executed at all. Because during those half an hour of waiting for the result, it was never calculated.

Example of my code:

clear,clc
%% Data 
x = 0:1:100; % coordinates х [m]
y = 0:1:100; % coordinates y [m]
g = 9.81; % gravitational constant [m/s^2]
speed = 5; % wind velocity [m/s]
w0 = (g / speed); % norm.frequency [Hz]
dw = 0.1; % frequency step [rad/s]
w = 0.8:dw:11.1; % frequency [rad/s] 
dtt = pi / 18; % angular step [rad]
theta = 0:dtt:pi; % direction angles, angles between the wavevector & coordintae axis [rad]

%% P-M spectrum, Frequency-Angular spectrum & Amplitude
Psi = 8.1e-3 .* ((w/w0).^(-5)) .* exp((-0.74) ./ ((w/w0).^(4))); % P-M spectrum [none]
Phi = ((speed)^(5)/g^(3)) * Psi; % self-similar spectrum [s*m^2]
Sw = Phi / 2; % frequency spectrum [s*m^2]

St = cos(theta).^(4); % angular spectrum [none]
Norm = trapz(dtt, St); % norm.coefficient [none]
Swt = Sw .* St'; % frequency-angular spectrum [s*m^2]

eta0 = sqrt((Swt * dw * dtt) ./ Norm); % amplitude [m]

figure(1);
subplot(2,1,1)
plot(w, Psi);
title('$$\Psi$$($$\omega$$) - P-M spectrum', 'Interpreter', 'LaTex');
xlabel('\omega [rad/s]');
ylabel('\Psi [none]');
grid on;
subplot(2,1,2)
plot(w, Swt); 
title('$$S(\omega , \theta)$$($$\omega$$) - frequency-angular spectrum', 'Interpreter', 'LaTex');
xlabel('\omega [rad/s]');
ylabel('S(\omega,\theta) [s*m^2]');
grid on;

%% Setting the initial phase parameter
phase = 2*pi*rand(length(theta),length(w)); %% random initial phase ranging from 0 to 2pi [rad]

%% Surface Waving [Linear, 3D (eta & x,y)] at different harmonics & random phase (at one moment in time), different directions of the wavevector (multiple angles)
t = 0; % time moment [s]
Kabs = (w.^2) / g; % wavevector module [rad/m]
Kx = Kabs .* cos(theta)'; % projection of the wavevector onto the x-axis [rad/m]
Ky = Kabs .* sin(theta)'; % projection of the wavevector onto the y-axis [rad/m]

eta = zeros(length(x),length(y),length(theta),length(w)); % reserving space for calculation results
tic
for i = 1:length(x)
    for j = 1:length(y)
        eta(i,j,:,:) = eta0 .* cos(w * t - Kx .* i - Ky .* j + phase);
    end
end
toc
% sum(sum(eta,4),3) - double sum of eta over all harmonics (frequencies) and wavevector directions (angles theta),
% where '4' и '3' summation indicator for variable frequency and angle
etaW = sum(eta,4);
etaWA = sum(etaW,3);

figure(2)
surf(x,y,etaWA);
title('\eta(x,y) - surface waving');
xlabel('x [m]');
ylabel('y [m]');
zlabel('\eta [m]');
cbar = colorbar;
cbar.Label.String = '\eta [m]';
grid on
shading flat

One of the code optimization methods that I was able to use is to create an "empty" 4D array (4D array of zeros) eta = zeros(length(x),length(y),length(theta),length(w)); into which, after executing the loop, the calculation results will be filled:

eta = zeros(length(x),length(y),length(theta),length(w)); % reserving space for calculation results
tic
for i = 1:length(x)
    for j = 1:length(y)
        eta(i,j,:,:) = eta0 .* cos(w * t - Kx .* i - Ky .* j + phase);
    end
end
toc

Then I summarize the results by frequency and angle variables:

etaW = sum(eta,4);
etaWA = sum(etaW,3);

Thereby preparing the place for the results in advance. It kind of helped. For example, the code execution time for an area x = 0:1:100; y = 0:1:100; (100 m x 100 m) using this method was 0.7 s (without it, 3.9 s). In the case of an area x = 0:1:500; y = 0:1:500; (500 m x 500 m), the execution time was about 19 seconds (without it...I have no idea, I didn't wait for the code to be executed, but it turns out it was very long). However, with an area x = 0:1:1000; y = 0:1:1000; (1000 m x 1000 m), I have not been getting the desired result for a long time (and it feels like I won't get it at all).

Is there any more ways in my case to achieve the desired result and optimize my code so that it can cope with such a scale of data (at the same time, without changing the step in the array)?

Note: I have 16 GB of RAM on my computer. My second computer, where I initially performed calculations, hung up completely during the execution of the program, so I had to restart it "manually". So I suppose there is even less RAM in it.


Solution

  • You want a 2D output array, don’t compute the full 4D array and then project it. Instead, compute the projection in your loop.

    The intermediate 4D array in the case of a 1000x1000 output is more than 15 GB, which means your 16 GB system cannot hold it in memory in its entirety (since MATLAB and the OS also need some space), and will swap out portions to disk.

    If the loop were to index the array as eta(:,:,i,j) then it wouldn’t be so bad, as you’re writing to a contiguous part of memory. But since you index eta(i,j,:,:), every loop iteration is writing to bytes all across the 15 GB memory block. Thus, every loop iteration has your system read from disk and write back to disk most the array. This obviously slows down things a lot, and is called “thrashing”, because in the days of spinning hard disks, this made a lot of noise.

    In general, when writing loops, you should always consider the order that array elements have in memory, and loop such that you access array elements in memory order as much as possible. This optimizes the usage of the system’s cache. In MATLAB, arrays are stored with the elements along the 1st dimension contiguous in memory. So you don’t want the first dimension index being the outer loop, and you don’t want to access elements along the last dimension(s) as a single block.

    To project inside the loop, instead of

    eta = zeros(length(x),length(y),length(theta),length(w));
    for i = 1:length(x)
        for j = 1:length(y)
            eta(i,j,:,:) = eta0 .* cos(w * t - Kx .* i - Ky .* j + phase);
        end
    end
    etaW = sum(eta,4);
    etaWA = sum(etaW,3);
    

    write

    etaWA = zeros(length(x),length(y));
    for j = 1:length(y)
        for i = 1:length(x)
            etaWA(i,j) = sum(eta0 .* cos(w * t - Kx .* i - Ky .* j + phase), 'all');
        end
    end
    

    Note how I not only added the sum over all dimensions inside the loop instead of at the end. I also swapped the loop order, so that the inner loop goes over the first dimension. This is the most efficient way to loop in MATLAB.

    By the way, “reserving space for calculation results” is crucial. This is called pre-allocation in MATLAB speak, and avoids the huge amount of work done when enlarging an array repeatedly.