excelmatlabmean

Error in calculating the average values of the rows of a group of excel-files


There is a group of excel files (10 files), in each of which there is a table: 2 columns (the name of the parameter, its value) and 12 rows (the number of parameters and, accordingly, their values). It is necessary to calculate the average between 10 variations of a single parameter. There is a program for this below. However, I noticed that he performs the calculation with an error of one ten-thousandth of a number, he underestimates it. For example, here's what the result gives me:

Matlab Averaging using Excel
0.0288 0,02891568
0.0008 0,000836121
0.0012 0,00117159
0.0723 0,07248086
0.0858 0,085798107
0.0867 0,086977032
0.1030 0,102957729
0.9524 0,959947104
-0.0050 -0,005002288
0.0008 0,000837283
-0.0000 -1,28168E-05
0.0000 2,17761E-06
clc, clear;
 
% Folder with Excel files
folder = 'The path to the folder with 10 excel files'; % the path to the folder
files = dir(fullfile(folder, '*.xlsx')); % we get a list of all excel files
 
% Initialize the vector to store the average values
average_values = [];
 
% Loop through all files
for i = 1:length(files)
    % The full path to the file
    file_path = fullfile(folder, files(i).name);
 
    % Reading data from an Excel file
    opts = detectImportOptions(file_path, 'VariableNamingRule', 'preserve'); 
    data = readtable(file_path, opts);
 
    % Check if there is at least one numeric column
    if any(varfun(@isnumeric, data, 'OutputFormat', 'uniform'))
        % We extract only numeric data
        numeric_data = data{:, varfun(@isnumeric, data, 'OutputFormat', 'uniform')}; 
 
        % We calculate the average values for the rows, ignoring NaN
        avg_values_per_row = mean(numeric_data, 2, 'omitnan');
 
        % Saving the average values in an array
        average_values = avg_values_per_row;
    else
        warning('There are no numeric columns in the "%s" file.', files(i).name);
    end
end
 
% Displaying average values
disp('Average values by line:');
disp(average_values);
 
% Displaying the number of rows
disp('Number of lines:');
disp(length(average_values));

As you can see, for parameters number 1, 4, 6, 8, the underestimation goes from 0.0001 to 0.0007 units (in the case of the 8th parameter, the underestimation is already in one thousandth of the number). What is most interesting is that if you manually enter 10 values of each parameter into the mean function in Matlab and calculate, it will give the correct result without this underestimation (using the example of the 8th parameter):

>> mean([0.952380002    0.980380905 0.97087088  0.95237715  0.952379051 0.970871364 0.980383351 0.925921306 0.961536062 0.952370972
])

ans =

    0.9599

Where can there be a problem in the code and how to fix it so that it gives an accurate result? (the attachment contains a folder with files from the example: https://drive.google.com/file/d/1COPpBq2mU8Ww62DRvbOKTtr_aC-LsCIL/view?usp=sharing)


Solution

  • Your loop just returns values from the last workbook. Open 'Param10.xlsx' and compare values with averages you displayed in the question. The error is here:

    for i = 1:length(files)
            ...
            average_values = avg_values_per_row;
            ...
    end
    

    On every loop run average_values gets current values. As I wrote before, you need to sum something like this (don't know the exact syntax):

            average_values = sum(average_values,numeric_data)
    

    and after the loop divide each element of average_values by 'length(files)'.