There is a group of excel files (10 files), in each of which there is a table: 2 columns (the name of the parameter, its value) and 12 rows (the number of parameters and, accordingly, their values). It is necessary to calculate the average between 10 variations of a single parameter. There is a program for this below. However, I noticed that he performs the calculation with an error of one ten-thousandth of a number, he underestimates it. For example, here's what the result gives me:
Matlab | Averaging using Excel |
---|---|
0.0288 | 0,02891568 |
0.0008 | 0,000836121 |
0.0012 | 0,00117159 |
0.0723 | 0,07248086 |
0.0858 | 0,085798107 |
0.0867 | 0,086977032 |
0.1030 | 0,102957729 |
0.9524 | 0,959947104 |
-0.0050 | -0,005002288 |
0.0008 | 0,000837283 |
-0.0000 | -1,28168E-05 |
0.0000 | 2,17761E-06 |
clc, clear;
% Folder with Excel files
folder = 'The path to the folder with 10 excel files'; % the path to the folder
files = dir(fullfile(folder, '*.xlsx')); % we get a list of all excel files
% Initialize the vector to store the average values
average_values = [];
% Loop through all files
for i = 1:length(files)
% The full path to the file
file_path = fullfile(folder, files(i).name);
% Reading data from an Excel file
opts = detectImportOptions(file_path, 'VariableNamingRule', 'preserve');
data = readtable(file_path, opts);
% Check if there is at least one numeric column
if any(varfun(@isnumeric, data, 'OutputFormat', 'uniform'))
% We extract only numeric data
numeric_data = data{:, varfun(@isnumeric, data, 'OutputFormat', 'uniform')};
% We calculate the average values for the rows, ignoring NaN
avg_values_per_row = mean(numeric_data, 2, 'omitnan');
% Saving the average values in an array
average_values = avg_values_per_row;
else
warning('There are no numeric columns in the "%s" file.', files(i).name);
end
end
% Displaying average values
disp('Average values by line:');
disp(average_values);
% Displaying the number of rows
disp('Number of lines:');
disp(length(average_values));
As you can see, for parameters number 1, 4, 6, 8, the underestimation goes from 0.0001 to 0.0007 units (in the case of the 8th parameter, the underestimation is already in one thousandth of the number). What is most interesting is that if you manually enter 10 values of each parameter into the mean
function in Matlab and calculate, it will give the correct result without this underestimation (using the example of the 8th parameter):
>> mean([0.952380002 0.980380905 0.97087088 0.95237715 0.952379051 0.970871364 0.980383351 0.925921306 0.961536062 0.952370972
])
ans =
0.9599
Where can there be a problem in the code and how to fix it so that it gives an accurate result? (the attachment contains a folder with files from the example: https://drive.google.com/file/d/1COPpBq2mU8Ww62DRvbOKTtr_aC-LsCIL/view?usp=sharing)
Your loop just returns values from the last workbook. Open 'Param10.xlsx' and compare values with averages you displayed in the question. The error is here:
for i = 1:length(files)
...
average_values = avg_values_per_row;
...
end
On every loop run average_values gets current values. As I wrote before, you need to sum something like this (don't know the exact syntax):
average_values = sum(average_values,numeric_data)
and after the loop divide each element of average_values by 'length(files)'.