matlab ecdf

How to estimate and plot an empirical CDF in MATLAB


This question has been asked several times before, but mine differs slightly from the earlier ones. I have a table of x values and their relative frequencies:

x 150 250 350 450 550 650 750
y 1 2 8 30 18 16 5

I don't really understand what MATLAB's built-in function [f,x] = ecdf(y) does. I used it to estimate and plot an empirical distribution function.


However, the result is clearly not correct: compared with a histogram built from the same data (x and y), the resulting ECDF does not describe the distribution correctly.


So my question is: how do I correctly construct the ECDF from this table (the x values and an array of relative frequencies), and then estimate and plot the cumulative distribution function directly from it?

My code for plotting the histogram and the ECDF:

%% data
y = [1; 2; 8; 30; 18; 16; 5];
x = [150; 250; 350; 450; 550; 650; 750];
%% hist and polygon
figure(1)
bar(x,y,'LineWidth',1,...
    'FaceColor',[0.0745098039215686 0.623529411764706 1],...
    'EdgeColor',[0.149019607843137 0.149019607843137 0.149019607843137],...
    'BarWidth',1,...
    'BarLayout','stacked');
hold on
plot(x,y,'-o','Color','red','LineWidth',1)
hold off

%% ecdf
[ff,x] = ecdf(y);
x_e = [0;x];
figure(2)
stairs(x_e,ff,'Marker','o','LineWidth',1,'Color',[0.0745098039215686 0.623529411764706 1]);
set(gca,'GridAlpha',0.25,'GridLineStyle','--','MinorGridLineStyle','--',...
    'XGrid','on','XMinorGrid','on','YGrid','on');
xlim([0 780]);

Solution

  • You should not use the ecdf function, because it takes the raw data values as input. Your inputs, on the other hand, seem to be the population values and their absolute frequencies. So you only need to normalize the frequencies so that they sum to 1 and then compute their cumulative sum.

    When plotting, I suggest adding an extra population value at each end, with cumulative values 0 and 1 respectively, for a clearer graph.

    x = [150; 250; 350; 450; 550; 650; 750];
    y = [1; 2; 8; 30; 18; 16; 5]; % example data
    cdf = cumsum(y./sum(y)); % normalize, then compute cumulative sum
    stairs([100; x; 900], [0; cdf; 1], 'linewidth', .8), grid on % note two extra values
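
To see why ecdf(y) gave a misleading result, it may help to expand the table back into raw observations and compare. The following is only a sketch under some assumptions: the variable names n, F, samples, f, xs and maxDiff are made up for illustration, repelem requires a reasonably recent MATLAB release, and ecdf is only called if it is found on the path (it ships with the Statistics and Machine Learning Toolbox).

```matlab
% Frequency table from the question: population values and absolute counts
x = [150; 250; 350; 450; 550; 650; 750];
n = [1; 2; 8; 30; 18; 16; 5];

% Cumulative relative frequencies, computed as in the answer above
F = cumsum(n) / sum(n);

% Expand the table into raw observations: x(k) is repeated n(k) times
samples = repelem(x, n);

% Optional cross-check against ecdf, which expects raw data values
if exist('ecdf', 'file') == 2
    [f, xs] = ecdf(samples);
    % ecdf starts with an extra point (min(samples), 0); skip it when comparing
    maxDiff = max(abs(f(2:end) - F));
    fprintf('Largest difference between ecdf and cumsum: %g\n', maxDiff);
end
```

In the question, ecdf(y) treated the seven counts themselves as observations, which is why the curve did not match the histogram. If I read the documentation correctly, ecdf also accepts a 'Frequency' name-value argument, so ecdf(x, 'Frequency', n) should avoid the explicit expansion; treat that as something to verify against your MATLAB version.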
    
