matlab2dpolygoncurve-fitting

Fitting a polygon in MATLAB


Problem definition

I'm attempting to reconstruct a non-convex and non-self-intersecting polygonal line (which I call the target line) from a noisy dataset (see the figure below). Therefore, my objective is to learn the positions of the vertices of the target line from the dataset. I refer to the estimated polygonal line as the polygonal line.

The target line is assumed to be normalized, i.e., contained in the intervals [-1,1] (horizontal direction) and [0,1] (vertical direction). Moreover, the target line starts from the point [1 0]' and ends at a point crossing the horizontal axis of the reference system.

The main challenges of this problem are the following:

I'm able to obtain a rough estimate with an algorithm that is quite fast (even though it's not optimized), so I'm seeking suggestions to enhance it regarding some critical issues. I use the following conventions:

My algorithm takes as input D, N, and returns hatV.


My algorithm (overview)

function hatV = staticShaper(D, N)
    % step 0: unfrozen initialization 
    hatV   = 1.5 * [cos(linspace(0, pi, N)); sin(linspace(0, pi, N))];
    freeze = zeros(1, N);

    for iter = 1 : 3
        % step 1: project the polygonal line over the dataset
        idx = projectLine(hatV, D);
       
        % step 2: update the polygonal line
        hatV = updateLine(hatV, D, idx, freeze);

        % step 3: simplify the polygonal line
        [hatV, Ndeficit] = simplifyLine(hatV);

        % step 4: interpolate and freeze the polygonal line
        [hatV, freeze] = interpolateLine(hatV, Ndeficit);
    end
end

Questions

enter image description here This picture depicts the four different steps of my algorithm over three iterations, considering N = 50. The dots represent the unfrozen vertices, while the circled dots indicate the frozen vertices. The black lines connect the unfrozen vertices with their nearest observations in the dataset.

enter image description here

My main concern is about how to improve the accuracy of the result. There are two main issues with my solution.

So my first question is: how do I solve these two issues? In addition to this question, I have the following one: how do I prove (or disprove) that my algorithm generates a polygonal line that does not intersect with itself?

My algorithm (more details)

function idx =  projectLine(hatV, D)
    idx = NaN(1, size(hatV, 2));
    for i = 1 : size(hatV, 2)
        [~, idx(i)] = min(vecnorm(hatV(:, i) - D));
    end
end

For each vertex hatV(:, i) I search for the nearest observation in D.


function hatV = updateLine(hatV, D, idx, freeze)
    for i = 1 : size(hatV, 2)
        if not(freeze(i))
            hatV(:, i) = D(:, idx(i));
        end
    end

    hatV(:, 1)   = [1 0]';
    hatV(2, end) = 0;
end

I place each unfrozen vertex hatV(:, i) over its nearest observation D(:, idx(i)). After that, I ensure that the first vertex hatV(:, 1) is at [1 0]', and I confirm that the final vertex hatV(:, end) has no vertical component. This is because I know that the starting point of the target line is [1 0]' and that the target line ends on the horizontal axis of the reference system.


function [hatV, Ndeficit] = simplifyLine(hatV)
    oldHatV    = hatV; % backup 
    k          = 1;
    hatV       = [];
    hatV(:, 1) = oldHatV(:, 1);
    for i = 1 : size(oldHatV, 2)
        if norm(oldHatV(:, i) - hatV(:, k)) < 0.01
            hatV(:, k) = 0.5 * (hatV(:, k) + oldHatV(:, i));
        else
            k          = k + 1;
            hatV(:, k) = oldHatV(:, i);
        end
    end
    Ndeficit = size(oldHatV, 2) - size(hatV, 2);
end

I have to say that this apparently simple task takes me some time, and I'm not even sure if this is a good implementation. However, here I aim to eliminate identical or similar vertices. To achieve this, I examine the distances between vertices. If the distance is small enough, I perform a merging operation by taking the mean value. In the end, I count how many vertices are lost during these operations.


function [hatV, freeze] = interpolateLine(hatV, Ndeficit)
    % step 1: normalized edge lengths computation
    ell = NaN(1, size(hatV, 2) - 1);
    for i = 1 : size(hatV, 2) - 1
        ell(i) = norm(hatV(:, i + 1) - hatV(:, i));
    end
    ell = ell / sum(ell);
    
    % step 2: new vertices generation
    newVertContainer = cell(1, size(hatV, 2) - 1);
    for i = 1 : size(hatV, 2) - 1
        nrNewVertices       = round(Ndeficit * ell(i)); 
        alpha               = 0;
        newVertContainer{i} = [];
        for j = 1 : nrNewVertices
            newVertex           = (1 - alpha) * hatV(:, i) + alpha * hatV(:, i + 1);
            newVertContainer{i} = cat(2, newVertContainer{i}, newVertex);
            alpha               = alpha + 1 / nrNewVertices;
        end
    end
    
    % step 3: output definition
    oldHatV = hatV;
    hatV    = oldHatV(:, 1);
    freeze  = 1;
    for i = 1 : size(oldHatV, 2) - 1
        hatV   = cat(2,   hatV,   newVertContainer{i}, oldHatV(:, i + 1) );
        freeze = cat(2, freeze, zeros(1, size(newVertContainer{i}, 2)), 1);
    end
end

The idea is to insert new vertices 'uniformly' along the current polygonal line. To achieve this, I add a number of new vertices to each edge of the polygonal line that is proportional to the edge length.

The first step is to compute the length of each edge of the estimated polygonal line and normalize the results. If edge i has a normalized length ell(i), then the number of new vertices generated between the two old vertices defining the edge is round(Ndeficit * ell(i)).

Clearly, this strategy does not guarantee that the polygonal line will have exactly N vertices in the end, but this issue is not a big problem at the moment (I will fix it later).

In step 2, I generate the new vertices, which are temporarily stored in newVertContainer. Finally, in step 3, I define the updated polygonal line where the old vertices are frozen, and the new vertices are unfrozen.


Solution

  • k = boundary(D(1,:).',D(2,:).',1);
    bound = [D(1,k).',D(2,k).'];
    

    (The third input argument in this call of MATLAB's boundary function is an optional argument that is used to "tighten" the boundary around the dataset if it is close to 1)

    The code runs in around 20ms on my machine and is pretty good at giving you both the upper and lower boundaries of the dataset:

    scatter(D(1,:).',D(2,:).');hold on
    plot(bound(:,1),bound(:,2),'--k','linewidth',2)
    

    enter image description here


    EDIT: Here are a few more steps towards a working solution:

    % Start from the point closest to [1,0]
    [~,idx]=min(sqrt((bound(:,1)-1).^2+bound(:,2).^2));
    if idx>1
    
        bound = [bound(idx:end,:);bound(1:(idx-1),:)];
    
    end
    
    % Get the index of the next intersection with the horizontal axis
    % (exclude a few points before and after the initial point)
    npts_exc = 6;
    
    [~,idx2]=min(bound((npts_exc):(end-npts_exc),2));
    
    bound1 = bound(1:(npts_exc+idx2-1),:);
    bound2 = bound((npts_exc+idx2-1):end,:);
    
    % Follow curve 1. for each point on curve 1 get the closest one on
    % curve 2 and then construct the mid point
    bound_avg = zeros(size(bound1));
    
    for ii=1:size(bound1,1)
        
        [~,idx_min]=min(sqrt((bound2(:,1)-bound1(ii,1)).^2+(bound2(:,2)-bound1(ii,2)).^2));
    
        bound_avg(ii,:) = 0.5*(bound1(ii,:)+bound2(idx_min,:));
    
    end
    

    This still runs in around 30ms total. It gives the following results:

    scatter(D(1,:).',D(2,:).');hold on
    plot(bound1(:,1),bound1(:,2),'--k','linewidth',1)
    plot(bound2(:,1),bound2(:,2),'--r','linewidth',1)
    plot(bound_avg(:,1),bound_avg(:,2),'-m','linewidth',2)
    

    enter image description here

    The last thing to do is to resample the curve to get the right length of N points.

    EDIT2 : Final solution

    % Resampling so that the N points on the curve are uniformally positionned
    % on the curve length
    N = 50;
    bound_res = zeros(N,2);
    % Keep endpoints
    bound_res(1,:) = bound_avg(1,:);
    bound_res(end,:) = bound_avg(end,:);
    
    % Cumulative curve length
    c_len = cumsum(sqrt((bound_avg(1:(end-1),1)-bound_avg(2:end,1)).^2+(bound_avg(1:(end-1),2)-bound_avg(2:end,2)).^2));
    % Curve length will have to be unique for interp1 to work
    [C,ia,~] = unique(c_len);
    
    % remove non unique points from bound_avg as well
    bound_avg_un = bound_avg(ia+1,:);
    
    % Target cumulative curve length for uniform distribution
    step_size = c_len(end)/(N-1);
    target_lengths = step_size*(1:(N-1));
    
    % Interpolate on the target lengths
    bound_res(2:(end-1),1) = interp1(C,bound_avg_un(:,1),target_lengths(1:(end-1)));
    bound_res(2:(end-1),2) = interp1(C,bound_avg_un(:,2),target_lengths(1:(end-1)));
    

    enter image description here