algorithmsortingoptimizationmathematical-optimizationpareto-optimality

Pareto boundary (set): Order of an algorithm


I have to carry out a challenge that involves the elaboration of an algorithm to compute the Pareto (set) boundary. The statement is basically:

Given a set S of n points in the square [0,1] x [0,1], make an algorithm to determine the subset P contained in S, formed by the non-dominated points of S.

It is also said that it is easy to elaborate an algorithm of the order n*n point comparisons that accomplish this purpose. Well I came up with an algorithm by researching here and there. The challenge is still to implement an algorithm of the order n*log(n). How do I get the order of these algorithms?

Thanks in advance!

#data
set.seed(103)
x = runif(200)
y = runif(200)

#algorithm
pareto = 1:length(x)
for(i in 1:length(x)){
    cond1 = y[i]!=min(y[which(x==x[i])])
    cond2 = x[i]!=min(x[which(y==y[i])])
    for(k in 1:length(x)){
        if((x[i]>x[k]  &  y[i]>y[k]) | (x[i]==x[k] & cond1) | (y[i]==y[k] & cond2)){
            pareto[i] = NA
            break
        }
    }
}
xPareto = x[pareto[!is.na(pareto)]]
yPareto = y[pareto[!is.na(pareto)]]

#graphic:
plot(x, y)
points(xPareto, yPareto, col=2, pch=16)
dat = data.frame(x=xPareto, y=yPareto)
dat = dat[order(dat$x),]
for(i in 1:(nrow(dat)-1)){
    segments(dat$x[i], dat$y[i], dat$x[i+1], dat$y[i+1], col=2, lty=2)
}

enter image description here


Solution

  • The intuition behind the efficient greedy solution to this problem lies in the fact that a point i is dominated by point j iff x[i] > x[j] and y[i] > y[j], which implies that j must come before i when the points are ordered by either coordinate. Hence, if we traverse the points in increasing order of their x-coordinates, then the point j (if any) that dominates point i must have been traversed before point i is traversed. In other words, it is impossible for the dominating point j to come after the dominated point i in this ordering.

    Thus, with this traversal order the domination problem (i.e. checking if a point is dominated by some other point) boils down to checking if we have already seen a point with a lower y-coordinate as the traversal order already enforces the x-coordinate condition. This can easily be done by checking each point's y-coordinate to the lowest (minimum) y-coordinate we have seen so far -- if the minimum y-coordinate is less than the current point i's y-coordinate then the point j with the minimum y-coordinate dominates i as x[j] < x[i] because j was seen before i.

    Sorting by the x-coordinate takes O(n log n) time and checking each point (while maintaining the partial minimum y-coordinate) takes O(n) time, making the entire algorithm take O(n log n) time.

    o = order(x)
    minY = Inf
    
    pareto = 1:n
    for(j in 1:n){
        i = o[j]
        if (y[i] <= minY) {
            # Part of pareto boundary
            minY = y[i]
        } else {
            # Dominated by some point in pareto boundary
            pareto[i] = NA
        }
    }