I have to carry out a challenge that involves the elaboration of an algorithm to compute the Pareto (set) boundary. The statement is basically:
Given a set S of n points in the square [0,1] x [0,1], make an algorithm to determine the subset P contained in S, formed by the non-dominated points of S.
It is also said that it is easy to elaborate an algorithm of the order n*n point comparisons that accomplish this purpose. Well I came up with an algorithm by researching here and there. The challenge is still to implement an algorithm of the order n*log(n). How do I get the order of these algorithms?
Thanks in advance!
#data
set.seed(103)
x = runif(200)
y = runif(200)
#algorithm
pareto = 1:length(x)
for(i in 1:length(x)){
cond1 = y[i]!=min(y[which(x==x[i])])
cond2 = x[i]!=min(x[which(y==y[i])])
for(k in 1:length(x)){
if((x[i]>x[k] & y[i]>y[k]) | (x[i]==x[k] & cond1) | (y[i]==y[k] & cond2)){
pareto[i] = NA
break
}
}
}
xPareto = x[pareto[!is.na(pareto)]]
yPareto = y[pareto[!is.na(pareto)]]
#graphic:
plot(x, y)
points(xPareto, yPareto, col=2, pch=16)
dat = data.frame(x=xPareto, y=yPareto)
dat = dat[order(dat$x),]
for(i in 1:(nrow(dat)-1)){
segments(dat$x[i], dat$y[i], dat$x[i+1], dat$y[i+1], col=2, lty=2)
}
The intuition behind the efficient greedy solution to this problem lies in the fact that a point i
is dominated by point j
iff x[i] > x[j]
and y[i] > y[j]
, which implies that j
must come before i
when the points are ordered by either coordinate. Hence, if we traverse the points in increasing order of their x-coordinates, then the point j
(if any) that dominates point i
must have been traversed before point i
is traversed. In other words, it is impossible for the dominating point j
to come after the dominated point i
in this ordering.
Thus, with this traversal order the domination problem (i.e. checking if a point is dominated by some other point) boils down to checking if we have already seen a point with a lower y-coordinate as the traversal order already enforces the x-coordinate condition. This can easily be done by checking each point's y-coordinate to the lowest (minimum) y-coordinate we have seen so far -- if the minimum y-coordinate is less than the current point i
's y-coordinate then the point j
with the minimum y-coordinate dominates i
as x[j] < x[i]
because j
was seen before i
.
Sorting by the x-coordinate takes O(n log n)
time and checking each point (while maintaining the partial minimum y-coordinate) takes O(n)
time, making the entire algorithm take O(n log n)
time.
o = order(x)
minY = Inf
pareto = 1:n
for(j in 1:n){
i = o[j]
if (y[i] <= minY) {
# Part of pareto boundary
minY = y[i]
} else {
# Dominated by some point in pareto boundary
pareto[i] = NA
}
}