python scatter

Find the best couples of values from a scatter plot


I have a list of thousands of couples of float values (reward, risk).

I want to extract the top couples, i.e. best reward with lowest risk: the couples that no other couple beats on both reward (higher) and risk (lower).

Note to financial experts: it is a bit similar to an efficient frontier, but there is neither mean nor standard deviation.

Here is a sample of my data points, with a plot of the cloud:

import numpy as np
import matplotlib.pyplot as plt
# first value is reward, second is risk
cloud = np.array([[1,2],[4,3],[5.5,2.3],[4,2],[3,3],[.9,1.9],[4,3],[4,3.2],[3,2.2],[2,2.6]])
plt.scatter(cloud[:,1], cloud[:, 0])
plt.xlabel("risk")
plt.ylabel("reward")
plt.show()

I expect an array with [.9, 1.9], [4, 2] and [5.5, 2.3].

I can do it with a loop, but that is not elegant and may not be efficient...
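To make the expected result concrete, here is a minimal brute-force sketch of the selection rule, assuming "top couple" means a couple that no other couple matches or beats on both reward and risk. The names `is_dominated` and `top_couples_naive` are hypothetical, and the pairwise comparison is only meant as a reference for small clouds, not as an efficient solution.

```python
import numpy as np

def is_dominated(point, cloud):
    """True if another couple has reward >= and risk <=, with at least one strictly better."""
    reward, risk = point
    at_least_as_good = (cloud[:, 0] >= reward) & (cloud[:, 1] <= risk)
    strictly_better = (cloud[:, 0] > reward) | (cloud[:, 1] < risk)
    return bool(np.any(at_least_as_good & strictly_better))

def top_couples_naive(cloud):
    """Compare every couple against the whole cloud (quadratic in the number of points)."""
    # np.unique with axis=0 drops duplicate rows and sorts them by reward, then risk
    cloud = np.unique(np.asarray(cloud, dtype=float), axis=0)
    keep = [point for point in cloud if not is_dominated(point, cloud)]
    return np.array(keep).reshape(-1, 2)

# first value is reward, second is risk
cloud = np.array([[1,2],[4,3],[5.5,2.3],[4,2],[3,3],[.9,1.9],[4,3],[4,3.2],[3,2.2],[2,2.6]])
print(top_couples_naive(cloud))
```

On the sample cloud this keeps [.9, 1.9], [4, 2] and [5.5, 2.3], which can serve as a check for any faster approach.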


Solution

  • Here is a first attempt; I am not sure it is the best one.

    I am posting it in case it helps anybody, or in case it can be improved for large clouds of points:

    import numpy as np
    import matplotlib.pyplot as plt
    # first value is reward, second is risk
    cloud = np.array([[1,2],[4,3],[5.5,2.3],[4,2],[3,3],[.9,1.9],[.9,1.9], [4,3],[4,3.2],[3,2.2],[5.5,2.3],[2,2.6]])
    
    def extract_border(cloud):
        """ Extract all couples of points where first value is the highest and second value is the lowest """
        # if some couples are identical, only one is recorded
        # function takes the cloud of points and returns the border array, ordered by decreasing reward
        # initial cloud is left unchanged as we only rebind the local name in the function
        border = []
        while cloud.shape[0] > 0:  # some points are still remaining
            best_reward = cloud[:, 0].max()
            # among the couples sharing the best reward, keep only the least risky one
            best_risk = cloud[cloud[:, 0] == best_reward, 1].min()
            border.append([best_reward, best_risk])  # record the current best couple
            cloud = cloud[cloud[:, 1] < best_risk]  # keep only the couples that are strictly less risky
        return np.array(border).reshape(-1, 2)  # always a (n, 2) array, even when the cloud is empty
    border = extract_border(cloud)
    print(f"final border: \n reward   risk \n {border}")
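For large clouds, the loop above rescans the remaining points once per border couple. One possible alternative, sketched below under the same reading of "best couple" (no other couple has a reward at least as high with a risk at least as low), is to sort once by risk and keep each couple whose reward exceeds every reward seen before it. The name `extract_border_sorted` is hypothetical; only `np.lexsort` and `np.maximum.accumulate` from NumPy are used.

```python
import numpy as np

def extract_border_sorted(cloud):
    """Sketch: sort by risk, then keep couples whose reward beats all earlier couples."""
    cloud = np.asarray(cloud, dtype=float).reshape(-1, 2)
    if cloud.shape[0] == 0:  # cloud is empty
        return cloud
    # np.lexsort treats the LAST key as primary: risk ascending, then reward descending
    order = np.lexsort((-cloud[:, 0], cloud[:, 1]))
    ordered = cloud[order]
    rewards = ordered[:, 0]
    # best reward among strictly earlier rows (-inf for the first row)
    best_before = np.concatenate(([-np.inf], np.maximum.accumulate(rewards)[:-1]))
    # a strict comparison drops duplicates and couples with equal reward but higher risk
    return ordered[rewards > best_before]

# first value is reward, second is risk
cloud = np.array([[1,2],[4,3],[5.5,2.3],[4,2],[3,3],[.9,1.9],[.9,1.9], [4,3],[4,3.2],[3,2.2],[5.5,2.3],[2,2.6]])
border = extract_border_sorted(cloud)
print(f"final border: \n reward   risk \n {border}")
```

The cost is dominated by the sort, so it should scale roughly as n log n. Note that this sketch returns the border ordered by increasing risk rather than by decreasing reward.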