c++algorithmcomputational-geometryeuclidean-distance

Efficient Algorithm for Minimum Euclidean Distance Between Points in Non-Overlapping Regions in a 2D Array


Problem Backbround

Problem

I'm looking for an algorithm to determine the two co-ordinates A and B that have the minimum Euclidean distance, where A is a member of RegionA and B is a member of RegionB.

The brute force method has a time complexity of O(mn).

Is there a more time-efficient algorithm (preferably better than O(mn)) for solving this problem?

The name + link, code, or description of one instance of such an algorithm would be greatly appreciated and will be accepted as an answer.

C++ Code to Create Two Random Regions A and B

The region generation code is immaterial to the problem being asked. and I am only interested in how close the regions are to each other. In the project I am working on I used cellular automata to create the regions. I can include the code for this if necessary. The Region construction procedure exists only to create relevant examples.

#include <cstdlib>
#include <cstring>
#include <vector>
#include <cstdio>
#include <map>
#include <ctime>
#include <cmath>

constexpr int Width = 150;
constexpr int Height = 37;
bool Map[Height][Width];

struct coord {
    int X, Y;
    coord(int X, int Y): X(X), Y(Y) {}
    bool operator==(const coord& Other) {
        return Other.X == X && Other.Y == Y;
    }
    bool IsAdjacentTo(const coord& Other){
        int ManhattanDistance = abs(X - Other.X) + abs(Y - Other.Y);
        return ManhattanDistance == 1;
    }
    bool IsOnMapEdge(){
        return X == 0 || Y == 0 || X == Width - 1 || Y == Height - 1;
    }
};

using region = std::vector<coord>;

void ClearMap(){
    std::memset(Map, 0, sizeof Map);
}
bool& GetMapBool(coord Coord) {
    return Map[Coord.Y][Coord.X];
}

int RandInt(int LowerBound, int UpperBound){
    return std::rand() % (UpperBound - LowerBound + 1) + LowerBound;
}

region CreateRegion(){
    std::puts("Creating Region");
    ClearMap();
    region Region;
    int RegionSize = RandInt(50, Width * Height / 2);
    Region.reserve(RegionSize);
    while (true){
        Region.emplace_back(
            RandInt(1, Width - 1), 
            RandInt(1, Height - 1)
        );
        if (!Region[0].IsOnMapEdge())
            break;
        else 
            Region.pop_back();
    }
    GetMapBool(Region[0]) = 1;
    while (Region.size() != RegionSize){
        coord Member = Region[RandInt(0, Region.size() - 1)];
        coord NeighbourToMakeAMember {
            RandInt(-1, 1) + Member.X,
            RandInt(-1, 1) + Member.Y
        };
        if (!Member.IsOnMapEdge() && !GetMapBool(NeighbourToMakeAMember) && Member.IsAdjacentTo(NeighbourToMakeAMember)){
            GetMapBool(NeighbourToMakeAMember) = 1;
            Region.push_back(NeighbourToMakeAMember);
        };
    }
    std::puts("Created Region");
    return Region;      
}

std::pair<region, region> CreateTwoRegions(){
    bool RegionsOverlap;
    std::pair<region, region> Regions;
    do {
        Regions.first = CreateRegion();
        Regions.second = CreateRegion();
        ClearMap();
        for (coord Member : Regions.first){
            GetMapBool(Member) = 1;
        }
        RegionsOverlap = 0;
        for (coord Member : Regions.second){
            if (GetMapBool(Member)){
                ClearMap();
                std::puts("Regions Overlap/Touch");
                RegionsOverlap = 1;
                break;
            } else {
                GetMapBool(Member) = 1;
            }
        }
    } while (RegionsOverlap);
    return Regions;
}

void DisplayMap(){
    for (int Y = 0; Y < Height; ++Y){
        for (int X = 0; X < Width; ++X)
            std::printf("%c", (Map[Y][X] ? '1' : '-'));
        std::puts("");
    }
}

int main(){
    int Seed = time(NULL);
    std::srand(Seed);
    ClearMap();
    auto[RegionA, RegionB] = CreateTwoRegions();
    DisplayMap();
}

Problem Illustration

What are the co-ordinates of the points A and B that form the minimum Euclidean distance between the two regions?

What are the co-ordinates of the points A and B that form the minimum Euclidean distance between the two regions?


Solution

  • Notes on finding the nearest points between two regions

    As others have mentioned, you can improve on finding the closest points between the two regions of n and m points by first filtering each region's points to only those which border unassigned cells and then doing an all-pairs search between the boundaries. For a roughly rectangular region this would take O(m + n + 16 √m √n) time: two linear filters and then the all-pairs search. I've implemented an example of this search below.

    But this will not save you any time if your regions are shaped like this:

    ########
    #      #
    # #### #
    # #  # #
    # # ## #
    # #    #
    # ######
    

    If you are using larger datasets and can tolerate preprocessing you can get further acceleration using a 2D k-d tree.

    Construction takes O(n log n) time after which you can search the tree in O(log n) time for m queries. So the total time to find the closest points is then O(m log n + n log n) or O(n log m + m log m).

    You can do some benchmarking to determine if it is better to build the k-d tree for the larger or the smaller of the regions. Remember that this answer may change depending on how many times you're able to reuse the same k-d tree for queries against different regions.

    For small datasets you'll probably find that the all-pairs comparison is faster because it has much greater cache locality.

    Note that implementing a k-d tree is tricky, so you'll probably want to use an existing library or definitely use test cases if you're rolling your own.

    Another algorithm

    If we happen to know that the regions are within a distance d of each other we can get an even faster algorithm.

    For one of the regions we can build a hash table where a reference to each member cell is stored at a hash location (c.y // d) * height + (c.x // d). To find the nearest neighbor to a query point we then do an all-pairs check against the points referenced in the associated "meta-cell" and as well as the meta-cell's neighbors.

    Notes on region building

    On the off-chance you're trying to grow the regions quickly, but their boundaries can overlap, this should do the trick.

    Note that your question can be improved by stating why you are trying to calculate a thing. If you want to know the closest-cell so you have a primitive to build regions more quickly, then reconsidering how region building works might be a better use of your time, and it's what I address here.

    If region generation is immaterial and you're only interested in how close they are to each other, then focus on defining, conceptually, what a region is and be very clear that your region construction procedure exists only to create relevant examples.

    The fundamental trick we'll use to accelerate region construction is to make a space-time tradeoff. Rather than storing the map as a set of boolean values, we'll store it as a set of 8-bit integers. We reserve a few of these integers for special purposes:

    Thus, the algorithm for building a region becomes simple:

    A few other parts of your code needed cleaning. Types are typically defined using Camel Case and variables use snake case. I prefer snake_case for functions as well. See a style guide for more opinions.

    Do not use global variables they are bad and, in the fullness of time, they will bring you nothing but sorrow.

    You're using C++, avoid using headers such as stdio, stdlib, and cstring. They provide the C way of doing things, which often come with disadvantageous such as diminished type safety.

    Any time your if statements have a lot of conditions, ask yourself whether they can be simplified by inverting one or more of the conditions (that is, switching is_good() to !is_good()) and doing an early exit. This can make your code much easier to read and reasonable about.

    Nice use of std::vector::reserve(). I have seen companies wasting >$100k/yr doing unnecessary vector reallocations.

    Ranged loops such as

    for (coord Member : Regions.second){
    

    should be written as

    for (const auto& member : Regions.second){
    

    auto is almost always better as a data type because it ensures you do the right thing and it makes it obvious to reviewers of your code that there are no tricks in play. const protects from unintended writes. auto& means "take a reference, don't make a copy". I've seen companies spending >$10M/year because they forgot about using & to use references in for-loops.

    Source Code

    #include <array>
    #include <iostream>
    #include <optional>
    #include <random>
    #include <string>
    #include <tuple>
    #include <vector>
    
    constexpr int REGION_SEED_TRIES = 100;
    constexpr int NEW_MEMBER_TRIES = 200;
    
    constexpr std::array<char, 10> REGION_SYMBOLS =
        {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
    
    struct coord {
      int x;
      int y;
      coord(int x, int y) : x(x), y(y) {}
      bool adjacent(const coord& o) const {
        return std::abs(x - o.x) + std::abs(y - o.y) == 1;
      }
    };
    
    // Methods written in inadvisable one-line format for ease of display on
    // StackOverflow
    struct Map {
      using CellType = uint8_t;
      static constexpr int UNASSIGNED = 0;
      static constexpr int BUFFER = 1;
      int free_label = 2;
      std::vector<CellType> data;
      int m_width;
      int m_height;
    
      Map(int width, int height) {
        data.resize(width * height, UNASSIGNED);
        m_width = width;
        m_height = height;
      }
    
      CellType& operator[](const coord& c) {
        return data.at(c.y * m_width + c.x);
      }
      CellType operator[](const coord& c) const {
        return data.at(c.y * m_width + c.x);
      }
    
      void clear() {
        std::fill(data.begin(), data.end(), false);
      }
      int get_free_label() {
        return free_label++;
      }
      int width() const {
        return m_width;
      }
      int height() const {
        return m_height;
      }
    
      bool on_edge(const coord& c) const {
        return c.x == 0 || c.y == 0 || c.x == m_width - 1 || c.y == m_height - 1;
      }
    
      bool in_body(const coord& c) const {
        return c.x >= 0 && c.y >= 0 && c.x < m_width && c.y < m_height;
      }
    
      bool neighbor_is_buffer(const coord& c) const {
        for (int ny = c.y - 1; ny <= c.y + 1; ++ny) {
          for (int nx = c.x - 1; nx <= c.x + 1; ++nx) {
            const coord n{nx, ny};
            if (!in_body(n)) {
              continue;
            }
            if (data[n.y * m_width + n.x] == BUFFER) {
              return true;
            }
          }
        }
        return false;
      }
    };
    
    using RegionBoundary = std::vector<coord>;
    
    int rand_int(const int lb, const int ub, std::mt19937& prng) {
      return std::uniform_int_distribution<>(lb, ub)(prng);
    }
    
    void create_region(
        Map& map,
        std::mt19937& prng,
        const int distance_to_neighboring_regions) {
      const auto region_label = map.get_free_label();
    
      const auto region_size = rand_int(50, map.width() * map.height() / 2, prng);
      std::vector<coord> region_members;
      region_members.reserve(region_size);
    
      // Seed the region
      for (int i = 0; i < REGION_SEED_TRIES; i++) {
        const int x = rand_int(1, map.width() - 2, prng);
        const int y = rand_int(1, map.height() - 2, prng);
        if (map[{x, y}] == Map::UNASSIGNED) {
          region_members.emplace_back(x, y);
          break;
        }
      }
      if (region_members.empty()) {
        throw std::runtime_error(
            "Unable to find a seed location in a reasonable amount of time");
      }
    
      // Grow the region
      for (int tries = 0;
           tries < NEW_MEMBER_TRIES && region_members.size() < region_size;
           tries++) {
        // Choose a random member of the region to grow from
        const auto& member =
            region_members.at(rand_int(0, region_members.size() - 1, prng));
        // Choose a random neighbour of the member
        const coord neighbour(
            member.x + rand_int(-1, 1, prng), member.y + rand_int(-1, 1, prng));
        // Early exits so that the logic of the function is easier to follow
        if (!neighbour.adjacent(member)) {
          continue;
        }
        if (map.on_edge(neighbour)) {
          continue;
        }
        if (map[neighbour] != Map::UNASSIGNED) {
          continue;
        }
        // Neighbor is a valid candidate for expansion, so add it to the region
        map[neighbour] = region_label;
        region_members.push_back(neighbour);
        tries = 0; // Reset the number of tries
      }
    
      // Mark cells adjacent to the region so regions don't grow into each other
    
      // We grow outwards from the region converting UNASSIGNED cells to BUFFER
      // cells and we repeat this several times, if necessary, to ensure an adequate
      // buffer between regions
      for (int buffer = 0; buffer < distance_to_neighboring_regions; buffer++) {
        for (const auto& member : region_members) {
          // We will include all 8 cells around the focal cell
          for (int nx = member.x - 1; nx <= member.x + 1; ++nx) {
            for (int ny = member.y - 1; ny <= member.y + 1; ++ny) {
              const coord n{nx, ny};
              if (!map.in_body(n)) {
                continue;
              }
              if (map[n] == Map::UNASSIGNED) {
                map[n] = Map::BUFFER;
                // Add buffer cell to the region so that we can continue to grow the
                // buffer between regions if we need to
                region_members.push_back(n);
              }
            }
          }
        }
      }
    }
    
    RegionBoundary get_region_boundary(const Map& map, const int region_label) {
      // Iterate over all cells in the map considering only those that are
      // of type `region_label` and are adjacent to a cell of type `Map::BUFFER`
      RegionBoundary boundary;
      for (int y = 0; y < map.height(); ++y) {
        for (int x = 0; x < map.width(); ++x) {
          const auto type = map[{x, y}];
          if (type == region_label && map.neighbor_is_buffer({x, y})) {
            boundary.emplace_back(x, y);
          }
        }
      }
      return boundary;
    }
    
    void display_map(const Map& map, const std::vector<coord>& highlight) {
      for (int y = 0; y < map.height(); ++y) {
        for (int x = 0; x < map.width(); ++x) {
          const coord c = {x, y};
          const auto type = map[c];
          const char symbol = [&]() {
            if (std::any_of(highlight.begin(), highlight.end(), [&](const auto& x) {
                  return x.x == c.x && x.y == c.y;
                })) {
              return '!';
            }
            if (type == Map::UNASSIGNED || type == Map::BUFFER) {
              return ' ';
            } else {
              return REGION_SYMBOLS.at((type - 2) % REGION_SYMBOLS.size());
            }
          }();
          std::cout << symbol;
        }
        std::cout << "\n";
      }
    }
    
    std::optional<std::tuple<coord, coord, double>> closest_points_on_boundary(
        const RegionBoundary& boundary1,
        const RegionBoundary& boundary2) {
      std::optional<std::tuple<coord, coord, double>> closest_points;
      for (const auto& p1 : boundary1) {
        for (const auto& p2 : boundary2) {
          const auto distance = std::pow(p1.x - p2.x, 2) + std::pow(p1.y - p2.y, 2);
          if (!closest_points || distance < std::get<2>(*closest_points)) {
            closest_points = {p1, p2, distance};
          }
        }
      }
      return closest_points;
    }
    
    int main(int argc, char** argv) {
      std::random_device rd;
      std::mt19937 prng(rd());
    
      if (argc != 3) {
        std::cerr << "Usage: " << argv[0] << " width height\n";
        return 1;
      }
    
      const int width = std::stoi(argv[1]);
      const int height = std::stoi(argv[2]);
    
      Map map(width, height);
    
      create_region(map, prng, 5);
      create_region(map, prng, 5);
    
      const auto boundary2 = get_region_boundary(map, 2);
      const auto boundary3 = get_region_boundary(map, 3);
      const auto closest = closest_points_on_boundary(boundary2, boundary3);
      if (closest) {
        display_map(map, {std::get<0>(*closest), std::get<1>(*closest)});
      }
    
      return 0;
    }
    
    

    Producing output like this

         00             
      000000000         
      00 00000000       
      00000000000       
     00000000000        
     0000000000000      
     000000000000       
     0000000000         
     000000 000         
     000000000          
      000 0!00          
                        
                    111 
                    111 
                   1111 
                   1111 
           !11111111111 
           111111111111