c++algorithm computational-geometry euclidean-distance

Efficient algorithm for minimum Euclidean distance between points in non-overlapping regions in a 2D array

Problem background

I have a 2D array Map[Height][Width] that stores a bool to represent each cell.
There exists two non-overlapping regions RegionA and RegionB.
Each region is a list of unique adjacent integer co-ordinates.
The number of co-ordinates in RegionA and RegionB are m and n respectively
A co-ordinate in RegionA will never be horizontally or vertically adjacent to a coordinate in RegionB (i.e. the regions do not touch)
The bounding box of RegionA could overlap with the bounding box of RegionB
RegionA and RegionB may have holes
RegionA may or may not surround RegionB (and vise versa)
if a co-ordinate (X, Y) is part of a region then Map[Y][X] == 1, otherwise its zero.

Problem

I'm looking for an algorithm to determine the two co-ordinates A and B that have the minimum Euclidean distance, where A is a member of RegionA and B is a member of RegionB.

The brute force method has a time complexity of O(mn).

Is there a more time-efficient algorithm (preferably better than O(mn)) for solving this problem?

The name + link, code, or description of one instance of such an algorithm would be greatly appreciated and will be accepted as an answer.

C++ code to create two random regions A and B

The region generation code is immaterial to the problem being asked. and I am only interested in how close the regions are to each other. In the project I am working on I used cellular automata to create the regions. I can include the code for this if necessary. The Region construction procedure exists only to create relevant examples.

#include <cstdlib>
#include <cstring>
#include <vector>
#include <cstdio>
#include <map>
#include <ctime>
#include <cmath>

constexpr int Width = 150;
constexpr int Height = 37;
bool Map[Height][Width];

struct coord {
    int X, Y;
    coord(int X, int Y): X(X), Y(Y) {}
    bool operator==(const coord& Other) {
        return Other.X == X && Other.Y == Y;
    }
    bool IsAdjacentTo(const coord& Other){
        int ManhattanDistance = abs(X - Other.X) + abs(Y - Other.Y);
        return ManhattanDistance == 1;
    }
    bool IsOnMapEdge(){
        return X == 0 || Y == 0 || X == Width - 1 || Y == Height - 1;
    }
};

using region = std::vector<coord>;

void ClearMap(){
    std::memset(Map, 0, sizeof Map);
}
bool& GetMapBool(coord Coord) {
    return Map[Coord.Y][Coord.X];
}

int RandInt(int LowerBound, int UpperBound){
    return std::rand() % (UpperBound - LowerBound + 1) + LowerBound;
}

region CreateRegion(){
    std::puts("Creating Region");
    ClearMap();
    region Region;
    int RegionSize = RandInt(50, Width * Height / 2);
    Region.reserve(RegionSize);
    while (true){
        Region.emplace_back(
            RandInt(1, Width - 1), 
            RandInt(1, Height - 1)
        );
        if (!Region[0].IsOnMapEdge())
            break;
        else 
            Region.pop_back();
    }
    GetMapBool(Region[0]) = 1;
    while (Region.size() != RegionSize){
        coord Member = Region[RandInt(0, Region.size() - 1)];
        coord NeighbourToMakeAMember {
            RandInt(-1, 1) + Member.X,
            RandInt(-1, 1) + Member.Y
        };
        if (!Member.IsOnMapEdge() && !GetMapBool(NeighbourToMakeAMember) && Member.IsAdjacentTo(NeighbourToMakeAMember)){
            GetMapBool(NeighbourToMakeAMember) = 1;
            Region.push_back(NeighbourToMakeAMember);
        };
    }
    std::puts("Created Region");
    return Region;      
}

std::pair<region, region> CreateTwoRegions(){
    bool RegionsOverlap;
    std::pair<region, region> Regions;
    do {
        Regions.first = CreateRegion();
        Regions.second = CreateRegion();
        ClearMap();
        for (coord Member : Regions.first){
            GetMapBool(Member) = 1;
        }
        RegionsOverlap = 0;
        for (coord Member : Regions.second){
            if (GetMapBool(Member)){
                ClearMap();
                std::puts("Regions Overlap/Touch");
                RegionsOverlap = 1;
                break;
            } else {
                GetMapBool(Member) = 1;
            }
        }
    } while (RegionsOverlap);
    return Regions;
}

void DisplayMap(){
    for (int Y = 0; Y < Height; ++Y){
        for (int X = 0; X < Width; ++X)
            std::printf("%c", (Map[Y][X] ? '1' : '-'));
        std::puts("");
    }
}

int main(){
    int Seed = time(NULL);
    std::srand(Seed);
    ClearMap();
    auto[RegionA, RegionB] = CreateTwoRegions();
    DisplayMap();
}

Problem illustration

What are the co-ordinates of the points A and B that form the minimum Euclidean distance between the two regions?

Solution

Notes on finding the nearest points between two regions

As others have mentioned, you can improve on finding the closest points between the two regions of n and m points by first filtering each region's points to only those which border unassigned cells and then doing an all-pairs search between the boundaries. For a roughly rectangular region this would take O(m + n + 16 √m √n) time: two linear filters and then the all-pairs search. I've implemented an example of this search below.

But this will not save you any time if your regions are shaped like this:

########
#      #
# #### #
# #  # #
# # ## #
# #    #
# ######

If you are using larger datasets and can tolerate preprocessing you can get further acceleration using a 2D k-d tree.

Construction takes O(n log n) time after which you can search the tree in O(log n) time for m queries. So the total time to find the closest points is then O(m log n + n log n) or O(n log m + m log m).

You can do some benchmarking to determine if it is better to build the k-d tree for the larger or the smaller of the regions. Remember that this answer may change depending on how many times you're able to reuse the same k-d tree for queries against different regions.

For small datasets you'll probably find that the all-pairs comparison is faster because it has much greater cache locality.

Note that implementing a k-d tree is tricky, so you'll probably want to use an existing library or definitely use test cases if you're rolling your own.

Another algorithm

If we happen to know that the regions are within a distance d of each other we can get an even faster algorithm.

For one of the regions we can build a hash table where a reference to each member cell is stored at a hash location (c.y // d) * height + (c.x // d). To find the nearest neighbor to a query point we then do an all-pairs check against the points referenced in the associated "meta-cell" and as well as the meta-cell's neighbors.

Notes on region building

On the off-chance you're trying to grow the regions quickly, but their boundaries can overlap, this should do the trick.

Note that your question can be improved by stating why you are trying to calculate a thing. If you want to know the closest-cell so you have a primitive to build regions more quickly, then reconsidering how region building works might be a better use of your time, and it's what I address here.

If region generation is immaterial and you're only interested in how close they are to each other, then focus on defining, conceptually, what a region is and be very clear that your region construction procedure exists only to create relevant examples.

The fundamental trick we'll use to accelerate region construction is to make a space-time tradeoff. Rather than storing the map as a set of boolean values, we'll store it as a set of 8-bit integers. We reserve a few of these integers for special purposes:

0 - An unassigned space that a region may be built on
1 - A buffer cell that isn't part of any region, but close to one. This is not available for building on.
2, 3, 4, ... - valid region labels.

Thus, the algorithm for building a region becomes simple:

Find a valid seed cell
Grow outward from the seed cell via 4-connected cells until the desired number of cells is reached
- A new Region Cell can only be placed in an Unassigned Space
Iterate over Region Cells marking their 8-connected neighbors as Buffer cells.
- Note that, if we add the new Buffer cells to the list of Region Cells and repeat the last step, we can grow arbitrarily large buffers between regions.

A few other parts of your code needed cleaning. Types are typically defined using Camel Case and variables use snake case. I prefer snake_case for functions as well. See a style guide for more opinions.

Do not use global variables they are bad and, in the fullness of time, they will bring you nothing but sorrow.

You're using C++, avoid using headers such as stdio, stdlib, and cstring. They provide the C way of doing things, which often come with disadvantageous such as diminished type safety.

Any time your if statements have a lot of conditions, ask yourself whether they can be simplified by inverting one or more of the conditions (that is, switching is_good() to !is_good()) and doing an early exit. This can make your code much easier to read and reasonable about.

Nice use of std::vector::reserve(). I have seen companies wasting >$100k/yr doing unnecessary vector reallocations.

Ranged loops such as

for (coord Member : Regions.second){

should be written as

for (const auto& member : Regions.second){

auto is almost always better as a data type because it ensures you do the right thing and it makes it obvious to reviewers of your code that there are no tricks in play. const protects from unintended writes. auto& means "take a reference, don't make a copy". I've seen companies spending >$10M/year because they forgot about using & to use references in for-loops.

Source Code

#include <array>
#include <iostream>
#include <optional>
#include <random>
#include <string>
#include <tuple>
#include <vector>

constexpr int REGION_SEED_TRIES = 100;
constexpr int NEW_MEMBER_TRIES = 200;

constexpr std::array<char, 10> REGION_SYMBOLS =
    {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};

struct coord {
  int x;
  int y;
  coord(int x, int y) : x(x), y(y) {}
  bool adjacent(const coord& o) const {
    return std::abs(x - o.x) + std::abs(y - o.y) == 1;
  }
};

// Methods written in inadvisable one-line format for ease of display on
// StackOverflow
struct Map {
  using CellType = uint8_t;
  static constexpr int UNASSIGNED = 0;
  static constexpr int BUFFER = 1;
  int free_label = 2;
  std::vector<CellType> data;
  int m_width;
  int m_height;

  Map(int width, int height) {
    data.resize(width * height, UNASSIGNED);
    m_width = width;
    m_height = height;
  }

  CellType& operator[](const coord& c) {
    return data.at(c.y * m_width + c.x);
  }
  CellType operator[](const coord& c) const {
    return data.at(c.y * m_width + c.x);
  }

  void clear() {
    std::fill(data.begin(), data.end(), false);
  }
  int get_free_label() {
    return free_label++;
  }
  int width() const {
    return m_width;
  }
  int height() const {
    return m_height;
  }

  bool on_edge(const coord& c) const {
    return c.x == 0 || c.y == 0 || c.x == m_width - 1 || c.y == m_height - 1;
  }

  bool in_body(const coord& c) const {
    return c.x >= 0 && c.y >= 0 && c.x < m_width && c.y < m_height;
  }

  bool neighbor_is_buffer(const coord& c) const {
    for (int ny = c.y - 1; ny <= c.y + 1; ++ny) {
      for (int nx = c.x - 1; nx <= c.x + 1; ++nx) {
        const coord n{nx, ny};
        if (!in_body(n)) {
          continue;
        }
        if (data[n.y * m_width + n.x] == BUFFER) {
          return true;
        }
      }
    }
    return false;
  }
};

using RegionBoundary = std::vector<coord>;

int rand_int(const int lb, const int ub, std::mt19937& prng) {
  return std::uniform_int_distribution<>(lb, ub)(prng);
}

void create_region(
    Map& map,
    std::mt19937& prng,
    const int distance_to_neighboring_regions) {
  const auto region_label = map.get_free_label();

  const auto region_size = rand_int(50, map.width() * map.height() / 2, prng);
  std::vector<coord> region_members;
  region_members.reserve(region_size);

  // Seed the region
  for (int i = 0; i < REGION_SEED_TRIES; i++) {
    const int x = rand_int(1, map.width() - 2, prng);
    const int y = rand_int(1, map.height() - 2, prng);
    if (map[{x, y}] == Map::UNASSIGNED) {
      region_members.emplace_back(x, y);
      break;
    }
  }
  if (region_members.empty()) {
    throw std::runtime_error(
        "Unable to find a seed location in a reasonable amount of time");
  }

  // Grow the region
  for (int tries = 0;
       tries < NEW_MEMBER_TRIES && region_members.size() < region_size;
       tries++) {
    // Choose a random member of the region to grow from
    const auto& member =
        region_members.at(rand_int(0, region_members.size() - 1, prng));
    // Choose a random neighbour of the member
    const coord neighbour(
        member.x + rand_int(-1, 1, prng), member.y + rand_int(-1, 1, prng));
    // Early exits so that the logic of the function is easier to follow
    if (!neighbour.adjacent(member)) {
      continue;
    }
    if (map.on_edge(neighbour)) {
      continue;
    }
    if (map[neighbour] != Map::UNASSIGNED) {
      continue;
    }
    // Neighbor is a valid candidate for expansion, so add it to the region
    map[neighbour] = region_label;
    region_members.push_back(neighbour);
    tries = 0; // Reset the number of tries
  }

  // Mark cells adjacent to the region so regions don't grow into each other

  // We grow outwards from the region converting UNASSIGNED cells to BUFFER
  // cells and we repeat this several times, if necessary, to ensure an adequate
  // buffer between regions
  for (int buffer = 0; buffer < distance_to_neighboring_regions; buffer++) {
    for (const auto& member : region_members) {
      // We will include all 8 cells around the focal cell
      for (int nx = member.x - 1; nx <= member.x + 1; ++nx) {
        for (int ny = member.y - 1; ny <= member.y + 1; ++ny) {
          const coord n{nx, ny};
          if (!map.in_body(n)) {
            continue;
          }
          if (map[n] == Map::UNASSIGNED) {
            map[n] = Map::BUFFER;
            // Add buffer cell to the region so that we can continue to grow the
            // buffer between regions if we need to
            region_members.push_back(n);
          }
        }
      }
    }
  }
}

RegionBoundary get_region_boundary(const Map& map, const int region_label) {
  // Iterate over all cells in the map considering only those that are
  // of type `region_label` and are adjacent to a cell of type `Map::BUFFER`
  RegionBoundary boundary;
  for (int y = 0; y < map.height(); ++y) {
    for (int x = 0; x < map.width(); ++x) {
      const auto type = map[{x, y}];
      if (type == region_label && map.neighbor_is_buffer({x, y})) {
        boundary.emplace_back(x, y);
      }
    }
  }
  return boundary;
}

void display_map(const Map& map, const std::vector<coord>& highlight) {
  for (int y = 0; y < map.height(); ++y) {
    for (int x = 0; x < map.width(); ++x) {
      const coord c = {x, y};
      const auto type = map[c];
      const char symbol = [&]() {
        if (std::any_of(highlight.begin(), highlight.end(), [&](const auto& x) {
              return x.x == c.x && x.y == c.y;
            })) {
          return '!';
        }
        if (type == Map::UNASSIGNED || type == Map::BUFFER) {
          return ' ';
        } else {
          return REGION_SYMBOLS.at((type - 2) % REGION_SYMBOLS.size());
        }
      }();
      std::cout << symbol;
    }
    std::cout << "\n";
  }
}

std::optional<std::tuple<coord, coord, double>> closest_points_on_boundary(
    const RegionBoundary& boundary1,
    const RegionBoundary& boundary2) {
  std::optional<std::tuple<coord, coord, double>> closest_points;
  for (const auto& p1 : boundary1) {
    for (const auto& p2 : boundary2) {
      const auto distance = std::pow(p1.x - p2.x, 2) + std::pow(p1.y - p2.y, 2);
      if (!closest_points || distance < std::get<2>(*closest_points)) {
        closest_points = {p1, p2, distance};
      }
    }
  }
  return closest_points;
}

int main(int argc, char** argv) {
  std::random_device rd;
  std::mt19937 prng(rd());

  if (argc != 3) {
    std::cerr << "Usage: " << argv[0] << " width height\n";
    return 1;
  }

  const int width = std::stoi(argv[1]);
  const int height = std::stoi(argv[2]);

  Map map(width, height);

  create_region(map, prng, 5);
  create_region(map, prng, 5);

  const auto boundary2 = get_region_boundary(map, 2);
  const auto boundary3 = get_region_boundary(map, 3);
  const auto closest = closest_points_on_boundary(boundary2, boundary3);
  if (closest) {
    display_map(map, {std::get<0>(*closest), std::get<1>(*closest)});
  }

  return 0;
}

Producing output like this

     00             
  000000000         
  00 00000000       
  00000000000       
 00000000000        
 0000000000000      
 000000000000       
 0000000000         
 000000 000         
 000000000          
  000 0!00          
                    
                111 
                111 
               1111 
               1111 
       !11111111111 
       111111111111