I have a 2D array Map[Height][Width]
that stores a bool
to represent each cell.
There exists two non-overlapping regions RegionA
and RegionB
.
Each Region is a list of unique adjacent integer co-ordinates.
The number of co-ordinates in RegionA
and RegionB
are m
and n
respectively
A co-ordinate in RegionA
will never be horizontally or vertically adjacent to a coordinate in RegionB
(i.e. the regions do not touch)
The bounding box of RegionA
could overlap with the bounding box of RegionB
RegionA
and RegionB
may have holes
RegionA
may or may not surround RegionB
(and vise versa)
if a co-ordinate (X, Y) is part of a region then Map[Y][X] == 1
, otherwise its zero.
I'm looking for an algorithm to determine the two co-ordinates A
and B
that have the minimum Euclidean distance, where A
is a member of RegionA
and B
is a member of RegionB
.
The brute force method has a time complexity of O(mn)
.
Is there a more time-efficient algorithm (preferably better than O(mn)
) for solving this problem?
The name + link, code, or description of one instance of such an algorithm would be greatly appreciated and will be accepted as an answer.
The region generation code is immaterial to the problem being asked. and I am only interested in how close the regions are to each other. In the project I am working on I used cellular automata to create the regions. I can include the code for this if necessary. The Region construction procedure exists only to create relevant examples.
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cstdio>
#include <map>
#include <ctime>
#include <cmath>
constexpr int Width = 150;
constexpr int Height = 37;
bool Map[Height][Width];
struct coord {
int X, Y;
coord(int X, int Y): X(X), Y(Y) {}
bool operator==(const coord& Other) {
return Other.X == X && Other.Y == Y;
}
bool IsAdjacentTo(const coord& Other){
int ManhattanDistance = abs(X - Other.X) + abs(Y - Other.Y);
return ManhattanDistance == 1;
}
bool IsOnMapEdge(){
return X == 0 || Y == 0 || X == Width - 1 || Y == Height - 1;
}
};
using region = std::vector<coord>;
void ClearMap(){
std::memset(Map, 0, sizeof Map);
}
bool& GetMapBool(coord Coord) {
return Map[Coord.Y][Coord.X];
}
int RandInt(int LowerBound, int UpperBound){
return std::rand() % (UpperBound - LowerBound + 1) + LowerBound;
}
region CreateRegion(){
std::puts("Creating Region");
ClearMap();
region Region;
int RegionSize = RandInt(50, Width * Height / 2);
Region.reserve(RegionSize);
while (true){
Region.emplace_back(
RandInt(1, Width - 1),
RandInt(1, Height - 1)
);
if (!Region[0].IsOnMapEdge())
break;
else
Region.pop_back();
}
GetMapBool(Region[0]) = 1;
while (Region.size() != RegionSize){
coord Member = Region[RandInt(0, Region.size() - 1)];
coord NeighbourToMakeAMember {
RandInt(-1, 1) + Member.X,
RandInt(-1, 1) + Member.Y
};
if (!Member.IsOnMapEdge() && !GetMapBool(NeighbourToMakeAMember) && Member.IsAdjacentTo(NeighbourToMakeAMember)){
GetMapBool(NeighbourToMakeAMember) = 1;
Region.push_back(NeighbourToMakeAMember);
};
}
std::puts("Created Region");
return Region;
}
std::pair<region, region> CreateTwoRegions(){
bool RegionsOverlap;
std::pair<region, region> Regions;
do {
Regions.first = CreateRegion();
Regions.second = CreateRegion();
ClearMap();
for (coord Member : Regions.first){
GetMapBool(Member) = 1;
}
RegionsOverlap = 0;
for (coord Member : Regions.second){
if (GetMapBool(Member)){
ClearMap();
std::puts("Regions Overlap/Touch");
RegionsOverlap = 1;
break;
} else {
GetMapBool(Member) = 1;
}
}
} while (RegionsOverlap);
return Regions;
}
void DisplayMap(){
for (int Y = 0; Y < Height; ++Y){
for (int X = 0; X < Width; ++X)
std::printf("%c", (Map[Y][X] ? '1' : '-'));
std::puts("");
}
}
int main(){
int Seed = time(NULL);
std::srand(Seed);
ClearMap();
auto[RegionA, RegionB] = CreateTwoRegions();
DisplayMap();
}
What are the co-ordinates of the points A and B that form the minimum Euclidean distance between the two regions?
As others have mentioned, you can improve on finding the closest points between the two regions of n and m points by first filtering each region's points to only those which border unassigned cells and then doing an all-pairs search between the boundaries. For a roughly rectangular region this would take O(m + n + 16 √m √n) time: two linear filters and then the all-pairs search. I've implemented an example of this search below.
But this will not save you any time if your regions are shaped like this:
########
# #
# #### #
# # # #
# # ## #
# # #
# ######
If you are using larger datasets and can tolerate preprocessing you can get further acceleration using a 2D k-d tree.
Construction takes O(n log n) time after which you can search the tree in O(log n) time for m queries. So the total time to find the closest points is then O(m log n + n log n) or O(n log m + m log m).
You can do some benchmarking to determine if it is better to build the k-d tree for the larger or the smaller of the regions. Remember that this answer may change depending on how many times you're able to reuse the same k-d tree for queries against different regions.
For small datasets you'll probably find that the all-pairs comparison is faster because it has much greater cache locality.
Note that implementing a k-d tree is tricky, so you'll probably want to use an existing library or definitely use test cases if you're rolling your own.
If we happen to know that the regions are within a distance d of each other we can get an even faster algorithm.
For one of the regions we can build a hash table where a reference to each member cell is stored at a hash location (c.y // d) * height + (c.x // d)
. To find the nearest neighbor to a query point we then do an all-pairs check against the points referenced in the associated "meta-cell" and as well as the meta-cell's neighbors.
On the off-chance you're trying to grow the regions quickly, but their boundaries can overlap, this should do the trick.
Note that your question can be improved by stating why you are trying to calculate a thing. If you want to know the closest-cell so you have a primitive to build regions more quickly, then reconsidering how region building works might be a better use of your time, and it's what I address here.
If region generation is immaterial and you're only interested in how close they are to each other, then focus on defining, conceptually, what a region is and be very clear that your region construction procedure exists only to create relevant examples.
The fundamental trick we'll use to accelerate region construction is to make a space-time tradeoff. Rather than storing the map as a set of boolean values, we'll store it as a set of 8-bit integers. We reserve a few of these integers for special purposes:
Thus, the algorithm for building a region becomes simple:
A few other parts of your code needed cleaning. Types are typically defined using Camel Case and variables use snake case. I prefer snake_case for functions as well. See a style guide for more opinions.
Do not use global variables they are bad and, in the fullness of time, they will bring you nothing but sorrow.
You're using C++, avoid using headers such as stdio
, stdlib
, and cstring
. They provide the C way of doing things, which often come with disadvantageous such as diminished type safety.
Any time your if
statements have a lot of conditions, ask yourself whether they can be simplified by inverting one or more of the conditions (that is, switching is_good()
to !is_good()
) and doing an early exit. This can make your code much easier to read and reasonable about.
Nice use of std::vector::reserve()
. I have seen companies wasting >$100k/yr doing unnecessary vector reallocations.
Ranged loops such as
for (coord Member : Regions.second){
should be written as
for (const auto& member : Regions.second){
auto
is almost always better as a data type because it ensures you do the right thing and it makes it obvious to reviewers of your code that there are no tricks in play. const
protects from unintended writes. auto&
means "take a reference, don't make a copy". I've seen companies spending >$10M/year because they forgot about using &
to use references in for-loops.
#include <array>
#include <iostream>
#include <optional>
#include <random>
#include <string>
#include <tuple>
#include <vector>
constexpr int REGION_SEED_TRIES = 100;
constexpr int NEW_MEMBER_TRIES = 200;
constexpr std::array<char, 10> REGION_SYMBOLS =
{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
struct coord {
int x;
int y;
coord(int x, int y) : x(x), y(y) {}
bool adjacent(const coord& o) const {
return std::abs(x - o.x) + std::abs(y - o.y) == 1;
}
};
// Methods written in inadvisable one-line format for ease of display on
// StackOverflow
struct Map {
using CellType = uint8_t;
static constexpr int UNASSIGNED = 0;
static constexpr int BUFFER = 1;
int free_label = 2;
std::vector<CellType> data;
int m_width;
int m_height;
Map(int width, int height) {
data.resize(width * height, UNASSIGNED);
m_width = width;
m_height = height;
}
CellType& operator[](const coord& c) {
return data.at(c.y * m_width + c.x);
}
CellType operator[](const coord& c) const {
return data.at(c.y * m_width + c.x);
}
void clear() {
std::fill(data.begin(), data.end(), false);
}
int get_free_label() {
return free_label++;
}
int width() const {
return m_width;
}
int height() const {
return m_height;
}
bool on_edge(const coord& c) const {
return c.x == 0 || c.y == 0 || c.x == m_width - 1 || c.y == m_height - 1;
}
bool in_body(const coord& c) const {
return c.x >= 0 && c.y >= 0 && c.x < m_width && c.y < m_height;
}
bool neighbor_is_buffer(const coord& c) const {
for (int ny = c.y - 1; ny <= c.y + 1; ++ny) {
for (int nx = c.x - 1; nx <= c.x + 1; ++nx) {
const coord n{nx, ny};
if (!in_body(n)) {
continue;
}
if (data[n.y * m_width + n.x] == BUFFER) {
return true;
}
}
}
return false;
}
};
using RegionBoundary = std::vector<coord>;
int rand_int(const int lb, const int ub, std::mt19937& prng) {
return std::uniform_int_distribution<>(lb, ub)(prng);
}
void create_region(
Map& map,
std::mt19937& prng,
const int distance_to_neighboring_regions) {
const auto region_label = map.get_free_label();
const auto region_size = rand_int(50, map.width() * map.height() / 2, prng);
std::vector<coord> region_members;
region_members.reserve(region_size);
// Seed the region
for (int i = 0; i < REGION_SEED_TRIES; i++) {
const int x = rand_int(1, map.width() - 2, prng);
const int y = rand_int(1, map.height() - 2, prng);
if (map[{x, y}] == Map::UNASSIGNED) {
region_members.emplace_back(x, y);
break;
}
}
if (region_members.empty()) {
throw std::runtime_error(
"Unable to find a seed location in a reasonable amount of time");
}
// Grow the region
for (int tries = 0;
tries < NEW_MEMBER_TRIES && region_members.size() < region_size;
tries++) {
// Choose a random member of the region to grow from
const auto& member =
region_members.at(rand_int(0, region_members.size() - 1, prng));
// Choose a random neighbour of the member
const coord neighbour(
member.x + rand_int(-1, 1, prng), member.y + rand_int(-1, 1, prng));
// Early exits so that the logic of the function is easier to follow
if (!neighbour.adjacent(member)) {
continue;
}
if (map.on_edge(neighbour)) {
continue;
}
if (map[neighbour] != Map::UNASSIGNED) {
continue;
}
// Neighbor is a valid candidate for expansion, so add it to the region
map[neighbour] = region_label;
region_members.push_back(neighbour);
tries = 0; // Reset the number of tries
}
// Mark cells adjacent to the region so regions don't grow into each other
// We grow outwards from the region converting UNASSIGNED cells to BUFFER
// cells and we repeat this several times, if necessary, to ensure an adequate
// buffer between regions
for (int buffer = 0; buffer < distance_to_neighboring_regions; buffer++) {
for (const auto& member : region_members) {
// We will include all 8 cells around the focal cell
for (int nx = member.x - 1; nx <= member.x + 1; ++nx) {
for (int ny = member.y - 1; ny <= member.y + 1; ++ny) {
const coord n{nx, ny};
if (!map.in_body(n)) {
continue;
}
if (map[n] == Map::UNASSIGNED) {
map[n] = Map::BUFFER;
// Add buffer cell to the region so that we can continue to grow the
// buffer between regions if we need to
region_members.push_back(n);
}
}
}
}
}
}
RegionBoundary get_region_boundary(const Map& map, const int region_label) {
// Iterate over all cells in the map considering only those that are
// of type `region_label` and are adjacent to a cell of type `Map::BUFFER`
RegionBoundary boundary;
for (int y = 0; y < map.height(); ++y) {
for (int x = 0; x < map.width(); ++x) {
const auto type = map[{x, y}];
if (type == region_label && map.neighbor_is_buffer({x, y})) {
boundary.emplace_back(x, y);
}
}
}
return boundary;
}
void display_map(const Map& map, const std::vector<coord>& highlight) {
for (int y = 0; y < map.height(); ++y) {
for (int x = 0; x < map.width(); ++x) {
const coord c = {x, y};
const auto type = map[c];
const char symbol = [&]() {
if (std::any_of(highlight.begin(), highlight.end(), [&](const auto& x) {
return x.x == c.x && x.y == c.y;
})) {
return '!';
}
if (type == Map::UNASSIGNED || type == Map::BUFFER) {
return ' ';
} else {
return REGION_SYMBOLS.at((type - 2) % REGION_SYMBOLS.size());
}
}();
std::cout << symbol;
}
std::cout << "\n";
}
}
std::optional<std::tuple<coord, coord, double>> closest_points_on_boundary(
const RegionBoundary& boundary1,
const RegionBoundary& boundary2) {
std::optional<std::tuple<coord, coord, double>> closest_points;
for (const auto& p1 : boundary1) {
for (const auto& p2 : boundary2) {
const auto distance = std::pow(p1.x - p2.x, 2) + std::pow(p1.y - p2.y, 2);
if (!closest_points || distance < std::get<2>(*closest_points)) {
closest_points = {p1, p2, distance};
}
}
}
return closest_points;
}
int main(int argc, char** argv) {
std::random_device rd;
std::mt19937 prng(rd());
if (argc != 3) {
std::cerr << "Usage: " << argv[0] << " width height\n";
return 1;
}
const int width = std::stoi(argv[1]);
const int height = std::stoi(argv[2]);
Map map(width, height);
create_region(map, prng, 5);
create_region(map, prng, 5);
const auto boundary2 = get_region_boundary(map, 2);
const auto boundary3 = get_region_boundary(map, 3);
const auto closest = closest_points_on_boundary(boundary2, boundary3);
if (closest) {
display_map(map, {std::get<0>(*closest), std::get<1>(*closest)});
}
return 0;
}
Producing output like this
00
000000000
00 00000000
00000000000
00000000000
0000000000000
000000000000
0000000000
000000 000
000000000
000 0!00
111
111
1111
1111
!11111111111
111111111111