javahashmappriority-queueshortest-patha-star

PriorityQueue Comparator weird behavior when sort order is fetched from hashmap


While working in LeetCode 675, I came across a weird behavior with PriorityQueue Comparator that I can't explain.

For brevity, I'll state the gist of the problem statement; interested parties can read it on LeetCode in full detail.

Given a mxn matrix called forest, some cells are designated as trees. The ask is to find the minimum number of steps to visit (cut off) all tree cells, in ascending order of tree heights.

To solve this, I find all the trees, and sort them in ascending order of height. Then I calculate the minimum distance from each tree to its next by running an A* search using the heuristic function: distance from source + Manhattan distance from the goal

The minimum number of steps to cut off all the trees is the sum of all the A* distances. If at any point the A* search fails to return a path, I abort.

class Solution {
  private final int[][] dx = new int[][] {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};

  public int cutOffTree(List<List<Integer>> forest) {
    int[] curr = new int[] {0, 0};
    int dist = 0;

    for (int[] next : getTreesSortedByHeight(forest)) {
      int d = minDist(forest, curr, next);
      if (d < 0) {
        return -1;
      }
      curr = next;
      dist += d;
    }

    return dist;
  }

  private List<int[]> getTreesSortedByHeight(List<List<Integer>> forest) {
    List<int[]> trees = new ArrayList<>();
    for (int row = 0; row < forest.size(); row++) {
      for (int col = 0; col < forest.get(0).size(); col++) {
        if (forest.get(row).get(col) > 1) {
          trees.add(new int[] {row, col});
        }
      }
    }
    trees.sort(Comparator.comparingInt(a -> forest.get(a[0]).get(a[1])));
    return trees;
  }

  int minDist(List<List<Integer>> forest, int[] start, int[] goal) {
    int m = forest.size();
    int n = forest.get(0).size();
    Map<Integer, Integer> costs = new HashMap<>();
    costs.put(start[0] * n + start[1], manhattanDist(start[0], start[1], goal));
    // GOTCHA: Fetching the distance from the cost map using the coordinates doesn't work!
    Queue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt(a -> a[0]));
    heap.offer(new int[] {0, 0, start[0], start[1]}); // [cost, distance, row, column]

    while (!heap.isEmpty()) {
      int[] curr = heap.poll();
      int dist = curr[1];
      int row = curr[2];
      int col = curr[3];

      if (row == goal[0] && col == goal[1]) {
        return dist;
      }
      for (int[] d : dx) {
        int r = row + d[0];
        int c = col + d[1];
        if (r >= 0 && r < m && c >= 0 && c < n && forest.get(r).get(c) > 0) {
          int cost = dist + 1 + manhattanDist(r, c, goal);
          if (cost < costs.getOrDefault(r * n + c, Integer.MAX_VALUE)) {
            costs.put(r * n + c, cost);
            heap.offer(new int[] {cost, dist + 1, r, c});
          }
        }
      }
    }
    return -1;
  }

  private int manhattanDist(int row, int col, int[] goal) {
    return Math.abs(goal[0] - row) + Math.abs(goal[1] - col);
  }
}

Note that each heap entry contains the heuristic cost. Logically, this is unnecessary, because the entry also contains the cell coordinates (row and column), using which we can fetch the distance from the costs map as follows:

Queue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt(a -> costs.get(a[1] * n + a[2]);

But this doesn't work, and fails one of the test cases. The test case is huge, so, I see no point in pasting it here, as it is unlikely anyone would have the time to debug it.

What I'd like to know is why this weirdness.

Edit: Added complete code as requested in a comment.


Solution

  • PriorityQueue cannot handle it if the ordering of its elements changes while they are still in the queue, and its behavior is undefined in that scenario. Very weird things are likely to happen, since the heap property of the elements is silently broken with no way for PriorityQueue to know that that has happened. It seems pretty clear that that can happen, since multiple paths to the same node can result in the same element being in the queue in multiple places.

    Your original code is, in fact, the best way to do it.