postgis dbscan

DBSCAN giving different results in different databases


I have the same dataset loaded into two separate databases. Is it expected that DBSCAN produces different results in each database?


Solution

  • Yes, this is expected, because DBSCAN is not fully deterministic. This thread and this thread are worth reading for further context.

    From wikipedia:

    "DBSCAN is not entirely deterministic: border points that are reachable from more than one cluster can be part of either cluster, depending on the order the data are processed. For most data sets and domains, this situation does not arise often and has little impact on the clustering result: both on core points and noise points, DBSCAN is deterministic. DBSCAN* is a variation that treats border points as noise, and this way achieves a fully deterministic result as well as a more consistent statistical interpretation of density-connected components."
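To make the quoted behaviour concrete, here is a small, self-contained sketch of textbook DBSCAN in plain Python. It is a toy, not PostGIS's implementation: the names `dbscan`, `region_query` and `border_as_noise` and the 1-D integer data are hypothetical, chosen only for illustration, and distance is plain absolute difference rather than geometry distance. The point `0` is a border point within `eps` of core points in two different clusters, so its cluster depends only on the order in which points are visited. Setting `border_as_noise=True` relabels every non-core point as noise, which mimics DBSCAN* and makes the result order-independent. That the two databases feed rows to the algorithm in different orders is an assumption about the asker's setup, not something the thread confirms.

```python
"""Toy 1-D DBSCAN showing order-dependent border points (illustrative sketch)."""
from typing import List

UNVISITED, NOISE = -2, -1


def region_query(points: List[int], i: int, eps: int) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]


def dbscan(points: List[int], eps: int, min_pts: int,
           border_as_noise: bool = False) -> List[int]:
    """Classic DBSCAN that visits points in list order.

    With border_as_noise=True every non-core point becomes noise (DBSCAN*-style).
    """
    is_core = [len(region_query(points, i, eps)) >= min_pts
               for i in range(len(points))]
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        if not is_core[i]:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = region_query(points, i, eps)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] != UNVISITED:
                continue  # already assigned, possibly to an earlier cluster: keep it
            labels[j] = cluster
            if is_core[j]:
                seeds.extend(region_query(points, j, eps))  # only core points expand
    if border_as_noise:
        labels = [lab if is_core[i] else NOISE for i, lab in enumerate(labels)]
    return labels


if __name__ == "__main__":
    data = [-5, -4, -3, -2, 0, 2, 3, 4, 5]  # hypothetical toy data
    for order in (data, list(reversed(data))):
        labels = dbscan(order, eps=2, min_pts=4)
        zero_label = labels[order.index(0)]
        print(sorted(p for p, lab in zip(order, labels) if lab == zero_label))
        star = dbscan(order, eps=2, min_pts=4, border_as_noise=True)
        print(star[order.index(0)] == NOISE)
```

Run as a script, the first order prints `[-5, -4, -3, -2, 0]` and the reversed order prints `[0, 2, 3, 4, 5]`, while the DBSCAN*-style run reports the border point as noise (`True`) both times. Relabelling non-core points after the fact works because core-point membership does not depend on visiting order; only the numeric cluster ids might. Note that the outer points `-5` and `5` are also non-core here, so they become noise under the DBSCAN* variant as well.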