I have a series of 2D Histograms that I created using the kde2d function of MASS in the following way:
# Loading libraries
library(MASS)
library(RcolorBrewer)
# Loading data
data <- as.matrix(read.table('data.dat'))
# Create the 2dhist object
hist_2d <- kde2d(data[,1],data[,2],n = 60, lims=c(-180,180,-180,180))
# Define the color palette
rf <- colorRampPalette(rev(brewer.pal(11,'Spectral')))
r <- rf(60)
# Defining the axis
at_x = seq(-180,180,by=30)
at_y = seq(-180,180,by=30)
# Plot the 2DHistogram
image(hist_2d,col=r,cex.main=3,main='Q68L',axes=F)
axis(1,lwd.ticks=2,at=at_x,labels=T,cex.axis=2)
axis(2,lwd.ticks=2,at=at_y,labels=T,cex.axis=2)
The histogram generated looks like this. How I can identify all the zones with high density ( which I marked inside the white squares)? The ideal solution for this problem would be a function that throws an (x,y) range for every high density zone so that it can be applied in several datasets.
Thanks in advance and let me know if you need additional information
With the right representation of the data, this can be done with
cluster analysis. Since you do not provide data, I will illustrate
with the data used on the kde2d
help page - the geyser data.
This data is gives a pretty clean separation of "high density"
areas (like your example pictures), so I will just use a simple
k-means clustering.
library(MASS)
attach(geyser)
f2 <- kde2d(duration, waiting, n = 50, lims = c(0.5, 6, 40, 100),
h = c(width.SJ(duration), width.SJ(waiting)) )
image(f2, zlim = c(0, 0.05))
We need to find the "hot spots". In order to get an idea about what values should be considered "high", we can look at a boxplot.
boxplot(as.vector(f2$z))
Based on this, I will somewhat arbitrarily use points where the z-value is greater than 0.012. You will need to tune this for your particular problem.
Hot = which(f2$z > 0.012, arr.ind = TRUE)
HotPoints = data.frame(x=f2$x[Hot[,1]], y=f2$y[Hot[,2]])
plot(HotPoints, pch=20, xlim = c(0.5,6), ylim = c(40,100))
Now we need to cluster the points and find the x & y ranges for the clusters. First I do it simply and show that the results are reasonable.
KM3 = kmeans(scale(HotPoints), 3)
plot(HotPoints, pch=20, xlim = c(0.5,6), ylim = c(40,100))
for(i in 1:3) {
Rx = range(HotPoints[KM3$cluster == i,1])
Ry = range(HotPoints[KM3$cluster == i,2])
polygon(c(Rx, rev(Rx)), rep(Ry, each=2))
}
I am not sure how you want the results presented to you, but one way to get them all in one place is this:
XRanges = sapply(unique(KM3$cluster),
function(i) range(HotPoints[KM3$cluster == i,1]))
XRanges
[,1] [,2] [,3]
[1,] 3.979592 3.867347 1.734694
[2,] 4.877551 4.316327 2.071429
YRanges = sapply(unique(KM3$cluster),
function(i) range(HotPoints[KM3$cluster == i,2]))
YRanges
[,1] [,2] [,3]
[1,] 47.34694 70.61224 73.06122
[2,] 62.04082 87.75510 95.10204
This gives a min and max for x and y for each of the three clusters.
However, I made a few choices here and I wish to point out
that I still left some work for you. What you still need to do:
1. You need to choose a cut-off point for how high the density
needs to be to get a cluster.
2. Given the points above your cut-off, you will need to say
how many clusters you want to generate.
The rest of the machinery is there.