I have risk data which I would like to color on a map according to the risk level. I read a shapefile and a csv data file which I then merge. This works very well working with adm1 shapefile.
When I run the same script with adm2 shapefile, the results are completely weird: The polygons colored are way far from the polygon with data. I have the attached map 3 examples of the colored polygons and in black dot the location of the data.
I will appreciate if anyone can give a clue of what is going on, and if possible how to resolve this.
Python script used to produce the plot:
#!/home/zmumba/anaconda3/bin/python
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from mpl_toolkits.basemap import Basemap # optional ?
map_df = gpd.read_file("/home/zmumba/DA/Dekad_Progs/Shapefiles/Lesotho/geoBoundaries-LSO-ADM2-all/geoBoundaries-LSO-ADM2.shx")
risks_df=pd.read_csv("/home/zmumba/DA/Dekad_Progs/Output/H1dRrisks.csv")
map_df["risk"] = map_df.merge(risks_df, left_on="shapeName", right_on="District")["risk"]
colors = {1: "green", 2: "yellow", 3: "orange", 4: "red"} # or a list
labels = {1: "no risk", 2: "low risk", 3: "medium risk", 4: "high risk"}
catego = map_df["risk"].astype(str).str.cat(map_df["risk"].map(labels), sep="- ")
fig, ax = plt.subplots(figsize=(5, 5))
plt.title(f'Risk of Heavy 24hr Rain: 20-24Nov', y=1.04)
map_df.plot(
column=catego,
categorical=True,
edgecolor="k",
linewidths=0.8,
alpha=0.7,
cmap=ListedColormap([c for r,c in colors.items() if r in map_df["risk"].unique()]),
legend=True,
legend_kwds={
"title": "Risk Level",
"shadow": True,
"loc": "lower right",
"fontsize": 10,
},
ax=ax,
)
ax.set_axis_off()
plt.savefig('H1dRriskmap.png', dpi=300)
plt.show()
Just to add some clarification to my problem, The attached image shows the polygon numbers from the shapefile. Coloring 42 colors 41, coloring 43 colors 42. The coordinates of 41, 42 and 43 are
-29.3892289999999, 28.3056183629316 -> 41
-29.2870792999999, 29.2715247780569 -> 42
-29.2255776170000, 27.6546474532047 -> 43
These coordinates are the centroids of the polygons taken from the shapefile itself. So it cannot be a problem of coordinates. Could there be something wrong in the python code? The code reads from a file which is in the format:
"District","risk"
"name1",1
"name2",1
...
"name78",1
where 1 can b2 2, 3, or 4 depending on the risk level. I would be glad to try doing the same thing in R, but the problem with R is, it will say "no package sf" and so on.
As I see it, your error is in fact in your python-code... here:
map_df["risk"] = map_df.merge(risks_df, left_on="shapeName", right_on="District")["risk"]
To clarify:
Both your dataframes map_df
and risks_df
are indexed by numeric values (0,1,2,3 ...).
If you run the merge ( map_df.merge(risks_df, ...)
), it will return a new DataFrame
with a new index (which is again a numeric index 0,1,2...).
Now if you do map_df["risk"] = map_df.merge(...)["risk"]
you'll write the results of the merge based on their merge-index to the index-values in map_df
.
This works because both indexes are numeric, but it is in no way guaranteed to be correct!
Just check your results... I guess you'll find that the "risk" values are not correctly assigned.
To make sure you assign the values correctly, I'd suggest to replace the line above with something like this:
map_df = map_df.set_index("shapeName")
map_df["risks"] = risks_df.set_index("District")["risk"]