python pandas data-visualization geospatial choropleth

How to generate a choropleth map based on region names?


I'm working in Python with a dataset that has a numerical variable for each Italian region, like this:

import numpy as np
import pandas as pd
regions = ['Trentino Alto Adige', "Valle d'Aosta", 'Veneto', 'Lombardia', 'Emilia-Romagna', 'Toscana', 'Friuli-Venezia Giulia', 'Liguria', 'Piemonte', 'Marche', 'Lazio', 'Umbria', 'Abruzzo', 'Sardegna', 'Puglia', 'Molise', 'Basilicata', 'Calabria', 'Sicilia', 'Campania']
# Build from a dict so that 'quantity' keeps a numeric dtype (list-of-lists plus transpose yields object columns)
df = pd.DataFrame({'region': regions, 'quantity': [10 + (i / 2) for i in range(20)]})
df.head()


I would like to generate a map of Italy in which the colour of each region depends on the numeric value of the variable quantity (df['quantity']), i.e., a choropleth map.

How can I do it?


Solution

  • You can use geopandas.

    The region names in your df do not exactly match those in the GeoJSON used below. You can either find another GeoJSON or alter the names so that they match.

    import pandas as pd
    import geopandas as gpd
    
    regions = ['Trentino Alto Adige', "Valle d'Aosta", 'Veneto', 'Lombardia', 'Emilia-Romagna', 'Toscana', 'Friuli-Venezia Giulia', 'Liguria', 'Piemonte', 'Marche', 'Lazio', 'Umbria', 'Abruzzo', 'Sardegna', 'Puglia', 'Molise', 'Basilicata', 'Calabria', 'Sicilia', 'Campania']
    # Build from a dict so that 'quantity' keeps a numeric dtype (list-of-lists plus transpose yields object columns)
    df = pd.DataFrame({'region': regions, 'quantity': [10 + (i / 2) for i in range(20)]})
    
    # Download a GeoJSON of Italian municipality geometries
    gdf = gpd.read_file(filename=r'https://raw.githubusercontent.com/openpolis/geojson-italy/master/geojson/limits_IT_municipalities.geojson')
    gdf = gdf.dissolve(by='reg_name')  # The GeoJSON is too detailed; dissolve the boundaries into regions by the reg_name attribute
    gdf = gdf.reset_index()
    
    # gdf.reg_name[~gdf.reg_name.isin(regions)] lists the two region names that do not match your df:
    #16    Trentino-Alto Adige/Südtirol
    #18    Valle d'Aosta/Vallée d'Aoste
    
    # Left merge keeps every region geometry; regions with no match in df get NaN for quantity
    gdf = pd.merge(left=gdf, right=df, how='left', left_on='reg_name', right_on='region')
    
    ax = gdf.plot(
        column="quantity",
        legend=True,
        figsize=(15, 10),
        cmap='OrRd',
        missing_kwds={'color': 'lightgrey'});
    
    ax.set_axis_off();
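
One way to alter the names so they match is to map the two differing names in df to the spellings shown in the comments above before merging. The sketch below is only an illustration under stated assumptions: name_fixes, geo_reg_names and geo_names are hypothetical names, and geo_names is a plain pandas stand-in for the reg_name column of gdf (assumed to hold the 18 matching names plus the two variants listed above), so the check runs without downloading the GeoJSON.

```python
import pandas as pd

regions = ['Trentino Alto Adige', "Valle d'Aosta", 'Veneto', 'Lombardia',
           'Emilia-Romagna', 'Toscana', 'Friuli-Venezia Giulia', 'Liguria',
           'Piemonte', 'Marche', 'Lazio', 'Umbria', 'Abruzzo', 'Sardegna',
           'Puglia', 'Molise', 'Basilicata', 'Calabria', 'Sicilia', 'Campania']
df = pd.DataFrame({'region': regions,
                   'quantity': [10 + (i / 2) for i in range(20)]})

# Map the names used in df to the reg_name spellings reported in the answer's comments
name_fixes = {
    'Trentino Alto Adige': 'Trentino-Alto Adige/Südtirol',
    "Valle d'Aosta": "Valle d'Aosta/Vallée d'Aoste",
}
df['reg_name'] = df['region'].replace(name_fixes)

# Hypothetical stand-in for gdf[['reg_name']]: the 18 names that already match,
# plus the two GeoJSON spellings (an assumption based on the comments above)
geo_reg_names = [r for r in regions if r not in name_fixes] + list(name_fixes.values())
geo_names = pd.DataFrame({'reg_name': geo_reg_names})

# indicator=True adds a _merge column that flags rows without a partner in df
check = geo_names.merge(df, on='reg_name', how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'reg_name'].tolist()
print(unmatched)
```

If the list printed is empty, every region found a quantity. With the real gdf, merging on reg_name instead of region should then leave no region light grey, provided the GeoJSON names are as shown in the comments.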
    
