pythonpandasdataframeerror-checking

Double check on columns from two DataFrame


I have an issue implementing a check from a column of one DataFrame to a column of a different DataFrame. I have two Data Frame, one is the child of the other. First one has index with info of surface and building id, and a column with solar radiation values, here a sample:

> Index                                         Irradiation
> #Igh                                            1221900.0
> 1589(Building1589):233687():Irradiance(W/m2)     371071.0
> 1589(Building1589):233688():Irradiance(W/m2)     373723.0
> 1589(Building1589):233689():Irradiance(W/m2)     374250.0

and the second one has basically the same info, but a lower number of rows (I filtered the rows to get only ones with a minimum value of irradiation, here is the sample:

     Irradiation building_id surface_id
1       744146.0        1589     233720
2       756753.0        1589     233721
3       764957.0        1589     233722
4       739676.0        1589     233723
5       755673.0        1589     233724

Basically, I would like to check that everything worked properly by looking at the irradiation of DataFrame2 and checking if the building id value and the surface id value of that rows are in the Index of the DataFrame 1, so that no mistake occurred. I can easily turn the index of DataFrame1 into a column, but then I don't know how to set a boolean check between the two DataFrames.


Solution

  • df1 = pd.DataFrame(
        {
            "Index": [
                "#Igh",
                "1589(Building1589):233687():Irradiance(W/m2)",
                "1589(Building1589):233688():Irradiance(W/m2)",
                "1589(Building1589):233689():Irradiance(W/m2)",
                "0000(Building0000):233689():Irradiance(W/m2)",
                "1589(Building1589):000000():Irradiance(W/m2)",
                "1589(Building0000):233724():Irradiance(W/m2)"
            ],
            "Irradiation": [1221900.0, 371071.0, 373723.0, 374250.0, 1, 2 , 3],
        }
    )
    
    # df2
    #      Irradiation building_id surface_id
    # 1       744146.0        1589     233720
    # 2       756753.0        1589     233721
    # 3       764957.0        1589     233722
    # 4       739676.0        1589     233723
    # 5       755673.0        1589     233724
    
    df2 = pd.DataFrame(
        {
            "Irradiation": [744146.0, 756753.0, 764957.0, 739676.0, 755673.0],
            "building_id": [1589, 1589, 1589, 1589, 1589],
            "surface_id": [233720, 233721, 233722, 233723, 233724]
        }
    )
    
    df1["building_id"] = df1["Index"].str[:4].astype('str')
    df1["surface_id"] = df1["Index"].str[19:25].astype('str')
    
    #check if building_id surface_id of df2 are in df1, and add to df2["check"]
    df2["check"] = df2.building_id.astype("str").isin(df1.building_id) & \
                df2.surface_id.astype("str").isin(df1.surface_id)
    print(df2)