python pandas matplotlib charts twinx

Plot pandas line chart using dual axis and loop through dataframe


I have a dataframe like this:

data = {'TIMEFRAME':['9/12/2014 17:52', '10/12/2014 5:02', '10/12/2014  8:04'],
        'Volumetric Flow Meter 1':[0.82, 0.88, 0.9],
        'Pump Speed (RPM)':[2.5,2.7,3.01],
        'Data Source':['raw data','raw data','raw data'],
        'PUMP FAILURE (1 or 0)':[0,0,1]}

df = pd.DataFrame(data)
df

TIMEFRAME        Volumetric Flow Meter 1  Pump Speed (RPM)  Data Source  PUMP FAILURE (1 or 0)
9/12/2014 17:52  0.82                     2.5               raw data     0
10/12/2014 5:02  0.88                     2.7               raw data     0
10/12/2014 8:04  0.90                     3.01              raw data     1

I am trying to loop through the dataset and plot every numerical variable individually against the pump failure flag to identify trends. I have to create a list of all the numerical columns in the dataframe and loop through it, plotting each one against the PUMP FAILURE (1 or 0) column.

Each plot needs a dual axis, so that I can see PUMP FAILURE (1 or 0) on the second y-axis and the attribute on the first y-axis.

The desired output looks something like this: (graph)

This was my approach:

ListOfVariables=[df["Pump Speed (RPM)"],df["Volumetric Flow Meter 1"]]

for item in ListOfVariables:
    first_axis = df[item].plot #Looping through every item in the dataframe.
    second_axis = first_axis.twinx() #The Twinx function is used to ensure we share the X-Axis for both plots
    second_axis.plot(df['PUMP FAILURE (1 or 0)'], color='teal')
    plt.title(item)
    plt.show()

This doesn't produce the desired output. Any help is appreciated. Thanks.


Solution

  • Use:

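    import pandas as pd
    import numpy as np
    import matplotlib.dates as mdates
    import matplotlib.pyplot as plt
    
    
    # Synthetic stand-in for the real data: 100 evenly spaced timestamps
    data = {'TIMEFRAME': pd.date_range('9/12/2014 17:52', '10/12/2014 18:04', periods=100),
            'Volumetric Flow Meter 1': np.random.randn(100),
            'Pump Speed (RPM)': np.random.randn(100),
            'Data Source': ['raw data']*100,
            'PUMP FAILURE (1 or 0)': np.random.randint(0, 2, 100)}  # binary 0/1 flag
    
    df = pd.DataFrame(data)
    df['TIMEFRAME'] = pd.to_datetime(df['TIMEFRAME'])  # needed when TIMEFRAME holds strings
    target = 'PUMP FAILURE (1 or 0)'
    # Names of all numeric columns except the failure flag
    cols = df.select_dtypes(include='number').columns.drop(target)
    
    for col in cols:
        fig, ax = plt.subplots(figsize=(15,3))
        ax.plot(df['TIMEFRAME'], df[col], color='teal')      # attribute on the first y-axis
        ax2 = ax.twinx()                                     # second y-axis sharing the same x-axis
        ax2.plot(df['TIMEFRAME'], df[target], color='red')   # failure flag on the second y-axis
        ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=600))
        # Format the ticks chosen by the locator instead of hard-coding the labels
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M:%S'))
        ax.tick_params(axis='x', labelrotation=90)
        ax.set_title(col)
        plt.show()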
    

    With interval=600, MinuteLocator places a major tick every 600 minutes, i.e. every 10 hours. I tested it with 300 and the result did not look as good. If you want smaller time steps, increase the figure size first.

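For reference, the loop in the question fails for two reasons: ListOfVariables holds the Series themselves rather than column names, so df[item] cannot look them up, and .plot is never called (the parentheses are missing), so there is no Axes object to call twinx() on. Below is a minimal sketch of the same idea wrapped in a function. The name plot_against_failure and its parameters are hypothetical, the Agg backend is only there so that it runs without a display, and the day-first date format is an assumption about the question's data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd


def plot_against_failure(df, time_col, target, interval_minutes=600, figsize=(15, 3)):
    """Hypothetical helper: one dual-axis figure per numeric column, keyed by name."""
    # Collect column *names* (not Series) of every numeric attribute except the target.
    numeric_cols = df.select_dtypes(include="number").columns.drop(target)
    figures = {}
    for col in numeric_cols:
        fig, ax = plt.subplots(figsize=figsize)
        ax.plot(df[time_col], df[col], color="teal")       # attribute on the first y-axis
        ax.set_ylabel(col)
        ax2 = ax.twinx()                                   # second y-axis, shared x-axis
        ax2.plot(df[time_col], df[target], color="red")    # failure flag on the second y-axis
        ax2.set_ylabel(target)
        ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=interval_minutes))
        ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M"))
        ax.tick_params(axis="x", labelrotation=90)
        ax.set_title(col)
        figures[col] = fig
    return figures


# The three rows from the question; parsing the dates day-first is an assumption.
sample = pd.DataFrame({
    "TIMEFRAME": pd.to_datetime(
        ["9/12/2014 17:52", "10/12/2014 5:02", "10/12/2014 8:04"],
        format="%d/%m/%Y %H:%M",
    ),
    "Volumetric Flow Meter 1": [0.82, 0.88, 0.9],
    "Pump Speed (RPM)": [2.5, 2.7, 3.01],
    "Data Source": ["raw data", "raw data", "raw data"],
    "PUMP FAILURE (1 or 0)": [0, 0, 1],
})

figures = plot_against_failure(sample, "TIMEFRAME", "PUMP FAILURE (1 or 0)")
```

Each returned figure can be saved with fig.savefig(...) or shown with plt.show() on an interactive backend. Passing a smaller interval_minutes together with a wider figsize follows the advice above about increasing the figure size before using smaller time steps.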