I have a dataframe like this:
import pandas as pd

data = {'TIMEFRAME': ['9/12/2014 17:52', '10/12/2014 5:02', '10/12/2014 8:04'],
        'Volumetric Flow Meter 1': [0.82, 0.88, 0.9],
        'Pump Speed (RPM)': [2.5, 2.7, 3.01],
        'Data Source': ['raw data', 'raw data', 'raw data'],
        'PUMP FAILURE (1 or 0)': [0, 0, 1]}
df = pd.DataFrame(data)
df
TIMEFRAME         Volumetric Flow Meter 1   Pump Speed (RPM)   Data Source   PUMP FAILURE (1 or 0)
9/12/2014 17:52   0.82                      2.5                raw data      0
10/12/2014 5:02   0.88                      2.7                raw data      0
10/12/2014 8:04   0.90                      3.01               raw data      1
I am trying to loop through the dataset and plot every numerical variable individually against Pump Failure to identify trends. I need to build a list of every numerical column in the dataframe and loop through it, plotting each one against the PUMP FAILURE (1 or 0) column.
For each plot, I need a dual-axis setup so that Pump Failure (0 or 1) appears on the second Y-axis and the attribute on the first Y-axis.
The output is something like this,
This was my approach:
ListOfVariables=[df["Pump Speed (RPM)"],df["Volumetric Flow Meter 1"]]
for item in ListOfVariables:
    first_axis = df[item].plot #Looping through every item in the dataframe.
    second_axis = first_axis.twinx() #The Twinx function is used to ensure we share the X-Axis for both plots
    second_axis.plot(df['PUMP FAILURE (1 or 0)'], color='teal')
    plt.title(item)
    plt.show()
This doesn't produce the desired output. Any help is appreciated. Thanks.
Use:
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Synthetic data with the same columns as in the question
data = {'TIMEFRAME': pd.date_range('9/12/2014 17:52', '10/12/2014 18:04', periods=100),
        'Volumetric Flow Meter 1': np.random.randn(100),
        'Pump Speed (RPM)': np.random.randn(100),
        'Data Source': ['raw data'] * 100,
        'PUMP FAILURE (1 or 0)': np.random.randint(0, 2, 100)}  # random 0/1 flags
df = pd.DataFrame(data)
df['TIMEFRAME'] = pd.to_datetime(df['TIMEFRAME'])

cols = df.columns[:-1]   # every column except the failure flag
for col in cols[1:-1]:   # skip TIMEFRAME (first) and Data Source (last)
    fig, ax = plt.subplots(figsize=(15, 3))
    ax.plot(df[cols[0]], df['PUMP FAILURE (1 or 0)'], color='red')
    ax2 = ax.twinx()     # second y-axis sharing the same x-axis
    ax2.plot(df[cols[0]], df[col], color='teal')
    # Place the ticks first, then format and rotate the labels of those ticks
    ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=600))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M:%S'))
    ax.tick_params(axis='x', rotation=90)
    plt.title(col)
    plt.show()
With interval=600, a tick is placed every 600 minutes, i.e. every 10 hours. I tested it with 300 and the result did not look as good. If you want smaller time steps, increase the figure size first.
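If you would rather not tune the interval by hand, matplotlib can choose the tick spacing for you. The following is only a sketch under assumptions: the data is synthetic (a single random series standing in for one sensor column), and `AutoDateLocator` with `ConciseDateFormatter` is swapped in for the fixed `MinuteLocator` used above.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption, not the real pump dataset).
rng = np.random.default_rng(0)
times = pd.date_range("2014-09-12 17:52", "2014-10-12 18:04", periods=100)
values = rng.normal(size=100)

fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(times, values, color="teal")

# AutoDateLocator picks a tick spacing that suits the plotted date range.
# ConciseDateFormatter builds compact labels from the actual tick positions.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.set_title("Pump Speed (RPM)")
```

Call `plt.show()` afterwards to display the figure. The trade-off is that you give up exact control over where the ticks fall, so the fixed interval is still the better choice if you need a tick every 10 hours specifically.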
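The positional slice `cols[1:-1]` only works while the columns stay in that order. Since the question asks for a list of every numerical column, here is a rough sketch that builds the list with `select_dtypes` instead, and puts the attribute on the first Y-axis and the failure flag on the second, as the question describes. The `TARGET` and `figures` names are mine, and the month-first date format is an assumption about the sample data.

```python
import matplotlib.pyplot as plt
import pandas as pd

TARGET = "PUMP FAILURE (1 or 0)"  # hypothetical constant name for the flag column

# The three sample rows from the question; month-first dates are assumed.
df = pd.DataFrame({
    "TIMEFRAME": pd.to_datetime(
        ["9/12/2014 17:52", "10/12/2014 5:02", "10/12/2014 8:04"],
        format="%m/%d/%Y %H:%M",
    ),
    "Volumetric Flow Meter 1": [0.82, 0.88, 0.9],
    "Pump Speed (RPM)": [2.5, 2.7, 3.01],
    "Data Source": ["raw data", "raw data", "raw data"],
    TARGET: [0, 0, 1],
})

# select_dtypes(include="number") keeps only int and float columns, so
# TIMEFRAME (datetime) and "Data Source" (strings) are left out.
numeric_cols = [c for c in df.select_dtypes(include="number").columns
                if c != TARGET]

figures = {}
for col in numeric_cols:
    fig, ax = plt.subplots(figsize=(15, 3))
    ax.plot(df["TIMEFRAME"], df[col], color="teal")    # attribute, first Y-axis
    ax.set_ylabel(col)
    ax2 = ax.twinx()                                   # second Y-axis, same X-axis
    ax2.plot(df["TIMEFRAME"], df[TARGET], color="red") # failure flag
    ax2.set_ylabel(TARGET)
    ax.set_title(col)
    figures[col] = fig
```

Call `plt.show()` after the loop to display the figures. Note that the loop iterates over column names, not over the Series themselves, which is what lets `df[col]` and `ax.set_title(col)` both work.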