I'm trying to plot a pandas dataframe as a bar plot and a line plot.
This MWE sums up what I'm looking at:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
test_df = pd.DataFrame(np.random.rand(20,2), columns = ['A', 'B'])
test_df['A'] = test_df['A'].cumsum()
test_df['B'] = test_df['B'].cumsum()
test_df.index += 1
I then plot the bars:
ax = test_df.plot(kind='bar', color=['red', 'blue'], figsize=(13.5, 6))
This plots as expected, with the bars labelled by the DataFrame's index.
I then further manipulate the data in some manner to form an upper bound to be plotted as a line against the bars:
test_df['C'] = test_df.index
test_df['Upper'] = 4 * (test_df['C']/5)**0.5
test_df['Upper'].plot()
This, however, doesn't work as intended: the xlim is changed, and the line plot doesn't begin over index 1 as I would expect from the DataFrame's index. How can I solve this?
There are two things happening here in your example.
1) When you make a bar chart of 'A' and 'B' using the plot(kind='bar') method of test_df, pandas creates a plot where the values in test_df.index are used as the x-axis tick labels for the corresponding pairs in columns 'A' and 'B'. I'm assuming pandas does this because bar charts are usually used with categorical variables on the x-axis. This can be illustrated by the following code:
>>> test_df.index = [1, 2, 3, 4, 100, 6, 7, 8, 9, 10,
...                  11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> ax = test_df[['A', 'B']].plot(kind='bar', color=['red', 'blue'])
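This produces evenly spaced bars in which the fifth tick label reads 100: only the label changes, not the position of the bar.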
What this means is that the x-axis tick labels in your first plot are not the actual x-values at which the bars are plotted. Instead, 1 is actually at x = 0 because it is the first value of the index, and 20 is at x = 19. You can deduce this by checking the x-axis limits:
>>> ax.get_xlim()
(-0.5, 19.5)
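A minimal sketch to check this directly, assuming a headless Agg backend and random data built like the question's: it reads back the tick positions pandas set and the centres of the bar rectangles, which sit at 0 to 19 even though the labels read 1 to 20.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of data as in the question, with the index shifted to start at 1
test_df = pd.DataFrame(np.random.rand(20, 2), columns=['A', 'B']).cumsum()
test_df.index += 1

ax = test_df.plot(kind='bar', color=['red', 'blue'])

# Tick *positions* are 0..19, even though the tick *labels* show 1..20
tick_positions = ax.get_xticks()
print(tick_positions[:3], tick_positions[-1])

# Each bar's centre is offset slightly left or right of those positions
bar_centres = [p.get_x() + p.get_width() / 2 for p in ax.patches]
print(min(bar_centres), max(bar_centres))
```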
2) Calling test_df['Upper'].plot() (without kind='bar') plots the values in the 'Upper' column and uses test_df.index as the x-coordinates. This call uses the current axes and looks like it also changes the x-axis limits to fit the most recently plotted data, whose x-values are 1 through 20. If you check the axis limits on the second plot, they will be (1.0, 20.0).
To get around all this, I would recommend not incrementing the index and keeping it at 0 to 19 before plotting your data. That way you know the bars and the line share the same x-coordinates. Then, after plotting them, you can explicitly set the x-tick labels with ax.set_xticklabels(['your', 'x', 'tick', 'labels']).
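Here is a minimal sketch of that approach, assuming the Agg backend and random data like the question's. Because some pandas versions reset the x-limits when a line plot is added, it saves the bar chart's limits first and restores them afterwards; the 1 to 20 labels are computed separately and only used for the upper bound and the tick text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Keep the default 0..19 index so bar positions and line x-values agree
test_df = pd.DataFrame(np.random.rand(20, 2), columns=['A', 'B']).cumsum()

# The 1-based values the question wants to display and use for the bound
labels = np.arange(1, len(test_df) + 1)
test_df['Upper'] = 4 * (labels / 5) ** 0.5

ax = test_df[['A', 'B']].plot(kind='bar', color=['red', 'blue'], figsize=(13.5, 6))
bar_xlim = ax.get_xlim()  # limits chosen for the bars, e.g. (-0.5, 19.5)

# Draw the line on the same axes; its x-values 0..19 match the bar positions
test_df['Upper'].plot(ax=ax, color='black')

# Restore the bar chart's limits in case the line plot changed them
ax.set_xlim(bar_xlim)
# Relabel the ticks at 0..19 with the 1..20 values
ax.set_xticks(range(len(test_df)))
ax.set_xticklabels([str(n) for n in labels])
```

The line now starts over the first pair of bars, which is labelled 1.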
There is probably a kwarg you can add to df.plot(kind='bar', ...) that will set the x-coordinates to plot the bars at, but I can't seem to find it at the moment.
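If you need the bars at the real index values, one alternative is to skip pandas' bar plot and call matplotlib's Axes.bar, whose first argument is the x-coordinates of the bars. A minimal sketch, assuming the Agg backend and random data; the bar width of 0.4 and the left/right offsets are arbitrary illustrative choices, not anything pandas does.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

test_df = pd.DataFrame(np.random.rand(20, 2), columns=['A', 'B']).cumsum()
test_df.index += 1  # keep the 1-based index from the question
x = test_df.index.to_numpy()
test_df['Upper'] = 4 * (x / 5) ** 0.5

width = 0.4  # illustrative bar width; two bars side by side per index value

fig, ax = plt.subplots(figsize=(13.5, 6))
# Bars are centred just left and right of the actual index values
ax.bar(x - width / 2, test_df['A'], width, color='red', label='A')
ax.bar(x + width / 2, test_df['B'], width, color='blue', label='B')
# The line uses the same x-values, so it starts over index 1
ax.plot(x, test_df['Upper'], color='black', label='Upper')
ax.set_xticks(x)
ax.legend()
```

Since the bars and the line are both drawn at the index values, no relabelling of the ticks is needed.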