Tags: python, pandas, matplotlib, bar-chart, linechart

line chart and bar chart don't align against the same index


I'm trying to plot a pandas dataframe as a bar plot and a line plot.

This MWE sums up what I'm looking at:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

test_df = pd.DataFrame(np.random.rand(20, 2), columns=['A', 'B'])

test_df['A'] = test_df['A'].cumsum()
test_df['B'] = test_df['B'].cumsum()
test_df.index += 1

I then plot the bars:

ax = test_df.plot(kind='bar', color=['red', 'blue'], figsize=(13.5, 6))

This plots fine, as expected, against the index of the df.

I then further manipulate the data in some manner to form an upper bound to be plotted as a line against the bars:

test_df['C'] = test_df.index
test_df['Upper'] = 4 * (test_df['C']/5)**0.5
test_df['Upper'].plot()

This, however, doesn't work as intended: the line and the bars don't align.

How can I solve the above?


Solution

  • There are two things happening here in your example.

    1) When you make a bar chart of 'A' and 'B' using the plot(kind='bar') method of test_df, pandas creates a plot where the values in test_df.index are used as the x-axis tick labels for the corresponding pairs in columns 'A' and 'B'. I'm assuming pandas does this because bar charts are usually used with categorical variables on the x-axis. This can be illustrated by the following code:

    >>> test_df.index = [1, 2, 3, 4, 100, 6, 7, 8, 9, 10,
                         11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    >>> ax = test_df[['A', 'B']].plot(kind='bar', color=['red', 'blue'])
    

    This produces a bar chart in which the bars remain evenly spaced and the label 100 appears in fifth place, between 4 and 6.

    What this means is that the x-axis tick labels in your first plot are not the actual x-values at which the bars are drawn. Instead, the label 1 is actually at x = 0, because it is the first value of the index, and 20 is at x = 19. You can deduce this by checking the x-axis limits:

    >>> ax.get_xlim()
    (-0.5, 19.5)
    

    2) Calling test_df['Upper'].plot() (without kind='bar') plots the values in the 'Upper' column and uses test_df.index as the x-coordinates. This call draws on the current axes and appears to also change the x-axis limits to fit the most recently plotted data, whose x-values are 1 through 20. If you check the axis limits on the second plot, they will be (1.0, 20.0).
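
Both points can be checked directly. The sketch below assumes a recent pandas and matplotlib. It draws the line on a separate figure so the two plots do not interfere, and it selects the Agg backend only so that it runs without a display. The names bar_positions, bar_labels and line_x are introduced here for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Same data as in the question, with the index shifted to 1..20.
test_df = pd.DataFrame(np.random.rand(20, 2), columns=['A', 'B']).cumsum()
test_df.index += 1
test_df['Upper'] = 4 * np.sqrt(test_df.index.to_numpy() / 5)

# Bar plot: the index only supplies the tick labels.
ax_bar = test_df[['A', 'B']].plot(kind='bar', color=['red', 'blue'])
bar_positions = [int(t) for t in ax_bar.get_xticks()]
bar_labels = [t.get_text() for t in ax_bar.get_xticklabels()]

# Line plot on its own axes: the index values are the x-coordinates.
fig, ax_line = plt.subplots()
test_df['Upper'].plot(ax=ax_line)
line_x = [int(x) for x in ax_line.get_lines()[0].get_xdata()]

print("bar tick positions:", bar_positions)
print("bar tick labels:   ", bar_labels)
print("line x-coordinates:", line_x)
```

Comparing the first and last printed lists shows the one-unit offset between where the bars sit and where the line is drawn.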

    To get around all this, I would recommend not incrementing the index and keeping it at 0 to 19 before plotting your data. That way you know all of the x-coordinates are the same. Then, after plotting, you can explicitly set the x-tick labels with ax.set_xticklabels(['your', 'x', 'tick', 'labels']).
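
A minimal sketch of that recommendation, assuming a recent pandas and matplotlib. The helper name positions and the explicit set_xlim call are additions of mine: the limits are reset in case the line plot narrows them and clips the outer bars. The Agg backend is selected only so the code runs without a display.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Keep the default 0..19 index so bars and line share x-coordinates.
test_df = pd.DataFrame(np.random.rand(20, 2), columns=['A', 'B']).cumsum()

# Compute the bound from the 1-based position without touching the index.
positions = np.arange(1, len(test_df) + 1)
test_df['Upper'] = 4 * np.sqrt(positions / 5)

# Draw the bars, then the line on the same axes.
ax = test_df[['A', 'B']].plot(kind='bar', color=['red', 'blue'], figsize=(13.5, 6))
test_df['Upper'].plot(ax=ax, color='black')

# Relabel the ticks 1..20 and restore limits that show every bar in full.
ax.set_xticks(range(len(test_df)))
ax.set_xticklabels([str(p) for p in positions])
ax.set_xlim(-0.5, len(test_df) - 0.5)

# plt.show()  # uncomment when using an interactive backend
```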

    There is probably a kwarg you can add to df.plot(kind='bar', ...) that will set the x-coordinates to plot bars at, but I can't seem to find it at the moment.
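
One alternative works from the line side rather than the bar side, so it is not the bar kwarg speculated about above. pandas' plot accepts use_index=False, which draws the data against positions 0, 1, 2, ... instead of the index values. The sketch below assumes a recent pandas and matplotlib and keeps the 1-based index from the question. The set_xlim call is my own precaution against clipped outer bars, and the Agg backend is selected only so the code runs without a display.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Data as in the question, including the 1..20 index.
test_df = pd.DataFrame(np.random.rand(20, 2), columns=['A', 'B']).cumsum()
test_df.index += 1
test_df['Upper'] = 4 * np.sqrt(test_df.index.to_numpy() / 5)

# Bars are drawn at positions 0..19 and labelled with the index values.
ax = test_df[['A', 'B']].plot(kind='bar', color=['red', 'blue'], figsize=(13.5, 6))

# use_index=False makes the line use positions 0..19 as well.
test_df['Upper'].plot(ax=ax, use_index=False, color='black')

# Restore limits that show every bar in full.
ax.set_xlim(-0.5, len(test_df) - 0.5)

# plt.show()  # uncomment when using an interactive backend
```

With this approach the tick labels set by the bar plot are left in place, so no relabelling step is needed.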