Pandas box plot error on one datapoint


I'm working on a box-plot using pandas.

My DataFrame looks like this:

Year                 2013      2014      2015      2016      2017
dfMin            1.091603  0.973346  1.040000  0.855209  1.079500
dfLowerQuartile  1.727191  1.684009  1.275601  1.136703  2.262654
dfUpperQuartile  2.225000  2.000000  1.857570  2.120644  2.435724
dfMax            2.687323  2.350000  2.105000  2.250000  2.566467

My chart code looks like this:

chartDF.boxplot(grid=False, figsize=(9,4))

In the resulting plot, the lowest value for 2017 is drawn as a separate point instead of as the end of the whisker, which puzzles me.

Does anyone know how to fix this issue?


Solution

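  • This is expected behavior. boxplot treats the four values in each column as raw data points and computes its own quartiles from them. Your minimum value for 2017 is more than 1.5 × IQR below the first quartile of those four data points, so it is displayed as an outlier (a point) rather than as the end of the whisker.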

    From the Matplotlib docs for the whis parameter of boxplot:

    whis : float, sequence, or string (default = 1.5)

    As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3-Q1), the upper whisker will extend to the last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points. Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentiles (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.

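    So if you want the whiskers to extend all the way to the minimum and maximum, put them at the 0th and 100th percentiles. Older Matplotlib versions also accept whis='range' for this, but newer releases deprecated and then removed that string, so the percentile form is the safer choice:

    chartDF.boxplot(grid=False, figsize=(9, 4), whis=(0, 100))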
    

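To make the rule concrete, here is a minimal sketch that rebuilds the frame from the question, reproduces the lower-fence arithmetic (Q1 - 1.5 × IQR) for each year, and then draws the plot with full-range whiskers. The helper name lower_fence and the list name flagged are made up for illustration, and the quartiles are computed with np.percentile on the assumption that this matches what Matplotlib does internally. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import numpy as np
import pandas as pd

# Rebuild the frame from the question (values copied from the post).
chartDF = pd.DataFrame(
    {
        2013: [1.091603, 1.727191, 2.225000, 2.687323],
        2014: [0.973346, 1.684009, 2.000000, 2.350000],
        2015: [1.040000, 1.275601, 1.857570, 2.105000],
        2016: [0.855209, 1.136703, 2.120644, 2.250000],
        2017: [1.079500, 2.262654, 2.435724, 2.566467],
    },
    index=["dfMin", "dfLowerQuartile", "dfUpperQuartile", "dfMax"],
)
chartDF.columns.name = "Year"


def lower_fence(values, whis=1.5):
    """Hypothetical helper: Q1 - whis * IQR of the raw values passed in."""
    q1, q3 = np.percentile(values, [25, 75])
    return q1 - whis * (q3 - q1)


# Years whose smallest value lies below the lower fence; with the default
# whis=1.5 these minima are drawn as individual points.
flagged = [
    year
    for year in chartDF.columns
    if chartDF[year].min() < lower_fence(chartDF[year].to_numpy())
]
print(flagged)

# Whiskers at the 0th and 100th percentiles, i.e. the column min and max,
# so no value is left over to be drawn as an outlier.
ax = chartDF.boxplot(grid=False, figsize=(9, 4), whis=(0, 100))
```

For 2017 the fence works out to roughly 1.21, which is above the minimum of 1.0795, so 2017 is the only year flagged. To view the figure, call ax.figure.savefig(...) or switch to an interactive backend.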