I'm working on a box-plot using pandas.
My DataFrame looks like this:
Year                 2013      2014      2015      2016      2017
dfMin            1.091603  0.973346  1.040000  0.855209  1.079500
dfLowerQuartile  1.727191  1.684009  1.275601  1.136703  2.262654
dfUpperQuartile  2.225000  2.000000  1.857570  2.120644  2.435724
dfMax            2.687323  2.350000  2.105000  2.250000  2.566467
My chart code looks like this:
chartDF.boxplot(grid=False, figsize=(9,4))
This leads to a plot like this:
[image: box plot of the five years]
I am puzzled that the lowest value for 2017 is drawn as a separate point instead of as the end of the whisker.
Does anyone know how to fix this issue?
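This is expected behavior. boxplot treats each column as raw data and computes its own quartiles from the four values you supplied; it does not know that the rows are already summary statistics. Your minimum value for 2017 is more than 1.5 * IQR below the first quartile computed from those four data points, so it is displayed as an outlier (a point) rather than as the end of the whisker.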
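To check this against the numbers in the question, here is a minimal sketch that recomputes the whisker limits for each year with the standard library only. It assumes the quartiles are computed by linear interpolation over the four values (which, as far as I can tell, is what matplotlib does via NumPy's default percentile method); `method="inclusive"` in `statistics.quantiles` reproduces that. The names `columns`, `fence_report`, and `flagged` are made up for this illustration.

```python
from statistics import quantiles

# Values copied from the question: min, lower quartile, upper quartile, max per year.
columns = {
    "2013": [1.091603, 1.727191, 2.225000, 2.687323],
    "2014": [0.973346, 1.684009, 2.000000, 2.350000],
    "2015": [1.040000, 1.275601, 1.857570, 2.105000],
    "2016": [0.855209, 1.136703, 2.120644, 2.250000],
    "2017": [1.079500, 2.262654, 2.435724, 2.566467],
}


def fence_report(values, whis=1.5):
    """Return (lower_fence, upper_fence, outliers) for one column of raw values."""
    data = sorted(values)
    # Quartiles of the four points, using linear interpolation (assumed to match matplotlib).
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Anything outside the fences is drawn as an individual point.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers


flagged = {}
for year, values in columns.items():
    lower_fence, upper_fence, outliers = fence_report(values)
    print(f"{year}: fences = ({lower_fence:.4f}, {upper_fence:.4f}), outliers = {outliers}")
    if outliers:
        flagged[year] = outliers
```

Under these assumptions only the 2017 minimum falls below its lower fence, which matches the single point in the plot.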
From the docs for whis in boxplot (emphasis mine):
whis : float, sequence, or string (default = 1.5)
As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3-Q1), the upper whisker will extend to last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points. Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentile (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.
So if you want the whiskers to extend all the way to the min and max, pass whis='range':
chartDF.boxplot(grid=False, figsize=(9, 4), whis='range')
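One caveat: to my knowledge, newer Matplotlib releases deprecated the string 'range' and expect the percentile pair whis=(0, 100) instead, which also works on older versions because whis accepts a sequence of percentiles. The sketch below rebuilds the DataFrame from the question and checks that no outlier points are drawn. The "Agg" backend and return_type="dict" are only there so the example can run without a display and inspect the result; they are not needed for the fix itself.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Data copied from the question; boxplot treats each column as four raw observations.
chartDF = pd.DataFrame(
    {
        "2013": [1.091603, 1.727191, 2.225000, 2.687323],
        "2014": [0.973346, 1.684009, 2.000000, 2.350000],
        "2015": [1.040000, 1.275601, 1.857570, 2.105000],
        "2016": [0.855209, 1.136703, 2.120644, 2.250000],
        "2017": [1.079500, 2.262654, 2.435724, 2.566467],
    },
    index=["dfMin", "dfLowerQuartile", "dfUpperQuartile", "dfMax"],
)

# whis=(0, 100) places the whiskers at the 0th and 100th percentiles, i.e. min and max.
# return_type="dict" returns the drawn artists so they can be inspected below.
artists = chartDF.boxplot(grid=False, figsize=(9, 4), whis=(0, 100), return_type="dict")

# Number of outlier points drawn for each box; with full-range whiskers none are expected.
flier_counts = [len(line.get_ydata()) for line in artists["fliers"]]
print(flier_counts)
```

With the whiskers pinned to the extremes, the 2017 minimum becomes the end of the lower whisker instead of a separate point.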