excelgraphchartsscatter-plot

Excel XY scatter chart glitch


Sometimes Excel refuses point blank to create XY scatter plots properly. Defaulting to a silly format plotting rows as lines with legends "series 1", "series 2" & "series 3" despite identifying column headers which it sticks on the X-axis! Meant to be XY scatter chart with points joined by straight lines (that is what was selected) but not delivered.

The problem I see intermittently appears to be possibly related to this previous thread about XY scatter plots where the X axis is actually a time in time format. But I can see no reason why that applies here. The data are all clearly numeric integers.

I know that if titles or data are malformed (contain Nans, Infs or blanks) this will happen (fair enough). But sometimes it misbehaves for no apparent reason when given perfectly valid numeric data. This is a particularly small example that I found today reduced to an MRE that still fails. Usually there are many more columns and rows. Presented below as a CSV file :

log2N,N,valid,solutions
3,8,35,10
4,16,1365,119
5,32,31465,852

If you copy and paste this innocent looking dataset into Excel and select it try to plot the X-Y graph of N, valid and solutions vs Log2N then it will do the crazy useless plot instead. In fact it will only do a proper X-Y scatter plot if you select columns A & B for Log2N and N. Once the right format of XY chart is established it will happily allow the data range to be extended. Select plot data and edit "=Sheet1!$A$1:$B$4" to "=Sheet1!$A$1:$D$4" and the format remains as X-Y but this is a PITA.

enter image description here

I'm using Excel 2021 MS Office Pro. It ID's itself as: Microsoft® Excel® 2021 MSO (Version 2503 Build 16.0.18623.20178) 64-bit

I have seen this behaviour with all recent versions of Excel since 2007. This is just the first time I have found a small MRE where the behaviour shows up so clearly. I think it is a bug in Excel's assumptions about data triggered by the dynamic range but I'd be really grateful if someone could find a way around this problem so that charting XY scatter graphs worked correctly on any selection of valid numeric data treating the first column as the X-axis and the first row as legends for the graph lines.

Or if it is reproducible and a bug how to report it to MS in such a way as to get it fixed.

Whilst I do have a workaround I would prefer a permanent fix. If there is something subtle malformed about my data that I have missed it please point it out. Thanks.


Solution

  • So the answer is that it isn't a bug, more of a feature (kind of). This article explains how Excel's scatterplot determines which data (rows or columns) it chooses to plot along the horizontal and vertical axes:

    In each case, the first row or column of data is taken as the list of x-coordinates and the remaining rows or columns are each taken as different series of y-coordinates.

    If you display the graph and go into Select data | Series 1 | Edit, you will see

    X-values
    =Sheet2!$A$1:$D$1
    Y-values
    =Sheet2!$A$2:$D$2

    Demonstrating that it is indeed using the first row for x-values and the second row for y-values. The first row contains text, not numbers, so in this cases it defaults to using equal intervals along the x-axis.

    As mentioned in the comment, you can fix it by switching rows and columns.

    In answer to the question '...so that charting XY scatter graphs worked correctly on any selection of valid numeric data treating the first column as the X-axis and the first row as legends for the graph lines' , try including some blank rows below the data in the data selection before plotting it.