I have the following DataFrame built with pandas:
    SampleSize       Mean  StandardDeviation
0            5   0.134151           0.739142
1           25  -0.111257           1.154803
2           45  -0.049999           0.918167
3           65  -0.162783           1.179178
4           85  -0.097452           0.966980
5          105  -0.050559           1.161751
6          125  -0.038383           1.018117
7          145   0.086192           1.028177
8          165   0.045295           1.090246
9          185  -0.107837           1.101610
10         205   0.088160           0.967483
...
40         805   0.020641           1.007389
41         825   0.022781           0.991498
42         845  -0.027429           0.962288
43         865  -0.105373           1.007109
44         885  -0.054397           1.015499
45         905  -0.023729           0.989168
46         925   0.025044           0.989950
47         945   0.021345           1.035740
48         965   0.023404           0.963122
49         985   0.020648           1.000148
It holds the sizes, means, and standard deviations of 50 random normal samples. I am trying to plot a facet_grid that shows the mean and the standard deviation side by side, each against the sample size.
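For anyone who wants to reproduce this without the original data, here is a minimal sketch that builds a table of the same shape. It assumes standard normal samples, a fixed seed, and a hypothetical SAMPLE_SIZE = 1000 (the actual value isn't shown, but range(5, SAMPLE_SIZE, 20) yields exactly 50 sizes ending at 985 for any value from 986 to 1005). Using the sample standard deviation (ddof=1) is also an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical upper bound: any value from 986 to 1005 gives sizes 5, 25, ..., 985
SAMPLE_SIZE = 1000
rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

sizes = list(range(5, SAMPLE_SIZE, 20))
# Draw one standard normal sample per size, then summarise each sample
samples = [rng.standard_normal(n) for n in sizes]
means = [s.mean() for s in samples]
stdev = [s.std(ddof=1) for s in samples]  # assumed: sample (not population) std

df1 = pd.DataFrame({'SampleSize': sizes, 'Mean': means, 'StandardDeviation': stdev})
print(df1.head())
```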
The code I am using currently is:
df1 = pd.DataFrame({'SampleSize': range(5, SAMPLE_SIZE, 20), 'Mean': means, 'StandardDeviation': stdev})
df1_melted = pd.melt(df1, id_vars=['SampleSize'], var_name='SampleSize', value_name='Value')
ggplot(df1_melted, aes(x='SampleSize', y='Value', color='SampleSize')) + \
geom_line() + \
geom_point() + \
facet_grid('SampleSize ~ .') + \
labs(x='SampleSize', y='Mean and StandardDeviation')
This results in:
...
/usr/lib/python3.7/site-packages/pandas/core/internals/blocks.py in new_block(values, placement, ndim, klass)
1935
1936 values, _ = extract_pandas_array(values, None, ndim)
-> 1937 check_ndim(values, placement, ndim)
1938
1939 if klass is None:
/usr/lib/python3.7/site-packages/pandas/core/internals/blocks.py in check_ndim(values, placement, ndim)
1978 if len(placement) != len(values):
1979 raise ValueError(
-> 1980 f"Wrong number of items passed {len(values)}, "
1981 f"placement implies {len(placement)}"
1982 )
ValueError: Wrong number of items passed 2, placement implies 1
I am confused about where this goes wrong, since it worked when I plotted each of the two graphs separately.
The issue is with your melt statement. You have:
df1_melted = pd.melt(df1, id_vars=['SampleSize'], var_name='SampleSize', value_name='Value')
which produces a dataframe in which 'SampleSize' no longer contains the sample sizes and appears as two separate columns.
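A duplicated column name is consistent with the "Wrong number of items passed 2, placement implies 1" message: in pandas, looking up a label that appears twice returns a two-column DataFrame instead of a single Series, so code that expects one column for 'SampleSize' receives two. The sketch below builds a small frame with the same column labels by hand (the values are made up for illustration, and the original melt call is deliberately not used), then shows one way to detect the clash. check_melt_names is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Hand-built frame with the same labels as the problematic melt output;
# the values are illustrative only
bad = pd.DataFrame(
    [[5, 'Mean', 0.13], [5, 'StandardDeviation', 0.74]],
    columns=['SampleSize', 'SampleSize', 'Value'],
)

# Selecting a duplicated label returns every matching column
picked = bad['SampleSize']
print(type(picked).__name__, picked.shape)  # DataFrame (2, 2)

# List the labels that occur more than once
print(bad.columns[bad.columns.duplicated()].tolist())  # ['SampleSize']

def check_melt_names(id_vars, var_name):
    """Hypothetical helper: reject a var_name that reuses an id column's name."""
    if var_name in id_vars:
        raise ValueError(f"var_name {var_name!r} clashes with id_vars {id_vars}")

check_melt_names(['SampleSize'], 'variable')  # no clash, so nothing is raised
```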
Now consider:
melted = pd.melt(df1, id_vars=['SampleSize'], value_vars=['Mean','StandardDeviation'])
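which produces a long-format dataframe with the columns SampleSize, variable (either Mean or StandardDeviation) and value.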
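A quick way to check the shape of the corrected melt is a sketch on a tiny stand-in for df1 (the first three rows of the table in the question). With the defaults, pd.melt names the new columns variable and value, stacks all Mean rows before all StandardDeviation rows, and repeats each SampleSize once per melted column.

```python
import pandas as pd

# Small stand-in for df1: the first three rows of the question's table
df1 = pd.DataFrame({
    'SampleSize': [5, 25, 45],
    'Mean': [0.134151, -0.111257, -0.049999],
    'StandardDeviation': [0.739142, 1.154803, 0.918167],
})

melted = pd.melt(df1, id_vars=['SampleSize'], value_vars=['Mean', 'StandardDeviation'])

print(melted.columns.tolist())  # ['SampleSize', 'variable', 'value']
print(len(melted))              # 6: each of the 3 rows appears once per value column
print(melted)
```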
Since 'SampleSize' appears twice in your melted dataframe, it wasn't clear to me whether you wanted a different coloured line for the mean and standard deviation graphs, or a line whose colour changes with the sample size. I went with the latter.
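from plotnine import (ggplot, aes, geom_line, geom_point, facet_grid, labs,
                      theme, theme_light, element_text)

# Colour by sample size; one facet row per melted variable (Mean, StandardDeviation)
p = (ggplot(melted, aes(x='SampleSize', y='value', color='SampleSize'))
 + theme_light(9)
 + geom_line()
 + geom_point()
 + facet_grid('variable ~ .')
 + labs(x='Sample size', y='', color='Sample\n size\n')
 + theme(
     legend_title=element_text(size=8.5),
     legend_title_align='center',
     legend_box_spacing=0.025,
     legend_key_height=34,
     legend_key_width=9,
   )
)
p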