I'm trying to draw a histogram using PySpark in a Zeppelin notebook. Here is what I have tried so far:
%pyspark
import matplotlib.pyplot as plt
import pandas
...
pdf = dateDF.toPandas()  # collect the Spark DataFrame once instead of once per column
x = pdf["year(CAST(_c0 AS DATE))"].tolist()
y = pdf["count(year(CAST(_c0 AS DATE)))"].tolist()
plt.plot(x,y)
plt.show()
This code runs without errors but does not produce the expected plot. So I searched and found this documentation.
Following it, I tried to enable the angular flag as follows:
pdf = dateDF.toPandas()  # collect the Spark DataFrame once instead of once per column
x = pdf["year(CAST(_c0 AS DATE))"].tolist()
y = pdf["count(year(CAST(_c0 AS DATE)))"].tolist()
plt.close()
z.configure_mpl(angular=True,close=False)
plt.plot(x,y)
plt.show()
But now I get the error No module named 'mpl_config',
and I have no idea how to enable angular without it. Any suggestions on how to resolve this would be greatly appreciated.
In Zeppelin 0.10.0, I was able to draw a matplotlib plot in a %pyspark interpreter as simply as this:
import matplotlib.pyplot as plt
x = list(range(10))
y = [v * 25 for v in x]
plt.close() # Close any existing plot when re-running this paragraph.
plt.xlabel('x', fontsize=20)
plt.ylabel('y', fontsize=20)
plt.grid()
plt.title('Inline plotting example', fontsize=20)
plt.plot(x,y)
plt.show()
Output in Zeppelin:
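Applied to the data in the question, the same approach might look like the sketch below. Since the counts per year are already aggregated, it draws a bar chart with `bar` rather than calling `hist` on raw values. The helper name `plot_year_counts`, the axis labels, and the sample numbers are placeholders of mine, not taken from the question; in the notebook, the two lists would come from a single `dateDF.toPandas()` call.

```python
import matplotlib.pyplot as plt


def plot_year_counts(years, counts):
    """Draw one bar per year and return the Axes for further tweaking."""
    plt.close("all")  # discard figures left over from earlier runs of the paragraph
    fig, ax = plt.subplots()
    ax.bar(years, counts)
    ax.set_xlabel("year")
    ax.set_ylabel("count")
    ax.set_title("Records per year")
    return ax


# Placeholder values standing in for the two columns of dateDF.
years = [2015, 2016, 2017, 2018]
counts = [12, 30, 7, 19]
ax = plot_year_counts(years, counts)
```

As in the example above, follow the call with `plt.show()` so that the figure is rendered in the paragraph output.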