I am working on this notebook: https://databricks.com/notebooks/simple-aws/petastorm-spark-converter-pytorch.html
I tried running the first line:
df = spark.read.parquet("/databricks-datasets/flowers/parquet") \
    .select(col("content"), col("label_index")) \
    .limit(1000)
However, I got this error:
Path does not exist: dbfs:/databricks-datasets/flowers/parquet;
Where can I find the Parquet version of the flowers dataset on Databricks? FYI, I am working on the Community Edition.
This dataset was converted to Delta format, so the path is now /databricks-datasets/flowers/delta instead of /databricks-datasets/flowers/parquet, and you need to read it with the corresponding code:
df = spark.read.format('delta').load('/databricks-datasets/flowers/delta')
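To get back the 1000-row sample from the original notebook cell, the Delta read can be wrapped in a small helper. This is only a sketch: `load_flowers` and `FLOWERS_DELTA_PATH` are names made up here, and the helper assumes it is given the notebook's `spark` session.

```python
# Hypothetical helper; the name and defaults are illustrative, not from the notebook.
FLOWERS_DELTA_PATH = "/databricks-datasets/flowers/delta"


def load_flowers(spark, path=FLOWERS_DELTA_PATH, limit=1000):
    """Read the flowers dataset as a Delta table and cap the number of rows."""
    return spark.read.format("delta").load(path).limit(limit)


# In a Databricks notebook, where `spark` is already defined:
#   df = load_flowers(spark)
#   df.printSchema()  # check the column names before adding .select(...)
```

The column selection is left out on purpose: the Delta copy may not use the same column names as the old Parquet copy, so it is safer to look at the schema first and then add the `.select(...)`.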
P.S. You can always use the %fs ls path command to see what files are at a given path.
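The same listing can be done from a Python cell with `dbutils.fs.ls`, which is handy if you want to check for a directory in code. A minimal sketch, assuming the notebook's `dbutils` object is passed in; `list_names` is a made-up helper name.

```python
def list_names(dbutils, path):
    """Return the names of the entries that dbutils.fs.ls reports under `path`."""
    return [entry.name for entry in dbutils.fs.ls(path)]


# In a Databricks notebook, where `dbutils` is already defined:
#   print(list_names(dbutils, "/databricks-datasets/flowers"))
```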
P.P.S. I'll ask for that notebook to be fixed, if possible.