apache-spark, hadoop, pyspark, apache-spark-sql, hdfs

How to check which HDFS datanode IP(s) the namenode returns to Spark?


If I'm reading/writing a DataFrame in PySpark, specifying the HDFS namenode hostname and port:

    df.write.parquet("hdfs://namenode:8020/test/go", mode="overwrite")

Is there any way to debug which specific datanode host(s) and port(s) that namenode returns to Spark?


Solution

  • I only needed to set the Spark log level to DEBUG:

    spark.sparkContext.setLogLevel("DEBUG")
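
    With DEBUG logging enabled, the Hadoop HDFS client classes (e.g. org.apache.hadoop.hdfs.DFSClient) log the datanode addresses they connect to for each block read or write. A minimal sketch of the whole flow, reusing the hdfs://namenode:8020 URI from the question (the hostname and port are placeholders, not a real cluster):

        from pyspark.sql import SparkSession

        # Minimal sketch: turn on DEBUG logging around a write, then search the
        # driver/executor logs for the datanode addresses the HDFS client used.
        # "namenode:8020" is the placeholder URI from the question above.
        spark = SparkSession.builder.appName("hdfs-datanode-debug").getOrCreate()

        spark.sparkContext.setLogLevel("DEBUG")

        df = spark.range(10)
        df.write.parquet("hdfs://namenode:8020/test/go", mode="overwrite")

        # Restore a quieter level afterwards; DEBUG output is very verbose.
        spark.sparkContext.setLogLevel("INFO")

    In the DEBUG output, look for lines emitted by loggers under org.apache.hadoop.hdfs that mention a datanode IP:port during the read/write. Since application-wide DEBUG is noisy, an alternative is to raise only those HDFS client loggers to DEBUG in your log4j configuration instead of the whole application.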