apache-spark, rdd, cloudera-cdh

Cached Spark RDD doesn't show up in the Storage tab of the Spark History WebUI


I am using Spark 1.4.1 on CDH 5.4.4.

I call rdd.cache(), but nothing shows up in the Storage tab of the Spark History WebUI.

Has anyone else run into this issue? How can I fix it?


Solution

  • Your RDD is only cached once it has been evaluated. cache() is lazy: it marks the RDD for caching but computes nothing. The most common way to force evaluation (and therefore populate the cache) is to call an action such as count, e.g.:

    rdd.cache() // Nothing in storage page yet & nothing cached
    rdd.count() // RDD evaluated, cached & in storage page.
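
To check the same behaviour from code rather than from the WebUI, here is a minimal sketch. It assumes a local-mode SparkContext; the object name CacheLazinessDemo and the helper cachedRddIds are made up for illustration. It relies on sc.getRDDStorageInfo, a developer API that reports the RDDs currently held in storage, so treat it as a debugging aid rather than a recommended pattern.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheLazinessDemo {
  // Hypothetical helper: ids of RDDs that currently have cached partitions.
  def cachedRddIds(sc: SparkContext): Set[Int] =
    sc.getRDDStorageInfo.filter(_.isCached).map(_.id).toSet

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, only for this demonstration.
    val conf = new SparkConf().setAppName("cache-laziness-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000).map(_ * 2)

      rdd.cache() // only marks the RDD for caching; nothing is computed yet
      println(s"in storage before action: ${cachedRddIds(sc).contains(rdd.id)}")

      rdd.count() // action: computes the partitions and stores them
      println(s"in storage after action: ${cachedRddIds(sc).contains(rdd.id)}")
    } finally {
      sc.stop()
    }
  }
}
```

The first check should report that the RDD is not in storage and the second that it is, which matches what the Storage tab shows before and after the action.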