apache-spark dataframe apache-spark-sql mining

Is it good practice to call Spark's show() method in a production Spark job?


Using the DataFrame.show() API, we can take a quick look at the underlying data.

Is it a good idea to use this method in a production Spark job?

I know we can comment out this kind of code before launching the job, but if we just leave it in, is that good practice?
Or will it cause performance issues?


Solution

  • The show() command is an action.

    Adding unnecessary actions to the code can get in the way of the Spark optimizer: the optimizer is free to reorder transformations, but it must trigger a job every time there is an action.
    In other words, unnecessary actions limit what the optimizer can do.

    See Actions vs Transformations
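
If you would rather not comment the calls out by hand, one option is to gate them behind the log level, so the action only runs while debugging. This is a minimal sketch, not an official Spark facility: the helper name show_if_debug and the logger name "job.debug" are made up for illustration, and df is assumed to be a Spark DataFrame (anything with a show(n) method works).

```python
import logging

# Hypothetical logger name; use whatever logger your job already configures.
logger = logging.getLogger("job.debug")


def show_if_debug(df, n=20, log=logger):
    """Call df.show(n) only when DEBUG logging is enabled.

    show() is an action, so skipping it in production avoids
    launching an extra Spark job just to print a few rows.
    Returns True if show() was called, False otherwise.
    """
    if log.isEnabledFor(logging.DEBUG):
        df.show(n)
        return True
    return False
```

In the job you would then write show_if_debug(df) in place of df.show(). With the logger at INFO or above, no action is triggered; setting it to DEBUG brings the output back without touching the code.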