visual-studio-code, pyspark

Why does my Spark session terminate automatically in Visual Studio Code?


I'm trying to run a Spark session in Visual Studio Code with a simple Python script like this:

from pyspark.sql import SparkSession
from pyspark.sql.types import *


spark = SparkSession.builder \
    .appName("test") \
    .getOrCreate()
    
spark.sparkContext.setLogLevel("DEBUG")

The issue is that when I run this script in VS Code, the Spark session automatically terminates right after it starts, even though I haven’t called spark.stop(). The log shows that the Spark process stops with exitCode 0, and all related resources (SparkContext, Spark UI) are shut down.

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/11/04 01:26:56 DEBUG PythonGatewayServer: Exiting due to broken pipe from Python driver
24/11/04 01:26:56 INFO SparkContext: Invoking stop() from shutdown hook
24/11/04 01:26:56 INFO SparkContext: SparkContext is stopping with exitCode 0.
PS E:\graduation_project> 24/11/04 01:26:56 INFO SparkUI: Stopped Spark web UI at http://DESKTOP-CCJBLEA:4040
24/11/04 01:26:56 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
24/11/04 01:26:56 INFO MemoryStore: MemoryStore cleared
24/11/04 01:26:56 INFO BlockManager: BlockManager stopped
24/11/04 01:26:56 INFO BlockManagerMaster: BlockManagerMaster stopped
24/11/04 01:26:56 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
24/11/04 01:26:56 INFO SparkContext: Successfully stopped SparkContext
24/11/04 01:26:56 INFO ShutdownHookManager: Shutdown hook called
24/11/04 01:26:56 INFO ShutdownHookManager: Deleting directory C:\Users\Admin\AppData\Local\Temp\spark-cbe85e4c-a72a-4c3f-96b0-57564457230a\pyspark-37a42a15-2131-4bc3-a23c-e73f551df5ef
24/11/04 01:26:56 INFO ShutdownHookManager: Deleting directory C:\Users\Admin\AppData\Local\Temp\spark-cbe85e4c-a72a-4c3f-96b0-57564457230a
24/11/04 01:26:56 INFO ShutdownHookManager: Deleting directory C:\Users\Admin\AppData\Local\Temp\spark-d13033f1-7d8e-4e90-aadd-3a6fc6a3e502
24/11/04 01:26:56 DEBUG ShutdownHookManager: Completed shutdown in 0.146 seconds; Timeouts: 0
SUCCESS: The process with PID 9836 (child process of PID 15948) has been terminated.
SUCCESS: The process with PID 15948 (child process of PID 4180) has been terminated.
SUCCESS: The process with PID 4180 (child process of PID 3768) has been terminated.

But when I test in cmd, everything is OK:

C:\Users\Admin>pyspark
Python 3.11.8 (tags/v3.11.8:db85d51, Feb  6 2024, 22:03:32) [MSC v.1937 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/11/04 07:28:11 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.3
      /_/

Using Python version 3.11.8 (tags/v3.11.8:db85d51, Feb  6 2024 22:03:32)
Spark context Web UI available at http://DESKTOP-CCJBLEA.mshome.net:4041
Spark context available as 'sc' (master = local[*], app id = local-1730680091248).
SparkSession available as 'spark'.
>>>

I'm using Spark 3.5.3, JDK 17, and Hadoop 3.3.5.

Adding a blocking task keeps the Spark UI active on port 4040, but the VS Code terminal still does not show the Spark banner:

Code:

from pyspark.sql import SparkSession
from pyspark.sql.types import *


spark = SparkSession.builder \
    .appName("test") \
    .getOrCreate()
    
input("Press Enter to exit...")
    
spark.sparkContext.setLogLevel("DEBUG")

Result:

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Press Enter to exit...

Solution

  • A SparkSession is backed by a SparkContext; when the context is stopped (here, by the shutdown hook), the session goes down with it.

    Additionally, you are running a Python script in which you create your SparkSession. When the script reaches its end (EOF), the Python driver process exits, the JVM gateway sees a broken pipe (the "Exiting due to broken pipe from Python driver" line in your log), and the shutdown hook stops the SparkContext automatically. You are trying to reproduce the behavior of the interactive pyspark shell, which keeps the driver alive at a prompt; a plain script cannot do that, so it terminates as soon as the last statement has run. Either keep the driver busy (as your input() call does) or do all your Spark work before the script ends and stop the session yourself, as in the sketch below.

    Please let me know in case of any doubts.
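
    For completeness, here is a minimal sketch of that script-style workflow (the example DataFrame and the commented-out input() call are only illustrations, not part of your original code):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName("test") \
        .getOrCreate()

    spark.sparkContext.setLogLevel("WARN")

    # Do all Spark work here, while the driver is still running.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    df.show()

    # Optional: keep the driver (and the Spark UI on port 4040) alive for inspection.
    # input("Press Enter to stop the session...")

    spark.stop()  # clean, explicit shutdown instead of the broken-pipe shutdown hook at EOF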