I am new to PySpark and am trying to run a simple code block which involves creating a Spark Session:
from pyspark.sql import *

if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("HelloSpark") \
        .master("local[2]") \
        .getOrCreate()
When I try to run the code block, I receive the following error:
Traceback (most recent call last):
File "/Users/aj24/Documents/spark_code/spark_test_bed/logging_in_spark/logging.py", line 1, in <module>
from pyspark.sql import *
File "/Users/aj24/Documents/spark_code/spark_test_bed/venv/lib/python3.12/site-packages/pyspark/__init__.py", line 58, in <module>
from pyspark.conf import SparkConf
File "/Users/aj24/Documents/spark_code/spark_test_bed/venv/lib/python3.12/site-packages/pyspark/conf.py", line 23, in <module>
from py4j.java_gateway import JVMView, JavaObject
File "/Users/aj24/Documents/spark_code/spark_test_bed/venv/lib/python3.12/site-packages/py4j/java_gateway.py", line 16, in <module>
import logging
File "/Users/aj24/Documents/spark_code/spark_test_bed/logging_in_spark/logging.py", line 1, in <module>
from pyspark.sql import *
File "/Users/aj24/Documents/spark_code/spark_test_bed/venv/lib/python3.12/site-packages/pyspark/sql/__init__.py", line 42, in <module>
from pyspark.sql.types import Row
File "/Users/aj24/Documents/spark_code/spark_test_bed/venv/lib/python3.12/site-packages/pyspark/sql/types.py", line 49, in <module>
from py4j.java_gateway import GatewayClient, JavaClass, JavaGateway, JavaObject, JVMView
ImportError: cannot import name 'GatewayClient' from partially initialized module 'py4j.java_gateway' (most likely due to a circular import) (/Users/aj24/Documents/spark_code/spark_test_bed/venv/lib/python3.12/site-packages/py4j/java_gateway.py)
I have a brew installation of Python3.12 and am using a virtual environment to run the code, on VSCode. My interpreter also points to that of the virtual environment.
Any ideas what might cause the circular import issue?
This is most likely caused by a conflict between the name of your Python file, logging.py, and the Python standard library logging module. When you import pyspark.sql, py4j internally runs import logging; because your script is named logging.py and your script's directory comes first on the module search path, Python imports your file instead of the standard library module. Your file then imports pyspark again while pyspark and py4j are still only partially initialized, which produces the circular import error.
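A quick way to confirm the shadowing is to check where Python actually resolves logging from when run inside your project directory:

```python
import logging

# If this prints a path inside your project (e.g. .../logging_in_spark/logging.py)
# instead of the standard library, your local file is shadowing the stdlib module.
print(logging.__file__)
```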
Try the following steps and check whether the error goes away:
1. Rename the file logging.py to something else (e.g., spark_logging.py).
2. Delete stale bytecode caches: after renaming the file, remove any logging.pyc files and __pycache__ directories in your project (usually found in the same directory as logging.py).
3. Rerun the script.
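Step 2 can also be scripted; here is a minimal sketch that assumes you run it from the project root and want to clear every __pycache__ directory recursively:

```python
import pathlib
import shutil

# Recursively remove all __pycache__ directories under the current
# directory so no stale bytecode from the old logging.py is left behind.
for cache_dir in pathlib.Path(".").rglob("__pycache__"):
    shutil.rmtree(cache_dir)
    print(f"removed {cache_dir}")
```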
If other parts of your code imported your logging.py, update those imports to the new file name; the standard library module keeps its usual import:

import logging          # standard library logging module
import spark_logging    # your renamed file
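For example, after the rename (assuming the new name spark_logging.py), the standard logging module imports cleanly and can be used alongside PySpark as usual:

```python
import logging

# The standard library module now resolves correctly.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("HelloSpark")
log.info("Creating the Spark session")
```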