Tags: python, visual-studio-code, pyspark, module

I need to import a function from a file in VS Code using Python, but I keep getting a "module not found" error


I have a workspace with different projects as shown below.

[Screenshot: workspace folder structure]

I have code in main_script.py, which is under the main_scripts subfolder, that needs to call a function inside the file config_reader.py, which is inside the user_functions folder.

testing_framework is my current working directory with pyspark_training as my root project.
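
Based on that description (the screenshots are not reproduced here), the layout is roughly:

    pyspark_training/
    └── testing_framework/
        ├── main_scripts/
        │   └── main_script.py
        └── user_functions/
            └── config_reader.py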

My main_script.py looks like this:

[Screenshot: contents of main_script.py]

and my config_reader.py file looks like this:

[Screenshot: contents of config_reader.py]

I tried creating a dev.env file in the main pyspark_training folder:

[Screenshot: dev.env file]

I also tried modifying the settings for pyspark_training, but I am not sure whether this is correct.

[Screenshot: VS Code settings for pyspark_training]
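
For reference, this kind of setup usually amounts to pointing the VS Code Python extension at an env file that extends PYTHONPATH. The concrete contents below are assumptions for illustration, not taken from the screenshots:

    # dev.env (in the pyspark_training root) -- hypothetical contents
    PYTHONPATH=./testing_framework

    // .vscode/settings.json -- tell the Python extension which env file to load
    {
        "python.envFile": "${workspaceFolder}/dev.env"
    }

Note that the env file mainly helps the extension and the debugger resolve imports; a script launched from a plain terminal does not pick it up, which is what the working version below addresses via sys.path.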

But I am still getting: ModuleNotFoundError: No module named 'user_functions'

Can anyone help me solve this? I have gone through a bunch of Stack Overflow topics covering the issue, but to no avail. I am still getting the same error.


Solution

  • This is the working version of the code. The problem was the cwd, as furas pointed out in the comments; I am pasting the working version here for future reference.

    Short summary:
    want to access: testing_framework > user_functions > config_reader.py

    from: testing_framework > main_scripts > main_script.py

    (Note: the folder structure is described in the question above.)

    main_script.py

    import os, sys
    import pyspark
    from pyspark.sql import SparkSession

    BASE = os.path.dirname(os.path.abspath(__file__))  # folder this script lives in (main_scripts)
    parent_folder = os.path.join(BASE, "..")            # its parent folder (testing_framework)

    sys.path.append(parent_folder)  # make testing_framework importable, regardless of the cwd
    from user_functions import config_reader

    spark = SparkSession.builder.appName('validation').master("local").getOrCreate()

    # config_folder and config_file are not shown in the question; example values only
    config_folder = "configs"
    config_file = "app_config.csv"

    configs = config_reader.read_config(spark, config_folder, config_file)

    print(configs)
    

    config_reader.py

    import pyspark
    from pyspark.sql import SparkSession
    import os

    def read_config(spark: SparkSession, config_folder, config_file):
        full_path = os.path.join(config_folder, config_file)

        if config_file.endswith('.csv'):
            df_config = spark.read.format('csv') \
                .option('header', True) \
                .load(full_path)
        else:
            # guard against an UnboundLocalError when the file type is not handled
            raise ValueError(f"Unsupported config file type: {config_file}")

        return df_config.collect()
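
    For illustration, if the config CSV looked like this (file name and contents are made up):

    key,value
    input_path,/data/in
    output_path,/data/out

    then the call in main_script.py returns the collected rows as a plain Python list of Row objects:

    # hypothetical example -- "configs" and "app_config.csv" are placeholder names
    rows = config_reader.read_config(spark, "configs", "app_config.csv")
    # [Row(key='input_path', value='/data/in'), Row(key='output_path', value='/data/out')]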