
Shell script “dvc pull” not working on Streamlit server


In my Streamlit app.py file, I used the code os.system("dvc pull") to load a .csv data file (labeled_projects.csv) from my Google service account (Google Drive), and it had been working well since I deployed the app a few months ago. The app code itself is loaded from my GitHub account.

Recently, however, the code suddenly stopped working, and I now get the error message FileNotFoundError: [Errno 2] No such file or directory: '/mount/src/mlops/data/labeled_projects.csv'.

The Streamlit server shows no error message from the os.system("dvc pull") call.

Replacing os.system("dvc pull") with a .sh file created via the tempfile package and executed via the subprocess package does not help: I get the same FileNotFoundError, again with no error message from dvc pull.
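For context on why the failure is silent: os.system returns the command's exit status rather than raising, and the original call never checks that value, so a failed dvc pull lets the app carry on until it tries to read the missing file. A minimal sketch, assuming the goal is only to surface the failure: the hypothetical helper run_or_raise runs a command through subprocess.run, captures stdout and stderr, and raises with the captured stderr on a nonzero exit code, or when the executable cannot be found at all.

```python
import subprocess
import sys
from typing import Sequence


def run_or_raise(cmd: Sequence[str]) -> str:
    """Run cmd and return its stdout; raise RuntimeError if it fails.

    Hypothetical helper: unlike os.system, it captures the command's output
    and turns a nonzero exit code into an exception.
    """
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True)
    except FileNotFoundError as exc:
        # The executable itself could not be located (e.g. not on PATH).
        raise RuntimeError(f"executable not found: {cmd[0]}") from exc
    if result.returncode != 0:
        # Include the command's own stderr so the cause is visible in logs.
        raise RuntimeError(
            f"{' '.join(cmd)} exited with code {result.returncode}:\n{result.stderr}"
        )
    return result.stdout


if __name__ == "__main__":
    # In app.py this would be run_or_raise(["dvc", "pull"]); a tiny Python
    # command stands in here so the sketch runs anywhere.
    print(run_or_raise([sys.executable, "-c", "print('ok')"]), end="")
```

Called as run_or_raise(["dvc", "pull"]) before reading the CSV, a missing dvc executable would now fail with "executable not found: dvc" instead of showing up later as a FileNotFoundError for the data file.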

Also, running the command find . -name 'labeled_projects.csv' on the Streamlit server returns no matches, which suggests that the file is never downloaded.
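The same check can be done from inside the app, so it stops with a clear message before trying to read a file that was never pulled. A rough sketch using pathlib; the helper name find_files is hypothetical, and unlike find -name it keeps only regular files, not directories with the same name.

```python
from pathlib import Path


def find_files(root: str, name: str) -> list:
    """Return sorted paths of regular files named `name` anywhere under `root`.

    Rough Python counterpart of: find <root> -name '<name>'
    """
    return sorted(p for p in Path(root).rglob(name) if p.is_file())


if __name__ == "__main__":
    matches = find_files(".", "labeled_projects.csv")
    if matches:
        print(*matches, sep="\n")
    else:
        print("labeled_projects.csv not found; dvc pull probably did not run")
```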

The dvc pull call in the Streamlit app.py file works fine when executed locally.

Thanks for your help!


Solution

  • Thanks to @pmrowla and @ruslan-kuprieiev for the feedback.

    First, I updated dvc to version 3.14.0. Then I found that one issue in my case was that a plain shell call to dvc pull inside the Streamlit app.py file could not reach the correct dvc executable, namely the one installed on the Streamlit server from the requirements.txt file.
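One way to check whether a shell command reaches the dvc installed from requirements.txt is to compare the dvc that PATH resolves to with the one in the current interpreter's scripts directory. A minimal sketch, assuming dvc installs a console script there (the function name dvc_locations is hypothetical): if on_path is None, or differs from next_to_interpreter, then a plain dvc pull in a shell is not running the package installed for this interpreter.

```python
import shutil
import sys
import sysconfig


def dvc_locations() -> dict:
    """Report which dvc executable PATH finds versus the interpreter's own."""
    # Directory where this interpreter's console scripts are installed
    # (bin/ in a POSIX virtualenv, Scripts\ on Windows).
    scripts_dir = sysconfig.get_path("scripts")
    return {
        "interpreter": sys.executable,
        # What a shell command like `dvc pull` would resolve to.
        "on_path": shutil.which("dvc"),
        # The dvc script installed alongside this interpreter, if any.
        "next_to_interpreter": shutil.which("dvc", path=scripts_dir),
    }


if __name__ == "__main__":
    for key, value in dvc_locations().items():
        print(f"{key}: {value}")
```

Invoking dvc as sys.executable -m dvc, as in the quoted answer, sidesteps the PATH lookup entirely, because it runs the dvc package importable by the interpreter that runs the app.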

    Below, I quote the answer I got from ChatGPT-4 for your information. The code in the quote resolved my problem, and the .csv data file was loaded successfully from my Google Drive.

    (start quote) "To run dvc pull (or any other command that requires a Python package installed in the virtual environment) within a Streamlit app deployed on Streamlit sharing, you'll need to take several steps.

    1. Install the Required Packages: Ensure you have dvc listed in your requirements.txt file in your GitHub repository. When you deploy the app on Streamlit sharing, it will automatically install the packages listed in requirements.txt.

    2. Use Python's sys.executable: You can use Python's sys.executable to get the path to the Python interpreter. This will help ensure you're calling the right Python environment where dvc is installed.

    3. Run the Command from Python: Use subprocess to run the dvc pull command from within your Streamlit app.

    Here's an example:

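    import sys
    import subprocess
    import streamlit as st
    
    def pull_data_with_dvc():
        # Run dvc through the interpreter that runs the app, so the dvc
        # package installed from requirements.txt is the one used.
        cmd = [sys.executable, "-m", "dvc", "pull"]
        # Capture stdout/stderr as text so they can be shown in the app.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            st.write("Data pulled successfully!")
            st.write(result.stdout)
        else:
            # Display dvc's own error output instead of failing silently.
            st.write("Error pulling data!")
            st.write(result.stderr)
    
    # Use this function somewhere in your Streamlit app.
    pull_data_with_dvc()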
    

    Remember:

    Lastly, note that Streamlit sharing has some limitations in terms of storage and resources. If you're pulling a large amount of data, you might hit those limits. Always review Streamlit's documentation and limitations for the latest details." (end quote)
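If the pulled data might be large, a cheap pre-check is to compare the free disk space on the target filesystem with a rough size estimate before running dvc pull. A minimal sketch with shutil.disk_usage; the helper name has_free_space and the 500 MB figure are illustrative assumptions, not Streamlit's actual limits.

```python
import shutil


def has_free_space(path: str, needed_bytes: int) -> bool:
    """Return True if the filesystem containing path has needed_bytes free."""
    # disk_usage returns a named tuple (total, used, free), all in bytes.
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    # Illustrative threshold of 500 MB, not an official Streamlit limit.
    needed = 500 * 1024 * 1024
    if has_free_space(".", needed):
        print("enough free disk space for the pull")
    else:
        print("probably not enough free disk space; consider pulling less data")
```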