python, python-3.x, automation, virtualenv, python-venv

How can I efficiently set up Python virtual environments for 200+ student submissions?


I am working on a grading automation tool for programming assignments. Each student submission is run in its own isolated virtual environment (venv), and dependencies are installed from a requirements.txt file located in each submission folder.

I call subprocess.run([sys.executable, "-m", "venv", "submission_[studentID]/venv"]) for every student submission. This is safe and works as expected, but it is very slow when processing 200+ submissions. I have also used multiprocessing to create the virtual environments in parallel, but that also takes a long time to finish.

Is there a safe and fast way to set up a virtual environment per student (possibly without modifying the original base environment)?

This is how I am creating venv for now:

import subprocess
import sys
from pathlib import Path


class VenvManager:

    # Initialise the submission folder and virtual environment paths
    def __init__(self, folderPath: Path):
        self._folderPath = Path(folderPath).resolve()
        self._envPath = self._folderPath / "venv"
        self.requirements_path = self._folderPath / "requirements.txt"

    # Create the virtual environment in the submission directory as 'venv'
    def create_venv(self):
        if not self._envPath.exists():
            result = subprocess.run(
                [sys.executable, "-m", "venv", str(self._envPath)],
                capture_output=True,
                text=True,
            )
            # A non-zero return code means venv creation failed
            return result.returncode == 0
        else:
            ...

In the main script:

from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

from tqdm import tqdm


def setup_virtualenvs(submissions_root: Path):

    # Only folders whose names start with "Portfolio" are treated as submissions
    submissions = [s for s in submissions_root.iterdir() if s.is_dir() and s.name.startswith("Portfolio")]

    # Create the venv, install requirements, and save the log for one submission
    def setup(sub):
        v = VenvManager(sub)
        v.create_venv()
        v.install_requirements()
        v.save_log()

    with ThreadPoolExecutor(max_workers=12) as executor:
        futures = {executor.submit(setup, sub): sub.name for sub in submissions}
        for future in tqdm(as_completed(futures), total=len(futures), desc="Setting up venvs", unit="student"):
            student_name = futures[future]
            try:
                future.result()
                print(f"Finished setup for {student_name}")
            except Exception as e:
                print(f"Error processing {student_name}: {e}")

Solution

  • My recommendation is to use uv instead of venv. uv is about an order of magnitude faster at creating envs and installing packages.

    By way of comparison, this is using venv and regular pip:

    Command                               Time (s)
    python -m venv with-venv              11
    source with-venv/Scripts/activate
    pip install numpy                     16 (download and install)
    deactivate
    python -m venv with-venv2             11
    source with-venv2/Scripts/activate
    pip install numpy                     12 (cache hit)

    Doing the equivalent with uv is much faster:

    Command                               Time (s)
    uv venv with-uv                       0.3
    source with-uv/Scripts/activate
    uv pip install numpy                  8 (download and install)
    deactivate
    uv venv with-uv2                      0.3
    source with-uv2/Scripts/activate
    uv pip install numpy                  0.6 (cache hit)

    I ran each sequence twice to show the benefit of caching. With regular pip the cache saves some time (16 s down to 12 s), but with uv every step except the actual network download takes well under a second.

    I chose NumPy as an arbitrary example installation, but I have a project with lots of pretty big dependencies: uv takes 30 seconds to install it the first time and 0.6 s when it finds everything in its cache for the second venv. Pip, on the other hand, needed 69 seconds in the first venv and still needed 27 seconds in the second.

    If you use uv, I believe you won't feel the need to multithread, and it might be best not to anyway, because I expect uv pip install is already using parallelism internally.

    Official uv repo: https://github.com/astral-sh/uv and docs: https://docs.astral.sh/uv/
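
As a rough illustration of how the per-submission setup from the question could call uv instead of python -m venv and pip, here is a minimal sketch. It assumes uv is installed and on PATH, and that each submission keeps its environment in a "venv" subfolder as in the question. The helper names (venv_python, build_uv_commands, setup_submission) are made up for this example and are not part of uv or of the original VenvManager class.

```python
import subprocess
import sys
from pathlib import Path


def venv_python(env_path: Path) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if sys.platform == "win32":
        return env_path / "Scripts" / "python.exe"
    return env_path / "bin" / "python"


def build_uv_commands(submission_dir: Path) -> list[list[str]]:
    """Build the uv commands for one submission without running them."""
    env_path = submission_dir / "venv"
    commands = [["uv", "venv", str(env_path)]]
    requirements = submission_dir / "requirements.txt"
    if requirements.exists():
        # --python points uv at the target environment, so no activation is needed
        commands.append(
            ["uv", "pip", "install", "--python", str(venv_python(env_path)),
             "-r", str(requirements)]
        )
    return commands


def setup_submission(submission_dir: Path) -> bool:
    """Run the uv commands in order; return True only if every step succeeded."""
    for command in build_uv_commands(submission_dir):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"{submission_dir.name}: {' '.join(command)} failed\n{result.stderr}")
            return False
    return True
```

Calling setup_submission(sub) for each submission folder in a plain loop is probably enough to start with. Whether adding a thread pool on top helps is something I would measure rather than assume.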