I am working on a grading automation tool for programming assignments. Each student submission is run in its own isolated virtual environment (venv), and dependencies are installed from a requirements.txt file located in each submission folder.
I used `subprocess.run([sys.executable, "-m", "venv", "submission_[studentID]/venv"])` for every single student submission. This is safe and works as expected, but it's very slow when processing 200+ submissions. I have also tried creating the virtual environments in parallel with multiprocessing, but it still takes a long time to finish.
Is there a safe and fast way to setup a virtual environment per student (possibly without modifying the original base environment)?
This is how I am creating the venv for now:
```python
import subprocess
import sys
from pathlib import Path


class VenvManager:
    def __init__(self, folderPath: Path):
        # Initialise the folder and virtual environment paths
        self._folderPath = Path(folderPath).resolve()
        self._envPath = self._folderPath / "venv"
        self.requirements_path = self._folderPath / "requirements.txt"

    def create_venv(self) -> bool:
        # Create the virtual environment in the submission directory as 'venv';
        # skip creation if it already exists
        if not self._envPath.exists():
            result = subprocess.run(
                [sys.executable, "-m", "venv", str(self._envPath)],
                capture_output=True,
                text=True,
            )
            if result.returncode != 0:
                return False
        return True
```
in the main script:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

from tqdm import tqdm


def setup_virtualenvs(submissions_root: Path):
    submissions = [
        s for s in submissions_root.iterdir()
        if s.is_dir() and s.name.startswith("Portfolio")
    ]

    def setup(sub):
        v = VenvManager(sub)
        v.create_venv()
        v.install_requirements()
        v.save_log()

    with ThreadPoolExecutor(max_workers=12) as executor:
        futures = {executor.submit(setup, sub): sub.name for sub in submissions}
        for future in tqdm(as_completed(futures), total=len(futures),
                           desc="Setting up venvs", unit="student"):
            student_name = futures[future]
            try:
                future.result()
                print(f"Finished setup for {student_name}")
            except Exception as e:
                print(f"Error processing {student_name}: {e}")
```
My recommendation is to use `uv` instead of `venv`. `uv` is about an order of magnitude faster at creating environments and installing packages.

By way of comparison, this is using `venv` and regular `pip`:
| Command | time (s) |
|---|---|
| `python -m venv with-venv` | 11 |
| `source with-venv/Scripts/activate` | |
| `pip install numpy` | 16 (download and install) |
| `deactivate` | |
| `python -m venv with-venv2` | 11 |
| `source with-venv2/Scripts/activate` | |
| `pip install numpy` | 12 (cache hit) |
Doing the equivalent with `uv` is much faster:
| Command | time (s) |
|---|---|
| `uv venv with-uv` | 0.3 |
| `source with-uv/Scripts/activate` | |
| `uv pip install numpy` | 8 (download and install) |
| `deactivate` | |
| `uv venv with-uv2` | 0.3 |
| `source with-uv2/Scripts/activate` | |
| `uv pip install numpy` | 0.6 (cache hit) |
I ran each sequence twice to show the benefit of caching. With regular pip there is some saving, but with `uv` every operation is extremely fast except the actual network download.

I chose NumPy as an arbitrary example installation, but I have a project with lots of pretty big dependencies, and `uv` takes 30 seconds to install it the first time, then 0.6 s when it finds everything in its cache for the second venv. Pip, on the other hand, needed 69 seconds in the first venv and still needed 27 seconds in the second venv.
If you use `uv`, I believe you won't feel the need to multithread, and it might be best not to anyway, because I expect `uv pip install` already uses parallelism internally.
Official `uv` repo: https://github.com/astral-sh/uv and docs: https://docs.astral.sh/uv/