I'm writing functions for a Python package to register files from a file system to an SQL database, and GitHub's CodeQL has flagged that the file paths are a potential security risk.
I have constructed a basic validator to make sure that only file paths that exist and start with a given base path when resolved are accepted for subsequent processing. However, CodeQL still views this as being insufficient.
Is this a false positive, or can my validation check be further enhanced? Note that I use resolve
so that I can get and compare the start of the file paths.
Example Code
# Python version: 3.9.19
# OS: GNU/Linux RHEL8 4.18.0-553.5.1.el8_10.x86_64
from pathlib import Path
# The storage path is defined internally by an environment variable; here is a placeholder
storage_path = Path("/path/to/where/files/are/stored")
# Define a function to validate the file path provided
def validate_file(file: Path) -> bool:
file = Path(file) if isinstance(file, str) else file # Pre-empt accidental string inputs
file = file.resolve() # Get full path for validation
# Fail if file doesn't exist
if not file.exists():
return False
# Use path to storage location as reference
basepath = list(storage_path.parents)[-2] # This can be made stricter eventually
if str(file).startswith(str(basepath)):
return True
else:
return False
# How it's used in the script
incoming_file = Path("some_file.txt") # Can be partial path, or full path on system
if validate_file(incoming_file) is True:
"""
Run code here to store the file in the database
"""
return True
else:
raise Exception("This file failed the validation check")
Firstly I'm not 100% sure that Path.resolve
is seen by CodeQL as a sanitizing function - so that might be a problem. I use os.path.normpath
.
The main problem from a CodeQL point of view is that you're not sanitizing incoming_file
. Yes you're validating it, and not using it if it is invalid, but CodeQL can't detect that. CodeQL needs to see something like:
def sanitize(filepath):
fullpath = os.path.normpath(filepath)
if not fullpath.startswith(storage_path):
raise Exception("not allowed")
return fullpath
sanitized_file = sanitize(incoming_file)
# do something with the file or handle the exception
Otherwise I think your validation looks ok, although I would check that the path starts with the full storage path.