pythoncodeqlfile-inclusion

Validating file paths to satisfy GitHub CodeQL's "Uncontrolled data used in path expression" alert


I'm writing functions for a Python package to register files from a file system to an SQL database, and GitHub's CodeQL has flagged that the file paths are a potential security risk.

I have constructed a basic validator to make sure that only file paths that exist and start with a given base path when resolved are accepted for subsequent processing. However, CodeQL still views this as being insufficient.

Is this a false positive, or can my validation check be further enhanced? Note that I use resolve so that I can get and compare the start of the file paths.

Example Code

# Python version: 3.9.19
# OS: GNU/Linux RHEL8 4.18.0-553.5.1.el8_10.x86_64
from pathlib import Path

# The storage path is defined internally by an environment variable; here is a placeholder
storage_path = Path("/path/to/where/files/are/stored")

# Define a function to validate the file path provided
def validate_file(file: Path) -> bool:
    file = Path(file) if isinstance(file, str) else file  # Pre-empt accidental string inputs
    file = file.resolve()  # Get full path for validation

    # Fail if file doesn't exist
    if not file.exists():
        return False

    # Use path to storage location as reference
    basepath = list(storage_path.parents)[-2]  # This can be made stricter eventually
    if str(file).startswith(str(basepath)):
        return True
    else:
        return False

# How it's used in the script
incoming_file = Path("some_file.txt")  # Can be partial path, or full path on system

if validate_file(incoming_file) is True:
    """
    Run code here to store the file in the database
    """
    return True
else:
    raise Exception("This file failed the validation check")

Solution

  • Firstly I'm not 100% sure that Path.resolve is seen by CodeQL as a sanitizing function - so that might be a problem. I use os.path.normpath.

    The main problem from a CodeQL point of view is that you're not sanitizing incoming_file. Yes you're validating it, and not using it if it is invalid, but CodeQL can't detect that. CodeQL needs to see something like:

    def sanitize(filepath):
        fullpath = os.path.normpath(filepath)
        if not fullpath.startswith(storage_path):
            raise Exception("not allowed")
        return fullpath
    
    sanitized_file = sanitize(incoming_file)
    # do something with the file or handle the exception
    
    

    Otherwise I think your validation looks ok, although I would check that the path starts with the full storage path.