python homebrew python-packaging

Setting up a Python project for packaging and distribution via Homebrew


I'm trying to create a Homebrew formula for a Python project.

Here's the Homebrew formula:

class Scanman < Formula
  include Language::Python::Virtualenv

  desc "Using LLMs to interact with man pages"
  url "https://github.com/nikhilkmr300/scanman/archive/refs/tags/1.0.1.tar.gz"
  sha256 "93658e02082e9045b8a49628e7eec2e9463cb72b0e0e9f5040ff5d69f0ba06c8"

  depends_on "python@3.11"

  def install
    virtualenv_install_with_resources
    bin.install "scanman"
  end

  test do
    # Simply run the program
    system "#{bin}/scanman"
  end
end

Upon running the application with the installed scanman version, it fails to locate my custom modules housed within the src directory.

ModuleNotFoundError: No module named 'src'

Any insights into why this is happening?

Here's my directory structure if that helps:

.
├── requirements.txt
├── scanman
├── scanman.rb
├── setup.py
└── src
    ├── __init__.py
    ├── cli.py
    ├── commands.py
    ├── manpage.py
    ├── rag.py
    └── state.py

The main executable is scanman. It's a Python script that lets you interact with man pages using an LLM.

It's worth noting the following:


Solution

  • I took a look at the repo and saw that scanman itself is a Python script.

    You actually have two problems:

    1. You set packages=find_packages(where="src") in setup.py, which tells Setuptools (the program that reads setup.py and builds your package) to install everything inside of src. Therefore your modules cli.py, commands.py, etc. will all end up installed as top-level modules, which you'd access as import cli, import commands. That's probably not what you want.
    2. As discussed in the comments, your shebang #!/usr/bin/env python3 is at the mercy of whatever is in PATH, which might not even be managed by Homebrew, let alone the specific venv created by Homebrew for your package.
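
To make the second problem concrete, here is a minimal sketch using only the standard library. It compares the `python3` that `#!/usr/bin/env python3` would pick up from PATH with the interpreter that is actually running. The function name `describe_interpreters` and the returned keys are made up for illustration.

```python
import os
import shutil
import sys


def describe_interpreters(path=None):
    """Compare the python3 found on PATH with the running interpreter.

    `path` is a PATH-style string to search; None means the real PATH.
    """
    # This is the lookup that `/usr/bin/env python3` effectively performs.
    env_python3 = shutil.which("python3", path=path)
    running = sys.executable or ""
    same_bin_dir = (
        env_python3 is not None
        and bool(running)
        and os.path.dirname(os.path.abspath(env_python3))
        == os.path.dirname(os.path.abspath(running))
    )
    return {
        "env_python3": env_python3,
        "running": running,
        "same_bin_dir": same_bin_dir,
    }


info = describe_interpreters()
print(info)
```

If `same_bin_dir` is false when you run this with the venv's interpreter, a script with that shebang would be started by a different Python, one that cannot see the packages installed in the venv.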

    I elaborated on some more packaging issues and a possible simple resolution in the comments. If you're using an LLM for this work, use a different one, because it's giving you bad advice. You should not make a Pip-installable package called src. You should strive to make it so that Python can find your code without any additional env vars or other messing around. You also don't want to rewrite your code. My suggestions here are meant to achieve all of those things.

    It looks like you are already attempting to make a Pip-installable package, so rather than messing around trying to make your current setup work, I will illustrate how to do that in a way that doesn't make a mess, and follows standard longstanding conventions.

    Your file structure will look like this:

    ./
      Formula/
        scanman.rb
      .gitignore
      requirements.txt
      setup.py
      src/
        scanman/
          __init__.py
          __main__.py
          cli.py
          commands.py
          manpage.py
          rag.py
          state.py
    

    The absence of the top-level scanman script is deliberate. Keep reading.

    And your setup.py becomes:

    from setuptools import setup, find_packages
    
    setup(
        name="scanman",
        description="Using LLMs to interact with man pages",
        url="https://github.com/nikhilkmr300/scanman",
        author="Nikhil Kumar",
        author_email="nikhilkmr300@gmail.com",
        license="MIT",
        packages=find_packages(where="src"),
        package_dir={"": "src"},  # tell Setuptools the packages live under src/
        install_requires=[
            "faiss-cpu",
            "langchain",
            "langchain-openai",
            "langchainhub",
            "openai",
            "termcolor"
        ],
        entry_points={
            "console_scripts": [
                "scanman = scanman.__main__:main",
            ]
        },
    )
    

    Note that this is almost identical to what you had before.
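
For context on the `console_scripts` entry: the string `scanman.__main__:main` means "import the module before the colon, then call the attribute after it". The sketch below imitates that lookup with `importlib`. The helper name `load_entry_point` is my own, and it is only an approximation of what the generated launcher script does, not Pip's actual code.

```python
import importlib


def load_entry_point(spec: str):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Demonstrated with a standard-library target so it runs anywhere;
# for this project the spec would be "scanman.__main__:main".
join = load_entry_point("os.path:join")
print(join("src", "scanman"))
```

This is why the code has to live in an importable function (`main`) rather than at the top level of a standalone script.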

    Then:

    1. Change all of your imports with src.cli, src.manpage etc. to scanman.cli, scanman.manpage.
    2. Move the contents of your scanman script to a function inside the script src/scanman/__main__.py, and remove the shebang line.
    3. Run pip install --editable . in your developer environment (ideally a venv!). This only needs to be run once; you do not need to re-run it when you update your code.
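
Step 1 is a mechanical rename, so you could script it. Here is a rough sketch; the helper `rewrite_imports` and its regex are my own illustration, not part of any tool. It only handles the simple forms `from src.x import y`, `from src import x` and `import src.x` at the start of a line, so review the resulting diff rather than trusting it blindly.

```python
import re

# Matches "src" as the first component of an import statement, e.g.
#   from src.cli import prompt
#   from src import state
#   import src.rag
_IMPORT_SRC = re.compile(
    r"^([ \t]*(?:from|import)[ \t]+)src(?=[.\s]|$)", re.MULTILINE
)


def rewrite_imports(source: str, new_package: str = "scanman") -> str:
    """Return `source` with top-level 'src' imports renamed to `new_package`."""
    return _IMPORT_SRC.sub(lambda m: m.group(1) + new_package, source)


print(rewrite_imports("from src.cli import prompt\nimport src.rag\n"))
```

Names that merely start with `src` (such as a hypothetical `srcfoo`) are left alone, because the pattern requires a dot, whitespace, or end of line right after `src`.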

    Your resulting src/scanman/__main__.py script will look like this:

    import argparse
    import logging
    import os
    import readline
    import sys
    
    from langchain.memory import ConversationBufferWindowMemory
    from termcolor import colored
    
    from scanman.cli import prompt
    from scanman.commands import Command
    from scanman.manpage import Manpage
    from scanman.rag import ERROR_MSG, ask, load_retriever
    from scanman.state import State
    
    
    def main():
        logger = logging.getLogger(__name__)
        logging.basicConfig(level=logging.ERROR)
    
        ...  # all the stuff that was under the if-name-main block
    
    
    if __name__ == "__main__":
        main()
    

    Note that you don't need both logging.basicConfig(level=logging.ERROR) and logger.setLevel(logging.ERROR). Log levels are hierarchical; whatever you set on the root logger (which is what logging.basicConfig(level=...) does) will propagate to all other loggers, unless you explicitly set a different level on those other loggers.
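
You can check this yourself with a few lines. In this sketch the logger name `scanman.demo` is a placeholder, and `force=True` (available since Python 3.8) is only there so the example behaves the same even if logging was already configured.

```python
import logging

# Configure only the root logger, as the main script does.
logging.basicConfig(level=logging.ERROR, force=True)

# A named logger with no level of its own...
child = logging.getLogger("scanman.demo")  # placeholder name

# ...falls back to the root logger's level as its effective level.
inherits_root = child.getEffectiveLevel() == logging.ERROR

# An explicit level on the child takes precedence over the inherited one.
child.setLevel(logging.DEBUG)
overridden = child.getEffectiveLevel() == logging.DEBUG

print(inherits_root, overridden)
```

Strictly speaking, nothing is copied onto the child logger: a logger without its own level looks up its ancestors each time, which is why setting the level once on the root is enough.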

    As a test, you should be able to run both python -m scanman and scanman using your developer environment:

    # Create a fresh venv and install into it:
    /opt/homebrew/bin/python3 -m venv ./venv-testing
    ./venv-testing/bin/pip install -r requirements.txt .
    
    # Both should now work:
    ./venv-testing/bin/scanman
    ./venv-testing/bin/python -m scanman
    

    The reason we like the src/ layout is that it keeps your source code off the Python search path, so you can be sure you are running only the version of your code that is installed in your venv, not whatever happens to be in your Git checkout when you run it. This improves the chances that your code will work on other people's machines.
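
Here is a small self-contained demonstration of that effect. It builds a throwaway directory containing one package at the top level and one under src/, then asks Python whether each can be found when only the "repo root" is on the search path, which is roughly the situation when you run python from a checkout. The package names `flat_demo` and `nested_demo` and the helper `importable_from` are invented for the example.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def importable_from(directory: Path, package: str) -> bool:
    """Report whether `package` is found with only `directory` on sys.path."""
    saved_path = sys.path[:]
    try:
        sys.path[:] = [str(directory)]
        importlib.invalidate_caches()
        return importlib.util.find_spec(package) is not None
    finally:
        sys.path[:] = saved_path
        importlib.invalidate_caches()


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    # Flat layout: the package sits at the repo root, next to setup.py.
    (repo / "flat_demo").mkdir()
    (repo / "flat_demo" / "__init__.py").write_text("")
    # src layout: the package is one level down, under src/.
    (repo / "src" / "nested_demo").mkdir(parents=True)
    (repo / "src" / "nested_demo" / "__init__.py").write_text("")

    flat_visible = importable_from(repo, "flat_demo")
    nested_visible = importable_from(repo, "nested_demo")
    nested_visible_via_src = importable_from(repo / "src", "nested_demo")

print(flat_visible, nested_visible, nested_visible_via_src)
```

The flat package is importable straight from the checkout, while the one under src/ is only found once src/ itself is on the path, which in practice means once it has been installed.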

    Finally, I suggested moving the Homebrew formula to the Formula/ directory because that's standard in Homebrew, and it makes clear that scanman.rb is a Homebrew formula, rather than another project script.


    There are also some things you did correctly in your repo that I want to call out!

    First, you only set your logging config in the main script, not in the library code. That's good, because you shouldn't be trying to modify global application state from inside the library code.

    It also looks like you're using pip-compile or pip freeze to emit the requirements.txt. That's a great idea. Keep doing that! Sadly I don't know how to get Homebrew to actually install dependencies from that file, as opposed to just using your declared package deps in setup.py with full dependency resolution. You might need to ask in the Homebrew forum for that. Ideally, Homebrew would run something like this:

    $VENV_DIR/bin/pip install -r requirements.txt ./
    

    But that would require it to clone your repo before installing, rather than simply installing directly from GitHub via Pip.