python, yaml, python-import

python import yaml - but which one?


Ok, so you clone a repo, there's an import

import yaml

Ok, so you do pip install yaml and you get:

ERROR: No matching distribution found for yaml

Ok, so you look for a package with yaml in it, and there's like a gazillion of them... usually adding py in front does the job, but...

How on earth should I know which one was used?!

And it's not just yaml, oh no... there's:

import cv2 # opencv-python

import PIL # Pillow

and the list goes on and on...

How can I know which import uses which package? Shouldn't there be a PEP for this? Or a naming convention, e.g. import is always the same as the package name?

There's a similar topic here, if you're not frustrated enough :)


Solution

  • [When I clone a repo,] How can I know which import uses which package?

    In short: it is the cloned code's responsibility to explain this, and it is an expected courtesy that the cloned code includes an installer that will take care of it.

    If this is just some random person's bundle of .py files on GitHub with no installation instructions, look for notes in the associated documentation; failing that, open an issue on the tracker. (Or just give up. Maybe look for a better-engineered project that does the same thing.)

    However, most "serious", contemporary Python projects are meant to be installed by using some form of packaging system. These have evolved over the years, and best practices have changed many times; but generally speaking, a properly "packaged" and "distributed" project will have either a setup.py or (newer; better in many ways, but not universally adopted yet) pyproject.toml file at the top level.

    A pyproject.toml file is a config file in TOML format that simply describes a bunch of project metadata. Building from it requires a build backend conforming to PEP 517. For a while, this required third-party tools, such as Poetry; but setuptools (the de facto standard, although not part of the standard library) can act as such a backend since version 40.8.0, and can read project metadata from the [project] table since version 61.0.0. (As of this writing, the current release is 65.7.0.)

    A setup.py script is executable code that pip will invoke after downloading a source distribution from PyPI (or another package index); pre-built wheels are installed without running it. Generally, this script will use either setuptools or distutils (the predecessor to setuptools; it was officially deprecated in 3.10 and removed in 3.12) to install the project, by calling a function named setup and passing it the project metadata as a long list of keyword arguments.

    Security warning: this file is still executable code. It is arbitrary code, and it doesn't have to follow the standard conventions. Also, the package that is actually downloaded from PyPI doesn't necessarily match the project's source shown on GitHub (or another Git hosting website), if such is even available. (This problem also affects package managers in other languages and ecosystems, notably npm for JavaScript.)

    With the setup.py based approach, package dependencies are specified using a keyword argument to the setup function. The specification has changed many times; currently, projects still using a setup.py should use the install_requires keyword argument.
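    Given the security warning above, one way to see what a setup.py declares is to parse it instead of running it. The following is a minimal sketch, assuming the dependencies are written as a literal list directly in the setup call; the function name find_install_requires and the example file contents are made up for illustration. If the list is computed at run time, this approach finds nothing.

```python
import ast


def find_install_requires(setup_py_source: str) -> list[str]:
    """Return the literal install_requires list from a setup() call, if any.

    The source is only parsed, never executed, so nothing in it can run.
    """
    tree = ast.parse(setup_py_source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        # Match both setup(...) and setuptools.setup(...).
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        for keyword in node.keywords:
            if keyword.arg == "install_requires":
                try:
                    return list(ast.literal_eval(keyword.value))
                except ValueError:
                    # The value is built at run time and cannot be read statically.
                    return []
    return []


# Hypothetical setup.py contents, invented for this example.
EXAMPLE_SETUP_PY = '''
from setuptools import setup

setup(
    name="example-project",
    version="0.1.0",
    install_requires=["PyYAML>=5.1", "opencv-python", "Pillow"],
)
'''

print(find_install_requires(EXAMPLE_SETUP_PY))
# → ['PyYAML>=5.1', 'opencv-python', 'Pillow']
```

    The names in that list are what you pass to pip install; they are PyPI names, not import names.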

    With the pyproject.toml based approach, using setuptools' backend, dependencies will be an array of requirement strings stored under project.dependencies (the location standardized by PEP 621). This can vary for other backends; for example, Poetry has traditionally expected this information under tool.poetry.dependencies.

    In any event, pip freeze will output a list of what's installed in the current environment. It's a somewhat common practice for developers to test the code in a virtual environment where the dependencies are installed, dump this output to a requirements.txt file, and include that as documentation.
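    To get a feel for what that output contains, a similar list can be built from inside Python with importlib.metadata. This is only an approximation of pip freeze, not a replacement: pip freeze leaves out a few packaging tools by default and treats editable and VCS installs specially, which this sketch does not. The function name frozen_requirements is made up.

```python
from importlib.metadata import distributions


def frozen_requirements() -> list[str]:
    """List installed distributions as name==version pins."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for line in frozen_requirements():
    print(line)
```

    Each line names a distribution as it is known on PyPI, which is exactly the name pip install expects.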

  • [When I want to use a third-party library in my own code,] How can I know which import uses which package?

    It's worth considering the question the other way around, too: given that we have installed OpenCV for Python using pip install opencv-python, and want to use it in our own code, how do we know to import cv2 specifically?

    The answer: there is no convention, and certainly no requirement for the name you import to match the PyPI name, nor the GitHub etc. repository name. Read the documentation. Everyone who intends for their code to be used as a library will be more than willing to show how, on at least a basic level.