
How to best find code implementations in existing python projects


Different people have told me that in order to improve my Python programming skills, it helps to go and look at how existing projects are implemented. But I am struggling a bit to navigate through the projects and find the parts of the code I'm interested in.

Let's say I'm using butter from the scipy.signal package, and I want to know how it is implemented, so I go to scipy's GitHub repo and open the signal folder. Now, where is the first place I should start looking for the implementation of butter?

I am also a bit confused about what a module/package/class/function is. Is scipy a module? Or a package? And then what is signal? Is there some kind of pattern like module.class.function? (Or another example: matplotlib.pyplot...)


Solution

  • It sounds like you have two questions here. First, how do you find where scipy.signal.butter is implemented? Second, what are the different hierarchical units of Python code (and how do they relate to that butter thing)?

    The first one actually has an easy solution. If you follow the link you gave for the butter function, you will see a [source] link just to the right of the function signature. Clicking on that will take you directly to the source of the function in the GitHub repository (pinned to the commit that matches the version of the docs you were reading, which is probably what you want). Not all API documentation will have that kind of link, but when it does, it makes things really easy!
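
    When the docs have no [source] link, you can also ask Python itself where something lives. The following is only a sketch using the standard library's inspect module; the helper name locate is made up for this example, and json.dumps stands in for butter so that the snippet runs without SciPy installed.

```python
import inspect
import json


def locate(obj):
    """Return (module name, source file, first line) for a Python-level object."""
    # inspect.unwrap follows decorator wrappers (via __wrapped__) back to the
    # original function; plain functions are returned unchanged.
    target = inspect.unwrap(obj)
    module_name = target.__module__
    source_file = inspect.getsourcefile(target)
    _, first_line = inspect.getsourcelines(target)
    return module_name, source_file, first_line


# json.dumps is used only because it ships with Python. With SciPy installed,
# the same call should work for scipy.signal.butter (an assumption, not run here).
module_name, source_file, first_line = locate(json.dumps)
print(module_name)   # dotted name of the module that defines the function
print(source_file)   # path to the .py file on disk
print(first_line)    # line number where the "def" starts
```

    This only works for code written in Python. Functions implemented in C or another compiled language have no .py source for inspect to find, so for those you are back to searching the repository.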

    As for the second question, I'm not going to fully explain each level, but here are some broad strokes, starting with the narrowest way of organizing code and moving to the broader ones.

    1. Functions are reusable chunks of code that you can call from other code. Functions have a local namespace when they are running.

    2. Classes are ways of organizing data together with one or more functions. Functions defined in classes are called methods (but not all functions need to be in a class). Classes have a class namespace, and each instance of a class also has its own instance namespace.

    3. Modules are groups of code, often functions or classes (but sometimes other stuff like data too). Each module has a global namespace. Generally speaking, each .py file will create a module when it is loaded. One module can access another module by using an import statement.

    4. Packages are a special kind of module that's defined by a folder foo/, rather than a foo.py file. This lets you organize whole groups of modules, rather than everything being at the same level. Packages can have further sub-packages (represented with nested folders like foo/bar/). In addition to the modules and subpackages that can be imported, a package will also have its own regular module namespace, which will be populated by running the foo/__init__.py file.
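
    To see these four levels on real objects, here is a small sketch that classifies things from the standard library's json package. The helper name kind is made up for this example; the one detail it relies on is that a package is a module object that also has a __path__ attribute.

```python
import inspect
import json
import json.decoder


def kind(obj):
    """Classify an object as package, module, class, function, or other."""
    if inspect.ismodule(obj):
        # Packages are modules that also carry a __path__ attribute.
        return "package" if hasattr(obj, "__path__") else "module"
    if inspect.isclass(obj):
        return "class"
    if inspect.isroutine(obj):
        return "function"
    return "other"


print(kind(json))                             # package
print(kind(json.decoder))                     # module
print(kind(json.decoder.JSONDecoder))         # class
print(kind(json.decoder.JSONDecoder.decode))  # function
```

    So a dotted name is not always module.class.function. Each dot is just an attribute lookup, and the thing on either side of it can be any of these kinds.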

    To bring this back around to your specific question, in your case, scipy is a top-level package, and scipy.signal is a sub-package within it. The name butter refers to a function, which is actually defined in the scipy/signal/_filter_design.py file. You can access it directly from scipy.signal because scipy/signal/__init__.py imports it (and all the other public names defined in that module) with from ._filter_design import * (see here).
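
    You can check where a re-exported name really comes from by looking at its __module__ attribute. This sketch uses the standard library's json package, which re-exports JSONDecoder from json/decoder.py in a similar way; I'd expect scipy.signal.butter to behave the same, but that part is an assumption and is not run here.

```python
import json
import sys

# JSONDecoder is reachable from the top-level package...
exported = json.JSONDecoder

# ...but __module__ records the module in which it was actually defined.
print(exported.__module__)   # json.decoder

# sys.modules maps that dotted name to the loaded module object,
# whose __file__ points at the .py file to open.
defining_module = sys.modules[exported.__module__]
print(defining_module.__file__)

# Both names refer to the very same object; nothing is copied by the import.
print(exported is defining_module.JSONDecoder)   # True
```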

    The design of implementing something in an inner module and then importing it in the package's __init__.py file is a pretty common one. It lets developers subdivide modules that would otherwise be excessively large, while still giving users a single place to access a big chunk of the API. It is, however, very confusing to work out on your own, so don't feel bad if you couldn't. Sometimes you may need to search the repository to find the definition of something, even if you know where you're importing it from.
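
    Here is a toy version of that pattern, as a rough sketch rather than anything taken from scipy. It writes a throwaway package to a temporary folder; the names demo_pkg, _impl, greet, and _helper are all invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()

    # The inner module holds the real implementation.
    (pkg_dir / "_impl.py").write_text(
        "__all__ = ['greet']\n"
        "\n"
        "def greet(name):\n"
        "    return f'hello, {name}'\n"
        "\n"
        "def _helper():\n"
        "    return 'not exported'\n"
    )

    # The package's __init__.py re-exports the inner module's public names.
    (pkg_dir / "__init__.py").write_text("from ._impl import *\n")

    # Make the temporary folder importable just long enough to load the package.
    sys.path.insert(0, tmp)
    try:
        demo_pkg = importlib.import_module("demo_pkg")
    finally:
        sys.path.remove(tmp)

    result = demo_pkg.greet("world")
    defined_in = demo_pkg.greet.__module__
    helper_exported = hasattr(demo_pkg, "_helper")

print(result)           # hello, world
print(defined_in)       # demo_pkg._impl
print(helper_exported)  # False
```

    Users call demo_pkg.greet without ever knowing that _impl exists, which is the same reason you can write scipy.signal.butter. The star import only copies the names listed in __all__ (or, if there is no __all__, the names that don't start with an underscore), so _helper stays private to the inner module.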