I have a shared Python code base, and I'm responsible for code that others depend on. I need to move modules from one subpackage to another directory/package to reorganize it. How do I do that in the safest way?
If I just move the code, I've got to worry about others who use it who may not have redirected their imports of it. If it's moved and the users of the code don't change their imports, their code will unexpectedly fail when the imports fail.
How can I ensure a seamless transition? Do I just copy the code and leave the old code in place until the imports have been changed? Are there any caveats to be aware of? What if I use import * in conjunction with an __all__? In what case would I have to support imports from the old location indefinitely?
I was recently asked to move code from one subpackage to another at work, and the approach I used didn't seem immediately obvious to other developers, so I'm documenting it here for others.
I don't recommend leaving a copy at the old place. If you have two copies of the same script, one is likely to change without the other. Instead, I recommend the following multi-step process.
The first step involves two parts that can be implemented simultaneously if you control both locations for the code.
First, move the file from the old place to the new place under version control. I use a simplified interface to CVS, so it was a version control copy. In most other version control systems (like Mercurial, Subversion, and Git), you should use mv to move the file, e.g. with git:
git mv /location/old/script.py /location/new/script.py
Important:
Don't forget to move the unit tests too, and also move any __init__.py files if there's code in the old ones that needs to be kept. Otherwise, just make sure to commit __init__.py files in the new location if they're not already in place.
Next, in the place of the old code, import all the names from the new location, so in /location/old/script.py:
from location.new.script import *
and leave a comment explaining why this is needed, and commit the change to version control. If you moved the old __init__.py, just make sure to commit a new empty __init__.py in its place so the old location remains a package.
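To make the shim concrete, here is a minimal sketch that builds a throwaway package in a temporary directory and checks that the old import path still resolves. The package name shim_demo and the function greet are made up for the demo (I avoided the name location so it can't clash with anything installed); only the one-line star import in the old file is the technique described above.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical layout, created in a temporary directory for the demo.
root = Path(tempfile.mkdtemp())
for pkg in ("shim_demo", "shim_demo/old", "shim_demo/new"):
    (root / pkg).mkdir()
    (root / pkg / "__init__.py").write_text("")

# The real code now lives in the new location.
(root / "shim_demo/new/script.py").write_text(textwrap.dedent('''
    def greet():
        return "hello from the new location"
'''))

# The old location keeps only a shim, with a comment saying why it exists.
(root / "shim_demo/old/script.py").write_text(textwrap.dedent('''
    # Moved to shim_demo.new.script. This shim keeps old imports working;
    # remove it only once nothing imports shim_demo.old.script any more.
    from shim_demo.new.script import *
'''))

sys.path.insert(0, str(root))
importlib.invalidate_caches()

old_script = importlib.import_module("shim_demo.old.script")
new_script = importlib.import_module("shim_demo.new.script")

# The same function object is reachable through both import paths.
print(old_script.greet is new_script.greet)
```

Because the old file only re-exports names, there is a single copy of the code to maintain.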
There's a major caveat here. import * is affected by __all__. If you have an __all__ declared, you've got two approaches to providing the missing names. You can import them all explicitly:
from location.new.script import *
# names not in the new.script's __all__:
from location.new.script import foo, bar, baz
or you can delete the file and instead import the module in the old package's __init__.py, and add the path to sys.modules like this:
from location.new import script
import sys
sys.modules['location.old.script'] = script
When the old package is initialized, this code adds the module to sys.modules just in time for it to be looked up there by the importer. This is the same way that os.path is created in the Python source. Most people would shy away from modifying sys.modules, however. In fact, I hesitate to leave this suggestion here, and I would not if the same technique were not used in the Python standard library.
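Here is a rough sketch of both the __all__ caveat and the sys.modules alias, again using a throwaway package in a temporary directory. The names alias_demo, foo, and bar are invented for the demo; the three lines written into the old package's __init__.py are the approach shown above.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical layout, created in a temporary directory for the demo.
root = Path(tempfile.mkdtemp())
for pkg in ("alias_demo", "alias_demo/old", "alias_demo/new"):
    (root / pkg).mkdir()
    (root / pkg / "__init__.py").write_text("")

(root / "alias_demo/new/script.py").write_text(textwrap.dedent('''
    __all__ = ["foo"]

    def foo():
        return "foo"

    def bar():  # not listed in __all__, so "import *" skips it
        return "bar"
'''))

# There is no alias_demo/old/script.py at all. The old package's
# __init__.py registers the new module under the old dotted name.
(root / "alias_demo/old/__init__.py").write_text(textwrap.dedent('''
    import sys
    from alias_demo.new import script
    sys.modules["alias_demo.old.script"] = script
'''))

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# What a star-import shim would have re-exported: foo, but not bar.
star_names = {}
exec("from alias_demo.new.script import *", star_names)
print("foo" in star_names, "bar" in star_names)

# With the alias, the old path is the very same module object,
# so every name is available, including those outside __all__.
import alias_demo.old.script
from alias_demo.old.script import bar

print(alias_demo.old.script is sys.modules["alias_demo.new.script"])
```

Since the old name points at the same module object, nothing is copied and nothing can drift out of sync.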
These two parts can together be pushed into production, and the move has been seamlessly implemented. If you have no control over users of your code, this may need to remain in place indefinitely for backwards compatibility.
Optional: I would then delete the old script at head (just at head, don't push it yet!) so that other developers can see the change coming and address the change in a timely fashion.
If you can do a regular expression search of all code that depends on your code, I recommend searching the code for the following regular expression:
(import|from).*location\.old.*script
If you are on Unix (or have Cygwin) you can do a regex search for it:
grep -rEe "(import|from).*location\.old.*script" .
Most IDEs also offer a regex search.
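If grep isn't available, the same search can be sketched in Python. This is only an illustration: the function name find_old_imports and the demo file names are made up, and the pattern is the one given above.

```python
import re
import tempfile
from pathlib import Path

# Same pattern as the grep command above.
OLD_IMPORT = re.compile(r"(import|from).*location\.old.*script")


def find_old_imports(root):
    """Yield (path, line number, stripped line) for every matching line."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if OLD_IMPORT.search(line):
                yield path, lineno, line.strip()


# Small demo tree with one file that still uses the old location.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "uses_old.py").write_text(
    "import os\nfrom location.old import script\n"
)
(demo_root / "uses_new.py").write_text("from location.new import script\n")

hits = list(find_old_imports(demo_root))
for path, lineno, line in hits:
    print(f"{path.name}:{lineno}: {line}")
```

Note that a textual search like this won't catch indirect forms such as from location import old, or imports built from strings at runtime, so treat an empty result as evidence rather than proof.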
If you do have control over the code that uses it, or you have visibility into others who use it, it's fairly straightforward to change the imports from the old to the new, e.g. from:
import location.old.script
to
import location.new.script
and from
from location.old import script
to
from location.new import script
And so on.
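For a large code base, the two rewrites above could be automated along these lines. This is a sketch, not a complete refactoring tool: rewrite_line is a hypothetical helper, it only handles the forms shown here, and anything it leaves unchanged (for example a line that imports several names from location.old at once) still needs a manual look.

```python
import re

# Fully qualified references: import statements and call sites alike.
QUALIFIED = re.compile(r"\blocation\.old\.script\b")

# "from location.old import script" with nothing else on the line.
FROM_IMPORT = re.compile(r"^(\s*from\s+)location\.old(\s+import\s+script\s*)$")


def rewrite_line(line):
    """Return the line with old-location references pointed at the new one."""
    line = QUALIFIED.sub("location.new.script", line)
    return FROM_IMPORT.sub(r"\1location.new\2", line)


examples = [
    "import location.old.script",
    "from location.old import script",
    "result = location.old.script.foo()",
    "from location.old import script, other",  # left alone on purpose
]
for original in examples:
    print(f"{original!r} -> {rewrite_line(original)!r}")
```

The qualified pattern also rewrites usages like location.old.script.foo(), because changing import location.old.script alone would leave those references pointing at a name that is no longer imported.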
Important:
All of these changes need to be implemented and released to production. If you delete the old location while any production installation still lacks them, that installation will fail.
Deleting the old location is the dangerous step. If you've missed any users/importers, their code will fail until they get their import fixes into production. You may choose to postpone this step indefinitely, but my preference is to get it done in a timely fashion if I can prove all of the changes have been pushed into production.
If you deleted it at head immediately after making the change so that others could see the change coming in development, you may have less to worry about.
Still, do not delete this until you can prove that no other users are referencing the old package location in production. If you can't prove it, don't delete it.