I have a `build_requirments_file.py` script that builds a `requirments.txt` file for a given Python program, but the thing is, it creates something like:

```
huggingface_hub==currently installed version
pynput==currently installed version
module==current version
```
But how will I know in which "range" of versions my code/program will work? For example, how do I generate something like:

```
huggingface_hub>=x.x.x
```

or

```
pynput<=x.x.x
```

or

```
pynput>x.x.x,<x.x.x
```

(the last one means greater than one version but less than another)
For that you usually need to install and test every version of every requirement... I don't want to test that much; you'd need a whole team for testing, but what if you're just a one-man army?
Unfortunately there is no nice and clean way to do this. Why is that the case? Every time a package gets an updated version, all that really means is that its source code is different in some way. The more of a package's surface area you use, the more sensitive you may be to version differences. To complicate things further, many packages depend on other packages, so even if your own code is not particularly version-sensitive, the functions you are knowingly or unknowingly invoking may be more sensitive to the versions of the packages they rely on than your code is. To answer your question, here are three ways to ascertain the version ranges:
1: RTFM; look at your code, see exactly which functions you use from each package, and look those functions up in the package's documentation. Most of the larger packages keep changelogs for each version, so you can usually see when the function you are using was introduced and whether its signature has changed at some point. This method is relatively fast but has two drawbacks: the package might rely on compiled code from a language you do not know, so you cannot confidently audit it; and not all packages have good changelogs or even documentation, so the presence or absence of information does not give you 100% confidence that something works.
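One complementary trick while doing this kind of reading: you can inspect the signature of a function in the *currently installed* version programmatically and diff its parameter names against what an older changelog documents. A minimal sketch using only the standard library; `json.dumps` here is just a stdlib stand-in for whatever third-party function you actually depend on:

```python
import inspect
import json

def describe_signature(func):
    """Return the parameter names of a callable, so you can compare
    them against what an older version's docs or changelog list."""
    return [p.name for p in inspect.signature(func).parameters.values()]

# example: list the first few parameters of a stdlib function
print(describe_signature(json.dumps)[:3])
```

If a parameter your code passes is missing from an older version's documented signature, that version is below your floor.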
2: Test it yourself; the only way to truly know which package versions are acceptable for your code is to test each one by hand. This takes a while, but I will share some tricks I use to speed it up:
i) create a text file where you dump the package versions of successful tests (e.g. `good_versions.txt`).
ii) create a new Python file (e.g. `version_range_test.py`) that follows a flow like:
```python
# version_range_test.py
# packages you need
import required_package_1
import required_package_2
...
import required_package_n

# import yours
import my_module

# run your module to see if it crashes with each version
result = my_module.my_main_func(...)

# record the expected resulting data structure into this file
expected_result = ...

# assert that the result and the expected result are identical
assert result == expected_result

# make a list of all of the packages that you need and their versions
package_version_list = [
    f'required_package_1=={required_package_1.__version__}',
    ...,
    f'required_package_n=={required_package_n.__version__}',
]

# append this package list to your file so earlier results survive
with open('good_versions.txt', 'a') as f:
    # add whatever formatting you want here
    f.write('\n'.join(package_version_list) + '\n')
```
Running the file this way first tests (a) that your code does not crash with the tested versions and (b) that the tested versions produce the results you want. You can test a bunch of input variations if you are concerned about edge cases. If either check fails, your code will crash before it is able to write out the good versions. If you want to automate the reviewing of this file, write out JSON data to a JSON file instead of plain text.
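As a sketch of that JSON variant (the package names, versions, and file name here are illustrative), each passing combination gets appended to a JSON list so a later script can parse it without a hand-rolled text parser:

```python
import json
import os

def record_good_versions(versions, path="good_versions.json"):
    """Append one passing {package: version} combination to a JSON list on disk."""
    results = []
    if os.path.exists(path):
        with open(path) as f:
            results = json.load(f)
    results.append(versions)
    with open(path, "w") as f:
        json.dump(results, f, indent=2)

# in version_range_test.py, call this only after the asserts have passed
record_good_versions({"required_package_1": "1.2.3", "required_package_n": "4.5.6"})
```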
iii) write a small bash command that you will copy and paste; this command does three things in the following order: uninstall the current package versions, install the new package versions, and run the `version_range_test.py` file. The pseudo-code will look something like

uninstall old versions:

```
/path/to/python.exe -m pip uninstall -y required_package_1; /path/to/python.exe -m pip uninstall -y required_package_n
```

install new versions (the version numbers here are the only thing you need to edit between tests):

```
/path/to/python.exe -m pip install required_package_1==#; /path/to/python.exe -m pip install required_package_n==#
```

run your code:

```
/path/to/python.exe /path/to/version_range_test.py
```

Putting this all together into a single line to edit:

```
/path/to/python.exe -m pip uninstall -y required_package_1; /path/to/python.exe -m pip uninstall -y required_package_n; /path/to/python.exe -m pip install required_package_1==#; /path/to/python.exe -m pip install required_package_n==#; /path/to/python.exe /path/to/version_range_test.py
```
You can either edit the version numbers by hand for every single test or write it all in a bash loop; if you use the loop, make sure it does not exit when your test file crashes.
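A sketch of such a loop; the interpreter path, package name, and version list are placeholders, and the pip subcommands are real. It is written as a dry run that prints the commands so you can sanity-check the sequence first; remove the leading `echo`s to execute for real, and add `|| continue` / `|| true` on the live lines so a failed install or a crashed test does not kill the loop:

```shell
#!/bin/sh
# dry-run sketch: prints each command instead of running it
PY=/path/to/python.exe
for version in 1.0.0 1.1.0 2.0.0; do
    echo "$PY" -m pip uninstall -y required_package_1
    # live version: ... || continue  (skip versions that fail to install)
    echo "$PY" -m pip install "required_package_1==${version}"
    # live version: ... || true  (a crashed test must not stop the loop)
    echo "$PY" version_range_test.py
done
```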
iv) After you have run all of your tests, review the `good_versions.txt` file. If it is a plain text file, just look for the tests that yielded the minimum and maximum versions. If you made it a JSON file, you can automate the search much more easily since it is already parseable; that automation can be done in a boilerplate Python file.
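A minimal sketch of that boilerplate, assuming `good_versions.json` holds a list of `{package: version}` dicts as recorded by the test file, and that versions are plain dotted integers (real-world version strings with suffixes like `rc1` would need the third-party `packaging` library instead of this crude key):

```python
import json

def version_key(v):
    """Crude sort key for dotted integer versions like '1.10.2';
    avoids the string-sort trap where '1.10' < '1.9'."""
    return tuple(int(part) for part in v.split("."))

def version_ranges(results):
    """Map each package to its (minimum, maximum) passing version."""
    ranges = {}
    for combo in results:
        for package, version in combo.items():
            lo, hi = ranges.get(package, (version, version))
            if version_key(version) < version_key(lo):
                lo = version
            if version_key(version) > version_key(hi):
                hi = version
            ranges[package] = (lo, hi)
    return ranges

# example with made-up data; in practice: results = json.load(open("good_versions.json"))
sample = [{"pkg_a": "1.2.0"}, {"pkg_a": "1.10.0"}, {"pkg_a": "0.9.1"}]
print(version_ranges(sample))
```

Note this only reports the extremes of the versions that passed; it cannot tell you about untested versions in between.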
3: Do both; use the RTFM approach to get a good idea of which versions to start testing around, then test only the versions you suspect are more sensitive in order to confirm. This reduces the testing time a lot.
Hope this helps!
Edit: Forgot to mention full-scale automation of this:
You can automate everything described above in either a Python or a bash file, if you see this becoming a regular occurrence. For this purpose it makes more sense to save the successful version tests in JSON format, since it is already parseable, rather than writing your own text parser. Here is the pseudo-code for what the Python file (e.g. `version_ranges_automation.py`) would look like:
```python
# version_ranges_automation.py ; example of how to automate all of the above
import itertools
import json
import subprocess

python_interpreters = ['/path/to/python_a.exe', '/path/to/python_b.exe', ...]
package_names = ['required_package_1', ..., 'required_package_n']
versions_lists = [['#1_1', '#1_2', ...], ..., ['#n_1', '#n_2', ...]]
test_filepath = '/path/to/version_range_test.py'
results_filepath = '/path/to/good_versions.json'

# concat the interpreters and package versions
all_combos = [python_interpreters] + versions_lists

# get a list of all combinations of interpreter and package versions
all_version_permutations = list(itertools.product(*all_combos))

# the uninstall string only depends on the interpreter; since the
# interpreter is the first element of each combo it is pulled out via x[0]
uninstall_str_func = lambda x: "; ".join(
    f"{x[0]} -m pip uninstall -y {i}" for i in package_names)

# initialize the install string as an anonymous function too; the package
# names and versions align, so only the versions are pulled in via x[1:]
install_str_func = lambda x: "; ".join(
    f"{x[0]} -m pip install {i}=={j}" for i, j in zip(package_names, x[1:]))

# lastly, an anonymous function to run your test file
run_test_str_func = lambda x: f"{x[0]} {test_filepath}"

# loop through all permutations
for combo in all_version_permutations:
    # construct the bash string for this specific loop iteration
    tmp_bash_str = "; ".join(f(combo) for f in [
        uninstall_str_func, install_str_func, run_test_str_func])
    # run the test; use a try/except statement just in case
    try:
        subprocess.call(tmp_bash_str, shell=True)
    except OSError:
        continue

# now that the good_versions.json file is complete it can be read in here.
# there is no single best way to format it or define minimum and maximum
# ranges across a wide array of versions of multiple packages and
# interpreters; feel free to use your own heuristics
with open(results_filepath) as f:
    results = json.load(f)
...
```