I need to link certain sample names to each other. The problem is that the sample names I want to match differ slightly from the keys of a dictionary from which I need to get the correct value.
Example:
sample = "foo_foo.bar.12"
matching_dict = {"foo_foo-bar-12": "foo.bar.12"}
I have about 5500 samples, each with a different type of arrangement, so not every sample looks like the example I gave.
Ideally I want a dynamic way of comparing the two strings and then getting the value from the dict for the key that is most similar to the sample.
You could use Levenshtein distance. It measures how different two strings are: the minimum number of single-character insertions, deletions, or substitutions needed to turn one into the other, so a lower distance means a closer match. There is an easy-to-use Python library for it called python-levenshtein. With it you could compare your sample to all the keys in the dictionary, find the key with the lowest Levenshtein distance, and then look up the value for that key.
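A minimal sketch of that idea follows. To keep it runnable without installing anything, it uses a small pure-Python edit-distance function instead of the library; if python-levenshtein is installed, its `Levenshtein.distance(a, b)` should work as a faster drop-in. The helper names `levenshtein` and `closest_key` and the second dictionary entry are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def closest_key(sample: str, mapping: dict) -> str:
    """Return the key of `mapping` with the smallest distance to `sample`."""
    return min(mapping, key=lambda key: levenshtein(sample, key))


# The first entry is from the question; the second is a hypothetical extra entry.
matching_dict = {
    "foo_foo-bar-12": "foo.bar.12",
    "baz_baz-qux-7": "baz.qux.7",
}
sample = "foo_foo.bar.12"

best = closest_key(sample, matching_dict)
print(best, "->", matching_dict[best])
```

Note that `min` always returns some key, even when nothing is really close. With 5500 samples it is probably worth also checking the distance of the best match against a threshold you choose and reviewing anything above it by hand.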