
Filtering two py2store stores with the same set of keys


In the following code, based on an example I found that uses py2store, I use with_key_filt to make two daccs (one with train data, the other with test data). I do get a filtered annots store, but the wfs store is not filtered. What am I doing wrong?

from py2store import cached_keys

class Dacc:
    """Waveform and annotation data access"""
    def __init__(self, wfs, annots, annot_to_tag=lambda x: x['tag']):
        self.wfs = wfs  # waveform store  (keys: filepaths, values: numpy arrays)
        self.annots = annots  # annotation store (keys: filepaths, values: dicts or pandas series)
        self.annot_to_tag = annot_to_tag  # function to compute a tag from an annotation item

    @classmethod
    def with_key_filt(cls, key_filt, wfs, annots, annot_to_tag, chunker):
        """
        Make an instance of the dacc class where the data is filtered.
        You could also filter externally, but this can be convenient.
        """
        filtered_annots = cached_keys(annots, keys_cache=key_filt)
        return cls(wfs, filtered_annots, annot_to_tag)

    def wf_tag_gen(self):
        """Generator of (wf, tag) tuples"""
        for k in self.annots:
            try:
                wf = self.wfs[k]
                annot = self.annots[k]
                yield wf, self.annot_to_tag(annot)
            except KeyError:
                pass
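To see the behavior in isolation, here is a stdlib-only sketch that needs no py2store. KeyFilteredMapping is a hypothetical stand-in for a key-filtered store (it is not how cached_keys is implemented), the Dacc below is a trimmed copy of the one above, and the file names and lists (standing in for numpy arrays) are made up:

```python
from collections.abc import Mapping


class KeyFilteredMapping(Mapping):
    """Hypothetical read-only view exposing only `keys` of `store`.

    A stand-in for a key-filtered store; this is not py2store's code.
    """

    def __init__(self, store, keys):
        self._store = store
        self._keys = [k for k in keys if k in store]  # keep order, drop unknown keys
        self._key_set = set(self._keys)

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)

    def __getitem__(self, k):
        if k not in self._key_set:  # filtered-out keys behave as missing
            raise KeyError(k)
        return self._store[k]


class Dacc:
    """Trimmed copy of the question's Dacc (py2store parts removed)."""

    def __init__(self, wfs, annots, annot_to_tag=lambda x: x['tag']):
        self.wfs = wfs
        self.annots = annots
        self.annot_to_tag = annot_to_tag

    def wf_tag_gen(self):
        for k in self.annots:  # iteration is driven by annots only
            try:
                yield self.wfs[k], self.annot_to_tag(self.annots[k])
            except KeyError:
                pass


# Hypothetical in-memory data; lists stand in for numpy arrays.
wfs = {'a.wav': [0, 1], 'b.wav': [2, 3], 'c.wav': [4, 5]}
annots = {'a.wav': {'tag': 'dog'}, 'b.wav': {'tag': 'cat'}, 'c.wav': {'tag': 'dog'}}

train = Dacc(wfs, KeyFilteredMapping(annots, ['a.wav', 'b.wav']))
print(sorted(train.annots))      # ['a.wav', 'b.wav']
print(sorted(train.wfs))         # ['a.wav', 'b.wav', 'c.wav'] (wfs untouched)
print(list(train.wf_tag_gen()))  # [([0, 1], 'dog'), ([2, 3], 'cat')]
```

Only annots is restricted: wfs still lists every key, yet wf_tag_gen yields only the filtered pairs because it iterates over annots.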

Solution

  • The intent of with_key_filt seems to be to filter annots, which is used as the seed of the wf_tag_gen generator (and probably of the other generators you didn't post). Since wf_tag_gen only looks up waveforms for keys it gets from self.annots, the (wf, tag) pairs it yields are already restricted to the filtered keys, even though self.wfs itself still exposes every key.

    That said, I agree with your expectation that wfs should be filtered as well. To achieve this, you only need to add one line that filters wfs with the same key_filt.

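    from py2store import cached_keys

    class TheDaccYouWant(Dacc):
        @classmethod
        def with_key_filt(cls, key_filt, wfs, annots, annot_to_tag, chunker):
            """Make a dacc where both the annotations and the waveforms are filtered."""
            filtered_annots = cached_keys(annots, keys_cache=key_filt)
            filtered_wfs = cached_keys(wfs, keys_cache=key_filt)  # the added line: filter wfs too
            # Dacc.__init__ has no chunker parameter, so chunker is not passed on
            return cls(filtered_wfs, filtered_annots, annot_to_tag)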
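If the two stores may not hold exactly the same keys (the try/except KeyError in wf_tag_gen suggests they might), filtering each one separately can leave them with different key sets. One way to guarantee a shared key set is to compute it once, as the intersection of the filtered keys, and build both views from it. Below is a minimal stdlib sketch; the filter_both helper and the KeyFilteredMapping view are hypothetical (neither is py2store API), and the train/ and test/ paths are made up:

```python
from collections.abc import Mapping


class KeyFilteredMapping(Mapping):
    """Hypothetical read-only view exposing only `keys` of `store`."""

    def __init__(self, store, keys):
        self._store = store
        self._keys = list(keys)
        self._key_set = set(self._keys)

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)

    def __getitem__(self, k):
        if k not in self._key_set:  # filtered-out keys behave as missing
            raise KeyError(k)
        return self._store[k]


def filter_both(key_pred, wfs, annots):
    """Restrict two stores to one shared key list (hypothetical helper).

    A key is kept only if key_pred(key) is true AND the key exists in both
    stores, so the two returned views always expose exactly the same keys.
    """
    shared = [k for k in annots if key_pred(k) and k in wfs]
    return KeyFilteredMapping(wfs, shared), KeyFilteredMapping(annots, shared)


# Hypothetical data: 'train/d.wav' is annotated but has no waveform.
wfs = {'train/a.wav': [0], 'train/b.wav': [1], 'test/c.wav': [2]}
annots = {'train/a.wav': {'tag': 'dog'}, 'train/d.wav': {'tag': 'cat'},
          'test/c.wav': {'tag': 'dog'}}

train_wfs, train_annots = filter_both(lambda k: k.startswith('train/'), wfs, annots)
test_wfs, test_annots = filter_both(lambda k: k.startswith('test/'), wfs, annots)
print(list(train_wfs), list(train_annots))  # ['train/a.wav'] ['train/a.wav']
print(list(test_wfs), list(test_annots))    # ['test/c.wav'] ['test/c.wav']
```

The resulting views can be passed straight to the dacc, e.g. Dacc(train_wfs, train_annots). The shared key list is materialized once up front, which gives up some laziness in exchange for the guarantee that both stores agree on their keys.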