python arrays list shingles

Fastest way to compare arrays


Extending from this question, I need the fastest possible solution to this:

Given the following:

m=['abc','bcd','cde','def']
r=[['abc','def'],['bcd','cde'],['abc','def','bcd']]

I'd like to edit these objects (or produce new objects) such that each element of m that appears fewer than 2 times across all the lists of r is removed from m and also from wherever it occurs in r.

So the result of the above would look like this:

['abc','bcd','def']

...because 'cde' is only found once in r.

Even better would be this:

[2, 2, 1, 2]

...or a count of how often each element of m occurs across the lists in r. Then, based on that count, I could edit the lists in r using the index of the output, if the value meets a certain criterion.

So, for example, remove the element at index i of m from each list in r if its count is <2 or >100.
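A minimal sketch of that filtering step, assuming the counts are already available as a list aligned with m and that every element in r also appears in m. The names `lo`, `hi`, `keep`, `m_filtered`, and `r_filtered` are hypothetical, and the bounds are taken from the example above.

```python
m = ['abc', 'bcd', 'cde', 'def']
r = [['abc', 'def'], ['bcd', 'cde'], ['abc', 'def', 'bcd']]
counts = [2, 2, 1, 2]  # frequency of each element of m across the lists in r

lo, hi = 2, 100  # hypothetical bounds: keep elements with lo <= count <= hi

# Elements of m whose count falls inside the bounds.
keep = {mx for mx, c in zip(m, counts) if lo <= c <= hi}

# Build new, filtered objects rather than mutating m and r in place.
m_filtered = [mx for mx in m if mx in keep]
r_filtered = [[x for x in rx if x in keep] for rx in r]

print(m_filtered)  # ['abc', 'bcd', 'def']
print(r_filtered)  # [['abc', 'def'], ['bcd'], ['abc', 'def', 'bcd']]
```

Using a set for `keep` makes each membership test constant time on average, which matters when the lists in r are long.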

There is a roundabout way to do this, but it is slower than molasses in January.

My starting point is that this:

[[1 if mx in rx else 0 for mx in m] for rx in map(set, r)]

will produce this:

[[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
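For reference, the per-element counts can be read off this matrix by summing its columns. The sketch below assumes the same m and r as above; `matrix` and `counts` are hypothetical names.

```python
m = ['abc', 'bcd', 'cde', 'def']
r = [['abc', 'def'], ['bcd', 'cde'], ['abc', 'def', 'bcd']]

# One row per list in r, one column per element of m.
matrix = [[1 if mx in rx else 0 for mx in m] for rx in map(set, r)]

# zip(*matrix) transposes rows into columns; each column sum is the
# number of lists in r that contain the corresponding element of m.
counts = [sum(col) for col in zip(*matrix)]

print(counts)  # [2, 2, 1, 2]
```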

Thanks in advance!


Solution

  • Here is a line to get the counts:

    print([sum(1 for _r in r if _m in _r) for _m in m])
    

    It gives the same result you wrote:

    [2, 2, 1, 2]
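The one-liner above scans every list in r once per element of m, and `_m in _r` on a list is itself a linear search. A possible alternative, offered as a sketch rather than a measured benchmark, is to make a single pass over r with `collections.Counter` and then look up each element of m. The names `seen` and `counts` are hypothetical.

```python
from collections import Counter

m = ['abc', 'bcd', 'cde', 'def']
r = [['abc', 'def'], ['bcd', 'cde'], ['abc', 'def', 'bcd']]

# Count each distinct element once per list; set() drops duplicates
# within a single list so the result is "number of lists containing x".
seen = Counter(x for rx in r for x in set(rx))

# Counter returns 0 for keys it has never seen, so elements of m that
# are absent from r get a count of 0 rather than raising KeyError.
counts = [seen[mx] for mx in m]

print(counts)  # [2, 2, 1, 2]
```

The work here is proportional to the total number of elements in r plus the length of m, rather than their product.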