python list boolean inclusion

How to create a binary list based on inclusion of list elements in another list


Given two lists of words, dictionary and sentence, I'm trying to create a binary representation such as [1,0,0,0,0,0,1,...,0], where a 1 in position i indicates that the ith word of dictionary appears in sentence.

What's the fastest way I can do this?

Example data:

dictionary =  ['aardvark', 'apple','eat','I','like','maize','man','to','zebra', 'zed']
sentence = ['I', 'like', 'to', 'eat', 'apples']
result = [0,0,1,1,1,0,0,1,0,0]

Is there something faster than the following, considering that I'm working with very large lists of approximately 56,000 elements each?

x = [int(i in sentence) for i in dictionary]

Solution

  • I would suggest something like this:

    words = {'hello', 'there'}  # keep the words to look up in a set
    sentance = ['hello', 'monkey', 'theres', 'there']
    # 1 if the word is in the set, 0 otherwise
    rep = [1 if w in words else 0 for w in sentance]
    print(rep)  # [1, 0, 0, 1]
    

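    I would take this approach because sets have O(1) average lookup time; that is, checking whether w is in words takes constant time regardless of how large the set is. By contrast, `in` on a list scans the list, so your original comprehension does O(n) work for every word. With a set, the list comprehension as a whole is O(n), since it visits each word once, plus the one-time cost of building the set. In your case, that means converting sentence to a set and iterating over dictionary. I believe this is close to as efficient as you will get.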

    You also mentioned creating a 'Boolean' array. If True/False values are acceptable instead of 1/0, you can simply write the following:

    rep = [w in words for w in sentance]
    print(rep)  # [True, False, False, True]
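
Applied to the example data from the question, the same idea can be sketched as follows. The function name `inclusion_vector` and the variable `sentence_words` are made up for illustration; the only assumption is that words are compared by exact string equality, so 'apple' does not match 'apples'.

```python
def inclusion_vector(dictionary, sentence):
    """Return one 0/1 flag per dictionary word: 1 if it occurs in sentence."""
    # Build the set once; each membership test is then O(1) on average,
    # instead of a linear scan of the sentence list per dictionary word.
    sentence_words = set(sentence)
    return [int(word in sentence_words) for word in dictionary]


dictionary = ['aardvark', 'apple', 'eat', 'I', 'like', 'maize', 'man', 'to', 'zebra', 'zed']
sentence = ['I', 'like', 'to', 'eat', 'apples']

result = inclusion_vector(dictionary, sentence)
print(result)  # [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
```

The set is built from sentence because that is the list being searched; dictionary is only iterated once, so it can stay a list and keep its order.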