
Convert a bit string (string of 1s and 0s) to a NumPy array


I have a pandas DataFrame with one column that contains strings of bits, e.g. '100100101'. I want to convert each such string into a NumPy array.

How can I do that?

EDIT:

Using

features = df.bit.apply(lambda x: np.array(list(map(int,list(x)))))
#...
model.fit(features, lables)

leads to an error on model.fit:

ValueError: setting an array element with a sequence.

The solution that works in my case, which I arrived at thanks to the accepted answer:

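featureList = []
for bitString in input_table['Bitstring'].values:
    # list() is needed on Python 3, where map() returns an iterator
    bits = np.array(list(map(int, bitString)))
    featureList.append(bits)
features = np.array(featureList)  # 2-D when all bit strings have the same length
#....
model.fit(features, lables)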
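
The ValueError above most likely arises because Series.apply returns a pandas Series whose elements are separate arrays (object dtype), not a single 2-D numeric array. Below is a minimal sketch of building that 2-D array directly, assuming every bit string has the same length. The DataFrame `df`, its column name `bit` (taken from the first snippet), and the sample values are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative data; the column name "bit" follows the question's first snippet.
df = pd.DataFrame({"bit": ["100100101", "000111000", "111111111"]})

# One 1-D integer array per row, stacked into an (n_rows, n_bits) matrix.
# np.stack raises ValueError if the strings differ in length.
features = np.stack([np.array(list(map(int, s))) for s in df["bit"]])

print(features.shape)  # (3, 9)
print(features[0])     # [1 0 0 1 0 0 1 0 1]
```

The resulting `features` is a plain numeric matrix with one row per bit string, which is the shape that fit-style estimator methods generally expect.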

Solution

  • For a string s = "100100101", you can convert it to a numpy array in at least two different ways.

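    The first uses numpy's frombuffer function (older code often uses np.fromstring for this, but its binary mode is deprecated). It is a bit awkward, because you have to specify the datatype and subtract out the "base" value of the elements.

    import numpy as np

    s = "100100101"
    # Read the ASCII bytes as unsigned 8-bit integers (48 for '0', 49 for '1'),
    # then subtract ord('0') == 48 to get 0s and 1s.
    a = np.frombuffer(s.encode('ascii'), dtype='u1') - ord('0')

    print(a)  # [1 0 0 1 0 0 1 0 1]


    Where 'u1' is the datatype (unsigned 8-bit integer) and ord('0') is the "base" value subtracted from each element.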

    The second way is to convert each character of the string to an integer (strings are iterable), then pass the resulting list to np.array:

    import numpy as np

    s = "100100101"
    # list() is needed on Python 3, where map() returns an iterator
    b = np.array(list(map(int, s)))

    print(b)  # [1 0 0 1 0 0 1 0 1]
    

    Then

    # To see it's a numpy array:
    print(type(a))  # <class 'numpy.ndarray'>
    print(a[0])     # 1
    print(a[1])     # 0
    # ...
    

    Note the second approach scales significantly worse than the first as the length of the input string s increases. For small strings, it's close, but consider the timeit results for strings of 90 characters (I just used s * 10):

    fromstring:  2.154540959 s
    map/array:  49.283392424 s
    

    (This is using the default timeit.repeat arguments, the minimum of 3 runs, each run computing the time to run 1M string->array conversions)
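
To reproduce a comparison like this yourself, here is a rough sketch using timeit.repeat. It uses np.frombuffer as a stand-in for the byte-based approach, since binary-mode np.fromstring is deprecated in current NumPy. The helper names `via_frombuffer` and `via_map` and the reduced iteration count are assumptions made for illustration. Absolute numbers will differ by machine and NumPy version.

```python
import timeit

import numpy as np

s = "100100101" * 10  # 90 characters, as in the timings above


def via_frombuffer(bits):
    # Reinterpret the ASCII bytes as uint8, then shift '0'/'1' (48/49) to 0/1.
    return np.frombuffer(bits.encode("ascii"), dtype="u1") - ord("0")


def via_map(bits):
    # Convert character by character in Python, then build the array.
    return np.array(list(map(int, bits)))


# Both approaches must agree before their timings are compared.
assert np.array_equal(via_frombuffer(s), via_map(s))

n = 10_000  # far fewer than the 1M conversions per run quoted above
t_buffer = min(timeit.repeat(lambda: via_frombuffer(s), number=n, repeat=3))
t_map = min(timeit.repeat(lambda: via_map(s), number=n, repeat=3))
print(f"frombuffer: {t_buffer:.4f} s")
print(f"map/array:  {t_map:.4f} s")
```

Taking the minimum over the repeats follows the convention described above: it is the run least disturbed by other activity on the machine.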