python, python-3.x, numpy, optimization, cbor

Optimizing CBOR reading functions to pass data into numpy


I'm trying to read image data in from a CBOR file into a Numpy array.

Ideally I'm looking for a more efficient way to convert the bytes from two's complement to unsigned and then read the image data into a NumPy array.

I experimented with a few different ways to convert and read the bytes but wasn't able to improve the speed by a significant margin.

Originally I was using a for loop to convert the bytes (1 below), then I switched to NumPy with modulo (2 below), and then moved to selective addition (3 below).

My full functions are below as well.

1) for x in data:
       new_byte = x % 256
2) ndarray % 256
3) image[image < 0] += 256
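The three approaches can be compared side by side on a small sample; `data` here is a hypothetical list of signed byte values, not the actual CBOR payload:

```python
import numpy as np

# Hypothetical sample of signed byte values (not the real CBOR payload)
data = [0, 1, 127, -128, -1, -2]

# 1) Pure-Python loop with modulo
loop_result = [x % 256 for x in data]

# 2) Vectorized modulo on an ndarray
mod_result = np.array(data) % 256

# 3) Selective addition, in place
image = np.array(data)
image[image < 0] += 256

# All three yield the same unsigned values: [0, 1, 127, 128, 255, 254]
```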
import os
from cbor2 import dumps, loads, decoder
import numpy as np
import itertools

def decode_image_bytes(image_byte_array):
    """Input: 1-D list of 16 bit two's compliment bytes 
        Operations: Converts the bytes to unsigned and decodes them
        Output: a 1-D array of 16-bit image data"""
    # Convert input to numpy array
    image = np.array(image_byte_array)
    # Convert two's complement bytes to unsigned
    image[image<0] += 256
    # Split the unsigned bytes into segments
    bytes_array = np.array_split(image, (len(image) / 2))
    holder = list()
    # Convert segments into integer values
    for x in bytes_array:
        holder.append(int.from_bytes(list(x), byteorder='big', signed=False))
    return holder

def decode_image_metadata(image_dimensions_bytes_array):
    """Input: 1-D list of sint64 two's complement bytes
        Operations: Converts bytes to unsigned and decodes them
        Output: Dictionary with possible values: 'width, height, channels, Z, time'"""
    # Convert input to numpy array
    dimensions = np.array(image_dimensions_bytes_array)
    # Convert two's complement bytes to unsigned
    dimensions[dimensions<0] += 256
    # Split the unsigned bytes into segments
    bytes_array = np.array_split(dimensions, (len(dimensions) / 8))
    # Convert the segments into integer values
    for x in range(0, len(bytes_array)):
        bytes_array[x]=int.from_bytes(list(bytes_array[x]), byteorder='big', signed=True)
    # Put the converted integer values into a dictionary
    end = dict(itertools.zip_longest(['width', 'height', 'channels', 'Z', 'time'], bytes_array, fillvalue=None))
    return end

Right now it takes 20-30 seconds to convert the bytes and return the NumPy array. I'd like to cut that in half if possible.

So far I've come up with using np.apply_along_axis to eliminate the for loops. Is there a better method?

bytes_array = np.apply_along_axis(metadata_values, 1, bytes_array)

def metadata_values(element):
    return int.from_bytes(element, byteorder='big', signed=True)
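As a runnable sketch of that idea (the sample `bytes_array` below is hypothetical), note that `np.apply_along_axis` still invokes a Python function once per row, so it removes the explicit loop but not the per-call overhead:

```python
import numpy as np

def metadata_values(element):
    # element is one row of unsigned byte values (0-255)
    return int.from_bytes(bytes(element), byteorder='big', signed=True)

# Hypothetical input: two sint64 values, 8 big-endian bytes each
bytes_array = np.array([[0, 0, 0, 0, 0, 0, 0, 5],
                        [255, 255, 255, 255, 255, 255, 255, 255]],
                       dtype=np.uint8)

result = np.apply_along_axis(metadata_values, 1, bytes_array)
# result decodes to [5, -1]
```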

Solution

  • Unless you're doing it for your own edification, you shouldn't write your own conversion between binary number representations; a hand-rolled conversion will be orders of magnitude slower than NumPy's built-in machinery.

    Here is an example of reading bytes into a numpy array of various formats:

    >>> b = bytes([0,1,127,128,255,254]) #equivalent to reading bytes from a file in binary mode
    >>> np.frombuffer(b, dtype=np.uint8)
    array([  0,   1, 127, 128, 255, 254], dtype=uint8) #notice the *U*int vs int
    >>> np.frombuffer(b, dtype=np.int8)
    array([   0,    1,  127, -128,   -1,   -2], dtype=int8)
    >>> #you can also specify other than 1 byte data formats as long as you have the right amount of bytes
    >>> np.frombuffer(b, dtype=np.int16)
    array([   256, -32641,   -257], dtype=int16)
    >>> np.frombuffer(b, dtype=np.uint16)
    array([  256, 32895, 65279], dtype=uint16)
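    Applying this to the question's functions, both loops can be replaced by a dtype reinterpretation with `ndarray.view`, which recombines the bytes without any per-element Python work. This is a sketch, assuming the inputs are flat lists of signed byte values as in the question:

    ```python
    import itertools
    import numpy as np

    def decode_image_bytes(image_byte_array):
        """Reinterpret pairs of two's-complement bytes as big-endian uint16."""
        raw = np.asarray(image_byte_array, dtype=np.int8)  # keep the raw byte pattern
        return raw.view('>u2')  # no copy: each byte pair becomes one 16-bit value

    def decode_image_metadata(image_dimensions_bytes_array):
        """Reinterpret groups of 8 bytes as big-endian sint64 dimension values."""
        raw = np.asarray(image_dimensions_bytes_array, dtype=np.int8)
        values = raw.view('>i8')  # signed 64-bit big-endian
        return dict(itertools.zip_longest(
            ['width', 'height', 'channels', 'Z', 'time'], values, fillvalue=None))
    ```

    Because `view` only reinterprets the existing buffer, the explicit two's-complement fix-up, `np.array_split`, and `int.from_bytes` loop all disappear; the input length must be a multiple of the target itemsize (2 or 8 bytes) for the view to succeed.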