
Tensorflow: Creating a TensorFlow dataset using multi-dimensional input data with differing length. (Video Data)


The problem I am having is part of my fourth-year university project, which aims to translate sign language. At the moment the input data is a NumPy array of shape [n_videos]; each video in it is a NumPy array of shape [n_frames, n_hands=2, n_hand_landmarks=21, n_points(x,y,z)=3].

The output data is simply an array of words; for example, a given video tensor could be mapped to the phrase "<start> are you finished <end>".

The issue I am having is that when I try to create the dataset, I get the following error:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-56-bf28891dc793> in <module>
     16 print(target_tensor_train.shape)
     17 
---> 18 dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)
     19 dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py in from_tensor_slices(tensors, name)
    779       Dataset: A `Dataset`.
    780     """
--> 781     return TensorSliceDataset(tensors, name=name)
    782 
    783   class _GeneratorState(object):

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py in __init__(self, element, is_files, name)
   4659   def __init__(self, element, is_files=False, name=None):
   4660     """See `Dataset.from_tensor_slices()` for details."""
-> 4661     element = structure.normalize_element(element)
   4662     batched_spec = structure.type_spec_from_value(element)
   4663     self._tensors = structure.to_batched_tensor_list(batched_spec, element)

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/data/util/structure.py in normalize_element(element, element_signature)
    127           dtype = getattr(spec, "dtype", None)
    128           normalized_components.append(
--> 129               ops.convert_to_tensor(t, name="component_%d" % i, dtype=dtype))
    130   return nest.pack_sequence_as(pack_as, normalized_components)
    131 

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py in wrapped(*args, **kwargs)
    161         with Trace(trace_name, **trace_kwargs):
    162           return func(*args, **kwargs)
--> 163       return func(*args, **kwargs)
    164 
    165     return wrapped

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
   1619 
   1620     if ret is None:
-> 1621       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1622 
   1623     if ret is NotImplemented:

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/tensor_conversion_registry.py in _default_conversion_function(***failed resolving arguments***)
     50 def _default_conversion_function(value, dtype, name, as_ref):
     51   del as_ref  # Unused.
---> 52   return constant_op.constant(value, dtype, name=name)
     53 
     54 

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
    269     ValueError: if called on a symbolic tensor.
    270   """
--> 271   return _constant_impl(value, dtype, shape, name, verify_shape=False,
    272                         allow_broadcast=True)
    273 

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    281       with trace.Trace("tf.constant"):
    282         return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 283     return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    284 
    285   g = ops.get_default_graph()

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    306 def _constant_eager_impl(ctx, value, dtype, shape, verify_shape):
    307   """Creates a constant on the current device."""
--> 308   t = convert_to_eager_tensor(value, ctx, dtype)
    309   if shape is None:
    310     return t

/opt/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
    104       dtype = dtypes.as_dtype(dtype).as_datatype_enum
    105   ctx.ensure_initialized()
--> 106   return ops.EagerTensor(value, ctx.device_name, dtype)
    107 
    108 

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

The code I am using is adapted from Chapter 18 of Machine Learning with TensorFlow, Second Edition (Manning). I am using TensorFlow 2.

My code is shown below to demonstrate the shape of the data.

all_data = np.load('people_data_1.0.npz', allow_pickle=True)
phrases = all_data['Phrases']
input_data = all_data['Data']

print(input_data.shape)
print([item.shape for item in input_data])

(20,)

[(43, 2, 21, 3), (75, 2, 21, 3), (56, 2, 21, 3), (45, 2, 21, 3), (77, 2, 21, 3), (81, 2, 21, 3), (93, 2, 21, 3), (76, 2, 21, 3), (71, 2, 21, 3), (69, 2, 21, 3), (63, 2, 21, 3), (73, 2, 21, 3), (76, 2, 21, 3), (98, 2, 21, 3), (101, 2, 21, 3), (47, 2, 21, 3), (67, 2, 21, 3), (46, 2, 21, 3), (48, 2, 21, 3), (74, 2, 21, 3)]

After the output data is tokenized and loaded, it looks as follows:

[[ 1  4  3 13  2  0  0]
 [ 1  4  3 14 15  2  0]
 [ 1  4  3 11  2  0  0]
 [ 1  4  3  7  2  0  0]
 [ 1  4  3  8  2  0  0]
 [ 1  4  3  9  2  0  0]
 [ 1  5  6 10  3  2  0]
 [ 1  5  6 12  2  0  0]
 [ 1 16  3 17 18 19  2]
 [ 1 20 21  2  0  0  0]
 [ 1  4  3 11  2  0  0]
 [ 1  4  3  7  2  0  0]
 [ 1  4  3  8  2  0  0]
 [ 1  4  3  9  2  0  0]
 [ 1  5  6 10  3  2  0]
 [ 1  4  3  7  2  0  0]
 [ 1  4  3  8  2  0  0]
 [ 1  4  3  9  2  0  0]
 [ 1  5  6 10  3  2  0]
 [ 1  5  6 12  2  0  0]] 

i.e. 

Target Language; index to word mapping
1 ----> <start>
4 ----> are
3 ----> you
7 ----> ill
2 ----> <end>

Then, when I check the shape and data type of my input and output data, they look as shown below:

[print(i.shape, i.dtype) for i in input_data]
[print(o.shape, o.dtype) for o in target_tensor]

(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(1,) object
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32
(7,) int32

The error occurs in the following code:


    BUFFER_SIZE = len(input_tensor_train)
    BATCH_SIZE = 5
    
    dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)
    dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)

I suspect it has something to do with the input being a list of differently sized NumPy arrays. I considered padding the video data with zeros at the end, as is done for the words, but felt this would increase the size of my data quite drastically, so I was curious whether there is another way to solve this issue.

Any help with this, or a pointer towards another method for dealing with this kind of input and output data, would be greatly appreciated.

Thanks, William.


Solution

  • To create a dataset of videos of different lengths, I suggest something like this:

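    import tensorflow as tf
    
    # Stand-in file names; replace with real video paths.
    file_names = [str(i) for i in range(20)]
    
    def dummy_read_file(name):
        # Placeholder reader: returns a random "video" with 10 to 39 frames.
        length = tf.random.uniform(shape=[], minval=10, maxval=40, dtype=tf.int32)
        return tf.random.normal(shape=[length, 2, 21, 3])
    
    dataset = tf.data.Dataset.from_tensor_slices(file_names)
    dataset = dataset.map(lambda file_name: {"file_name": file_name, "video": dummy_read_file(file_name)})
    # Pad every video in a batch to the longest one in that batch.
    dataset = dataset.padded_batch(4)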
    
    for batch in dataset.as_numpy_iterator():
        print(batch["video"].shape)
    
    # (4, 28, 2, 21, 3)
    # (4, 24, 2, 21, 3)
    # (4, 27, 2, 21, 3)
    # (4, 23, 2, 21, 3)
    # (4, 26, 2, 21, 3)
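
For arrays that are already in memory, as in the question, the same per-batch padding can be sketched with tf.data.Dataset.from_generator. This is only an illustration under stated assumptions: the videos and targets below are random stand-ins with the shapes from the question, generate_pairs is a hypothetical helper name, and output_signature assumes TensorFlow 2.4 or later.

```python
import numpy as np
import tensorflow as tf

# Stand-in data (assumption): six videos with different frame counts and
# six already-padded token sequences of length 7, mimicking the question.
rng = np.random.default_rng(0)
frame_counts = [43, 75, 56, 45, 77, 81]
videos = [rng.normal(size=(n, 2, 21, 3)).astype(np.float32) for n in frame_counts]
targets = np.zeros((len(videos), 7), dtype=np.int32)

def generate_pairs():
    # Yield one (video, target) pair at a time, so no rectangular array is needed.
    for video, target in zip(videos, targets):
        yield video, target

dataset = tf.data.Dataset.from_generator(
    generate_pairs,
    output_signature=(
        tf.TensorSpec(shape=(None, 2, 21, 3), dtype=tf.float32),  # n_frames varies
        tf.TensorSpec(shape=(7,), dtype=tf.int32),
    ),
)
# Each batch is padded with zeros only up to its own longest video.
dataset = dataset.shuffle(len(videos)).padded_batch(3, drop_remainder=True)

batch_shapes = [(v.shape, t.shape) for v, t in dataset.as_numpy_iterator()]
print(batch_shapes)
```

Because padding happens per batch rather than over the whole dataset, the zeros are only materialized for the batch currently being consumed.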
    

    To make batches of videos with similar lengths for better performance, replace dataset = dataset.padded_batch(4) as follows:

    ...
    dataset = dataset.apply(tf.data.experimental.bucket_by_sequence_length(
        element_length_func=lambda sample: tf.shape(sample["video"])[0], 
        bucket_boundaries=[20, 30], 
        bucket_batch_sizes=[5, 4, 3], 
    ))
    ...
    
    for batch in dataset.as_numpy_iterator():
        print(batch["video"].shape)
    
    # (4, 27, 2, 21, 3)
    # (5, 16, 2, 21, 3)
    # (5, 19, 2, 21, 3)
    # (4, 26, 2, 21, 3)
    # (2, 11, 2, 21, 3)
    

    Or use the tf.data.Dataset.bucket_by_sequence_length method in recent TensorFlow versions.
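
A minimal sketch of the method form, assuming a TensorFlow version in which bucket_by_sequence_length is available directly on tf.data.Dataset. It reuses the dummy reader from above, so the frame counts are random and the batch shapes differ from run to run.

```python
import tensorflow as tf

def dummy_read_file(name):
    # Placeholder reader: returns a random "video" with 10 to 39 frames.
    length = tf.random.uniform(shape=[], minval=10, maxval=40, dtype=tf.int32)
    return tf.random.normal(shape=[length, 2, 21, 3])

file_names = [str(i) for i in range(20)]
dataset = tf.data.Dataset.from_tensor_slices(file_names)
dataset = dataset.map(
    lambda file_name: {"file_name": file_name, "video": dummy_read_file(file_name)}
)

# Same arguments as the experimental transformation, called as a method.
# Buckets: fewer than 20 frames, 20 to 29 frames, 30 frames or more.
dataset = dataset.bucket_by_sequence_length(
    element_length_func=lambda sample: tf.shape(sample["video"])[0],
    bucket_boundaries=[20, 30],
    bucket_batch_sizes=[5, 4, 3],
)

shapes = [batch["video"].shape for batch in dataset.as_numpy_iterator()]
for shape in shapes:
    print(shape)
```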

    You can also try tf.RaggedTensor, but I cannot recommend it. It may be unstable for very big tensors, such as an entire video dataset, and it is practically useless for batches.

    For further optimization, precompute the video lengths and do the bucketing before actually loading the files.
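
One way to read that last suggestion is to plan the batches from the frame counts alone and only then load the files batch by batch. The sketch below is an assumption about how that could look, not part of the original answer: it sorts by length and cuts the order into chunks instead of using fixed bucket boundaries, plan_batches is a hypothetical helper, and the frame counts are the ones printed in the question.

```python
import numpy as np

# Frame counts taken from the shapes printed in the question.
video_lengths = [43, 75, 56, 45, 77, 81, 93, 76, 71, 69,
                 63, 73, 76, 98, 101, 47, 67, 46, 48, 74]

def plan_batches(lengths, batch_size):
    """Group video indices so that each batch holds videos of similar length."""
    order = np.argsort(lengths, kind="stable")  # indices from shortest to longest
    return [order[i:i + batch_size].tolist() for i in range(0, len(order), batch_size)]

batches = plan_batches(video_lengths, batch_size=5)

# Frames of zero padding per batch when each video is padded to the longest
# video in its own batch.
padding = [
    sum(max(video_lengths[i] for i in batch) - video_lengths[i] for i in batch)
    for batch in batches
]
# Padding needed if every video were padded to the longest video overall.
global_padding = sum(max(video_lengths) - n for n in video_lengths)

print(batches)
print(sum(padding), "frames of padding versus", global_padding)
```

The resulting index lists can then drive whichever loader reads the actual files, so only the videos of one batch need to be in memory at a time.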