I am asking a question very similar to "TensorFlow broadcasting of RaggedTensor".
Basically, I am doing machine learning, and one data sample consists of a list of lists of coordinates, where each list of coordinates represents a drawing stroke on a canvas. One such sample might be:
[
[[2,3], [2,4], [3,6], [4,8]],
[[7,3], [10,9]],
[[10,12], [14,17], [13,15]]
]
I would like to normalize these coordinates by subtracting the mean and dividing by the standard deviation. Specifically, I want to find the mean and standard deviation of all the x-coordinates (index=0) and of all the y-coordinates (index=1), respectively. I got these values with:
list_points = tf.ragged.constant(list_points)
STD = tf.math.reduce_std(list_points, axis=(0, 1))
mean = tf.reduce_mean(list_points, axis=(0, 1))
STD and mean both have shape (2,).
Now I want to subtract the mean from list_points (the sample list of lists of coordinates), but it seems that with ragged_rank=3 I can only subtract a scalar or a tensor that covers every single data point. Is there an easy way to subtract a Tensor of shape (2,) from the RaggedTensor?
I have tried to simply subtract mean from list_points directly, but whatever I do, I get this error:
ValueError: pylist has scalar values depth 3, but ragged_rank=3 requires scalar value depth greater than 3
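A quick way to see why ragged_rank=3 is rejected is to inspect what tf.ragged.constant infers for this sample. A minimal sketch (the variable names are only illustrative): the scalars sit three lists deep, so the ragged rank can be at most 2, and since only the number of points per stroke varies, ragged_rank=1 is enough.

```python
import tensorflow as tf

# One sample: three strokes with 4, 2 and 3 points; every point is [x, y].
sample = [
    [[2, 3], [2, 4], [3, 6], [4, 8]],
    [[7, 3], [10, 9]],
    [[10, 12], [14, 17], [13, 15]],
]

# Default inference: ragged_rank = nesting depth - 1 = 2, both inner dims ragged.
default_rt = tf.ragged.constant(sample)
print(default_rt.ragged_rank, default_rt.shape)

# Only the stroke lengths vary, so ragged_rank=1 is enough and the
# coordinate dimension becomes a uniform dimension of size 2.
rank1_rt = tf.ragged.constant(sample, ragged_rank=1)
print(rank1_rt.ragged_rank, rank1_rt.shape)

# ragged_rank=3 would need scalars nested at least four lists deep.
try:
    tf.ragged.constant(sample, ragged_rank=3)
except ValueError as err:
    print("ValueError:", err)
```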
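In your case, the ragged rank is in fact 1: only the number of points per stroke varies, while every point has exactly two coordinates. Passing ragged_rank=1 to tf.ragged.constant makes this explicit, and then tf.reduce_mean can be used as follows: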
import tensorflow as tf

list_points = tf.ragged.constant(
[
[[2,3], [2,4], [3,6], [4,8]],
[[7,3], [10,9]],
[[10,12], [14,17], [13,15]]
],
ragged_rank=1,
dtype=tf.float32
)
list_points.shape
# TensorShape([3, None, 2])
mean = tf.reduce_mean(list_points, axis=[0, 1])
# shape=(2,), dtype=float32, approx. [7.2222, 8.5556]
# Population variance E[x^2] - E[x]^2; the standard deviation is its square root
std = tf.sqrt(tf.reduce_mean(list_points**2, axis=[0, 1]) - mean**2)
# shape=(2,), dtype=float32, approx. [4.4417, 4.8788] (variance approx. [19.7284, 23.8025])
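An ordinary tensor can be subtracted from (or added to, multiplied with, etc.) a ragged tensor of ragged rank 1 as long as it broadcasts against the uniform inner dimension. Here mean has shape (2,), which matches the last dimension of list_points: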
list_points - mean
# <tf.RaggedTensor [[[-5.2222223, -5.5555553],
#                    [-5.2222223, -4.5555553],
#                    [-4.2222223, -2.5555553],
#                    [-3.2222223, -0.55555534]],
#                   [[-0.22222233, -5.5555553],
#                    [2.7777777, 0.44444466]],
#                   [[2.7777777, 3.4444447],
#                    [6.7777777, 8.444445],
#                    [5.7777777, 6.4444447]]]>
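The question's end goal is full normalization, and dividing by the standard deviation broadcasts the same way as the subtraction. A minimal sketch, assuming the ragged_rank=1 tensor above and taking std as the square root of the population variance (dividing by the number of points, as tf.math.reduce_std does):

```python
import tensorflow as tf

list_points = tf.ragged.constant(
    [
        [[2, 3], [2, 4], [3, 6], [4, 8]],
        [[7, 3], [10, 9]],
        [[10, 12], [14, 17], [13, 15]],
    ],
    ragged_rank=1,
    dtype=tf.float32,
)

# Per-coordinate statistics over every point of the sample, shape (2,).
mean = tf.reduce_mean(list_points, axis=[0, 1])
variance = tf.reduce_mean(tf.square(list_points), axis=[0, 1]) - tf.square(mean)
std = tf.sqrt(variance)

# Both (2,) tensors broadcast over the uniform last dimension,
# so the result keeps the stroke structure [3, None, 2].
normalized = (list_points - mean) / std
print(normalized)
```

After this, each coordinate column of normalized has mean close to 0 and standard deviation close to 1 across all strokes.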
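Subtracting mean works because a ragged tensor of ragged rank 1 is backed by an ordinary dense tensor holding all the points of all the strokes, so the (2,)-shaped mean broadcasts against it just as it would against a dense [9, 2] tensor: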
list_points.values.shape
# TensorShape([9, 2])
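Since list_points.values is a plain [9, 2] tensor, the statistics can also be computed on it with ordinary dense ops and the result reattached to the original row partition with RaggedTensor.with_values. This is a sketch of an alternative route, not the method used above; it relies on tf.math.reduce_std, which, like the formula above, divides by N:

```python
import tensorflow as tf

list_points = tf.ragged.constant(
    [
        [[2, 3], [2, 4], [3, 6], [4, 8]],
        [[7, 3], [10, 9]],
        [[10, 12], [14, 17], [13, 15]],
    ],
    ragged_rank=1,
    dtype=tf.float32,
)

# Dense [9, 2] tensor with every point of every stroke.
points = list_points.values
mean = tf.reduce_mean(points, axis=0)     # column means, shape (2,)
std = tf.math.reduce_std(points, axis=0)  # population std, shape (2,)

# Normalize the dense values, then reuse list_points' row splits.
normalized = list_points.with_values((points - mean) / std)
```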
For the ragged_rank > 1 case, one can resort to tf.math.segment_mean, but that is trickier.
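The ragged_rank > 1 case is not spelled out here. One possible sketch, assuming the default ragged_rank=2 tensor and that every innermost list stores x at position 0 and y at position 1: give each scalar a coordinate id equal to its position inside its point, then average per id. It swaps in tf.math.unsorted_segment_mean instead of tf.math.segment_mean, because the ids 0, 1, 0, 1, ... are not sorted and segment_mean requires sorted segment ids:

```python
import tensorflow as tf

# Without ragged_rank=1, tf.ragged.constant infers ragged_rank=2 here,
# so the coordinate dimension is ragged as well.
rt = tf.ragged.constant(
    [
        [[2, 3], [2, 4], [3, 6], [4, 8]],
        [[7, 3], [10, 9]],
        [[10, 12], [14, 17], [13, 15]],
    ],
    dtype=tf.float32,
)

flat = rt.flat_values  # all 18 scalar coordinates, shape [18]
points = rt.values     # one row per point, ragged rank 1

# Assumption: position inside each point is the coordinate id (0 = x, 1 = y).
coord_idx = tf.ragged.range(points.row_lengths()).flat_values
num_coords = tf.reduce_max(coord_idx) + 1

# Unsorted variant, since the ids alternate 0, 1, 0, 1, ...
mean = tf.math.unsorted_segment_mean(flat, coord_idx, num_coords)
centered = flat - tf.gather(mean, coord_idx)
std = tf.sqrt(tf.math.unsorted_segment_mean(tf.square(centered), coord_idx, num_coords))

# Rebuild a tensor with the same nested row partition as rt.
normalized = rt.with_flat_values(centered / tf.gather(std, coord_idx))
```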