This is on TensorFlow 1.11.0. The documentation of tft.apply_buckets
is not very descriptive. Specifically, it says:
"bucket_boundaries: The bucket boundaries represented as a rank 2 Tensor."
Does this mean each row has to hold a bucket index and its bucket boundary?
When I try that with the toy example below:
import tensorflow as tf
import tensorflow_transform as tft
import numpy as np

tf.enable_eager_execution()

x = np.array([-1, 9, 19, 29, 39])
xt = tf.cast(
    tf.convert_to_tensor(x),
    tf.float32
)
boundaries = tf.cast(
    tf.transpose(
        tf.convert_to_tensor([[0, 1, 2, 3], [10, 20, 30, 40]])
    ),
    tf.float32
)
buckets = tft.apply_buckets(xt, boundaries)
I get:
InvalidArgumentError: Expected sorted boundaries [Op:BucketizeWithInputBoundaries] name: assign_buckets
Note that in this case the x and bucket_boundaries arguments are:
tf.Tensor([-1. 9. 19. 29. 39.], shape=(5,), dtype=float32)
tf.Tensor(
[[ 0. 10.]
 [ 1. 20.]
 [ 2. 30.]
 [ 3. 40.]], shape=(4, 2), dtype=float32)
So it seems that bucket_boundaries is not supposed to hold indices and boundaries. Does anyone know how to use this method properly?
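After some playing around, I found that bucket_boundaries is supposed to be a 2-dimensional array in which every entry is a bucket boundary, wrapped so that it has two columns. Read row by row, the values have to be in ascending order, which would explain the "Expected sorted boundaries" error in the question. See the example below: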
import tensorflow as tf
import tensorflow_transform as tft
import numpy as np

tf.enable_eager_execution()

x = np.array([-1, 9, 19, 29, 39])
xt = tf.cast(
    tf.convert_to_tensor(x),
    tf.float32
)
# Every entry is a boundary; row by row they read 0, 10, 20, ..., 70.
boundaries = tf.cast(
    tf.transpose(
        tf.convert_to_tensor([[0, 20, 40, 60], [10, 30, 50, 70]])
    ),
    tf.float32
)
buckets = tft.apply_buckets(xt, boundaries)
So, the inputs and the resulting bucket assignments are:
print(xt)
print(buckets)
print(boundaries)
tf.Tensor([-1. 9. 19. 29. 39.], shape=(5,), dtype=float32)
tf.Tensor([0 1 2 3 4], shape=(5,), dtype=int64)
tf.Tensor(
[[ 0. 10.]
 [20. 30.]
 [40. 50.]
 [60. 70.]], shape=(4, 2), dtype=float32)
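To see why one layout raises "Expected sorted boundaries" and the other does not, here is a small NumPy-only sketch. It rests on an assumption I have not verified against the tft source: that the rank 2 tensor is read row by row (row-major order) before the sortedness check. The helper names flattened_boundaries and is_sorted are made up for this illustration, and np.digitize is only a stand-in for the bucketize op, not the actual tft implementation.

```python
import numpy as np


def flattened_boundaries(boundaries_2d):
    """Flatten a rank 2 boundary array row by row (row-major order)."""
    return np.asarray(boundaries_2d, dtype=np.float32).ravel()


def is_sorted(values):
    """Return True if the values are in non-decreasing order."""
    return bool(np.all(values[:-1] <= values[1:]))


x = np.array([-1, 9, 19, 29, 39], dtype=np.float32)

# Layout from the question: (index, boundary) pairs in each row.
failing = np.array([[0, 1, 2, 3], [10, 20, 30, 40]]).T
# Layout from the answer: every entry is a boundary.
working = np.array([[0, 20, 40, 60], [10, 30, 50, 70]]).T

failing_flat = flattened_boundaries(failing)  # 0, 10, 1, 20, 2, 30, 3, 40
working_flat = flattened_boundaries(working)  # 0, 10, 20, 30, 40, 50, 60, 70

print(is_sorted(failing_flat))
print(is_sorted(working_flat))

# Assumed stand-in for bucketization: each value gets the number of
# boundaries that are less than or equal to it.
buckets = np.digitize(x, working_flat)
print(buckets)
```

Under that assumption, the question's layout flattens to an unsorted sequence, while the working layout flattens to 0, 10, ..., 70, and the stand-in bucketization reproduces the bucket ids printed above.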