I use TF.js to run a keypoint prediction model on an input image in the browser. I'd like to apply an affine transformation to the value of every keypoint using TF.js and the webgl backend.
For the value of every keypoint, I'd like to apply translation, scale, and rotation.
Input
As a result of model prediction, I have a tensor with the shape [n, coord], where coord is the [x, y] position of the keypoint in pixels.
My tensor
inputTensor.print();
> Tensor
[[103.9713821, 128.1083069], // <- [x, y]
[103.7512436, 107.0477371],
[103.3587036, 115.1293793],
[99.65448 , 92.0794601 ],
[103.9862061, 101.7136688],
[104.2239304, 95.8158569 ],
[104.6783295, 82.7580566 ]]
Formula
I see that tf.image.transform uses the following formula to compute the pixel position.
(x', y') = ((a0 x + a1 y + a2) / k, (b0 x + b1 y + b2) / k)
where k = c0 x + c1 y + 1.
I have values for [a0, a1, a2, b0, b1, b2, c0, c1], so it seems I only need a way to apply this formula to every (x, y) pair in my tensor.
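One way to apply this formula to every row at once is with a couple of matrix multiplications, which stay on the GPU under the webgl backend. The following is a minimal sketch, not a definitive implementation: the function name `transformKeypoints` is hypothetical, and it assumes the keypoints are stored one per row (as in the printed tensor) and that the coefficients come in the order [a0, a1, a2, b0, b1, b2, c0, c1].

```typescript
import * as tf from '@tensorflow/tfjs';

// Hypothetical helper: applies
//   (x', y') = ((a0 x + a1 y + a2) / k, (b0 x + b1 y + b2) / k), k = c0 x + c1 y + 1
// to every row of an [n, 2] tensor of (x, y) values.
function transformKeypoints(points: tf.Tensor2D, coeffs: number[]): tf.Tensor2D {
  return tf.tidy(() => {
    const [a0, a1, a2, b0, b1, b2, c0, c1] = coeffs;
    // Linear part, one row per output coordinate.
    const linear = tf.tensor2d([[a0, a1], [b0, b1]]);
    // Constant offsets, broadcast over all n rows.
    const offset = tf.tensor2d([[a2, b2]]);
    // Column vector that produces the divisor k.
    const denom = tf.tensor2d([[c0], [c1]]);

    // Points are row vectors, so multiply by the transpose: [n, 2] x [2, 2]^T.
    const numerator = tf.add(tf.matMul(points, linear, false, true), offset);
    // [n, 2] x [2, 1] -> [n, 1], then add the constant 1.
    const k = tf.add(tf.matMul(points, denom), 1);
    // Broadcasting divides both columns of each row by that row's k.
    return tf.div(numerator, k) as tf.Tensor2D;
  });
}
```

For a purely affine transform, c0 and c1 are 0, so k is 1 and the division has no effect. Calling `dataSync()` only once on the final result, rather than per point, avoids repeated GPU-to-CPU transfers.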
CPU Example (I need it on TF.js)
I've tried to do the transformation on the CPU using THREE.js. It works but is too slow. I hope it gives you an idea of what I expect.
const landmarks: Float32Array = inputTensor.dataSync();
const output: Point3D[] = [];
for (let i = 0; i < landmarks.length - 1; i += 2) {
  const x = landmarks[i];
  const y = landmarks[i + 1];
  const mat4 = new Matrix4();
  mat4.identity();
  // Fill in with the basic values
  mat4.multiply(new Matrix4().makeTranslation(x, y, 0));
  // Scale
  mat4.multiply(new Matrix4().makeScale(1 / scaleX, 1 / scaleY, 1));
  // Rotate
  mat4.multiply(new Matrix4().makeRotationZ(rotate));
  // Translate
  mat4.multiply(new Matrix4().makeTranslation(translateX, translateY, 0));
  const p = new Vector3(x, y, 0).applyMatrix4(mat4);
  output.push(new Point3D(p.x, p.y, p.z));
}
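Because the scale, rotation, and translation are the same for every point, they can be folded into the eight coefficients once instead of rebuilding matrices inside the loop. The sketch below shows one way to do that in plain TypeScript. The names `AffineParams`, `affineCoefficients`, and `applyToPoint` are hypothetical, and the composition order (scale, then rotate, then translate) is an assumption that may differ from the THREE.js chain above, so adjust it to match the intended transform.

```typescript
// Hypothetical parameter bundle; field names mirror the variables in the loop above.
interface AffineParams {
  scaleX: number;
  scaleY: number;
  rotate: number; // radians, counter-clockwise
  translateX: number;
  translateY: number;
}

// Returns [a0, a1, a2, b0, b1, b2, c0, c1] for p' = R * S * p + t.
// Assumed order: scale first, then rotate, then translate.
function affineCoefficients(p: AffineParams): number[] {
  const cos = Math.cos(p.rotate);
  const sin = Math.sin(p.rotate);
  return [
    p.scaleX * cos, -p.scaleY * sin, p.translateX,
    p.scaleX * sin, p.scaleY * cos, p.translateY,
    0, 0, // c0 = c1 = 0 for an affine (non-projective) transform
  ];
}

// Applies the coefficient formula to a single (x, y) value.
function applyToPoint(coeffs: number[], x: number, y: number): [number, number] {
  const [a0, a1, a2, b0, b1, b2, c0, c1] = coeffs;
  const k = c0 * x + c1 * y + 1;
  return [(a0 * x + a1 * y + a2) / k, (b0 * x + b1 * y + b2) / k];
}
```

To reproduce the inverse scaling used in the loop, pass `1 / scaleX` and `1 / scaleY` as the scale fields. The coefficients only need to be recomputed when the parameters change, not for every point.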
Note
As far as I can see, tf.image.transform doesn't work for me, since it operates on the position of each element, but I need to operate on the value.
It is easy, but multiplying large matrices for every single point costs processing time. You can apply the transformation only when something changes, at a reduced refresh rate, or within a limited scope. Comparing against the identity matrix is a faster way to determine how much of the input picture has changed. You are going the right way.
[ Example ]:
y1 = tf.keras.layers.Cropping2D(cropping=((start_y, pic_height - box_height - start_y), (start, pic_width - box_width - start)))(picture)
target_1 = tf.keras.layers.Cropping2D(cropping=((previous_start_y, pic_height - box_height - previous_start_y), (previous_start, pic_width - box_width - previous_start)))(char_1)
temp_3 = tf.where(tf.math.greater_equal( np.asarray(y1, dtype=np.float32), np.asarray(target_1, dtype=np.float32)), [1.0], [0.0]).numpy()
temp_3 = tf.math.multiply( temp_3, y1, name=None )