# How to find a transformation matrix given the measurements from two coordinate systems?

##### Points on the Monitor Relative to the Camera

I am working on a task where a customer's gaze direction is calculated to determine whether they looked at the monitor or outside of it. I drew the following to get an understanding of what needs to be done:

The picture depicts the following (measurements in mm):

- The black rectangle (530 × 942) is the monitor.
- A person is standing 500 from the monitor, with eyes 1675 above the ground.
- The blue mark, located 50 above the top center of the monitor, is the camera.
- P1, P2 and P3 are the points the person looks at.
- The distance from the camera to the ground is 2000.
- d = 537.04 is the distance from the eye to P3, calculated with the Pythagorean theorem: $\sqrt{500^{2} + 196^{2}}$.
- Similarly, the distance from the eye to P1 (and to P2) is 761.89, calculated with the Pythagorean theorem: $\sqrt{540.43^{2} + 537.04^{2}}$.
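The two Pythagorean distances can be reproduced in a few lines of Python. This is only a numerical check of the figures above; the value 540.43 is taken as given from the question:

```python
import math

# All values in millimetres, taken from the measurements listed above.
d_eye_p3 = math.hypot(500, 196)          # eye to P3 (monitor centre)
d_eye_p1 = math.hypot(540.43, d_eye_p3)  # eye to P1 (same for P2); 540.43 as given

print(f"d(eye, P3) = {d_eye_p3:.2f}")  # d(eye, P3) = 537.04
print(f"d(eye, P1) = {d_eye_p1:.2f}")  # d(eye, P1) = 761.89
```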

So far, I have manually converted these distances into X, Y, Z coordinates relative to the camera. They are as follows:

The eye coordinates relative to the camera are: **Eye=(0,−325,−596.34)**

P1 (Top-left of the monitor):

Horizontal offset from the center: −265 mm

Vertical offset from the camera: −50 mm

Depth: 0 mm (since it's on the same plane as the camera)

Coordinates:
**P1=(−265,−50,0)**

P2 (Top-right of the monitor):

Horizontal offset from the center: 265 mm

Vertical offset from the camera: −50 mm

Depth: 0 mm

Coordinates:
**P2=(265,−50,0)**

P3 (Center of the monitor):

Horizontal offset: 0 mm

Vertical offset from the camera: −521 mm

Depth: 0 mm

Coordinates:
**P3=(0,−521,0)**
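These coordinates follow directly from the drawing's dimensions. A minimal sketch of that bookkeeping, assuming the camera is the origin, x points right, y points up, and the 942 mm side is the monitor's height (which is what the −521 mm value for P3 implies). The constant names are hypothetical, and the eye's depth (−596.34) is not derived here:

```python
# Camera at the origin; x to the right, y up; all values in mm.
MONITOR_WIDTH = 530
MONITOR_HEIGHT = 942
CAMERA_ABOVE_TOP = 50   # camera sits 50 mm above the monitor's top centre
CAMERA_HEIGHT = 2000    # camera to ground
EYE_HEIGHT = 1675       # eyes to ground

half_width = MONITOR_WIDTH / 2
P1 = (-half_width, -CAMERA_ABOVE_TOP, 0.0)                 # top-left corner
P2 = (half_width, -CAMERA_ABOVE_TOP, 0.0)                  # top-right corner
P3 = (0.0, -(CAMERA_ABOVE_TOP + MONITOR_HEIGHT / 2), 0.0)  # monitor centre
eye_y = EYE_HEIGHT - CAMERA_HEIGHT                         # eye height relative to camera

print(P1, P2, P3, eye_y)
```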

Thus, I derived the following:

Eye to P1:

Vector=P1−Eye=(−265,−50−(−325),0−(−596.34))=(−265,275,596.34)

Eye to P2:

Vector=P2−Eye=(265,−50−(−325),0−(−596.34))=(265,275,596.34)

Eye to P3:

Vector=P3−Eye=(0,−521−(−325),0−(−596.34))=(0,−196,596.34)
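The same subtraction (target minus eye) as a small Python sketch. The helper names `sub` and `normalize` are hypothetical, and the sketch additionally scales each vector to a unit gaze direction:

```python
import math

EYE = (0.0, -325.0, -596.34)
TARGETS = {
    "P1": (-265.0, -50.0, 0.0),
    "P2": (265.0, -50.0, 0.0),
    "P3": (0.0, -521.0, 0.0),
}

def sub(a, b):
    """Component-wise a - b."""
    return tuple(x - y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

# Gaze vector = target position minus eye position, both relative to the camera.
gaze = {name: sub(p, EYE) for name, p in TARGETS.items()}
for name, v in gaze.items():
    print(name, v, [round(c, 4) for c in normalize(v)])
```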

Now, **I would like to know whether I have computed the gaze directions (from the person's eye to P1, P2 and P3, in the camera's frame) correctly, according to the following method, which states**:

Please note that although the 3D gaze (gaze_dir) is defined as a difference between target's and subject's positions (target_pos3d - person_eyes3d) each of them is expressed in different coordinate system, i.e.

`gaze_dir = M * (target_pos3d - person_eyes3d)`

where `M` depends on a normal direction between eyes and the camera.

Also, **how do I calculate the transformation matrix M if need be**?

Solution

According to the corresponding paper, P. Kellnhofer et al. (2019), *Gaze360: Physically Unconstrained Gaze Estimation in the Wild*, ICCV (PDF file size ~17MB), page 4, section *Gaze Direction*, the gaze vector is converted from the camera coordinate system (called "ladybug" there) L = [L_{x}, L_{y}, L_{z}] into the eye coordinate system E = [E_{x}, E_{y}, E_{z}] as follows.

Gaze vector g_{L} in ladybug coordinates:

$$g_{L} = p_{t} - p_{e}$$

where p_{t} is the target cross point, and p_{e} the eye point, relative to the camera.

The eye coordinate system E has its origin at p_{e}. The basis vector E_{z} lies along the same line as g_{L}, but it doesn't point from p_{e} to p_{t}; it points "backwards" from p_{e}. That's why E_{z} is the negated g_{L}. It sounds unintuitive, but it matches the usual view-space convention (as in OpenGL) in which the viewer looks along −z, so view depth corresponds to negative z-values in eye coordinates. Additionally, we normalize E_{z} by dividing it by its length.

$$E_{z} = -\frac{g_{L}}{\lVert g_{L} \rVert}$$
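For the eye-to-P1 vector from the question, E_{z} can be checked numerically with a minimal plain-Python sketch (the helper `normalize` is hypothetical):

```python
import math

def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

g_L = (-265.0, 275.0, 596.34)            # eye -> P1 in camera (ladybug) coordinates
E_z = normalize(tuple(-c for c in g_L))  # negated and normalized
print([round(c, 4) for c in E_z])        # [0.3742, -0.3883, -0.8421]
```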

The other basis vectors E_{x} and E_{y} have to be orthogonal to E_{z}. According to the paper, E_{x} involves no roll relative to the camera, i.e. no rotation about the viewing axis; with L_{y} taken as the camera's up axis, this means E_{x} is orthogonal to L_{y}. To construct it, we temporarily use L_{y} as a provisional "up" vector in place of the still unknown E_{y}. The true E_{y} usually differs from L_{y}, because we mostly don't gaze along a horizontal line but at a target point above or below eye level. That tilt, however, is a rotation about E_{x}, so it doesn't change E_{x}.

The cross product of two vectors is orthogonal to both of them, so we use it to calculate E_{x}, normalizing the result:

$$E_{x} = \frac{L_{y} \times E_{z}}{\lVert L_{y} \times E_{z} \rVert}$$
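Continuing the P1 example, the cross product and normalization look like this. L_{y} = [0, 1, 0] is an assumed camera up axis, the same assumption the answer makes for its P1 example; `cross` and `normalize` are hypothetical helpers:

```python
import math

def cross(a, b):
    """Right-handed cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

L_y = (0.0, 1.0, 0.0)                      # assumed camera "up" axis
E_z = normalize((265.0, -275.0, -596.34))  # -g_L for P1
E_x = normalize(cross(L_y, E_z))           # horizontal, orthogonal to L_y and E_z
print([round(c, 4) for c in E_x])          # [-0.9138, 0.0, -0.4061]
```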

Note that E_{x} is now orthogonal to the eye's YZ-plane, and E_{z} is orthogonal to the XY-plane by definition, because it's our anchor vector. The only remaining step is to calculate the actual E_{y}, orthogonal to the XZ-plane. This accounts for the tilt (a rotation about the now known E_{x}) relative to the camera's up axis when looking somewhere; the tilt angle is 0 when the gaze is horizontal, i.e. has no L_{y} component. Again we use the cross product. No normalization is needed, because the cross product of two orthogonal unit vectors is itself a unit vector.

$$E_{y} = E_{z} \times E_{x}$$
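Finishing the P1 example, E_{y} follows from one more cross product, and a dot-product check confirms that the three axes are mutually orthogonal unit vectors. This is a sketch under the same assumed L_{y} = [0, 1, 0], with hypothetical helper names:

```python
import math

def cross(a, b):
    """Right-handed cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

L_y = (0.0, 1.0, 0.0)                      # assumed camera "up" axis
E_z = normalize((265.0, -275.0, -596.34))  # -g_L for P1
E_x = normalize(cross(L_y, E_z))
E_y = cross(E_z, E_x)                      # unit length because E_z and E_x are orthonormal

print([round(c, 4) for c in E_y])          # [0.1577, 0.9215, -0.3549]

# Orthonormality check: dot(a, b) is 1 on the diagonal and 0 elsewhere.
axes = (E_x, E_y, E_z)
for i, a in enumerate(axes):
    for j, b in enumerate(axes):
        assert abs(dot(a, b) - (1.0 if i == j else 0.0)) < 1e-12
```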

The gaze vector in eye coordinates, g_{E}, is then obtained, as in the paper, by applying a view transformation to the normalized g_{L}:

$$g_{E} = E \, \frac{g_{L}}{\lVert g_{L} \rVert}$$

Here E is nothing other than the view transformation matrix M, with E_{x}, E_{y}, E_{z} as its rows from top to bottom (i.e. the transpose of the matrix that has them as columns), so each component of g_{E} is the projection of the normalized g_{L} onto one eye axis.

When the subject looks directly at the camera, i.e. p_{t} = [0, 0, 0], g_{E} is guaranteed to be [0, 0, −1].

For P_{1}, after all the calculations above and arbitrarily assuming L_{y} = [0, 1, 0] (in practice, use the true L_{y} according to the camera's orientation), one gets:

E_{z}= [0.3742, -0.3883, -0.8421] E_{x}= [-0.9138, 0, -0.4061] E_{y}= [0.1577, 0.9215, -0.3549]

i.e. the view transform matrix

```
    | -0.9138   0        -0.4061 |
M = |  0.1577   0.9215   -0.3549 |
    |  0.3742  -0.3883   -0.8421 |
```
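Putting the steps together, here is a minimal, self-contained Python sketch that builds M for a given eye and target and applies it as g_E = M g_L / ||g_L||, with E_x, E_y, E_z as the rows of M. The function names are hypothetical, and `up` stands in for the camera's true L_y, assumed to be [0, 1, 0] here:

```python
import math

def sub(a, b):
    """Component-wise a - b."""
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Right-handed cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    """Scale v to unit length (fails if v is the zero vector)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def view_matrix(eye, target, up=(0.0, 1.0, 0.0)):
    """Return M as a tuple of rows (E_x, E_y, E_z); `up` stands in for L_y."""
    g_L = sub(target, eye)
    E_z = normalize(tuple(-c for c in g_L))  # points backwards from the eye
    E_x = normalize(cross(up, E_z))          # horizontal axis, no roll
    E_y = cross(E_z, E_x)                    # unit length by construction
    return (E_x, E_y, E_z)

def mat_vec(M, v):
    """Multiply a row-major 3x3 matrix by a 3-vector."""
    return tuple(sum(m * c for m, c in zip(row, v)) for row in M)

EYE = (0.0, -325.0, -596.34)  # eye relative to the camera, from the question
P1 = (-265.0, -50.0, 0.0)     # top-left monitor corner
CAMERA = (0.0, 0.0, 0.0)

M = view_matrix(EYE, P1)
for row in M:
    print([round(c, 4) for c in row])

# Subject looking straight at the camera: g_E should come out as [0, 0, -1].
M_cam = view_matrix(EYE, CAMERA)
g_E = mat_vec(M_cam, normalize(sub(CAMERA, EYE)))
print([round(c, 6) for c in g_E])
```

This construction breaks down if g_L is parallel to the up axis: the cross product is then the zero vector and `normalize` divides by zero, so a real implementation would need to guard against a gaze pointing straight along L_y.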
