realitykit coordinate-systems visionos immersive-space

Don't understand visionOS immersive space coordinate systems

I am testing coordinate transforms in a visionOS app that uses an immersive space.
In that space, a box is displayed that is not at the origin. The box can be tapped, and the returned Point3D is printed out.

The spatial tap gesture was tested in three variants:
first with the default coordinate space (SpatialTapGesture()), then with .global, and finally with .local, which is the variant shown in the code below.

This is my ImmersiveView:

import SwiftUI
import RealityKit

struct ImmersiveView: View {

    var body: some View {
        RealityView { content in
            // 1 m x 0.5 m x 0.25 m box; splitFaces: true gives each face its own part so materials can differ per face
            let mesh = MeshResource.generateBox(width: 1, height: 0.5, depth: 0.25, splitFaces: true)

            var frontMaterial = UnlitMaterial()
            frontMaterial.color.tint = .green
            var topMaterial = UnlitMaterial()
            topMaterial.color.tint = .red

            let boxEntity = ModelEntity(mesh: mesh, materials: [frontMaterial, topMaterial])
            // An input target and a collision shape are both needed for the entity to receive taps
            boxEntity.components.set(InputTargetComponent(allowedInputTypes: .all))
            boxEntity.components[CollisionComponent.self] = CollisionComponent(shapes: [ShapeResource.generateConvex(from: mesh)])
            // Move the box 3 m along the -z axis, away from the origin
            boxEntity.transform.translation = [0, 0, -3]

            content.add(boxEntity)
        }
        .gesture(tapGesture)
    }

    var tapGesture: some Gesture {
        // Replace .local with .global or .immersiveSpace, or drop the argument, to get the other logs
        SpatialTapGesture(coordinateSpace: .local)
            .targetedToAnyEntity()
            .onEnded { event in
                let point3D = event.location3D
                print(point3D)
            }
    }

}

The test app runs in the visionOS simulator.

EDIT:
I have since learned that there is a third coordinate space. The coordinateSpace parameter of SpatialTapGesture conforms to CoordinateSpaceProtocol (documentation), and besides global and local, CoordinateSpaceProtocol also provides a static var immersiveSpace (documentation). If the spatial tap gesture is initialized with .immersiveSpace, I get yet another set of tap coordinates, listed in the logs below.
This makes me even more confused about how to interpret these Point3D values.

There should be a simple way to get the point on the box entity that was tapped.
End of edit.
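To reproduce the four logs below without editing the gesture each time, the question's gesture can be parameterized by its coordinate space. This is a minimal sketch: loggingTapGesture is a hypothetical helper, and it assumes the SpatialTapGesture initializer that accepts any CoordinateSpaceProtocol value. If I read the SwiftUI interface correctly, that initializer's coordinateSpace parameter defaults to .local.

```swift
import SwiftUI
import RealityKit

// Hypothetical helper (not part of the original post): the question's tap gesture,
// parameterized by coordinate space so that each variant is printed with a label.
func loggingTapGesture(
    in space: some CoordinateSpaceProtocol,
    label: String
) -> some Gesture {
    SpatialTapGesture(coordinateSpace: space)
        .targetedToAnyEntity()
        .onEnded { event in
            // location3D is expressed in `space`, in SwiftUI points.
            print(label, event.location3D)
        }
}

// Usage inside ImmersiveView, one variant at a time:
// .gesture(loggingTapGesture(in: .local, label: "local"))
// .gesture(loggingTapGesture(in: .global, label: "global"))
// .gesture(loggingTapGesture(in: .immersiveSpace, label: "immersiveSpace"))
```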

Here are the logs:

SpatialTapGesture()

Front top left: (x: -9.3138427734375, y: 338.8812255859375, z: -3270.0)
Front top right: (x: 1285.9720458984375, y: 333.781982421875, z: -3270.0)
Front bottom right: (x: 1287.6568603515625, y: 950.25439453125, z: -3270.0)
Front bottom left: (x: -0.68157958984375, y: 954.317626953125, z: -3270.0)

Top rear left: (x: 2.19390869140625, y: 300.0, z: -3529.07763671875)
Top rear right: (x: 1285.1815185546875, y: 300.0, z: -3525.16357421875)
Top front right: (x: 1287.944580078125, y: 300.0, z: -3315.341064453125)
Top front left: (x: 2.83245849609375, y: 300.0, z: -3314.37890625)

SpatialTapGesture(coordinateSpace: .global)

Front top left: (x: -11.9898681640625, y: 332.7701416015625, z: -3910.0)
Front top right: (x: 1290.196044921875, y: 329.361572265625, z: -3910.0)
Front bottom right: (x: 1291.71435546875, y: 951.95458984375, z: -3910.0)
Front bottom left: (x: -9.7513427734375, y: 948.22998046875, z: -3910.0)

Top rear left: (x: -6.54705810546875, y: 300.0, z: -4166.34814453125)
Top rear right: (x: 1283.194091796875, y: 300.0, z: -4176.96533203125)
Top front right: (x: 1295.661376953125, y: 300.0, z: -3959.49169921875)
Top front left: (x: -1.35040283203125, y: 300.0, z: -3963.63916015625)

SpatialTapGesture(coordinateSpace: .local)

Front top left: (x: -12.3238525390625, y: 326.27392578125, z: -3270.0)
Front top right: (x: 1298.88818359375, y: 329.40087890625, z: -3270.0)
Front bottom right: (x: 1288.92919921875, y: 952.740234375, z: -3270.0)
Front bottom left: (x: -1.74945068359375, y: 956.993896484375, z: -3270.0)

Top rear left: (x: 1.71380615234375, y: 300.0, z: -3513.2685546875)
Top rear right: (x: 1287.155517578125, y: 300.0, z: -3516.19384765625)
Top front right: (x: 1296.16064453125, y: 300.0, z: -3307.510986328125)
Top front left: (x: -6.49407958984375, y: 300.0, z: -3322.9765625)

SpatialTapGesture(coordinateSpace: .immersiveSpace)

Front top left: (x: -652.4783935546875, y: -331.3809814453125, z: -3910.0)
Front top right: (x: 641.110595703125, y: -312.7972412109375, z: -3910.0)
Front bottom right: (x: 628.82421875, y: 300.383056640625, z: -3910.0)
Front bottom left: (x: -648.674072265625, y: 292.794921875, z: -3910.0)

Top rear left: (x: -639.6309814453125, y: -340.0, z: -4199.0126953125)
Top rear right: (x: 658.1767578125, y: -340.0, z: -4157.3818359375)
Top front right: (x: 645.925537109375, y: -340.0, z: -3939.31396484375)
Top front left: (x: -653.8341064453125, y: -340.0, z: -3945.56689453125)

The (x, y, z) values of course depend on how accurately the edges of the box were tapped.
Taking this into account, one can see the following:

The values of SpatialTapGesture() and SpatialTapGesture(coordinateSpace: .local) agree within tap accuracy, so the default coordinate space is evidently .local (I could not find this in the documentation).

From these values it follows that all local/default and global coordinate values are roughly the same, with the following exceptions:

The front z values are locally larger than the global values by about 640 (-3270 vs. -3910).
The rear z values are locally larger than the global values by about 650. This is surely related to the box entity's z offset of 3 meters.
The box width is roughly 1300 and the height roughly 620, corresponding to the box dimensions (1 m x 0.5 m).
The box depth is locally 230, globally 300.
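The width and height of the front face already give a rough points-per-meter scale. The following pure-Swift sketch uses only the SpatialTapGesture() front-face values logged above and the box size from the code; LoggedPoint and the variable names are illustrative, and the estimate is only as accurate as the taps.

```swift
// Rough points-per-meter estimate from the logged SpatialTapGesture() values.
// Assumption: the taps hit the box edges closely enough for a rough estimate.

struct LoggedPoint {
    let x: Double
    let y: Double
    let z: Double
}

// Front-face corners from the SpatialTapGesture() (default, i.e. .local) log.
let frontTopLeft = LoggedPoint(x: -9.3138427734375, y: 338.8812255859375, z: -3270.0)
let frontTopRight = LoggedPoint(x: 1285.9720458984375, y: 333.781982421875, z: -3270.0)
let frontBottomRight = LoggedPoint(x: 1287.6568603515625, y: 950.25439453125, z: -3270.0)
let frontBottomLeft = LoggedPoint(x: -0.68157958984375, y: 954.317626953125, z: -3270.0)

// Box size passed to MeshResource.generateBox, in meters.
let boxWidthMeters = 1.0
let boxHeightMeters = 0.5

// Average the two measurements of each edge, in points.
// SwiftUI's y axis points down, so bottom y minus top y is positive.
let widthPoints = ((frontTopRight.x - frontTopLeft.x) + (frontBottomRight.x - frontBottomLeft.x)) / 2
let heightPoints = ((frontBottomLeft.y - frontTopLeft.y) + (frontBottomRight.y - frontTopRight.y)) / 2

let pointsPerMeterFromWidth = widthPoints / boxWidthMeters
let pointsPerMeterFromHeight = heightPoints / boxHeightMeters

print("width ≈ \(widthPoints) pt, height ≈ \(heightPoints) pt")
print("points per meter ≈ \(pointsPerMeterFromWidth) (width), \(pointsPerMeterFromHeight) (height)")
```

With these taps, the width suggests about 1290 points per meter and the height about 1230. The gap between the two mainly shows how imprecise hand-placed taps are.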

What I don't understand:

In which units is location3D returned, i.e. how do I convert the location3D values to meters?
Why is the box depth locally and globally different (but this might be an accuracy problem)?
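On the unit question, SwiftUI reports location3D in points. One way to turn point values into meters is the physicalMetrics environment value (a PhysicalMetricsConverter) available on visionOS. This is a sketch under an assumption I have not verified: that the converter's scale matches the coordinates reported inside an ImmersiveSpace. Compare its output with the roughly 1300 points per meter visible in the logs. MetricTapView is a hypothetical name.

```swift
import SwiftUI
import RealityKit

// Sketch (assumption): convert location3D from SwiftUI points to meters with the
// environment's PhysicalMetricsConverter. This only rescales the values; they
// stay relative to the origin and axes of the .local space (y pointing down).
struct MetricTapView: View {
    @Environment(\.physicalMetrics) private var physicalMetrics

    var body: some View {
        RealityView { content in
            // Same box size and position as in the question.
            let box = ModelEntity(
                mesh: .generateBox(width: 1, height: 0.5, depth: 0.25),
                materials: [SimpleMaterial(color: .green, isMetallic: false)]
            )
            box.components.set(InputTargetComponent())
            box.generateCollisionShapes(recursive: false)
            box.position = [0, 0, -3]
            content.add(box)
        }
        .gesture(
            SpatialTapGesture(coordinateSpace: .local)
                .targetedToAnyEntity()
                .onEnded { event in
                    let p = event.location3D
                    // Convert each component from points to meters.
                    let x = physicalMetrics.convert(CGFloat(p.x), to: .meters)
                    let y = physicalMetrics.convert(CGFloat(p.y), to: .meters)
                    let z = physicalMetrics.convert(CGFloat(p.z), to: .meters)
                    print("tap (m):", x, y, z)
                }
        )
    }
}
```

Even after rescaling, the result is not a RealityKit position. In every log above, the top edge has the smaller y value, so SwiftUI's y axis points down, while RealityKit's y axis points up.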

Solution

I posted a short version of this question on the Apple Developer Forums, where someone posted the solution.
The solution is hard to find because it is not (yet) documented. The documentation mentions that the value produced by SpatialTapGesture has the properties location and location3D. It does not mention that, once the gesture is targeted with targetedToAnyEntity(), the value also has an entity property holding the tapped entity.
Using this entity, location3D can be converted, for example, into the entity's local coordinate space:

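.onEnded { value in
    let entity = value.entity  // the tapped entity
    let pointInEntity = value.convert(value.location3D, from: .local, to: entity)  // SIMD3<Float> in the entity's local space
}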
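Here is how the forum solution might look inside the question's gesture. This is a sketch, not the forum answer verbatim: TapConversionView is a hypothetical name. The second conversion assumes the RealityKit convert(_:from:to:) overload that targets the .scene coordinate space, which I believe exists alongside the entity overload. The from: argument must name the same coordinate space the gesture was created with.

```swift
import SwiftUI
import RealityKit

struct TapConversionView: View {
    var body: some View {
        RealityView { content in
            // Same box size and position as in the question.
            let box = ModelEntity(
                mesh: .generateBox(width: 1, height: 0.5, depth: 0.25),
                materials: [SimpleMaterial(color: .green, isMetallic: false)]
            )
            box.components.set(InputTargetComponent())
            box.generateCollisionShapes(recursive: false)
            box.position = [0, 0, -3]
            content.add(box)
        }
        .gesture(tapGesture)
    }

    var tapGesture: some Gesture {
        SpatialTapGesture(coordinateSpace: .local)
            .targetedToAnyEntity()
            .onEnded { value in
                // value.entity is the entity that received the tap.
                let entity = value.entity

                // Tap location in the entity's own coordinate system
                // (meters, since the box is not scaled).
                let inEntity: SIMD3<Float> = value.convert(value.location3D, from: .local, to: entity)

                // The same point in RealityKit scene coordinates (assumed .scene overload).
                let inScene: SIMD3<Float> = value.convert(value.location3D, from: .local, to: .scene)

                print("entity space:", inEntity, "scene space:", inScene)
            }
    }
}
```

The box mesh is centered on its entity, so taps on its surface should give entity-space points with x in [-0.5, 0.5], y in [-0.25, 0.25] and z in [-0.125, 0.125] meters. That shows directly which face was tapped and where.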