pythonmathmontecarlo

Average distance of any point of a disk to its boundary


I was wondering what was the average distance "d" of any point in a ball to its boundary (not specifically the closest point). I made a simple monte carlo computation using Muller's method to get a uniform distribution - see linked answer : Python Uniform distribution of points on 4 dimensional sphere :

def sample_within(radius, dim, nb_points=1, ):
    # Gaussian-distributed points
    x = np.random.normal(size=(nb_points, dim))

    # Normalize to unit length (points on the surface of the n-sphere)
    x /= np.linalg.norm(x, axis=1)[:, np.newaxis]

    # Random radii with distribution ~ r^(1/n) => uniform volumic distribution in the ball
    r = np.random.rand(nb_points) ** (1/dim)

    # Scale by radius
    points = radius * x * r[:, np.newaxis]
    return points

Note: I also used a "stupid" sampling in a box and discarding what ever points lies out of the sphere, everything I say in this post is true wiht both sampling method as they produce the same outcome.

And using (the same method) :

def sample_on_border(radius, dim:int, nb_points:int):
   # Gaussian-distributed points
   x = np.random.normal(size=(nb_points, dim))

   # Normalize to unit length (points on the surface of the dim-sphere)
   x /= np.linalg.norm(x, axis=1)[:, np.newaxis]

   # Scale by radius
   points = radius * x
   return points

The distances are calculated using (which I have checked manually):

def calc_average_distance(radius: float, dim: int, nb_in_points: int, nb_border_points: int)->float:
    border_points = sample_on_border(radius, dim, nb_border_points)
    in_points = sample_within(radius, dim, nb_in_points)
    # in_points = border_points #specific case R=1

    distances = np.sqrt(np.sum((border_points[:, np.newaxis, :] - in_points[np.newaxis, :, :])**2, axis=2))
    average_distance_to_border = np.mean(distances)
    return average_distance_to_border 

From now on, I will speak in dimension 2 (disk) and with a radius of 1.

Note: It is easy to demonstrate that the average distance between 2 points of the circle is 4/pi (1.27), and using only sampled points on the border in my Monte Carlo computation yield the same result. Also, the average distance between the center of the circle and the points of the circle is obviously 1 (since R=1). Therefore, I would expect the average distance d to be between 1 and 1.27 (it is just intuition though).

The computed average distance d is : d = 1.13182 ± 0.00035 (with 5 sigma confidence) which is indeed between 1 and 1.27 (seems to even be the average).

Now, I was also interest to find if there is an analytical solution to this problem, and I found the paper "The Average Directional Distance To The Boundary Of A Ball Or Disk" at https://www.emis.de/journals/AMEN/2024/AMEN-A230614.pdf This paper adress the problem in n-dimension, but its first section specifically deal with the case in dimension 2.

Note: For the average distance between two points on the unit circle (i.e., when R=1), the paper gives the result 2/pi . However, unless I am mistaken, the correct value should be 4/π as previously stated and this can even be derived from the paper formula by taking r=R=1.

However, I cannot find any (other) mistake in the paper and the analytical solution (which I have checked several times) yields 8/3pi so around 0.85 which is obvisouly different from the 1.13 I get when using my monte carlo code (and also not between 1 and 4/pi).

Also, just to be sure, I tried:

Can anyone help me identify a mistake in my reasoning/code ?

Edit: I failed to mention, that this post https://math.stackexchange.com/questions/875011/average-distance-from-a-point-in-a-ball-to-a-point-on-its-boundary gives 6/5 for a 3d-ball which is the result I get by Monte-Carlo but differs from the paper's result (3/4). I am too bad at math to understand their results, but by pure annalogy I figured that 1.13.. is in fact 32/(9pi) which is the value of the double integral between 0 and 1 and 0 and 2pi of r*sqrt(r**2 - 2rcos(theta)+1).

Thanks


Solution

  • The error you are doing is to compute the average distance between a point within the disk and a point within the border.

    This is not what the article describes. The article describe the average distance between a point within a disk and the intersection of the border for a direction

    This may seem to be the same. Since choosing Φ randomly results into choosing randomly a point in the border. But it is not. Because the border point is this way not uniformly chosen on the border (it is the direction Φ from the inner point that is uniformly chosen)

    Trying to maintain as much as I can your code, using the same sample_within and sample_on_border (to ease generalization to other dimensions), what would be the distance d(P,Φ), if not (as you did) just ||P-(cos Φ, sin Φ)|

    P being points returned by sample_within, and (cos Φ, sin Φ) the points returned by sample_on_border (that is a strange way to choose Φ, sure, rather than just a uniform choosing in [0, 2π], but that way I reuse sample_on_border, and that way, it may be usable for more than 2D). To make notation lighter, and in that spirit "I am not just choosing an angle, but a point on a unit sphere", let's call u=(cos Φ, sin Φ)

    One way to figure out that distance is to say that distance is α such as P+α.u is on the unit circle (or sphere for higher dimensions), with α≥0 (α<0 is also counted, but with direction -Φ)

    That is ||P+α.u|| = 1

    That is <P+αu, P+αu> = 1

    That is <P,P>+2α<P,u>+α²<u,u>

    Since u is on the unit sphere, <u,u>=1. <P,P>=ρ² if ρ is the distance to the center of P. That is just a 2nd degree equation to solve with unknown α

    α² + 2<P,U>α + ρ²-1 = 0 Δ = 4<P,U>² + 4 - 4ρ² So we have two solutions α = (-2<P,U> ± √Δ)/2 = ±√(<P,U>²+1-ρ²) - <P,U> Since 1-ρ² is positive, √(<P,U>²+1-ρ²) is slightly more that |<P,U>|, so is positive only for solution α = √(<P,U>²+1-ρ²) - <P,U>

    So, that is your distance.

    Let's revise now your last function with that distance in mind

    def calc_average_distance2(dim: int, nb_in_points: int, nb_border_points: int)->float:
        # Just rename `border_points` to `direction_point`, since it is a direction we are choosing here
        # That is what I called u
        direction_point = sample_on_border(1, dim, nb_border_points)
        # And this, what I called P
        in_points = sample_within(1, dim, nb_in_points)
        # I mention later why I tried with this line
        #in_points = sample_on_border(1, dim, nb_in_points)
        # <P,u>
        scal = (direction_point[:,None,:] * in_points[None,:,:]).sum(axis=2)
        ρ = np.linalg.norm(in_points, axis=1)[None,:]
        dist = np.sqrt(scal**2+1-ρ**2)-scal
        # And this comment of yours, I realize only now, while copying this to 
        # stack overflow, has the exact same purpose, probably that my own commented line
        # in_points = border_points #specific case R=1
        return dist.mean() 
    

    Because I did my reasoning on a unit sphere, and was too lazy to redo it with a radius, I removed the radius parameter and use a fixed 1

    The result on my machine, with 5000 points × 1000 directions is 0.8490785129084816 which is close enough to 8/(3π)≈0.8488263631567751

    And if I uncomment the line chosing in_points on the border, to compute what is called P₁ in the article (I suspect that was also the role of your commented line), I get 0.6368585804371621 which is close enough to 2/π≈0.6366197723675814

    So, I am not 100% positive that this is what you wanted to do, and if that is what you meant by "average distance of any points of a dist to its boundary". Reason why I used conditional about you wanting to use this code to generalize to higher dimensions. But I am 100% positive that this is what the article is doing.

    A simpler version

    Earlier, I said "α>0. -α case will be counted when direction will be -Φ".

    But after all, what would happen if we used both α solution for average? That is just counting average for both Φ and -Φ (or u, and -u, if with think "direction" rather than "angle"). So, it would have worked without that α>0 restriction, and counting both α solution, negative and positive in the average. But of course it is |α| that is the distance then

    But what is the average of both |α| solutions?

    Well, it is It is ½ ( |√(<P,U>²+1-ρ²) - <P,U>| + |-√(<P,U>²+1-ρ²) - <P,U>| )

    And since the first is clearly positive (as we already said) and second clearly negative, absolute values can be resolved directly. It is
    ½ ( √(<P,U>²+1-ρ²) - <P,U> + √(<P,U>²+1-ρ²) + <P,U>| ) = √(<P,U>²+1-ρ²)

    So, not a huge simplification, sure. that mean that we can drop the -<P,u> term

    dist = np.sqrt(scal**2+1-ρ**2)-scal
    ⇒
    dist = np.sqrt(scal**2+1-ρ**2)
    

    works as well