tensorflow pca dimensionality-reduction projector

How to use the 'sphereize data' option with PCA in TensorFlow


I have used PCA with the 'Sphereize data' option on the following page successfully: https://projector.tensorflow.org/

I wonder how to run the same computation locally using the TensorFlow API. I found PCA in the API documentation, but I am not sure whether sphereizing the data is available somewhere in the API as well.


Solution

  • The "Sphereize data" option normalizes the data by shifting each point by the centroid of all points and then scaling each shifted point to unit norm.

    Here is the code used in TensorBoard (in TypeScript):

      normalize() {
        // Compute the centroid of all data points.
        let centroid = vector.centroid(this.points, (a) => a.vector);
        if (centroid == null) {
          throw Error('centroid should not be null');
        }
        // Shift all points by the centroid and make them unit norm.
        for (let id = 0; id < this.points.length; ++id) {
          let dataPoint = this.points[id];
          dataPoint.vector = vector.sub(dataPoint.vector, centroid);
          if (vector.norm2(dataPoint.vector) > 0) {
            // If we take the unit norm of a vector of all 0s, we get a vector of
            // all NaNs. We prevent that with a guard.
            vector.unit(dataPoint.vector);
          }
        }
      }
    

    You can reproduce that normalization with the following Python function:

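    import tensorflow as tf

    def sphereize_data(x):
        """
        x is a 2D Tensor of shape (num_vectors, dim_vectors)
        """
        # Shift every vector by the centroid of the whole data set.
        centered = x - tf.reduce_mean(x, axis=0, keepdims=True)
        # Scale each vector (row) to unit L2 norm; all-zero rows stay zero instead of NaN.
        return tf.math.divide_no_nan(centered, tf.norm(centered, axis=1, keepdims=True))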
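
To check the normalization end to end without TensorFlow, here is a minimal NumPy sketch of the same two steps followed by a PCA projection. The `pca_project` helper, the SVD-based PCA, and the random `points` array are my own assumptions for illustration. They are not taken from the projector's source, which may compute PCA differently.

```python
import numpy as np


def sphereize_data(x):
    """Center the rows of x on their centroid, then scale each row to unit L2 norm."""
    x = np.asarray(x, dtype=np.float64)
    centered = x - x.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    # Rows that coincide with the centroid have zero norm; leave them as zeros,
    # mirroring the guard in the TypeScript code.
    return np.divide(centered, norms, out=np.zeros_like(centered), where=norms > 0)


def pca_project(x, num_components=2):
    """Hypothetical helper: project x onto its top principal components via SVD."""
    centered = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:num_components].T


# Illustrative random data: 100 vectors of dimension 10.
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 10))

sphereized = sphereize_data(points)
projected = pca_project(sphereized, num_components=3)
```

Note that the norm is taken along `axis=1`, so every row (one data point) ends up with unit length, which matches the per-point loop in the TypeScript code.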