compiler-optimization, tensor, tensorflow-xla

Are tensor sharding and tensor tiling the same implementation?


I know each of the concepts of tensor sharding and tensor tiling on its own, but are there any differences between them? I am asking especially about the XLA/HLO and GSPMD concepts in parallel training (data parallelism or model parallelism).


Solution

  • No, tensor sharding and tensor tiling are not the same implementation. Both come up in parallel training of machine learning models, but they serve different purposes.

    Tensor sharding is a technique for distributing a large tensor, and the computation on it, across multiple devices or hosts in a distributed system. The tensor is split along one or more dimensions into smaller pieces (shards), each device holds and processes its own shard, and the compiler or runtime inserts whatever collective communication is needed to keep the result equivalent to the unsharded computation (see the first sketch at the end of this answer).

    Tensor tiling, on the other hand, is a single-device performance optimization: a tensor operation is partitioned into smaller, fixed-size tiles that fit into fast on-chip memory (caches, registers, or an accelerator's local memory), so each block of data can be loaded once and reused efficiently (see the second sketch at the end of this answer).

    Both techniques show up in XLA (Accelerated Linear Algebra) and its HLO intermediate representation when optimizing the computation graph used in deep-learning training: sharding is expressed as sharding annotations on HLO ops, while tiling is chosen by the backend when it lowers HLO to device code. GSPMD (General and Scalable Parallelization for ML Computation Graphs) is XLA's SPMD-based partitioner: given sharding annotations on a few tensors, it propagates them through the HLO graph and automatically partitions the computation across devices for data-parallel and model-parallel training.
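
    Below is a minimal sketch of GSPMD-style tensor sharding in JAX, which lowers to XLA's SPMD partitioner. The mesh axis name "data", the array shapes, and the function `step` are illustrative assumptions, not anything from the question:

    ```python
    # Minimal sketch of tensor sharding via JAX/GSPMD; names and shapes are illustrative.
    import numpy as np
    import jax
    import jax.numpy as jnp
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    # Build a 1-D device mesh over all available devices (assumes the batch
    # dimension below is divisible by the device count).
    mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

    x = jnp.ones((1024, 512))
    # Shard the leading (batch) dimension across the "data" axis of the mesh;
    # each device stores and computes on its own slice of rows.
    x_sharded = jax.device_put(x, NamedSharding(mesh, P("data", None)))

    @jax.jit
    def step(a):
        # GSPMD propagates the sharding through the HLO graph and inserts
        # any collective communication the computation needs.
        return jnp.mean(jnp.tanh(a), axis=0)

    y = step(x_sharded)
    print(y.sharding)  # shows how the output ended up partitioned or replicated
    ```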
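
    And a minimal sketch of tiling on a single device: a blocked matrix multiply in plain Python/NumPy, where the tile size `TILE = 64` is an illustrative assumption (a real compiler such as XLA picks tile shapes to match the target's caches, registers, or matrix units):

    ```python
    # Minimal sketch of tiling: a blocked (tiled) matrix multiply. TILE is illustrative.
    import numpy as np

    TILE = 64

    def tiled_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        m, k = a.shape
        k2, n = b.shape
        assert k == k2
        out = np.zeros((m, n), dtype=a.dtype)
        # Process fixed-size TILE x TILE blocks so each working set stays
        # small enough to live in fast memory while it is being reused.
        for i in range(0, m, TILE):
            for j in range(0, n, TILE):
                for p in range(0, k, TILE):
                    out[i:i+TILE, j:j+TILE] += (
                        a[i:i+TILE, p:p+TILE] @ b[p:p+TILE, j:j+TILE]
                    )
        return out

    a = np.random.rand(256, 256).astype(np.float32)
    b = np.random.rand(256, 256).astype(np.float32)
    assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)
    ```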