I know the individual concepts of tensor sharding and tensor tiling, but are there any differences between them? Especially in terms of the XLA/HLO or GSPMD concepts in parallel training (data parallel or model parallel).
Tensor sharding and tensor tiling are not the same thing. Both come up in parallel training of machine learning models, but they operate at different levels and serve different purposes.
Tensor sharding distributes a large tensor across multiple devices or machines in a distributed system. The tensor is split along one or more dimensions into smaller pieces, or shards, and each shard is placed on and processed by a different device. This is the mechanism behind both data parallelism (sharding the batch dimension) and model parallelism (sharding weight dimensions).
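For instance, here is a minimal sketch in JAX (which uses GSPMD/XLA under the hood). The 2x4 mesh shape assumes 8 available devices, and the array shape is arbitrary:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Assumes 8 devices (e.g. TPU cores or GPUs); adjust the mesh shape otherwise.
mesh = Mesh(np.array(jax.devices()).reshape(2, 4), axis_names=("data", "model"))

x = jnp.ones((1024, 4096))
# Shard rows across the "data" axis and columns across the "model" axis:
# each device ends up holding a (512, 1024) piece of the full array.
x_sharded = jax.device_put(x, NamedSharding(mesh, P("data", "model")))

jax.debug.visualize_array_sharding(x_sharded)  # shows which device owns which shard
```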
Tensor tiling, on the other hand, is a technique used to optimize tensor operations on a single device: the operands are partitioned into smaller, fixed-size tiles that fit in fast memory (registers, cache, or on-chip buffers), so each tile can be loaded once and reused, improving locality and throughput.
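The classic illustration is a blocked (tiled) matrix multiply. This is just a plain-Python sketch of the idea; the tile size of 64 and the assumption that dimensions divide evenly are chosen for simplicity:

```python
import numpy as np

def tiled_matmul(a, b, tile=64):
    # Sketch: assumes every dimension is a multiple of `tile`.
    m, k = a.shape
    _, n = b.shape
    out = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # Each (tile x tile) block is small enough to stay in fast
                # memory while it is reused, which is the point of tiling.
                out[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return out

c = tiled_matmul(np.ones((256, 256)), np.ones((256, 256)))
```

A compiler such as XLA does the same kind of blocking when it lowers an op to a device kernel, picking tile sizes that match the hardware's memory hierarchy.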
Both ideas come together in XLA (Accelerated Linear Algebra). HLO is XLA's high-level intermediate representation, and GSPMD (General and Scalable Parallelization for ML Computation Graphs) is the SPMD partitioner that reads sharding annotations on HLO tensors and automatically rewrites the graph into a per-device program, inserting the collectives (all-gather, all-reduce, reduce-scatter, etc.) needed for data-parallel or model-parallel training. Tiling happens later, when the XLA backend lowers each device's partition of the graph into efficient kernels. In short: sharding decides which device owns which piece of a tensor across the cluster, while tiling decides how each device iterates over its own piece.
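To see GSPMD at work, you only need to shard the inputs of a jitted function; the partitioner propagates those shardings through the computation. A small sketch, assuming 4 devices arranged in a 2x2 mesh:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Assumes 4 devices; adjust the mesh shape to match jax.device_count().
mesh = Mesh(np.array(jax.devices()).reshape(2, 2), axis_names=("data", "model"))

@jax.jit
def layer(x, w):
    # GSPMD propagates the input shardings through the matmul and
    # inserts whatever collectives the chosen partitioning requires.
    return jnp.tanh(x @ w)

x = jax.device_put(jnp.ones((256, 512)),  NamedSharding(mesh, P("data", None)))   # batch sharded: data parallel
w = jax.device_put(jnp.ones((512, 1024)), NamedSharding(mesh, P(None, "model")))  # weights sharded: model parallel

y = layer(x, w)
jax.debug.visualize_array_sharding(y)  # output is sharded over both mesh axes
```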