Consider an AXI4 Interconnect on the PL (FPGA) side.
When I double-click it to see the available options, there is a Slave Interfaces tab containing the following options.
What is the purpose of enabling register slice? Does outer refer to the L2 cache? And what does Auto mean?
What is the purpose of enabling the Data FIFO? Is it for burst transactions? Doesn't the DMA controller have its own FIFO?
Enabling register slices (AXI Interconnect v2.1 - pg. 93) basically inserts a pipeline stage between your AXI master and slave connections to break a critical timing path. It does not appear to have anything to do with the L2 cache. The available options provide you with the following (pg. 113):
The AXI Reference Guide provides guidelines for AXI system optimization on page 91. For example, it states: "Large and complex IP blocks such as processors, DDR3 memory controllers, and PCIe bridges are good candidates for having register slices enabled. The register slice breaks timing paths and allows more freedom for Place and Route (PAR) tools to move a large IP block away from the congestion of the interconnect core and other IP logic". Nonetheless, I recommend reading that whole section, as excessive use of register slices may be counterproductive; it really depends on your system design.
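To make the "breaks timing paths" idea concrete, here is a toy Python calculation. It is only a sketch: the delay values and the names (`fmax_mhz`, `master_logic_ns`, and so on) are made up for illustration and do not come from any Xilinx document or tool report.

```python
# Toy timing model. All delays are hypothetical numbers in nanoseconds.

def fmax_mhz(stage_delays_ns):
    """The clock is limited by the slowest register-to-register stage."""
    return 1000.0 / max(stage_delays_ns)

# Assumed combinational delays along one master-to-slave path.
master_logic_ns = 3.0
interconnect_routing_ns = 4.0
slave_logic_ns = 3.0

# Without a register slice, the three segments form one long timing path.
no_slice = [master_logic_ns + interconnect_routing_ns + slave_logic_ns]

# With a register slice in front of the slave, the path is cut into two
# shorter stages, at the cost of one extra clock cycle of latency.
with_slice = [master_logic_ns + interconnect_routing_ns, slave_logic_ns]

fmax_no_slice = fmax_mhz(no_slice)
fmax_with_slice = fmax_mhz(with_slice)

print(f"no slice:   {fmax_no_slice:.1f} MHz, {len(no_slice)} stage(s)")
print(f"with slice: {fmax_with_slice:.1f} MHz, {len(with_slice)} stage(s)")
```

The trade-off is visible in the two lists: the slowest stage gets shorter, so the achievable clock goes up, but every transfer now crosses one more register. That added latency and resource use is why slices placed where they are not needed can hurt rather than help.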
The purpose of enabling the Data FIFO is to provide data buffering and enable higher throughput. The 32 deep mode option provides a 32-deep LUT-RAM based FIFO (data channel only), while 512 deep (packet mode) provides a 512-deep block RAM based Packet FIFO. The Packet FIFO mode provides an additional 32-deep FIFO on the corresponding address channel to avoid full/empty stalls in the middle of bursts. In effect, it delays the start of a read/write operation so that the burst does not stall midway. Read the following pages for more in-depth information (AXI Interconnect v2.1 - pg. 94). The options provide the following:
Finally, I don't know the exact details of Xilinx's DMA implementation, but I believe the intent of including a buffer there would be to handle the case where your receiving module is not as fast as your DMA. That is, the DMA can provide more data than your module can read, so buffering its output could enhance communication speed (and release your DMA sooner in some cases).
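The buffering argument can be illustrated with a small Python simulation. This is a toy model under assumed parameters, not a description of how the Xilinx DMA or the interconnect FIFO actually behaves: the producer offers one word per cycle in 16-cycle bursts separated by 16 idle cycles, the consumer accepts one word every second cycle, and the function name `simulate` and all of its arguments are hypothetical.

```python
from collections import deque

def simulate(fifo_depth, total_words=256, burst_len=16, gap_len=16,
             consumer_period=2):
    """Return (cycles_to_finish, producer_stall_cycles) for a toy
    bursty producer and a slower consumer joined by a FIFO.
    fifo_depth must be at least 1."""
    fifo = deque()
    produced = consumed = stalls = cycle = 0
    while consumed < total_words:
        # Consumer takes one word every `consumer_period` cycles if available.
        if cycle % consumer_period == 0 and fifo:
            fifo.popleft()
            consumed += 1
        # Producer offers one word per cycle inside its burst window.
        in_burst = (cycle % (burst_len + gap_len)) < burst_len
        if in_burst and produced < total_words:
            if len(fifo) < fifo_depth:
                fifo.append(produced)
                produced += 1
            else:
                stalls += 1  # FIFO full: the producer is held off this cycle
        cycle += 1
    return cycle, stalls

cycles_shallow, stalls_shallow = simulate(fifo_depth=1)
cycles_deep, stalls_deep = simulate(fifo_depth=32)

print("depth 1: ", cycles_shallow, "cycles,", stalls_shallow, "producer stalls")
print("depth 32:", cycles_deep, "cycles,", stalls_deep, "producer stalls")
```

With these assumed rates, the deeper FIFO absorbs each burst completely, so the producer never stalls and is free again sooner, while the consumer drains the buffer during the idle gap. With a depth of 1, the producer is throttled to the consumer's pace for the whole burst.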