It's known that a single PF can map to multiple VFs.
Regarding the number of VFs associated with a single PF:
In the PCIe 5.0 spec:
IMPLEMENTATION NOTE
VFs Spanning Multiple Bus Numbers
As an example, consider an SR-IOV Device that supports a single PF. Initially, only PF 0 is visible. Software Sets ARI Capable Hierarchy. From the SR-IOV Extended Capability it determines: InitialVFs is 600, First VF Offset is 1 and VF Stride is 1.
If software sets NumVFs in the range [0 … 255], then the Device uses a single Bus Number.
If software sets NumVFs in the range [256 … 511], then the Device uses two Bus Numbers.
If software sets NumVFs in the range [512 … 600], then the Device uses three Bus Numbers.
PF 0 and VF 0,1 through VF 0,255 are always on the first (captured) Bus Number. VF 0,256 through VF 0,511 are always on the second Bus Number (captured Bus Number plus 1). VF 0,512 through VF 0,600 are always on the third Bus Number (captured Bus Number plus 2).
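For reference, the bus-number split in that note falls out of the SR-IOV Routing ID formula, VF RID = PF RID + First VF Offset + (n - 1) * VF Stride, with ARI making the low 8 bits of a Routing ID the function number. A minimal sketch of that arithmetic (the captured bus number 0x10 is just an example):

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Sketch of the Routing ID arithmetic behind the Implementation Note above:
 * PF 0 at function 0 of the captured bus, First VF Offset = 1, VF Stride = 1,
 * ARI enabled so the low 8 bits of a Routing ID are the function number.
 */
static uint16_t vf_routing_id(uint16_t pf_rid, uint16_t first_vf_offset,
                              uint16_t vf_stride, unsigned n /* 1-based VF index */)
{
    /* SR-IOV: VF Routing ID = PF Routing ID + First VF Offset + (n - 1) * VF Stride */
    return (uint16_t)(pf_rid + first_vf_offset + (n - 1) * vf_stride);
}

int main(void)
{
    uint16_t captured_bus = 0x10;                     /* example captured bus number */
    uint16_t pf_rid = (uint16_t)(captured_bus << 8);  /* PF 0 at function 0 */
    unsigned samples[] = { 1, 255, 256, 511, 512, 600 };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint16_t rid = vf_routing_id(pf_rid, 1, 1, samples[i]);
        printf("VF 0,%u -> bus 0x%02x, function 0x%02x\n",
               samples[i], (unsigned)(rid >> 8), (unsigned)(rid & 0xff));
    }
    return 0;
}
```

Running it reproduces the split described in the note: VF 0,255 is the last function on the captured bus, VF 0,256 is function 0 of the next bus, and VF 0,600 lands on the captured bus plus 2.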
From Oracle:
Each SR-IOV device can have a physical function and each physical function can have up to 64,000 virtual functions associated with it.
From the "sharing PCIe I/O bandwidth" point of view, it might be understandable to having hundres or thousands of VFs (associated with a single PF), each VF is assigned to a VM, with the assumption that most of the VFs are in idle state at a particular time point;
However, from the "chip manufacturing" point of view, for a non-trival PCIe function, duplicating hundreds or thousands of the VF part of the IP instances within a single die would make the die area too large to be practical.
So my question is, as stated in the subject line, are there practical use cases for having so many VFs associcated with a single PF?
When a server is used to run virtual machines, using para-virtualized network interfaces emulated by the hypervisor is the simplest solution, but by far not the most efficient one. Therefore, when a VM guest needs a performant network interface, the SR-IOV approach is used: the physical network card exposes a number of Virtual Functions (VFs), and the hypervisor exposes one or more of these VFs to each VM as a full PCI device. From that point onward, the VM guest has direct access to the network card hardware resources carved out for its assigned VFs.
Often a single VM guest needs more than one network interface, as would be the case for a virtual appliance functioning as a router or firewall. Such VFs, each representing a sliver of a physical network card, are often referred to as "vNICs" (virtual network interface cards) or "ENAs" (Elastic Network Adapters, in the case of AWS). The number of vNICs (and hence VFs) required is proportional to the number of VMs the server should support, which can reach 256 guests on a server with 2 sockets of 64 CPU cores each with HyperThreading. At 2 vNICs per VM on average that is already 512 VFs, and with vCPU overcommit the count can approach 1,000.
If the server is intended to run containers (for example under Kubernetes), which are lighter than full-fledged virtual machines, even more of them fit on one server. Combined with the requirement for performant networking, i.e. SR-IOV (direct container access to network card hardware queues), this means many thousands of VFs need to be supported.
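On a Linux host, instantiating those VFs is typically done through the standard sysfs SR-IOV attributes; a minimal sketch (the PF's PCI address and the VF count are placeholder assumptions):

```c
#include <stdio.h>

/*
 * Sketch of how VFs are typically instantiated on a Linux hypervisor host,
 * using the standard sysfs SR-IOV attributes (sriov_totalvfs / sriov_numvfs).
 * The PCI address below is a placeholder for an SR-IOV capable NIC's PF;
 * writing sriov_numvfs requires root, and the attribute must be 0 before a
 * new non-zero value can be set.
 */
#define PF_SYSFS "/sys/bus/pci/devices/0000:3b:00.0"  /* hypothetical PF address */

int main(void)
{
    char path[256];
    unsigned total = 0, want = 64;   /* e.g. one VF per guest vNIC */
    FILE *f;

    /* TotalVFs as advertised by the PF's SR-IOV capability */
    snprintf(path, sizeof path, "%s/sriov_totalvfs", PF_SYSFS);
    f = fopen(path, "r");
    if (!f || fscanf(f, "%u", &total) != 1) { perror("sriov_totalvfs"); return 1; }
    fclose(f);

    if (want > total)
        want = total;

    /* Enable the VFs; each then appears as its own PCI device that can be
     * bound to vfio-pci and passed through to a VM (or to a container). */
    snprintf(path, sizeof path, "%s/sriov_numvfs", PF_SYSFS);
    f = fopen(path, "w");
    if (!f || fprintf(f, "%u\n", want) < 0) { perror("sriov_numvfs"); return 1; }
    fclose(f);

    printf("enabled %u of %u VFs on %s\n", want, total, PF_SYSFS);
    return 0;
}
```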
With respect to implementation, Synopsys has published an article addressing the flop-count concern you raise: they recommend storing the configuration space of PCIe devices in SRAM rather than in straight flops.
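To see why that matters at scale, here is a purely illustrative back-of-the-envelope calculation; the VF count, per-VF writable state size, and relative bit areas below are assumptions for illustration, not figures from the article:

```c
#include <stdio.h>

/*
 * Purely illustrative calculation of the storage issue: per-VF
 * configuration/capability state scales linearly with the VF count, so
 * keeping it in flops becomes costly and a shared SRAM is the denser choice.
 * Every number below is an assumption, not a figure from the Synopsys article.
 */
int main(void)
{
    const unsigned num_vfs            = 1024;  /* assumed VF count                        */
    const unsigned state_bytes_per_vf = 256;   /* assumed writable state per VF           */
    const double   area_per_flop_bit  = 1.00;  /* assumed relative area, arbitrary units  */
    const double   area_per_sram_bit  = 0.15;  /* assumed relative area, arbitrary units  */

    double bits = (double)num_vfs * state_bytes_per_vf * 8;

    printf("total per-VF state      : %.0f bits\n", bits);
    printf("area if stored in flops : %.0f a.u.\n", bits * area_per_flop_bit);
    printf("area if stored in SRAM  : %.0f a.u.\n", bits * area_per_sram_bit);
    return 0;
}
```

Whatever the exact per-bit numbers, the linear growth with NumVFs means a multi-thousand-VF device carries megabits of per-VF state, which is what pushes the implementation toward SRAM.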