For example, a modern i7-8700k can supposedly do ~60 GFLOPS (single-precision, source) while its maximum frequency is 4.7GHz. As far as I am aware, an instruction has to take at least one cycle to complete, so how is this possible?
There are multiple factors that are all multiplied together for this large effect:
Multiplying all those factors together we get: 8 * 6 * 2 * 2 * 4.3 = 825 GFLOPS (matching the stats reported here). This calculation certainly does not mean that it can actually be attained. For example the processor may downclock significantly under such a workload in order to stay within its power budget, which is what Intel has been doing at least since Haswell (though the specifics have changed and it applied to server parts). Also, most real code has significant trouble feeding that many FMAs with data. Large matrix multiplications can get close though, and for example according to these stats the 8700k reached 496.7 Gflops in their SGEMM benchmark. Possibly the 8700k's max AVX2 turbo speed on 6 cores is 2.6GHz but as far as I can find it does not have an AVX offset by default (only needed when overclocked), or that GEMM is just not that close to hitting peak FLOPS.