Is a multi-core CPU required to implement SIMD?
I found the following phrase "multiple processing elements" when reading Wikipedia about SIMD. So what's the difference between this phrase and "multi-core CPU"?
No, each core normally can perform most general operations from the instruction set. But the "multiple processing elements" for SIMD operations just perform a single operation on different data (different bytes or words).
For example, each core of ARM Cortex-A53 microarchitecture has capability to run SIMD instructions independently of other cores, while such SIMD instruction sets as MMX, SSE and SSE2 were first introduced on single-core CPUs.