Hyperthreading vs. Superscalar execution

Imagine a CPU (or core) that is superscalar (multiple execution units) and also has hyperthreading (SMT) support.

Why is the number of software threads the CPU can truly execute in parallel typically given by the number of logical cores (i.e. so-called hardware threads) it possesses, and not the total number of execution units it has?
If my understanding is correct, SMT doesn't actually enable true parallel execution; it simply makes context switching much faster and more efficient by duplicating certain parts of the CPU (those that store the architectural state, but not the main execution resources). A superscalar architecture, on the other hand, allows true simultaneous execution of multiple instructions per clock cycle, because the CPU has multiple execution units, i.e. multiple parallel pipelines which can each process a separate thread, in true parallel fashion.

So for example, if a CPU has 2 cores, and each core has 2 execution units, shouldn't its hardware concurrency (the number of threads it can truly execute in parallel) be 4? Why is its hardware concurrency instead given by the number of logical cores, when SMT doesn't actually enable true parallel execution?

Solution

You can't just slam instructions into the execution units.
If you want 2-way SMT, you need to keep two architectural states and fetch two instruction streams.
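
Each of those architectural states is what the operating system sees as a logical core. As a minimal sketch of how software observes that number (not the number of execution units), Python's standard `os.cpu_count()` can be queried. On a 2-core machine with 2-way SMT enabled this would typically report 4, however many execution units each core has.

```python
import os

# os.cpu_count() reports the number of logical CPUs (hardware threads)
# the OS knows about, or None if it cannot be determined.
logical_cpus = os.cpu_count()
print(f"logical CPUs visible to the OS: {logical_cpus}")
```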

If a company has 100 developers but only two project managers, it can only develop two projects in parallel (though it can develop more concurrently if it makes the PMs switch projects every day or so).

If a CPU can fetch from only two instruction streams (keeping only two thread contexts), you can assign it only two threads to execute in parallel.
You can, however, use time-division to execute more threads concurrently.
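
The difference between "in parallel" and "concurrently" can be sketched with a toy round-robin scheduler. This is only an illustration: the function `time_slice_schedule`, the thread names, and the slice count are all made up for the example, and a real OS scheduler is far more involved.

```python
from collections import deque

def time_slice_schedule(threads, hardware_contexts, slices):
    """Toy round-robin: at most `hardware_contexts` threads run in each slice."""
    ready = deque(threads)
    timeline = []
    for _ in range(slices):
        # Only as many threads as there are hardware contexts can run at once.
        running = [ready.popleft() for _ in range(min(hardware_contexts, len(ready)))]
        timeline.append(running)
        # Preempted threads go to the back of the queue.
        ready.extend(running)
    return timeline

# Four software threads on hardware that holds only two thread contexts.
timeline = time_slice_schedule(["A", "B", "C", "D"], hardware_contexts=2, slices=4)
for t, running in enumerate(timeline):
    print(f"slice {t}: running {running}")
```

All four threads make progress over time, but in any single slice only two of them are actually running.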

Software has no direct access to the execution units; that would make a circular argument (the software needs the EUs to execute, but the EUs need the software to execute).
The CPU will try to use as many of the EUs as possible, exploiting out-of-order execution and speculating on anything it can.
Actually, hyper-threading is just a way to keep all the resources busy (like sharing a developer with another PM when they have little to do).

But if all else fails and an EU goes unused, that potential unit of work is simply wasted.
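
To make the "keep the EUs busy" point concrete, here is a deliberately simplified model, not a description of any real microarchitecture. It assumes a hypothetical core with 4 execution units, and for each cycle it is simply told how many independent instructions each hardware thread has ready (the numbers are invented). Instructions that cannot issue in a cycle are dropped rather than carried over, which a real core would not do.

```python
def issue_slots_used(ready_per_cycle_by_thread, num_units):
    """Count how many execution-unit slots get filled in each cycle.

    ready_per_cycle_by_thread: one equal-length list per hardware thread;
    entry c is how many independent instructions that thread has ready
    in cycle c (hypothetical data).
    """
    cycles = len(ready_per_cycle_by_thread[0])
    used = []
    for c in range(cycles):
        free = num_units
        for ready in ready_per_cycle_by_thread:
            issued = min(ready[c], free)  # a thread can only fill free units
            free -= issued
        used.append(num_units - free)
    return used

NUM_UNITS = 4
thread_a = [1, 4, 0, 2]  # 0 = stalled, e.g. waiting on a cache miss
thread_b = [2, 1, 3, 0]

alone = issue_slots_used([thread_a], NUM_UNITS)
smt = issue_slots_used([thread_a, thread_b], NUM_UNITS)

total = NUM_UNITS * len(thread_a)
print(f"one thread : {sum(alone)}/{total} slots used")
print(f"two threads: {sum(smt)}/{total} slots used")
```

In this sketch the single thread fills 7 of 16 slots, while two threads sharing the same units fill 12 of 16. The number of execution units never changed; the second thread context only gave the core more instructions to choose from, and the slots still left empty are the wasted work described above.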