This answer here postulates that to actually generate a square wave (or any other abstract wave-shape) you have to layer multiple sine waves on top of each other. Yet old hardware (Commodore, NES, etc.) lacked sine wave channels and instead relied heavily on square pulse-waves, triangle waves, noise and sawtooth waves. I always assumed this was done because those waves are easier to generate than a simple sine wave. So, would generating these wave shapes not be computationally more expensive? Why was it done anyway?
This answer here postulates that to actually generate a square wave […] you have to layer multiple sine waves on top of each other.
Not really. It just describes how a square wave can be analyzed to prove certain facts about its sound, such as how much energy is in each frequency band. This is somewhat similar to how every integer greater than 1 can be factored into primes (15 = 3×5), which is useful when analyzing algorithms but still doesn't change how we came up with the original number (maybe by counting 15 sheep).
Separating a "complex" wave into sinusoidal components is very useful mathematically, but it does not tell us the mechanism behind its original creation.
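To make the distinction concrete, here is a minimal Python sketch (all function names are made up for illustration). One function builds a square wave by simply flipping the output, the other approximates the same wave by summing odd sine harmonics, which is the standard Fourier series of a square wave. The two agree better and better as harmonics are added, but only the first resembles what a cheap sound chip would do.

```python
import math

def toggled_square(n_samples, period):
    """Square wave made the 'hardware' way: flip the output every half period."""
    half = period // 2
    return [1.0 if (i // half) % 2 == 0 else -1.0 for i in range(n_samples)]

def fourier_square(n_samples, period, n_harmonics):
    """The same wave approximated as (4/pi) * sum of sin(k*x)/k over odd k."""
    out = []
    for i in range(n_samples):
        # Sample in the middle of each step so we never land exactly on an edge.
        x = 2 * math.pi * (i + 0.5) / period
        s = sum(math.sin(k * x) / k for k in range(1, 2 * n_harmonics, 2))
        out.append(4 / math.pi * s)
    return out

def rms_error(a, b):
    """Root-mean-square difference between two equally long sample lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

if __name__ == "__main__":
    target = toggled_square(64, 64)
    for n in (1, 5, 50):
        print(n, "harmonics, RMS error:", rms_error(target, fourier_square(64, 64, n)))
```

The analysis direction (many sines) needs dozens of multiplications and sine evaluations per sample; the synthesis direction (toggle) needs one comparison.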
I always assumed this was done because those waves are easier to generate than a simple sine wave.
Your assumption here is correct. Starting with a digital circuit, the square wave is the easiest and cheapest waveform to create (see footnote 1). Just turn a voltage on and off using a single transistor. It is also cheaper in a mass-market manufacturing context because a sine wave generator (and even a saw-tooth) made from analog electronics will require a lot of extra components in order to not drift with temperature, age, and humidity.
It is also arguably more useful in a synthesizer context than one single sine wave because it has a lot of harmonics you can modify with a filter like in the SID.
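As a rough software model of those two points, the sketch below toggles a one-bit output with a countdown timer and then runs the result through a one-pole low-pass filter. The filter is only a stand-in chosen for brevity; it is not the SID's actual filter design, and the names `square_wave`, `one_pole_lowpass` and `alpha` are assumptions for this example.

```python
def square_wave(n_samples, half_period):
    """Toggle a 1-bit output every `half_period` clock ticks."""
    level = 0
    countdown = half_period
    out = []
    for _ in range(n_samples):
        out.append(level)
        countdown -= 1
        if countdown == 0:
            level ^= 1              # flip the output bit
            countdown = half_period  # reload the timer
    return out

def one_pole_lowpass(samples, alpha):
    """y[n] = y[n-1] + alpha * (x[n] - y[n-1]), with 0 < alpha <= 1.

    A smaller alpha removes more of the upper harmonics and rounds the edges.
    """
    y = 0.0
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

if __name__ == "__main__":
    raw = square_wave(32, 8)
    print(raw)
    print([round(v, 2) for v in one_pole_lowpass(raw, 0.25)])
```

A pure sine has no harmonics for a filter to remove, so filtering it only changes its volume, which is why a harmonically rich source is the more interesting starting point.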
The next step on the complexity ladder is any ramp-shape, like the triangle or saw-tooth. While you can make these using analog electronics, even back in the early eighties they were typically implemented by a simple DAC driven by a digital counter. The rate of the counter determined how fast the waveform went from 0 to MAX and thus determined the pitch.
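The counter-plus-DAC idea can be sketched in a few lines of Python. This is a generic illustration rather than the circuit of any particular chip; `counter_ramps`, `step` and `bits` are names invented here. The counter value itself is the saw-tooth, and folding its upper half back down gives a triangle.

```python
def counter_ramps(n_samples, step, bits=8):
    """Free-running counter; returns (sawtooth, triangle) lists of DAC codes."""
    mask = (1 << bits) - 1       # counter wraps at 2**bits
    top = 1 << (bits - 1)        # most significant bit of the counter
    counter = 0
    saw, tri = [], []
    for _ in range(n_samples):
        saw.append(counter)      # the raw counter is already a rising ramp
        # When the top bit is set, invert the bits so the ramp runs downward.
        folded = counter ^ mask if counter & top else counter
        tri.append((folded << 1) & mask)  # rescale to the full DAC range
        counter = (counter + step) & mask
    return saw, tri

if __name__ == "__main__":
    saw, tri = counter_ramps(8, 1, bits=3)
    print(saw)
    print(tri)
```

Doubling `step` makes the counter wrap twice as often, which raises the pitch by an octave without any multiplication or table.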
Once you have your DAC in your computer, you could use it to generate a sine wave, but it requires either impossibly expensive real-time calculations or a large table of pre-calculated sine values, so it was rarely (never?) done. When computers got some useful amount of RAM and bandwidth, they quickly switched to plain arbitrary samples and never looked back.
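For comparison, a table-driven sine might look like the sketch below. The 256-entry table size, the 16-bit phase counter and all names are assumptions picked for the example. The playback loop itself is cheap; the cost is the memory for the table, which is the expensive part on a machine with only a few kilobytes of ROM or RAM.

```python
import math

TABLE_BITS = 8
# Pre-calculated once; on real hardware this would live in ROM.
SINE_TABLE = [round(127 * math.sin(2 * math.pi * i / (1 << TABLE_BITS)))
              for i in range(1 << TABLE_BITS)]

def sine_from_table(n_samples, step, phase_bits=16):
    """Step a phase counter and use its top bits as an index into the table."""
    mask = (1 << phase_bits) - 1
    shift = phase_bits - TABLE_BITS
    phase = 0
    out = []
    for _ in range(n_samples):
        out.append(SINE_TABLE[phase >> shift])
        phase = (phase + step) & mask   # a larger step gives a higher pitch
    return out

if __name__ == "__main__":
    print(sine_from_table(16, 1 << 12))
```

Once the table can hold arbitrary values, nothing ties it to a sine any more, which is essentially the step to sample playback.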
1) In fact, anything else is so much more complicated that today we just do everything using simple digital pulses and filter the result in various ways (PDM, PWM, delta-sigma).
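As a sketch of the footnote's point, a first-order pulse-density modulator can be modelled as an accumulator that emits a 1 whenever it overflows. This is a simplified illustration with invented names, assuming input levels between 0 and 1; real converters use higher-order loops and much higher clock rates.

```python
def pulse_density(samples):
    """Turn levels in [0, 1] into a stream of 0/1 pulses.

    The running sum overflows more often for larger inputs, so the density
    of 1s tracks the input level; a low-pass filter recovers the level.
    """
    acc = 0.0
    bits = []
    for x in samples:
        acc += x
        if acc >= 1.0:
            bits.append(1)
            acc -= 1.0
        else:
            bits.append(0)
    return bits

if __name__ == "__main__":
    print(pulse_density([0.25] * 8))
    print(pulse_density([0.5] * 8))
```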