I was wondering what the reason is behind branding an MCU as 32 bit or 64 bit. In simple architectures such as Harvard or von Neumann, it used to be the width of the data bus. But in the market I have seen MCUs which have 64 bit data lines and yet are marketed as 32 bit MCUs. Can somebody explain?
It is not true that the bit width of a processor was defined by the data bus width. The Intel 8088 (used in the original IBM PC) was a 16 bit device with an 8 bit data bus, and the Motorola 68008 (Sinclair QL) was a 32 bit device with an 8 bit bus.
It is primarily defined by the nature of the instruction set (width of operands) and the register width (necessarily the same).
When most devices had matching bus and instruction/register widths (i.e. prior to about 1980), there was no need for a distinction, and it was of little consequence whether the term referred to the bus width or to the register/instruction width. When narrow-bus versions of wide instruction/register devices were introduced, they presented a marketing dilemma. The QL was widely advertised as having a 32 bit processor despite its 8 bit bus, while the 8088 was sometimes referred to as an 8/16 bit part. The 68008 could trivially perform 32 bit operations in a single instruction; the fact that it took 4 bus cycles to get the operand was transparent to software, and the total number of instruction and data fetch cycles was still far fewer than an 8 bit processor would need to perform the same 32 bit operation.
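To make the 68008 point concrete, here is a rough sketch in Python. The function names are made up for illustration, and the model is deliberately simplified (it ignores instruction fetches, wait states and addressing modes). It shows the two separate costs: how many bus transfers one operand needs, and how many ALU steps an 8 bit register set needs for a 32 bit add.

```python
def bus_cycles(operand_bits, bus_bits):
    """Bus transfers needed to move one operand across the data bus."""
    return -(-operand_bits // bus_bits)  # ceiling division


def add32_with_8bit_alu(a, b):
    """Add two 32 bit values one byte at a time, the way software on an
    8 bit register set has to. Returns (result, number_of_add_steps)."""
    result, carry, steps = 0, 0, 0
    for i in range(4):  # four bytes, least significant first
        byte_sum = ((a >> (8 * i)) & 0xFF) + ((b >> (8 * i)) & 0xFF) + carry
        result |= (byte_sum & 0xFF) << (8 * i)
        carry = byte_sum >> 8  # carry into the next byte
        steps += 1
    return result, steps  # the final carry out is dropped (wraps at 32 bits)
```

With this model, `bus_cycles(32, 8)` gives 4 for a 68008 style part and `bus_cycles(16, 8)` gives 2 for an 8088 style part. In both cases the software still sees a single wide operation, whereas the 8 bit version has to spell out every add-with-carry step itself, each with its own instruction fetch.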
Another interesting architecture in this context is ARM architecture v4T, which supports a 16 bit instruction set known as "Thumb" in addition to the 32 bit ARM instruction set. In Thumb mode the instructions are 16 bits wide, although the registers and data operations remain 32 bit (most Thumb instructions can only reach registers r0 to r7). This gives higher code density than ARM mode. Where an external memory interface is used, most ARM v4 parts support either a 16 or a 32 bit external bus. Either ARM or Thumb may be used with either, but when a 16 bit bus is implemented, Thumb mode generally runs more efficiently than the 32 bit instruction set because each instruction is fetched in a single bus cycle.
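As a back-of-envelope illustration of the fetch cost, the sketch below counts bus cycles for a straight-line instruction stream. The function name and the instruction count of 100 are arbitrary choices for the example. The model is idealised: no cache, no wait states, no branches, and it compares equal instruction counts even though Thumb code typically needs somewhat more instructions than ARM code for the same work.

```python
ARM_BITS = 32    # width of an ARM mode instruction
THUMB_BITS = 16  # width of a Thumb mode instruction


def fetch_cycles(instruction_count, instruction_bits, bus_bits):
    """Bus cycles to fetch a sequential run of instructions
    (idealised: no cache, no wait states, no branches)."""
    total_bits = instruction_count * instruction_bits
    return -(-total_bits // bus_bits)  # ceiling division


for bus_bits in (16, 32):
    arm = fetch_cycles(100, ARM_BITS, bus_bits)
    thumb = fetch_cycles(100, THUMB_BITS, bus_bits)
    print(f"{bus_bits} bit bus: ARM {arm} cycles, Thumb {thumb} cycles")
```

On the 16 bit bus every ARM instruction costs two fetch cycles against one for Thumb, which is the effect described above.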
Given the increasing variety of architectures, instruction/register sets and bus widths, it now makes sense to characterise an architecture by its instruction/register set.