Why doesn't the compiler simply move data from edi to eax in unoptimized debug builds?

I was looking at the assembly generated for the following function:

int square(int num) {
    return num;
}

This is the assembly generated for the function above:

square:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     eax, DWORD PTR [rbp-4]
        pop     rbp
        ret

My question is: can I rewrite it as follows and expect it to work correctly?

square:
        push    rbp
        mov     rbp, rsp
        mov     eax, edi
        pop     rbp
        ret

Why does it first move edi into [rbp-4] and then move [rbp-4] into eax? Since both the source and the destination are registers, I think a single mov instruction would do.

Edit:

My question is about the unoptimized version of the code. I know that if I compile with -O1 the compiler emits much shorter, more succinct code. What I am asking about is the purpose of that extra mov instruction.

Solution

Without optimization, the compiler generates code that follows a literal reading of the language specification. In C, a function's parameters are essentially local variables, much as if you had declared them inside the function. So this C code:

int square(int num)
{
    …
}

is, in the semantics of the C standard, much like:

int square(?)
{
    int num = value that the caller passed as the first argument;
    …
}
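To make the "parameters are local variables" point concrete, here is a small C sketch. The function names are hypothetical, chosen only for illustration. Because num is an object with its own storage, you can take its address and modify it without touching the caller's variable; the second function spells out the explicit-local form of the model above.

```c
/* num behaves like an ordinary local object: it has an address and
   changing it does not affect the caller's variable. */
int bump_and_return(int num)
{
    int *p = &num;   /* legal: num is an lvalue with its own storage */
    *p += 1;         /* modifies only this function's copy */
    return num;
}

/* Hypothetical rewrite of the same function that makes the model explicit:
   a local initialized from the value the caller passed. */
int bump_and_return_explicit(int arg)
{
    int num = arg;   /* the "int num = value the caller passed" step */
    int *p = &num;
    *p += 1;
    return num;
}
```

For example, calling bump_and_return(x) with x equal to 41 returns 42 while x itself stays 41. Since a parameter can have its address taken, the unoptimized compiler simply gives every parameter a memory slot, just as it does for any other local.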

Without optimization, the compiler follows this model literally. When the function starts, it sets up a stack frame and stores the arguments, which arrive in registers (on x86-64 System V, the first integer argument is passed in edi), into the parameters, which live on the stack.

Then return num; loads the parameter back from the stack into eax, the register that holds an int return value.
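The store-then-reload sequence can be mimicked in C as a rough illustration. The names square_debug_model and square_optimized_model are hypothetical. volatile is used only to force the compiler to perform a real store and a real load. The comments map each line to the matching instruction in the listing from the question, but the exact instructions a given compiler emits for this C code may differ.

```c
/* Models the -O0 code: copy the incoming argument into a memory slot,
   then reload it to produce the return value. */
int square_debug_model(int edi)
{
    volatile int slot;   /* stands in for DWORD PTR [rbp-4] */
    slot = edi;          /* mov DWORD PTR [rbp-4], edi */
    return slot;         /* mov eax, DWORD PTR [rbp-4] */
}

/* Models the hand-edited version: return the argument directly. */
int square_optimized_model(int edi)
{
    return edi;          /* mov eax, edi */
}
```

Both functions return the same value for every int input. The extra memory round trip changes the instruction sequence, not the result.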

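With optimization enabled, the compiler removes this unnecessary use of the stack. GCC at -O1, for example, reduces the function to mov eax, edi followed by ret. Your hand-edited version is therefore equivalent. It is simply not what the unoptimized code generator produces.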