And a related question: What would stack traces and similar debugging features look like in it?
And please excuse me if this is a stupid question, but I don't know much about low-level programming. I know most CPUs have instructions related to the stack, but would a properly optimized stackless language really be that much slower?
Short answer: if your problem requires a stack, the one built into the language/hardware will probably be a good deal faster than one you could write.
Let's think about what a "stackless language" would be.
The original Fortran language had no concept of a stack. All you could write was one big MAIN program. Then it was found incredibly useful to be able to write subroutines and call them, so that was added, along with functions that return values. However, as I discovered personally, if you had MAIN call a subroutine A, and A called B, and then somehow B found itself calling A again, guess what? The machine would "hang in a return loop", because when A tried to return to MAIN it would instead return to B, which would return to A, and so on. Each subroutine had no way to remember more than one place to return to.
So for some problems, in that Fortran, you couldn't solve the problem without writing your own stack. That is, you would have an array, and an integer variable keeping track of what to do next by indexing into that array, and you would end up doing things that, later, came to be called "push" and "pop".
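To make the array-plus-index idea concrete, here is a minimal sketch in Python rather than Fortran. The names `CAPACITY`, `stack`, `top`, `push`, and `pop` are my own choices for illustration, and the fixed capacity is an assumption meant to mimic a statically sized array.

```python
# A hand-rolled stack: a fixed-size array plus an integer index.
# CAPACITY is a hypothetical limit, standing in for a statically sized array.
CAPACITY = 100
stack = [0] * CAPACITY
top = 0  # index of the next free slot; 0 means the stack is empty


def push(value):
    """Store value in the next free slot and advance the index."""
    global top
    if top >= CAPACITY:
        raise OverflowError("stack full")
    stack[top] = value
    top += 1


def pop():
    """Step the index back and return the value stored there."""
    global top
    if top == 0:
        raise IndexError("stack empty")
    top -= 1
    return stack[top]


push(10)
push(20)
push(30)
popped = [pop(), pop(), pop()]  # values come back in reverse order
```

The last value pushed is the first one popped, which is exactly the property needed to remember "where to go back to" across nested calls.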
This was found to be so useful that it was built into later languages. There are various ways to do it. Before machines started having a built-in stack, languages like PL/1 would effectively create a stack in the form of a linked list of activation records being constantly allocated and deleted. (Not very efficient, but it worked.)
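A linked list of activation records might look roughly like the following Python sketch. This is not how any particular PL/1 implementation laid things out; the class `ActivationRecord` and the helpers `enter`, `leave`, and `active_chain` are hypothetical names chosen for illustration.

```python
class ActivationRecord:
    """One record per active call, allocated on entry (hypothetical layout)."""

    def __init__(self, routine, caller):
        self.routine = routine  # name of the routine this record belongs to
        self.caller = caller    # link to the caller's record, None for the outermost
        self.local_vars = {}    # per-call storage for local variables


current = None  # record of the routine that is running right now


def enter(routine):
    """Allocate a new record and link it to the caller's record."""
    global current
    current = ActivationRecord(routine, current)


def leave():
    """Return: drop the current record and resume the caller's."""
    global current
    current = current.caller


def active_chain():
    """Follow the caller links from the current record outward."""
    names = []
    record = current
    while record is not None:
        names.append(record.routine)
        record = record.caller
    return names


# MAIN calls A, A calls B, B calls A again: each call gets its own record.
enter("MAIN")
enter("A")
enter("B")
enter("A")
chain_during = active_chain()
leave()  # the inner A returns to B, not to MAIN
chain_after = active_chain()
```

Because every call of A gets its own record with its own link back, the second A returns to B and the first A still remembers MAIN.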
So if your language has no stack, and you try to solve certain problems, you will have to chisel your own stack out of the living language, because the problem simply requires it. An example of such a problem is depth-first tree walking.
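As an illustration of chiseling your own stack, here is a sketch of a depth-first tree walk in Python that never calls itself. The tree representation (a tuple of a value and a list of children) and the function name `depth_first` are assumptions made for this example.

```python
def depth_first(root):
    """Visit every node depth-first using an explicit stack, no recursion.

    Each node is assumed to be a (value, children) tuple.
    """
    visited = []
    stack = [root]  # nodes we still have to come back to
    while stack:
        value, children = stack.pop()
        visited.append(value)
        # Push children in reverse so the leftmost child is popped first.
        for child in reversed(children):
            stack.append(child)
    return visited


tree = ("A", [
    ("B", [("D", []), ("E", [])]),
    ("C", [("F", [])]),
])
order = depth_first(tree)
```

The list `stack` does the job the call stack would do in a recursive version: it remembers which subtrees are still waiting while the walk goes deeper into another one.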
So if you can do that, is the language "stackless" or not?