I was following nesdev.org/6502_cpu.txt to implement my own NES emulator. It should perform reads and writes on the correct cycle of each instruction.
After looking at all the other instructions, I stumbled onto this wording:
RTS
 #  address R/W description
--- ------- --- -----------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  read next instruction byte (and throw it away)
 3  $0100,S  R  increment S
 4  $0100,S  R  pull PCL from stack, increment S
 5  $0100,S  R  pull PCH from stack
 6    PC     R  increment PC
The wording in question is "increment PC" in cycle 6. Does this mean the PC should only be incremented, or (as in cycle 2) should I read from PC before incrementing it and just throw the value away? The same goes for "increment S" in cycle 3.
Why I think I should read: the R/W column says R, and PC or S is given as the address.
Why I think I should not read: where a read is thrown away, the description says so explicitly, as in cycle 2:
 #  address R/W description
--- ------- --- -----------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  read next instruction byte (and throw it away)
Cycle 2: Yes, read the next instruction byte and throw it away
Cycle 3: Yes, read from the stack and throw it away
Cycle 6: Yes, read from the address pointed to by the PC (before incrementing) and throw it away
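A minimal Python sketch of those three answers, assuming a flat 64 KiB memory list and a hypothetical `Cpu` class that holds only PC and S (simplifications for illustration, not how 6502_cpu.txt models anything). Every cycle calls `read`, and the log makes the dummy reads in cycles 2, 3 and 6 visible.

```python
class Cpu:
    """Hypothetical minimal CPU: only PC, S and a log of bus accesses."""

    def __init__(self, mem, pc, s):
        self.mem = mem      # 65536-entry list of byte values
        self.pc = pc        # 16-bit program counter
        self.s = s          # 8-bit stack pointer
        self.log = []       # one ("R", address) entry per cycle

    def read(self, addr):
        self.log.append(("R", addr))
        return self.mem[addr]

    def rts(self):
        # Cycle 1: fetch opcode, increment PC.
        self.read(self.pc)
        self.pc = (self.pc + 1) & 0xFFFF
        # Cycle 2: read the next instruction byte and throw it away.
        self.read(self.pc)
        # Cycle 3: dummy read from the stack while S is incremented.
        self.read(0x0100 | self.s)
        self.s = (self.s + 1) & 0xFF
        # Cycle 4: pull PCL, increment S.
        pcl = self.read(0x0100 | self.s)
        self.s = (self.s + 1) & 0xFF
        # Cycle 5: pull PCH.
        pch = self.read(0x0100 | self.s)
        self.pc = (pch << 8) | pcl
        # Cycle 6: dummy read at the pulled address, then increment PC.
        self.read(self.pc)
        self.pc = (self.pc + 1) & 0xFFFF


mem = [0] * 0x10000
mem[0x9000] = 0x60                  # RTS opcode
mem[0x01FE] = 0x02                  # return address $8002 on the stack (PCL)
mem[0x01FF] = 0x80                  # (PCH)
cpu = Cpu(mem, pc=0x9000, s=0xFD)
cpu.rts()
for cycle, (kind, addr) in enumerate(cpu.log, start=1):
    print(cycle, kind, f"${addr:04X}")
```

All six cycles appear in the log as reads, and the CPU ends with PC = $8003 and S = $FF.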
"These cycle instructions would be the only two without a read or a write."
They would be, except that the 6502 performs a read or a write on every cycle.
During one half of every cycle, the memory bus is always enabled. Effectively, the clock drives the bus enable pin (not exactly true: a signal derived from the clock does this, but with subtly different timing to ensure that the address bus, and the data bus for writes, is stable). This means that the 6502 will read from some address even when it doesn't need to read memory.
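One way to make the "every cycle is a read or a write" rule hard to break in an emulator is to have each instruction yield exactly one bus access per cycle. The Python sketch below assumes a hypothetical generator-based design (`run` and `clc` are illustrative names, not from any existing emulator) and shows an implied-mode instruction, CLC, whose second cycle still reads the byte after the opcode and discards it.

```python
def clc(cpu):
    """CLC (implied mode): 2 cycles, each yields one (kind, addr, data) access."""
    # Cycle 1: fetch opcode, increment PC.
    yield ("R", cpu["pc"], None)
    cpu["pc"] = (cpu["pc"] + 1) & 0xFFFF
    # Cycle 2: nothing useful to fetch, but the bus is still active: read the
    # next byte and ignore the value. PC is not incremented, so the same byte
    # is fetched again as the next opcode.
    yield ("R", cpu["pc"], None)
    cpu["c"] = 0


def run(instr, cpu, mem):
    """Drive one instruction to completion; every yielded access costs one cycle."""
    log = []
    gen = instr(cpu)
    data_in = None
    while True:
        try:
            kind, addr, data_out = gen.send(data_in)
        except StopIteration:
            return log
        log.append((kind, addr))
        if kind == "R":
            data_in = mem[addr]   # a real bus would dispatch to RAM/ROM/I/O here
        else:
            mem[addr] = data_out
            data_in = None


mem = [0] * 0x10000
mem[0x8000] = 0x18              # CLC opcode
cpu = {"pc": 0x8000, "c": 1}
print(run(clc, cpu, mem), hex(cpu["pc"]), cpu["c"])
```

Because an instruction cannot finish a cycle without yielding an access, a missing dummy read shows up as a wrong cycle count rather than passing silently.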
So, in cycle 2 of RTS it really does read the address pointed to by the incremented PC and then throw away the result. In opcodes with operands, the byte read would not be thrown away.
In cycle 3, it needs to increment the stack pointer to get at the low byte of the return address. Because it uses the same ALU as everything else, this takes a whole cycle. A read still has to happen in the meantime and, since the CPU has already put the stack address ($0100 + S) on the address bus, it reads from the stack unnecessarily.
In cycle 6, yes, it does read from the PC before incrementing it, and the byte it reads is thrown away. The increment is needed because, at the point where JSR pushes the PC onto the stack, it hasn't yet read the last byte of its operand, so the address on the stack is not that of the next opcode but that of the high byte of the address to which JSR jumps.
JSR
 #  address R/W description
--- ------- --- -------------------------------------------------
 1    PC     R  fetch opcode, increment PC
 2    PC     R  fetch low address byte, increment PC
 3  $0100,S  R  internal operation (predecrement S?)
 4  $0100,S  W  push PCH on stack, decrement S
 5  $0100,S  W  push PCL on stack, decrement S
 6    PC     R  copy low address byte to PCL, fetch high address
                byte to PCH
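The JSR table above as a Python sketch, under simplifying assumptions (flat 64 KiB memory, a hypothetical `Cpu` class with only PC, S and an access log). With JSR $9000 at $8000, the CPU pushes $8002, the address of the operand's high byte, rather than $8003; the increment in cycle 6 of RTS is what turns it back into the address of the next opcode.

```python
class Cpu:
    """Hypothetical minimal CPU: only PC, S and a log of bus accesses."""

    def __init__(self, mem, pc, s):
        self.mem = mem
        self.pc = pc
        self.s = s
        self.log = []       # one ("R" | "W", address) entry per cycle

    def read(self, addr):
        self.log.append(("R", addr))
        return self.mem[addr]

    def write(self, addr, value):
        self.log.append(("W", addr))
        self.mem[addr] = value

    def jsr(self):
        # Cycle 1: fetch opcode, increment PC.
        self.read(self.pc)
        self.pc = (self.pc + 1) & 0xFFFF
        # Cycle 2: fetch low address byte, increment PC.
        low = self.read(self.pc)
        self.pc = (self.pc + 1) & 0xFFFF
        # Cycle 3: internal operation; the bus still reads the stack.
        self.read(0x0100 | self.s)
        # Cycle 4: push PCH, decrement S.
        self.write(0x0100 | self.s, self.pc >> 8)
        self.s = (self.s - 1) & 0xFF
        # Cycle 5: push PCL, decrement S.
        self.write(0x0100 | self.s, self.pc & 0xFF)
        self.s = (self.s - 1) & 0xFF
        # Cycle 6: PC still points at the high address byte: fetch it and jump.
        high = self.read(self.pc)
        self.pc = (high << 8) | low


mem = [0] * 0x10000
mem[0x8000:0x8003] = [0x20, 0x00, 0x90]   # JSR $9000
cpu = Cpu(mem, pc=0x8000, s=0xFF)
cpu.jsr()
print(f"PC=${cpu.pc:04X} S=${cpu.s:02X} pushed=${mem[0x01FF]:02X}{mem[0x01FE]:02X}")
```

This prints `PC=$9000 S=$FD pushed=$8002`.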
Does any of this matter? It depends. In an emulator, it makes no difference whether you emulate the unnecessary reads of ordinary RAM or ROM, but some I/O chips change state when their registers are read. For example, reading a register might clear an interrupt flag or start or stop a timer.
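A sketch of why such reads can matter, assuming a hypothetical bus with one read-sensitive register at $2002 whose bit 7 is cleared by any read (loosely modelled on the NES PPU status register; everything else is treated as mirrored 2 KiB RAM for brevity). A discarded read still triggers the side effect, so a later "real" read sees a different value.

```python
class Bus:
    STATUS = 0x2002              # hypothetical read-sensitive register

    def __init__(self):
        self.ram = [0] * 0x800   # 2 KiB RAM, mirrored (simplification)
        self.status = 0x80       # bit 7 set: some pending event flag

    def read(self, addr):
        if addr == self.STATUS:
            value = self.status
            self.status &= 0x7F  # side effect: any read clears bit 7
            return value
        return self.ram[addr & 0x7FF]


def status_seen(dummy_addresses):
    """Return what a program's real read of STATUS sees after some dummy reads."""
    bus = Bus()
    for addr in dummy_addresses:
        bus.read(addr)           # value thrown away, side effect kept
    return bus.read(Bus.STATUS)


print(hex(status_seen([])))              # no dummy read: flag still set
print(hex(status_seen([Bus.STATUS])))    # a dummy read hit STATUS first: flag lost
```

An emulator that skips the dummy read would report the flag as still set in the second case.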
The indexed addressing modes require an extra cycle when a page boundary is crossed. The index register is added to the low byte of the base address; if that addition carries, the high byte of the address also has to be incremented, which takes an extra cycle because it needs the ALU. While it's doing that, you get a spurious read of the equivalent address on the lower page, because the high byte has not been fixed yet. Read instructions only take this extra cycle when a page is actually crossed; stores and read-modify-write instructions always take it.
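A Python sketch of the addresses put on the bus during the effective-address cycles of absolute,X and absolute,Y, following the cycle tables in 6502_cpu.txt: the first read uses the unfixed high byte, a read instruction only re-reads when the low-byte addition carried, and a store always performs the dummy read before writing. The function name and its return format are hypothetical.

```python
def indexed_accesses(base, index, is_store=False):
    """Bus accesses for the effective-address cycles of abs,X / abs,Y.

    Returns a list of ("R" | "W", address) tuples, one per cycle.
    """
    low_sum = (base & 0xFF) + index
    # Address as first formed: low byte already indexed, high byte not yet fixed.
    unfixed = (base & 0xFF00) | (low_sum & 0xFF)
    effective = (base + index) & 0xFFFF
    if is_store:
        # Stores always spend the extra cycle: dummy read, then the write.
        return [("R", unfixed), ("W", effective)]
    if low_sum > 0xFF:
        # Page crossed: the first read hit the lower page and is discarded.
        return [("R", unfixed), ("R", effective)]
    # No carry: the first read was already at the right address.
    return [("R", effective)]


for base, x in [(0x12F0, 0x20), (0x1200, 0x20)]:
    accesses = indexed_accesses(base, x)
    print(f"${base:04X},X with X=${x:02X}:", [(k, f"${a:04X}") for k, a in accesses])
```

With base $12F0 and X = $20, the spurious read lands on $1210 before the real read of $1310.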