assembly addressing-mode 6502 atari-2600

Efficient multiple indirection in 6502 code

Issue

I'm looking at a 6502 program that has multiple arrays of bytes of varying lengths (sound effect data, one array per voice). Currently the code explicitly processes the first array (if queued), then the second, and so on. Each voice also has its own set of variables for volume, delay etc., so the code is written against these hard-coded labels.

I'd like to roll this into a loop, indexing into these additional variables and the sound effect data. Indexing into the variables is fairly straightforward, using indexed addressing, but indexing into the sound effect data involves a lot more work, and I'm wondering if I'm missing something in the application of indexed indirect and indirect indexed addressing.

Below is a self-contained example of what I'm doing at the moment. The part I'd like to tighten up, if possible, is the code in LoadFromTable, ideally with some use of both X and Y addressing:

  .equ  Ptr0,  0x80
  .equ  Ptr1,  0x81

  .org  0xFE00

  .org  0x0000

Init:
  LDX #0xFF
  TXS

Main:
  LDX #0x00
  LDY #0x00
  JSR LoadFromTable
  ; A should be 'H',  0x48

  LDX #0x01
  LDY #0x00
  JSR LoadFromTable
  ; A should be 'B',  0x42

  LDX #0x02
  LDY #0x02
  JSR LoadFromTable
  ; A should be 'A',  0x41

  JMP Main

LoadFromTable:
  TXA           ; Double outer index to account for 16-bit pointers
  ASL           ;   "
  TAX           ;   "
  LDA Table,X   ; Load the low byte of the array's address into the pointer
  STA Ptr0      ;   "
  INX           ; Load the high byte of the array's address into the pointer
  LDA Table,X   ;   "
  STA Ptr1      ;   "
  LDA (Ptr0),Y  ; Load the character at the inner index in the array
  RTS

  .org  0x0040

Table:
  .word Item0
  .word Item1
  .word Item2

  .org  0x0080

Item0:
  .byte 'H', 'E', 'L', 'L', 'O', 0x00

Item1:
  .byte 'B', 'O', 'N', 'J', 'O', 'U', 'R', 0x00

Item2:
  .byte 'C', 'I', 'A', 'O', 0x00

  .org  0x00FA

  .word Init
  .word Init
  .word Init

Implementation

Taking on board the split table idea from @NickWestgate and hoisting out the initial pointer calculation as noted by @Michael, I've moved from something like this:

PROCESS_MUSIC:
  ; ...
  BNE   MusDoB

MusChanA:
  ; ...
  LDA   MUSICA,X
  BNE   MusCmdToneA
  ; ...
  JMP   MusChanA

MusCmdToneA:
  ; ...
  BNE   MusNoteA
  ; ...

MusNoteA:
  ; ...
  LDA   MUSICA,X
  ; ...

MusDoB:
  ; ...
  BNE   MusDoDone

MusChanB:
  ; ...
  LDA   MUSICB,X
  BNE   MusCmdToneB
  ; ...
  JMP   MusChanB

MusCmdToneB:
  ; ...
  BNE   MusNoteB
  ; ...

MusNoteB:
  ; ...

MusDoDone:
  RTS

to this more generalised subroutine:

PROCESS_MUSIC:
  LDX #0x01

PerChannel:
  ; ...
  BNE EndPerChannel
  LDA MusicTableL,X
  STA tmp0
  LDA MusicTableH,X
  STA tmp1

MusChan:
  ; ...
  LDA (tmp0),Y
  BNE MusCmdTone
  ; ...
  BEQ MusChan

MusCmdTone:
  ; ...
  BNE MusNote
  ; ...

MusNote:
  ; ...
  LDA (tmp0),Y
  ; ...

EndPerChannel:
  DEX 
  BPL PerChannel
  RTS

with the addition of the following tables:

MusicTableL:
  .byte <MUSICA
  .byte <MUSICB

MusicTableH:
  .byte >MUSICA
  .byte >MUSICB

This removes the need for the LoadFromTable function I'd originally been using, and seems much cleaner overall.

Solution

Here are a few ideas. One is to pass in an index that's already doubled, if you can arrange that (the doubled value might already be in the accumulator at some earlier stage).
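
As a rough sketch of the pre-doubled variant, assuming the caller can supply 2 * index in X (the label LoadFromTable2X is hypothetical, not from the original code), the TXA/ASL/TAX sequence simply drops out, saving 3 bytes and 6 cycles per call:

LoadFromTable2X:
  LDA Table,X   ; X = 2 * array index; low byte of the array's address
  STA Ptr0      ;   "
  INX           ; Step to the high byte of the array's address
  LDA Table,X   ;   "
  STA Ptr1      ;   "
  LDA (Ptr0),Y  ; Load the byte at the inner index
  RTS

A caller would then select Item1 with LDX #0x02 rather than LDX #0x01.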

Another is splitting up the address tables:

LoadFromTable:
  LDA TableL,X  ; Load the low byte of the array's address into the pointer
  STA Ptr0      ;   "
  LDA TableH,X  ; Load the high byte of the array's address into the pointer
  STA Ptr1      ;   "
  LDA (Ptr0),Y  ; Load the character at the inner index in the array
  RTS

TableL:
  .byte <Item0
  .byte <Item1
  .byte <Item2

TableH:
  .byte >Item0
  .byte >Item1
  .byte >Item2

If you can't split up the tables, you can probably still get rid of the INX by doing:

  LDA Table,X   ; Load the low byte of the array's address into the pointer
  STA Ptr0      ;   "
  LDA Table+1,X ; Load the high byte of the array's address into the pointer
  STA Ptr1      ;   "

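Self-modifying code might be useful, although the patched instruction has to live in RAM, and on the Atari 2600 the only RAM is 128 bytes on page zero, so living on page zero will be a factor: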

  LDA Table,X   ; Patch the low byte of the array's address into the operand
  STA Load+1    ;   "
  LDA Table+1,X ; Patch the high byte of the array's address into the operand
  STA Load+2    ;   "
Load:
  LDA $FFFF,Y   ; Load the character at the inner index (operand patched above)
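
If the program itself sits in ROM, as on a cartridge, the patched instruction first has to be copied into RAM. A minimal sketch, assuming four free bytes of zero-page RAM at a hypothetical address LoadRam (0x82 here is an arbitrary choice, as are the labels CopyLoad, CopyLoop and LoadRom):

  .equ  LoadRam, 0x82   ; Hypothetical: 4 spare bytes of zero-page RAM

CopyLoad:
  LDX #0x03         ; The template below is 4 bytes long
CopyLoop:
  LDA LoadRom,X     ; Copy the template from ROM ...
  STA LoadRam,X     ;   ... into RAM, last byte first
  DEX
  BPL CopyLoop
  RTS

LoadRom:
  LDA 0xFFFF,Y      ; Placeholder operand, patched once it is in RAM
  RTS

After CopyLoad has run once, the stores would target LoadRam+1 and LoadRam+2 instead of Load+1 and Load+2, and the load is performed with JSR LoadRam. Whether the JSR/RTS overhead and the RAM cost are worth it on a machine with 128 bytes of RAM would need measuring.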

You could also see whether adding Y to the pointer as you store it saves any cycles. It might depend on the most common path (i.e. whether the addition usually carries into the high byte and needs an INC Ptr1/Load+2).
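
One way this could look, as a sketch only: it assumes X already holds the doubled array index and uses the hypothetical labels LoadFromTableAddY and NoCarry. Whether it actually saves cycles depends on how the pointer is used afterwards:

LoadFromTableAddY:
  LDA Table+1,X ; High byte of the array's address
  STA Ptr1      ;   "
  TYA           ; A = inner index
  CLC
  ADC Table,X   ; A = low byte of the array's address + inner index
  STA Ptr0      ;   "
  BCC NoCarry   ; Common path: the sum stayed within the page
  INC Ptr1      ; Rare path: carry into the high byte
NoCarry:
  LDY #0x00     ; The offset is now folded into the pointer
  LDA (Ptr0),Y  ; Load the byte the pointer now addresses
  RTS

The common path pays only for a taken branch, and only a carry out of the low byte pays for the INC. Note that Y is clobbered, so the caller has to keep its own copy of the inner index if it still needs it.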