Sunday, July 6, 2008

Time is of the Essence

After reading about Mike Dailly's raster split trouble I decided to write a bit about how the top split is done in Paradroid Redux.



There are three things to do when playing area begins:

  • change background color
  • start character display
  • start sprite display
The first one is just $d021 change, other two require a bit of trickery.

To mask vertically scrolling characters one would usually use illegal graphics mode (Extended Color Mode comined with Bit Map Mode and/or Multi Color Mode) or sprites. The first method produces black pixels so it isn't usable unless background is black, and the second one is unusable if sprites need to cross the split.

So, it's time for some font trickery. Raster interrupt several lines above the split changes to blank font, and then the actual split interrupt changes $D018 to display correct character data. This means "wasting" $0800 bytes for the blank font, but half of that memory is used as temporary buffer elsewhere so actual cost of clean split is one kilobyte.


Clipping sprites cleanly requires trickery as well as there is no way to start sprite display from the middle of graphic data. One way to achieve clipping is clearing top of sprite graphics if it overflows top split, but that would waste time both when clearing the memory and when running extra animator/digit generator rounds to restore top of sprite when more of it gets shown. Another way to achieve clipping is to put sprite at non-visible x-coordinate and then change them at the correct line. However, there is no time to change multiple registers in time.

Guess what? $D018 comes to rescue once again. There is an extra screen where the sprite pointers point to blank sprites. When the time comes, split interrupt doesn't only change font (bits 1-3 of $D018) but also displayed screen (bits 4-7 of $D018). Nice and easy solution but it requires a blank screen, another one kilobyte wasted. No, not really - there is no reason why one half of the blank font couldn't be used for blank screen. What that means is that sprite clipping is practically free as there is no extra memory required and $D018 needs to be written anyway for character blanking.


There is one problem with the screen change though. While VIC-II reads font data and sprite pointers & sprite data every line, character pointers (screen data) are only read on every eighth line. This means that while sprite and font changes are immediate, screen change affects display 0-7 lines later. To overcome this the top line of playing area is copied onto blank screen so character pointers are correct when $D018 is changed.

As blank screen is located inside blank font this copying creates yet another problem. Blank font isn't that blank any more. The topmost line is at SCREEN + $140, which means that chars $28-$2c aren't blank and will produce garbage if they appear on the topmost line. The easiest way to avoid that is to not use those chars inside playing area at all, so that's what is done. The same problem happens because of blank sprite pointers at SCREEN + $3F8, char $7F. That one is unused as well.


And what does all this has to do with timing problems Mike mentioned?

VIC-II doesn't have time to fetch sprite data when CPU is not using memory bus, so it has to stop CPU momentarily whenever sprites are active. Just how many cycles VIC-II steals from CPU depends on which sprites are active, and this causes a timing hell as you have to change registers when C64 is inside the side border area to avoid flicker. With $D021 change you have all the side border (23 cycles) to change it, but $D018 is trickier. Sprite data is fetched very early, the first three sprites get their data read at the end of previous line. This means that if $D018 write is late sprites will stay blank for one extra line.

To get register writes done at the very beginning of side border area the game uses CIA timer to stabilize raster timing. During game init CIA1 timer A is started, running through 63-65 cycles depending on VIC-II version. This means that $DC04 is always synchronized with current display X position. IRQ only needs to read $DC04 and skip that many cycles.


Did I forget bad lines? Every eight line VIC-II needs to read character pointers and that's not possible without stopping CPU for most of the raster line. This means that there is absolutely no time for anything unnecessary. In the case the split happens on a bad line the game triggers raster IRQ two lines above split, prepares next interrupt at the correct line and then preloads $D018/$D021 values and executes two-cycle instructions until IRQ happens. That guarantees minimum interrupt latency. When raster interrupt happens it will just write those two registers, clean up the stack (this second interrupt pushed status register and return address into stack) and jumps into the common code.



Nothing explains code better than source, so here it is. Only relevant parts are shown, and for clarity I've removed all assembly directives which were there to make sure branches don't span page boundaries (which would add one cycle to the branch).

       IRQ at line 95, prepare for split
 
...
 
lda #$10
ora _vScroll
sta $d011
cmp #$16
bne .057
 
; special case for bad line

lda #<Irq_116
sta $fffe
 
lda _d018+1
sta _d018b+1
lda _d021+1
sta _d021b+1
lda #116
bne .x1 ; jmp
 
; normal case
 
.057 lda #<Irq_118
sta $fffe
lda #118
 
.x1 sta $d012
 
...
 
;----------------------------------------------------------------
 
; this one used when ($d011 & 7) = 6, stuffs
; d018/d021 as fast as possible at raster 118
 
subroutine
Irq_116 pha
sty .yr+1
cld
 
lda #<Irq_118b
sta $fffe
lda #>Irq_118b
sta $ffff
lda #118
sta $d012
inc $d019
 
; 118/15 = 7.8 so this one is executed 8 times
 
sbc #15
bcs *-2 ; 8*5-1=39 cycles
 
; preload registers and execute 2-cycle
; instuctions until next IRQ happens
 
_d018b lda #scr_GAME
_d021b ldy #0
cli
repeat 16
cli ; 32 cycles wasted
repend
 
; now is the time to write registers, we always enter
; via interrupt as the above code never runs this far
 
Irq_118b
sta $d018
sty $d021
 
; clean up stack and continue normal IRQ code
 
pla ; flags
pla ; PC lo
pla ; PC hi, always != 0
bne .irq0 ; jmp
 
;----------------------------------------------------------------
 
; normal case, use timer value to stabilize
; raster regardless of sprites over the split
 
Irq_118
pha
sty .yr+1
cld
 
lda $dc04 ; [1,15] ([2,15] if NTSC/Drean)
eor #$0f ; [14,0] ([13,0])
sta .j3+1
.j3 bpl *+2 ; jump into the delay code
 
; entering at offset 0 delays 16 cycles,
; entering at offset 14 delays 2 cycles
;
; OP_CMP_IMM is opcode for CMP #immediate (2 cycles),
; OP_CMP_ZP is opcode for COM $zeropage (3 cycles)
 
cmp #OP_CMP_IMM
cmp #OP_CMP_IMM
cmp #OP_CMP_IMM
cmp #OP_CMP_IMM
cmp #OP_CMP_IMM
cmp #OP_CMP_IMM
cmp #OP_CMP_ZP
nop
 
_d021 ldy #0
_d018 lda #scr_GAME
sta $d018
sty $d021
 
; continue with interrupt
.irq0 ...