Sunday, July 6, 2008

Time is of the Essence

After reading about Mike Dailly's raster split trouble I decided to write a bit about how the top split is done in Paradroid Redux.



There are three things to do when the playing area begins:

  • change background color
  • start character display
  • start sprite display
The first one is just a $D021 change; the other two require a bit of trickery.

To mask vertically scrolling characters one would usually use an illegal graphics mode (Extended Color Mode combined with Bit Map Mode and/or Multi Color Mode) or sprites. The first method produces black pixels, so it isn't usable unless the background is black, and the second one is unusable if sprites need to cross the split.

So, it's time for some font trickery. A raster interrupt several lines above the split switches to a blank font, and then the actual split interrupt changes $D018 to display the correct character data. This means "wasting" $0800 bytes for the blank font, but half of that memory is used as a temporary buffer elsewhere, so the actual cost of a clean split is one kilobyte.


Clipping sprites cleanly requires trickery as well, as there is no way to start sprite display from the middle of the graphic data. One way to achieve clipping is to clear the top of the sprite graphics if it overflows the top split, but that would waste time both when clearing the memory and when running extra animator/digit generator rounds to restore the top of the sprite when more of it gets shown. Another way is to park the sprites at a non-visible x-coordinate and move them into view at the correct line. However, there is not enough time to write that many registers on one line.

Guess what? $D018 comes to the rescue once again. There is an extra screen where the sprite pointers point to blank sprites. When the time comes, the split interrupt doesn't only change the font (bits 1-3 of $D018) but also the displayed screen (bits 4-7 of $D018). Nice and easy solution, but it requires a blank screen, another kilobyte wasted. No, not really: there is no reason why one half of the blank font couldn't be used for the blank screen. What that means is that sprite clipping is practically free, as no extra memory is required and $D018 needs to be written anyway for character blanking.
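
To make the bit layout concrete, here is a minimal Python sketch of how a $D018 value is put together from a screen and a font address. The addresses are a made-up memory map for illustration only, not the game's real layout; the only thing taken from the text above is that the blank screen sits inside the blank font.

```python
def d018_value(screen_addr, font_addr):
    """Build a $D018 byte from screen and font addresses in one 16 KB VIC bank."""
    screen_off = screen_addr & 0x3FFF   # offset inside the VIC bank
    font_off = font_addr & 0x3FFF
    if screen_off % 0x0400:
        raise ValueError("screen must start on a 1 KB boundary")
    if font_off % 0x0800:
        raise ValueError("font must start on a 2 KB boundary")
    # bits 4-7 select the screen, bits 1-3 select the font
    return ((screen_off // 0x0400) << 4) | ((font_off // 0x0800) << 1)


# Hypothetical memory map (assumption, not taken from the game).
BLANK_FONT = 0x7000
BLANK_SCREEN = 0x7000      # shares the first half of the blank font
GAME_FONT = 0x6000
GAME_SCREEN = 0x6800

above_split = d018_value(BLANK_SCREEN, BLANK_FONT)
below_split = d018_value(GAME_SCREEN, GAME_FONT)
print(f"above split: ${above_split:02X}, below split: ${below_split:02X}")
```

With a layout like this, a single write of the second value at the split switches both the font and the sprite pointers at once.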


There is one problem with the screen change though. While VIC-II reads font data and sprite pointers & sprite data every line, character pointers (screen data) are only read on every eighth line. This means that while sprite and font changes are immediate, screen change affects display 0-7 lines later. To overcome this the top line of playing area is copied onto blank screen so character pointers are correct when $D018 is changed.

As blank screen is located inside blank font this copying creates yet another problem. Blank font isn't that blank any more. The topmost line is at SCREEN + $140, which means that chars $28-$2c aren't blank and will produce garbage if they appear on the topmost line. The easiest way to avoid that is to not use those chars inside playing area at all, so that's what is done. The same problem happens because of blank sprite pointers at SCREEN + $3F8, char $7F. That one is unused as well.
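
The char numbers above follow from simple arithmetic: each character uses 8 bytes of font data, so a byte at offset N from the start of the font belongs to char N/8. A small Python sketch of that calculation, assuming (as the numbers above imply) that the blank screen starts at the same address as the blank font; the helper name is made up for illustration:

```python
def chars_overlapping(offset, length):
    """Char codes whose 8-byte font data overlaps [offset, offset + length)."""
    first = offset // 8
    last = (offset + length - 1) // 8
    return list(range(first, last + 1))


# Screen row copied to SCREEN + $140 (40 bytes of character pointers).
top_row = chars_overlapping(0x140, 40)
# Sprite pointers at SCREEN + $3F8 (8 bytes).
sprite_ptrs = chars_overlapping(0x3F8, 8)

print([f"${c:02X}" for c in top_row])
print([f"${c:02X}" for c in sprite_ptrs])
```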


And what does all this have to do with the timing problems Mike mentioned?

VIC-II doesn't have time to fetch sprite data when the CPU is not using the memory bus, so it has to stop the CPU momentarily whenever sprites are active. Just how many cycles VIC-II steals from the CPU depends on which sprites are active, and this causes a timing hell, as you have to change registers while the C64 is inside the side border area to avoid flicker. With the $D021 change you have all of the side border (23 cycles) to do it, but $D018 is trickier. Sprite data is fetched very early: the first three sprites get their data read at the end of the previous line. This means that if the $D018 write is late, sprites will stay blank for one extra line.

To get register writes done at the very beginning of side border area the game uses CIA timer to stabilize raster timing. During game init CIA1 timer A is started, running through 63-65 cycles depending on VIC-II version. This means that $DC04 is always synchronized with current display X position. IRQ only needs to read $DC04 and skip that many cycles.


Did I forget bad lines? Every eighth line VIC-II needs to read character pointers, and that's not possible without stopping the CPU for most of the raster line. This means that there is absolutely no time for anything unnecessary. If the split happens on a bad line, the game triggers a raster IRQ two lines above the split, prepares the next interrupt at the correct line, then preloads the $D018/$D021 values and executes two-cycle instructions until the IRQ happens. That guarantees minimum interrupt latency. When the raster interrupt happens it just writes those two registers, cleans up the stack (this second interrupt pushed the status register and return address onto the stack) and jumps into the common code.
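
Whether the split lands on a bad line depends only on the vertical scroll value. As a rough Python sketch, assuming the display is enabled and the split sits at raster line 118 as in the listing: a line in the range $30-$F7 is a bad line when its low three bits equal the YSCROLL bits of $D011. The function names are made up for illustration.

```python
def is_bad_line(raster, yscroll):
    """VIC-II bad line condition, assuming the display is enabled."""
    return 0x30 <= raster <= 0xF7 and (raster & 7) == (yscroll & 7)


def split_irq_line(split_line, yscroll):
    """Raster line to request the IRQ on: two lines early if the split is a bad line."""
    return split_line - 2 if is_bad_line(split_line, yscroll) else split_line


SPLIT_LINE = 118  # taken from the listing

for yscroll in range(8):
    print(yscroll, is_bad_line(SPLIT_LINE, yscroll), split_irq_line(SPLIT_LINE, yscroll))
```

Line 118 is a bad line only when the scroll value is 6, which is what the compare against $16 in the listing tests for.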



Nothing explains code better than source, so here it is. Only relevant parts are shown, and for clarity I've removed all assembly directives which were there to make sure branches don't span page boundaries (which would add one cycle to the branch).

; IRQ at line 95, prepare for split
 
...
 
lda #$10
ora _vScroll
sta $d011
cmp #$16
bne .057
 
; special case for bad line

lda #<Irq_116
sta $fffe
 
lda _d018+1
sta _d018b+1
lda _d021+1
sta _d021b+1
lda #116
bne .x1 ; jmp
 
; normal case
 
.057 lda #<Irq_118
sta $fffe
lda #118
 
.x1 sta $d012
 
...
 
;----------------------------------------------------------------
 
; this one used when ($d011 & 7) = 6, stuffs
; d018/d021 as fast as possible at raster 118
 
subroutine
Irq_116 pha
sty .yr+1
cld
 
lda #<Irq_118b
sta $fffe
lda #>Irq_118b
sta $ffff
lda #118
sta $d012
inc $d019
 
; 118/15 = 7.8 so this one is executed 8 times
 
sbc #15
bcs *-2 ; 8*5-1=39 cycles
 
; preload registers and execute 2-cycle
; instructions until next IRQ happens
 
_d018b lda #scr_GAME
_d021b ldy #0
cli
repeat 16
cli ; 32 cycles wasted
repend
 
; now is the time to write registers, we always enter
; via interrupt as the above code never runs this far
 
Irq_118b
sta $d018
sty $d021
 
; clean up stack and continue normal IRQ code
 
pla ; flags
pla ; PC lo
pla ; PC hi, always != 0
bne .irq0 ; jmp
 
;----------------------------------------------------------------
 
; normal case, use timer value to stabilize
; raster regardless of sprites over the split
 
Irq_118
pha
sty .yr+1
cld
 
lda $dc04 ; [1,15] ([2,15] if NTSC/Drean)
eor #$0f ; [14,0] ([13,0])
sta .j3+1
.j3 bpl *+2 ; jump into the delay code
 
; entering at offset 0 delays 16 cycles,
; entering at offset 14 delays 2 cycles
;
; OP_CMP_IMM is opcode for CMP #immediate (2 cycles),
; OP_CMP_ZP is opcode for CMP $zeropage (3 cycles)
 
cmp #OP_CMP_IMM
cmp #OP_CMP_IMM
cmp #OP_CMP_IMM
cmp #OP_CMP_IMM
cmp #OP_CMP_IMM
cmp #OP_CMP_IMM
cmp #OP_CMP_ZP
nop
 
_d021 ldy #0
_d018 lda #scr_GAME
sta $d018
sty $d021
 
; continue with interrupt
.irq0 ...
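
The two timing claims in the listing can be checked with a small Python model. This is only a sketch: it knows just the three opcodes used in the delay code and assumes no page crossings, and the function names are made up. The first function walks the byte sequence of the delay code from a given entry offset, the second one models the sbc/bcs loop. The listing as shown does not set the carry flag before that loop, so both carry states are tried.

```python
# opcode -> (cycles, length in bytes); only what the delay code needs
OPCODES = {
    0xC9: (2, 2),   # CMP #immediate
    0xC5: (3, 2),   # CMP zeropage
    0xEA: (2, 1),   # NOP
}

# six times "cmp #OP_CMP_IMM", then "cmp #OP_CMP_ZP", then "nop"
SLIDE = bytes([0xC9, 0xC9] * 6 + [0xC9, 0xC5, 0xEA])


def slide_delay(offset):
    """Cycles spent when execution enters the delay code at the given byte offset."""
    pc, cycles = offset, 0
    while pc < len(SLIDE):
        cost, length = OPCODES[SLIDE[pc]]
        cycles += cost
        pc += length
    return cycles


def sbc_loop(a=118, carry=True):
    """Model of 'sbc #15 / bcs *-2'; returns (iterations, cycles)."""
    iterations = cycles = 0
    while True:
        a -= 15 + (0 if carry else 1)
        carry = a >= 0          # carry stays set while no borrow occurs
        a &= 0xFF
        iterations += 1
        cycles += 2             # sbc #immediate
        if not carry:
            cycles += 2         # branch not taken
            return iterations, cycles
        cycles += 3             # branch taken, same page


for timer in range(1, 16):
    print(timer, timer ^ 0x0F, slide_delay(timer ^ 0x0F))
print(sbc_loop(118, True), sbc_loop(118, False))
```

In this model the delay is always the timer value plus one cycle, so an interrupt that arrives later (smaller timer value) is delayed less and the register writes line up on the same cycle.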

Monday, June 23, 2008

Have a nice summer solstice(ish)



To honor the sun I give you a new release - or two releases actually.



My plan was to build a unified version with both graphic sets held in memory all the time, but that didn't happen due to lack of memory. Too bad, having different graphics on different decks would have given some more variety to the game.

I did manage to fit all necessary data into memory, but as I had to EOR fonts together there was no way to swap between them without a temporary 2 KB buffer. You really can't unpack LZ data and EOR it simultaneously, which took me way too long to realize.

While it would have been possible to do the EOR in two passes with the 1 KB buffer I do have, I didn't bother with that as I would have to drop dual graphics as soon as I need the memory back anyway.



Oh, I did fix one single pixel bug in the hires font too :)



Edit: I also added single pixel bug into Metal Edition - when you clear deck the first time, there may be extra pixel in the background star. That one is gone as soon as you move a bit vertically, so I won't do another build just to fix it.

Sunday, April 20, 2008

Not quite dead yet

Here is a little something for those of you who fear that the project is dead. Not much has changed though:
  • Subgame should now take 10 seconds regardless whether you're playing on C64 or C128. That's still slightly longer than original, but way better than 12 seconds.

  • Game should finally be Drean 64 compatible. I thought I put the code for this into the game 18 months ago, but apparently I didn't. Well, who's going to send me a Drean 64/128 so I can actually test it?

  • Most likely there are some other changes too, it was six weeks ago when I last touched the source. The only reason I did it now was to fix the download link.
Note: if you for some reason want to archive every single release, then do yourself a favor. Don't use build number as filename part! It was never meant for that. Instead, parse it as BB-DDMMYY where BB is daily build count, DD is day of month, MM is month and YY is year. Then reorder these as YYMMDDBB and when using that as part of filename you get chronologically sorted list.
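
For archivers, the reordering could look something like this in Python. It is only a sketch: the build strings are invented examples, and the hyphen is treated as optional since the exact on-screen format is an assumption.

```python
def sortable_build(build):
    """Turn a BB-DDMMYY build number into YYMMDDBB for chronological sorting."""
    digits = build.replace("-", "")
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"unexpected build number: {build!r}")
    bb, dd, mm, yy = digits[0:2], digits[2:4], digits[4:6], digits[6:8]
    return yy + mm + dd + bb


# Hypothetical build numbers, for illustration only.
builds = ["01-200408", "02-251207", "01-251207"]
for build in sorted(builds, key=sortable_build):
    print(sortable_build(build), build)
```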

Edit: Drean compatibility is now confirmed, thanks to the_woz. Check out his blog, especially Drean-specific entries.

Tuesday, December 25, 2007

Have a nice winter solstice

For those too busy to read any further, click here.


Important: archive updated January 2nd, you need to delete old high score file as it's not compatible any more.


Due to some rather unfortunate events in the family I haven't had as much time for PR as I would have liked to, so there are no major changes. Minor changes include:

  • fixed all but one of known bugs.
  • 2500 bonus points if you clear a ship without shooting any enemy droids. Note that if you hit a single droid on ship one, you won't get bonus even if you clear ship two without any shooting as hit counter is preserved from one ship to the next (same is done with accuracy calculation).
  • you can reduce enemy droid pulser count in the subgame by damaging them. This doesn't have much effect with the higher class droids, and you will have to cope with whatever energy the droid has left...
  • background stars are a bit more interesting now.
  • as always, it's slightly smaller and faster :)

As I haven't had much time to test this one, report any oddities please.


Changes which didn't make it to this version:

  • subgame bonus points for 11-1 / 12-0 wins. No time to fix the bugs caused by this...
  • raiders. You know, those annoying rogue droids in Paradroid'90.


Even if my time for coding has been limited, that doesn't mean that I haven't thought about the game during the slow times at work. I'm positive that the actual playing area can be enlarged by at least one character row. With C128 I think it might be possible to do two or three additional rows without the game slowing down. We'll see if I ever have time for that.

Sunday, October 7, 2007

I'm sane, thank you :P

Contrary to some other claims I'm not crazy, or at least I imagine so.

That's not the only error in Paradroid talk page - so here we have (drumroll, please)

The Definitive C64 Paradroid Version Guide

  • Paradroid (original), 1985

  • Paradroid Competition Edition, 1986
    This one is identical to the original, except that it has some vertical blank waits removed. That allows the game to run faster most of the time.
    Scroll code is unchanged, whoever wrote that it was enhanced clearly hasn't disassembled all versions and done comparisons between them...

  • Paradroid Metal Edition a.k.a. Heavy Metal Paradroid, 1986
    Minor changes, mostly allowing the use of multicolor chars; the remaining ones save a couple of cycles and/or bytes here and there.
    Uses C128 2 MHz mode in top/bottom border for higher speed.
    Fixes the decimal mode flag bug which causes weird sound fx in earlier versions.
    Some scroll text changes - this includes two bad chars which seem to be in every original ME tape!

  • Paradroid Redux, 2006-
    Nanos gigantium humeris insidentes.

Monday, September 24, 2007

Tweaking

Too little time for anything major (yet!) but scoring and subgame have seen some little changes.


  • Bonus score if you do well in subgame. 20% bonus for each remaining pulser if you win 11-1, 40% if you win 12-0

  • 2000 point bonus if you're a wimp and use only transfer to overtake the ship. Not enough to compensate for score lost by not shooting droids, as I don't want to encourage that kind of cowardice ;)

  • Shoot droids to pieces before transferring to them - their pulser count reduces by one for every 16 points of damage. Don't forget that you have to cope with whatever energy they have left, though!



In addition to adding small things I've also discarded some ideas

  • Grenades/mines. What to do when droid explodes a mine but there are no free sprites?

  • Two player mode through link cable. That cuts down sprites available for enemy droids and fire, lowering the difficulty considerably. With fewer sprites it's also harder to hide the fact that the game teleports droids away if it runs out of sprites.


I have doubts about transport pads as well. These would transfer the player within a deck, but that would require resetting droid positions to avoid several visibility problems. And that would mean teleporting all droids, meaning you could face the same robot you were running away from just a second or two ago half a deck away. You may say that there isn't much realism in the game, but that makes it even more important to preserve what's left of it!

Wednesday, September 5, 2007

Wasting time

How can one waste gazillions of cycles to mirror one sprite? Quite easily, just forget speed and concentrate on compact code.



MirrorSprite
 
ldy #0
sty src
sty .5+1
 
; src = A<<6 | $4000
 
sec
ror
ror src
lsr
ror src
sta src+1
 
; ptr = X<<6 | $4000
 
txa
sec
ror
ror .5+1
lsr
ror .5+1
sta .5+2
 
; get sprite multicolor flag
 
ldy #$3F
lda (src),y
sta tmp2 ; b7=1 if multicolor
 
lda #60
 
.1 tax ; x=60,61,62, 57,58,59, ... 3,4,5, 0,1,2
 
lda #3
sta bytesLeft
 
.2 dey ; y=62,61,60, 59,58,57, ... 5,4,3, 2,1,0
lda (src),y
sta tmp1
lda #$01
.3 lsr tmp1 ; 5
bit tmp2 ; 8
bpl .4 ;10 hires, 8 loops 1 bit each
 
php ;13 else multicolor, 4 loops 2 bits each
lsr tmp1 ;18
rol ;20
plp ;24
.4 rol ;26
bcc .3 ;29 8*16=128 / 4*29=116
.5 sta $8000,x ; 63*128=8064
 
inx
dec bytesLeft
bne .2
 
txa
; sec ; asserted with "bcc .3"
sbc #6
bpl .1
 
rts
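
For comparison, here is what the routine computes, written as a Python sketch rather than a translation of the code. It assumes the standard 64-byte sprite block (21 rows of 3 bytes plus one spare byte) with the multicolor flag in bit 7 of byte 63, as in the listing; like the routine, it produces only the 63 graphic bytes.

```python
def mirror_byte(value, multicolor):
    """Mirror one sprite byte: single bits in hires, 2-bit pairs in multicolor."""
    if multicolor:
        # reverse the order of the four pixel pairs, keep each pair intact
        return (((value & 0x03) << 6) | ((value & 0x0C) << 2) |
                ((value & 0x30) >> 2) | ((value & 0xC0) >> 6))
    return int(f"{value:08b}"[::-1], 2)


def mirror_sprite(data):
    """Return the 63 graphic bytes of a 64-byte sprite block, mirrored horizontally."""
    if len(data) != 64:
        raise ValueError("expected a 64-byte sprite block")
    multicolor = bool(data[63] & 0x80)
    out = bytearray(63)
    for row in range(21):
        for col in range(3):
            # the leftmost byte of a row becomes the rightmost one
            out[row * 3 + col] = mirror_byte(data[row * 3 + 2 - col], multicolor)
    return bytes(out)


# Hypothetical test pattern: hires sprite with bytes 0..62.
sprite = bytes(range(63)) + b"\x00"
mirrored = mirror_sprite(sprite)
print(mirrored[:3].hex())
```

Mirroring twice gives back the original graphics, which makes for an easy sanity check of any faster version.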
 


Hey, I just realized I can make it at least one byte shorter! ;)

Update: cycle counts were way off...