
Optimized sprite rotation on SNES base hardware

Today I was having fun optimizing the sprite rotation algorithm I made a while ago for my homebrew game. How it works is it runs in the leftover CPU time in my game (whenever it is not calculating game logic) and it automatically fills the RAM with as many rotated frames as possible (a compromise between smoothness, setup time, and the number of unique sprites), so by the time you approach a certain part of the level, the rotation is already calculated.

I got it just fast enough to rotate one 32x32 sprite per frame when nothing else is happening onscreen. That doesn't look fast on paper, but a rotating 32x32 sprite with 64 angles needs only 32 rendered frames because of horizontal/vertical flipping, so the full set takes only about half a second.
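The arithmetic can be sketched in Python. This is only an illustration: the function and constant names are made up, and the 60 Hz frame rate is an assumption (NTSC). It relies on the fact that a 180 degree rotation equals flipping a sprite both horizontally and vertically, so only half of the 64 angles need to be rendered.

```python
ANGLES = 64                    # distinct rotation angles per full turn
STORED_FRAMES = ANGLES // 2    # only the first half-turn is pre-rendered
FRAMES_PER_SECOND = 60         # assumption: NTSC timing

def frame_for_angle(angle):
    """Return (stored frame index, flip_both) for an angle index 0..63."""
    angle %= ANGLES
    flip_both = angle >= STORED_FRAMES   # second half-turn: set H and V flip
    return angle % STORED_FRAMES, flip_both

# At one rendered sprite per video frame, filling the whole set takes:
seconds_to_fill = STORED_FRAMES / FRAMES_PER_SECOND
```

With these numbers `seconds_to_fill` comes out at a little over half a second, which matches the figure in the post.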

I'm not that good at writing notes, but I used tricks such as expecting positive numbers to ALWAYS leave the carry bit clear, and negative numbers to ALWAYS leave the carry bit set. I also used lookup tables to convert pixels to planar format.
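The carry trick can be modelled in a few lines of Python. This is a sketch of the idea only, not a transcription of the posted instructions: `adc16` and `chained_steps` are hypothetical names, and the starting carry is set by hand. When a negative 16-bit step is added, the addition wraps and leaves the carry set, so the next add-with-carry would add one too many. Storing the negative step already decremented by one cancels that out, and no `clc` is needed between additions.

```python
MASK = 0xFFFF

def adc16(a, m, carry):
    """16-bit add-with-carry: returns (result, carry_out)."""
    total = a + m + carry
    return total & MASK, total >> 16

def chained_steps(start, step, count):
    """Add a signed step repeatedly without ever resetting the carry."""
    if step < 0:
        operand, carry = (step - 1) & MASK, 1   # pre-decremented; carry stays set
    else:
        operand, carry = step & MASK, 0         # carry stays clear
    values, a = [], start
    for _ in range(count):
        a, carry = adc16(a, operand, carry)
        values.append(a)
    return values
```

The results stay exact only while the running value does not wrap past $0000 or $FFFF, which is the "ALWAYS" assumption in the post.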

Code
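rotate_sprites_for_modular_animation:
-;
lda $0000,y                   //next descriptor pointer in the list
beq +                         //a zero entry ends the list
tax
phy
jsr rotate_sprite
ply
iny #2                        //advance to the next list entry
lda.l {terminate_rotation}
beq -                         //keep going unless termination was requested
+;
stz {modular_animation_data}
stz {terminate_rotation}
rts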

rotate_sprite:
phb
php
sep #$20

lda #$80
sta {scratch_pad_ram}+34      //the LUTs are in bank $80
sta {scratch_pad_ram}+38      //These are pointers for the
sta {scratch_pad_ram}+42      //LUTs
sta {scratch_pad_ram}+46



lda $0004,x                   //$0000,x is ROM address
sta {rotation_step}           //$0002,x is ROM bank
stz {rotation_angle}          //$0004,x is rotation step amount
lda $000a,x                   //$0006,x is RAM address
asl #3                        //$0008,x is RAM bank
sta {size}                    //$000a,x is size 2=16x16, 4=32x32
asl #2
sta {d}
lda $0000,x
stz {x_pixel}
sta {x_pixel_hi}
lda $0001,x
stz {y_pixel}
sta {y_pixel_hi}
lda $0002,x
sta {scratch_pad_ram}+3       //these are the banks of the
sta {scratch_pad_ram}+7       //"pixel pointers"
sta {scratch_pad_ram}+11
sta {scratch_pad_ram}+15
sta {scratch_pad_ram}+19
sta {scratch_pad_ram}+23
sta {scratch_pad_ram}+27
sta {scratch_pad_ram}+31

lda $0008,x
pha
rep #$20


lda $0006,x
tax

plb                         //make data bank hold the RAM bank
phd                         //and X hold the destination address
lda #$0000
tcd
jsr convert_bitmap
pld
plp
plb
rts

new_rotation_step:             //I forgot how I did this math stuff
                               //but it's kind of like setting up
pla                            //mode 7 registers
sta.b {y_pixel}
pla
sta.b {x_pixel}

lda.l {terminate_rotation}
beq +
rts
+;

lda.b {rotation_step}
clc
adc.b {rotation_angle}
sta.b {rotation_angle}
cmp #$0080
bcc convert_bitmap
rts


convert_bitmap:
sep #$20
lda #$00
sta $004200                   //NMITIMEN = $00: NMI off during the setup math
rep #$20

phx
lda.b {rotation_angle}
asl
and #$01fe
tax
lda $000000+sine,x
sta.b {sine}
lda $000000+cosine,x
sta.b {cosine}
plx

lda.b {x_pixel}
pha
lda.b {y_pixel}
pha

lda.b {sine}
clc
adc.b {cosine}
sta.b {a}
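sep #$20
sta $00211b                   //M7A low byte (16-bit multiplicand)
xba
sta $00211b                   //M7A high byte
lda.b {size}
lsr
sta $00211c                   //M7B (8-bit multiplier)
rep #$20
lda.b {size}
xba
clc
adc.b {a}
lsr
sec
sbc $002134                   //MPYL/MPYM: low 16 bits of the product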
clc
adc.b {x_pixel}
sta.b {x_pixel}
lda.b {cosine}
sec
sbc.b {sine}
sta.b {a}
sep #$20
sta $00211b
xba
sta $00211b
lda.b {size}
lsr
sta $00211c
rep #$20
lda.b {size}
xba
clc
adc.b {a}
lsr

sec
sbc $002134
clc
adc.b {y_pixel}
sta.b {y_pixel}
lda.b {size}
sta.b {c}
lsr #3
sta.b {b}
lda.b {size}
asl #2
sta.b {a}

sep #$20
lda #$81
sta $004200                   //NMITIMEN = $81: NMI and auto-joypad back on
rep #$20

lda.b {cosine}          //adjust sine and cosine so clc and sec
bpl +                   //are not needed
dec
sta.b {cosine}
+;

lda.b {sine}
bpl +
dec
sta.b {sine}
+;

convert_bitmap_loop:
lda.b {c}
bne old_rotation_step
jmp new_rotation_step

old_rotation_step:
lda.b {x_pixel}
pha
lda.b {y_pixel}
pha
convert_line:


jmp convert_pixel

convert_pixel_done:

txa
clc
adc #$0020
tax
lda.b {b}
bne convert_pixel
pla
clc
adc.b {cosine}
adc #$0000
sta.b {y_pixel}
pla
clc
adc.b {sine}
adc #$0000
sta.b {x_pixel}
dec.b {c}
lda.b {size}
lsr #3

sta.b {b}
txa
sec
sbc.b {a}
inc #2
tax
bit #$000e
bne convert_bitmap_loop
clc
adc.b {d}
sec
sbc #$0010
tax
jmp convert_bitmap_loop


convert_pixel:                    //This is the most important part
                                  //of this code.
dec.b {b}
lda.b {y_pixel}                   //This is where pixels get drawn.
sta.b {scratch_pad_ram}+1
sec
sbc.b {sine}
sbc #$0000
sta.b {scratch_pad_ram}+5         //First it calculates the Y position
sbc.b {sine}                      //of every pixel,
sta.b {scratch_pad_ram}+9
sbc.b {sine}
sta.b {scratch_pad_ram}+13
sbc.b {sine}
sta.b {scratch_pad_ram}+17
sbc.b {sine}
sta.b {scratch_pad_ram}+21
sbc.b {sine}
sta.b {scratch_pad_ram}+25
sbc.b {sine}
sta.b {scratch_pad_ram}+29
sbc.b {sine}
sta.b {y_pixel}

lda.b {x_pixel}                   //Then it calculates the X position
sta.b {scratch_pad_ram}           //of every pixel.
clc
adc.b {cosine}
adc #$0000
sta.b {scratch_pad_ram}+4         //The top byte of the X position
adc.b {cosine}                    //overwrites the low byte of the
sta.b {scratch_pad_ram}+8         //Y position, creating the ROM
adc.b {cosine}                    //address of the pixel, in the
sta.b {scratch_pad_ram}+12        //format: bbbbbbbbyyyyyyyyxxxxxxxx
adc.b {cosine}                    //where b is the bank, and x and y
sta.b {scratch_pad_ram}+16        //are coordinates in a 256x256
adc.b {cosine}                    //bitmap image in the bank that
sta.b {scratch_pad_ram}+20        //contains the rotatable sprites.
adc.b {cosine}
sta.b {scratch_pad_ram}+24
adc.b {cosine}
sta.b {scratch_pad_ram}+28
adc.b {cosine}
sta.b {x_pixel}

lda [{scratch_pad_ram}+1]         //now it calculates the offsets of
asl #4                            //the planar look up tables
ora [{scratch_pad_ram}+17]
and #$00ff
asl
sta.b {scratch_pad_ram}+32
lda [{scratch_pad_ram}+5]
asl #4
ora [{scratch_pad_ram}+21]
and #$00ff
asl
sta.b {scratch_pad_ram}+36
lda [{scratch_pad_ram}+9]
asl #4
ora [{scratch_pad_ram}+25]
and #$00ff
asl
sta.b {scratch_pad_ram}+40
lda [{scratch_pad_ram}+13]
asl #4
ora [{scratch_pad_ram}+29]
and #$00ff
asl
sta.b {scratch_pad_ram}+44


ldy #packed_to_planar_lo
lda [{scratch_pad_ram}+32],y       //now it packs together bitplanes
asl                                //0 and 1
ora [{scratch_pad_ram}+36],y
asl
ora [{scratch_pad_ram}+40],y
asl
ora [{scratch_pad_ram}+44],y
sta $0000,x

ldy #packed_to_planar_hi           //now it packs together bitplanes
lda [{scratch_pad_ram}+32],y       //2 and 3
asl
ora [{scratch_pad_ram}+36],y
asl
ora [{scratch_pad_ram}+40],y
asl
ora [{scratch_pad_ram}+44],y
sta $0010,x

jmp convert_pixel_done

packed_to_planar_lo:
dw $0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101	//DCBAdcba > ---B---b---A---a
dw $0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111
dw $1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101
dw $1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111
dw $0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101
dw $0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111
dw $1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101
dw $1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111
dw $0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101
dw $0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111
dw $1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101
dw $1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111
dw $0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101,$0000,$0001,$0100,$0101
dw $0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111,$0010,$0011,$0110,$0111
dw $1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101,$1000,$1001,$1100,$1101
dw $1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111,$1010,$1011,$1110,$1111

packed_to_planar_hi:
dw $0000,$0000,$0000,$0000,$0001,$0001,$0001,$0001,$0100,$0100,$0100,$0100,$0101,$0101,$0101,$0101	//DCBAdcba > ---D---d---C---c
dw $0000,$0000,$0000,$0000,$0001,$0001,$0001,$0001,$0100,$0100,$0100,$0100,$0101,$0101,$0101,$0101
dw $0000,$0000,$0000,$0000,$0001,$0001,$0001,$0001,$0100,$0100,$0100,$0100,$0101,$0101,$0101,$0101
dw $0000,$0000,$0000,$0000,$0001,$0001,$0001,$0001,$0100,$0100,$0100,$0100,$0101,$0101,$0101,$0101
dw $0010,$0010,$0010,$0010,$0011,$0011,$0011,$0011,$0110,$0110,$0110,$0110,$0111,$0111,$0111,$0111
dw $0010,$0010,$0010,$0010,$0011,$0011,$0011,$0011,$0110,$0110,$0110,$0110,$0111,$0111,$0111,$0111
dw $0010,$0010,$0010,$0010,$0011,$0011,$0011,$0011,$0110,$0110,$0110,$0110,$0111,$0111,$0111,$0111
dw $0010,$0010,$0010,$0010,$0011,$0011,$0011,$0011,$0110,$0110,$0110,$0110,$0111,$0111,$0111,$0111
dw $1000,$1000,$1000,$1000,$1001,$1001,$1001,$1001,$1100,$1100,$1100,$1100,$1101,$1101,$1101,$1101
dw $1000,$1000,$1000,$1000,$1001,$1001,$1001,$1001,$1100,$1100,$1100,$1100,$1101,$1101,$1101,$1101
dw $1000,$1000,$1000,$1000,$1001,$1001,$1001,$1001,$1100,$1100,$1100,$1100,$1101,$1101,$1101,$1101
dw $1000,$1000,$1000,$1000,$1001,$1001,$1001,$1001,$1100,$1100,$1100,$1100,$1101,$1101,$1101,$1101
dw $1010,$1010,$1010,$1010,$1011,$1011,$1011,$1011,$1110,$1110,$1110,$1110,$1111,$1111,$1111,$1111
dw $1010,$1010,$1010,$1010,$1011,$1011,$1011,$1011,$1110,$1110,$1110,$1110,$1111,$1111,$1111,$1111
dw $1010,$1010,$1010,$1010,$1011,$1011,$1011,$1011,$1110,$1110,$1110,$1110,$1111,$1111,$1111,$1111
dw $1010,$1010,$1010,$1010,$1011,$1011,$1011,$1011,$1110,$1110,$1110,$1110,$1111,$1111,$1111,$1111
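
For readers who want to check the two tables, here is a Python sketch that regenerates them and models the packing step. The function names are hypothetical. It also assumes that each source byte holds one 4-bit pixel in its low nibble, which is a reading of the `asl #4` / `ora` sequence rather than something stated in the post. Each table index combines two pixels (`DCBAdcba`), and four lookups are merged with shifts to give one 8-pixel row of two bitplanes.

```python
def build_table(first_plane):
    """first_plane=0 models packed_to_planar_lo, first_plane=2 packed_to_planar_hi."""
    table = []
    for index in range(256):
        left, right = index >> 4, index & 0x0F        # two 4-bit pixels
        low = (((left >> first_plane) & 1) << 4) | ((right >> first_plane) & 1)
        high = (((left >> (first_plane + 1)) & 1) << 4) | ((right >> (first_plane + 1)) & 1)
        table.append((high << 8) | low)
    return table

def pack_row(pixels, table):
    """Merge 8 pixels into one 16-bit word (two bitplane bytes), pixel 0 in bit 7."""
    word = 0
    for i in range(4):
        index = ((pixels[i] << 4) | pixels[i + 4]) & 0xFF   # pairs (0,4) (1,5) (2,6) (3,7)
        word = (word << 1) | table[index]                   # same as asl / ora in the loop
    return word

def plane_byte(pixels, plane):
    """Reference: one bitplane of 8 pixels computed bit by bit."""
    byte = 0
    for p in pixels:
        byte = (byte << 1) | ((p >> plane) & 1)
    return byte

LO = build_table(0)
HI = build_table(2)
```

The low byte of `pack_row(pixels, LO)` should equal `plane_byte(pixels, 0)` and the high byte `plane_byte(pixels, 1)`; the same holds for `HI` with planes 2 and 3.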
This looks really cool, but I'm not an asm expert yet. Maybe someone else who is an asm genius can chime in.

Also, what did you need help with on this? Or are you just showing off some asm skills? This forum is usually for asm help.
Pretty cool-looking (even though I don't understand half of it), but you should ask a mod to move it here.
Here is a demo of my game that uses the rotation code. If you want to see how fast it can rotate, go to level 2, because there is a pirate with a rotating cannon right at the start of the unfinished level. To get to level 2, press Start while in level 1.

I also had to use a lot of tricks to get a lot of fluidly animated characters onscreen at once. One trick was predicting when it needs to DMA more than 4kB at once, and delaying the animation of a character by a frame to prevent screen tearing glitches (I actually learned this trick from reading DKC source code). Another trick I use is checking for duplicated sprites.

http://bin.smwcentral.net/u/28835/Alisha%2527s%2BAdventure.zip
Very interesting as well as impressive. I've tried comprehending the code but it's a bit too advanced for me. Looks like there's a lot of (for me) complicated math and logic involved.

How do you define the 32x32 graphic which needs to be rotated? Do you supply the routine a pointer to the beginning of the 32x32 graphics?

Considering the graphics need to be uploaded in vblank, do you just DMA the graphics from RAM to VRAM inside NMI?

How do you deal with multiple cannons on-screen? Will they be able to have separate rotation frames, or will they rotate perfectly in sync? Will it cause significant slowdown having two or more cannons on the screen?

Also, some questions somewhat unrelated to the rotation itself but they still piqued my interest:
Quote
How it works is it runs in the leftover CPU time in my game (whenever it is not calculating game logic)

How do you actually check if there's leftover CPU time in the game? I mean it does *sound* easy but I never really thought about how this can be done. I can't think of any SNES hardware register which can help me out with this, for example, or coding tricks for that matter. The way you wrote that sentence made it sound as though you can actually *measure* how much time there is left, as well as measure how much time the rotation rendering will take.

I noticed the boss in level 1 can also rotate and roll around, does it use the same rotation method as the cannon, pre-rendering rotation frames? Or did you just manage to fit all frames in the graphics files?

Quote
and delaying the animation of a character by a frame to prevent screen tearing glitches

I guess you just skip the graphics routine for the sprite for one frame or so, or how exactly does this work?
Originally posted by Ersanio
Very interesting as well as impressive. I've tried comprehending the code but it's a bit too advanced for me. Looks like there's a lot of (for me) complicated math and logic involved.

How do you define the 32x32 graphic which needs to be rotated? Do you supply the routine a pointer to the beginning of the 32x32 graphics?


The routine starts with the Y index pointing to a list of the sprites/joints that need to be rotated in the level. Each entry in the list points to data that contains where from, where to, the size, and the rotation step amount. I'll add some notes in my code to explain what is what.
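
As a rough illustration of that data, here is a Python sketch based on the offsets listed in the code comments ($0000 ROM address, $0002 ROM bank, $0004 rotation step, $0006 RAM address, $0008 RAM bank, $000a size). The field names and both function names are made up, and treating every field as a 16-bit word is an assumption based on the 2-byte spacing. The second function shows how a source pixel address is formed from 8.8 fixed-point coordinates in the `bbbbbbbbyyyyyyyyxxxxxxxx` format the comments describe.

```python
import struct

FIELDS = ("rom_address", "rom_bank", "rotation_step",
          "ram_address", "ram_bank", "size")

def parse_descriptor(data):
    """Unpack six little-endian 16-bit words ($0000..$000a in the code comments)."""
    return dict(zip(FIELDS, struct.unpack_from("<6H", data)))

def source_pixel_address(bank, x_fixed, y_fixed):
    """Bank, then integer Y, then integer X, from 8.8 fixed-point coordinates."""
    x_int = (x_fixed >> 8) & 0xFF
    y_int = (y_fixed >> 8) & 0xFF
    return ((bank & 0xFF) << 16) | (y_int << 8) | x_int

# Example values only: a 32x32 sprite (size 4) whose source starts at X=$20, Y=$40.
example = parse_descriptor(struct.pack("<6H", 0x4020, 0x85, 4, 0x2000, 0x7F, 4))
```

The low byte of the ROM address is the starting X and the high byte the starting Y, so the descriptor's address already has the same layout as the per-pixel pointers.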

Quote

Considering the graphics need to be uploaded in vblank, do you just DMA the graphics from RAM to VRAM inside NMI?


Well, yeah, it does it during vblank. I don't really know what you mean by "inside NMI". I just have an NMI at the beginning of vblank, and it returns from the interrupt at the end of the main game logic code.

Quote

How do you deal with multiple cannons on-screen? Will they be able to have separate rotation frames, or will they rotate perfectly in sync? Will it cause significant slowdown having two or more cannons on the screen?


They can rotate separately from each other. It won't cause any slowdown, but the average framerate of the cannons' rotation will gradually decrease as more of them are onscreen. It can DMA up to 8 32x32 sprites per frame; with any more than that, it starts alternating.

I animated most of the characters at 15fps, because I figured that it takes 4 frames to update 16kB of VRAM.
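
These figures line up with each other, as a quick Python check shows. The names are illustrative, 4 bits per pixel follows from the four bitplanes the rotation code writes, and the 60 Hz frame rate is an assumption (NTSC).

```python
BITS_PER_PIXEL = 4            # the planar code writes bitplanes 0-3
FRAMES_PER_SECOND = 60        # assumption: NTSC timing

def sprite_bytes(width, height):
    """Size of one sprite's tile data in bytes."""
    return width * height * BITS_PER_PIXEL // 8

dma_bytes_per_frame = 8 * sprite_bytes(32, 32)            # eight 32x32 sprites
frames_for_16kb = (16 * 1024) // dma_bytes_per_frame      # frames to refresh 16kB
animation_fps = FRAMES_PER_SECOND // frames_for_16kb
```

Eight 32x32 sprites come to 4kB per frame, so 16kB takes 4 frames, which gives the 15fps animation rate.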

Quote


Also, some questions somewhat unrelated to the rotation itself but they still piqued my interest:
Quote
How it works is it runs in the leftover CPU time in my game (whenever it is not calculating game logic)

How do you actually check if there's leftover CPU time in the game? I mean it does *sound* easy but I never really thought about how this can be done. I can't think of any SNES hardware register which can help me out with this, for example, or coding tricks for that matter. The way you wrote that sentence made it sound as though you can actually *measure* how much time there is left, as well as measure how much time the rotation rendering will take.


The game starts with the initialization code, then it gets stuck in an infinite loop that contains the rotation code. Then an NMI arrives, and it runs the vblank and game logic code. Then it returns to where it left off in the rotation code, until it gets interrupted again.
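
So nothing is measured: the rotation code simply gets whatever time the interrupt handler does not use. A small Python model of that structure is below. It is a cooperative simulation, not how the hardware works (on the SNES the NMI preempts the loop), and the names and the abstract "work units" are assumptions made for illustration.

```python
def rotation_job(total_rotations, units_per_rotation, done):
    """Background loop: resumes exactly where it was interrupted."""
    for index in range(total_rotations):
        for _ in range(units_per_rotation):
            yield                # one unit of work; an NMI may arrive after it
        done.append(index)       # one rotated frame is now finished in RAM

def simulate(total_rotations, units_per_rotation, frame_budget, logic_costs):
    """Return (video frame on which the job finished or None, finished rotations)."""
    done = []
    job = rotation_job(total_rotations, units_per_rotation, done)
    for frame_number, cost in enumerate(logic_costs, start=1):
        # The NMI handler (vblank + game logic) runs first and uses `cost` units.
        for _ in range(max(frame_budget - cost, 0)):
            if next(job, "finished") == "finished":
                return frame_number, done
    return None, done
```

For example, `simulate(2, 10, 10, [4] * 10)` finishes on the fourth frame, while `simulate(2, 10, 10, [10] * 5)` never makes progress because the game logic uses the whole frame.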


Quote

I noticed the boss in level 1 can also rotate and roll around, does it use the same rotation method as the cannon, pre-rendering rotation frames? Or did you just manage to fit all frames in the graphics files?


Same technique.

Quote


Quote
and delaying the animation of a character by a frame to prevent screen tearing glitches

I guess you just skip the graphics routine for the sprite for one frame or so, or how exactly does this work?


This is the order of how my game engine works:

1) DMA everything
2) Object AI, physics and collision
3) Animation management (this routine figures out what to DMA and where in VRAM, and what animations to hold off until the next frame)
4) Draw sprites to OAM

The sprites still get shown and "moved" onscreen every frame, it's just the change in animation frame that gets delayed. For example, if Alisha jumps, sometimes it will still show her "standing" while jumping for a frame or two, before it shows the correct jumping animation.
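
Step 3 can be sketched in Python as a simple budget check. This is a guess at the general shape, not the game's actual routine: the names are hypothetical, the 4kB budget is the figure mentioned earlier in the thread, and letting smaller later requests fill the remaining space is a design choice made for this example.

```python
DMA_BUDGET_BYTES = 4 * 1024   # per-vblank limit mentioned in the thread

def plan_uploads(pending, budget=DMA_BUDGET_BYTES):
    """Split pending (name, size) animation changes into (upload now, deferred)."""
    upload, deferred, used = [], [], 0
    for name, size in pending:
        if used + size <= budget:
            upload.append(name)
            used += size
        else:
            deferred.append((name, size))   # keeps showing its old frame for now
    return upload, deferred

# Example sizes are invented for illustration.
now, later = plan_uploads([("alisha", 2048), ("pirate", 1536),
                           ("cannon", 1024), ("coin", 512)])
```

A deferred character is still drawn and moved every frame; only its new animation frame waits for the next vblank.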