Lately I've been attempting to optimize the LC_LZ2 decompression routine of SMW. It went perfectly fine until I hit my limit.
Why am I optimizing this? To decrease the level loading time, even if only by a split second. My code currently looks like this:
Code
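HEADER
LOROM

!Freespace = $1D8000

ORG $00B8E3
    JML Decomp_start

; Read one byte through the 24-bit source pointer at $8A-$8C into A,
; then advance the pointer. On wrapping past $FFFF, continue at $8000
; of the next bank.
macro ReadByte()
    LDA [$8A]
    LDX $8A
    INX
    BNE +
    LDX.w #$8000
    INC $8C
+
    STX $8A
endmacro

ORG !Freespace
Decomp:
.return
    JML $00B8EA
.start
    %ReadByte()
    CMP.b #$FF          ; $FF = end of compressed data
    BEQ .return
    STA $8F             ; $8F = header byte
    AND.b #$E0
    CMP.b #$E0          ; top three bits all set = long header
    BEQ +
    PHA                 ; short header: command in bits 7-5
    LDA $8F
    REP #$20
    AND.w #$001F        ; length - 1 in bits 4-0
    BRA .label2
+
    LDA $8F             ; long header: command in bits 4-2
    ASL
    ASL
    ASL
    AND.b #$E0
    PHA
    LDA $8F
    AND.b #$03          ; high two bits of length - 1
    XBA
    %ReadByte()         ; low byte of length - 1
    REP #$20
.label2
    INC A
    STA $8D             ; $8D = 16-bit length
    SEP #$20
    PLA
    BEQ .label3         ; command 000: direct copy
    BPL .nextup         ; bit 7 clear: one of the fill commands
.label4                 ; bit 7 set: repeat from already decompressed data
    %ReadByte()         ; offset high byte
    XBA
    %ReadByte()         ; offset low byte
    TAX
-
    PHY
    TXY
    LDA [$00],Y
    PLY
    STA [$00],Y
    INY
    INX
    REP #$20
    DEC $8D
    SEP #$20            ; SEP leaves the zero flag from DEC intact
    BNE -
    JMP.w .start
.nextup
    ASL
    BPL .label5         ; command 001: byte fill
    ASL
    BPL .label6         ; command 010: word fill
    %ReadByte()         ; command 011: increasing fill
    LDX $8D
-
    STA [$00],Y
    INC A
    INY
    DEX
    BNE -
    JMP.w .start
.label3                 ; direct copy
    %ReadByte()
    STA [$00],Y
    INY
    LDX $8D
    DEX
    STX $8D
    BNE .label3
    JMP .start
.label5                 ; byte fill
    %ReadByte()
    LDX $8D
-
    STA [$00],Y
    INY
    DEX
    BNE -
    JMP .start
.label6                 ; word fill: alternate between two bytes
    %ReadByte()
    XBA
    %ReadByte()
    LDX $8D
-
    XBA
    STA [$00],Y
    INY
    DEX
    BEQ +
    XBA
    STA [$00],Y
    INY
    DEX
    BNE -
+
    JMP .start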
I'm pretty sure this can be optimized even more, but seeing that I've hit my limit, I have no idea how. I'd prefer advanced ASM hackers to contribute to this optimization, but other people can try too.
By optimizing I mean faster code, even if it costs ROM space. This means the code should use as few cycles as possible. It's all for the sake of decreasing level loading times and whatnot. It would be awesome if we actually saw visibly faster loading times.
You can find a list of instruction cycle counts in this document.
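For anyone trying out faster variants, it may help to have something to compare the output against. The following is a rough Python sketch of what the routine above appears to do, read off its branch structure. It is not part of the patch, and the function name `lc_lz2_decompress` is made up for illustration. It assumes Y starts at 0, so that repeat offsets count from the start of the output buffer, and it ignores the bank wrapping that the ReadByte macro handles.

```python
def lc_lz2_decompress(data: bytes) -> bytes:
    """Model of the ASM routine above (hypothetical helper, not SMW code)."""
    out = bytearray()
    pos = 0
    while True:
        header = data[pos]
        pos += 1
        if header == 0xFF:              # end marker, checked before anything else
            return bytes(out)

        command = header >> 5
        if command == 7:                # long header: command in bits 4-2
            command = (header >> 2) & 0x07
            length = (((header & 0x03) << 8) | data[pos]) + 1
            pos += 1
        else:                           # short header: length - 1 in bits 4-0
            length = (header & 0x1F) + 1

        if command == 0:                # direct copy
            out += data[pos:pos + length]
            pos += length
        elif command == 1:              # byte fill
            out += bytes([data[pos]]) * length
            pos += 1
        elif command == 2:              # word fill: alternate two bytes
            pair = data[pos:pos + 2]
            pos += 2
            for i in range(length):
                out.append(pair[i & 1])
        elif command == 3:              # increasing fill, wraps at 8 bits
            value = data[pos]
            pos += 1
            for i in range(length):
                out.append((value + i) & 0xFF)
        else:                           # bit 7 set: repeat from earlier output
            offset = (data[pos] << 8) | data[pos + 1]   # high byte first
            pos += 2
            for i in range(length):     # byte by byte, so overlaps behave like the ASM loop
                out.append(out[offset + i])
```

One possible way to use it: dump a compressed graphics or level block from the ROM, decompress it once with this model and once with the optimized routine in an emulator, and compare the two buffers byte for byte before counting any cycles.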
My blog. I could post stuff now and then
My Assembly for the SNES tutorial (it's actually finished now!)