Comprehensive Super FX ASM Guide

AdvancedASM Coding

Note: This is a really long tutorial, so make sure you are comfortable and read it carefully. Take breaks and go at a steady pace.

This tutorial is intended to teach the basic aspects of Super FX assembly, from the basic registers to moderately complex code. If you have a question and/or think something isn't clear or right, please post in this thread rather than PMing me.

Index:
A)CPU Basics
B)Getting used to the registers
C)Basic operations
D)Example codes
E)Chip Activation

A)CPU Basics

What is Super FX? It is a general-purpose co-processor with a RISC architecture, clocked at 21.4 MHz (it can be overclocked up to a maximum of 60 MHz). The Super FX has 16 registers, each 16 bits wide, named R0 through R15. The CPU also has a pipeline, which fetches the next instruction while the current one is executed. There are other registers that only the SNES can access; they are covered in this tutorial as well.

B)Getting used to the registers

As noted, the Super FX contains 16 registers, but not all of them are purely general-purpose. This table shows the registers and their role in the Super FX; registers in italics are ones the GSU itself can't access:

Register Name | Super NES CPU Address | Special Functions
R0 | $3000:$3001 | Default source/destination register
R1 | $3002:$3003 | PLOT instruction, X coordinate
R2 | $3004:$3005 | PLOT instruction, Y coordinate
R3 | $3006:$3007 | -
R4 | $3008:$3009 | LMULT instruction, lower 16 bits
R5 | $300A:$300B | -
R6 | $300C:$300D | FMULT and LMULT instructions, multiplication
R7 | $300E:$300F | MERGE instruction, source 1
R8 | $3010:$3011 | MERGE instruction, source 2
R9 | $3012:$3013 | -
R10 | $3014:$3015 | None, but best used as stack pointer
R11 | $3016:$3017 | LINK instruction destination register
R12 | $3018:$3019 | LOOP instruction counter
R13 | $301A:$301B | LOOP instruction branch address
R14 | $301C:$301D | ROM address pointer
R15 | $301E:$301F | Program counter
Status/Flag | $3030:$3031 | Indicates the status of the GSU.
Program Bank Register | $3034 | Specifies the memory bank to be accessed.
ROM Bank Register | $3036 (Read-Only) | Specifies the ROM bank when loading data from ROM using the ROM buffering system.
RAM Bank Register | $303C (Read-Only) | Specifies the RAM bank when loading/writing data from RAM.
Cache Base Register | $303E:$303F (Read-Only) | Specifies the starting address when data is loaded from ROM or RAM to the cache RAM.
Screen Base Register | $3038 (Write-Only) | Specifies the start address of the character data storage area.
Screen Mode Register | $303A (Write-Only) | Specifies the color gradient and screen height during PLOT processing and controls ROM and RAM bus assignments.
Colour Register | - | Contains data which specifies the colours to be plotted when PLOT processing is performed.
Plot Option Register | - | Contains flags which specify the mode to be used when a COLOR, GETC, or PLOT instruction is executed.
Backup RAM Register | $3033 (Write-Only) | Controls whether the data at banks $78:$79 is protected against writing.
Version Code Register | $303B (Read-Only) | Holds the version of the Super FX chip.
Config Register | $3037 (Write-Only) | Selects the operating speed of the multiplier in the GSU and sets up a mask for the interrupt signal.
Clock Select Register | $3039 (Write-Only) | Assigns the Super FX operating frequency.


C)Basic operations

Now that you are aware of the registers, you will learn the basic instructions. This section covers what each operation does so we can apply them later in this tutorial.

Immediate operations:
Code
IBT R10,#$83
IWT R4,#$1234

Super FX is a 16-bit CPU, so almost every 8-bit operation sign-extends the byte to a word by copying bit 7 into bits 8 through 15. The above code shows this: IBT R10,#$83 makes R10 = $FF83. Why? #$83 in binary is 1000 0011; bit 7 (the leftmost bit) is set, so it is copied into all of the upper bits. Word operations set the value as is.
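If you want to check a sign extension by hand, here is a rough Python sketch (not Super FX code) of what IBT does to its 8-bit immediate. The function name `ibt` is made up for this example.

```python
def ibt(imm8):
    """Model IBT: sign-extend an 8-bit immediate to a 16-bit register value."""
    imm8 &= 0xFF
    # Bit 7 is the sign bit; when it is set, bits 8-15 are filled with 1s.
    return (imm8 | 0xFF00) if imm8 & 0x80 else imm8

print(hex(ibt(0x83)))  # 0xff83, same as IBT R10,#$83
print(hex(ibt(0x44)))  # 0x44, bit 7 is clear so nothing changes
```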

Loading banks:
Code
IBT R7,#$10
FROM R7
ROMB
IBT R0,#$01
RAMB

The above code sets the ROM bank to $10, meaning that ROM operations will use bank $10. It also sets the RAM bank to $71, so RAM operations are done in bank $71.

Source and Destination:
Code
TO R10
GETB
FROM R1
COLOR
WITH R3
SUB R3

The above code sets R10 as the destination for GETB (Get Byte from ROM), so Super FX gets data from ROM and puts it in R10. Next, it sets R1 as the source and copies the value of R1 to the colour register. Finally, it sets R3 as both source and destination and subtracts R3 from it.

The above operation can be read like this: Data -> R10 ; R1 -> COLOR ; R3-R3=R3 (0)
Be aware that R0 is the default source and destination register: any operation that isn't a branch or a MOVE/MOVES/ALT operation resets the source and destination to R0.

Store and Loading:
Code
GETB
STW (R4)
LDB (R1)
SM ($5678),R0
LM R3,($1234)

GETB loads a byte from ROM at the address ROMB:R14. STW stores a 16-bit (word) value from the source register to the address held in the given register. In that code the register is R4, so if R4 = $3232 the data is stored at address RAMB:$3232. LDB does the reverse, loading a value (in this case a byte) into the destination register. SM and LM do the same, except that they load/store 16-bit values only and you specify the address directly. NOTE: If you load 16-bit values, take care with even/odd addresses. If the address is even, the high byte is located at Address+1; if the address is odd, the high byte is located at Address-1.
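The even/odd rule in the note is easier to see with a small model. This Python sketch (not Super FX code) just follows the description above: the low byte goes to the address and the high byte goes to the address with its lowest bit flipped. The names `store_word` and `load_word` and the dictionary used as RAM are assumptions made for illustration.

```python
def store_word(ram, addr, value):
    """Model a word store: low byte at addr, high byte at addr with bit 0 flipped."""
    ram[addr] = value & 0xFF
    ram[addr ^ 1] = (value >> 8) & 0xFF

def load_word(ram, addr):
    """Model a word load using the same byte layout as store_word."""
    return ram[addr] | (ram[addr ^ 1] << 8)

ram = {}
store_word(ram, 0x3232, 0x1234)  # even address: high byte lands at $3233
store_word(ram, 0x4001, 0xABCD)  # odd address: high byte lands at $4000
print(hex(ram[0x3233]), hex(ram[0x4000]))  # 0x12 0xab
```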

Jumping and Comparing:
Suppose that R1 = $8000 and R3 = $2FFF
Code
FROM R1
CMP R3
BCS Label
NOP

Code
LINK #4
IWT R15,#Label
NOP
Return:
[...]
Label:
JMP R11

Code
FROM R1
LJMP R11
NOP

The first code is a simple compare that works the same way as on the SNES: set the source, compare it against another register, and then branch on the resulting flags.
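To see why BCS branches for the values given, here is a rough Python model (not Super FX code) of the carry and zero flags after a compare. It assumes the carry behaves as on the SNES, that is, carry set means no borrow happened. The function name `cmp_flags` is made up for this example.

```python
def cmp_flags(source, operand):
    """Model CMP on 16-bit values: compute source - operand and return (carry, zero)."""
    source &= 0xFFFF
    operand &= 0xFFFF
    result = (source - operand) & 0xFFFF
    carry = source >= operand  # no borrow needed, so carry is set
    zero = result == 0
    return carry, zero

# R1 = $8000 (source), R3 = $2FFF (operand)
carry, zero = cmp_flags(0x8000, 0x2FFF)
print(carry, zero)  # True False, so BCS Label is taken
```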

The second code is a simple "subroutine". LINK (which takes a value from 1 to 4) computes the return address by doing R15 + (1 through 4) = R11. R15 is the Program Counter, which points to the code the processor is executing, so writing to R15 makes Super FX jump to the desired location. NOTE: Because of the pipeline, the instruction right after a jump has already been fetched and will still be executed. Be careful with instructions of two or more bytes there, since Super FX will only read the first byte of the next instruction.

JMP works like the SNES version, except that the register holds the jump address. LJMP does a long jump: it takes the bank from the register given as the operand and the address to jump to from the source register.

Bitshift, Addition and Subtraction operations:
Code
INC R0
DEC R1
ADC R2
ADC #3
ADD R4
ADD #5
SBC R6
SUB #7
SUB R8
ASR
LSR
ROR
ROL

The above operations are pretty much self-explanatory. INC increases a register by 1 and DEC decreases it by 1. ADC adds with carry from source to destination (Source + Rn = Destination) and ADD adds without carry. SBC subtracts with carry and SUB subtracts without carry. ASR is an arithmetic shift right and LSR is a logical shift right. ROR rotates right through carry and ROL rotates left through carry.

The difference between the shifts is that ASR copies bit 15 back into itself (so the sign is kept), while LSR doesn't and shifts normally.
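The difference between the two right shifts can be sketched in Python (not Super FX code). The function names are made up, and the carry flag is left out to keep the example short.

```python
def asr(value):
    """Arithmetic shift right: bit 15 is kept, so the sign survives."""
    value &= 0xFFFF
    return (value >> 1) | (value & 0x8000)

def lsr(value):
    """Logical shift right: bit 15 becomes 0."""
    value &= 0xFFFF
    return value >> 1

print(hex(asr(0x8004)))  # 0xc002
print(hex(lsr(0x8004)))  # 0x4002
```

For positive values (bit 15 clear) both give the same result.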

Bitwise operations:
Code
FROM R1
NOT
AND R1
OR R5
XOR R2
BIC #15

NOT is a simple operation: it inverts every bit. AND keeps a bit set only if it is set in both values; otherwise the bit is cleared. OR works the opposite way: a bit is set if it is set in either value. XOR is similar to OR, but a bit is set only when the two bits DON'T match; matching bits give 0. Last but not least, BIC performs a logical AND between the source register and the 1's complement of the operand, meaning the operand is inverted THEN the AND is done.
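Python has the same bitwise operators, so the five operations are easy to try out on small values. This is only a sketch on 16-bit values; the helper names `bic` and `not16` are made up for illustration.

```python
MASK = 0xFFFF  # registers are 16 bits wide

def bic(source, operand):
    """Model BIC: invert the operand, THEN AND it with the source."""
    return source & ~operand & MASK

def not16(value):
    """Model NOT: invert every bit of a 16-bit value."""
    return ~value & MASK

a, b = 0b1100, 0b1010
print(bin(a & b))      # 0b1000, set only where both are set
print(bin(a | b))      # 0b1110, set where either is set
print(bin(a ^ b))      # 0b110, set only where the bits differ
print(bin(bic(a, b)))  # 0b100, bits of a that are clear in b
print(hex(not16(a)))   # 0xfff3
```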

Multiplication:
Code
TO R1
MULT R0
TO R2
UMULT R0
LMULT

Multiplication is simple on Super FX. MULT and UMULT do 8-bit multiplication only, while LMULT and FMULT do 16-bit calculations.

The difference between MULT and UMULT is that MULT does signed operations (it treats bit 7 as the sign) while UMULT doesn't. They also differ from LMULT and FMULT in that you can choose the register to multiply by, whereas FMULT and LMULT always multiply by R6, and LMULT places the low word of the 32-bit result in R4.

For example: Source -> R5 = $52CF and R1 = $63CF
Code
FROM R5
MULT R1

The result would be R0 = $0961. Why? The operation is 8-bit but the result is 16-bit, and it takes the sign bit into account. You can check it yourself in Windows Calculator: $FFCF*$FFCF=$0961 (keeping only the low 16 bits). As for an unsigned multiplication, take this example: Source -> R5 = $364F and R1 = $B2CF
Code
FROM R5
UMULT R1

The result would be R0 = $3FE1. Why? Same reason as above, HOWEVER the operation isn't signed, so in Windows Calculator you'd do $004F*$00CF=$3FE1.

Long multiplications are a tad harder, but they are useful in complex operations. FMULT omits the R4 destination while LMULT stores the whole result. Take this as an example: Source -> R5 = $B556 and R6 = $DAAB
Code
FROM R5
LMULT

The result would be: R0 = $0AE3 and R4 = $5C72. To check the result in Windows Calculator, do $FFFFB556*$FFFFDAAB = $0AE35C72 (keeping only the low 32 bits). Remember, only UMULT ignores the sign in the most significant bit (bit 7 in 8-bit operations, bit 15 in 16-bit operations).
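Instead of Windows Calculator, the three worked examples can also be checked with a short Python model (not Super FX code). The function names are made up for illustration, and flags are not modelled.

```python
def s8(value):
    """Interpret the low byte as a signed 8-bit number."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value

def s16(value):
    """Interpret the low word as a signed 16-bit number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def mult(source, rn):
    """Signed 8-bit x 8-bit multiply, 16-bit result."""
    return (s8(source) * s8(rn)) & 0xFFFF

def umult(source, rn):
    """Unsigned 8-bit x 8-bit multiply, 16-bit result."""
    return ((source & 0xFF) * (rn & 0xFF)) & 0xFFFF

def lmult(source, r6):
    """Signed 16-bit x 16-bit multiply; returns (destination, R4) = (high word, low word)."""
    product = (s16(source) * s16(r6)) & 0xFFFFFFFF
    return product >> 16, product & 0xFFFF

print(hex(mult(0x52CF, 0x63CF)))   # 0x961
print(hex(umult(0x364F, 0xB2CF)))  # 0x3fe1
high, low = lmult(0xB556, 0xDAAB)
print(hex(high), hex(low))         # 0xae3 0x5c72
```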

Loop and Cache:
Code
IBT R12,#$04
CACHE
MOVE R13,R15
[...]
LOOP
NOP

The above code is simple. R12 sets the number of times the routine should loop. The MOVE opcode copies the address from R15 (PC) to R13, the loopback address register. The CACHE opcode should be used before loops so that, once a LOOP command is executed, the code runs from cache RAM on the next pass rather than from ROM/RAM. LOOP decrements R12 and checks whether it is zero: if it is, the loop ends; otherwise, it jumps to the address specified in R13.
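The LOOP behaviour described above can be modelled in a few lines of Python (not Super FX code). The function `run_loop` and the `body` callback are made up for this sketch; `body` stands for the code between the loopback address and LOOP.

```python
def run_loop(r12, body):
    """Model LOOP: run the body, decrement R12, jump back while R12 is not zero."""
    while True:
        body()
        r12 = (r12 - 1) & 0xFFFF  # R12 is a 16-bit register
        if r12 == 0:
            break                 # R12 reached zero: fall through, no more looping
    return r12

passes = []
run_loop(4, lambda: passes.append(1))
print(len(passes))  # 4, the body ran once per count in R12
```

Note that the check happens after the body, so the body always runs at least once.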

Misc. Code:
Code
SWAP
MOVE R11,R15
TO R12
HIB
LOB
MERGE
STOP

Well, starting with SWAP: it exchanges the high byte and the low byte. For example, if R0 is $1234, after a SWAP it'd be $3412. MOVE copies the value from source to destination; the syntax is MOVE Destination,Source. MOVES does the same as MOVE, except that it sets flags that can be useful for testing values. HIB takes the high byte of the source and places it in the low byte of the destination. LOB does AND #$00FF and keeps the low byte only.
MERGE is a slightly more complicated operation, but it works like this: it takes the high byte of R7 and places it in the high byte of the destination register, and takes the high byte of R8 and places it in the low byte of the destination register, effectively merging them.
STOP does what it says: it stops Super FX's clock so the SNES can read the output.
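The byte-shuffling instructions are easy to mix up, so here is a rough Python model (not Super FX code) of SWAP, HIB, LOB and MERGE as described above. The function names are made up, and flags are not modelled.

```python
def swap(value):
    """Exchange the high and low bytes of a 16-bit value."""
    value &= 0xFFFF
    return ((value << 8) | (value >> 8)) & 0xFFFF

def hib(value):
    """High byte of the source, placed in the low byte of the result."""
    return (value >> 8) & 0xFF

def lob(value):
    """Low byte only, same as AND #$00FF."""
    return value & 0xFF

def merge(r7, r8):
    """High byte of R7 becomes the high byte, high byte of R8 becomes the low byte."""
    return (r7 & 0xFF00) | ((r8 >> 8) & 0xFF)

print(hex(swap(0x1234)))           # 0x3412
print(hex(hib(0x1234)))            # 0x12
print(hex(lob(0x1234)))            # 0x34
print(hex(merge(0xAB12, 0xCD34)))  # 0xabcd
```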

Code
ADD R3
ALT1
ADD R3
ALT2
ADD R3
ALT3
ADD R3

The above code deals with alternate codes: by using ALTn instructions, you can replace certain operations with others. The assembler does this automatically, but you can use it yourself to save a few bytes or even cycles.
For example, without any ALTn, ADD R3 stays as is. With ALT1, ADD R3 turns into ADC R3. With ALT2, ADD R3 turns into ADD #3. With ALT3, ADD R3 turns into ADC #3. Beware of them!
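The ADD example above boils down to a small lookup table. This Python sketch only covers ADD, exactly as listed in the text; other opcodes have their own alternates that are not shown here. The names `ADD_ALTERNATES` and `decode_add` are made up for illustration.

```python
# What the ADD opcode with operand n becomes under each ALT prefix (0 = no prefix).
ADD_ALTERNATES = {
    0: "ADD R{n}",
    1: "ADC R{n}",
    2: "ADD #{n}",
    3: "ADC #{n}",
}

def decode_add(alt, n):
    """Return the instruction that actually runs for ADD Rn after ALT<alt>."""
    return ADD_ALTERNATES[alt].format(n=n)

for alt in range(4):
    print(alt, decode_add(alt, 3))
```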

Bitmap Code:
Code
IBT R0,#$02
CMODE
FROM R7
COLOR
LOOP
PLOT

The bitmap code is easy to understand but requires attention when working with it. CMODE sets the flags for the PLOT operation, such as transparency, dither, sprite mode and 256-colour mode. COLOR copies the value of the source register into the colour register, which holds the palette index for plotting.
The PLOT opcode works like a printer: it reads the X and Y coordinates (held in R1 and R2 respectively), the palette index set by COLOR and the Screen Base Register. Keep in mind that PLOT increments X, so you don't have to do it.

Code
RPIX
GETC

The above are extra instructions for bitmap processing. RPIX is the alternate code for PLOT: it reads the pixel at the given coordinates and puts its colour information in the destination register. GETC works like GETB, except that it places the data straight into the Colour Register.

A reminder that this section deals only with the basics of the code. Below, I will give simple examples with commented code for easier understanding.

D)Example codes

The codes below are just examples. They are here to show you how to interpret the opcodes, understand their usage and put them together as you need.
Code
SUB R0			;Do R0-R0=R0 (0)
RAMB			;Store bank value from R0 to RAM Bank
IBT R1,#$44		;R1 = $0044
IWT R2,#$8000		;R2 = $8000
FROM R1			;Source is R1
STB (R2)		;Store byte value from R1 on address at R2. High byte is ignored.

Code
IBT R0,#$01		;R0 = $0001
ROMB			;Store bank value from R0 to ROM Bank
IWT R2,#$1DFB		;R2 = $1DFB
IWT R14,#$8000		;R14 = $8000 - Also start ROM buffering (ROM pointer)
TO R6			;Set R6 as destination
GETB			;Get data from ROM to destination - ROMB:R14 - In this case $01:8000
TO R4			;Set R4 as destination
LDW (R2)		;Load word value from address in R2 to destination in R4. R0 turns destination again.

Code
LINK #4			;Get return address by doing R15+4 = R11
IWT R15,#JumpHere	;Jump to Label
WITH R5			;Meanwhile load this opcode and make R5 source and destination

ReturnLabel:
FROM R5			;When return, get R5 as source
ADD R1			;Do R5+R1=R2
SM ($1AAF),R2		;Store the result (16-bit) from R2 to address $1AAF
STOP			;Stop the CPU

JumpHere:
UMULT #5		;Do R5*5=R5
JMP R11			;Return
TO R2			;Set R2 as destination

Code
IBT R1,#$80		;R1 = $FF80
FROM R1			;Set R1 as source
TO R2			;Set R2 as destination
XOR #15			;Do R1^F = R2 ($FF8F)
FROM R2			;Set R2 as source
AND R1			;Do R2 & R1 = R0 ($FF80)
BIC R2			;Do R0 & (~ R2) = R0 ($0000) - It Inverts the register THEN it ANDs it.
NOT			;Invert all bits = R0 ($FFFF)

Code
IBT R0,#$02		;R0 = $0002
CMODE			;Set Transparency and Dithering Mode
IBT R1,#$00		;\ Clear X and Y positions
IBT R2,#$00		;/
IBT R12,#$15		;Loop $15 (21) times
IWT R7,#$8000		;Set RAM area
CACHE			;The subsequent code will be read on cache
MOVE R13,R15		;Set loopback address
LDB (R7)		;Load byte to R0 (If not specified, source/destination will be ALWAYS R0)
INC R7			;After loading byte, increment the address
COLOR			;Store data from R0 to Colour Register
LOOP			;Loop
PLOT			;Plot the colour on specified coordinates, increasing X (This will draw a line)
STOP			;After looping enough, stop CPU


Obviously, this covers only the basics of Super FX ASM. To get the most out of this tutorial, you should practice writing code and seeing the results. There is no secret, it is trial and error.

E)Chip Activation

In order to use Super FX, a few steps have to be taken:
1. Move the NMI/IRQ routines to RAM and repoint the vectors
2. Move the Super FX invoke routine to WRAM ($7E or $7F, preferably the former)
3. Set up the initial hardware configuration for the GSU
Code
org $FFE0	;Repoint Vector Info - Native
dw $0100,$0100,$0104,$0100,$0100,$0108,$8000,$010C

org $FFF0	;Repoint Vector Info - Emulation
dw $0100,$0100,$0104,$0100,$0100,$0108,$8000,$010C

Basically you are repointing the vectors for the IRQ, NMI and BRK routines. You can also set up checks so you don't have to upload the full NMI/IRQ routines to WRAM: basically, check whether the GSU is active and wait until processing is done.

By the way, to make use of the Super FX chip, you need to upload the invoke routine to WRAM. The routine is something like this:
Code
	LDX #$3D			;\ Give Super FX Game Pak ROM and RAM access
	STX $303A			;/
	STY $3034			; Set Super FX bank
	STA $301E			; (PC)
	LDA #$0020			;\	Check for G (Go) Flag
-	BIT	$3030			; | If routine isn't finished yet
	BNE	-			;/	Loop until it is...
	SEP #$20			; Clear 16-bit mode
	STZ $303A			; Give back ROM and RAM access to the SNES
	RTS

After uploading the code to WRAM and setting up the IRQ/NMI routines, you have to write your own routines for Super FX. Once you have done that, you invoke the code like this:
Code
REP #$20
LDY.b #Label>>16		;\ Put address in the proper place...
LDA.w #Label			;/
JSR $xxxx			; Call Super FX and wait.
[...]				; *other code*

arch superfx			; REMEMBER! Use asar to easily create routines
				; using Super FX's ASM language
Label:
[...]				; Code goes here
STOP				; Finish processing data.
arch 65816			; Return to SNES ASM mode

LDY holds the bank of the GSU routine while LDA holds the address within the bank; then you jump to the invoke routine you uploaded. By the way, REMEMBER to finish GSU routines with a STOP, or else there is a serious chance of things crashing.

Also, read the comments well: they cover the operations of each example code, so you can see what each one does. It is a short tutorial, but don't be afraid to ask questions if you have any doubts.

For more information, I have made available some reading material that contains the opcodes and overall info on Super FX; you can get it here.
Also, you can consult this amazing Super FX guide for more info. It contains tables and information that this tutorial doesn't cover: Link Here!

Planned features:
- Work with SA-1
- Work with DSP-1

Update: - Added information about the External Site.
- Added information about how to activate Super FX routines.
Great job! I might make some rogue edits to this over time if you don't mind ;)
Not at all, the more knowledge to share, the better! It is better to ask questions in order to solve any doubts regarding Super FX ASM, and other kinds of co-processors as well.
I think it's kinda easy, since it builds on basic ASM and explains how the CPU works in a well-organized, understandable way, including banks, destinations/sources, etc. I think this is very understandable. Nice job. :3
From what I know, no flashcart can utilize the Super FX chip yet. Since you know so much about this co-processor chip, have you looked into making the Super FX work on SD2SNES?

Very impressive tutorial.
Thanks! I hope this can help other people to understand and learn how to code routines for this processor.

As for the SD2SNES, not yet. A friend of mine already told me to check it out, but I haven't found time for that. I might look into it, though. If possible, I'd even try to implement the unused 8MB map that Nintendo didn't use on their carts.

Could you add something on loading and executing the SuperFX program from the main program? I'm using the tutorial to make an entirely new program for the SNES.
I made an update to the thread, now including the information on how to activate the chip to process your routine.

Also, be aware that while Super FX is running, you must not use ROM and/or SRAM, so your operations are mostly limited to working in WRAM.

I have a triangle drawing algorithm

To test if a point is in a triangle, it must be on the inside of all three edges. This can be tested by the following:

A0 = Y1-Y2
B0 = X2-X1
C0 = (X1*Y2)-(X2*Y1)
Edge0 = (A0*X)+(B0*Y)+C0

So, if edge0 is positive, it is inside the edge. Replace 1 and 2 with the following for each edge:

Edge0 1 2
Edge1 2 0
Edge2 0 1

As you can see it rotates, so there must be counterclockwise orientation. Another thing is that if the point is 0,0 (the first point tested), the A0 and B0 terms go to zero, leaving only C0. If X is incremented, add A0. Likewise, add B0 for a Y increment.

Pseudo code is as follows:
Code
findEdgeCoefficients();
rowCoefficientInits = edgeCoefficientCValues;        // edge values at point 0,0
for each Y in 0 to height
{
    rowCoefficients = rowCoefficientInits;
    for each X in 0 to width
    {
        if (all rowCoefficients >= 0) plotPoint();   // inside all three edges
        rowCoefficients += edgeCoefficientAValues;   // X increment: add A
    }
    rowCoefficientInits += edgeCoefficientBValues;   // Y increment: add B
}
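For anyone who wants to try the idea before writing it in Super FX ASM, here is a rough Python version of the edge test described above. It is only a sketch: the function names and the returned list of points are made up for illustration, and a real routine would plot each point instead of collecting it.

```python
def edge_coefficients(p1, p2):
    """Return (A, B, C) for the edge from p1 to p2, using the formulas above."""
    (x1, y1), (x2, y2) = p1, p2
    return y1 - y2, x2 - x1, x1 * y2 - x2 * y1

def rasterize(v0, v1, v2, width, height):
    """Collect every point that is on the inside of all three edges."""
    # Edge0 uses vertices 1,2; Edge1 uses 2,0; Edge2 uses 0,1.
    edges = [edge_coefficients(v1, v2),
             edge_coefficients(v2, v0),
             edge_coefficients(v0, v1)]
    row_inits = [c for (_, _, c) in edges]  # edge values at point 0,0
    points = []
    for y in range(height):
        row = list(row_inits)
        for x in range(width):
            if all(value >= 0 for value in row):
                points.append((x, y))
            # X increment: add each edge's A term.
            row = [value + a for value, (a, _, _) in zip(row, edges)]
        # Y increment: add each edge's B term.
        row_inits = [value + b for value, (_, b, _) in zip(row_inits, edges)]
    return points

points = rasterize((0, 0), (4, 0), (0, 4), 5, 5)
print(len(points))  # 15
```

Swapping two of the vertices flips the orientation, and then nothing gets plotted, which is why the vertex order matters.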
This tut ain't that long, you liar. Well, maybe a bit long, but not really, really, really, really long.

I'm gonna try this on Yoshi's Island, but I wanna know what the game uses Super FX's new commands for. Can you still do LDA and STA? Hope so.

I feel sorry for your tut, because very few people know Super FX ASM. Another question: How am I gonna use this on Super Mario World? uberASM Tool (why does Vitor say it as UberASM?) isn't compatible with it yet.

ICYMI: The Super FX Development Kit has been retired after discussions, combined with a lack of popularity compared to SA-1. It has been archived in this thread in case anyone wants to pick it back up, but keep in mind, it has bugs that have crawled since its last release.