HDMA Coding Tutorial

HDMA Tutorial

Welcome to my tutorial, where I'll teach you how to use HDMA!

What you need:
  • A Super NES ROM (could be homebrew, we use SMW)
  • A text editor
  • Anoni's Register Documentation or any other document which describes registers on the SNES
  • An assembler (such as Asar), though for SMW, we use UberASM Tool


What is HDMA

HDMA, short for Horizontal or H-Blank DMA, is a special type of DMA on the SNES which transfers some data for each scanline. The idea is that a cathode ray tube (also known as CRT) includes an electron beam which draws ("scans" in technical terms) the screen by going from left to right, top to bottom.
While the beam moves on to the next line (i.e. right to left) or to the next image (i.e. bottom to top), it is turned off since an active beam would leave garbage on the screen. That period is called "blanking": the beam doesn't draw anything, so that portion of the screen is basically blank. The blanking period before a new line is drawn is called "h-blank" (I hope you now know why HDMA is named that way) whereas the period before a new image is drawn is called "v-blank".
This is important because you can only update the screen freely while the electron beam is turned off; otherwise, accessing the PPU registers interferes with the PPU and messes up the screen. Furthermore, the two blanking periods aren't weighted equally: In general, you want to update the screen in v-blank since that is the longest blanking period and the place where you define how the screen looks on a large scale. H-blank, on the other hand, is more useful for manipulating things on a per-scanline basis: you can update the screen mid-screen.
The latter is how various effects such as gradients, parallax scrolling, 3D perspective and many more are achieved, essentially giving the PPU a different image to draw from certain scanlines onwards!

You can see Retro Game Mechanics Explained's video about blanking for more information. In particular, it shows you how blanking is generally modelled.


On the SNES, there are realistically two methods to write in h-blank: Interrupts and HDMA.
Interrupts are a mechanism where running code is stopped in favour of other code (hence interrupting). The SNES's two interrupts, NMI and IRQ, are primarily built for blanking: the former fires when v-blank starts and the latter can be programmed to fire at any pixel, which makes it great for running code in h-blank. This is e.g. how SMW handles the status bar and why it doesn't seem to get affected by other layer 3 images:

The top part of the screen has got layer 3 at position (0, 0) and colour maths on layer 3 disabled whereas the bottom part uses the layer 3 offsets from $22 and $24 and can have colour maths on layer 3 enabled.

HDMA, on the other hand, is more limited in scope in the sense that you can't run code with it. However, it is fast and simple to program, which in most cases makes it superior to IRQ, particularly because you can write data at specific scanlines without interrupting the main loop. SMW examples are the various windowing effects such as message boxes, the level end circle and keyhole:

This effect wouldn't be really feasible with IRQ, particularly because it would otherwise lag the game quite a bit since the scanlines with windowing couldn't be used to handle physics and the like.

Setting up UberASM Tool

For Super Mario World, HDMA code is generally run with UberASM Tool. If you're working with other games, you'll probably use a comparable patch if the game doesn't ship with something like it already. For homebrew, I assume it is your project so it's your job to set up HDMA with your system.

For SMW and UberASM Tool, all you have to do is assign an ASM file to a level. The included list.txt shows you how it's done. The ASM file must contain at least one of these four labels:
  • load runs during the Mario Start screen, before the level has been built. This includes transitions between two sublevels, even though they lack the Mario Start screen.
  • init also runs during level load but later than load: it covers the short period between Mario Start (i.e. building the level) and the fade into the level. This is where the graphics are uploaded and the display for the level is set up.
  • main runs for every frame inside the level except when a message box is active.
  • nmi is a special label which runs at the beginning of a very special routine. The name has come up before and I hope you know what its primary purpose is.
For the most part, init and main are enough, and that includes using HDMA. load can come up for object creation routines, whereas nmi should only be used if it has to be (and for the most part, you don't need it with HDMA).
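
To make the four labels a bit more concrete, here is a minimal sketch of what a level ASM file could look like. Only the label names come from the list above; the comments are suggestions, and the sketch assumes that each routine ends with RTL (as the code later in this tutorial does).

```asm
; Minimal level file skeleton (sketch). Labels you don't need can be left out.

load:
    ; runs before the level is built
    RTL

init:
    ; runs once before the fade in: a good place to set up HDMA
    RTL

main:
    ; runs every frame: update tables in RAM here
    RTL

nmi:
    ; runs in v-blank: keep this as short as possible
    RTL
```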

Either way, we should start with the first chapter.

Brightness HDMA

In order to make you familiar with HDMA, let's start with the simplest one: Creating a brightness gradient.
This is generally one of the first things you'll learn since it is one of the simplest implementations of HDMA.

What this will teach you:
  • Controlling brightness.
  • Get to know the DMA and HDMA registers.
  • Setting up a simple HDMA transfer.

Let's start with setting up HDMA: The SNES supports up to 8 DMA channels, each of which can perform either normal DMA (data transfer) or HDMA (one write per scanline). Both of them work by loading data from a given address and storing it to a PPU register, or by reading from a register and storing to a data block.

To start with the coding, let me introduce the DMA registers: DMA registers are located in the $43xy area with x being the DMA channel (x = 0 - 7) and y being the controller (y = 0 - F). The result is that you can easily figure out which register belongs to which DMA channel: $4300-$430F all refer to channel 0, $4310-$431F all refer to channel 1, etc.
It should be noted that in SMW, you can realistically only use channels 3 - 6. Channel 7 is used for windowing HDMA and using it will mess things up (e.g. message boxes, level end circle) while the remaining channels are used for DMA. SA-1 Pack fixes the issue a bit in the sense that only two channels are used for DMA while channel 2 for windowing.

Naturally, to explain the individual controllers, I use the x placeholder for the registers since they're identical across all eight channels.

$43x0: DMAPx
This register contains the settings for the DMA, the properties. These settings are separated into various bits using the following format: da-ifttt

  • d - Transfer Direction. This controls the direction of the transfer. Bit clear means the transfer goes from bus A (memory) to bus B (register) while set means from bus B to bus A.
    For HDMA, you want to keep it clear because the uses of inverting the flow of transfer with HDMA are very limited.
  • a - HDMA Addressing Mode. This allows you to specify whether HDMA is direct i.e. the HDMA table contains data or indirect i.e. the HDMA table contains pointers.
    We want to keep this bit clear because indirect mode will come later. It also isn't used for DMA but that is beside the point here.
  • i - DMA Address Increment. That one merely controls the direction of DMA i.e. is the data read forwards or backwards? Not used for HDMA so you can keep it clear.
  • f - DMA Fixed Transfer. Useful for DMA since that will keep the source in place i.e. writes the same byte or word to the destination.
    Useless for HDMA since it ignores that bit.
  • ttt - Transfer Mode. This controls how many bytes are transferred in what kind. In particular, this allows you to write to multiple registers at a time as well as allowing you to write to the same register twice.
    For brightness, you can keep it at zero since mode 0 means to transfer only one byte. We'll come to the other options later on, though.


$43x1: BBADx
This is the address of bus B, usually the destination of the transfer. DMA is hardcoded to write to the $0021xx area i.e. to the PPU registers.

$43x2: A1TxL
$43x3: A1TxH
$43x4: A1Bx
These three registers control the bus A address, usually the source of the transfer. This can be any kind of memory which can be accessed by the address bus, it's really your regular good ol' SNES addresses.
Be aware that it must be actual memory i.e. ROM, WRAM, cartridge RAM (i.e. SRAM and BW-RAM) and RAM provided by the enhancement chip (e.g. SA-1's I-RAM). You can't set the bus A address to be any register, PPU or otherwise.

$420C (mirrored at $0D9F): HDMAEN
This register controls which HDMA channels are enabled and which aren't. Normally, you shouldn't write to the register directly but rather to its WRAM mirror. In particular, you need to keep in mind that other HDMA channels can be enabled as well, but we'll get to that shortly.

There are more registers but right now, they aren't required. So we now get to writing the HDMA code:

Most HDMA code is really the same and can easily be reused, even among different types of HDMA. Nonetheless, it is a good idea to understand what every component does.

In general, most code starts by writing the transfer mode and the affected register. Open up regs.txt and search for the register which controls the brightness.

Code
    LDA #$00
    STA $4330

Simply store zero to the DMA transfer mode i.e. one byte, one register only, direct HDMA, bus A to bus B.

Code
    LDA #$00
    STA $4331

Very simple code: Set HDMA to write to register $2100, INIDISP. INIDISP is used to control the screen brightness on the SNES which makes it a great tool for global darkness (it also allows you to enable f-blank but that's beside the point). With HDMA, you can have a nice brightness gradient.

Code
    LDA.b #Gradient
    STA $4332
    LDA.b #Gradient>>8
    STA $4333
    LDA.b #Gradient>>16
    STA $4334

That one is slightly more complicated: it takes the address of the Gradient label as a constant and stores it, byte by byte, to the bus A address registers. This is possible since a label is just a 24-bit value. Just make sure you have got the right operand size since Asar generally assumes word size for label constants. Actually, always specify the size explicitly (here with .b) whenever it is ambiguous.

Code
    LDA.b #%00001000
    TSB $0D9F|!addr

The final piece of the code: Enable HDMA on channel 3 using the mirror and not the register. HDMAEN works bitwise i.e. every channel has got its own bit, a power of 2. Binary values show this best: The rightmost bit is for channel 0 and it goes upwards to channel 7 on the left side.
Furthermore, the reason we use the mirror and not the register directly is that HDMA doesn't like to be enabled outside of v-blank, not to mention that the register is write-only, which messes with TSB since that instruction has to read the current content of the memory (in contrast, you only need to have one DMA channel active at a time, which means the DMA enable register can just be stored to directly).
Finally, it is recommended to use TSB instead of STA since the latter will overwrite existing HDMA enable flags while the former will keep them.
Fun fact: That's by the way why messages and the like break HDMA. They never expect to have other HDMA channels enabled and thus use a STA and STZ instead of TSB and TRB.
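
As a small sketch of the counterpart: assuming you want to switch your effect off again later (e.g. from main), TRB clears only the bit of your own channel in the mirror and leaves every other channel alone. Channel 3 is taken from the example above.

```asm
    LDA.b #%00001000    ; bit 3 = channel 3
    TRB $0D9F|!addr     ; clear only that bit in the HDMAEN mirror
```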

Challenge for you: Right now, the code is kind of unoptimised. With a 16-bit accumulator, it is possible to optimise the code a bit by taking advantage of reducing two register writes. Which of these registers make the most sense to be accessed in 16-bit mode?


See? It wasn't so difficult, was it?

But now we're missing a table. HDMA tables are generally created by tools but you can calculate them by hand as well, particularly so you understand how to build one yourself with a program or something. So let's get started!

An HDMA table has got two components: A scanline count and the data. In its most basic form (i.e. mode 0), the table generally looks like this:
Code
db $xx,$yy

where $xx is the number of scanlines the value stays in effect, ranging from $01 to $80, and $yy is the value you want to store at that position, which can be any value the register allows.
To have multiple changes, simply add more rows to the table:
Code
db $xx,$yy
db $ww,$zz
...

However, you should also take care to end the HDMA table. Unless the scanline counts add up to 224/$E0 scanlines, you should always end the table with a $00, otherwise HDMA will read garbage data for the rest of the scanlines. A side effect of terminating with a $00 is that the final value will be used for the rest of the screen.
The table will look something like this:
Code
db $xx,$yy
db $ww,$zz
...
db $00


For example, this is a valid HDMA table (not specific to INIDISP):
Code
db $05,$AD
db $05,$9F
db $2D,$E1
db $01,$10
db $00


Now we want to get the correct values. In order to do so, open up regs.txt and look up the description of INIDISP:
Quote
2100 wb++++ INIDISP - Screen Display
x---bbbb

x = Force blank on when set.
bbbb = Screen brightness, F=max, 0="off".

So what does that mean? It simply means, the brightness has got 16 different steps, ranging from $00 to $0F. A value of $80 enables f-blank i.e. it disables the electron beam at that position (not to be confused with the value $00 which just displays black) but that isn't required here.

Challenge for you: Let's take a look at the following table:
Code
db $0F,$00
db $04,$01
db $04,$02
db $04,$03
db $04,$04
db $04,$05
db $04,$06
db $04,$07
db $04,$08
db $04,$09
db $04,$0A
db $04,$0B
db $04,$0C
db $04,$0D
db $04,$0E
db $01,$0F
db $00

Before you insert the code, how do you imagine the gradient will appear?




Now we get to the question: Why is it impossible to use a scanline count greater than $80? It isn't really impossible in the sense that you can't use such values; rather, they have got a different meaning. In particular, a scanline count above $80 enables continuous mode which stores a different value for each scanline. This is comparable to RLE compression i.e. repeating bytes are compressed but uncompressed streams of data are allowed as well.
To write such a row, write the number of scanlines + $80 (the limit is $7F scanlines per row), followed by one set of data for each of these scanlines (for brightness, that's simply one byte per scanline).
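
To show both kinds of rows side by side, here is a small hypothetical mode 0 table (the label and the values are made up for illustration) with the scanlines each row covers written out in the comments:

```asm
ExampleTable:
    db $03,$0F              ; normal row: write $0F once, it stays for 3 scanlines (lines 0-2)
    db $83,$0E,$0D,$0C      ; continuous row: $80 + 3, one value per scanline (lines 3-5)
    db $02,$0B              ; normal row again: $0B for 2 scanlines (lines 6-7)
    db $00                  ; end of table: $0B stays for the rest of the screen
```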

Let's take a look at this table:
Code
db $0F,$0F
db $86,$0E,$0D,$0C,$0B,$0C,$0E
db $04,$0F
db $84,$0E,$0D,$0C,$0B
db $10,$0A
db $82,$0C,$0F
db $00

If that looks confusing, remember that you can use the colon as a separator like this: db $xx : db $yy

And much like before: How do you imagine the gradient will appear?



Final note: Setting up HDMA should generally be done in level load. Unless something else writes to these channels, HDMA persists for the rest of the level, even into the loading screen of the next level and the Time Up / Game Over screens. On top of that, setting up HDMA again may cause some side effects, particularly when it is enabled mid-screen, though the latter is more of an NMI issue.
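
Putting the pieces of this chapter together, a complete level file could look like the following sketch. It assumes UberASM Tool (which provides !addr), channel 3 and a made-up gradient table; adjust all three to your needs.

```asm
init:
    LDA #$00                ; direct HDMA, mode 0 (one register, one byte)
    STA $4330
    LDA #$00                ; destination: $2100, INIDISP
    STA $4331
    LDA.b #Gradient         ; 24-bit address of the table, byte by byte
    STA $4332
    LDA.b #Gradient>>8
    STA $4333
    LDA.b #Gradient>>16
    STA $4334
    LDA.b #%00001000        ; enable channel 3 through the mirror
    TSB $0D9F|!addr
    RTL

Gradient:                   ; hypothetical table: bright at the top, darker towards the bottom
    db $80,$0F              ; 128 scanlines at full brightness
    db $20,$0C              ; 32 scanlines, slightly darker
    db $20,$09              ; 32 scanlines
    db $20,$06              ; 32 scanlines (total: 128 + 3 * 32 = 224)
    db $00                  ; end of table
```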

Colour Gradient HDMA

Now we get to a slightly more complicated HDMA effect: Colour gradients. The reason is that one HDMA often isn't enough for them.

What this will teach you:
  • Writing to fixed colour.
  • Writing to CG-RAM
  • Using multiple HDMA channels to achieve an effect.
  • Setting up write-twice HDMA.
  • Order of HDMA.

Fixed colour HDMA is largely the same as with brightness HDMA with two differences: You use three channels and write to the register differently. The reason lies with the limitation of COLDATA, the register which controls the fixed colour:
Code
2132  wb+++- COLDATA - Fixed Color Data
        bgrccccc

        b/g/r = Which color plane(s) to set the intensity for.
        ccccc = Color intensity.

That means, if you want to have a certain colour on the fixed colour, you have to perform three writes to COLDATA like here:
Code
            LDA #$3F    ; red plane, intensity $1F
            STA $2132
            LDA #$4F    ; green plane, intensity $0F
            STA $2132
            LDA #$80    ; blue plane, intensity $00
            STA $2132

Translated to HDMA, that means you have to set up HDMA three times.

Tip: Though you can duplicate the code three times and adjust each copy appropriately, it is much easier to reuse code i.e. figure out which pieces of code are redundant and which ones aren't.
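
One possible way to follow this tip is sketched here: load every shared value only once and store it to all three channels. The choice of channels 3 to 5 and the assumption that the three tables (Red, Green and Blue, as in the test tables of this chapter) sit in the same bank are mine; the sketch also assumes UberASM Tool for !addr.

```asm
init:
    LDA #$00                ; direct HDMA, mode 0: identical for all three channels
    STA $4330
    STA $4340
    STA $4350
    LDA #$32                ; destination $2132 (COLDATA): identical as well
    STA $4331
    STA $4341
    STA $4351

    LDA.b #Red              ; the low and high address bytes differ per table
    STA $4332
    LDA.b #Red>>8
    STA $4333
    LDA.b #Green
    STA $4342
    LDA.b #Green>>8
    STA $4343
    LDA.b #Blue
    STA $4352
    LDA.b #Blue>>8
    STA $4353

    LDA.b #Red>>16          ; assumption: all three tables share one bank
    STA $4334
    STA $4344
    STA $4354

    LDA.b #%00111000        ; enable channels 3, 4 and 5 at once
    TSB $0D9F|!addr
    RTL
```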

To test out the gradient, I have prepared some simple HDMA tables:
Code
Red:
db $10,$2F
db $10,$2D
db $10,$2B
db $10,$29
db $10,$27
db $10,$24
db $10,$22
db $10,$20
db $00

Green:
db $10,$5C
db $10,$5A
db $10,$58
db $10,$55
db $10,$53
db $10,$51
db $10,$4E
db $10,$4C
db $00

Blue:
db $10,$9F
db $10,$9E
db $10,$9D
db $10,$9C
db $10,$9B
db $10,$99
db $10,$98
db $10,$97
db $00

The result should look like this:



However, even with the above optimisation, it is still possible to optimise COLDATA HDMA further. The answer is to use two channels instead of three. How? Remember one specific piece of information about DMAPx which I glossed over before:
Code
            ttt  = Transfer Mode.
            000 => 1 register write once             (1 byte:  p               )
            001 => 2 registers write once            (2 bytes: p, p+1          )
            010 => 1 register write twice            (2 bytes: p, p            )
            011 => 2 registers write twice each      (4 bytes: p, p,   p+1, p+1)
            100 => 4 registers write once            (4 bytes: p, p+1, p+2, p+3)
            101 => 2 registers write twice alternate (4 bytes: p, p+1, p,   p+1)
            110 => 1 register write twice            (2 bytes: p, p            )
            111 => 2 registers write twice each      (4 bytes: p, p,   p+1, p+1)

As you can see, you can transfer more bytes with HDMA with the right settings. Up to four bytes to be precise, though you can write one register only up to two times.

Certain transfer modes are more limited than others. For example, "two registers write twice alternate" sounds to me like writing to the VRAM registers VMDATAL and VMDATAH which themselves can only be written in v- and f-blank (but there are some kind of uses for them).
But of course, other transfer modes such as 0, 1, 2 and 3 are more useful.

Changing the transfer mode will also change the tables. For example if you write two bytes per scanline, the tables will have this kind of shape:
Code
db $xx,$yy,$YY
db $ww,$zz,$ZZ
...
db $00

Or if you prefer this way:
Code
db $xx : dw $YYyy
db $ww : dw $ZZzz
...
db $00

You just have to expand the data to the amount of bytes you transfer!

Though this will use up more ROM space (since a row is now needed whenever either of the two colours changes), this is ultimately worth it because DMA channels are far more limited than memory space.

Challenge time: Which transfer mode do you have to use to use COLDATA more optimally? How do you modify the code to use two channels? And what does a two-table version of the above gradient look like?


That being said, fixed colour isn't the only way to create a colour gradient. There are, in fact, other colours on the SNES: CG-RAM.
Writing to CG-RAM has got plenty of uses. For example, it too controls the background colour in some way, being used for the mainscreen backdrop, which in turn can be used for some real foreground gradients (the usual method is still in the background, actually, just that everything is transparent to the background colour), and some other backgrounds use the fixed colour for other purposes.

Writing to CG-RAM is slightly more complicated in the sense that you have to use two registers: CGADD ($2121) and CGDATA ($2122). CGADD is a write-once 8-bit register which selects which colour to write, the same value which is known as "Palette, Color" in Lunar Magic's palette editor.
CGDATA is a write-twice 8-bit register. The input it takes is a 15-bit BGR colour, the "SNES RGB Value" in the palette editor, but separated into two bytes.

Now, this all sounds simple and good but how about setting it up? Depending on the method, writing to CG-RAM either is different from writing to COLDATA or isn't much different. Let's talk about the latter:

You're writing three bytes which means two channels. One channel always sets the colour index, the other channel writes the colour to CG-RAM. But what matters here, in contrast to COLDATA, is the order of the channels: The SNES runs the channels from the lowest to the highest number. That means CGADD must be written before CGDATA and therefore gets the lower channel number.

Challenge time: Build a simple CGRAM HDMA table with CGADD and CGDATA on separate tables and create the corresponding codes.

The other way is to use one channel only. Though it sounds weird, it is possible to use the "two registers write twice each" mode for this. The idea is that by adding one extra write to CGADD as a buffer, you can use the same table to set the destination and the colour. Furthermore, mind the order of the bytes: Which byte in the table contains the CGADD value?

Challenge time: Build a simple CGRAM HDMA table with CGADD and CGDATA in the same table and create the corresponding code.

Though it is more efficient in terms of channel use, this method has got a disadvantage in the sense that it is less efficient for h-blank which is also rather precious. The worst part is that when you want to save on channels, you already are using a lot of channels which may overflow h-blank...

Either way, if you translate the above colours into CGRAM and write to colour 0, you get this kind of result:

Wow, everything is so bright!

Though the following hasn't got anything to do with HDMA on its own, it's still some useful knowledge: I mentioned some kind of "mainscreen" before when I talked about foreground gradients. In case you didn't know, the SNES draws two screens (called "main-" and "subscreen") which can be used as a priority system (e.g. give one layer very low priority) but are also blended together in the form of colour maths.
In particular, both screens have got their own backdrop (i.e. what is shown where no pixel is drawn): on the subscreen (the lower screen), it's the fixed colour, whereas on the mainscreen (the higher one), it's CGRAM colour 0.
Furthermore, most games set the SNES to always perform colour maths even if it isn't directly visible. Yes, it is colour 0. If you ever experimented with colour maths, you sometimes ended up with an invisible subscreen except through the transparent thing which happens because colour 0 isn't set to be transparent either. Conversely, most games set colour 0 to zero since there is often no point in doing that.

And by the way, if this reminds you of foreground HDMA: That's basically just colour maths applied with an HDMA colour gradient. In fact, calling most of these types "foreground HDMA" is a misnomer since it is neither HDMA which is used nor is the gradient in the foreground. What actually happens is that it is just a regular colour gradient on fixed colour (i.e. the background colour) and colour maths is applied to all layers, which has got the side effect that not every sprite is affected and that the background colour disappears...
As mentioned before, colour 0 HDMA is an example of a real foreground HDMA gradient because it is on the mainscreen and thus can have something behind it. Not only that, but it also fixes the issue that not every sprite (such as Mario) is affected by the colours and also leaves the fixed colour i.e. the background colour alone.

Either way, enough ranting, now to the next chapter!

Parallax and Waves HDMA

Now we get to one of the more interesting parts of this tutorial: Parallax and Waves HDMA. Often reduced to just "parallax" (but not the only method since multiple layers, sprites and graphics changing is also possible), it is one of the more impressive effects and generally hyped among the community.

What this will teach you:
  • Setting the position of the background offsets.
    • Setting up parallax HDMA.
    • Generating HDMA waves.
  • Writing to an HDMA table in RAM.
  • Using indirect HDMA

Now we get to something different: Writing HDMA in RAM. You don't always have to store the HDMA tables in ROM, of course. In fact, parallax scrolling and the like wouldn't be possible if HDMA weren't able to read from RAM unless IRQ magic has been used.

Before that, it was pretty easy to write the HDMA table since you just had to think it through, but now you have to express that in ASM. Fingers crossed that you remember what you have learned!

As a challenge for you, try to recreate the following parallax scrolling:

(The scroll rates are all powers of two i.e. they can be easily handled with bit shifting.)

Tip: If the background doesn't scroll, remember that the table has to be updated as well. And if the background looks weird at fade in, remember that it has to be set at or before fade in as well.

Worse is wave HDMA which is quite terrible since you have to write multiple scanlines with the same value. There is, however, a very useful solution: Indirect HDMA. You probably remember one of the options when I mentioned the DMA properties, don't you? Well, here is a detailed explanation on how indirect HDMA works:

You have probably come across indirect addressing at some point. You know, stuff like LDA ($00) where the value comes from the address stored in $00 and $01 plus the data bank instead of the content of $00 itself. Indirect HDMA works similarly i.e. the HDMA tables you were previously using don't contain the values themselves but rather addresses, and the bank byte is set in DASBx, register $43x7.
On its own, the table looks like any other two-byte table:
Code
db $xx : dw $yyyy
db $ww : dw $zzzz
...
db $00

The real difference is that the table contains, as mentioned, pointers. To see what I mean, let's take a look at this code and table:
Code
    REP #$20
    LDA $1C
    STA $7F9700
    LSR
    STA $7F9702
    LSR
    STA $7F9704
    LSR
    STA $7F9706
    SEP #$20
RTL

ParallaxTable:
db $20 : dw $9706
db $20 : dw $9704
db $20 : dw $9702
db $20 : dw $9700
db $20 : dw $9702
db $20 : dw $9704
db $20 : dw $9706

Though not hardcoded to reference bank $7F, let's assume it is set beforehand. The above code will result in the following parallax scrolling for using only eight bytes of RAM:

Isn't it trippy?

For the most part, enabling indirect HDMA is the same as enabling direct HDMA, aside from the fact that you also have to set the bank of the data the indirect HDMA table points to inside the code. This won't allow you to use data across multiple banks but let's be honest: Is there any point in changing the bank when a single bank can access 64 KiB of data?
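
As a sketch of that difference, this is how the setup for ParallaxTable could look. The indirect bit in DMAPx and the DASBx write are the only real changes compared to direct HDMA. Channel 3, transfer mode 2 and BG2HOFS ($210F) as the destination are assumptions of this sketch; the bank $7F matches the RAM addresses used above.

```asm
init:
    LDA #%01000010          ; indirect HDMA (bit 6), mode 2 (one register, write twice)
    STA $4330
    LDA #$0F                ; assumption: destination $210F, BG2HOFS
    STA $4331
    LDA.b #ParallaxTable    ; address of the pointer table
    STA $4332
    LDA.b #ParallaxTable>>8
    STA $4333
    LDA.b #ParallaxTable>>16
    STA $4334
    LDA #$7F                ; DASBx: bank of the data the pointers refer to
    STA $4337
    LDA.b #%00001000        ; enable channel 3 through the mirror
    TSB $0D9F|!addr
    RTL
```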

One of the major uses of indirect HDMA is to reuse data as well as not requiring you to bother with the line count in the data, focusing on the values only. This, by the way, is how I managed to create Fading Brightness HDMA: Using brightness HDMA will disable the fading because the screen is always drawn with the last value written to a register which in this case means, any value written to INIDISP in NMI gets overwritten by HDMA.
I solved this by storing the brightness in RAM but more importantly, also make the code and table as simple as possible. You see, I couldn't just use direct HDMA since I had to redraw the table for every frame, calculate the same brightness steps again and would take as much RAM as the table. Using indirect HDMA, I only needed 16 bytes of RAM and calculate each brightness value only once for each frame. This is the power of indirect HDMA.

Now to the next chapter.

...

Oh, right, I haven't taught you to create waves HDMA. Okay, there is one feature which has yet to be mentioned: Indirect continuous HDMA.
Indirect continuous HDMA is a bit special because with direct HDMA, there is only the HDMA table but with indirect HDMA, there are the pointers on one end and the data on the other end. This will open the question: Which one is continuous, the pointers or the data?
Answer: The data.
One limitation I came across with Fading Brightness HDMA is the fact that you can only use repeating values, never continuous values. This results in an even larger table than usual and is one of the reasons I implemented an in-game conversion algorithm alongside keeping compatibility with normal tables. However, it is useful for other purposes such as waves HDMA. Can you see why?


This is IMO one of the best methods to handle waves HDMA since you only have to use as many bytes of RAM as the wavelength times two.
It also is very useful for gradual parallax since the values will have to change for every scanline anyway, not to mention it is unlikely that they will be reused.


In fact, some of you may even see another very useful use of indirect continuous HDMA, one which SMW even implements: HDMA buffers. This is the topic of the next chapter.

Two fair bits of warning:
  • Remember that HDMA is subject to screen tearing i.e. if the code takes too long then the bottom part of the screen can end up being partially drawn. If you want to prevent this, you have to use a double buffer i.e. have two HDMA tables and switch the tables between each frame (preferably set the address in NMI and not in the main code).
    This can be seen with the level end circle which is subject to slowdown especially at the start.
  • The other warning is that indirect HDMA isn't as efficient as direct HDMA since the SNES has to read the two pointer bytes on top of the bytes it has to transfer, so it is more likely to be subject to h-blank overflow than usual. This overhead only scales with the number of indirect HDMA channels, though.
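
The double buffer from the first warning could be sketched like this. All RAM addresses here are hypothetical placeholders and the sketch assumes direct HDMA on channel 3 with the bank byte ($4334) already set to $7F in init, !Visible initialised there as well, and 8-bit A/X/Y when the labels are entered.

```asm
!Visible = $7FA000          ; hypothetical free RAM byte: index (0 or 1) of the table on screen
!TableA  = $9800            ; hypothetical RAM tables, both in bank $7F
!TableB  = $9900

main:
    ; ... rebuild the table which is currently NOT on screen here ...
    LDA !Visible
    EOR #$01                ; flip 0 <-> 1 once the hidden table is complete
    STA !Visible
    RTL

nmi:
    LDA !Visible
    AND #$01                ; guard against garbage in the flag
    ASL                     ; index * 2 because the pointers are words
    TAX
    REP #$20                ; 16-bit A
    LDA.l TablePointers,x
    STA $4332               ; low and high byte of channel 3's table address
    SEP #$20                ; back to 8-bit A
    RTL

TablePointers:
    dw !TableA, !TableB
```

Since the address is only swapped in v-blank, the table on screen is never the one being rewritten.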


Windowing and HDMA Buffers

Now we get to the last major chapter: HDMA buffers as well as one of their applications: Windowing / Masking.

What this will teach you:
  • Masking out portions of the screen with windowing.
  • Setting up an HDMA buffer.
  • Other applications of HDMA buffers.

This chapter is basically a continuation of the previous one in the sense that it uses indirect continuous HDMA. The difference is the use: While the previous chapter taught you to use indirect HDMA to save RAM, HDMA buffers are one of the more wasteful uses of indirect HDMA since you have to keep track of the value for every scanline i.e. up to 224 bytes times however many you want to write.
On the other hand, you have got a table which at worst has got its data interleaved (when using multiple registers) but in all cases, the data is very easy to access since the values are all equally spaced out and there is no annoying line count. You basically give the affected PPU register an individual value per scanline.

If you ever looked at the only places where SMW uses HDMA, you will notice that one address comes up very often: $7E04A0. That is one such example of an HDMA buffer since the table is 448 bytes large (224 scanlines * 2 bytes/scanline).

How do you create such kind of buffers? Well, remember how to define indirect HDMA as well as the scanline limitations of HDMA.



Now that we got that part out, what about setting windows? If I'm honest, windowing is quite a complex topic and could be its own tutorial alongside colour maths and main- and subscreen (and indeed, it really is). As a result, the rest of the tutorial focuses more on setting up a minimal window as well as mentioning other uses of HDMA buffers. You can see more on how the SNES handled windowing but also a few examples in this YouTube video.

In order to set up a window, you have to:
  • Set which layers in general have got which window enabled.
  • Set which layers on which window are inverted.
  • Set which layers on which screen have got windowing enabled.
  • Set how the windows overlap for each layer.
  • Set how the colours are clipped for each screen.

A lot of these options sound complicated. However, the TMW and TSW registers are set together with TM and TS in SMW so you don't have to take care of them (you can't really control them outside of level init and NMI anyway). Likewise, the overlapping of windows is also hardcoded in SMW since it uses only one window at a time.
This leaves you with three options: Windowing per layer, window inversion per layer and clipping. SMW puts them close to each other in WRAM, from $7E0041 to $7E0044.

The three registers which enable windowing as well as window inversion are W12SEL $2123, W34SEL $2124 and WOBJSEL $2125, all mirrored from $7E0041 to $7E0043. They all use the same format, i.e. their low nibble (---- xxxx) controls BG1, BG3 and OBJ while the high nibble (xxxx ----) handles BG2, BG4 and COL. Each layer has got four control bits, ABCD, where
  • A: Enable window 2
  • B: Invert window 2
  • C: Enable window 1
  • D: Invert window 1
$44, in turn, is a mirror of register CGWSEL $2130 which controls whether the backdrop is masked as well or only the other layers, among other settings. Its format is ccmm--sd, though we only have to look at the cc and mm bits:
  • cc: Clip colours on mainscreen to black (00 = never, 01 = outside window only, 10 = inside window only, 11 = always).
  • mm: Clip colours on subscreen to black (same format as above).
Lastly, the windows themselves. These use the WHn registers (n = 0 - 3) $2126 - $2129 where WH0 and WH2 control the left edge of window 1 and 2 while WH1 and WH3 control their right edge. That should be enough to create a nice little window.
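As a rough sketch of the mirrors described above (Asar syntax, assuming 8-bit A and SMW's default direct page of $0000), the following enables window 1 without inversion for every layer and leaves colour clipping off. The values are only an example and not taken from any particular patch.

```
    LDA #$22        ; %0010 in both nibbles: C = enable window 1, D = 0 (not inverted)
    STA $41         ; W12SEL mirror: BG1 (low nibble) and BG2 (high nibble)
    STA $42         ; W34SEL mirror: BG3 and BG4
    STA $43         ; WOBJSEL mirror: OBJ and COL
    LDA $44
    AND #$0F        ; cc = 00 and mm = 00 (never), the low four bits stay as they are
    STA $44         ; CGWSEL mirror
```

Storing $33 instead of $22 additionally sets the D bit, so window 1 is inverted for those layers.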

Of course, the rest is pretty much whatever you want. For example, here is a nice little GIF I made:

(If you're wondering why some scanlines appear glitched: The electron beam was faster than my code, which is a good example of why you should use a double buffer if the table is updated midscreen.)

By the way, do you remember that channel 7 in SMW isn't usable because of windowing? Well, it turns out that SMW has already got it set up for you! All you have to do is enable channel 7 and set the masking registers! The knowledge from this chapter is still useful if you want to use window 2, though.
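To give you an idea, here is a minimal sketch in Asar syntax which fills the buffer at $7E04A0 with the same left and right edge on every scanline (i.e. a plain vertical strip) and then enables channel 7. It assumes UberASM Tool level code on a non-SA-1 ROM, 8-bit A on entry, a data bank in which $04A0 reaches WRAM, and $0D9F as SMW's HDMA enable mirror. The edge values $40 and $BF are arbitrary, and the masking registers still have to be set separately.

```
init:
    REP #$10            ; 16-bit X so it can index all 448 bytes
    LDX #$01BE          ; offset of the last scanline's entry: 223 * 2
.Loop:
    LDA #$40            ; left edge of window 1 (goes to WH0)
    STA $04A0,x
    LDA #$BF            ; right edge of window 1 (goes to WH1)
    STA $04A1,x
    DEX
    DEX                 ; step back to the previous scanline's entry
    BPL .Loop           ; X wraps to $FFFE after offset 0, which ends the loop
    SEP #$10            ; back to 8-bit X
    LDA #$80            ; bit 7 = channel 7
    TSB $0D9F           ; SMW copies this mirror to $420C
    RTL
```

A scanline whose left edge is greater than its right edge (e.g. $FF and $00) has no window at all, which is how you leave single scanlines out of the shape.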

The use of HDMA buffers isn't limited to windowing, of course. In fact, three of my major HDMA codes, "Scrollable" HDMA Gradients, Parallax HDMA Toolkit and Windowing HDMA Toolkit all use HDMA buffers in their own ways.

The first one uses indirect continuous HDMA in a slightly different way in the sense that the table is read-only and only part of the whole data is displayed. The idea is that in order to have a scrolling HDMA gradient, you don't just scroll through an HDMA table but rather through a table where each scanline has got its own colour value. That is best handled with indirect continuous HDMA since you get two tables with only data. The alternative is to use direct HDMA but that requires five bytes per scanline instead of three as it is done here.
The second one uses a buffer for a more generic approach to parallax HDMA even if it results in duplicate calculations. In particular, the screen can easily scroll vertically without requiring you to use large tables with single scanline values (a similar issue as above with the old method for scrollable gradients).
The last one also uses a buffer to handle the values but solves the duplicate calculation issue with the use of MVN which allows you to copy the values within the table. Furthermore, it has also got the advantage that you can control where the waves start and where they end, which makes it useful for moving tides.

Lastly, there is a use of HDMA which is quite similar to windowing:

Contrary to what you may think, these walls are drawn with HDMA rather than with the Super FX's bitmap conversion. This is done by shifting the vertical position of the background downwards by the scanline position + the width of the shape on that scanline (e.g. a width of 10 on scanline 42 would be Y = 52). This results in centred lines but with some HDMA on the X offsets, you can also move the lines horizontally.
Though the internal calculation is rather different, ultimately, it is the same principle as windowing: You enter the position and width for each scanline to create a shape which can't be described by a simple tilemap.

Closing Sentence

That's all this tutorial will explain! Basically, I showed you some applications of HDMA but there are many others such as changing the background mode midscreen, changing main- and subscreen and doing perspective with Mode 7. Maybe you'll even go further and do more extreme stuff with it such as recreating the parallax of DKC2's ship and castle levels.
In fact, HDMA doesn't even have to be used to modify the screen. Certain games use HDMA to communicate with the SMP, the audio processor. That one is very complicated and is generally not recommended unless you have built up solid knowledge of uploading data to ARAM.

Ultimately, there isn't much more I can teach you considering that all kinds of HDMA use the methods which were shown off in this tutorial.
