Sprite Glow: EE side
It's pretty simple, no asm.
VU Memory Map
0 - 400 (first input buffer; the VIF OFFSET is 400, so the second buffer starts there)
400 - 800 (second input buffer)
800 - template copy 1
884 - template copy 2
980 - constants
DMA init
@ 0x500930 tag: TAG: 0x00000000 cnt qwc 0x0018
vif0: STCYCL cl: 4 wl: 4
vif1: UNPACK-V4-32: 24 addr: 980 us: false tops: false
@ 0x500ac0 tag: TAG: 0x00777650 ref qwc 0x0054
vif0: STCYCL cl: 4 wl: 4
vif1: UNPACK-V4-32: 84 addr: 800 us: false tops: false
@ 0x500ad0 tag: TAG: 0x00777650 ref qwc 0x0054
vif0: MSCAL 0x0
vif1: UNPACK-V4-32: 84 addr: 884 us: false tops: false
@ 0x500ae0 tag: TAG: 0x00000000 cnt qwc 0x0000
vif0: BASE 0x0
vif1: OFFSET 0x190
@ 0x500af0 tag: TAG: 0x00000000 cnt qwc 0x0000
vif0: NOP
vif1: FLUSHE
The first tag uploads the constants.
The next two tags upload two copies of the template, and the second of them also runs the init program (MSCAL 0x0). The output is double buffered, so two copies make sense.
The fourth tag sets up the input double buffer (BASE 0, OFFSET 0x190 = 400).
The final tag is just a sync (FLUSHE) before the draws start.
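As a sanity check on the dump above, a small Python sketch can confirm that the three uploads land exactly on regions from the memory map and that nothing overlaps. The region names are labels invented for this sketch, and the 1024-quadword limit assumes VU1's 16 KiB of data memory.

```python
# VU1 data memory regions as (start, length) in quadwords, taken from the
# memory map above. The dictionary keys are labels made up for this sketch.
REGIONS = {
    "input_buffer_0": (0, 400),
    "input_buffer_1": (400, 400),
    "template_copy_1": (800, 84),
    "template_copy_2": (884, 84),
    "constants": (980, 24),
}

# (destination address, qwc) of each UNPACK in the DMA dump.
UNPACKS = [(980, 0x18), (800, 0x54), (884, 0x54)]

# Assumption: 16 KiB of VU1 data memory, 16 bytes per quadword.
VU1_DATA_QWORDS = 1024


def overlaps(a, b):
    """True if two (start, length) regions share at least one address."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def layout_is_consistent():
    """Check that regions are disjoint, fit in memory, and match the uploads."""
    regions = list(REGIONS.values())
    for i, first in enumerate(regions):
        for second in regions[i + 1:]:
            if overlaps(first, second):
                return False
    fits = all(start + length <= VU1_DATA_QWORDS for start, length in regions)
    uploads_match = all(upload in regions for upload in UNPACKS)
    return fits and uploads_match
```

Calling `layout_is_consistent()` is only a bookkeeping aid; it says nothing about what the VU program actually reads from each region.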
Template system
The VU program is (as usual) double buffered.
There are two input buffers, containing data uploaded through VIF.
There are two output buffers, each containing the sprite-glow-template.
While the program runs, it transforms vertices, putting the results in the template.
Once done, it xgkicks, which waits for the previous run's draw to finish if needed,
then begins the drawing process.
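The two halves of the double buffering can be sketched in Python. The output side follows the `isub vi05, vi01, vi05` at the end of the DRAW program, where vi01 holds 0x694. The input side is an assumption: it supposes that VIF1 alternates the address read by `xtop` between BASE and BASE + OFFSET on each program start. Function names are hypothetical.

```python
TEMPLATE_A = 800
TEMPLATE_B = 884
TOGGLE_SUM = 0x694  # 1684 == 800 + 884, loaded into vi01 before the isub


def next_output_buffer(vi05):
    """Flip between the two template copies: vi05 = 0x694 - vi05."""
    return TOGGLE_SUM - vi05


def input_buffer(run_index, base=0, offset=0x190):
    """Input buffer address for a given run.

    Assumption: the VIF hands out BASE on even runs and BASE + OFFSET on
    odd runs, starting with BASE on the first run after OFFSET is set.
    """
    return base + (run_index % 2) * offset
```

Subtracting from the sum of the two addresses is a cheap way to toggle without a branch, which is presumably why the program does it this way.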
What is drawn?
Annoyingly, there are 6 draws this time:
- Probe Clear alpha
- Probe Draw alpha
- Offscreen sample
- Offscreen repeat
- Draw Alpha
- Draw Final
In summary:
- Probe Clear alpha: draw an alpha = 0 square. Always drawn.
- Probe Draw alpha: draw an alpha = 1 square. 1 px smaller than first draw. Uses normal z test.
Draw 1 and 2: "Probe Clear alpha" and "Probe Draw alpha"
First is the clear. The first tag is 5 regs of adgif:
texflush
(new 'static 'gs-alpha :a 2 :b 2 :c 2 :d 1)
(new 'static 'gs-test :ate 1 :afail 1 :zte 1 :ztst 2)
(new 'static 'gs-zbuf :zbp 304 :psm 1 :zmsk 1)
(new 'static 'gs-frame :fbp 408 :fbw 8 :fbmsk #xffffff)
The alpha is:
Cv = (0 - 0) * 0 + Cd
which leaves the destination color unchanged, so no color is written.
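To make the blend setting concrete, here is a minimal Python sketch of one color channel of the GS blend, assuming the usual form ((A - B) * C >> 7) + D with selector value 2 meaning zero for A, B and D and a fixed value for C. The final clamp to 0..255 is an assumption about the color clamp mode; `gs_blend` is a hypothetical name.

```python
def gs_blend(cs, cd, alpha_s, alpha_d, fix, a, b, c, d):
    """Blend one color channel: ((A - B) * C >> 7) + D.

    a, b, d select from source color (0), destination color (1), zero (2).
    c selects from source alpha (0), destination alpha (1), FIX (2).
    """
    color = {0: cs, 1: cd, 2: 0}
    coeff = {0: alpha_s, 1: alpha_d, 2: fix}
    out = (((color[a] - color[b]) * coeff[c]) >> 7) + color[d]
    # Assumption: results are clamped to the 8-bit range.
    return max(0, min(255, out))


# With :a 2 :b 2 :c 2 :d 1 the source color and FIX drop out entirely.
unchanged = gs_blend(cs=200, cd=37, alpha_s=128, alpha_d=0, fix=99,
                     a=2, b=2, c=2, d=1)
```

With these selectors, `unchanged` equals the destination value whatever the source or FIX is, which matches the reading that this draw leaves rgb alone.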
The test is:
- alpha test always fails, and afail = 1 means that on failure only the framebuffer (rgba) is written, never z
- ztest is usual GEQUAL
When combined with the alpha settings, this only writes alpha, no rgb or depth.
The zbuf looks at the normal zbuf, but z writing is masked again. Just in case.
The gs-frame yet again masks away rgb writes. Just in case.
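A hedged sketch of how the frame mask behaves, assuming a 32-bit pixel with alpha in the top byte and that bits set in fbmsk protect the existing framebuffer bits. `masked_write` is a hypothetical helper, not something from the game.

```python
FBMSK = 0x00FFFFFF  # the :fbmsk value from the gs-frame above


def masked_write(old_pixel, new_pixel, fbmsk=FBMSK):
    """Combine pixels: masked bits keep the old value, the rest take the new."""
    keep = old_pixel & fbmsk
    take = new_pixel & ~fbmsk & 0xFFFFFFFF
    return keep | take


# Old pixel has alpha 0x11, new pixel has alpha 0x80 and different rgb.
result = masked_write(0x11223344, 0x80AABBCC)
```

Under these assumptions only the top byte changes, so `result` is `0x80223344`: the new alpha over the old rgb.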
There are two draws. Both are sprites. The first draw sets alpha to 0 and uses z = #xffffff, so it always passes.
The second draw is similar but:
- z comes from the transformed vertex. So it has normal depth test behavior
- alpha is 1
- it's 1 pixel smaller than the first draw.
So the end result is:
- an alpha = 0 square always
- an alpha = 1 square, offset in by 1 pixel on all sides, but only where depth passes.
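The effect of the two probe draws can be simulated on a small grid. This is a simplified Python sketch: rectangles are treated as inclusive integer pixel ranges (real GS sprite coverage rules are ignored), and the depth test is taken as "passes when the sprite z is greater than or equal to the stored z". `rect` and `probe` are hypothetical names.

```python
def rect(center, half, width, height):
    """Pixels of a square around center, clipped to the grid."""
    cx, cy = center
    xs = range(max(0, cx - half), min(width, cx + half + 1))
    ys = range(max(0, cy - half), min(height, cy + half + 1))
    return [(x, y) for y in ys for x in xs]


def probe(alpha, depth, center, half, sprite_z):
    """Apply both probe draws to the alpha plane in place."""
    height, width = len(alpha), len(alpha[0])
    # Draw 1: clear. One pixel larger on every side, ignores depth.
    for x, y in rect(center, half + 1, width, height):
        alpha[y][x] = 0
    # Draw 2: probe. Writes alpha = 1 only where the depth test passes.
    for x, y in rect(center, half, width, height):
        if sprite_z >= depth[y][x]:
            alpha[y][x] = 1
    return alpha


# 8x8 example: the right half of the screen is covered by something closer.
alpha = [[9] * 8 for _ in range(8)]
depth = [[100 if x >= 4 else 0 for x in range(8)] for _ in range(8)]
probe(alpha, depth, center=(3, 3), half=1, sprite_z=50)
visible = sum(row.count(1) for row in alpha)
```

In this example 6 of the 9 probe pixels end up with alpha = 1, so the fraction of set pixels is a rough measure of how much of the glow is unoccluded.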
Draw 3: Offscreen Sample
Switches to GS context 2. Samples from framebuffer and writes to a temporary texture (64 px width, uses only 32).
- only writes alpha
- mmag/mmin on
- tcc = 1, tfx = 1 (rgba, decal) - RGBA all come from texture, nothing fancy
- rgb writes masked (only write alpha)
- the clamp register is set to clamp (not repeat)
- the alpha test setting disables z buffer writing.
This basically just copies the inner square from draw 2 to a separate texture.
Draw 4: Repeat draw
This appears to draw over itself again and again. I think it effectively blends using texture filtering, so the pixel at (0, 0) ends up with the average alpha value.
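If that guess is right, the repeat draw behaves like a repeated 2x2 box filter. The Python sketch below illustrates the idea only: it assumes each pass halves the image and that bilinear filtering samples exactly between four texels. It is not derived from the actual repeat-draw setup, and both function names are hypothetical.

```python
def halve(image):
    """Average each 2x2 block into one texel."""
    size = len(image) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4
            for x in range(size)
        ]
        for y in range(size)
    ]


def average_by_repeated_halving(image):
    """Halve a square power-of-two image until one texel remains."""
    while len(image) > 1:
        image = halve(image)
    return image[0][0]


# A 32x32 probe result where the left half passed the depth test.
half_visible = [[1.0 if x < 16 else 0.0 for x in range(32)] for _ in range(32)]
mean_alpha = average_by_repeated_halving(half_visible)
```

For a 32x32 source this takes five passes, and the surviving texel holds the mean of the original alpha values (0.5 in the example).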
Draw 5: flare alpha
This is set up in the repeat-draw-adcmds. Drawing to the framebuffer again.
VU1 program
out memory map:
68: adgif0
69: adgif1
70: adgif2
71: adgif3
72: adgif4
;; math:
first, input position is multiplied by camera matrix (including adding the translation row, camera_mat[3]). Only xyz is computed here. (p0, vf01)
color: rgb *= a
fade = clamp(0, 1, p0.z * fade_a + fade_b)
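The color math can be written out as a short Python sketch. It follows the order of operations in the DRAW program (multiply rgb by alpha, then by the clamped fade, then by 1/128, then convert to int), but it uses Python floats, so it will not reproduce VU rounding exactly. `glow_color` is a hypothetical name, and truncation toward zero for the int conversion is an assumption.

```python
COLOR_SCALE = 0.0078125  # 1/128, the immediate loaded into I


def clamp(value, low, high):
    return max(low, min(high, value))


def glow_color(rgba, p0_z, fade_a, fade_b):
    """Return the integer color stored into the template.

    rgba: input color as four floats.
    p0_z: camera-space z of the sprite.
    """
    r, g, b, a = rgba
    fade = clamp(p0_z * fade_a + fade_b, 0.0, 1.0)
    scaled = [channel * a * fade * COLOR_SCALE for channel in (r, g, b)]
    # Alpha is converted to int but is not scaled by fade or 1/128.
    return [int(channel) for channel in scaled] + [int(a)]


example = glow_color([128.0, 64.0, 32.0, 128.0], p0_z=8.0,
                     fade_a=0.0625, fade_b=0.0)
```

Here the fade works out to 0.5, so `example` is `[64, 32, 16, 128]`.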
INIT program
iaddiu vi05, vi00, 0x320 | nop ;; vi05 = 800 (template)
lq.xyzw vf25, 988(vi00) | nop
lq.xyzw vf26, 989(vi00) | nop
lq.xyzw vf27, 990(vi00) | nop
lq.xyzw vf30, 996(vi00) | nop
lq.xyzw vf31, 997(vi00) | nop
lq.xyzw vf28, 1002(vi00) | nop
lq.xyzw vf29, 1003(vi00) | nop
nop | nop :e
nop | nop
regs:
vi02 = input ptr
vi03 = num_sprites
vi04 = adgif ptr
vi05 = output buffer (double buffered, so toggles)
vf25 = hvdf
vf26 = hmge
vf27 = consts
vf30 = basis_x
vf31 = basis_y
vf28 = clamp_min
vf29 = clamp_max
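Putting the `lq` addresses from the INIT and DRAW programs together gives a layout for the 24-quadword constants block. The Python sketch below records it and checks that the entries tile 980..1003 with no gaps. Most names come from the comments in the listings; `corner_xy` (998-1001) is a guessed label, since the source does not name those four quadwords.

```python
# (start address, length in quadwords) for each group of constants.
CONSTANTS = {
    "camera_mat": (980, 4),
    "perspective": (984, 4),
    "hvdf": (988, 1),
    "hmge": (989, 1),
    "consts": (990, 1),
    "sincos": (991, 5),
    "basis_x": (996, 1),
    "basis_y": (997, 1),
    "corner_xy": (998, 4),  # hypothetical name, not from the source
    "clamp_min": (1002, 1),
    "clamp_max": (1003, 1),
}


def is_contiguous(table, start, count):
    """True if the entries cover [start, start + count) exactly once each."""
    covered = []
    for first, length in table.values():
        covered.extend(range(first, first + length))
    return sorted(covered) == list(range(start, start + count))


layout_ok = is_contiguous(CONSTANTS, start=980, count=0x18)
```

The count of 0x18 matches the qwc of the first DMA tag, which is a useful cross-check that no constant was missed.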
DRAW program
xtop vi02 | nop ;; vi02 = input buffer's control
nop | nop
ilwr.x vi03, vi02 | nop ;; vi03 = num_sprites (1 always?)
iaddi vi02, vi02, 0x1 | nop ;; vi02 = sprite data
iaddiu vi04, vi02, 0x90 | nop ;; vi04 = adgif data
L1:
lq.xyzw vf03, 2(vi02) | nop ;; vf03 = color
lq.xyzw vf02, 1(vi02) | nop ;; vf02 = [size_probe, z_offset, rot-angle, size-y]
lq.xyzw vf01, 0(vi02) | nop ;; vf01 = [position.xyz, size-x]
lq.xyzw vf24, 983(vi00) | nop ;; vf24 = [camera_mat[3]]
lq.xyzw vf21, 980(vi00) | nop ;; vf21 = [camera_mat[0]]
lq.xyzw vf22, 981(vi00) | nop ;; vf22 = [camera_mat[1]]
lq.xyzw vf23, 982(vi00) | nop ;; vf23 = [camera_mat[2]]
lq.xyzw vf04, 3(vi02) | mulaw.xyz ACC, vf24, vf00 ;; vf04 = [fade_a, fade_b, X, X] | multiply
lq.xyzw vf24, 987(vi00) | maddax.xyz ACC, vf21, vf01 ;; vf24 = [perspective[3]] | multiply
lq.xyzw vf21, 984(vi00) | madday.xyz ACC, vf22, vf01 ;; vf21 = [perspective[0]] | multiply
lq.xyzw vf22, 985(vi00) | maddz.xyz vf01, vf23, vf01 ;; vf22 = [perspective[1]] | multiply
lq.xyzw vf23, 986(vi00) | nop ;; vf23 = [perspective[2]]
lq.xyzw vf09, 0(vi04) | nop ;; vf09 = adgif[0]
lq.xyzw vf10, 1(vi04) | mulw.xyz vf03, vf03, vf03 ;; vf10 = adgif[1] | color multiply by alpha
div Q, vf02.y, vf01.z | mulz.x vf04, vf04, vf01 ;; Q = (z_offset / p0.z) | fade_a *= p0.z
lq.xyzw vf11, 2(vi04) | nop ;; vf11 = adgif[2]
0.0078125 | nop :i ;; I = 0.0078125 (= 1/128)
lq.xyzw vf12, 3(vi04) | nop ;; vf12 = adgif[3]
lq.xyzw vf13, 4(vi04) | addy.x vf04, vf04, vf04 ;; vf13 = adgif[4] | fade_a += fade_b
sq.xyzw vf09, 68(vi05) | muly.z vf05, vf02, vf27 ;; adgif0 store | vf05.z = rot-angle * deg_to_rad
move.w vf05, vf00 | addw.z vf02, vf00, vf01 ;; vf05.w = 1 | vf02.z = size-x
sq.xyzw vf10, 69(vi05) | mul.w vf09, vf00, Q ;; adgif1 store | vf09.w = (z_offset / p0.z)
sq.xyzw vf11, 70(vi05) | nop ;; adgif2 store
sq.xyzw vf12, 71(vi05) | miniw.x vf04, vf04, vf00 ;; adgif3 store | clamp fade 1
sq.xyzw vf13, 72(vi05) | nop ;; adgif4 store
nop | subw.w vf09, vf00, vf09 ;; vf09.w = 1 - (z_offset / p0.z);
nop | maxx.x vf04, vf04, vf00 ;; clamp fade 2
nop | mulw.xyz vf01, vf01, vf09 ;; multiply by pscale
nop | mulx.xyz vf03, vf03, vf04 ;; multiply color by fade
nop | mulaw.xyzw ACC, vf24, vf00 ;; multiply by perspective matrix
nop | maddax.xyzw ACC, vf21, vf01
nop | madday.xyzw ACC, vf22, vf01
nop | muli.xyz vf03, vf03, I ;; color scaling.
nop | maddz.xyzw vf01, vf23, vf01 ;; perspective matrix.
nop | nop
iaddi vi03, vi03, -0x1 | mulz.z vf06, vf05, vf05 ;; dec sprite count | vf06 = rot^2
lq.xyzw vf15, 991(vi00) | nop ;; vf15 = sincos01
iaddi vi02, vi02, 0x3 | nop ;; inc input pointer... by the wrong amount lol
fcset 0x0 | mul.xyzw vf07, vf01, vf26 ;; hmge mult
nop | mulz.zw vf09, vf05, vf06 ;; vf09 = rot^3
lq.xyzw vf15, 992(vi00) | mula.zw ACC, vf05, vf15 ;; vf15 = sincos23 | acc working on sincos.
nop | nop
div Q, vf00.w, vf07.w | clipw.xyz vf07, vf07 ;; Q = 1 / p_hmged.w | clip!!!
nop | mulz.zw vf10, vf09, vf06 ;; vf10 is rot thing
lq.xyzw vf15, 993(vi00) | madda.zw ACC, vf09, vf15 ;; vf15 is rot coeff, working on sincos
nop | nop
fcand vi01, 0x3f | nop ;; check clipping result
ibne vi00, vi01, L2 | mulz.zw vf09, vf10, vf06 ;; skip if clipped | working on rot
lq.xyzw vf15, 994(vi00) | madda.zw ACC, vf10, vf15 ;; rot | rot
nop | mul.xyz vf01, vf01, Q ;; perspective multiply
nop | mul.xyzw vf02, vf02, Q ;; vf02 *= q
nop | mulz.zw vf10, vf09, vf06 ;; rot
lq.xyzw vf15, 995(vi00) | madda.zw ACC, vf09, vf15 ;; rot89 | rot
nop | add.xyzw vf01, vf01, vf25 ;; hvdf offset
nop | maxw.x vf02, vf02, vf00 ;; clamp size_probe to at least 1
nop | miniw.x vf02, vf02, vf29 ;; min size_probe against clamp max.w
nop | miniz.zw vf02, vf02, vf29 ;; min zw against clamp max.z
nop | madd.zw vf05, vf10, vf15 ;; vf05 = sincos output
nop | ftoi0.xyzw vf03, vf03 ;; colors to ints or whatever
nop | addx.xy vf09, vf28, vf02 ;; vf09 = [cmin.x + probe_size, cmin.y + probe_size]
nop | subx.xy vf11, vf01, vf02 ;; vf11 = probe lower corner = [p0.xy - probe_size]
nop | addx.xy vf12, vf01, vf02 ;; vf12 = probe upper corner = [p0.xy + probe_size]
nop | subx.xy vf10, vf29, vf02 ;; vf10 = [cmax.x - probe_size, cmax.y - probe_size]
nop | mulaz.xyzw ACC, vf30, vf05
nop | msubw.xyzw vf15, vf31, vf05 ;; rotate the basis
nop | max.xy vf20, vf01, vf09 ;; vf20 is some clamped position.
nop | addx.zw vf11, vf01, vf00 ;; vf11.zw = vf01.zw
nop | addx.zw vf12, vf01, vf00 ;; vf12.zw = vf01.zw
nop | subw.xy vf17, vf28, vf00 ;; vf17 = [clamp_min.x - 1, clamp_min.y - 1]
nop | mulz.xyzw vf15, vf15, vf02 ;; rot_basis *= vf02.z (x-scale)
nop | addw.xy vf18, vf28, vf00 ;; vf18 = [clamp_min.x + 1, clamp_min.y + 1]
nop | ftoi4.xyzw vf11, vf11 ;; usual ftoi4 of clear positions
nop | ftoi4.xyzw vf12, vf12
nop | mini.xy vf20, vf20, vf10 ;; vf20 clamp again.
nop | mulaw.xyzw ACC, vf30, vf05
sq.xyzw vf03, 75(vi05) | maddz.xyzw vf16, vf31, vf05 ;; store color of flare | second row of rotation matrix
sq.xyz vf11, 11(vi05) | sub.xy vf17, vf20, vf17 ;; store clear pos | offset vf17
sq.xyz vf12, 12(vi05) | sub.xy vf18, vf20, vf18 ;; store clear pos | offset vf18
lq.xyzw vf11, 998(vi00) | subx.xy vf19, vf20, vf02 ;; compute first clear pos,
lq.xyzw vf12, 999(vi00) | mulw.xyzw vf16, vf16, vf02 ;; y scale
lq.xyzw vf13, 1000(vi00) | addx.xy vf20, vf20, vf02 ;; first clear pos, upper corner.
lq.xyzw vf14, 1001(vi00) | mulaw.xyzw ACC, vf01, vf00
nop | maddax.xyzw ACC, vf15, vf11
nop | maddy.xyzw vf11, vf16, vf11
nop | mulaw.xyzw ACC, vf01, vf00
nop | maddax.xyzw ACC, vf15, vf12
nop | maddy.xyzw vf12, vf16, vf12
nop | mulaw.xyzw ACC, vf01, vf00
nop | maddax.xyzw ACC, vf15, vf13
nop | maddy.xyzw vf13, vf16, vf13
nop | mulaw.xyzw ACC, vf01, vf00
nop | maddax.xyzw ACC, vf15, vf14
nop | maddy.xyzw vf14, vf16, vf14
nop | subx.xy vf17, vf17, vf02
nop | addx.xy vf18, vf18, vf02
iaddiu vi04, vi04, 0x50 | subw.xy vf19, vf19, vf00 ;; offset first clear pos lower
nop | addw.xy vf20, vf20, vf00 ;; offset first clear pos upper
nop | ftoi4.xyzw vf11, vf11
nop | ftoi4.xyzw vf12, vf12
nop | ftoi4.xyzw vf13, vf13
nop | ftoi4.xyzw vf14, vf14
sq.xy vf11, 61(vi05) | ftoi4.xyzw vf17, vf17
sq.xy vf12, 62(vi05) | ftoi4.xyzw vf18, vf18
sq.xy vf13, 63(vi05) | ftoi4.xyzw vf19, vf19
sq.xy vf14, 64(vi05) | ftoi4.xyzw vf20, vf20
sq.xy vf17, 24(vi05) | nop
sq.xy vf18, 26(vi05) | nop
sq.xy vf19, 8(vi05) | nop
sq.xy vf20, 9(vi05) | nop
sq.xy vf11, 77(vi05) | nop
sq.xy vf12, 79(vi05) | nop
sq.xy vf13, 81(vi05) | nop
sq.xy vf14, 83(vi05) | nop
xgkick vi05 | nop
L2:
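iaddiu vi01, vi00, 0x694 | nop ;; vi01 = 1684 = 800 + 884
ibne vi00, vi03, L1 | nop ;; loop while sprites remain
isub vi05, vi01, vi05 | nop ;; (delay slot) toggle output buffer: 800 <-> 884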
nop | nop :e
nop | nop
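The rotation part of the DRAW program evaluates two polynomials in lockstep: the z lane accumulates odd powers of the angle (rot, rot^3, ..., rot^9) and the w lane accumulates even powers (1, rot^2, ..., rot^8), each multiplied by one of the five sincos constants at 991-995. A Python sketch of that structure follows. The actual coefficient values are not shown in this document, so Taylor series coefficients are used here purely as an assumption, and `sincos_poly` is a hypothetical name.

```python
import math

# Assumption: plain Taylor coefficients stand in for the real constants.
SIN_COEFFS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(5)]
COS_COEFFS = [(-1) ** k / math.factorial(2 * k) for k in range(5)]


def sincos_poly(rot):
    """Evaluate the odd and even polynomials the way the VU lanes do."""
    rot2 = rot * rot   # vf06.z
    odd = rot          # z lane: starts at rot, multiplied by rot^2 each step
    even = 1.0         # w lane: starts at 1, multiplied by rot^2 each step
    sin_acc = 0.0
    cos_acc = 0.0
    for sin_c, cos_c in zip(SIN_COEFFS, COS_COEFFS):
        sin_acc += odd * sin_c
        cos_acc += even * cos_c
        odd *= rot2
        even *= rot2
    return sin_acc, cos_acc
```

With Taylor coefficients, five terms stay within about 1e-4 of the true values for angles up to a quarter turn; the real constants may well be tuned differently to cover a wider range.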