Sprite Glow: EE side

It's pretty simple, no asm.

VU Memory Map

The VIF OFFSET is 400, so the input double buffers are 0 - 400 and 400 - 800.

0 - 400 (input first buffer)
400 - 800 (input second buffer)
800 - template copy 1
884 - template copy 2
980 - constants

DMA init

@ 0x500930 tag: TAG: 0x00000000 cnt  qwc 0x0018
 vif0: STCYCL cl: 4 wl: 4
 vif1: UNPACK-V4-32: 24 addr: 980 us: false tops: false

@ 0x500ac0 tag: TAG: 0x00777650 ref  qwc 0x0054
 vif0: STCYCL cl: 4 wl: 4
 vif1: UNPACK-V4-32: 84 addr: 800 us: false tops: false

@ 0x500ad0 tag: TAG: 0x00777650 ref  qwc 0x0054
 vif0: MSCAL 0x0
 vif1: UNPACK-V4-32: 84 addr: 884 us: false tops: false

@ 0x500ae0 tag: TAG: 0x00000000 cnt  qwc 0x0000
 vif0: BASE 0x0
 vif1: OFFSET 0x190

@ 0x500af0 tag: TAG: 0x00000000 cnt  qwc 0x0000
 vif0: NOP
 vif1: FLUSHE

The first tag uploads the constants.

The next two upload two copies of the template, and the second of them also runs the init program (MSCAL 0x0). The output is double buffered, so having two copies makes sense.

The next tag sets up the input double buffer (BASE 0, OFFSET 0x190 = 400).

The final tag is just a sync (FLUSHE) before starting the draws.
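The numbers in these tags can be cross-checked against the memory map. The sketch below is just that arithmetic in Python; the constant and function names are made up for illustration, and the values are copied from the tags above.

```python
# Values copied from the DMA tags above; names are illustrative only.
TEMPLATE_QWC = 0x54      # qwc of each template upload
CONSTANTS_QWC = 0x18     # qwc of the constants upload
TEMPLATE_0 = 800         # UNPACK addr of template copy 1
TEMPLATE_1 = 884         # UNPACK addr of template copy 2
CONSTANTS = 980          # UNPACK addr of the constants
VIF_OFFSET = 0x190       # OFFSET from the fourth tag


def end_of(addr, qwc):
    """First quadword address after an upload of qwc quadwords at addr."""
    return addr + qwc


print(VIF_OFFSET)                          # 400
print(end_of(TEMPLATE_0, TEMPLATE_QWC))    # 884
print(end_of(TEMPLATE_1, TEMPLATE_QWC))    # 968
print(end_of(CONSTANTS, CONSTANTS_QWC))    # 1004
```

So the second template copy starts exactly where the first one ends, and the constants occupy 980 through 1003, which matches the highest address the VU program loads from.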

Template system

The VU program is (as usual) double buffered. There are two input buffers, containing data uploaded through VIF. There are two output buffers, each containing the sprite-glow-template. While the program runs, it transforms vertices and writes the results into the template. Once done, it xgkicks, which waits for the previous run's draw to finish if needed, then begins the drawing process.
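The output side of the double buffering amounts to swapping a pointer between the two template copies after each run. A minimal Python sketch of that swap, assuming the two copies live at 800 and 884 as in the memory map (the function name is hypothetical; the VU program gets the same effect with a single subtract from 0x694):

```python
TEMPLATE_0 = 800
TEMPLATE_1 = 884
TOGGLE_SUM = TEMPLATE_0 + TEMPLATE_1  # 1684 = 0x694


def next_output_buffer(current):
    """Return the other template copy: sum - current flips between the two."""
    return TOGGLE_SUM - current


buf = TEMPLATE_0
for run in range(4):
    print(run, buf)
    buf = next_output_buffer(buf)
```

While one copy is being drawn by the GS, the next run writes into the other copy.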

What is drawn?

Annoyingly, there are 6 draws this time:

Probe Clear alpha
Probe Draw alpha
Offscreen sample
Offscreen repeat
Draw Alpha
Draw Final

In summary:

Probe Clear alpha: draw an alpha = 0 square. Always drawn.
Probe Draw alpha: draw an alpha = 1 square. 1 px smaller than first draw. Uses normal z test.

Draw 1 and 2: "Probe Clear alpha" and "Probe Draw alpha"

First is the clear. The first tag is 5 regs of adgif:

texflush
(new 'static 'gs-alpha :a 2 :b 2 :c 2 :d 1)
(new 'static 'gs-test :ate 1 :afail 1 :zte 1 :ztst 2)
(new 'static 'gs-zbuf :zbp 304 :psm 1 :zmsk 1)
(new 'static 'gs-frame :fbp 408 :fbw 8 :fbmsk #xffffff)

The alpha is:

Cv = (0 - 0) * 0 + Cd

which means to not write any color.
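To see why this leaves the color untouched, here is a small Python sketch of the GS blend equation Cv = (A - B) * C + D for one color channel. The selector tables follow my understanding of the ALPHA register (2 selects zero for A, B, and D, and a fixed value for C; 1 selects the destination color for D). The function name and the division by 128 for fixed-point alpha are assumptions for illustration, not taken from the game.

```python
def gs_blend(a, b, c, d, cs, cd, as_, ad, fix=0):
    """Blend one 8-bit color channel using GS-style selectors.

    a, b, d: 0 = source color, 1 = destination color, 2 = zero
    c:       0 = source alpha, 1 = destination alpha, 2 = fixed value
    Alpha values use 128 as 1.0 (an assumption of this sketch).
    """
    color = {0: cs, 1: cd, 2: 0}
    alpha = {0: as_, 1: ad, 2: fix}
    return (color[a] - color[b]) * alpha[c] // 128 + color[d]


# The setting from the adgif: :a 2 :b 2 :c 2 :d 1
for cs in (0, 77, 255):
    print(gs_blend(2, 2, 2, 1, cs=cs, cd=42, as_=128, ad=128))
```

Whatever the source color is, the result is the destination color, because A - B is always zero with these selectors.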

The test is:

the alpha test always fails, and afail = 1 means only the framebuffer (rgba) is written on failure, never z
the z test is the usual GEQUAL

When combined with the alpha settings, this only writes alpha, no rgb or depth.

The zbuf looks at the normal zbuf, but z writing is masked again. Just in case.

The gs-frame yet again masks away rgb writes. Just in case.
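A sketch of how a frame mask like this behaves, assuming a 32-bit pixel with alpha in the top byte and RGB in the low 24 bits, and assuming a set mask bit protects the corresponding framebuffer bit. The helper name is hypothetical.

```python
FBMSK_RGB = 0x00FFFFFF  # :fbmsk #xffffff from the gs-frame above


def masked_write(old_pixel, new_pixel, fbmsk):
    """Keep old bits where the mask is 1, take new bits where it is 0."""
    keep = old_pixel & fbmsk
    take = new_pixel & ~fbmsk & 0xFFFFFFFF
    return keep | take


old = 0x80112233   # alpha byte 0x80, low 24 bits 0x112233
new = 0x01AABBCC   # alpha byte 0x01, low 24 bits 0xAABBCC
print(hex(masked_write(old, new, FBMSK_RGB)))  # 0x1112233
```

Only the alpha byte changes; the low 24 bits keep their old value.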

There are two draws. Both are sprites. The first draw sets alpha to 0 and uses z = #xffffff, so it always passes.

The second draw is similar but:

z comes from the transformed vertex. So it has normal depth test behavior
alpha is 1
it's 1 pixel smaller than the first draw.

So the end result is:

an alpha = 0 square, always
an alpha = 1 square, offset in by 1 pixel on all sides, but only where depth passes.

Draw 3: Offscreen Sample

Switches to GS context 2. Samples from framebuffer and writes to a temporary texture (64 px width, uses only 32).

only writes alpha
mmag/mmin on
tcc = 1, tfx = 1 (rgba, decal) - RGBA all come from texture, nothing fancy
rgb writes masked (only write alpha)
clamp is set to clamp (no wrapping)
the alpha test trick disables z buffer writing.

This basically just copies the inner square from draw 2 to a separate texture.

Draw 4: Repeat draw

This appears to draw the texture over itself again and again. I think it effectively blends neighboring pixels using texture filtering, so the pixel at (0, 0) ends up with the average value of alpha.
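If that reading is right, the repeated draw behaves like a chain of 2x downsamples, where each output pixel is the filtered mix of a 2x2 block. The Python sketch below is a model of that idea only; the 2x2 box filter, the 32x32 starting size, and the function names are assumptions, not what the GS commands literally do.

```python
def halve(grid):
    """Average each 2x2 block of a square grid into one value."""
    n = len(grid) // 2
    return [
        [
            (grid[2 * y][2 * x] + grid[2 * y][2 * x + 1]
             + grid[2 * y + 1][2 * x] + grid[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(n)
        ]
        for y in range(n)
    ]


def average_by_repeated_halving(grid):
    """Halve until one pixel is left; that pixel is the mean of the input."""
    while len(grid) > 1:
        grid = halve(grid)
    return grid[0][0]


# Probe where the left half passed the depth test (alpha 1) and the right half did not.
probe = [[1.0 if x < 16 else 0.0 for x in range(32)] for y in range(32)]
print(average_by_repeated_halving(probe))  # 0.5
```

Under this model the final alpha would be the fraction of probe pixels that passed the depth test.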

Draw 5: flare alpha

This is set up in the repeat-draw-adcmds. Drawing to the framebuffer again.

VU1 program

out memory map:

68: adgif0
69: adgif1
70: adgif2
71: adgif3
72: adgif4

;; math:

First, the input position is multiplied by the camera matrix, including the translation row (camera_mat[3]). Only xyz is computed here. The result is p0, in vf01.

color: rgb *= a

fade = clamp(0, 1, p0.z * fade_a + fade_b)
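Put together, the color path looks roughly like the following Python sketch. It follows the order of operations in the VU program (premultiply by alpha, fade, scale by 1/128); the function name and the example inputs are made up.

```python
def glow_color(rgba, p0_z, fade_a, fade_b):
    """Return the faded rgb for one sprite; alpha is passed through unchanged."""
    r, g, b, a = rgba
    # color: rgb *= a
    r, g, b = r * a, g * a, b * a
    # fade = clamp(0, 1, p0.z * fade_a + fade_b)
    fade = min(1.0, max(0.0, p0_z * fade_a + fade_b))
    # scale by fade, then by 1/128 (the I register in the VU program)
    scale = fade * 0.0078125
    return (r * scale, g * scale, b * scale, a)


print(glow_color((128.0, 64.0, 0.0, 128.0), p0_z=100.0, fade_a=0.0, fade_b=0.5))
```

With an alpha of 128 the premultiply and the 1/128 scale cancel out, so only the fade changes the color.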




INIT program
  iaddiu vi05, vi00, 0x320   |  nop      ;; vi05 = 800 (template)
  lq.xyzw vf25, 988(vi00)    |  nop
  lq.xyzw vf26, 989(vi00)    |  nop
  lq.xyzw vf27, 990(vi00)    |  nop
  lq.xyzw vf30, 996(vi00)    |  nop
  lq.xyzw vf31, 997(vi00)    |  nop
  lq.xyzw vf28, 1002(vi00)   |  nop
  lq.xyzw vf29, 1003(vi00)   |  nop
  nop                        |  nop :e
  nop                        |  nop

regs:

vi02 = input ptr
vi03 = num_sprites
vi04 = adgif ptr
vi05 = output buffer (double buffered, so toggles)

vf25 = hvdf
vf26 = hmge
vf27 = consts
vf30 = basis_x
vf31 = basis_y
vf28 = clamp_min
vf29 = clamp_max

DRAW program
  xtop vi02                  |  nop          ;; vi02 = input buffer's control
  nop                        |  nop
  ilwr.x vi03, vi02          |  nop          ;; vi03 = num_sprites (1 always?)
  iaddi vi02, vi02, 0x1      |  nop          ;; vi02 = sprite data
  iaddiu vi04, vi02, 0x90    |  nop          ;; vi04 = adgif data
L1:
  lq.xyzw vf03, 2(vi02)      |  nop          ;; vf03 = color
  lq.xyzw vf02, 1(vi02)      |  nop          ;; vf02 = [size_probe, z_offset, rot-angle, size-y]
  lq.xyzw vf01, 0(vi02)      |  nop          ;; vf01 = [position.xyz, size-x]
  lq.xyzw vf24, 983(vi00)    |  nop          ;; vf24 = [camera_mat[3]]
  lq.xyzw vf21, 980(vi00)    |  nop          ;; vf21 = [camera_mat[0]]
  lq.xyzw vf22, 981(vi00)    |  nop          ;; vf22 = [camera_mat[1]]
  lq.xyzw vf23, 982(vi00)    |  nop          ;; vf23 = [camera_mat[2]]
  lq.xyzw vf04, 3(vi02)      |  mulaw.xyz ACC, vf24, vf00   ;; vf04 = [fade_a, fade_b, X, X] | multiply
  lq.xyzw vf24, 987(vi00)    |  maddax.xyz ACC, vf21, vf01  ;; vf24 = [perspective[3]] | multiply
  lq.xyzw vf21, 984(vi00)    |  madday.xyz ACC, vf22, vf01  ;; vf21 = [perspective[0]] | multiply
  lq.xyzw vf22, 985(vi00)    |  maddz.xyz vf01, vf23, vf01  ;; vf22 = [perspective[1]] | multiply
  lq.xyzw vf23, 986(vi00)    |  nop                         ;; vf23 = [perspective[2]]
  lq.xyzw vf09, 0(vi04)      |  nop                         ;; vf09 = adgif[0]
  lq.xyzw vf10, 1(vi04)      |  mulw.xyz vf03, vf03, vf03   ;; vf10 = adgif[1] | color multiply by alpha
  div Q, vf02.y, vf01.z      |  mulz.x vf04, vf04, vf01     ;; Q = (z_offset / p0.z) | fade_a *= p0.z
  lq.xyzw vf11, 2(vi04)      |  nop                         ;; vf11 = adgif[2]
  0.0078125                  |  nop :i                      ;; I = 0.0078125 (= 1/128)
  lq.xyzw vf12, 3(vi04)      |  nop                         ;; vf12 = adgif[3]
  lq.xyzw vf13, 4(vi04)      |  addy.x vf04, vf04, vf04     ;; vf13 = adgif[4] | fade_a += fade_b
  sq.xyzw vf09, 68(vi05)     |  muly.z vf05, vf02, vf27     ;; adgif0 store    | vf05.z = rot-angle * deg_to_rad
  move.w vf05, vf00          |  addw.z vf02, vf00, vf01     ;; vf05.w = 1      | vf02.z = size-x
  sq.xyzw vf10, 69(vi05)     |  mul.w vf09, vf00, Q         ;; adgif1 store    | vf09.w = (z_offset / p0.z)
  sq.xyzw vf11, 70(vi05)     |  nop                         ;; adgif2 store
  sq.xyzw vf12, 71(vi05)     |  miniw.x vf04, vf04, vf00    ;; adgif3 store    | clamp fade 1
  sq.xyzw vf13, 72(vi05)     |  nop                         ;; adgif4 store
  nop                        |  subw.w vf09, vf00, vf09     ;; vf09.w = 1 - (z_offset / p0.z);
  nop                        |  maxx.x vf04, vf04, vf00     ;; clamp fade 2
  nop                        |  mulw.xyz vf01, vf01, vf09   ;; multiply by pscale
  nop                        |  mulx.xyz vf03, vf03, vf04   ;; multiply color by fade
  nop                        |  mulaw.xyzw ACC, vf24, vf00  ;; multiply by perspective matrix
  nop                        |  maddax.xyzw ACC, vf21, vf01
  nop                        |  madday.xyzw ACC, vf22, vf01
  nop                        |  muli.xyz vf03, vf03, I      ;; color scaling.
  nop                        |  maddz.xyzw vf01, vf23, vf01 ;; perspective matrix.
  nop                        |  nop
  iaddi vi03, vi03, -0x1     |  mulz.z vf06, vf05, vf05     ;; dec sprite count | vf06 = rot^2
  lq.xyzw vf15, 991(vi00)    |  nop                         ;; vf15 = sincos01
  iaddi vi02, vi02, 0x3      |  nop                         ;; inc input pointer... by the wrong amount lol
  fcset 0x0                  |  mul.xyzw vf07, vf01, vf26   ;; hmge mult
  nop                        |  mulz.zw vf09, vf05, vf06    ;; vf09 = rot^3
  lq.xyzw vf15, 992(vi00)    |  mula.zw ACC, vf05, vf15     ;; vf15 = sincos23 | acc working on sincos.
  nop                        |  nop
  div Q, vf00.w, vf07.w      |  clipw.xyz vf07, vf07        ;; Q = 1 / p_hmged.w | clip!!!
  nop                        |  mulz.zw vf10, vf09, vf06    ;; vf10 is rot thing
  lq.xyzw vf15, 993(vi00)    |  madda.zw ACC, vf09, vf15    ;; vf15 is rot coeff, working on sincos
  nop                        |  nop
  fcand vi01, 0x3f           |  nop                         ;; check clipping result
  ibne vi00, vi01, L2        |  mulz.zw vf09, vf10, vf06    ;; skip if clipped | working on rot
  lq.xyzw vf15, 994(vi00)    |  madda.zw ACC, vf10, vf15    ;; rot | rot
  nop                        |  mul.xyz vf01, vf01, Q       ;; perspective multiply
  nop                        |  mul.xyzw vf02, vf02, Q      ;; vf02 *= q
  nop                        |  mulz.zw vf10, vf09, vf06    ;; rot
  lq.xyzw vf15, 995(vi00)    |  madda.zw ACC, vf09, vf15    ;; rot89 | rot
  nop                        |  add.xyzw vf01, vf01, vf25   ;; hvdf offset
  nop                        |  maxw.x vf02, vf02, vf00     ;; clamp size_probe to a minimum of 1
  nop                        |  miniw.x vf02, vf02, vf29    ;; min size_probe against clamp max.w
  nop                        |  miniz.zw vf02, vf02, vf29   ;; min zw against clamp max.z
  nop                        |  madd.zw vf05, vf10, vf15    ;; vf05 = sincos output
  nop                        |  ftoi0.xyzw vf03, vf03       ;; colors to ints or whatever
  nop                        |  addx.xy vf09, vf28, vf02    ;; vf09 = [cmin.x + probe_size, cmin.y + probe_size]
  nop                        |  subx.xy vf11, vf01, vf02    ;; vf11 = probe lower corner = [p0.xy - probe_size]
  nop                        |  addx.xy vf12, vf01, vf02    ;; vf12 = probe upper corner = [p0.xy + probe_size]
  nop                        |  subx.xy vf10, vf29, vf02    ;; vf10 = [cmax.x - probe_size, cmax.y - probe_size]
  nop                        |  mulaz.xyzw ACC, vf30, vf05
  nop                        |  msubw.xyzw vf15, vf31, vf05 ;; rotate the basis

  nop                        |  max.xy vf20, vf01, vf09     ;; vf20 is some clamped position.
  nop                        |  addx.zw vf11, vf01, vf00    ;; vf11.zw = vf01.zw
  nop                        |  addx.zw vf12, vf01, vf00    ;; vf12.zw = vf01.zw
  nop                        |  subw.xy vf17, vf28, vf00    ;; vf17 = [clamp_min.x - 1, clamp_min.y - 1]
  nop                        |  mulz.xyzw vf15, vf15, vf02  ;; rot_basis *= vf02.z (x-scale)
  nop                        |  addw.xy vf18, vf28, vf00    ;; vf18 = [clamp_min.x + 1, clamp_min.y + 1]
  nop                        |  ftoi4.xyzw vf11, vf11       ;; usual ftoi4 of clear positions
  nop                        |  ftoi4.xyzw vf12, vf12
  nop                        |  mini.xy vf20, vf20, vf10    ;; vf20 clamp again.
  nop                        |  mulaw.xyzw ACC, vf30, vf05
  sq.xyzw vf03, 75(vi05)     |  maddz.xyzw vf16, vf31, vf05 ;; store color of flare | second row of rotation matrix
  sq.xyz vf11, 11(vi05)      |  sub.xy vf17, vf20, vf17     ;; store clear pos      | offset vf17
  sq.xyz vf12, 12(vi05)      |  sub.xy vf18, vf20, vf18     ;; store clear pos      | offset vf18
  lq.xyzw vf11, 998(vi00)    |  subx.xy vf19, vf20, vf02    ;; compute first clear pos,
  lq.xyzw vf12, 999(vi00)    |  mulw.xyzw vf16, vf16, vf02  ;; y scale
  lq.xyzw vf13, 1000(vi00)   |  addx.xy vf20, vf20, vf02    ;; first clear pos, upper corner.
  lq.xyzw vf14, 1001(vi00)   |  mulaw.xyzw ACC, vf01, vf00
  nop                        |  maddax.xyzw ACC, vf15, vf11
  nop                        |  maddy.xyzw vf11, vf16, vf11
  nop                        |  mulaw.xyzw ACC, vf01, vf00
  nop                        |  maddax.xyzw ACC, vf15, vf12
  nop                        |  maddy.xyzw vf12, vf16, vf12
  nop                        |  mulaw.xyzw ACC, vf01, vf00
  nop                        |  maddax.xyzw ACC, vf15, vf13
  nop                        |  maddy.xyzw vf13, vf16, vf13
  nop                        |  mulaw.xyzw ACC, vf01, vf00
  nop                        |  maddax.xyzw ACC, vf15, vf14
  nop                        |  maddy.xyzw vf14, vf16, vf14
  nop                        |  subx.xy vf17, vf17, vf02
  nop                        |  addx.xy vf18, vf18, vf02
  iaddiu vi04, vi04, 0x50    |  subw.xy vf19, vf19, vf00 ;; offset first clear pos lower
  nop                        |  addw.xy vf20, vf20, vf00 ;; offset first clear pos upper
  nop                        |  ftoi4.xyzw vf11, vf11
  nop                        |  ftoi4.xyzw vf12, vf12
  nop                        |  ftoi4.xyzw vf13, vf13
  nop                        |  ftoi4.xyzw vf14, vf14
  sq.xy vf11, 61(vi05)       |  ftoi4.xyzw vf17, vf17
  sq.xy vf12, 62(vi05)       |  ftoi4.xyzw vf18, vf18
  sq.xy vf13, 63(vi05)       |  ftoi4.xyzw vf19, vf19
  sq.xy vf14, 64(vi05)       |  ftoi4.xyzw vf20, vf20
  sq.xy vf17, 24(vi05)       |  nop
  sq.xy vf18, 26(vi05)       |  nop
  sq.xy vf19, 8(vi05)        |  nop
  sq.xy vf20, 9(vi05)        |  nop
  sq.xy vf11, 77(vi05)       |  nop
  sq.xy vf12, 79(vi05)       |  nop
  sq.xy vf13, 81(vi05)       |  nop
  sq.xy vf14, 83(vi05)       |  nop
  xgkick vi05                |  nop
L2:
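  iaddiu vi01, vi00, 0x694   |  nop          ;; vi01 = 1684 = 800 + 884
  ibne vi00, vi03, L1        |  nop          ;; loop while sprites remain
  isub vi05, vi01, vi05      |  nop          ;; vi05 = 1684 - vi05: swap output buffer (800 <-> 884), in the branch delay slot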
  nop                        |  nop :e
  nop                        |  nop
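
The rotation part of the DRAW program appears to build rot, rot^3, rot^5, ... by repeatedly multiplying by rot^2, and accumulates them against five coefficient quadwords (991 to 995), with the z lane collecting odd powers and the w lane collecting even powers. That is the shape of a polynomial sine and cosine. A minimal Python sketch of the same scheme follows; the actual coefficient values are not shown in these notes, so Taylor coefficients are used here as a stand-in.

```python
import math

# Stand-in coefficients (Taylor series). The game's table at 991-995 may differ.
SIN_COEFFS = [1.0, -1.0 / 6, 1.0 / 120, -1.0 / 5040, 1.0 / 362880]
COS_COEFFS = [1.0, -1.0 / 2, 1.0 / 24, -1.0 / 720, 1.0 / 40320]


def sincos_poly(rot):
    """Evaluate sin and cos with five terms each, one power of rot^2 per step."""
    rot2 = rot * rot
    odd, even = rot, 1.0      # like the z and w lanes: [rot, 1]
    s = c = 0.0
    for sin_k, cos_k in zip(SIN_COEFFS, COS_COEFFS):
        s += odd * sin_k      # accumulate, like madda on ACC
        c += even * cos_k
        odd *= rot2           # next odd/even power, like mulz by rot^2
        even *= rot2
    return s, c


s, c = sincos_poly(0.5)
print(abs(s - math.sin(0.5)) < 1e-6, abs(c - math.cos(0.5)) < 1e-6)
```

Doing both polynomials in two lanes of the same registers is what lets the program interleave the sincos work with the projection math.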
