- Loop over effects (skip those flagged with `use-mercneric`)
  - Loop over fragments
    - First is the "row" data. `row.x`/`row.y` is `st-vif-add` from the `merc-ctrl-header`. `row.z = 0x47800000`, `row.w = 0x4b010000`.
    - Next is the unsigned-four data. This is unpacked with `unpack8` and always goes to 140 in VU memory. The source data is just the `merc-fragment` (I believe it includes the `merc-byte-header`).
    - After that is the lump-four data. This has STMOD enabled and is also `unpack8`. It goes right after the unsigned-four data (variable size) in VU memory. The source data comes after the unsigned-four source data. Rounding in the static data is `((u4c + 3) >> 2) << 4`.
    - After that is the floating-point data, copied with `unpack32`.
    - _Only_ on the first fragment of an effect, there's an upload of 8 qw to 132: the first 7 are lights, and the last is the first quadword of the `merc-ctrl-header` (`xyz-scale`, `st-magic`, `st-out-a`, `st-out-b`). There are secrets hidden in the lights:
      - light 0's `w` is a flag field with ignore-alpha in it.
    - Next are the (optional) matrix uploads: a loop of transfers, each 7 qw.
    - Next is the `MSCAL`! It uses a different address for the first fragment of an effect, which tells merc to load the light data.
  - End loop over fragments
  - Increment effect
  - Increment effect info
  - Decrement effect count
  - Update some next-merc value in the scratchpad

# Merc Renderer

## Memory layout

Merc unpacks adgif shaders/giftags to the output memory. It can reuse the last shader from the last fragment in the effect.

## Matrix Setup

The matrix contains a "tmat" and an "nmat". The "tmat" transforms the point and the "nmat" rotates the normal. Matrices that are freshly uploaded with this fragment are preprocessed to include the effect of the perspective matrix. The inputs are in registers `vf08`, `vf10`, `vf12`, `vf25`, and the outputs are in `vf09`, `vf11`, `vf13`, `vf26`. The output is written over the input.
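
Reading the listing that follows with the usual VU semantics (`mula` sets `ACC = fs * ft` lane by lane, `maddz` adds `fs * ft.z` into every lane, `addax` sets `ACC = fs + ft.x`), each `mula`/`maddz` pair turns an input quadword `[x, y, z, w]` into `[spdx*x, spdy*y, spdz*z, spdw*z]`, i.e. a multiply by a sparse perspective matrix. The Python sketch below models that arithmetic; the function names are hypothetical, and `offset` is only a stand-in for whatever `vf20` holds when the translation quadword is processed, which these notes don't cover.

```python
def premultiply_quadword(q, s_xyz0, s_000w):
    """Model of `mula.xyzw ACC, vf15, q` followed by `maddz.xyzw out, vf16, q`.

    ACC = vf15 * q (lane by lane), then out = ACC + vf16 * q.z (q.z broadcast).
    With vf15 = [spdx, spdy, spdz, 0] and vf16 = [0, 0, 0, spdw] this yields
    [spdx*x, spdy*y, spdz*z, spdw*z].
    """
    acc = [s * c for s, c in zip(s_xyz0, q)]
    return [a + s * q[2] for a, s in zip(acc, s_000w)]


def premultiply_translation(q, offset, p_xyz0, p_000w):
    """Model of `addax` / `madda` / `maddz` on the translation quadword (vf25).

    ACC = vf20 + vf00.x (vf00.x is 0), ACC += vf27 * q, out = ACC + vf28 * q.z.
    `offset` is a hypothetical stand-in for vf20.
    """
    acc = [o + p * c for o, p, c in zip(offset, p_xyz0, q)]
    return [a + p * q[2] for a, p in zip(acc, p_000w)]


def perspective_matrix(sx, sy, sz, sw):
    """Row-major 4x4 matrix with the same effect as premultiply_quadword."""
    return [
        [sx, 0, 0, 0],
        [0, sy, 0, 0],
        [0, 0, sz, 0],
        [0, 0, sw, 0],  # w' is taken from z, as in a perspective projection
    ]


def mat_vec(m, v):
    """Row-major 4x4 matrix times a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


if __name__ == "__main__":
    col = [1.0, 2.0, 3.0, 4.0]
    # Both lines print the same vector: the VU pair is just P @ col.
    print(premultiply_quadword(col, [10.0, 20.0, 30.0, 0.0], [0.0, 0.0, 0.0, 40.0]))
    print(mat_vec(perspective_matrix(10.0, 20.0, 30.0, 40.0), col))
```

Applying this to every stored tmat quadword once per upload means the per-vertex loops get perspective for free from the same multiply-adds that transform the point.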
```
mula.xyzw ACC, vf15, vf08
maddz.xyzw vf09, vf16, vf08
mula.xyzw ACC, vf15, vf10
maddz.xyzw vf11, vf16, vf10
mula.xyzw ACC, vf15, vf12
maddz.xyzw vf13, vf16, vf12
addax.xyzw vf20, vf00
madda.xyzw ACC, vf27, vf25
maddz.xyzw vf26, vf28, vf25
```

with

```
vf15 = [spdx, spdy, spdz, 0]
vf16 = [0, 0, 0, spdw]
vf27 = [pdx, pdy, pdz, 0]
vf28 = [0, 0, 0, pdw]
```

(see `merc_asm.asm` for how to compute these in more detail)

## Vertex layout

The "lump" data contains the vertices. The layout is:

```
mat0, mat1, nrmx, posx
dst0, dst1, nrmy, posy
texs, text, nrmz, posz
```

It begins with `mat1-cnt` vertices that are deformed by only a single matrix. The `rgba-offset` points to somewhere in the u4 data. Each vertex has a single rgba, stored as unpacked u8s.

## Mat1 Loop

`vi01` is the lump pointer that reads vertices, and `vf17`, `vf18`, `vf19` are the "lump offsets" applied to the unpacked vertices. The operations are:

```asm
ilwr.x vi08, vi01           ;; load mat0 from vertex
lqi.xyzw vf08, vi01         ;; load vertex qw 0
lqi.xyzw vf11, vi01         ;; load vertex qw 1
lqi.xyzw vf14, vi01         ;; load vertex qw 2
lq.xyz vf29, 4(vi08)        ;; load nmat0
lq.xyz vf30, 5(vi08)        ;; load nmat1
lq.xyzw vf31, 6(vi08)       ;; load nmat2
add.zw vf08, vf08, vf17     ;; lump offset
add.xyzw vf11, vf11, vf18   ;; lump offset
add.xyzw vf14, vf14, vf19   ;; lump offset
mtir vi10, vf11.x           ;; get dest0
mtir vi13, vf11.y           ;; get dest1
;; rotate normal
mulaz.xyzw ACC, vf29, vf08
maddaz.xyzw ACC, vf30, vf11
maddz.xyz vf11, vf31, vf14
;; load tmat
lq.xyzw vf25, 0(vi08)
lq.xyzw vf26, 1(vi08)
lq.xyzw vf27, 2(vi08)
lq.xyzw vf28, 3(vi08)
;; get normal 1/length
erleng.xyz P, vf11
;; transform point
mulaw.xyzw ACC, vf25, vf08
maddaw.xyzw ACC, vf26, vf11
maddw.xyzw vf08, vf27, vf14
add.xyzw vf08, vf08, vf28
;; clear nrmz: z = vf00.w (1.0), which becomes Q after the multiply below
mr32.z vf14, vf00
;; ONLY if merc prime
miniw.w vf08, vf08, vf01
;; perspective divide
div Q, vf01.w, vf08.w
mul.xyz vf08, vf08, Q
mul.xyzw vf14, vf14, Q
;; load rgba
lqi.xyzw vf23, vi03
;; hvdf offset
add.xyzw vf08, vf08, vf22
;; normalize normal
mfp.w vf20, P
mulw.xyzw vf11, vf11, vf20
;; fog max
miniw.w vf08, vf08, vf03
;; fetch mat1 (note that vi01 is incremented 3x in pipeline)
ilw.y vi09, -6(vi01)
;; dot product with light
mulax.xyzw ACC, vf01, vf11
madday.xyzw ACC, vf02, vf11
maddz.xyzw vf11, vf03, vf11
;; fog min
maxw.w vf08, vf08, vf02
;; rgba itof
itof0.xyzw vf23, vf23
;; light clamp
maxx.xyzw vf11, vf11, vf00
move.xyzw vf21, vf08
;; color the lights
mulax.xyzw ACC, vf04, vf11
madday.xyzw ACC, vf05, vf11
maddaz.xyzw ACC, vf06, vf11
;; IF vi09 <= 0
addx.w vf21, vf21, vf17
;; add ambient lighting
maddw.xyzw vf11, vf07, vf00
;; ftoi the vertex position
ftoi4.xyzw vf21, vf21
;; apply vertex color
mul.xyzw vf11, vf11, vf23
;; store dest
sq.xyzw vf21, 2(vi10)
;; IF vi09 == 0
ftoi4.xyzw vf21, vf08
;; store st
sq.xyzw vf14, 0(vi10)
sq.xyzw vf14, 0(vi13)
;; store second position
sq.xyzw vf21, 2(vi13)
;; final light
miniy.xyzw vf11, vf11, vf17
ftoi0.xyzw vf11, vf11
sq.xyzw vf11, 1(vi10)
sq.xyzw vf11, 1(vi13)
```

Note: it might be that the last vertex can't change its ADC flag for dst2, maybe only in some cases. Worth checking further if there are stripping issues.

## Mat2 Loop

NOTE: we might need to advance perc by 1 at the beginning if there are any mat0's.
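
The Mat2 listing blends two 7-quadword matrices (4 tmat quadwords, then 3 nmat quadwords, matching the `0(vi..)` to `6(vi..)` loads) with per-vertex weights. A minimal Python sketch of that arithmetic, assuming the weights are the unpacked u8 `perc` values scaled by `vf29.w = 0.003921569` (about 1/255) and that only `perc.x`/`perc.y` matter; the function names are hypothetical, and the "index 0 means reuse previous matrix" behavior is not modeled.

```python
PERC_SCALE = 0.003921569  # vf29.w in the listing, roughly 1/255
MAT_INDEX_MASK = 0x7F     # vi05 in the listing


def mat_index(field):
    """`iand vi10, vi10, vi05`: keep the low 7 bits of a mat0/mat1 field."""
    return field & MAT_INDEX_MASK


def perc_weights(perc_u8):
    """`itof0` then `mulw` by vf29.w: u8 weights become floats in about [0, 1]."""
    return [p * PERC_SCALE for p in perc_u8]


def blend_matrices(mat0, mat1, perc_u8):
    """Blend two 7-qw matrices quadword by quadword.

    row_i = mat0[i] * perc.x + mat1[i] * perc.y, e.g.
    tmat2 = tmat0.2 * perc.x + tmat1.2 * perc.y
    """
    wx, wy = perc_weights(perc_u8)[:2]
    return [[a * wx + b * wy for a, b in zip(r0, r1)] for r0, r1 in zip(mat0, mat1)]


if __name__ == "__main__":
    m0 = [[1.0, 2.0, 3.0, 4.0]] * 7
    m1 = [[10.0, 20.0, 30.0, 40.0]] * 7
    print(blend_matrices(m0, m1, [128, 127, 0, 0])[0])
```

Because the weights are applied to the matrices rather than to transformed points, the rest of the loop can reuse exactly the single-matrix transform code.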
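
After the matrix fetch (one matrix in the Mat1 loop, a blended one in the Mat2 loop), the per-vertex arithmetic follows the same pattern. The Python sketch below restates it with plain lists, reading the registers as: `vf25`..`vf28` tmat quadwords, `vf29`..`vf31` nmat quadwords, `vf01`..`vf03` the three light directions stored transposed (lane i of each belongs to light i), `vf04`..`vf06` the light colors, `vf07` the ambient color, `vf22` the hvdf offset, `vf01.w` the Q numerator, and `vf17.y` the color clamp. That mapping is inferred from the instruction operands, not from documentation; fog, merc prime, ADC flags, and output packing are left out, and the function names are hypothetical.

```python
import math


def transform_vertex(tmat, nmat, pos, nrm, st, q_num, hvdf):
    """Transform half of the loops. Returns (screen xyz, stq, unit normal).

    tmat: 4 quadwords (vf25..vf28); nmat: 3 quadwords (vf29..vf31).
    q_num stands in for vf01.w, hvdf for vf22.
    """
    # mulaw/maddaw/maddw + add: p = tmat0*posx + tmat1*posy + tmat2*posz + tmat3
    p = [tmat[0][i] * pos[0] + tmat[1][i] * pos[1] + tmat[2][i] * pos[2] + tmat[3][i]
         for i in range(4)]
    # mulaz/maddaz/maddz: n = nmat0*nrmx + nmat1*nrmy + nmat2*nrmz (xyz only)
    n = [nmat[0][i] * nrm[0] + nmat[1][i] * nrm[1] + nmat[2][i] * nrm[2] for i in range(3)]
    # div Q, vf01.w, vf08.w, then scale xyz by Q and add the hvdf offset
    q = q_num / p[3]
    screen = [p[i] * q + hvdf[i] for i in range(3)]
    # mr32.z puts vf00.w (1.0) in the nrmz slot, so the Q multiply gives s*Q, t*Q, Q
    stq = [st[0] * q, st[1] * q, q]
    # erleng gives 1/|n|; mulw scales the normal by it
    inv_len = 1.0 / math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    return screen, stq, [c * inv_len for c in n]


def light_vertex(n, light_dirs, light_colors, ambient, rgba, max_color):
    """Lighting half: three directional lights plus ambient, times vertex color.

    light_dirs holds per-light (x, y, z) directions; the VU keeps them transposed
    in vf01..vf03 so one mulax/madday/maddz computes all three dot products.
    """
    # dot products, then `maxx ..., vf00` clamps negatives to 0
    dots = [max(0.0, d[0] * n[0] + d[1] * n[1] + d[2] * n[2]) for d in light_dirs]
    # vf04*dot.x + vf05*dot.y + vf06*dot.z + vf07*vf00.w (ambient, vf00.w is 1.0)
    color = [sum(light_colors[k][c] * dots[k] for k in range(3)) + ambient[c]
             for c in range(4)]
    # multiply by the vertex color, clamp (miniy against vf17.y), truncate (ftoi0)
    return [int(min(c * v, max_color)) for c, v in zip(color, rgba)]
```

Keeping the light directions transposed is what lets the VU do all three dot products in three multiply-adds instead of three separate reductions.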
```asm
;; compute perc ptr
ilw.x vi02, 3(vi12)
iadd vi02, vi02, vi12
;; load vertex
lqi.xyzw vf08, vi01
lqi.xyzw vf11, vi01
lqi.xyzw vf14, vi01
;; and perc
lqi.xyzw vf24, vi02
;; extract mat idx
mtir vi10, vf08.x
mtir vi13, vf08.y
;; convert perc
itof0.xyzw vf24, vf24
;; lump offset
add.zw vf08, vf08, vf17
add.xyzw vf11, vf11, vf18
add.xyzw vf14, vf14, vf19
;; mask off sign bit of mat0 and mat1 (vi05 = 0x7f)
;; looks like a 0 here means "reuse prev mat"
iand vi10, vi10, vi05
iand vi13, vi13, vi05
;; scale perc
mulw.xyzw vf24, vf24, vf29  ;; vf29.w = 0.003921569 (about 1/255)
;; load matrices
lq.xyzw vf20, 0(vi10)       ;; tmat0.0
lq.xyzw vf25, 0(vi13)       ;; tmat1.0
lq.xyzw vf23, 1(vi10)       ;; tmat0.1
lq.xyzw vf26, 1(vi13)       ;; tmat1.1
lq.xyzw vf20, 2(vi10)       ;; tmat0.2
lq.xyzw vf27, 2(vi13)       ;; tmat1.2
lq.xyzw vf23, 3(vi10)       ;; tmat0.3
lq.xyzw vf28, 3(vi13)       ;; tmat1.3
lq.xyzw vf20, 4(vi10)       ;; nmat0.0
lq.xyz vf29, 4(vi13)        ;; nmat1.0
lq.xyzw vf23, 5(vi10)       ;; nmat0.1
lq.xyz vf30, 5(vi13)        ;; nmat1.1
lq.xyzw vf20, 6(vi10)       ;; nmat0.2
lq.xyzw vf31, 6(vi13)       ;; nmat1.2
;; multiply rows by perc.
;; mat0 uses perc.x, mat1 uses perc.y. Matrices are added.
;; ex: tmat2 = tmat0.2 * perc.x + tmat1.2 * perc.y
;; Results are:
;;   vf25 = tmat0
;;   vf26 = tmat1
;;   vf27 = tmat2
;;   vf28 = tmat3
;;   vf29 = nmat0
;;   vf30 = nmat1
;;   vf31 = nmat2
;; rotate normal
mulaz.xyzw ACC, vf29, vf08
maddaz.xyzw ACC, vf30, vf11
maddz.xyz vf11, vf31, vf14
;; transform point
mulaw.xyzw ACC, vf25, vf08
maddaw.xyzw ACC, vf26, vf11
maddw.xyzw vf08, vf27, vf14
add.xyzw vf08, vf08, vf28
;; 1/length of normal
erleng.xyz P, vf11
;; rgba offset
ilwr.y vi03, vi12
iadd vi03, vi03, vi12
;; perspective divide
div Q, vf01.w, vf08.w
mul.xyz vf08, vf08, Q
mul.xyzw vf14, vf14, Q
;; load rgba
lqi.xyzw vf23, vi03
;; hvdf offset
add.xyzw vf08, vf08, vf22
;; normalize normal
mfp.w vf20, P
mulw.xyzw vf11, vf11, vf20
;; fog
miniw.w vf08, vf08, vf03
maxw.w vf08, vf08, vf02
;; go get mat1 again
ilw.y vi09, -6(vi01)
;; light dot product
mulax.xyzw ACC, vf01, vf11
madday.xyzw ACC, vf02, vf11
maddz.xyzw vf11, vf03, vf11
;; vertex color convert
itof0.xyzw vf23, vf23
;; light clamp
maxx.xyzw vf11, vf11, vf00
;; adc logic
ilw.y vi09, -6(vi01)
move.xyzw vf21, vf08
ibgtz vi09, L47
addx.w vf21, vf21, vf17
L47:
ilw.x vi09, -9(vi01)
ftoi4.xyzw vf21, vf21
ilw.x vi09, -9(vi01)
sq.xyzw vf21, 2(vi10)
ibgez vi09, L50
ftoi4.xyzw vf21, vf08
L50:
sq.xyzw vf21, 2(vi13)
```

## Final Copies

Assuming no merc prime, and leaving out the pipe flush (a tangled mess that flushes the pipes of whichever of the 3 different loops was running):

```
;; vi08 = mercprime flag
ilw.w vi08, 1(vi00)
;; find the byte header again
xtop vi02
iaddiu vi04, vi02, 0x8c
;; srcdest-off
ilwr.x vi05, vi04
;; samecopy-cnt
ilw.w vi06, 1(vi04)
;; crosscopy-cnt
ilw.x vi07, 2(vi04)
;; zero out vf25, vf26
minix.xyzw vf25, vf00, vf00
minix.xyzw vf26, vf00, vf00
;; compute srcdest table address
iadd vi05, vi05, vi04
;; compute output zone
iaddiu vi04, vi02, 0x173
;; compute srcdest cross copy table address
iadd vi06, vi06, vi05
;; compute end of cross copy
iadd vi07, vi07, vi06
;; compute address of the _old_ output buffer
iaddiu vi08, vi00, 0x1ba
isub vi08, vi08, vi02
iaddiu vi08, vi08, 0x173
;; move addrs to vf
mfir.x vf25, vi04           ;; vf25.x = output zone
mfir.y vf25, vi04           ;; vf25.y = output zone
mfir.x vf26, vi08           ;; vf26.x = old output zone
mfir.y vf26, vi04           ;; vf26.y = output zone
;; set up
maxx.xyzw vf13, vf13, vf00  ;; vf13 = 0
maxi.xy vf27, vf00, I       ;; I = 8388608.0, so vf27.xy = [8388608.0 8388608.0]
maxi.w vf27, vf00, I        ;; I = 256, so vf27.w = 256
itof0.xyzw vf25, vf25
itof0.xyzw vf26, vf26
ior vi02, vi05, vi00        ;; vi02 = srcdst table
add.xyzw vf25, vf25, vf27   ;; hack float add trick
add.xyzw vf26, vf26, vf27
miniy.xyzw vf13, vf13, vf17 ;; float tricks
ibne vi06, vi05, L150       ;; branch if there are samecopies
ior vi06, vi07, vi00 | max.xyzw vf25, vf26, vf26
L150:
;; load from copy table
lqi.xyzw vf27, vi05
;; table to float
itof0.xyzw vf27, vf27
```
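
The `8388608.0` loaded into `vf27.xy` above is 2^23. Single-precision floats in [2^23, 2^24) are spaced exactly 1 apart, so for an integer n in [0, 2^23), `float(n) + 2^23` has the bit pattern `0x4b000000 | n`; that is plausibly what the "hack float add trick" relies on, although these notes don't show how merc consumes the result. The same arithmetic makes the row constants from the DMA notes readable: `0x4b010000` is 8454144.0 (2^23 + 65536) and `0x47800000` is 65536.0. The Python sketch below checks this with `struct`; the helper names are hypothetical, and `vif_row_offset` assumes the lump-four STMOD is VIF offset mode (the row word is integer-added to each unpacked value), which the notes don't state explicitly.

```python
import struct

TWO_POW_23 = 8388608.0  # the I constant loaded into vf27.xy


def f32_bits(x):
    """Bit pattern of x after rounding to IEEE-754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """Single-precision value whose bit pattern is the 32-bit word b."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def int_via_float_add(n):
    """Recover n from the mantissa of float(n) + 2**23 (valid for 0 <= n < 2**23)."""
    if not 0 <= n < (1 << 23):
        raise ValueError("n must be in [0, 2**23)")
    return f32_bits(float(n) + TWO_POW_23) & 0x7FFFFF


def vif_row_offset(row_bits, value):
    """Assumed STMOD offset mode: integer-add an unpacked value to a row word, read as float."""
    return bits_f32(row_bits + value)


if __name__ == "__main__":
    print(hex(f32_bits(TWO_POW_23 + 0x173)))  # 0x4b000173
    print(vif_row_offset(0x4B010000, 5))      # 8454149.0
```

Under that assumption, a u8 value v unpacked against `row.w` lands as the float 8454144.0 + v, and against `row.z` as 65536.0 + v/128, since the spacing between floats in [2^16, 2^17) is 1/128.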