# Shrub Renderer The shrub renderer is part of the background system. Each level probably has 1 or 0 `drawable-tree-instance-shrub`s, containing all of the shrubs in that level (if it has any shrubs). Because the shrub renderer is part of the background system, actual DMA generation happens in `finish-background`. ## Original Design In `shrub`, there are prototypes and instances. Each "prototype" defines a model (like a bush, tree, etc). Each "instance" is a particular placement of a prototype in the world. Each "prototype" has 4 different geometries. Some of the geometries can be missing: - prototype-generic-shrub - prototype-shrubbery - prototype-trans-shrubbery - billboard The first two are believed to have the same data, but if the shrub is very close to the player and partially off-screen, it must be scissored, and only the `generic` renderer supports scissoring. The `prototype-trans-shrubbery` allows shrubs to fade away. It's likely that the format is extremely similar, or even the exact same. The `billboard` is a single quad. Effects: - Time of Day lighting. It looks like each "drawable-tree-instance-shrub" has a time-of-day color palette that is adjusted based on the time of day - Per-instance time of day lighting. Each instance may use different colors. - Wind effect. This applies an additional transformation matrix per instance. ## Our Design We will ignore the prototype-generic-shrub - OpenGL will take care of scissoring for us. Like with tfrag/tie, we will do the time of day interpolation in C++. The shrubs without wind effect will be converted into a single giant mesh. Doing it as a single mesh reduces the number of draw calls, and the entire mesh can be left in GPU memory the whole time. The shrubs with wind effect will be drawn as individual instances, as different shrubs need different wind matrices. It's likely going to be similar to `render_tree_wind`. The time-of-day effect will be done like in tfrag/tie. We will create a new time of day texture on each frame, based on the current time, and each vertex will index into a single large texture. This approach is nice because the interpolation/upload can be done in a single large batch. ## Setup Before (in `background.gc`) The shrub system doesn't use the precomputed visibility strings, so we can ignore this. - The `background-upload-vu0` function loads `vf16-vf31` with various math camera values. - The `background-upload-vu0` function loads hte `background-vu0-block` program to VU0 and runs the subroutine at 0. - The current level index (0 or 1) is stored in the scratchpad (as a `terrain-context`) - The time of day colors are calculated with `time-of-day-interp-colors`. The colors are stored in `*instance-tie-work*`. We can move this to C++ and do it faster. After setup, the main function to generate DMA is `draw-drawable-tree-instance-shrub`. This function will be removed in the PC port. Instead, we will send the C++ code some data: - camera matrix - name of the level ## `draw-drawable-tree-instance-shrub` Basic outline - Reset the `instance-shrub-work` - Check if renderer is enabled - Call `draw-inline-array-instance-shrub`. Each prototype has a "bucket" containing a linked list of instances. This function adds the instances to the buckets. - Call `draw-prototype-inline-array-shrub`. This builds the final DMA list from the buckets. - Various performance counter things that we can ignore. ## `draw-inline-array-instance-shrub` Args: - `a0` dma buffer - `a1` inline array of `draw-node` (a usual draw-node BVH with child type `instance-shrubbery`) - `a2` length of this array - `a3` inline array of `prototype-bucket-shrub` ```lisp B0: ;; block 0: one-time setup L57: ;; Function prologue daddiu sp, sp, -32 sd ra, 0(sp) sq gp, 16(sp) lui t3, 28672 ;; t3 = 0x70000000, the scratchpad lw v1, 4(a0) ;; v1 = (-> dma-buf base). we'll be writing DMA data here. lui t2, 4096 ;; t2 = 0x10000000 (used later) lui t1, 4096 ;; t1 = 0x10000000 (used later) ;; this does some data cache stuff. we don't have to worry about it. sync.l cache dxwbin v1, 0 sync.l cache dxwbin v1, 1 sync.l lw t0, *instance-shrub-work*(s7) ;; t0 = instance-shrub-work. This stores many temporary variables. ori t5, t2, 54272 ;; t5 = 0x1000D400 (DMA SPR_TO register) sw a0, 6524(t0) ;; stash dma-buf argument in instance-shrub-work.dma-buffer ori a0, t1, 53248 ;; a0 = 0x1000D000 (DMA SPR_FROM register) lw t2, *wind-work*(s7) ;; t2 = *wind-work* ;; note on crazy scratchpad stuff. ;; to get faster speed, it is useful to have both the input (instances) and output (DMA data) stored ;; in the scratchpad. However, the scratchpad is not big enough to store everything. ;; they divide the scrachpad in 4: ;; 0-5200 is one "instance" buffer ;; 5200-10400 is the other "instance" buffer ;; 10400-12448 is on "out" buffer ;; 12448-end is the other "out" buffer. ;; This code reads instance data from one instance buffer and writes DMA data to one out buffer. ;; while this is happening, the SPR_TO/SPR_FROM channels will be copying the next instances to ;; the other instance buffer, and copying the output dma back into the dma-buf. ;; Once they are done, the buffers will swapped. So there is continuous copying and processing. ;; I will use notation like spad.instance-buf and spad.out-buf to indicate the scratchpad buffers. ;; There are two instance buffers, and we don't have to really care which one they are using - ;; we can assume that they implemented double buffering properly. ori t1, t3, 10416 ;; t1 = spad.out-buf (high buffer) sw r0, 6544(t0) ;; instance-work.chains = 0 ;; Note on "stack" ;; this draw-node tree is... a tree. ;; this drawing function traverses the tree. ;; in order to traverse a tree, you need something like a stack. ;; the tree has a fixed max depth of 6 ;; The node/length fields of the instance-shrub-work are this stack. ;; t4 is the "stack pointer". It points to instance-shrub-work + 4*depth. ;; Then you can access at the normal offsets of node/length to access the correct ;; slot for your stack frame. or t4, t0, r0 ;; t4 = instance-work (todo, why?) lqc2 vf3, 6064(t0) ;; vf3 = instance-work.constants (128, 1.0, 0.0, fog0) sw t5, 6412(t0) ;; instance-work.to-spr = 0x1000D400 (just stashing this here for later) ori t6, t3, 16 ;; t6 = spad.instance-buf (low buffer) addiu t7, r0, 720 ;; t7 = 720 sw a3, 6476(t0) ;; instance-work.prototypes = the input inline array of prototypes addiu t3, r0, 0 ;; t3 = 0 sw a3, 6404(t0) ;; instance-work.bucket-ptr = the input inline array of prototypes addiu a3, r0, 0 ;; a3 = 0 sw a1, 6428(t4) ;; instance-work.node = the input draw node. (note, we're using t4 here) or t3, t1, r0 ;; t3 = spad.out-buf sw a2, 6452(t4) ;; instance-work.length = the input length (num draw nodes at this level) addiu a1, r0, -1 ;; a1 = -1 sw t7, 6516(t0) ;; instance-work.current-shrub-near-packet = 720 (?) daddiu t7, t0, 48 ;; t7 = instance-work.chaina sw t6, 6408(t0) ;; instance-work.src-ptr = spad.instance-buf daddiu a2, t0, 176 ;; a2 = instance-work.chainb sw t6, 6388(t0) ;; instance-work.instance-ptr = spad.instance-buf daddiu t6, r0, -64 ;; t6 = -64 sw t5, 6412(t0) ;; instance-work.to-spr = 0x1000D4000 (oops, did it twice) ;; note on alignment. ;; the instance-shrub-work object is only 16-byte aligned. ;; but, for some reason, they want these chaina/chainb things to be 64 byte aligned. ;; they put a 48 byte "dummy" field before them, and and with -64 to get aligned versions. ;; I'll call these aligned versions chaina-aligned/chainb-aligned and t5, t7, t6 ;; t5 = chaina-aligned sw a0, 6416(t0) ;; instance-work.from-spr = 0x1000D000 and a2, a2, t6 ;; a2 = chainb-aligned sw t5, 6392(t0) ;; instance-work.chain-ptr = chaina-aligned addiu t5, r0, -1 ;; t5 = -1 sw a2, 6396(t0) ;; instance-work.chain-ptr-next = chainb-aligned sll r0, r0, 0 ;; nop sw t4, 6400(t0) ;; instance-work.stack-ptr = t4 (right now, at base) sll r0, r0, 0 ;; nop sw t5, 6540(t0) ;; instance-work.last-shrubs = -1 sll r0, r0, 0 ;; nop sw r0, 6548(t0) ;; instance-work.flags = 0 sll r0, r0, 0 ;; nop sw r0, 6560(t0) ;; instance-work.inst-count = 0 sll r0, r0, 0 ;; nop sw r0, 6556(t0) ;; instance-work.node-count = 0 ;; Note on vcallms 17. this is a tiny program that loads vf's ;; plane is the culling planes (in normal world coordinates) ;; vf24-vf27 use the camera-rot matrix. This confusingly also includes the ;; translation, but does not include the projection matrix. ;; each vector is just the z component of that camera vector repeated 4 times ;; (it's computed in the vcallms 0 of background-upload-vu0) ;; lq.xyzw vf16, 0(vi00) | nop ;; plane0 ;; lq.xyzw vf17, 1(vi00) | nop ;; plane1 ;; lq.xyzw vf18, 2(vi00) | nop ;; plane2 ;; lq.xyzw vf19, 3(vi00) | nop ;; plane3 ;; lq.xyzw vf24, 12(vi00) | nop ;; [cam-rot0.z cam0-rot.z cam0-rot.z cam0-rot.z] ;; lq.xyzw vf25, 13(vi00) | nop ;; same but cam-rot1 ;; lq.xyzw vf26, 14(vi00) | nop :e ;; lq.xyzw vf27, 15(vi00) | nop ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; B1: L58: ;; LOOP TOP. We reach here when we want to explore a new draw node. vcallms 17 ;; set up vf registers lw t4, 6400(t0) ;; t4 = instance-work.stack-ptr addiu t5, r0, 7 ;; t5 = 7 (remaining instances in group. we find up to 7 visible instances) lw a2, 6392(t0) ;; a2 = instance-work.chain-ptr sll r0, r0, 0 ;; nops, I guess to wait for the vu program? sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 ;; starting here, we're looking for a node that we can draw. ;; this is doing "sphere in view frustum" culling through the BVH tree ;; it will exit once it's found the next visible thing to draw. ;; the details here are: ;; - normal "can we see the sphere?" check ;; - also a distance from the camera check. If we fail that, skip. ;; - this builds DMA, but not drawing DMA. It builds DMA to upload the thing to the scratchpad. ;; once we find it, go to L63. B2: L59: dsubu t7, t4, t0 ;; t7 = 0 if at root of tree, negative otherwise lw t6, 6452(t4) ;; t6 = length at current stack frame bltz t7, L63 ;; if we're not at one of the roots, draw it. we wouldn't have added it otherwise. lw t8, 6428(t4) ;; t8 = node ;; we'll only get here if we're at the root. We have no idea if the roots are visible or not beq t6, r0, L62 ;; if no nodes, skip! lqc2 vf2, 12(t8) ;; vf2 = bsphere of the node ;; note that this code assumes we're deep enough to find instance-shrubs. ;; and sets up DMA to DMA them to the scratchpad for later processing. ;; but, we might have only found draw-nodes. ;; this is okay. The DMA we set up here will only be used if we actually find instance-shrubs. ;; we also set up the stack for more draw nodes. Again, it's okay because we'll only actually increment ;; the stack pointer if we find out that there are more levels. ;; the bsphere culling code for draw nodes/instances are identical, so that part ;; can be used in either case. B4: sll r0, r0, 0 ;; nop lqc2 vf6, -4(t8) ;; vf6.w = distance of the node. (other stuff is junk I think) vmulax.xyzw acc, vf16, vf2 ;; sphere in view frustum (will eventually put result in vf4) lbu t6, 3(t8) ;; t6 = node flags vmadday.xyzw acc, vf17, vf2 ;; sphere in view frustum lw t7, 4(t8) ;; t7 = node child vmaddaz.xyzw acc, vf18, vf2 ;; sphere in view frustum lbu t8, 2(t8) ;; t8 = node child count vmsubaw.xyzw acc, vf19, vf0 ;; sphere in view frustum lq t9, 6016(t0) ;; t9 = instance-work.dma-ref vmaddw.xyzw vf4, vf1, vf2 ;; sphere in view frustum (done!, vf4 now has signed distance from planes) sw t7, 6432(t4) ;; place child on stack vmulaw.xyzw acc, vf1, vf6 ;; acc = [dist, dist, dist, dist] sw t8, 6456(t4) ;; place child's length on stack vmsubax.xyzw acc, vf24, vf2 ;; dist calc (note, just for computing z) sq t9, 0(a2) ;; store dma-ref in chain-ptr vmsubay.xyzw acc, vf25, vf2 ;; more dist calc daddiu t9, t7, -4 ;; t9 = node minus type tag vmsubaz.xyzw acc, vf26, vf2 ;; more dist calc sll t7, t8, 2 ;; t7 = num children * 4 qmfc2.i ra, vf4 ;; ra = sphere/plane signed distances addu t7, t7, t8 ;; t7 = num children * 5 vmsubaw.xyzw acc, vf27, vf0 ;; more dist calc sw t9, 4(a2) ;; store address of draw nodes in the dma tag vmaddw.xyzw vf7, vf1, vf2 ;; finish dist calc sw t8, 8(a2) ;; stash the child count after the dma tag (space unused) pcgtw t8, r0, ra ;; check signed distance to planes lw t9, 6452(t4) ;; t9 = current stack length ppach ra, r0, t8 ;; pack so signed distance compares are in lower 64 lw t8, 6428(t4) ;; t8 = node bne ra, r0, L61 ;; branch on reject sb t7, 0(a2) ;; store qwc in chain ;; if we reach here, we passed the sphere in view check B5: sll r0, r0, 0 sll r0, r0, 0 daddiu t7, t9, -1 ;; t7 = stack length - 1 qmfc2.i t9, vf7 ;; t9 = dist check result daddiu t8, t8, 32 ;; advance to next node (assuming draw nodes) sll r0, r0, 0 bltz t9, L61 ;; branch if failed dist check sll r0, r0, 0 B6: beq t6, r0, L60 ;; check if we actually reached the instances (0 = instances). sll r0, r0, 0 ;; B7: beq r0, r0, L59 ;; didn't reach instances. need to go deeper in tree! daddiu t4, t4, 4 ;; inrease stack depth. branch will find visible things. ;; if we reach here: ;; - we've reached leaves (instances) ;; - the instance is visible ;; - we have a chain set up to DMA it to the scratchpad. B8: L60: daddiu a2, a2, 16 ;; advance dma building pointer (looks like we have room for up to 8) sw t7, 6452(t4) ;; decrement stack length (we're done with this one) daddiu t5, t5, -1 ;; decrement instance count (counts down from 7, we can only do 7 in a group) sw t8, 6428(t4) ;; increment node in stack blez t5, L63 ;; goto L63 if we're full for this group dsubu t6, t4, t0 ;; check if we're at the root still B9: bgtz t7, L59 ;; not full, more at this level. sll r0, r0, 0 B10: blez t6, L63 ;; if we're at the root of the tree and the lenth is zero, we're done, draw what we have. daddiu t4, t4, -4 ;; "return" and decrement sp (go up a level, we finished exploring this one) ;; common "advance to next based on stack" ;; we might have to return multiple levels, and this loop here does this. B11: L61: sll r0, r0, 0 lw t7, 6452(t4) ;; t7 = length sll r0, r0, 0 lw t6, 6428(t4) ;; t6 = node daddiu t7, t7, -1 ;; dec dsubu t8, t4, t0 ;; depth check daddiu t6, t6, 32 ;; inc node sw t7, 6452(t4) ;; store len bgtz t7, L59 ;; keep going if not done (break out of returning loop) sw t6, 6428(t4) ;; store node B12: blez t8, L63 ;; draw if we're at the end. sll r0, r0, 0 B13: L62: beq r0, r0, L61 ;; reloop in the return loop daddiu t4, t4, -4 ;; ascend one level ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; DMA TO SPR ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; ;; if we reach here, we've got a chain set up that will send visible instances to the SPR. B14: L63: sll r0, r0, 0 ;; nop sw t4, 6400(t0) ;; store draw node stack pointer in instance-shurb-work sll r0, r0, 0 ;; nop lw t5, 6392(t0) ;; t5 = instance-work.chain-ptr (the start of the visible instance chain we just made) sll r0, r0, 0 ;; nop lw t4, 6412(t0) ;; t4 = instance-work.to-spr (EE DMA control register address) beq t5, a2, L66 ;; will be equal if we didn't have any DMA lq t5, 6032(t0) ;; dma-end (an 'end packet) ;; if we get here, we actually have data to send ;; these two blocks just wait until any in-progress to-sprs finish. ;; every iteration of the loop increments the "wait-to-spr" counter ;; (they likely tuned this code to reduce waits by moving stuff around) B15: L64: lw t6, 0(t4) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 andi t6, t6, 256 sll r0, r0, 0 beq t6, r0, L65 sll r0, r0, 0 B16: sll r0, r0, 0 lw t6, 6568(t0) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 daddiu t6, t6, 1 sll r0, r0, 0 sw t6, 6568(t0) beq r0, r0, L64 sll r0, r0, 0 ;; when we get here, there is no in-progress spr-to transfer B17: L65: sll r0, r0, 0 ;; nop lw t6, 6544(t0) ;; t6 = instance-work.chains (just a counter of how many spad uploads we do) sll r0, r0, 0 ;; nop sq t5, 0(a2) ;; store the end DMA tag (must go at the end of the DMA transfer) lw t5, 6392(t0) ;; t5 = instance-work.chain-ptr (start of the DMA chain) addiu a2, r0, 324 ;; a2 = 324 (constant to start DMA) lw t7, 6396(t0) ;; t7 = instance-work.chain-ptr-next (to-spr chain dma mem is double buffered) ori t8, r0, 65535 ;; t8 = 65535 sw t5, 6396(t0) ;; instance-work.chain-ptr-next = chain-ptr (swap!) daddiu t6, t6, 1 ;; increment chain count sw t7, 6392(t0) ;; instance-work.chain-ptr = chain-ptr-next (swap!) or t7, t5, r0 ;; t7 = chain for next time sll r0, r0, 0 ;; nop sw t6, 6544(t0) ;; write back incremented chain count sll r0, r0, 0 ;; nop lw t6, 6388(t0) ;; t6 = instance-work.instance-ptr (the scratchpad destination for the instance) sync.l cache dxwbin t7, 0 ;; write back the data (required before DMAing, EE DMA bypasses CPU caches) sync.l cache dxwbin t7, 1 sync.l daddiu t7, t7, 64 sync.l cache dxwbin t7, 0 sync.l cache dxwbin t7, 1 sync.l sw t6, 128(t4) ;; set up destination addr in DMA register sw t5, 48(t4) ;; set up source addr xori t5, t6, 5232 ;; toggle destination pointer (scratchpad destinations are double buffered) sw r0, 32(t4) ;; set qwc = 0 (I think it's ignored in chain mode) sync.l sw a2, 0(t4) ;; start transfer! sync.l sll r0, r0, 0 sw t5, 6408(t0) ;; store instance-work.src-ptr beq r0, r0, L68 ;; always go to L68! sw t5, 6388(t0) ;; store instance-work.instance-ptr (starting a new block, so equal to src-ptr) ;; if we reach here, it's because we didn't have any more visible instances. ;; we have two cases: ;; 1). we have stuff in scratchpad (the other buffer) waiting to be drawn. ;; 2). nothing was visible, so we have nothing in scratchpad. ;; we can tell these two cases from the sign of the a1 flag. B18: L66: bltz a1, L98 ;; goto end (L98) if the flag is negative lw a2, 6388(t0) ;; a2 = instance-work.instance-ptr. B19: sll r0, r0, 0 sw r0, 6540(t0) ;; instance-work.last-shrubs = 0 sll r0, r0, 0 xori a2, a2, 5232 ;; flip spad buffer (the last group isn't double buffered) sll r0, r0, 0 sw a2, 6408(t0) ;; store src-ptr sll r0, r0, 0 sw a2, 6388(t0) ;; store instance-ptr ;; dma sync - make sure the last to-spr is done. B20: L67: lw a2, 0(t4) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 andi a2, a2, 256 sll r0, r0, 0 beq a2, r0, L68 sll r0, r0, 0 B21: sll r0, r0, 0 lw a2, 6568(t0) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 daddiu a2, a2, 1 sll r0, r0, 0 sw a2, 6568(t0) beq r0, r0, L67 sll r0, r0, 0 ;; the details of the from-spr is unknown, but it seems like setting a1 flag > 0 is used to indicate ;; that we have some pending stuff in spad that we have to copy back. B22: L68: bgez a1, L93 ;; if we have stuff, go to some later spad dma code lw a2, 6408(t0) ;; a2 = instance-work.src-ptr B23: beq r0, r0, L58 ;; nope, we're done, go to loop top addiu a1, r0, 10000 ;; but, remember we just did a dma sync for to. So we do have more work to do. ;; ideally we'll find more visible stuff and add to what we have now. ;; but if we don't, we set this flag to >0 to indicate that we have ;; stuff that we still need to process. ;; we reach here once we have visible instances in the scratchpad. ;; but, before we can process them, we have to make sure the output buffer ;; in the scratchpad has enough room. ;; If not, we do a DMA transfer back to RAM (to the dma-buf passed in) ;; this is copying completed VU1 DMA data. B24: L69: daddiu t4, a3, -106 ;; 106 instances max in out buf, I guess lqc2 vf2, 16(a2) ;; vf2 = bsphere of the first instance (they start prepping for the instance loop here...) blez t4, L72 ;; goto L72 if we have enough room in spr lbu t4, 6(a2) ;; t4 = instance.bucket-index (loaded as a u8, maybe only up to 255 buckets/tree?) ;; next three blocks wait for from-spr to finish. Need to do this before ;; starting the next from-spr transfer B25: sll r0, r0, 0 lw a0, 6416(t0) sll r0, r0, 0 sll r0, r0, 0 B26: L70: lw t3, 0(a0) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 andi t3, t3, 256 sll r0, r0, 0 beq t3, r0, L71 sll r0, r0, 0 B27: sll r0, r0, 0 lw t3, 6564(t0) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 daddiu t3, t3, 1 sll r0, r0, 0 sw t3, 6564(t0) beq r0, r0, L70 sll r0, r0, 0 ;; start from-spr and swap output data buffers B28: L71: sw t1, 128(a0) xori t1, t1, 6144 ;; swap buffer sw v1, 16(a0) sll t3, a3, 4 ;; compute size (16 qw's per instance?) addu v1, v1, t3 ;; v1 is the next dma-buf output address (maybe needed for refs in upcoming DMA build) or t3, t1, r0 sw a3, 32(a0) addiu a3, r0, 256 sw a3, 0(a0) ;; start! addiu a3, r0, 0 ;; reset count ;; if we reach here, we're finally ready to process the instance. ;; one cool trick they do here is to build B29: L72: vcallms 33 ;; see backround-vu0-result.txt. This program does the sphere in view and distance checks. ;; the result is stored in vf04/vf06 and vi02 lw t5, 6548(t0) ;; t5 = instance-work.flags (was initialized to 0) beq a1, t4, L74 ;; if we're using the same prototype as last time, skip ahead a bit. daddiu t6, a1, -10000 B30: beq t6, r0, L73 lw a1, 6404(t0) B31: ;; I think this only runs on the very first run. sll r0, r0, 0 ;; it copies the last/next/counts of instance-work to the first thing in the proto bucket array lq t5, 6336(t0) sll r0, r0, 0 lq t6, 6352(t0) sll r0, r0, 0 lq t7, 6368(t0) sll r0, r0, 0 sq t5, 92(a1) sll r0, r0, 0 sq t6, 60(a1) sll r0, r0, 0 sq t7, 76(a1) B32: L73: or a1, t4, r0 ;; a1 = current prototype idx (remember it for next time) lw t5, 6476(t0) ;; t5 = prototypes array addiu t6, r0, 112 ;; t6 = 112 sq r0, 6336(t0) ;; work.lasts = 0 multu3 t4, t4, t6 ;; multiply for array access sq r0, 6352(t0) ;; work.nexts = 0 daddu t4, t5, t4 ;; t4 = ptr to bucket sq r0, 6368(t0) ;; work.counts = 0 sll r0, r0, 0 ;; nop sw t4, 6404(t0) ;; store bucket in work.bucket-ptr sll r0, r0, 0 ;; nop lw t5, 4(t4) ;; t5 = bucket flags sll r0, r0, 0 ;; nop lqc2 vf15, 44(t4) ;; vf15 = lengths andi t5, t5, 1 ;; t5 = flag & 1 lqc2 vf14, 28(t4) ;; vf14 = near/mid/far plane vmul.xyz vf15, vf15, vf3 ;; vf15 = lengths * some constants? sw t5, 6548(t0) ;; store flags in instance-work.flags ;; from here on, it looks like we jump to L92 if we reject the instance ;; NOTE: starting here is the matrix stuff. ;; we'll need to understand this to "de-instance" the non-wind instances ;; and to implement wind in C++ B33: L74: bne t5, r0, L92 ;; check flags & 1. This flag is only set from the debug menu (see dm-enable-instance-func) ;; and it's just used to disable a specific prototype for debugging. ld t5, 56(a2) ;; loading the origin matrix (4x 16-bit integers/row) (this the last row) B34: sll r0, r0, 0 ld t4, 32(a2) ;; t4 = row 0 pextlh t5, t5, r0 ;; unpack row 3 to u32's (effectively shifts left 16) ld t6, 40(a2) ;; t6 = row 1 psraw t7, t5, 10 ;; t7 = shift row 3 right by 10 (two shifts equivalent to shift left by 6 and sign extend) ld t5, 48(a2) ;; t5 = row 2 pextlh t8, t4, r0 ;; t8 = row 0 to u32's lhu t4, 8(a2) ;; t4 = instance.color-indices (I think an offset in the tree's palette, different from TIE) psraw t8, t8, 16 ;; t8 = shift row 0 right by 16 (two shifts equivalent to just sign extending) lq t9, 64(a2) ;; t9 = instance.flat-normal pextlh t6, t6, r0 ;; t6 = row 1 unpacked qmtc2.ni vf13, t7 ;; vf13 = row 3 psraw t6, t6, 16 ;; t6 = row 1 shifted qmtc2.ni vf18, t9 ;; vf18 = instance.flat-normal pextlh t5, t5, r0 ;; t5 = row 2 unpacked qmtc2.ni vf10, t8 ;; vf10 = row 0 psraw t5, t5, 16 ;; t5 = row 2 shifted qmtc2.ni vf11, t6 ;; vf11 = row 1 daddu t4, t4, t0 ;; t4 = color data - 304 qmtc2.ni vf12, t5 ;; vf12 = row 2 sll r0, r0, 0 cfc2.i t5, vi1 ;; t5 = vis result. vitof0.xyzw vf13, vf13 ;; vf13 = row 3, as floats lw t6, 304(t4) ;; t6 = rgba for this instance (8888 format) bne t5, r0, L92 ;; possibly reject this instance. lq t4, 6080(t0) ;; t4 = color constants (some hacky int to float stuff here) B35: pextlb t5, r0, t6 ;; t5 = unpacked rgba to u16's lqc2 vf4, 6096(t0) ;; vf4 = hmge-d pextlh t5, r0, t5 ;; t5 = unpacked rgba to u32's lqc2 vf25, 6176(t0) ;; vf25 = min-dist (interesting...) vsub.xyzw vf9, vf6, vf14 ;; vf6 is the "dist" of the draw node? sll r0, r0, 0 psllw t6, t5, 8 ;; t6 = multiply colors by 256 mfc1 r0, f31 paddw t4, t6, t4 ;; t4 = colors + color constants mfc1 r0, f31 vmula.xyzw acc, vf1, vf3 ;; sll r0, r0, 0 vmsub.xyzw vf9, vf9, vf15 sq t5, 6160(t0) ;; stash bb color vadd.xyz vf13, vf13, vf2 ;; same bsphere origin trick as tie sq t4, 6144(t0) ;; store floating point color vsubw.xyzw vf8, vf6, vf2 ;; distance compensate for bsphere radius sll r0, r0, 0 vitof12.xyzw vf10, vf10 ;; row 0 as floats sll r0, r0, 0 vmini.xyzw vf9, vf9, vf3 ;; dist crap lw t4, 6404(t0) ;; t4 = bucket-ptr vadd.xyz vf18, vf18, vf13 ;; flat-normal + real-origin sll r0, r0, 0 vmulax.xyzw acc, vf28, vf13 ;; lw t4, 24(t4) ;; geom3 vmadday.xyzw acc, vf29, vf13 sll r0, r0, 0 vmaxx.xyzw vf9, vf9, vf0 sll r0, r0, 0 vmaddaz.xyzw acc, vf30, vf13 sll r0, r0, 0 vmaddw.xyzw vf5, vf31, vf0 ;; vf.w is inverse distance from camera, I think sll r0, r0, 0 vitof12.xyzw vf11, vf11 ;; vf11 = row 1 floats sll r0, r0, 0 vftoi0.xyzw vf19, vf9 ;; distance stuff sll r0, r0, 0 vmini.xyzw vf25, vf8, vf25 ;; apply min dist sll r0, r0, 0 vsubz.xyzw vf4, vf8, vf4 ;; apply hmge addiu t5, r0, 128 ;; ?? t5 = 128 vitof12.xyzw vf12, vf12 ;; vf12 = row 2 float addiu t6, r0, 255 ;; ?? t6 = 255 vmulw.y vf9, vf9, vf15 ;; multiply by lengths sll r0, r0, 0 sll r0, r0, 0 qmfc2.i t7, vf19 ;; integer dist compare vdiv Q, vf3.w, vf5.w ;; compute Q here, I guess sll r0, r0, 0 and t6, t7, t6 sll r0, r0, 0 dsubu t7, t5, t6 sw t6, 6156(t0) ;; adjusted color for fade out. beq t5, t6, L80 ;; branch if don't try billboard, I think? sqc2 vf25, 6176(t0) B36: beq t4, r0, L75 ;; don't do billboard if we don't have it sw t7, 6172(t0) B37: ;;;;;;;;;;;;;;; ;; BILLBOARD ;;;;;;;;;;;;;;; vmulax.xyzw acc, vf28, vf18 lq t4, 5104(t0) vmadday.xyzw acc, vf29, vf18 lq t5, 5120(t0) vmaddaz.xyzw acc, vf30, vf18 lw t6, 6348(t0) vmaddw.xyzw vf18, vf31, vf0 lw t7, 6364(t0) sll t8, a3, 4 lqc2 vf8, 6112(t0) addu t8, t8, v1 lqc2 vf7, 64(a2) vmulaq.xyz acc, vf5, Q lq a2, 6160(t0) vmulaw.w acc, vf5, vf0 movz t6, t8, t6 vmadd.xyzw vf5, vf1, vf8 lhu t9, 6374(t0) vmulq.w vf19, vf7, Q sll r0, r0, 0 daddiu t9, t9, 1 lqc2 vf6, 5136(t0) vmulq.xyzw vf26, vf1, Q sw t6, 6348(t0) vmulq.xyzw vf27, vf1, Q sw t8, 6364(t0) vnop sll r0, r0, 0 vmaxz.w vf5, vf5, vf6 sh t9, 6374(t0) vdiv Q, vf3.w, vf18.w sll r0, r0, 0 vmulax.xyzw acc, vf20, vf10 sq t4, 0(t3) vaddx.x vf26, vf0, vf0 sq t5, 16(t3) vminiw.w vf5, vf5, vf6 sq a2, 48(t3) vmadday.xyzw acc, vf21, vf10 sq a2, 96(t3) vmaddz.xyzw vf10, vf22, vf10 sq a2, 144(t3) vmulaw.w acc, vf18, vf0 sq a2, 192(t3) vmulaq.xyz acc, vf18, Q sw t7, 4(t3) vmadd.xyzw vf18, vf1, vf8 sll r0, r0, 0 vmulq.w vf8, vf7, Q sll r0, r0, 0 vmulq.xyzw vf24, vf1, Q sll r0, r0, 0 vmulq.xyzw vf25, vf1, Q sll r0, r0, 0 vmaxz.w vf18, vf18, vf6 sll r0, r0, 0 vadd.xy vf24, vf0, vf0 sll r0, r0, 0 vaddy.y vf25, vf0, vf0 sll r0, r0, 0 vmulax.xyzw acc, vf20, vf11 sll r0, r0, 0 vminiw.w vf18, vf18, vf6 sll r0, r0, 0 vmadday.xyzw acc, vf21, vf11 sll r0, r0, 0 vmaddz.xyzw vf11, vf22, vf11 sll r0, r0, 0 vmulax.xyzw acc, vf20, vf12 sll r0, r0, 0 vsub.xyzw vf16, vf18, vf5 sll r0, r0, 0 vmadday.xyzw acc, vf21, vf12 sll r0, r0, 0 vmaddz.xyzw vf12, vf22, vf12 sll r0, r0, 0 vmulax.xyzw acc, vf20, vf13 sll r0, r0, 0 vaddy.y vf16, vf16, vf16 sll r0, r0, 0 vmadday.xyzw acc, vf21, vf13 sll r0, r0, 0 vmaddaz.xyzw acc, vf22, vf13 sll r0, r0, 0 vmaddw.xyzw vf13, vf23, vf0 sll r0, r0, 0 vmul.xy vf17, vf16, vf16 sll r0, r0, 0 sll r0, r0, 0 sqc2 vf24, 32(t3) sll r0, r0, 0 sqc2 vf25, 80(t3) sll r0, r0, 0 sqc2 vf26, 128(t3) vaddy.x vf17, vf17, vf17 sll r0, r0, 0 sll r0, r0, 0 sqc2 vf27, 176(t3) vmulw.xyzw vf2, vf18, vf0 sll r0, r0, 0 vmulw.xyzw vf4, vf18, vf0 sll r0, r0, 0 vrsqrt Q, vf0.w, vf17.x sll r0, r0, 0 sll r0, r0, 0 vwaitq vmulq.xy vf17, vf16, Q sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 vsuby.x vf16, vf0, vf17 sll r0, r0, 0 vaddx.y vf16, vf0, vf17 sll r0, r0, 0 sll r0, r0, 0 sqc2 vf10, 240(t3) sll r0, r0, 0 sqc2 vf11, 256(t3) vmulw.xy vf8, vf16, vf8 sll r0, r0, 0 vmulw.xy vf19, vf16, vf19 sll r0, r0, 0 sll r0, r0, 0 lq a2, 6144(t0) sll r0, r0, 0 sll r0, r0, 0 vmul.xy vf8, vf8, vf6 sll r0, r0, 0 vmul.xy vf19, vf19, vf6 sll r0, r0, 0 vmulw.xyzw vf6, vf5, vf0 sll r0, r0, 0 vmulw.xyzw vf7, vf5, vf0 sq a2, 304(t3) vadd.xy vf2, vf18, vf8 sll r0, r0, 0 vsub.xy vf4, vf18, vf8 sll r0, r0, 0 vadd.xy vf6, vf5, vf19 sll r0, r0, 0 vsub.xy vf7, vf5, vf19 sll r0, r0, 0 vftoi4.xyzw vf2, vf2 sll r0, r0, 0 vftoi4.xyzw vf4, vf4 daddiu t3, t3, 224 vftoi4.xyzw vf6, vf6 daddiu a3, a3, 14 vftoi4.xyzw vf7, vf7 lw a2, 6156(t0) sll r0, r0, 0 sqc2 vf2, -160(t3) sll r0, r0, 0 sqc2 vf4, -112(t3) sll r0, r0, 0 sqc2 vf6, -64(t3) beq a2, r0, L92 sqc2 vf7, -16(t3) B38: beq r0, r0, L76 sll r0, r0, 0 B39: L75: beq t6, r0, L92 vmulax.xyzw acc, vf20, vf10 B40: vmadday.xyzw acc, vf21, vf10 lq a2, 6144(t0) vmaddz.xyzw vf10, vf22, vf10 sll r0, r0, 0 vmulax.xyzw acc, vf20, vf11 sll r0, r0, 0 vmadday.xyzw acc, vf21, vf11 sll r0, r0, 0 vmaddz.xyzw vf11, vf22, vf11 sll r0, r0, 0 vmulax.xyzw acc, vf20, vf12 sll r0, r0, 0 vmadday.xyzw acc, vf21, vf12 sll r0, r0, 0 vmaddz.xyzw vf12, vf22, vf12 sll r0, r0, 0 vmulax.xyzw acc, vf20, vf13 sll r0, r0, 0 vmadday.xyzw acc, vf21, vf13 sll r0, r0, 0 vmaddaz.xyzw acc, vf22, vf13 sll r0, r0, 0 vmaddw.xyzw vf13, vf23, vf0 sq a2, 80(t3) sll r0, r0, 0 sqc2 vf10, 16(t3) sll r0, r0, 0 sqc2 vf11, 32(t3) B41: L76: sll a2, a3, 4 lhu t4, 6380(t0) addu t5, a2, v1 lhu t7, 6372(t0) sll t6, t4, 4 lw a2, 6360(t0) daddu t8, t6, t0 lw t6, 6344(t0) daddiu t7, t7, 1 lq t8, 4400(t8) daddiu a3, a3, 6 sh t7, 6372(t0) daddiu t7, t4, 1 sq t8, 0(t3) daddiu t8, t7, -20 sqc2 vf12, 48(t3) movz t7, r0, t8 sqc2 vf13, 64(t3) daddiu t8, t4, -10 sh t7, 6380(t0) daddiu t3, t3, 96 sw a2, -92(t3) beq t4, r0, L77 sw t5, 6360(t0) B42: bne t8, r0, L78 sll r0, r0, 0 B43: L77: sll r0, r0, 0 lq t4, 5040(t0) sll r0, r0, 0 lq t7, 5056(t0) sll r0, r0, 0 sw t5, 6344(t0) sll r0, r0, 0 movz t4, t7, t6 daddiu a3, a3, 1 sq t4, 0(t3) sll r0, r0, 0 sw a2, 4(t3) beq r0, r0, L92 daddiu t3, t3, 16 B44: L78: daddiu t5, t4, -9 sll r0, r0, 0 beq t5, r0, L79 daddiu t4, t4, -19 B45: bne t4, r0, L92 sll r0, r0, 0 B46: L79: sll r0, r0, 0 sll t4, t7, 4 sll r0, r0, 0 daddu t4, t4, t0 daddiu a3, a3, 1 lq t4, 4720(t4) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sq t4, 0(t3) sll r0, r0, 0 sw a2, 4(t3) beq r0, r0, L92 daddiu t3, t3, 16 ;; I think the end of billboard. B47: L80: sll r0, r0, 0 lw t4, 1324(t2) ;; t4 = wind time (from global wind work) sll r0, r0, 0 lhu t5, 62(a2) ;; t5 = wind-index of the instance sll r0, r0, 0 lw a2, 6384(t0) ;; a2 = wind-vectors dsll t6, t5, 4 ;; t6 = t5 * 16 lqc2 vf19, 6048(t0) ;; vf19 = wind-const daddu a2, a2, t6 ;; a2 = wind-vector + (wind-index * 16) daddu t4, t5, t4 ;; t4 = wind-time + wind-index andi t5, t4, 63 ;; t5 = (wind-time + wind-index) & 63 ld t4, 8(a2) ;; t4 = winds sll t6, t5, 4 ;; t6 = ((wind-time + wind-index) & 63) * 16 ld t5, 0(a2) ;; t5 = winds addu t7, t6, t2 qmfc2.i t6, vf4 pextlw t4, r0, t4 lqc2 vf16, 12(t7) pextlw t5, r0, t5 qmtc2.i vf18, t4 sll r0, r0, 0 qmtc2.i vf17, t5 vmula.xyzw acc, vf16, vf1 sll r0, r0, 0 vmsubax.xyzw acc, vf18, vf19 sll r0, r0, 0 vmsuby.xyzw vf16, vf17, vf19 sll r0, r0, 0 pcgtw t5, r0, t6 mfc1 r0, f31 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 lqc2 vf24, 6208(t0) vmulaz.xyzw acc, vf16, vf19 sll r0, r0, 0 vmadd.xyzw vf18, vf1, vf18 sll r0, r0, 0 sll r0, r0, 0 lqc2 vf25, 6224(t0) sll r0, r0, 0 lqc2 vf26, 6240(t0) sll r0, r0, 0 lqc2 vf27, 6256(t0) vmulaz.xyzw acc, vf18, vf19 sll r0, r0, 0 vmadd.xyzw vf17, vf17, vf1 sll r0, r0, 0 vmulax.xyzw acc, vf24, vf2 sll r0, r0, 0 vmadday.xyzw acc, vf25, vf2 sll r0, r0, 0 vmaddaz.xyzw acc, vf26, vf2 sll r0, r0, 0 vminiw.xyzw vf17, vf17, vf0 sll r0, r0, 0 vmsubaw.xyzw acc, vf27, vf0 sll r0, r0, 0 vmsubw.xyzw vf24, vf1, vf2 sll r0, r0, 0 sll r0, r0, 0 qmfc2.i t4, vf18 vmaxw.xyzw vf27, vf17, vf19 sll r0, r0, 0 ppacw t4, r0, t4 mfc1 r0, f31 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 qmfc2.i t6, vf24 vmuly.xyzw vf27, vf27, vf9 sll r0, r0, 0 pcgtw t6, r0, t6 mfc1 r0, f31 ppach t6, r0, t6 mfc1 r0, f31 vmulax.yw acc, vf0, vf0 sll r0, r0, 0 vmulay.xz acc, vf27, vf10 sll r0, r0, 0 vmadd.xyzw vf10, vf1, vf10 sll r0, r0, 0 or t5, t6, t5 qmfc2.i t6, vf27 vmulax.yw acc, vf0, vf0 lw t7, 6552(t0) vmulay.xz acc, vf27, vf11 sll r0, r0, 0 vmadd.xyzw vf11, vf1, vf11 sll r0, r0, 0 bne t7, s7, L81 ppacw t6, r0, t6 B48: vmulax.yw acc, vf0, vf0 sd t4, 8(a2) vmulay.xz acc, vf27, vf12 sd t6, 0(a2) bne t5, r0, L86 vmadd.xyzw vf12, vf1, vf12 B49: beq r0, r0, L82 sll r0, r0, 0 B50: L81: vmulax.yw acc, vf0, vf0 sll r0, r0, 0 vmulay.xz acc, vf27, vf12 sll r0, r0, 0 bne t5, r0, L86 vmadd.xyzw vf12, vf1, vf12 B51: L82: vmulax.xyzw acc, vf20, vf10 lq a2, 6144(t0) vmadday.xyzw acc, vf21, vf10 sll r0, r0, 0 vmaddz.xyzw vf10, vf22, vf10 sll r0, r0, 0 vmulax.xyzw acc, vf20, vf11 sll r0, r0, 0 vmadday.xyzw acc, vf21, vf11 sll r0, r0, 0 vmaddz.xyzw vf11, vf22, vf11 sll r0, r0, 0 vmulax.xyzw acc, vf20, vf12 sll r0, r0, 0 vmadday.xyzw acc, vf21, vf12 sll r0, r0, 0 vmaddz.xyzw vf12, vf22, vf12 sll r0, r0, 0 vmulax.xyzw acc, vf20, vf13 sll r0, r0, 0 vmadday.xyzw acc, vf21, vf13 sll r0, r0, 0 vmaddaz.xyzw acc, vf22, vf13 sll r0, r0, 0 vmaddw.xyzw vf13, vf23, vf0 sq a2, 80(t3) sll r0, r0, 0 sqc2 vf10, 16(t3) sll r0, r0, 0 sqc2 vf11, 32(t3) sll a2, a3, 4 lhu t4, 6378(t0) addu t5, a2, v1 lhu t7, 6370(t0) sll t6, t4, 4 lw a2, 6356(t0) daddu t8, t6, t0 lw t6, 6340(t0) daddiu t7, t7, 1 lq t8, 4400(t8) daddiu a3, a3, 6 sh t7, 6370(t0) daddiu t7, t4, 1 sq t8, 0(t3) daddiu t8, t7, -20 sqc2 vf12, 48(t3) movz t7, r0, t8 sqc2 vf13, 64(t3) daddiu t8, t4, -10 sh t7, 6378(t0) daddiu t3, t3, 96 sw a2, -92(t3) beq t4, r0, L83 sw t5, 6356(t0) B52: bne t8, r0, L84 sll r0, r0, 0 B53: L83: sll r0, r0, 0 lq t4, 5040(t0) sll r0, r0, 0 lq t7, 5056(t0) sll r0, r0, 0 sw t5, 6340(t0) sll r0, r0, 0 movz t4, t7, t6 daddiu a3, a3, 1 sq t4, 0(t3) sll r0, r0, 0 sw a2, 4(t3) beq r0, r0, L92 daddiu t3, t3, 16 B54: L84: daddiu t5, t4, -9 sll r0, r0, 0 beq t5, r0, L85 daddiu t4, t4, -19 B55: bne t4, r0, L92 sll r0, r0, 0 B56: L85: sll r0, r0, 0 sll t4, t7, 4 sll r0, r0, 0 daddu t4, t4, t0 daddiu a3, a3, 1 lq t4, 4720(t4) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sq t4, 0(t3) sll r0, r0, 0 sw a2, 4(t3) beq r0, r0, L92 daddiu t3, t3, 16 B57: L86: vmulax.xyzw acc, vf28, vf10 lqc2 vf24, 6160(t0) vmadday.xyzw acc, vf29, vf10 sll r0, r0, 0 vmaddz.xyzw vf10, vf30, vf10 sll r0, r0, 0 vmulax.xyzw acc, vf28, vf11 sll r0, r0, 0 vmadday.xyzw acc, vf29, vf11 lhu t4, 6536(t0) vmaddz.xyzw vf11, vf30, vf11 lw a2, 6404(t0) vmulax.xyzw acc, vf28, vf12 daddiu t8, t4, 1 vmadday.xyzw acc, vf29, vf12 sh t8, 6536(t0) vmaddz.xyzw vf12, vf30, vf12 lw t4, 12(a2) ;; load the generic geometry? vmulax.xyzw acc, vf28, vf13 lw t5, 6532(t0) vmadday.xyzw acc, vf29, vf13 lh t6, 2(t4) ;; generic frag count. vmaddaz.xyzw acc, vf30, vf13 lw a2, 6528(t0) vmaddw.xyzw vf13, vf31, vf0 lw t7, 6516(t0) vitof0.xyz vf24, vf24 sh t8, 6368(t0) B58: ;; generic loop L87: daddiu t8, a3, -115 sll r0, r0, 0 blez t8, L90 lw t8, 28(t4) ;; load the frag B59: ;; dma L88: lw t3, 0(a0) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 andi t3, t3, 256 sll r0, r0, 0 beq t3, r0, L89 sll r0, r0, 0 B60: sll r0, r0, 0 lw t3, 6564(t0) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 daddiu t3, t3, 1 sll r0, r0, 0 sw t3, 6564(t0) beq r0, r0, L88 sll r0, r0, 0 B61: L89: sw t1, 128(a0) xori t1, t1, 6144 sw v1, 16(a0) sll t3, a3, 4 addu v1, v1, t3 or t3, t1, r0 sw a3, 32(a0) addiu a3, r0, 256 sw a3, 0(a0) addiu a3, r0, 0 B62: L90: daddu t9, t7, t0 addiu t7, t7, -144 daddiu t4, t4, 4 daddiu t9, t9, 5152 bgez t7, L91 lq ra, 0(t9) B63: sll r0, r0, 0 addiu t7, r0, 720 B64: L91: sll r0, r0, 0 sw t5, 84(t9) sll t5, a3, 4 sq ra, 0(t3) addu t5, t5, v1 sqc2 vf10, 16(t3) movz a2, t5, a2 sqc2 vf11, 32(t3) daddiu a3, a3, 12 sqc2 vf12, 48(t3) sll r0, r0, 0 lw ra, 4(t8) ;; ra = vtx-cnt sll r0, r0, 0 sqc2 vf13, 64(t3) sll r0, r0, 0 sqc2 vf24, 80(t3) sll r0, r0, 0 sw ra, 96(t3) sll r0, r0, 0 lw ra, 12(t8) ;; ra = cnt sll r0, r0, 0 lbu gp, 8(t8) ;; gp = cnt-qwc sll r0, r0, 0 sw ra, 20(t9) sll r0, r0, 0 sb gp, 16(t9) sll r0, r0, 0 sb gp, 30(t9) sll r0, r0, 0 lw ra, 24(t8) ;; ra = stq sll r0, r0, 0 lbu gp, 11(t8) ;; gp = stq-qwc sll r0, r0, 0 sw ra, 36(t9) sll r0, r0, 0 sb gp, 32(t9) sll r0, r0, 0 lw ra, 20(t8) ;; ra = col sll r0, r0, 0 lbu gp, 10(t8) ;; gp = col-qwc sll r0, r0, 0 sw ra, 52(t9) sll r0, r0, 0 sb gp, 48(t9) sll r0, r0, 0 lw ra, 16(t8) ;; ra = vtx sll r0, r0, 0 lbu gp, 9(t8) ;; gp = vtx-qwc sll r0, r0, 0 sw ra, 68(t9) sll r0, r0, 0 sb gp, 64(t9) sll r0, r0, 0 lw t8, 4(t8) sll r0, r0, 0 lq ra, 16(t9) sll r0, r0, 0 sb t8, 46(t9) sll r0, r0, 0 sb t8, 62(t9) sll r0, r0, 0 sb t8, 78(t9) sll r0, r0, 0 sq ra, 112(t3) sll r0, r0, 0 lq t8, 32(t9) sll r0, r0, 0 lq ra, 48(t9) sll r0, r0, 0 sq t8, 128(t3) sll r0, r0, 0 sq ra, 144(t3) sll r0, r0, 0 lq t8, 64(t9) sll r0, r0, 0 lq t9, 80(t9) sll r0, r0, 0 sq t8, 160(t3) daddiu t3, t3, 192 sq t9, -16(t3) daddiu t6, t6, -1 sll r0, r0, 0 bgtz t6, L87 sll r0, r0, 0 B65: sll r0, r0, 0 sw t7, 6516(t0) lui t4, 4096 sw t5, 6532(t0) ori t4, t4, 54272 sw a2, 6528(t0) sll r0, r0, 0 sll r0, r0, 0 B66: L92: vcallms 25 lw a2, 6408(t0) sll r0, r0, 0 lw t4, 6420(t0) daddiu a2, a2, 80 sll r0, r0, 0 daddiu t4, t4, -1 sw a2, 6408(t0) bgtz t4, L69 sw t4, 6420(t0) B67: L93: sll r0, r0, 0 lw t4, 8(a2) daddiu a2, a2, 16 lw t5, 6540(t0) sll r0, r0, 0 sw a2, 6408(t0) bne t4, r0, L69 sw t4, 6420(t0) B68: bne t5, r0, L58 sll r0, r0, 0 B69: sll r0, r0, 0 lw a1, 6404(t0) sll r0, r0, 0 lq a2, 6336(t0) sll r0, r0, 0 lq t2, 6352(t0) sll r0, r0, 0 lq t3, 6368(t0) sll r0, r0, 0 sq a2, 92(a1) sll r0, r0, 0 sq t2, 60(a1) sll r0, r0, 0 sq t3, 76(a1) beq a3, r0, L96 sll r0, r0, 0 B70: sll r0, r0, 0 lw a0, 6416(t0) sll r0, r0, 0 sll r0, r0, 0 B71: L94: lw a1, 0(a0) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 andi a1, a1, 256 sll r0, r0, 0 beq a1, r0, L95 sll r0, r0, 0 B72: sll r0, r0, 0 lw a1, 6564(t0) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 daddiu a1, a1, 1 sll r0, r0, 0 sw a1, 6564(t0) beq r0, r0, L94 sll r0, r0, 0 B73: L95: sw v1, 16(a0) sll a1, a3, 4 sw t1, 128(a0) xori a2, t1, 6144 addu v1, v1, a1 or a1, a2, r0 sw a3, 32(a0) addiu a1, r0, 256 sw a1, 0(a0) addiu a1, r0, 0 B74: L96: lw a1, 0(a0) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 andi a1, a1, 256 sll r0, r0, 0 beq a1, r0, L97 sll r0, r0, 0 B75: sll r0, r0, 0 lw a1, 6564(t0) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 daddiu a1, a1, 1 sll r0, r0, 0 sw a1, 6564(t0) beq r0, r0, L96 sll r0, r0, 0 B76: L97: lw a0, 6524(t0) sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 sw v1, 4(a0) sll r0, r0, 0 B77: L98: or v0, r0, r0 ld ra, 0(sp) lq gp, 16(sp) jr ra daddiu sp, sp, 32 sll r0, r0, 0 sll r0, r0, 0 sll r0, r0, 0 ```