mirror of
https://github.com/open-goal/jak-project
synced 2026-05-27 08:09:29 -04:00

# Emerc

## Outline

Emerc is one of two renderers used for foreground models with environment mapping. The other is the generic + merc ("mercneric") renderer.

As far as I know, the supported effects are:

- skinning, with up to 3 bones influencing each vertex, and per-vertex specification of bone weights
- up to 3 directional lights, plus an ambient light
- vertex colors
- texturing
- texture-based environment mapping (done per vertex, not per fragment)
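
For reference, here is a generic sketch of the first two effects: linear blend skinning with up to 3 weighted bones per vertex, and per-vertex diffuse lighting from 3 directional lights plus ambient. The types, names, and the exact lighting formula (clamped N·L) are assumptions for illustration only. The real data is packed into u8/fixed-point formats and processed on VU1, and the camera-frame transform is folded into the skinning matrices.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical types for illustration; not the game's data structures.
struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Affine bone transform: 3x3 rotation/scale plus translation.
struct Bone {
  float m[3][3];
  Vec3 t;
};

Vec3 apply(const Bone& b, Vec3 p) {
  return {b.m[0][0] * p.x + b.m[0][1] * p.y + b.m[0][2] * p.z + b.t.x,
          b.m[1][0] * p.x + b.m[1][1] * p.y + b.m[1][2] * p.z + b.t.y,
          b.m[2][0] * p.x + b.m[2][1] * p.y + b.m[2][2] * p.z + b.t.z};
}

// Linear blend skinning with up to 3 influences per vertex.
// Weights are assumed to sum to 1; unused slots have weight 0.
Vec3 skin(const Bone* bones[3], const float weights[3], Vec3 p) {
  Vec3 out{0, 0, 0};
  for (int i = 0; i < 3; i++) {
    if (weights[i] != 0.f) out = out + weights[i] * apply(*bones[i], p);
  }
  return out;
}

struct Lights {
  Vec3 dir[3];    // unit vectors pointing toward each light
  Vec3 color[3];  // color of each directional light
  Vec3 ambient;
};

// Per-vertex diffuse lighting: ambient plus the sum of clamped N.L terms.
Vec3 light(const Lights& l, Vec3 n) {
  Vec3 c = l.ambient;
  for (int i = 0; i < 3; i++) {
    c = c + std::max(0.f, dot(n, l.dir[i])) * l.color[i];
  }
  return c;
}
```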

Our hope is to port the emerc renderer to PC, then use it for all rendering of envmapped foreground objects. I believe that `emerc` will be easier to understand than `mercneric`. The hope is that either `emerc` can be used for all models, or, once we understand `emerc`, it will be straightforward to convert `mercneric`-only models to work with PC `emerc`.

The mercneric renderer handles partially offscreen models and is believed to be slower than emerc. However, mercneric may use less VU1 time, in exchange for more EE time.

As far as I can tell, the game uses emerc only if all three of these conditions are true:

- the `emerc` effect bit is set in the model, indicating it can use `emerc`
- we're an actor spawned by `scene-player`
- we're not in a frame range specified by `scissor-frame` in the scene info
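
The three conditions can be summarized as a single predicate. This is a hypothetical sketch: the struct, its field names, and the half-open frame-range convention are made up for illustration and are not the game's code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical inputs; names are illustrative only.
struct FrameRange {
  int start;  // first frame in the range (inclusive, assumed)
  int end;    // end of the range (exclusive, assumed)
};

struct EmercQuery {
  bool effect_has_emerc_bit;               // only set on high-res cutscene models
  bool spawned_by_scene_player;
  int frame;                               // current cutscene frame
  std::vector<FrameRange> scissor_frames;  // from the scene info
};

bool in_scissor_range(const EmercQuery& q) {
  for (const auto& r : q.scissor_frames) {
    if (q.frame >= r.start && q.frame < r.end) return true;
  }
  return false;
}

// emerc only if all three conditions hold; otherwise the effect goes to mercneric.
bool use_emerc(const EmercQuery& q) {
  return q.effect_has_emerc_bit && q.spawned_by_scene_player && !in_scissor_range(q);
}
```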

The `emerc` bit is only set on high-resolution cutscene models.

Most of the time, there are no frames specified in `scissor-frame`. This makes sense: usually the actors are onscreen during cutscenes, and `emerc` seems quite tolerant of partially offscreen characters. (It's a similar story in Jak 1: they were aggressive about letting merc draw offscreen instead of clipping triangles, likely because the clipping pipeline is so much slower.)

In very rare cases, they manually specified a frame range for a character who is mostly offscreen (for example, Daxter in frame 2324 of `city-krew-collection-intro`, where just his feet are visible), and then the character is rendered with `mercneric`.

My guess is that they just used emerc by default everywhere. If a cutscene character was partially offscreen or behind the camera in a way that caused GS coordinates to overflow, this would draw garbage triangles, and they would manually annotate the frame range where this happened.

## Review of how all this gets called

### Setup

- A level containing `entity-actor`s is loaded.
- The `level-update` method (called once per frame) in `entity.gc` calls `birth!` on `entity-actor`s that are visible and eligible to be spawned.
- The newly created actor process is initialized by calling `init-from-entity!`, a method that all objects must implement.
  - This method eventually calls `initialize-skeleton`, a method of the parent `process-drawable` class.
    - `initialize-skeleton` creates a `draw-control` with `skeleton-group->draw-control`.
    - `initialize-skeleton` calls `setup-cspace-and-add`.
      - `setup-cspace-and-add` adds the process drawable to `*foreground-draw-engine*`, a list of processes to be drawn.
      - The connection uses the function `add-process-drawable`, which just calls the `dma-add-func` of the `draw-control` (`dma-add-process-drawable` by default).

### Per-Frame Draw

- Game objects are responsible for calling `ja-post`, adding themselves to the matrix-engine list, or otherwise coming up with `joint` transforms.

- The main loop in `main.gc` calls `(*draw-hook*)`, which points to `real-main-draw-hook`. This function generates all DMA data for drawing.
  - `foreground-engine-execute`
    - `foreground-init` (doesn't do anything emerc-related)
    - calls `execute-connections` on the engine, which runs `dma-add-process-drawable` for each object:
      - various stuff for shadows/picking lights
      - generates `vu-lights` (light values in VU-friendly format)
      - picks LOD based on distances
      - sets texture masks to indicate to the texture system which LODs of which textures will be used
      - determines if `close-to-screen` culling is needed
      - calls `foreground-draw`
        - adds an entry to the `*bone-calculation-list*` to tell it to compute skinning matrices
        - rotates lights to the camera frame (note that merc only gets a perspective transform; transforming to the camera frame is done in the skinning calculation to avoid a full affine transform on VU1)
        - there's some confusing logic for renderer selection, but in the end it populates `merc-effect-bucket-info`, including a color and a few flags
        - calls `foreground-emerc`, which generates DMA data for `emerc` (asm function)
  - `foreground-execute-cpu-vu0-engines`
    - runs bones, modifying the above DMA data to contain skinning matrices computed from joints
- `display-frame-finish` is called after all drawing
  - calls `emerc-vu1-init-buffers`, which adds some init data to all used `emerc` buckets

### Emerc DMA Generation

The call in GOAL:

```lisp
(set! dma-ptr (foreground-emerc dc (-> (scratchpad-object foreground-work) regs mtxs) dma-ptr 29 19))
```

The arguments are:

- `draw-control`, which contains settings for drawing, and the actual merc geometry (called `geo`)
- a pointer to the "matrix area", which will contain skinning matrices computed by `bones`
- `dma-ptr`, a pointer to the DMA buffer to write data to
- `29` and `19`, addresses in VU1 program memory to start execution at. Typically there is one program for the first run of the renderer, which initializes some VU1 registers/data memory, and a slightly shorter program that skips the init step. Per the program list below, `19` is the per-effect init and `29` processes a fragment.

Before the asm, the rough breakdown of the data is:

- a `draw-control` stores 4 geos, one for each LOD (some may be unpopulated)
- each `geo` is a `merc-ctrl`, which is an entire model
- each `merc-ctrl` is made up of `merc-effect`s
- each `merc-effect` is made up of "fragments". Each fragment has a `frag-geo` (the actual data needed in VU1) and a `frag-ctrl` (metadata describing how to upload data to VU1)
- each fragment has a few types of data:
  - `unsigned-four`: containing weights (u8), rgba (u8), and addresses for crosscopy/samecopy. Unpacked from [u8x4] to [u32x4] by the VIF on upload to VU1.
  - `lump-four`: containing vertex data. Unpacked from [u8x4] to [u32x4 + some_magic_constant] by the VIF on upload to VU1. This unpack magically converts integers to floats.
  - `fp` data: containing a header and "shaders" (giftags for setting up textures/settings). Copied directly by the VIF.
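
The "magic" int-to-float conversion can be modeled on the CPU. The ROW values used in the DMA code (`0x47800000` and `0x4b010000`) are float bit patterns, and in row-add mode the VIF adds the unpacked integer to them as plain integers. `0x4b010000` is 8454144.0f (2^23 + 65536), where one mantissa step is exactly 1.0, so the result reads as 8454144 + n; `0x47800000` is 65536.0f, where one step is 1/128, giving 65536 + n/128. This is a minimal sketch of the general trick; that the VU1 program then subtracts (or otherwise cancels) the constant is an assumption, not something traced in the microcode here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE-754 float.
float bits_to_float(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// One component of a row-add unpack: integer addition on the bit pattern.
float unpack_with_row_add(uint32_t row, uint32_t unpacked) {
  return bits_to_float(row + unpacked);
}

// Recover the original integer with one float subtraction. This is exact while
// row + unpacked stays within the same exponent range as row.
float recover(uint32_t row, uint32_t unpacked) {
  return unpack_with_row_add(row, unpacked) - bits_to_float(row);
}
```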

The calling function `foreground-draw` sets flags (per effect) in the `merc-bucket-info` array. All emerc effects get `merc-path` set to 1.

Below is a high-level description of what `foreground-emerc` does. Note that this is simplified from the assembly version, which combines some of the DMA transfers shown here.

Also, this does not actually run any DMA or microprograms; it just generates a DMA chain that will do this later. On the next game frame, the giant DMA chain generated by all renderers will be submitted, and all of these transfers and programs will run.

```cpp
// get the merc control for our level of detail (selected in drawable.gc)
MercControl& mc = draw_control.lod_set[draw_control.cur_lod].geo;

// loop over each "effect" in the merc control.
// The "effect" is the grouping for what can be sent to one renderer or another
for (int effect_idx = 0; effect_idx < mc.header.effect_count; effect_idx++) {
  MercEffect& merc_effect = mc.effect[effect_idx];  // merc data in the art group
  MercBucketInfo& merc_effect_info = gForeground.merc_bucket_info[effect_idx];  // settings generated by foreground-draw

  if (merc_effect_info.disable_draw) {
    continue;  // skip if disabled
  }

  if (merc_effect_info.merc_path != 1) {
    continue;  // skip if not emerc (1 means emerc here)
  }

  // where we started writing dma for this effect
  u8* effect_dma_start = dma_ptr;

  // the source data (stored in the art group) that we'll be sending.
  u8* source_ptr = merc_effect.frag_geo;

  // loop over fragments
  for (int frag_idx = 0; frag_idx < merc_effect.frag_count; frag_idx++) {
    MercFragmentControl& frag_ctrl = merc_effect.frag_ctrl[frag_idx];

    // set the ROW register of the VIF.
    // when kAddRow is given, the VIF will add these 4 values to each component of each quadword it writes out.
    // This is used as part of the process to go from u8's to floats
    // (they do some cool magic where they don't actually do int->float, they just add integers with VIF and
    // do float math on VU1 and it works out somehow)
    dma_ptr = generate_vif_strow(dma_ptr, mc.header.st_vif_add, mc.header.st_vif_add, 0x47800000, 0x4b010000);

    // number of quadwords (16-byte words) in EE memory of unsigned_four data to send
    // unsigned_four data is stored as [u8, u8, u8, u8] and unpacked to [u32, u32, u32, u32].
    // the count variable is in units of 4 values. (4 bytes in EE memory, 16 bytes in VU1 memory)
    int u4_qwc_in_ee_mem = (frag_ctrl.unsigned_four_count + 3) / 4;

    int dest_addr_qw = 140;

    dma_ptr = generate_vif_unpack(dma_ptr,
        kUnpackV4_8,                    // unpack [u8, u8, u8, u8] to [u32, u32, u32, u32]
        kUnsigned,                      // zero extend when unpacking
        dest_addr_qw,                   // VU1 data address (in quadwords)
        kUseTop,                        // add value of TOP register to destination (VU1 program controls destination)
        source_ptr,                     // source pointer
        u4_qwc_in_ee_mem,               // number of QW to transfer from EE memory
        frag_ctrl.unsigned_four_count,  // number of QW written to VU1 memory
        kNoRow                          // do not add row
    );
    // note: to write 7 QW of data, they would have this in EE memory
    // (each vN is 4 bytes, so each row below is 1 QW):
    // [v0, v1, v2, v3]
    // [v4, v5, v6, XX]
    // they would transfer 2 QW to the VIF (including 4 bytes of padding in XX),
    // but you can tell the VIF to unpack only 7 QW, and it will discard the padding.

    // advance source pointer to the next data (lump data)
    source_ptr += u4_qwc_in_ee_mem * 16;

    // advance dest pointer.
    dest_addr_qw += frag_ctrl.unsigned_four_count;

    // lump 4 is unpacked from [u8, u8, u8, u8] to [u32 + rx, u32 + ry, u32 + rz, u32 + rw]
    // where [rx, ry, rz, rw] are specified in ROW set above.
    int l4_qwc_in_ee_mem = (frag_ctrl.lump_four_count + 3) / 4;

    dma_ptr = generate_vif_unpack(dma_ptr,
        kUnpackV4_8,                // unpack [u8, u8, u8, u8] to [u32, u32, u32, u32]
        kUnsigned,                  // zero extend when unpacking
        dest_addr_qw,               // VU1 data address (in quadwords)
        kUseTop,                    // add value of TOP register to destination (VU1 program controls destination)
        source_ptr,                 // source pointer
        l4_qwc_in_ee_mem,           // number of QW to transfer from EE memory
        frag_ctrl.lump_four_count,  // number of QW written to VU1 memory
        kAddRow                     // add the row value
    );

    // advance source pointer to the next data (fp data)
    source_ptr += l4_qwc_in_ee_mem * 16;

    // advance dest pointer.
    dest_addr_qw += frag_ctrl.lump_four_count;

    // send fp data.
    dma_ptr = generate_vif_unpack(dma_ptr,
        kUnpackV4_32,      // just plain memcpy to VU1 memory
        kSigned,           // no effect for 32-bit data; the asm explicitly clears the usn bit here anyway
        dest_addr_qw,      // VU1 data address (in quadwords)
        kUseTop,           // add value of TOP register to destination (VU1 program controls destination)
        source_ptr,        // source pointer
        frag_ctrl.fp_qwc,  // number of QW to transfer from EE memory
        frag_ctrl.fp_qwc,  // number of QW written to VU1 memory
        kNoRow             // don't add the row value
    );

    // advance source pointer
    source_ptr += frag_ctrl.fp_qwc * 16;

    // there's some special data shared between all fragments. We put this DMA after the DMA
    // for the first fragment as an optimization. We can write the first fragment of this effect
    // to VU1 data memory while VU1 is processing the last fragment of the previous effect.
    // This is ok because the per-fragment data is double buffered (controlled with the TOP register)
    // However, the shared data is not double buffered, and we must wait for the previous effect
    // to be fully done before transferring. We want to delay this as long as possible, so we
    // transfer the first per-fragment data of this effect before this part.
    if (frag_idx == 0) {
      // sneak some more data into the lights: word 3 (w) of lights qw 0 (asm: sw a3, 28(a2))
      auto lights = gForeground.merc_bucket_info.lights;
      lights.qws[0].w = merc_effect_info.ignore_alpha ? 0x3f85026a : 0x3f85026b;
      // copy the 7 qw of lights to the dma buffer now, setting up a transfer for them to go
      // to address 132 in VU1 (no TOP).
      // the previous code sets up these lights in VU format (vu-lights).
      dma_ptr = dma_memcpy_to_buffer_then_vu1(dma_ptr, 132, &lights, 7);
      // copy these 4 values to address 139 (copying them to the dma-buffer now)
      dma_ptr = dma_copy_to_buffer_then_vu1(dma_ptr, 139, mc.header.xyz_scale, mc.header.st_magic,
                                            mc.header.st_out_a, mc.header.st_out_b);

      // emerc-only transfer: 1 qw color_fade (u8's unpacked to u32) to address 118
      // (the asm sends this and the shader below as a single 6 qw transfer)
      dma_ptr = dma_copy_to_buffer_then_vu1(dma_ptr, 118, unpack_u8_to_u32(merc_effect_info.color_fade));

      AdgifShader* envmap_shader = DefaultEnvmapShader;
      if (merc_effect.extra_info && merc_effect.extra_info->shader_offset) {  // nonzero checks
        envmap_shader = (AdgifShader*)((u8*)merc_effect.extra_info + 16 * merc_effect.extra_info->shader_offset);
      }

      // 5 qw envmap shader, to addresses 119-123
      dma_ptr = dma_copy_to_buffer_then_vu1(dma_ptr, 119, envmap_shader, 5);
    }

    // fragments will (most of the time) need new matrix data.
    // there are some cases where they can reuse some matrix data from previous fragments in the same
    // effect, so it's possible for there to be no matrices to transfer. But usually there are some
    for (int mat_xfer = 0; mat_xfer < frag_ctrl.mat_xfer_count; mat_xfer++) {
      auto& info = frag_ctrl.mat_dest_data[mat_xfer];
      dma_ptr = dma_transfer_matrix(dma_ptr, info.matrix_dest, matrix_mem + sizeof(MercMatrix) * info.matrix_number);
    }

    // finally, call the program. program_addr_1 = 29 and program_addr_2 = 19 (from the call above):
    // the first fragment starts at 19 (effect init, which continues into the fragment program),
    // later fragments start directly at 29.
    dma_ptr = dma_mscal(dma_ptr, frag_idx == 0 ? program_addr_2 : program_addr_1);
  }

  // a bunch of bucket patching crap
}
```
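
The size and address bookkeeping in the loop above is easy to get wrong, because EE-side sizes are rounded up to whole quadwords while VU1-side addresses advance by the unpacked count. A small hypothetical helper (the struct and function names are made up) that computes the same values for one fragment:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one fragment's upload, following the pseudocode above.
// VU1 addresses are in quadwords and are relative to TOP.
struct FragmentLayout {
  int u4_ee_qwc;  // unsigned-four QWs read from EE memory
  int l4_ee_qwc;  // lump-four QWs read from EE memory
  int fp_ee_qwc;  // fp QWs read from EE memory (copied 1:1)
  int u4_dest;    // VU1 destination of unsigned-four data
  int l4_dest;    // VU1 destination of lump-four data
  int fp_dest;    // VU1 destination of fp data
  int ee_bytes;   // total bytes consumed from frag-geo
};

FragmentLayout layout_fragment(uint8_t u4_count, uint8_t l4_count, uint8_t fp_qwc) {
  FragmentLayout l{};
  // 4 bytes per value in EE memory, so 4 values per EE QW; round up.
  l.u4_ee_qwc = (u4_count + 3) / 4;
  l.l4_ee_qwc = (l4_count + 3) / 4;
  l.fp_ee_qwc = fp_qwc;
  // VU1 side: each value becomes one full QW, so addresses advance by the counts.
  l.u4_dest = 140;
  l.l4_dest = l.u4_dest + u4_count;
  l.fp_dest = l.l4_dest + l4_count;
  l.ee_bytes = 16 * (l.u4_ee_qwc + l.l4_ee_qwc + l.fp_ee_qwc);
  return l;
}
```

With 7 unsigned-four values (the example from the comments), 2 EE quadwords are read and 7 VU1 quadwords are written.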

The actual asm:

```
L101: ;; function prologue
  daddiu sp, sp, -128
  sd ra, 0(sp)
  sq s0, 16(sp)
  sq s1, 32(sp)
  sq s2, 48(sp)
  sq s3, 64(sp)
  sq s4, 80(sp)
  sq s5, 96(sp)
  sq gp, 112(sp)

;; one-time setup for this "merc-control". A merc-control is a model (at a particular lod)
;; for a process-drawable.
;; using dc as the input draw-control (a constant)
;; using mc as (-> dc lod-set (-> dc cur-lod) geo), the merc-control we're drawing (a constant)
;; using t8 = mep as (-> mc effect <n>), one of the merc-effects in the merc-control (variable)
;; using t7 = mec (merc-effect counter), the number of remaining merc-effects
;; using t9 = mebp as (-> *foreground* merc-bucket-info effect <n>), one of the merc-bucket-info's filled out by
;; the calling function, containing per-effect settings.
B0:
  or t7, a3, r0        ;; t7 = program-addr-1 (29)
  or v1, t0, r0        ;; v1 = program-addr-2 (19)
  lui t0, 4096         ;; t0 = 0x10000000
  lui t1, 18304        ;; t1 = 0x47800000
  daddiu t0, t0, 1     ;; t0 = 0x10000001
  dsll32 t1, t1, 0     ;; t1 = 0x47800000'00000000
  lui a3, 12288        ;; a3 = 0x30000000
  lui t8, 19201        ;; t8 = 0x4b010000
  pcpyld t0, a3, t0    ;; t0 = 0x00000000'30000000'00000000'10000001 (cnt tag, 1 qw + STROW)
  lbu a3, 78(a0)       ;; a3 = (-> dc cur-lod)
  pcpyld t1, t8, t1    ;; t1 = 0x00000000'4b010000'47800000'00000000
  lui t2, 28160        ;; t2 = 0x6e000000
  addiu t8, r0, 8      ;; t8 = 8
  multu3 a3, a3, t8    ;; a3 = (* 8 (-> dc cur-lod))
  lui t3, 1280         ;; t3 = 0x05000000
  lui t4, 27648        ;; t4 = 0x6c000000
  dsll32 t2, t2, 0     ;; t2 = 0x6e000000'00000000
  dsll32 t4, t4, 0     ;; t4 = 0x6c000000'00000000
  daddu t4, t4, t3     ;; t4 = 0x6c000000'05000000
  daddu t3, t2, t3     ;; t3 = 0x6e000000'05000000
  daddiu t3, t3, 1     ;; t3 = 0x6e000000'05000001
  daddu a0, a3, a0     ;; a0 = (+ dc (* 8 (-> dc cur-lod)))
  pcpyld t2, t2, r0    ;; t2 = 0x6e000000'00000000'00000000'00000000 (unpack-v4-8, no change to row)
  lw a0, 28(a0)        ;; a0 = (-> dc lod-set (-> dc cur-lod) geo) ;; a merc-ctrl
  pcpyld t3, t3, r0    ;; t3 = 0x6e000000'05000001'00000000'00000000 (unpack-v4-8, STMOD 1: row add)
  pcpyld t4, t4, r0    ;; t4 = 0x6c000000'05000000'00000000'00000000 (unpack-v4-32, STMOD 0: disable row add)
  lui t5, 12288        ;; t5 = 0x30000000
  lui t6, 4096         ;; t6 = 0x10000000
  daddiu t5, t5, 7     ;; t5 = 0x30000007
  lui t8, 5120         ;; t8 = 0x14000000
  lui a3, 27655        ;; a3 = 0x6c070000
  daddu t7, t8, t7     ;; t7 = 0x14000000 + program-addr-1
  dsll32 a3, a3, 0     ;; a3 = 0x6c070000'00000000
  dsll32 t8, t7, 0     ;; t8 = (0x14000000 + program-addr-1) << 32
  pcpyld t5, a3, t5    ;; t5 = 0x6c070000'00000000'00000000'30000007 (matrix: ref tag 7 qw + unpack-v4-32 7 qw)
  lwu t7, 52(a0)       ;; t7 = (-> mc effect-count)
  pcpyld t6, t8, t6    ;; t6 = ((0x14000000 + program-addr-1) << 32) << 64 + 0x00000000'10000000 (cnt tag + MSCAL)
  daddiu t8, a0, 156   ;; t8 = (-> mc effect 0) = mep "merc effect pointer"
  beq t7, r0, L109     ;; branch if there's no effects (I think this is buggy and jumps to the wrong spot)
  lw a3, *foreground*(s7) ;; a3 = *foreground*

B1:
  daddiu t9, a3, 2508  ;; t9 = (-> *foreground* merc-bucket-info effect 0)
B2:

;; TOP of per-effect loop
;; (I've marked lines with stats if they are just for computing statistics)
L102:
  lbu a3, 6(t9)        ;; a3 = (-> mebp disable-draw)
  or ra, a2, r0        ;; ra = start-of-dma-for-this-effect
  lbu gp, 4(t9)        ;; gp = (-> mebp merc-path)
  bne a3, r0, L109     ;; jump to next effect if this is disabled.
  lw a3, *merc-global-stats*(s7) ;; a3 = mgs

B3:
  daddiu a3, a3, 16    ;; a3 = (-> *merc-global-stats* emerc)
  daddiu gp, gp, -1    ;; check if `merc-path` is 1, skip this effect if it's something else
  sll r0, r0, 0
  bne gp, r0, L109
  lhu s4, 2(a3)        ;; stats.fragments

B4:
  lhu s3, 18(t8)       ;; s3 = (-> mep frag-count)
  lwu gp, 4(a3)        ;; stats
  lhu s5, 22(t8)       ;; s5 = (-> mep tri-count)
  daddu s4, s4, s3     ;; stats
  lwu s3, 8(a3)        ;; stats
  lhu s2, 24(t8)       ;; s2 = (-> mep dvert-count)
  daddu gp, gp, s5     ;; stats
  sh s4, 2(a3)         ;; stats
  sw gp, 4(a3)         ;; stats
  daddu s5, s3, s2     ;; stats
  lwu t2, 0(t8)        ;; t2 = (-> mep frag-geo)
  lwu gp, 4(t8)        ;; gp = (-> mep frag-ctrl)
  lui s4, 12288        ;; 0x30000000
  dsll32 t2, t2, 0     ;; (-> mep frag-geo) << 32
  sw s5, 8(a3)         ;; stats
  or t2, t2, s4        ;; t2 = ((-> mep frag-geo) << 32) + 0x30000000 (upper 64 bits still hold the unpack template)
  lhu s5, 18(t8)       ;; s5 = (-> mep frag-count)
  addiu s4, r0, 0      ;; s4 = 0
  beq s5, r0, L109     ;; skip to next effect if no frags in this effect.
  sll r0, r0, 0

B5:
  sll r0, r0, 0

;; top of per-fragment loop.
;; s4 = current-frag-idx
;; s5 = num-frags
;; a2 = dma-ptr
;; DMA memory layout
;;   lower-bits                                  higher bits
;; 0 [dmatag-lower, dmatag-upper, strow-viftag, ROW_X      ] ;; transfer 1 qw, immediately after this
;; 1 [ROW_Y       , ROW_Z       , ROW_W       , nop-viftag ] ;; the qw transferred by 0
;; 2 [dmatag-lower, dmatag-upper, nop         , unpack-v4-8] ;; (unsigned4's)
;; 3 [dmatag-lower, dmatag-upper, stmod 1     , unpack-v4-8] ;; lumps
;; 4 [dmatag-lower, dmatag-upper, stmod 0     , unpack-v4-32]
B6:
L103:
  lbu s0, 0(gp)        ;; s0 = frag-ctrl.unsigned-four-count (number of 4xu8's in memory)
  sll r0, r0, 0
  lbu s2, 1(gp)        ;; s2 = frag-ctrl.lump-four-count
  xori s1, r0, 49292   ;; s1 = 0xc08c (unpack imm: addr 140, usn = 1, flg = 1 (add TOPS))
  lbu s3, 2(gp)        ;; s3 = frag-ctrl.fp-qwc
  daddiu v0, s0, 3     ;; v0 = u4count + 3
  lw a3, 44(a0)        ;; a3 = header.st-vif-add
  srl v0, v0, 2        ;; v0 = (u4count + 3) / 4
  sq t0, 0(a2)         ;; set DMA qw 0 (dmatag-strow only)
  xor t2, t2, v0       ;; set dma qwc
  sq t2, 32(a2)        ;; store dma line 2.
  xor t2, t2, v0       ;; unset dma qwc
  sh s1, 44(a2)        ;; set addr for unpack (tops + unsigned bits)
  daddu s1, s1, s0     ;; update dest addr for next unpack
  sb s0, 46(a2)        ;; set qwc for unpack
  dsll32 s0, v0, 4     ;; s0 = (u4-ee-qwc << 36)
  daddu t3, t2, s0     ;; t3 = dma-tag templ (ref addr advanced past u4 data)
  daddiu s0, s2, 3     ;; s0 = l4c + 3
  sw a3, 12(a2)        ;; ROW_X = header.st-vif-add
  srl s0, s0, 2        ;; s0 /= 4
  sq t1, 16(a2)        ;; ROW_Z, W
  xor t3, t3, s0       ;; set dma qwc
  sq t3, 48(a2)        ;; store dma templ 3
  xor t3, t3, s0       ;; unset dma qwc
  sh s1, 60(a2)        ;; set vif unpack
  daddu s1, s1, s2     ;; next dest
  sb s2, 62(a2)        ;; store.
  dsll32 s2, s0, 4     ;; s2 = dma-src-inc shifted
  sw a3, 16(a2)        ;; ROW Y
  daddu t4, t3, s2     ;; unpack-v4-32 tmpl
  xor t4, t4, s3       ;; set qwc in dma tmpl
  xori a3, s1, 16384   ;; clear the usn bit (signed unpack, no effect for v4-32)
  sq t4, 64(a2)        ;; store dma 4
  xor t4, t4, s3       ;; unset qwc
  sb s3, 78(a2)        ;; set qwc in unpack
  dsll32 s3, s3, 4     ;; qwc -> bytes
  sh a3, 76(a2)        ;; set unpack
  daddu t2, t4, s3     ;; t2 = ref tag for the next fragment (source advanced past fp data)
  lbu s3, 3(gp)        ;; s3 = mat-xfer-count
  daddiu gp, gp, 4     ;; next fragment control
  bne s4, r0, L105     ;; do B7, B8, B9, B10 only on first fragment
  daddiu a2, a2, 80    ;; advance DMA ptr.

B7:
  sd t6, 0(a2)         ;; 0x10000000 is used as a cnt DMA tag here (qwc patched below)...
  addiu s2, r0, 8      ;; transfer 8 qw
  sd t6, 8(a2)         ;; ...and in word 2 as a FLUSHE VIF code (wait for the previous VU1 program to end)
  lui a3, 27656        ;; 0x6c08
  sb s2, 0(a2)         ;; transfer 8 qw
  daddiu a3, a3, 132   ;; unpack-v4-32, 8 qw, to 132 (fills 132-139)
  lw s2, *foreground*(s7) ;; fg
  daddiu s2, s2, 2384  ;; s2 = merc-bucket-info array
  sw a3, 12(a2)        ;; unpack to 132
  lq a3, 0(s2)         ;; a3 = lights 0
  lq s1, 16(s2)        ;; s1 = lights 1
  lq s0, 32(s2)        ;; s0 = lights 2
  lq v0, 48(s2)        ;; v0 = lights 3
  sq a3, 16(a2)        ;; store lights
  sq s1, 32(a2)
  sq s0, 48(a2)
  sq v0, 64(a2)
  lq a3, 64(s2)        ;; lights again
  lq s1, 80(s2)
  lq s0, 96(s2)        ;; lights 6
  lui v0, 16261
  lq s2, 28(a0)        ;; s2 = merc-ctrl header (xyz-scale, st-magic, st-out-a, st-out-b)
  daddiu v0, v0, 619   ;; v0 = 0x3f85026b
  sq a3, 80(a2)        ;; light store
  lbu a3, 5(t9)        ;; a3 = ignore-alpha
  sq s1, 96(a2)        ;; lights
  sq s0, 112(a2)       ;; last lights
  dsubu a3, v0, a3     ;; compute ignore alpha (0x3f85026a if ignore-alpha, else 0x3f85026b)
  sq s2, 128(a2)       ;; header
  sw a3, 28(a2)        ;; lights[0].w (word 3 of the qw at 16(a2))
  daddiu a2, a2, 144   ;; inc dma
  sd t6, 0(a2)
  addiu s2, r0, 6
  sd t6, 8(a2)
  lui a3, 27654        ;; 0x6C06
  sb s2, 0(a2)
  daddiu a3, a3, 118
  sw a3, 12(a2)        ;; unpack-v4-32, 6 qw, to 118 (fills 118-123)
  lw a3, 0(t9)         ;; a3 = color fade
  pextlb a3, r0, a3    ;; unpack u8 to u32's
  pextlh a3, r0, a3
  sq a3, 16(a2)        ;; store color fade
  lw a3, *default-envmap-shader*(s7) ;; envmap ptr.
  lw s2, 28(t8)        ;; merc-extra-info
  beq s2, r0, L104
  sll r0, r0, 0

B8:
  lbu s1, 1(s2)        ;; s1 = (-> extra-info shader-offset)
  beq s1, r0, L104
  sll r0, r0, 0

B9:
  sll a3, s1, 4        ;; a3 = shader-offset * 16
  addu a3, s2, a3      ;; a3 = extra-info + shader-offset * 16
B10:
L104:
  lq s2, 0(a3)         ;; copy shader to dma buff
  lq s1, 16(a3)
  lq s0, 32(a3)
  lq v0, 48(a3)
  lq a3, 64(a3)
  sq s2, 32(a2)
  sq s1, 48(a2)
  sq s0, 64(a2)
  sq v0, 80(a2)
  sq a3, 96(a2)
  daddiu a2, a2, 112

;; after first time per-effect stuff
B11:
L105:
  beq s3, r0, L107
  addiu s2, r0, 128    ;; s2 = 128 (matrix size)

B12:
  lbu a3, 0(gp)        ;; get mat number
  sll r0, r0, 0
B13:
L106:
  multu3 s1, a3, s2    ;; s1 = matrix offset in ee world
  sq t5, 0(a2)         ;; mat transfer template
  lbu s0, 1(gp)        ;; mat dest
  daddiu gp, gp, 2     ;; gp = next mat transfer
  lbu a3, 0(gp)        ;; a3 = next matrix offset
  daddiu s3, s3, -1    ;; dec remaining
  sb s0, 12(a2)        ;; store dest
  daddiu a2, a2, 16    ;; inc dma
  daddu s1, s1, a1     ;; compute matrix pointer
  sll r0, r0, 0
  bne s3, r0, L106
  sw s1, -12(a2)       ;; store matrix pointer as the ref tag address

B14:
L107:
  sq t6, 0(a2)         ;; cnt tag (qwc 0) + MSCAL program-addr-1
  daddiu a2, a2, 16
  bne s4, r0, L108
  daddiu s4, s4, 1

B15:
  or a3, v1, r0        ;; first fragment only: patch MSCAL to program-addr-2 (19); later ones keep program-addr-1 (29)
  sb a3, -4(a2)
B16:
L108:
  bne s4, s5, L103     ;; loop frag
  sll r0, r0, 0

B17: ;; patching crap, based on texture index now. should document eventually...
  lui s5, 28672
  lbu a3, 26(t8)
  addiu gp, r0, 48
  lw s5, 52(s5)
  mult3 a3, a3, gp
  sll r0, r0, 0
  daddu a3, s5, a3
  sll r0, r0, 0
  lw gp, 12(a3)
  sll r0, r0, 0
  lw s5, 16(a3)
  lui s4, 8192
  sq r0, 0(a2)
  movz gp, ra, gp
  sw s4, 0(a2)
  or s4, a2, r0
  sw gp, 12(a3)
  daddiu a2, a2, 16
  beq s5, r0, L109
  sw s4, 16(a3)

B18:
  sll r0, r0, 0
  sw ra, 4(s5)
B19:
L109:
  daddiu t8, t8, 32
  daddiu t9, t9, 8
  daddiu t7, t7, -1
  bne t7, r0, L102     ;; loop effect
  sll r0, r0, 0

B20:
  or v0, a2, r0
  ld ra, 0(sp)
  lq gp, 112(sp)
  lq s5, 96(sp)
  lq s4, 80(sp)
  lq s3, 64(sp)
  lq s2, 48(sp)
  lq s1, 32(sp)
  lq s0, 16(sp)
  jr ra
  daddiu sp, sp, 128

  sll r0, r0, 0
  sll r0, r0, 0
```
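
Most of the magic numbers in the asm are DMA tags and VIF codes. The sketch below rebuilds them from the field layouts in the PS2 EE/VIF documentation as I understand them (VIFcode: IMMEDIATE[15:0], NUM[23:16], CMD[31:24]; DMAtag: QWC[15:0], ID[30:28], ADDR[62:32]). The helper names are made up. It supports a few readings used in the comments above: `0xc08c` is unpack address 140 with the USN and FLG (add TOPS) bits set, `xori 16384` only clears USN, and `0x10000000` is simultaneously a `cnt` tag and a `FLUSHE` VIF code.

```cpp
#include <cassert>
#include <cstdint>

// VIFcode: IMMEDIATE[15:0], NUM[23:16], CMD[31:24].
constexpr uint32_t vif_code(uint32_t cmd, uint32_t num, uint32_t imm) {
  return (cmd << 24) | (num << 16) | imm;
}

// VIF commands that show up in foreground-emerc.
constexpr uint32_t kVifStMod = 0x05;   // set unpack mode (1 = add ROW)
constexpr uint32_t kVifFlushE = 0x10;  // wait for the VU1 microprogram to end
constexpr uint32_t kVifMsCal = 0x14;   // start a VU1 microprogram at IMMEDIATE
constexpr uint32_t kVifStRow = 0x30;   // set ROW (4 words follow)

// UNPACK cmd = 0x60 | vn << 2 | vl. V4-32 is vn=3, vl=0; V4-8 is vn=3, vl=2.
constexpr uint32_t kUnpackV4_32 = 0x60 | (3 << 2) | 0;
constexpr uint32_t kUnpackV4_8 = 0x60 | (3 << 2) | 2;

// UNPACK immediate: ADDR[9:0], USN[14] (zero extend), FLG[15] (add TOPS to ADDR).
constexpr uint32_t unpack_imm(uint32_t addr, bool usn, bool flg) {
  return addr | (uint32_t(usn) << 14) | (uint32_t(flg) << 15);
}

// DMAtag (lower 64 bits): QWC[15:0], ID[30:28], ADDR[62:32].
constexpr uint64_t kDmaCnt = 1;   // data follows the tag
constexpr uint64_t kDmaNext = 2;  // data follows, then jump to ADDR
constexpr uint64_t kDmaRef = 3;   // data is at ADDR
constexpr uint64_t dma_tag(uint64_t id, uint64_t qwc, uint64_t addr) {
  return qwc | (id << 28) | (addr << 32);
}
```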

### Summary of above

Overall, it's very similar to merc. There's some extra data transferred:

- the "low memory" data set up in `emerc.gc` is 1 QW longer (an extra "`unperspect`" QW)
- the `rgba color-fade` is transferred to non-double-buffered memory (like lights) (1 QW, unpacked to u32's)
- a 5 QW shader for envmapping (either `*default-envmap-shader*` or one provided in merc extra info)

emerc data appears to be backward compatible with merc, which makes sense:

- emerc falls back to merc if the model is too far away to envmap
- we put emerc stuff through the merc code (blending and stuff is wrong, but the geometry comes out right)

The promising thing is that we don't seem to need much extra information to do environment mapping. I kinda thought we'd need another set of texture coordinates, but I don't see where that enters yet.

If all we need is the shader plus tint values, it would be easy to do this for any model that succeeds with merc.

## EMERC VU1 constants

Triangle strip giftag - exactly the same as normal merc (in the normal no-alpha case):

```lisp
(set! (-> s5-0 tri-strip-gif tag)
      (new 'static 'gif-tag64
        :pre #x1
        :prim (new 'static 'gs-prim :prim (gs-prim-type tri-strip) :iip #x1 :tme #x1 :fge #x1)
        :nreg #x3
        )
      )
(set! (-> s5-0 tri-strip-gif regs)
      (new 'static 'gif-tag-regs :regs0 (gif-reg-id st) :regs1 (gif-reg-id rgbaq) :regs2 (gif-reg-id xyzf2))
      )
;; word 3 gets set to #x303e4000
```
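
The `#x303e4000` in word 3 can be checked against the GIFtag bit layout (NLOOP[14:0], EOP[15], PRE[46], PRIM[57:47], FLG[59:58], NREG[63:60]) and the GS PRIM register (type[2:0], IIP[3], TME[4], FGE[5], ABE[6]). Under those layouts, the upper 32 bits of the tag above are `0x301E4000`, and `0x303E4000` is the same value with PRIM.ABE (alpha blend enable) also set. So word 3 looks like a ready-made alpha-blending variant of the tag's upper word; that reading is an inference from the bits, not something confirmed in the VU code. Helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPrimTriStrip = 4;  // GS primitive type for triangle strips

// GS PRIM register: type[2:0], IIP[3], TME[4], FGE[5], ABE[6].
constexpr uint64_t gs_prim(uint64_t type, bool iip, bool tme, bool fge, bool abe) {
  return type | (uint64_t(iip) << 3) | (uint64_t(tme) << 4) | (uint64_t(fge) << 5) |
         (uint64_t(abe) << 6);
}

// Lower 64 bits of a GIFtag.
constexpr uint64_t gif_tag64(uint64_t nloop, bool eop, bool pre, uint64_t prim, uint64_t flg,
                             uint64_t nreg) {
  return nloop | (uint64_t(eop) << 15) | (uint64_t(pre) << 46) | (prim << 47) | (flg << 58) |
         (nreg << 60);
}

// Upper 64 bits of a GIFtag with three 4-bit register descriptors.
constexpr uint64_t gif_regs(uint64_t r0, uint64_t r1, uint64_t r2) {
  return r0 | (r1 << 4) | (r2 << 8);
}

// Word 1 of the tri-strip giftag quadword, with or without alpha blending.
constexpr uint32_t tri_strip_tag_hi(bool abe) {
  return uint32_t(gif_tag64(0, false, true, gs_prim(kPrimTriStrip, true, true, true, abe), 0, 3) >> 32);
}
```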

### Program list

- `0`: per-frame init
- `19`: effect init
- `29`: process frag

### Memory map

All in 16-byte quadword addresses.

```
Low memory: after DMA
[0    ] : tri-strip-gif (st, rgbaq, xyzf2), no abe, 0x303e4000 in word 3, same as merc.
[1    ] : adgif-shader giftag (giftag for 5 a+d's)
[2    ] : hvdf-offset
[3 - 6] : perspective matrix (only perspective projection, no rotation/translation). 3 gets set to persp_vector
[7    ] : fog (pfog0, fog-min, fog-max, 0.0)
[8    ] : unperspect (1/P(0, 0), 1/P(1, 1), 0.5, 1/P(2, 3))

Low memory: after inits (both frame and effect)
[0    ] : tri-strip-gif (st, rgbaq, xyzf2), no abe, 0x303e4000 in word 3, same as merc.
[1    ] : adgif-shader giftag (giftag for 5 a+d's)
[2    ] : hvdf-offset
[3    ] : P_mult = [low.P(0, 0), low.P(1, 1), low.P(2, 2), low.P(2, 3)]
[4    ] : P_add = [low.P(3, 0), low.P(3, 1), low.P(3, 2), low.P(3, 3)]
[5    ] : P_mult_scale = P_mult * header.xyz-scale
[7    ] : fog (pfog0, fog-min, fog-max, 0.0)
[8    ] : unperspect (1/P(0, 0), 1/P(1, 1), 0.5, 1/P(2, 3))
```
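
The same map can be written as C++ constants, extended with the addresses used by the per-effect DMA transfers (118, 119, 132, 139), the per-frame init's copy of `unperspect` (`sq.xyzw vf08, 124(vi00)`), and the TOP-relative per-fragment base (140). This is a sketch for orientation; the names are made up and are not from the game or OpenGOAL source, and reading address 138 as the ambient light is an assumption.

```cpp
#include <cassert>

namespace emerc_vu1 {
// low memory (set up by DMA + per-frame init)
constexpr int kTriStripGif = 0;
constexpr int kAdgifGiftag = 1;
constexpr int kHvdfOffset = 2;
constexpr int kPMult = 3;             // after per-frame init
constexpr int kPAdd = 4;              // after per-frame init
constexpr int kPMultScale = 5;        // after per-effect init
constexpr int kFog = 7;
constexpr int kUnperspect = 8;
constexpr int kUnperspectCopy = 124;  // per-frame init stores vf08 here

// per-effect data, not double buffered (sent with the first fragment of each effect)
constexpr int kColorFade = 118;     // 1 QW
constexpr int kEnvmapShader = 119;  // 5 QW
constexpr int kLights = 132;        // 7 QW: directions 132-134, colors 135-137, 138 (ambient?)
constexpr int kMercHeader = 139;    // xyz-scale, st-magic, st-out-a, st-out-b

// per-fragment data, double buffered: the VIF adds TOP to this base
constexpr int kFragBase = 140;
}  // namespace emerc_vu1
```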

### Summary of math

The "transformed vertex" refers to the vertex before the perspective divide and the pfog0 multiply.
The "transformed normal" is the rotated normal, after normalization.

```
vf08 = transformed
vf23 = unperspect
vf14 = rgba-fade
vf24 = normal st

mul.xyzw vf09, vf08, vf23   ;; do unperspect
subw.z vf10, vf10, vf00     ;; subtract 1 from z
addw.z vf09, vf00, vf09     ;; xyww the unperspected thing
mul.xyz vf15, vf09, vf10    ;;
adday.xyzw vf15, vf15
maddz.x vf15, vf21, vf15
div Q, vf15.x, vf10.z
mulaw.xyzw ACC, vf09, vf00
mul.xyzw vf09, vf08, vf23
madd.xyzw vf10, vf10, Q
eleng.xyz P, vf10           ;; P = |vf10.xyz|
mfp.w vf10, P
div Q, vf23.z, vf10.w       ;; Q = 0.5 / |vf10.xyz| (vf23.z is 0.5)
addaz.xyzw vf00, vf23       ;; ACC = vf00 + 0.5
madd.xyzw vf10, vf10, Q     ;; vf10 = 0.5 + vf10 * Q
mulz.xy vf24, vf10, vf24    ;; mul tex by q

;; new rgba
sq.xyzw vf14, 443(vi10)

;;
vf24
```
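
For comparison, the textbook sphere-map texture coordinate formula (OpenGL's fixed-function `GL_SPHERE_MAP`) ends with the same shape as the tail of the code above: st = 0.5 + 0.5 * r.xy / |r + (0, 0, 1)|, where r is the view direction reflected about the normal. The VU code subtracts 1 from z instead of adding it, which may only reflect a different camera axis convention. That correspondence is a guess; the sketch below is the standard formula, not a transcription of the VU1 program, and all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct V3 { float x, y, z; };
struct ST { float s, t; };

float dot3(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Reflect the unit eye-to-vertex direction u about the unit normal n: r = u - 2 (n.u) n.
V3 reflect_dir(V3 u, V3 n) {
  float d = 2.0f * dot3(n, u);
  return {u.x - d * n.x, u.y - d * n.y, u.z - d * n.z};
}

// Textbook sphere map (eye looks down -z): st = 0.5 + 0.5 * r.xy / |r + (0, 0, 1)|.
ST sphere_map(V3 u, V3 n) {
  V3 r = reflect_dir(u, n);
  float m = std::sqrt(r.x * r.x + r.y * r.y + (r.z + 1.0f) * (r.z + 1.0f));
  return {0.5f + 0.5f * r.x / m, 0.5f + 0.5f * r.y / m};
}
```

A surface facing the camera maps to the center of the envmap texture, (0.5, 0.5).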

### VU1 Program: init (per frame)

```
lq.xyzw vf01, 7(vi00) | nop
lq.xyzw vf25, 3(vi00) | nop
lq.xyzw vf26, 4(vi00) | nop
lq.xyzw vf27, 5(vi00) | nop
lq.xyzw vf28, 6(vi00) | nop
lq.xyzw vf08, 8(vi00) | nop
mr32.xyzw vf01, vf01 | nop
move.y vf25, vf26 | nop
move.zw vf25, vf27 | nop
sq.xyzw vf25, 3(vi00) | nop
sq.xyzw vf08, 124(vi00) | nop
2048.0 | nop :i
255.0 | maxi.x vf17, vf00, I :i
-65537.0 | maxi.y vf17, vf00, I :i
mr32.xyzw vf02, vf01 | minii.z vf17, vf00, I
lq.xyzw vf22, 2(vi00) | minii.z vf18, vf00, I
0.003921569 | minii.z vf19, vf00, I :i
sq.xyzw vf28, 4(vi00) | minii.w vf29, vf00, I :e
mr32.xyzw vf03, vf02 | nop
```

Simplified code (`??`s are either garbage or some value that isn't important later on). Stores to low memory are left out; they are documented in the Memory map section.

```
vf01 = [??, ??, ??, low.pfog0]
vf02 = [??, ??, ??, low.fog_min]
vf03 = [??, ??, ??, low.fog_max]
vf17 = [2048., 255., -65537., ??]
vf18.z = vf19.z = -65537.
vf29.w = 0.003921569 (1/255)
vf22 = low_in.hvdf_offset
```
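
The fog values end up in the `w` lanes because of `mr32`, which rotates a vector one lane to the left ([x, y, z, w] becomes [y, z, w, x]). Starting from the fog QW at address 7, [pfog0, fog-min, fog-max, 0], applying it once, twice, and three times puts pfog0, fog-min, and fog-max into `w`. A minimal model of that instruction (the fog numbers in the test are arbitrary):

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;  // lanes [x, y, z, w]

// VU mr32 (move rotate): dst = [src.y, src.z, src.w, src.x]
Vec4 mr32(const Vec4& src) {
  return {src[1], src[2], src[3], src[0]};
}
```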
|
|
|
|
### VU1 Program: init (per effect)
|
|
Note that this continues directly into the per-frag program, to match the note in frag == 0 case in the dma generation part.
|
|
|
|
```
|
|
lq.xyzw vf25, 139(vi00) | nop
|
|
lq.xyzw vf26, 3(vi00) | nop
|
|
lq.xyz vf01, 132(vi00) | nop
|
|
lq.xyz vf02, 133(vi00) | nop
|
|
lq.xyz vf03, 134(vi00) | addy.xy vf19, vf00, vf25
|
|
lq.xyzw vf04, 135(vi00) | mulx.xyzw vf26, vf26, vf25
|
|
lq.xyzw vf05, 136(vi00) | nop
|
|
lq.xyzw vf06, 137(vi00) | nop
|
|
lq.xyzw vf07, 138(vi00) | nop
|
|
sq.xyzw vf26, 5(vi00) | nop ;; P_mult_scale store.
|
|
```
|
|
|
|
Simplified code (note: some of this stuff set later)
|
|
```
|
|
vf25 = [xyz-scale, st-magic, st-out-a, st-out-b];
|
|
vf26 = low.P_mult * xyz-scale;
|
|
vf01 = [lt0.xyz, pfog0]
|
|
vf02 = [lt1.xyz, fog-min]
|
|
vf03 = [lt2.xyz, fog-max]
|
|
vf19 = [st-magic, st-magic, -65537, xyz-add.z];
|
|
vf04 = lt0_color;
|
|
vf05 = lt1_color;
|
|
vf06 = lt2_color;
|
|
```
|
|
|
|
### VU1 Program: per-fragment, pre-looping init
```
;; reg setup stuff
lq.xyzw vf28, 139(vi00) | minix.xyzw vf15, vf00, vf00 ;; vf28 = merc-ctrl-header, vf15 = [0, 0, 0, 0]
xtop vi15 | nop ;; vi15 = xtop (output buffer base, 0 here)
iaddiu vi12, vi15, 0x8c | nop ;; vi12 = xtop + 140 (merc-byte-header, u4)
nop | nop ;; merc had a branch here for the st-a/st-b select.
ilwr.w vi03, vi12 | maxz.xy vf18, vf00, vf28 ;; vi03 = fp-off, vf18.xy = [st-out-a, st-out-a] (for the a buffer)
iaddiu vi15, vi00, 0x173 | nop ;; vi15 = 371 (xtop + 371 with xtop = 0)
lq.xyzw vf14, 0(vi00) | nop ;; vf14 = tri-strip-gif-tag
nop | nop ;; merc had fade-out handling here
iadd vi03, vi03, vi12 | nop ;; vi03 = fp-off + xtop + 140 (address of the fp header)
ilwr.w vi09, vi03 | nop ;; vi09 = fp-header u8's [shader-cnt, kick-off, kick-step, hword-cnt]
lqi.xyzw vf27, vi03 | nop ;; vf27 = xyz-add
ilw.x vi04, 1(vi12) | nop ;; vi04 = mat1-cnt
iaddiu vi05, vi00, 0x7f | addw.xyz vf15, vf15, vf00 ;; vf15 = [1, 1, 1, 0], vi05 = 0x7f
iand vi09, vi09, vi05 | nop ;; mask to get vi09 = shader-cnt
ilw.y vi06, 1(vi12) | miniz.w vf19, vf00, vf27 ;; setup vf19, vi06 = mat2-cnt
nop | miniy.w vf18, vf00, vf27 ;; setup vf18, merc had a branch for no strips.
ilwr.z vi01, vi12 | minix.w vf17, vf00, vf27 ;; vi01 = lump-off

;; vf17 = [2048, 255, -65537, xyz-add.x]
;; vf18 = [st-out-a, st-out-a, -65537, xyz-add.y] (merc used st-out-b when xtop != 0)
;; vf19 = [st-magic, st-magic, -65537, xyz-add.z]

;; shader setup (not envmap)
lq.xyzw vf13, 1(vi00) | nop ;; vf13 = adgif gif tag.
ilwr.w vi02, vi03 | nop ;; vi02 = shader control word 0 (dest offset)
lqi.xyzw vf08, vi03 | nop ;; load shader data
lqi.xyzw vf09, vi03 | nop
lqi.xyzw vf10, vi03 | nop
lqi.xyzw vf11, vi03 | nop
lqi.xyzw vf12, vi03 | nop
iadd vi02, vi02, vi15 | nop ;; compute destination
mtir vi08, vf09.w | nop ;; eop stuff (not sure this makes sense in 1-shader emerc)
sqi.xyzw vf13, vi02 | nop ;; store adgif gif tag
sqi.xyzw vf08, vi02 | nop ;; shader store 1
sqi.xyzw vf09, vi02 | nop ;; shader store 2
mfir.x vf14, vi08 | nop ;; set eop bit in giftag template
sqi.xyzw vf10, vi02 | nop ;; shader store 3
sqi.xyzw vf11, vi02 | nop ;; shader store 4
sqi.xyzw vf12, vi02 | nop ;; shader store 5
sq.xyzw vf14, 0(vi02) | nop ;; store end giftag

;; matrix warmup
lq.xyzw vf28, 3(vi00) | nop ;; vf28 = persp-diag
ilw.y vi08, 3(vi12) | nop ;; vi08 = mat-slot.0
lq.xyzw vf16, 5(vi00) | nop ;; vf16 = scaled-persp-diag
lq.xyzw vf20, 4(vi00) | nop ;; vf20 = persp-off
ilw.z vi09, 3(vi12) | mul.xyzw vf27, vf28, vf15 ;; vf27 = [pdx, pdy, pdz, 0], vi09 = mat-slot.1
ior vi11, vi08, vi00 | mul.xyzw vf28, vf28, vf00 ;; vf28 = [0, 0, 0, pdw], vi11 = vi08 = mat-slot.0
ibeq vi00, vi08, L2 | mul.xyzw vf15, vf16, vf15 ;; vf15 = [spdx, spdy, spdz, 0], skip if slot = 0
iaddi vi13, vi12, 0x3 | mul.xyzw vf16, vf16, vf00 ;; vi13 = mat-slot-ptr, vf16 = [0, 0, 0, spdw]
```
- mostly the same as merc
- always picks `st-a`; merc had a branch here based on the value of `xtop`.
- no fade-out flag handling

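The `ilw`/`ilwr` reads above imply a byte-header layout in which each field occupies its own 32-bit word of a quadword:
- The lump offset is in `qw0.z`.
- The matrix counts are in `qw1`.
- `perc-off` and the matrix slots are in `qw3`.

Every offset is relative to the header at `xtop + 140`. Below is a sketch of decoding it, assuming that layout. `rgba-off` (`qw0.y`), `fp-off` (`qw0.w`) and `mat3-cnt` (`qw1.z`) are my names for fields the code reads but the comments don't label. The struct and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using QW = std::array<uint32_t, 4>;  // one VU data-memory quadword: x, y, z, w

// ilw/ilwr load only the low 16 bits of the selected word.
uint16_t lo16(uint32_t word) { return static_cast<uint16_t>(word & 0xffff); }

struct ByteHeaderView {                      // hypothetical field names
  uint16_t rgba_off, lump_off, fp_off;       // qw0.y, qw0.z, qw0.w
  uint16_t mat1_cnt, mat2_cnt, mat3_cnt;     // qw1.x, qw1.y, qw1.z
  uint16_t perc_off, mat_slot0, mat_slot1;   // qw3.x, qw3.y, qw3.z
};

ByteHeaderView read_byte_header(const QW* hdr) {
  return {lo16(hdr[0][1]), lo16(hdr[0][2]), lo16(hdr[0][3]),
          lo16(hdr[1][0]), lo16(hdr[1][1]), lo16(hdr[1][2]),
          lo16(hdr[3][0]), lo16(hdr[3][1]), lo16(hdr[3][2])};
}

// iaddiu vi12, vi15, 0x8c followed by iadd vXX, vXX, vi12: offsets are header-relative.
uint32_t header_relative_addr(uint32_t xtop, uint32_t off) {
  uint32_t addr = xtop + 0x8c + off;
  assert(addr < 1024);  // VU1 data memory is 16 KiB = 1024 quadwords
  return addr;
}

// iaddiu vi05, vi00, 0x7f; iand vi09, vi09, vi05
uint32_t shader_count(uint32_t fp_header_w) { return fp_header_w & 0x7f; }
```

With `xtop = 0` and a lump offset of 10, for example, the lump starts at quadword 150.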
### Matrix multiply loop
Premultiplies the uploaded matrices by the perspective transform. Only the matrices uploaded for this fragment are processed.
This is the same as merc, so it is skipped here.

### The rest of it
- The transformed vertex (before the perspective divide and the pfog0 multiply) is stored back over `lump[2]`.
- The transformed, normalized normal is stored back over `rgba`.

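
This is how the pieces fit together for a single-matrix (mat1) vertex:
- The three lump quadwords get the bias vectors `vf17` (`.zw` only), `vf18` and `vf19` added.
- The `.w` lanes then form the position and the `.z` lanes form the normal. `lump[2]` also carries the texture s/t in `.xy`.
- The position goes through matrix rows 0 to 3 (`vf25`..`vf28`).
- The normal goes through rows 4 to 6 (`vf29`..`vf31`) and is normalized with `erleng`.
- `mr32.z` replaces `lump[2].z` with 1, so that multiplying by `Q` yields STQ.

Below is a C++ sketch of that reading, with hypothetical names. It takes the lumps as already-unpacked floats and leaves `Q` as an input rather than reproducing the divide.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec4 = std::array<float, 4>;
using Vec3 = std::array<float, 3>;

struct DecodedVertex {
  Vec3 pos;                 // (lump0.w, lump1.w, lump2.w) after bias
  Vec3 normal;              // (lump0.z, lump1.z, lump2.z) after bias
  std::array<float, 2> st;  // (lump2.x, lump2.y) after bias
};

// add.zw vf08, vf08, vf17 / add.xyzw vf11, vf11, vf18 / add.xyzw vf14, vf14, vf19
DecodedVertex decode_lump(Vec4 l0, Vec4 l1, Vec4 l2,
                          const Vec4& vf17, const Vec4& vf18, const Vec4& vf19) {
  for (int i = 2; i < 4; i++) l0[i] += vf17[i];
  for (int i = 0; i < 4; i++) {
    l1[i] += vf18[i];
    l2[i] += vf19[i];
  }
  return {{l0[3], l1[3], l2[3]}, {l0[2], l1[2], l2[2]}, {l2[0], l2[1]}};
}

struct TransformedVertex {
  Vec4 clip;    // stored back over lump[2] before the divide
  Vec3 normal;  // normalized, stored back over rgba
};

// m[0..3] = vf25..vf28 (position), m[4..6] = vf29..vf31 (normal, xyz only)
TransformedVertex transform(const std::array<Vec4, 7>& m, const DecodedVertex& v) {
  TransformedVertex out{};
  for (int i = 0; i < 4; i++) {
    out.clip[i] = m[0][i] * v.pos[0] + m[1][i] * v.pos[1] + m[2][i] * v.pos[2] + m[3][i];
  }
  Vec3 n{};
  for (int i = 0; i < 3; i++) {
    n[i] = m[4][i] * v.normal[0] + m[5][i] * v.normal[1] + m[6][i] * v.normal[2];
  }
  float len2 = n[0] * n[0] + n[1] * n[1] + n[2] * n[2];
  assert(len2 > 0.0f);
  float inv_len = 1.0f / std::sqrt(len2);  // erleng.xyz P, ... then mfp.w / mulw
  for (int i = 0; i < 3; i++) out.normal[i] = n[i] * inv_len;
  return out;
}

// mr32.z sets lump2.z = 1, then mul.xyzw by Q gives (s * Q, t * Q, Q).
Vec3 make_stq(const std::array<float, 2>& st, float q) { return {st[0] * q, st[1] * q, q}; }
```

Keeping the decode, transform and STQ steps separate mirrors how the VU code interleaves them across several vertices in flight.
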
```
L2: (L14 in og merc)
;; Pipelining Start for vertex transform
ilw.x vi02, 3(vi12) | nop ;; vi02 = perc-off
ibeq vi00, vi04, L13 | nop ;; goto L13 if mat1 count is 0
iadd vi01, vi01, vi12 | nop ;; vi01 = lump address

;; Pipelining start for matrix 1's
ilwr.x vi08, vi01 | nop ;; vi08 = lump[0].x = mat-0?
lqi.xyzw vf08, vi01 | nop
lqi.xyzw vf11, vi01 | nop
lqi.xyzw vf14, vi01 | nop ;; vf14 = lump[2] = [texs, text, nrmz, posz]
lq.xyz vf29, 4(vi08) | nop
lq.xyz vf30, 5(vi08) | add.zw vf08, vf08, vf17
lq.xyzw vf31, 6(vi08) | add.xyzw vf11, vf11, vf18
iaddi vi04, vi04, -0x1 | add.xyzw vf14, vf14, vf19
iadd vi02, vi02, vi12 | nop
lqi.xyzw vf24, vi02 | mulaz.xyzw ACC, vf29, vf08
mtir vi10, vf11.x | maddaz.xyzw ACC, vf30, vf11
mtir vi13, vf11.y | maddz.xyz vf11, vf31, vf14
lq.xyzw vf25, 0(vi08) | nop
lq.xyzw vf26, 1(vi08) | itof0.xyzw vf24, vf24
lq.xyzw vf27, 2(vi08) | nop
erleng.xyz P, vf11 | nop
lq.xyzw vf28, 3(vi08) | mulaw.xyzw ACC, vf25, vf08
nop | maddaw.xyzw ACC, vf26, vf11 ;; modified from merc, no mercprime crap
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
lqi.xyzw vf09, vi01 | nop
ilwr.y vi03, vi12 | nop
ilw.z vi07, 1(vi12) | nop
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | nop
mtir vi08, vf09.x | nop ;; mercprime stuff in og.

;; CHANGE: transformed vf08 (pre perspective divide, pfog mult)
;; is stored back over lump[2] (texs, text, nrmz, posz)
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01

iadd vi03, vi03, vi12 | nop
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
iadd vi04, vi04, vi03 | add.xyzw vf12, vf12, vf18
lq.xyz vf29, 4(vi08) | add.xyzw vf15, vf15, vf19
lq.xyz vf30, 5(vi08) | nop
iadd vi06, vi06, vi04 | nop
lq.xyzw vf31, 6(vi08) | nop
lq.xyzw vf25, 0(vi08) | nop
lq.xyzw vf26, 1(vi08) | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | nop
lq.xyzw vf27, 2(vi08) | nop
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22 ;; load rgba, hvdf offset
iadd vi07, vi07, vi06 | mulaz.xyzw ACC, vf29, vf09
lq.xyzw vf28, 3(vi08) | maddaz.xyzw ACC, vf30, vf12
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
nop | nop
1024.0 | miniw.w vf08, vf08, vf03 :i
nop | mulaw.xyzw ACC, vf25, vf09 ;; modified, no mercprime branch
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20 ;; normalize the normal (vf20.w = 1/len from erleng)
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I ;; like the mercprime path (L82 in og merc)
3072.0 | nop :i
nop | minii.xy vf08, vf08, I

;; CHANGE: store normal back over RGBA.
sq.xyzw vf11, -1(vi03) | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
lqi.xyzw vf10, vi01 | mulax.xyzw ACC, vf01, vf11
ibne vi04, vi03, L4 | madday.xyzw ACC, vf02, vf11 ;; branch to L4, pipelined mat 1
nop | maddz.xyzw vf11, vf03, vf11
ibne vi06, vi03, L17 | nop
nop | nop
b L52 | nop
nop | nop

;; pipelined mat 1 loop start
L3: (L16 in og)
sq.xyzw vf11, -1(vi03) | nop ;; normal store back
3072.0 | mulax.xyzw ACC, vf01, vf11 :i ;; mercprime crap
lqi.xyzw vf10, vi01 | minii.xy vf08, vf08, I
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
;; pipelined mat 1 entry point
L4: (L17 in og)
lqi.xyzw vf13, vi01 | add.xyzw vf09, vf09, vf28
lqi.xyzw vf16, vi01 | maxw.w vf08, vf08, vf02
mtir vi08, vf10.x | itof0.xyzw vf23, vf23
ilw.y vi09, -9(vi01) | maxx.xyzw vf11, vf11, vf00
sq.xyzw vf09, -4(vi01) | miniw.w vf09, vf09, vf01
div Q, vf01.w, vf09.w | add.zw vf10, vf10, vf17
move.xyzw vf21, vf08 | add.xyzw vf13, vf13, vf18
lq.xyz vf29, 4(vi08) | add.xyzw vf16, vf16, vf19
lq.xyz vf30, 5(vi08) | mulax.xyzw ACC, vf04, vf11
ibgtz vi09, L5 | madday.xyzw ACC, vf05, vf11
lq.xyzw vf31, 6(vi08) | maddaz.xyzw ACC, vf06, vf11
nop | addx.w vf21, vf21, vf17
L5: (L18 in og)
lq.xyzw vf25, 0(vi08) | maddw.xyzw vf11, vf07, vf00
lq.xyzw vf26, 1(vi08) | mul.xyz vf09, vf09, Q
mtir vi12, vf13.x | mul.xyzw vf15, vf15, Q
mtir vi15, vf13.y | ftoi4.xyzw vf21, vf21
lq.xyzw vf27, 2(vi08) | mul.xyzw vf11, vf11, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf09, vf09, vf22
ibne vi00, vi09, L6 | mulaz.xyzw ACC, vf29, vf10
sq.xyzw vf21, 2(vi10) | maddaz.xyzw ACC, vf30, vf13
nop | ftoi4.xyzw vf21, vf08
L6: (L19 in og)
mfp.w vf20, P | maddz.xyz vf13, vf31, vf16
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
sq.xyzw vf14, 0(vi13) | miniw.w vf09, vf09, vf03
sq.xyzw vf21, 2(vi13) | mulaw.xyzw ACC, vf25, vf10
lq.xyzw vf28, 3(vi08) | mulw.xyzw vf12, vf12, vf20
1024.0 | ftoi0.xyzw vf11, vf11 :i
erleng.xyz P, vf13 | maxi.xy vf09, vf09, I
ibne vi04, vi03, L7 | maddaw.xyzw ACC, vf26, vf13
mr32.z vf16, vf00 | maddw.xyzw vf10, vf27, vf16
ibne vi06, vi03, L22 | nop
ilw.y vi09, -6(vi01) | nop
ibne vi07, vi03, L57 | nop
nop | nop
b L67 | nop
nop | nop
L7: (L20 in og)
sq.xyzw vf12, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
lqi.xyzw vf08, vi01 | minii.xy vf09, vf09, I
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
lqi.xyzw vf11, vi01 | add.xyzw vf10, vf10, vf28
lqi.xyzw vf14, vi01 | maxw.w vf09, vf09, vf02
mtir vi08, vf08.x | itof0.xyzw vf23, vf23
ilw.y vi09, -9(vi01) | maxx.xyzw vf12, vf12, vf00
sq.xyzw vf10, -4(vi01) | miniw.w vf10, vf10, vf01
div Q, vf01.w, vf10.w | add.zw vf08, vf08, vf17
move.xyzw vf21, vf09 | add.xyzw vf11, vf11, vf18
lq.xyz vf29, 4(vi08) | add.xyzw vf14, vf14, vf19
lq.xyz vf30, 5(vi08) | mulax.xyzw ACC, vf04, vf12
ibgtz vi09, L8 | madday.xyzw ACC, vf05, vf12
lq.xyzw vf31, 6(vi08) | maddaz.xyzw ACC, vf06, vf12
nop | addx.w vf21, vf21, vf17
L8: (L21 in og)
lq.xyzw vf25, 0(vi08) | maddw.xyzw vf12, vf07, vf00
lq.xyzw vf26, 1(vi08) | mul.xyz vf10, vf10, Q
mtir vi10, vf11.x | mul.xyzw vf16, vf16, Q
mtir vi13, vf11.y | ftoi4.xyzw vf21, vf21
lq.xyzw vf27, 2(vi08) | mul.xyzw vf12, vf12, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf10, vf10, vf22
ibne vi00, vi09, L9 | mulaz.xyzw ACC, vf29, vf08
sq.xyzw vf21, 2(vi11) | maddaz.xyzw ACC, vf30, vf11
nop | ftoi4.xyzw vf21, vf09
L9: (L22 in og)
mfp.w vf20, P | maddz.xyz vf11, vf31, vf14
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
sq.xyzw vf15, 0(vi14) | miniw.w vf10, vf10, vf03
sq.xyzw vf21, 2(vi14) | mulaw.xyzw ACC, vf25, vf08
lq.xyzw vf28, 3(vi08) | mulw.xyzw vf13, vf13, vf20
1024.0 | ftoi0.xyzw vf12, vf12 :i
erleng.xyz P, vf11 | maxi.xy vf10, vf10, I
ibne vi04, vi03, L10 | maddaw.xyzw ACC, vf26, vf11
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
ibne vi06, vi03, L27 | nop
ilw.y vi09, -6(vi01) | nop
ibne vi07, vi03, L62 | nop
nop | nop
b L72 | nop
nop | nop
L10: (L23 in og)
sq.xyzw vf13, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
lqi.xyzw vf09, vi01 | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | maxw.w vf10, vf10, vf02
mtir vi08, vf09.x | itof0.xyzw vf23, vf23
ilw.y vi09, -9(vi01) | maxx.xyzw vf13, vf13, vf00
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
move.xyzw vf21, vf10 | add.xyzw vf12, vf12, vf18
lq.xyz vf29, 4(vi08) | add.xyzw vf15, vf15, vf19
lq.xyz vf30, 5(vi08) | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L11 | madday.xyzw ACC, vf05, vf13
lq.xyzw vf31, 6(vi08) | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L11: (L24 in og)
lq.xyzw vf25, 0(vi08) | maddw.xyzw vf13, vf07, vf00
lq.xyzw vf26, 1(vi08) | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | ftoi4.xyzw vf21, vf21
lq.xyzw vf27, 2(vi08) | mul.xyzw vf13, vf13, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
ibne vi00, vi09, L12 | mulaz.xyzw ACC, vf29, vf09
sq.xyzw vf21, 2(vi12) | maddaz.xyzw ACC, vf30, vf12
nop | ftoi4.xyzw vf21, vf10
L12: (L25 in og)
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | miniw.w vf08, vf08, vf03
sq.xyzw vf21, 2(vi15) | mulaw.xyzw ACC, vf25, vf09
lq.xyzw vf28, 3(vi08) | mulw.xyzw vf11, vf11, vf20
1024.0 | ftoi0.xyzw vf13, vf13 :i
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
ibne vi04, vi03, L3 | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
ibne vi06, vi03, L16 | nop
ilw.y vi09, -6(vi01) | nop
ibne vi07, vi03, L51 | nop
nop | nop
b L77 | nop
nop | nop

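;; As far as I can tell, the mat2 path below does two-bone linear blending: lump[0].x/.y
;; (masked with vi05 = 0x7f) select matrices A and B, the weights come from the perc data
;; loaded through vi02 (itof0, then scaled by vf29.w = 1/255 from the global init), and each
;; of the 7 matrix rows is blended as A_row * w.x + B_row * w.y before the usual transform.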
L13: (L26 in og merc)
;; pipeline startup for mat 2's (assuming you have no mat1's)
ibeq vi00, vi06, L47 | nop
iadd vi02, vi02, vi12 | nop
lqi.xyzw vf08, vi01 | nop
lqi.xyzw vf24, vi02 | nop
lqi.xyzw vf11, vi01 | nop
lqi.xyzw vf14, vi01 | nop
mtir vi10, vf08.x | nop
mtir vi13, vf08.y | itof0.xyzw vf24, vf24
iaddi vi06, vi06, -0x1 | add.zw vf08, vf08, vf17
nop | add.xyzw vf11, vf11, vf18
iand vi10, vi10, vi05 | add.xyzw vf14, vf14, vf19
nop | mulw.xyzw vf24, vf24, vf29
iand vi13, vi13, vi05 | nop
lq.xyzw vf20, 0(vi10) | nop
lq.xyzw vf25, 0(vi13) | nop
lq.xyzw vf23, 1(vi10) | nop
lq.xyzw vf26, 1(vi13) | nop
lq.xyzw vf20, 2(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi13) | maddy.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi10) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi13) | maddy.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi13) | maddy.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi10) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi13) | maddy.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi13) | maddy.xyz vf29, vf29, vf24
mtir vi10, vf11.x | mulax.xyzw ACC, vf23, vf24
mtir vi13, vf11.y | maddy.xyz vf30, vf30, vf24
nop | mulax.xyzw ACC, vf20, vf24
nop | maddy.xyzw vf31, vf31, vf24
nop | mulaz.xyzw ACC, vf29, vf08
nop | maddaz.xyzw ACC, vf30, vf11
nop | maddz.xyz vf11, vf31, vf14
nop | nop
nop | nop
nop | mulaw.xyzw ACC, vf25, vf08
nop | nop
erleng.xyz P, vf11 | nop
nop | maddaw.xyzw ACC, vf26, vf11
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
lqi.xyzw vf09, vi01 | nop
ilwr.y vi03, vi12 | nop
ilw.z vi07, 1(vi12) | nop
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | nop
mtir vi11, vf09.x | nop
mtir vi14, vf09.y | nop
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
iadd vi03, vi03, vi12 | add.xyzw vf12, vf12, vf18
iand vi11, vi11, vi05 | add.xyzw vf15, vf15, vf19
iadd vi06, vi06, vi03 | nop
iadd vi07, vi07, vi06 | nop
iand vi14, vi14, vi05 | nop
ibne vi05, vi11, L14 | nop
iaddiu vi08, vi00, 0x23a | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | nop
b L15 | nop
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
L14: (L28 in og)
lq.xyzw vf20, 0(vi11) | mul.xyzw vf14, vf14, Q
lq.xyzw vf25, 0(vi14) | nop
lq.xyzw vf23, 1(vi11) | nop
lq.xyzw vf26, 1(vi14) | add.xyzw vf08, vf08, vf22
lq.xyzw vf20, 2(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi14) | maddw.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi11) | mulaz.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi14) | maddw.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi14) | maddw.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi11) | mulaz.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi14) | maddw.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi14) | maddw.xyz vf29, vf29, vf24
lqi.xyzw vf23, vi02 | mulaz.xyzw ACC, vf23, vf24
mtir vi11, vf12.x | maddw.xyz vf30, vf30, vf24
mtir vi14, vf12.y | mulaz.xyzw ACC, vf20, vf24
iaddiu vi08, vi00, 0x18c | maddw.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L15: (L29 in og)
nop | mulaz.xyzw ACC, vf29, vf09
nop | maddaz.xyzw ACC, vf30, vf12
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
nop | nop
1024.0 | miniw.w vf08, vf08, vf03 :i
nop | mulaw.xyzw ACC, vf25, vf09
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
3072.0 | nop :i
sq.xyzw vf11, -1(vi03) | minii.xy vf08, vf08, I
ibeq vi06, vi03, L50 | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
lqi.xyzw vf10, vi01 | mulax.xyzw ACC, vf01, vf11
jr vi08 | madday.xyzw ACC, vf02, vf11
nop | maddz.xyzw vf11, vf03, vf11
L16: (L30 in og)
sq.xyzw vf11, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf11 :i
lqi.xyzw vf10, vi01 | minii.xy vf08, vf08, I
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
L17: (L31 in og)
lqi.xyzw vf13, vi01 | add.xyzw vf09, vf09, vf28
lqi.xyzw vf16, vi01 | maxw.w vf08, vf08, vf02
mtir vi12, vf10.x | itof0.xyzw vf23, vf23
mtir vi15, vf10.y | maxx.xyzw vf11, vf11, vf00
sq.xyzw vf09, -4(vi01) | miniw.w vf09, vf09, vf01
div Q, vf01.w, vf09.w | add.zw vf10, vf10, vf17
move.xyzw vf21, vf08 | add.xyzw vf13, vf13, vf18
iand vi12, vi12, vi05 | add.xyzw vf16, vf16, vf19
nop | mulax.xyzw ACC, vf04, vf11
ibgtz vi09, L18 | madday.xyzw ACC, vf05, vf11
iand vi15, vi15, vi05 | maddaz.xyzw ACC, vf06, vf11
nop | addx.w vf21, vf21, vf17
L18: (L32 in og)
ibne vi05, vi12, L19 | maddw.xyzw vf11, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf09, vf09, Q
mtir vi12, vf13.x | mul.xyzw vf15, vf15, Q
mtir vi15, vf13.y | ftoi4.xyzw vf21, vf21
b L20 | mul.xyzw vf11, vf11, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf09, vf09, vf22
L19: (L33 in og)
lq.xyzw vf20, 0(vi12) | mul.xyzw vf15, vf15, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf25, 0(vi15) | ftoi4.xyzw vf21, vf21
lq.xyzw vf23, 1(vi12) | mul.xyzw vf11, vf11, vf23
lq.xyzw vf26, 1(vi15) | add.xyzw vf09, vf09, vf22
lq.xyzw vf20, 2(vi12) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi15) | maddy.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi12) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi15) | maddy.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi12) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi15) | maddy.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi12) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi15) | maddy.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi12) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi15) | maddy.xyz vf29, vf29, vf24
mtir vi12, vf13.x | mulax.xyzw ACC, vf23, vf24
mtir vi15, vf13.y | maddy.xyz vf30, vf30, vf24
b L35 | mulax.xyzw ACC, vf20, vf24
lqi.xyzw vf23, vi03 | maddy.xyzw vf31, vf31, vf24
L20: (L34 in og)
ibgez vi09, L21 | mulaz.xyzw ACC, vf29, vf10
sq.xyzw vf21, 2(vi10) | maddaz.xyzw ACC, vf30, vf13
nop | ftoi4.xyzw vf21, vf08
L21: (L35 in og)
mfp.w vf20, P | maddz.xyz vf13, vf31, vf16
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
sq.xyzw vf14, 0(vi13) | miniw.w vf09, vf09, vf03
sq.xyzw vf21, 2(vi13) | mulaw.xyzw ACC, vf25, vf10
ilw.y vi09, -6(vi01) | mulw.xyzw vf12, vf12, vf20
1024.0 | ftoi0.xyzw vf11, vf11 :i
erleng.xyz P, vf13 | maxi.xy vf09, vf09, I
ibne vi06, vi03, L22 | maddaw.xyzw ACC, vf26, vf13
mr32.z vf16, vf00 | maddw.xyzw vf10, vf27, vf16
ibne vi07, vi03, L57 | nop
nop | nop
b L67 | nop
nop | nop
L22: (L36 in og)
sq.xyzw vf12, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
lqi.xyzw vf08, vi01 | minii.xy vf09, vf09, I
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
lqi.xyzw vf11, vi01 | add.xyzw vf10, vf10, vf28
lqi.xyzw vf14, vi01 | maxw.w vf09, vf09, vf02
mtir vi10, vf08.x | itof0.xyzw vf23, vf23
mtir vi13, vf08.y | maxx.xyzw vf12, vf12, vf00
sq.xyzw vf10, -4(vi01) | miniw.w vf10, vf10, vf01
div Q, vf01.w, vf10.w | add.zw vf08, vf08, vf17
move.xyzw vf21, vf09 | add.xyzw vf11, vf11, vf18
iand vi10, vi10, vi05 | add.xyzw vf14, vf14, vf19
nop | mulax.xyzw ACC, vf04, vf12
ibgtz vi09, L23 | madday.xyzw ACC, vf05, vf12
iand vi13, vi13, vi05 | maddaz.xyzw ACC, vf06, vf12
nop | addx.w vf21, vf21, vf17
L23: (L37 in og)
ibne vi05, vi10, L24 | maddw.xyzw vf12, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf10, vf10, Q
mtir vi10, vf11.x | mul.xyzw vf16, vf16, Q
mtir vi13, vf11.y | ftoi4.xyzw vf21, vf21
b L25 | mul.xyzw vf12, vf12, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf10, vf10, vf22
L24: (L38 in og)
lq.xyzw vf20, 0(vi10) | mul.xyzw vf16, vf16, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf25, 0(vi13) | ftoi4.xyzw vf21, vf21
lq.xyzw vf23, 1(vi10) | mul.xyzw vf12, vf12, vf23
lq.xyzw vf26, 1(vi13) | add.xyzw vf10, vf10, vf22
lq.xyzw vf20, 2(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi13) | maddy.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi10) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi13) | maddy.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi13) | maddy.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi10) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi13) | maddy.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi10) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi13) | maddy.xyz vf29, vf29, vf24
mtir vi10, vf11.x | mulax.xyzw ACC, vf23, vf24
mtir vi13, vf11.y | maddy.xyz vf30, vf30, vf24
b L40 | mulax.xyzw ACC, vf20, vf24
lqi.xyzw vf23, vi03 | maddy.xyzw vf31, vf31, vf24
L25: (L39 in og)
ibgez vi09, L26 | mulaz.xyzw ACC, vf29, vf08
sq.xyzw vf21, 2(vi11) | maddaz.xyzw ACC, vf30, vf11
nop | ftoi4.xyzw vf21, vf09
L26: (L40 in og)
mfp.w vf20, P | maddz.xyz vf11, vf31, vf14
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
sq.xyzw vf15, 0(vi14) | miniw.w vf10, vf10, vf03
sq.xyzw vf21, 2(vi14) | mulaw.xyzw ACC, vf25, vf08
ilw.y vi09, -6(vi01) | mulw.xyzw vf13, vf13, vf20
1024.0 | ftoi0.xyzw vf12, vf12 :i
erleng.xyz P, vf11 | maxi.xy vf10, vf10, I
ibne vi06, vi03, L27 | maddaw.xyzw ACC, vf26, vf11
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
ibne vi07, vi03, L62 | nop
nop | nop
b L72 | nop
nop | nop
L27: (L41 in og)
sq.xyzw vf13, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
lqi.xyzw vf09, vi01 | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | maxw.w vf10, vf10, vf02
mtir vi11, vf09.x | itof0.xyzw vf23, vf23
mtir vi14, vf09.y | maxx.xyzw vf13, vf13, vf00
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
move.xyzw vf21, vf10 | add.xyzw vf12, vf12, vf18
iand vi11, vi11, vi05 | add.xyzw vf15, vf15, vf19
nop | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L28 | madday.xyzw ACC, vf05, vf13
iand vi14, vi14, vi05 | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L28: (L42 in og)
ibne vi05, vi11, L29 | maddw.xyzw vf13, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | ftoi4.xyzw vf21, vf21
b L30 | mul.xyzw vf13, vf13, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
L29: (L43 in og)
lq.xyzw vf20, 0(vi11) | mul.xyzw vf14, vf14, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf25, 0(vi14) | ftoi4.xyzw vf21, vf21
lq.xyzw vf23, 1(vi11) | mul.xyzw vf13, vf13, vf23
lq.xyzw vf26, 1(vi14) | add.xyzw vf08, vf08, vf22
lq.xyzw vf20, 2(vi11) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi14) | maddy.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi11) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi14) | maddy.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi11) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi14) | maddy.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi11) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi14) | maddy.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi11) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi14) | maddy.xyz vf29, vf29, vf24
mtir vi11, vf12.x | mulax.xyzw ACC, vf23, vf24
mtir vi14, vf12.y | maddy.xyz vf30, vf30, vf24
b L45 | mulax.xyzw ACC, vf20, vf24
lqi.xyzw vf23, vi03 | maddy.xyzw vf31, vf31, vf24
L30: (L44 in og)
ibgez vi09, L31 | mulaz.xyzw ACC, vf29, vf09
sq.xyzw vf21, 2(vi12) | maddaz.xyzw ACC, vf30, vf12
nop | ftoi4.xyzw vf21, vf10
L31: (L45 in og)
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | miniw.w vf08, vf08, vf03
sq.xyzw vf21, 2(vi15) | mulaw.xyzw ACC, vf25, vf09
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20
1024.0 | ftoi0.xyzw vf13, vf13 :i
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
ibne vi06, vi03, L16 | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
ibne vi07, vi03, L51 | nop
nop | nop
b L77 | nop
nop | nop
L32: (L46 in og)
|
|
sq.xyzw vf11, -1(vi03) | nop
|
|
3072.0 | mulax.xyzw ACC, vf01, vf11 :i
|
|
lqi.xyzw vf10, vi01 | minii.xy vf08, vf08, I
|
|
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
|
|
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
|
|
lqi.xyzw vf13, vi01 | add.xyzw vf09, vf09, vf28
|
|
lqi.xyzw vf16, vi01 | maxw.w vf08, vf08, vf02
|
|
mtir vi12, vf10.x | itof0.xyzw vf23, vf23
|
|
mtir vi15, vf10.y | maxx.xyzw vf11, vf11, vf00
|
|
sq.xyzw vf09, -4(vi01) | miniw.w vf09, vf09, vf01
|
|
div Q, vf01.w, vf09.w | add.zw vf10, vf10, vf17
|
|
move.xyzw vf21, vf08 | add.xyzw vf13, vf13, vf18
|
|
iand vi12, vi12, vi05 | add.xyzw vf16, vf16, vf19
|
|
nop | mulax.xyzw ACC, vf04, vf11
|
|
ibgtz vi09, L33 | madday.xyzw ACC, vf05, vf11
|
|
iand vi15, vi15, vi05 | maddaz.xyzw ACC, vf06, vf11
|
|
nop | addx.w vf21, vf21, vf17
|
|
L33: (L47 in og)
|
|
ibne vi05, vi12, L34 | maddw.xyzw vf11, vf07, vf00
|
|
ilw.x vi09, -9(vi01) | mul.xyz vf09, vf09, Q
|
|
mtir vi12, vf13.x | mul.xyzw vf15, vf15, Q
|
|
mtir vi15, vf13.y | ftoi4.xyzw vf21, vf21
|
|
b L35 | mul.xyzw vf11, vf11, vf23
|
|
lqi.xyzw vf23, vi03 | add.xyzw vf09, vf09, vf22
|
|
L34: (L48 in og)
|
|
lq.xyzw vf20, 0(vi12) | mul.xyzw vf15, vf15, Q
|
|
lq.xyzw vf25, 0(vi15) | ftoi4.xyzw vf21, vf21
|
|
lq.xyzw vf23, 1(vi12) | mul.xyzw vf11, vf11, vf23
|
|
lq.xyzw vf26, 1(vi15) | add.xyzw vf09, vf09, vf22
|
|
lq.xyzw vf20, 2(vi12) | mulaz.xyzw ACC, vf20, vf24
|
|
lq.xyzw vf27, 2(vi15) | maddw.xyzw vf25, vf25, vf24
|
|
lq.xyzw vf23, 3(vi12) | mulaz.xyzw ACC, vf23, vf24
|
|
lq.xyzw vf28, 3(vi15) | maddw.xyzw vf26, vf26, vf24
|
|
lq.xyzw vf20, 4(vi12) | mulaz.xyzw ACC, vf20, vf24
|
|
lq.xyz vf29, 4(vi15) | maddw.xyzw vf27, vf27, vf24
|
|
lq.xyzw vf23, 5(vi12) | mulaz.xyzw ACC, vf23, vf24
|
|
lq.xyz vf30, 5(vi15) | maddw.xyzw vf28, vf28, vf24
|
|
lq.xyzw vf20, 6(vi12) | mulaz.xyzw ACC, vf20, vf24
|
|
lq.xyzw vf31, 6(vi15) | maddw.xyz vf29, vf29, vf24
|
|
lqi.xyzw vf23, vi02 | mulaz.xyzw ACC, vf23, vf24
|
|
mtir vi12, vf13.x | maddw.xyz vf30, vf30, vf24
|
|
mtir vi15, vf13.y | mulaz.xyzw ACC, vf20, vf24
|
|
b L20 | maddw.xyzw vf31, vf31, vf24
|
|
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
|
|
L35: (L49 in og)
|
|
ibgez vi09, L36 | mulaz.xyzw ACC, vf29, vf10
|
|
sq.xyzw vf21, 2(vi10) | maddaz.xyzw ACC, vf30, vf13
|
|
nop | ftoi4.xyzw vf21, vf08
|
|
L36: (L50 in og)
|
|
mfp.w vf20, P | maddz.xyz vf13, vf31, vf16
|
|
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
|
|
sq.xyzw vf14, 0(vi13) | miniw.w vf09, vf09, vf03
|
|
sq.xyzw vf21, 2(vi13) | mulaw.xyzw ACC, vf25, vf10
|
|
ilw.y vi09, -6(vi01) | mulw.xyzw vf12, vf12, vf20
|
|
1024.0 | ftoi0.xyzw vf11, vf11 :i
|
|
erleng.xyz P, vf13 | maxi.xy vf09, vf09, I
|
|
ibne vi06, vi03, L37 | maddaw.xyzw ACC, vf26, vf13
|
|
mr32.z vf16, vf00 | maddw.xyzw vf10, vf27, vf16
|
|
ibne vi07, vi03, L57 | nop
|
|
nop | nop
|
|
b L67 | nop
|
|
nop | nop
|
|
L37: (L51 in og)
|
|
sq.xyzw vf12, -1(vi03) | nop
|
|
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
|
|
lqi.xyzw vf08, vi01 | minii.xy vf09, vf09, I
|
|
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
|
|
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
|
|
lqi.xyzw vf11, vi01 | add.xyzw vf10, vf10, vf28
|
|
lqi.xyzw vf14, vi01 | maxw.w vf09, vf09, vf02
|
|
mtir vi10, vf08.x | itof0.xyzw vf23, vf23
|
|
mtir vi13, vf08.y | maxx.xyzw vf12, vf12, vf00
|
|
sq.xyzw vf10, -4(vi01) | miniw.w vf10, vf10, vf01
|
|
div Q, vf01.w, vf10.w | add.zw vf08, vf08, vf17
|
|
move.xyzw vf21, vf09 | add.xyzw vf11, vf11, vf18
|
|
iand vi10, vi10, vi05 | add.xyzw vf14, vf14, vf19
|
|
nop | mulax.xyzw ACC, vf04, vf12
|
|
ibgtz vi09, L38 | madday.xyzw ACC, vf05, vf12
|
|
iand vi13, vi13, vi05 | maddaz.xyzw ACC, vf06, vf12
|
|
nop | addx.w vf21, vf21, vf17
|
|
L38: (L52 in og)
|
|
ibne vi05, vi10, L39 | maddw.xyzw vf12, vf07, vf00
|
|
ilw.x vi09, -9(vi01) | mul.xyz vf10, vf10, Q
|
|
mtir vi10, vf11.x | mul.xyzw vf16, vf16, Q
|
|
mtir vi13, vf11.y | ftoi4.xyzw vf21, vf21
|
|
b L40 | mul.xyzw vf12, vf12, vf23
|
|
lqi.xyzw vf23, vi03 | add.xyzw vf10, vf10, vf22
|
|
L39: (L53 in og)
|
|
lq.xyzw vf20, 0(vi10) | mul.xyzw vf16, vf16, Q
|
|
lq.xyzw vf25, 0(vi13) | ftoi4.xyzw vf21, vf21
|
|
lq.xyzw vf23, 1(vi10) | mul.xyzw vf12, vf12, vf23
|
|
lq.xyzw vf26, 1(vi13) | add.xyzw vf10, vf10, vf22
|
|
lq.xyzw vf20, 2(vi10) | mulaz.xyzw ACC, vf20, vf24
|
|
lq.xyzw vf27, 2(vi13) | maddw.xyzw vf25, vf25, vf24
|
|
lq.xyzw vf23, 3(vi10) | mulaz.xyzw ACC, vf23, vf24
|
|
lq.xyzw vf28, 3(vi13) | maddw.xyzw vf26, vf26, vf24
|
|
lq.xyzw vf20, 4(vi10) | mulaz.xyzw ACC, vf20, vf24
|
|
lq.xyz vf29, 4(vi13) | maddw.xyzw vf27, vf27, vf24
|
|
lq.xyzw vf23, 5(vi10) | mulaz.xyzw ACC, vf23, vf24
|
|
lq.xyz vf30, 5(vi13) | maddw.xyzw vf28, vf28, vf24
|
|
lq.xyzw vf20, 6(vi10) | mulaz.xyzw ACC, vf20, vf24
|
|
lq.xyzw vf31, 6(vi13) | maddw.xyz vf29, vf29, vf24
|
|
lqi.xyzw vf23, vi02 | mulaz.xyzw ACC, vf23, vf24
|
|
mtir vi10, vf11.x | maddw.xyz vf30, vf30, vf24
|
|
mtir vi13, vf11.y | mulaz.xyzw ACC, vf20, vf24
|
|
b L25 | maddw.xyzw vf31, vf31, vf24
|
|
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
|
|
L40: (L54 in og)
|
|
ibgez vi09, L41 | mulaz.xyzw ACC, vf29, vf08
|
|
sq.xyzw vf21, 2(vi11) | maddaz.xyzw ACC, vf30, vf11
|
|
nop | ftoi4.xyzw vf21, vf09
|
|
L41: (L55 in og)
|
|
mfp.w vf20, P | maddz.xyz vf11, vf31, vf14
|
|
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
|
|
sq.xyzw vf15, 0(vi14) | miniw.w vf10, vf10, vf03
|
|
sq.xyzw vf21, 2(vi14) | mulaw.xyzw ACC, vf25, vf08
|
|
ilw.y vi09, -6(vi01) | mulw.xyzw vf13, vf13, vf20
|
|
1024.0 | ftoi0.xyzw vf12, vf12 :i
|
|
erleng.xyz P, vf11 | maxi.xy vf10, vf10, I
|
|
ibne vi06, vi03, L42 | maddaw.xyzw ACC, vf26, vf11
|
|
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
|
|
ibne vi07, vi03, L62 | nop
|
|
nop | nop
|
|
b L72 | nop
|
|
nop | nop
|
|
L42: (L56 in og)
sq.xyzw vf13, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
lqi.xyzw vf09, vi01 | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | maxw.w vf10, vf10, vf02
mtir vi11, vf09.x | itof0.xyzw vf23, vf23
mtir vi14, vf09.y | maxx.xyzw vf13, vf13, vf00
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
move.xyzw vf21, vf10 | add.xyzw vf12, vf12, vf18
iand vi11, vi11, vi05 | add.xyzw vf15, vf15, vf19
nop | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L43 | madday.xyzw ACC, vf05, vf13
iand vi14, vi14, vi05 | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L43: (L57 in og)
ibne vi05, vi11, L44 | maddw.xyzw vf13, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | ftoi4.xyzw vf21, vf21
b L45 | mul.xyzw vf13, vf13, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
L44: (L58 in og)
lq.xyzw vf20, 0(vi11) | mul.xyzw vf14, vf14, Q
lq.xyzw vf25, 0(vi14) | ftoi4.xyzw vf21, vf21
lq.xyzw vf23, 1(vi11) | mul.xyzw vf13, vf13, vf23
lq.xyzw vf26, 1(vi14) | add.xyzw vf08, vf08, vf22
lq.xyzw vf20, 2(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf27, 2(vi14) | maddw.xyzw vf25, vf25, vf24
lq.xyzw vf23, 3(vi11) | mulaz.xyzw ACC, vf23, vf24
lq.xyzw vf28, 3(vi14) | maddw.xyzw vf26, vf26, vf24
lq.xyzw vf20, 4(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi14) | maddw.xyzw vf27, vf27, vf24
lq.xyzw vf23, 5(vi11) | mulaz.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi14) | maddw.xyzw vf28, vf28, vf24
lq.xyzw vf20, 6(vi11) | mulaz.xyzw ACC, vf20, vf24
lq.xyzw vf31, 6(vi14) | maddw.xyz vf29, vf29, vf24
lqi.xyzw vf23, vi02 | mulaz.xyzw ACC, vf23, vf24
mtir vi11, vf12.x | maddw.xyz vf30, vf30, vf24
mtir vi14, vf12.y | mulaz.xyzw ACC, vf20, vf24
b L30 | maddw.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L45: (L59 in og)
ibgez vi09, L46 | mulaz.xyzw ACC, vf29, vf09
sq.xyzw vf21, 2(vi12) | maddaz.xyzw ACC, vf30, vf12
nop | ftoi4.xyzw vf21, vf10
L46: (L60 in og)
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | miniw.w vf08, vf08, vf03
sq.xyzw vf21, 2(vi15) | mulaw.xyzw ACC, vf25, vf09
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20
1024.0 | ftoi0.xyzw vf13, vf13 :i
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
ibne vi06, vi03, L32 | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
ibne vi07, vi03, L57 | nop
nop | nop
b L77 | nop
nop | nop

;; mat 3
L47:
lqi.xyzw vf08, vi01 | nop
lqi.xyzw vf24, vi02 | nop
lqi.xyzw vf11, vi01 | nop
lqi.xyzw vf14, vi01 | nop
mtir vi10, vf08.x | nop
mtir vi13, vf08.y | itof0.xyzw vf24, vf24
nop | add.zw vf08, vf08, vf17
nop | add.xyzw vf11, vf11, vf18
iand vi10, vi10, vi05 | add.xyzw vf14, vf14, vf19
ilw.w vi08, -1(vi02) | mulw.xyzw vf24, vf24, vf29
iand vi13, vi13, vi05 | nop
lq.xyzw vf20, 0(vi10) | nop
lq.xyzw vf31, 0(vi13) | nop
lq.xyzw vf25, 0(vi08) | nop
lq.xyzw vf23, 1(vi10) | nop
lq.xyzw vf20, 1(vi13) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf26, 1(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 2(vi10) | maddz.xyzw vf25, vf25, vf24
lq.xyzw vf23, 2(vi13) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf27, 2(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 3(vi10) | maddz.xyzw vf26, vf26, vf24
lq.xyzw vf31, 3(vi13) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf28, 3(vi08) | madday.xyzw ACC, vf23, vf24
lq.xyzw vf23, 4(vi10) | maddz.xyzw vf27, vf27, vf24
lq.xyzw vf20, 4(vi13) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 5(vi10) | maddz.xyzw vf28, vf28, vf24
lq.xyzw vf23, 5(vi13) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 6(vi10) | maddz.xyz vf29, vf29, vf24
lq.xyzw vf22, 6(vi13) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf31, 6(vi08) | madday.xyzw ACC, vf23, vf24
lqi.xyzw vf23, vi02 | maddz.xyz vf30, vf30, vf24
mtir vi10, vf11.x | mulax.xyzw ACC, vf20, vf24
mtir vi13, vf11.y | madday.xyzw ACC, vf22, vf24
lq.xyzw vf22, 2(vi00) | maddz.xyzw vf31, vf31, vf24
nop | itof0.xyzw vf24, vf23
nop | mulaz.xyzw ACC, vf29, vf08
nop | maddaz.xyzw ACC, vf30, vf11
nop | maddz.xyz vf11, vf31, vf14
nop | nop
nop | nop
nop | mulaw.xyzw ACC, vf25, vf08
nop | nop
erleng.xyz P, vf11 | nop
nop | maddaw.xyzw ACC, vf26, vf11
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
lqi.xyzw vf09, vi01 | nop
ilwr.y vi03, vi12 | nop
ilw.z vi07, 1(vi12) | nop
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | nop
mtir vi11, vf09.x | nop
mtir vi14, vf09.y | nop
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
iadd vi03, vi03, vi12 | add.xyzw vf12, vf12, vf18
iand vi11, vi11, vi05 | add.xyzw vf15, vf15, vf19
ilw.w vi08, -1(vi02) | nop
iadd vi07, vi07, vi03 | nop
iand vi14, vi14, vi05 | nop
ibne vi05, vi11, L48 | nop
iaddi vi07, vi07, -0x1 | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | nop
b L49 | nop
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
L48:
lq.xyzw vf20, 0(vi11) | mul.xyzw vf14, vf14, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf31, 0(vi14) | nop
lq.xyzw vf25, 0(vi08) | nop
lq.xyzw vf23, 1(vi11) | add.xyzw vf08, vf08, vf22
lq.xyzw vf20, 1(vi14) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf26, 1(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 2(vi11) | maddz.xyzw vf25, vf25, vf24
lq.xyzw vf23, 2(vi14) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf27, 2(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 3(vi11) | maddz.xyzw vf26, vf26, vf24
lq.xyzw vf31, 3(vi14) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf28, 3(vi08) | madday.xyzw ACC, vf23, vf24
lq.xyzw vf23, 4(vi11) | maddz.xyzw vf27, vf27, vf24
lq.xyzw vf20, 4(vi14) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 5(vi11) | maddz.xyzw vf28, vf28, vf24
lq.xyzw vf23, 5(vi14) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 6(vi11) | maddz.xyz vf29, vf29, vf24
lq.xyzw vf22, 6(vi14) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf31, 6(vi08) | madday.xyzw ACC, vf23, vf24
lqi.xyzw vf23, vi02 | maddz.xyz vf30, vf30, vf24
mtir vi11, vf12.x | mulax.xyzw ACC, vf20, vf24
mtir vi14, vf12.y | madday.xyzw ACC, vf22, vf24
lq.xyzw vf22, 2(vi00) | maddz.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L49:
nop | mulaz.xyzw ACC, vf29, vf09
nop | maddaz.xyzw ACC, vf30, vf12
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
nop | nop
1024.0 | miniw.w vf08, vf08, vf03 :i
nop | mulaw.xyzw ACC, vf25, vf09
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
3072.0 | nop :i
sq.xyzw vf11, -1(vi03) | minii.xy vf08, vf08, I
nop | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
L50:
lqi.xyzw vf10, vi01 | mulax.xyzw ACC, vf01, vf11
b L52 | madday.xyzw ACC, vf02, vf11
nop | maddz.xyzw vf11, vf03, vf11
L51:
sq.xyzw vf11, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf11 :i
lqi.xyzw vf10, vi01 | minii.xy vf08, vf08, I
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
L52:
lqi.xyzw vf13, vi01 | add.xyzw vf09, vf09, vf28
lqi.xyzw vf16, vi01 | maxw.w vf08, vf08, vf02
mtir vi12, vf10.x | itof0.xyzw vf23, vf23
mtir vi15, vf10.y | maxx.xyzw vf11, vf11, vf00
sq.xyzw vf09, -4(vi01) | miniw.w vf09, vf09, vf01
div Q, vf01.w, vf09.w | add.zw vf10, vf10, vf17
move.xyzw vf21, vf08 | add.xyzw vf13, vf13, vf18
iand vi12, vi12, vi05 | add.xyzw vf16, vf16, vf19
ilw.w vi08, -1(vi02) | mulax.xyzw ACC, vf04, vf11
ibgtz vi09, L53 | madday.xyzw ACC, vf05, vf11
iand vi15, vi15, vi05 | maddaz.xyzw ACC, vf06, vf11
nop | addx.w vf21, vf21, vf17
L53:
ibne vi05, vi12, L54 | maddw.xyzw vf11, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf09, vf09, Q
mtir vi12, vf13.x | mul.xyzw vf15, vf15, Q
mtir vi15, vf13.y | ftoi4.xyzw vf21, vf21
b L55 | mul.xyzw vf11, vf11, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf09, vf09, vf22
L54:
lq.xyzw vf20, 0(vi12) | mul.xyzw vf15, vf15, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf31, 0(vi15) | ftoi4.xyzw vf21, vf21
lq.xyzw vf25, 0(vi08) | mul.xyzw vf11, vf11, vf23
lq.xyzw vf23, 1(vi12) | add.xyzw vf09, vf09, vf22
lq.xyzw vf20, 1(vi15) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf26, 1(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 2(vi12) | maddz.xyzw vf25, vf25, vf24
lq.xyzw vf23, 2(vi15) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf27, 2(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 3(vi12) | maddz.xyzw vf26, vf26, vf24
lq.xyzw vf31, 3(vi15) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf28, 3(vi08) | madday.xyzw ACC, vf23, vf24
lq.xyzw vf23, 4(vi12) | maddz.xyzw vf27, vf27, vf24
lq.xyzw vf20, 4(vi15) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 5(vi12) | maddz.xyzw vf28, vf28, vf24
lq.xyzw vf23, 5(vi15) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 6(vi12) | maddz.xyz vf29, vf29, vf24
lq.xyzw vf22, 6(vi15) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf31, 6(vi08) | madday.xyzw ACC, vf23, vf24
lqi.xyzw vf23, vi02 | maddz.xyz vf30, vf30, vf24
mtir vi12, vf13.x | mulax.xyzw ACC, vf20, vf24
mtir vi15, vf13.y | madday.xyzw ACC, vf22, vf24
lq.xyzw vf22, 2(vi00) | maddz.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L55: (L70 in og)
ibgez vi09, L56 | mulaz.xyzw ACC, vf29, vf10
sq.xyzw vf21, 2(vi10) | maddaz.xyzw ACC, vf30, vf13
nop | ftoi4.xyzw vf21, vf08
L56:
mfp.w vf20, P | maddz.xyz vf13, vf31, vf16
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
sq.xyzw vf14, 0(vi13) | miniw.w vf09, vf09, vf03
sq.xyzw vf21, 2(vi13) | mulaw.xyzw ACC, vf25, vf10
ilw.y vi09, -6(vi01) | mulw.xyzw vf12, vf12, vf20
1024.0 | ftoi0.xyzw vf11, vf11 :i
erleng.xyz P, vf13 | maxi.xy vf09, vf09, I
ibeq vi07, vi03, L67 | maddaw.xyzw ACC, vf26, vf13
mr32.z vf16, vf00 | maddw.xyzw vf10, vf27, vf16
L57:
sq.xyzw vf12, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
lqi.xyzw vf08, vi01 | minii.xy vf09, vf09, I
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
lqi.xyzw vf11, vi01 | add.xyzw vf10, vf10, vf28
lqi.xyzw vf14, vi01 | maxw.w vf09, vf09, vf02
mtir vi10, vf08.x | itof0.xyzw vf23, vf23
mtir vi13, vf08.y | maxx.xyzw vf12, vf12, vf00
sq.xyzw vf10, -4(vi01) | miniw.w vf10, vf10, vf01
div Q, vf01.w, vf10.w | add.zw vf08, vf08, vf17
move.xyzw vf21, vf09 | add.xyzw vf11, vf11, vf18
iand vi10, vi10, vi05 | add.xyzw vf14, vf14, vf19
ilw.w vi08, -1(vi02) | mulax.xyzw ACC, vf04, vf12
ibgtz vi09, L58 | madday.xyzw ACC, vf05, vf12
iand vi13, vi13, vi05 | maddaz.xyzw ACC, vf06, vf12
nop | addx.w vf21, vf21, vf17
L58:
ibne vi05, vi10, L59 | maddw.xyzw vf12, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf10, vf10, Q
mtir vi10, vf11.x | mul.xyzw vf16, vf16, Q
mtir vi13, vf11.y | ftoi4.xyzw vf21, vf21
b L60 | mul.xyzw vf12, vf12, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf10, vf10, vf22
L59:
lq.xyzw vf20, 0(vi10) | mul.xyzw vf16, vf16, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf31, 0(vi13) | ftoi4.xyzw vf21, vf21
lq.xyzw vf25, 0(vi08) | mul.xyzw vf12, vf12, vf23
lq.xyzw vf23, 1(vi10) | add.xyzw vf10, vf10, vf22
lq.xyzw vf20, 1(vi13) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf26, 1(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 2(vi10) | maddz.xyzw vf25, vf25, vf24
lq.xyzw vf23, 2(vi13) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf27, 2(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 3(vi10) | maddz.xyzw vf26, vf26, vf24
lq.xyzw vf31, 3(vi13) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf28, 3(vi08) | madday.xyzw ACC, vf23, vf24
lq.xyzw vf23, 4(vi10) | maddz.xyzw vf27, vf27, vf24
lq.xyzw vf20, 4(vi13) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 5(vi10) | maddz.xyzw vf28, vf28, vf24
lq.xyzw vf23, 5(vi13) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 6(vi10) | maddz.xyz vf29, vf29, vf24
lq.xyzw vf22, 6(vi13) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf31, 6(vi08) | madday.xyzw ACC, vf23, vf24
lqi.xyzw vf23, vi02 | maddz.xyz vf30, vf30, vf24
mtir vi10, vf11.x | mulax.xyzw ACC, vf20, vf24
mtir vi13, vf11.y | madday.xyzw ACC, vf22, vf24
lq.xyzw vf22, 2(vi00) | maddz.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L60:
ibgez vi09, L61 | mulaz.xyzw ACC, vf29, vf08
sq.xyzw vf21, 2(vi11) | maddaz.xyzw ACC, vf30, vf11
nop | ftoi4.xyzw vf21, vf09
L61:
mfp.w vf20, P | maddz.xyz vf11, vf31, vf14
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
sq.xyzw vf15, 0(vi14) | miniw.w vf10, vf10, vf03
sq.xyzw vf21, 2(vi14) | mulaw.xyzw ACC, vf25, vf08
ilw.y vi09, -6(vi01) | mulw.xyzw vf13, vf13, vf20
1024.0 | ftoi0.xyzw vf12, vf12 :i
erleng.xyz P, vf11 | maxi.xy vf10, vf10, I
ibeq vi07, vi03, L72 | maddaw.xyzw ACC, vf26, vf11
mr32.z vf14, vf00 | maddw.xyzw vf08, vf27, vf14
L62:
sq.xyzw vf13, -1(vi03) | nop
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
lqi.xyzw vf09, vi01 | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
lqi.xyzw vf12, vi01 | add.xyzw vf08, vf08, vf28
lqi.xyzw vf15, vi01 | maxw.w vf10, vf10, vf02
mtir vi11, vf09.x | itof0.xyzw vf23, vf23
mtir vi14, vf09.y | maxx.xyzw vf13, vf13, vf00
sq.xyzw vf08, -4(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | add.zw vf09, vf09, vf17
move.xyzw vf21, vf10 | add.xyzw vf12, vf12, vf18
iand vi11, vi11, vi05 | add.xyzw vf15, vf15, vf19
ilw.w vi08, -1(vi02) | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L63 | madday.xyzw ACC, vf05, vf13
iand vi14, vi14, vi05 | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L63:
ibne vi05, vi11, L64 | maddw.xyzw vf13, vf07, vf00
ilw.x vi09, -9(vi01) | mul.xyz vf08, vf08, Q
mtir vi11, vf12.x | mul.xyzw vf14, vf14, Q
mtir vi14, vf12.y | ftoi4.xyzw vf21, vf21
b L65 | mul.xyzw vf13, vf13, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
L64:
lq.xyzw vf20, 0(vi11) | mul.xyzw vf14, vf14, Q
nop | mulw.xyzw vf24, vf24, vf29
lq.xyzw vf31, 0(vi14) | ftoi4.xyzw vf21, vf21
lq.xyzw vf25, 0(vi08) | mul.xyzw vf13, vf13, vf23
lq.xyzw vf23, 1(vi11) | add.xyzw vf08, vf08, vf22
lq.xyzw vf20, 1(vi14) | mulax.xyzw ACC, vf20, vf24
lq.xyzw vf26, 1(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 2(vi11) | maddz.xyzw vf25, vf25, vf24
lq.xyzw vf23, 2(vi14) | mulax.xyzw ACC, vf23, vf24
lq.xyzw vf27, 2(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 3(vi11) | maddz.xyzw vf26, vf26, vf24
lq.xyzw vf31, 3(vi14) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf28, 3(vi08) | madday.xyzw ACC, vf23, vf24
lq.xyzw vf23, 4(vi11) | maddz.xyzw vf27, vf27, vf24
lq.xyzw vf20, 4(vi14) | mulax.xyzw ACC, vf20, vf24
lq.xyz vf29, 4(vi08) | madday.xyzw ACC, vf31, vf24
lq.xyzw vf31, 5(vi11) | maddz.xyzw vf28, vf28, vf24
lq.xyzw vf23, 5(vi14) | mulax.xyzw ACC, vf23, vf24
lq.xyz vf30, 5(vi08) | madday.xyzw ACC, vf20, vf24
lq.xyzw vf20, 6(vi11) | maddz.xyz vf29, vf29, vf24
lq.xyzw vf22, 6(vi14) | mulax.xyzw ACC, vf31, vf24
lq.xyzw vf31, 6(vi08) | madday.xyzw ACC, vf23, vf24
lqi.xyzw vf23, vi02 | maddz.xyz vf30, vf30, vf24
mtir vi11, vf12.x | mulax.xyzw ACC, vf20, vf24
mtir vi14, vf12.y | madday.xyzw ACC, vf22, vf24
lq.xyzw vf22, 2(vi00) | maddz.xyzw vf31, vf31, vf24
lqi.xyzw vf23, vi03 | itof0.xyzw vf24, vf23
L65:
ibgez vi09, L66 | mulaz.xyzw ACC, vf29, vf09
sq.xyzw vf21, 2(vi12) | maddaz.xyzw ACC, vf30, vf12
nop | ftoi4.xyzw vf21, vf10
L66: (L80 in og)
mfp.w vf20, P | maddz.xyz vf12, vf31, vf15
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | miniw.w vf08, vf08, vf03
sq.xyzw vf21, 2(vi15) | mulaw.xyzw ACC, vf25, vf09
ilw.y vi09, -6(vi01) | mulw.xyzw vf11, vf11, vf20
1024.0 | ftoi0.xyzw vf13, vf13 :i
erleng.xyz P, vf12 | maxi.xy vf08, vf08, I
ibne vi07, vi03, L51 | maddaw.xyzw ACC, vf26, vf12
mr32.z vf15, vf00 | maddw.xyzw vf09, vf27, vf15
b L77 | nop
nop | nop

;;;;;;;;;;; OG merc has a bunch of merc prime alternate paths here.

;;;; next we have 3x pipeline exits.
;;

L67:
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
sq.xyzw vf12, -1(vi03) | minii.xy vf09, vf09, I
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
iaddiu vi05, vi00, 0x173 | add.xyzw vf10, vf10, vf28
lq.xyzw vf26, 1(vi00) | maxw.w vf09, vf09, vf02
iaddi vi08, vi00, 0x1 | itof0.xyzw vf23, vf23
isw.x vi08, -2(vi05) | maxx.xyzw vf12, vf12, vf00
sq.xyzw vf10, -1(vi01) | miniw.w vf10, vf10, vf01
div Q, vf01.w, vf10.w | nop
move.xyzw vf21, vf09 | nop
iaddiu vi08, vi00, 0x42 | nop
isw.z vi08, -1(vi05) | mulax.xyzw ACC, vf04, vf12
ibgtz vi09, L68 | madday.xyzw ACC, vf05, vf12
isw.x vi00, -1(vi05) | maddaz.xyzw ACC, vf06, vf12
nop | addx.w vf21, vf21, vf17
L68:
sq.yzw vf26, -2(vi05) | maddw.xyzw vf12, vf07, vf00
ilw.x vi09, -6(vi01) | mul.xyz vf10, vf10, Q
iaddiu vi08, vi00, 0x171 | mul.xyzw vf16, vf16, Q
nop | ftoi4.xyzw vf21, vf21
nop | mul.xyzw vf12, vf12, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf10, vf10, vf22
ibgez vi09, L69 | nop
sq.xyzw vf21, 2(vi11) | nop
nop | ftoi4.xyzw vf21, vf09
L69:
mfp.w vf20, P | nop
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
sq.xyzw vf15, 0(vi14) | miniw.w vf10, vf10, vf03
sq.xyzw vf21, 2(vi14) | nop
ilw.y vi09, -3(vi01) | mulw.xyzw vf13, vf13, vf20
1024.0 | ftoi0.xyzw vf12, vf12 :i
nop | maxi.xy vf10, vf10, I
nop | nop
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
sq.xyzw vf13, -1(vi03) | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
nop | nop
nop | maxw.w vf10, vf10, vf02
nop | itof0.xyzw vf23, vf23
nop | maxx.xyzw vf13, vf13, vf00
nop | nop
move.xyzw vf21, vf10 | nop
nop | nop
nop | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L70 | madday.xyzw ACC, vf05, vf13
nop | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L70:
nop | maddw.xyzw vf13, vf07, vf00
ilw.x vi09, -3(vi01) | nop
xtop vi05 | nop
iaddiu vi05, vi05, 0x8c | ftoi4.xyzw vf21, vf21
ilwr.z vi01, vi05 | mul.xyzw vf13, vf13, vf23
ilwr.y vi03, vi05 | nop
ibgez vi09, L71 | nop
sq.xyzw vf21, 2(vi12) | nop
nop | ftoi4.xyzw vf21, vf10
L71:
nop | nop
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | nop
sq.xyzw vf21, 2(vi15) | nop
nop | nop
nop | ftoi0.xyzw vf13, vf13
lq.xyzw vf23, 124(vi00) | nop
iadd vi01, vi01, vi05 | nop
iadd vi03, vi03, vi05 | nop
sq.xyzw vf13, 1(vi12) | nop
b L82 | nop
sq.xyzw vf13, 1(vi15) | nop


L72:
3072.0 | mulax.xyzw ACC, vf01, vf13 :i
sq.xyzw vf13, -1(vi03) | minii.xy vf10, vf10, I
sq.xyzw vf12, 1(vi11) | madday.xyzw ACC, vf02, vf13
sq.xyzw vf12, 1(vi14) | maddz.xyzw vf13, vf03, vf13
iaddiu vi05, vi00, 0x173 | add.xyzw vf08, vf08, vf28
lq.xyzw vf26, 1(vi00) | maxw.w vf10, vf10, vf02
iaddi vi08, vi00, 0x1 | itof0.xyzw vf23, vf23
isw.x vi08, -2(vi05) | maxx.xyzw vf13, vf13, vf00
sq.xyzw vf08, -1(vi01) | miniw.w vf08, vf08, vf01
div Q, vf01.w, vf08.w | nop
move.xyzw vf21, vf10 | nop
iaddiu vi08, vi00, 0x42 | nop
isw.z vi08, -1(vi05) | mulax.xyzw ACC, vf04, vf13
ibgtz vi09, L73 | madday.xyzw ACC, vf05, vf13
isw.x vi00, -1(vi05) | maddaz.xyzw ACC, vf06, vf13
nop | addx.w vf21, vf21, vf17
L73:
sq.yzw vf26, -2(vi05) | maddw.xyzw vf13, vf07, vf00
ilw.x vi09, -6(vi01) | mul.xyz vf08, vf08, Q
iaddiu vi08, vi00, 0x171 | mul.xyzw vf14, vf14, Q
nop | ftoi4.xyzw vf21, vf21
nop | mul.xyzw vf13, vf13, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf08, vf08, vf22
ibgez vi09, L74 | nop
sq.xyzw vf21, 2(vi12) | nop
nop | ftoi4.xyzw vf21, vf10
L74:
mfp.w vf20, P | nop
sq.xyzw vf16, 0(vi12) | miniy.xyzw vf13, vf13, vf17
sq.xyzw vf16, 0(vi15) | miniw.w vf08, vf08, vf03
sq.xyzw vf21, 2(vi15) | nop
ilw.y vi09, -3(vi01) | mulw.xyzw vf11, vf11, vf20
1024.0 | ftoi0.xyzw vf13, vf13 :i
nop | maxi.xy vf08, vf08, I
nop | nop
3072.0 | mulax.xyzw ACC, vf01, vf11 :i
sq.xyzw vf11, -1(vi03) | minii.xy vf08, vf08, I
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
nop | nop
nop | maxw.w vf08, vf08, vf02
nop | itof0.xyzw vf23, vf23
nop | maxx.xyzw vf11, vf11, vf00
nop | nop
move.xyzw vf21, vf08 | nop
nop | nop
nop | mulax.xyzw ACC, vf04, vf11
ibgtz vi09, L75 | madday.xyzw ACC, vf05, vf11
nop | maddaz.xyzw ACC, vf06, vf11
nop | addx.w vf21, vf21, vf17
L75:
nop | maddw.xyzw vf11, vf07, vf00
ilw.x vi09, -3(vi01) | nop
xtop vi05 | nop
iaddiu vi05, vi05, 0x8c | ftoi4.xyzw vf21, vf21
ilwr.z vi01, vi05 | mul.xyzw vf11, vf11, vf23
ilwr.y vi03, vi05 | nop
ibgez vi09, L76 | nop
sq.xyzw vf21, 2(vi10) | nop
nop | ftoi4.xyzw vf21, vf08
L76:
nop | nop
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
sq.xyzw vf14, 0(vi13) | nop
sq.xyzw vf21, 2(vi13) | nop
nop | nop
nop | ftoi0.xyzw vf11, vf11
lq.xyzw vf23, 124(vi00) | nop
iadd vi01, vi01, vi05 | nop
iadd vi03, vi03, vi05 | nop
sq.xyzw vf11, 1(vi10) | nop
b L82 | nop
sq.xyzw vf11, 1(vi13) | nop



L77:
3072.0 | mulax.xyzw ACC, vf01, vf11 :i
sq.xyzw vf11, -1(vi03) | minii.xy vf08, vf08, I
sq.xyzw vf13, 1(vi12) | madday.xyzw ACC, vf02, vf11
sq.xyzw vf13, 1(vi15) | maddz.xyzw vf11, vf03, vf11
iaddiu vi05, vi00, 0x173 | add.xyzw vf09, vf09, vf28
lq.xyzw vf26, 1(vi00) | maxw.w vf08, vf08, vf02
iaddi vi08, vi00, 0x1 | itof0.xyzw vf23, vf23
isw.x vi08, -2(vi05) | maxx.xyzw vf11, vf11, vf00
sq.xyzw vf09, -1(vi01) | miniw.w vf09, vf09, vf01
div Q, vf01.w, vf09.w | nop
move.xyzw vf21, vf08 | nop
iaddiu vi08, vi00, 0x42 | nop
isw.z vi08, -1(vi05) | mulax.xyzw ACC, vf04, vf11
ibgtz vi09, L78 | madday.xyzw ACC, vf05, vf11
isw.x vi00, -1(vi05) | maddaz.xyzw ACC, vf06, vf11
nop | addx.w vf21, vf21, vf17
L78:
sq.yzw vf26, -2(vi05) | maddw.xyzw vf11, vf07, vf00
ilw.x vi09, -6(vi01) | mul.xyz vf09, vf09, Q
iaddiu vi08, vi00, 0x171 | mul.xyzw vf15, vf15, Q ;; vi08 = 0x171: output location (fixed?)
nop | ftoi4.xyzw vf21, vf21
nop | mul.xyzw vf11, vf11, vf23
lqi.xyzw vf23, vi03 | add.xyzw vf09, vf09, vf22
ibgez vi09, L79 | nop
sq.xyzw vf21, 2(vi10) | nop
nop | ftoi4.xyzw vf21, vf08
L79:
mfp.w vf20, P | nop
sq.xyzw vf14, 0(vi10) | miniy.xyzw vf11, vf11, vf17
sq.xyzw vf14, 0(vi13) | miniw.w vf09, vf09, vf03
sq.xyzw vf21, 2(vi13) | nop
ilw.y vi09, -3(vi01) | mulw.xyzw vf12, vf12, vf20
1024.0 | ftoi0.xyzw vf11, vf11 :i
nop | maxi.xy vf09, vf09, I
nop | nop
3072.0 | mulax.xyzw ACC, vf01, vf12 :i
sq.xyzw vf12, -1(vi03) | minii.xy vf09, vf09, I
sq.xyzw vf11, 1(vi10) | madday.xyzw ACC, vf02, vf12
sq.xyzw vf11, 1(vi13) | maddz.xyzw vf12, vf03, vf12
nop | nop
nop | maxw.w vf09, vf09, vf02
nop | itof0.xyzw vf23, vf23
nop | maxx.xyzw vf12, vf12, vf00
nop | nop
move.xyzw vf21, vf09 | nop
nop | nop
nop | mulax.xyzw ACC, vf04, vf12
ibgtz vi09, L80 | madday.xyzw ACC, vf05, vf12
nop | maddaz.xyzw ACC, vf06, vf12
nop | addx.w vf21, vf21, vf17
L80:
nop | maddw.xyzw vf12, vf07, vf00
ilw.x vi09, -3(vi01) | nop
xtop vi05 | nop
iaddiu vi05, vi05, 0x8c | ftoi4.xyzw vf21, vf21 ;; vi05 = byte-header
ilwr.z vi01, vi05 | mul.xyzw vf12, vf12, vf23 ;; vi01 = lump
ilwr.y vi03, vi05 | nop ;; vi03 = rgba
ibgez vi09, L81 | nop
sq.xyzw vf21, 2(vi11) | nop
nop | ftoi4.xyzw vf21, vf09
L81:
nop | nop
sq.xyzw vf15, 0(vi11) | miniy.xyzw vf12, vf12, vf17
sq.xyzw vf15, 0(vi14) | nop
sq.xyzw vf21, 2(vi14) | nop
nop | nop
nop | ftoi0.xyzw vf12, vf12
lq.xyzw vf23, 124(vi00) | nop ;; unperspect
iadd vi01, vi01, vi05 | nop ;; lump
iadd vi03, vi03, vi05 | nop ;; rgba
sq.xyzw vf12, 1(vi11) | nop
sq.xyzw vf12, 1(vi14) | nop

;; COMMON finish part (all three pipeline exits end up here)

L82:
xgkick vi08 | nop ;; normal draw?

;; pipeline startup for envmap math
lq.xyzw vf08, 2(vi01) | nop ;; vf08 = transformed vert
lqi.xyzw vf10, vi03 | nop ;; vf10 = transformed normal (n)
ilw.x vi04, 1(vi05) | nop ;; vi04 = mat1-cnt
ilw.y vi06, 1(vi05) | nop ;; vi06 = mat2-cnt
ilw.z vi07, 1(vi05) | mul.xyzw vf09, vf08, vf23 ;; vi07 = mat3-cnt, unperspect the vert (v = vert * vf23)
iadd vi04, vi04, vi06 | subw.z vf10, vf10, vf00 ;; vi04 = mat1-cnt + mat2-cnt, refl1 (n.z -= 1)
iaddi vi01, vi01, 0x3 | nop ;; step lump
iadd vi04, vi04, vi07 | nop ;; vi04 = mat1 + mat2 + mat3 counts
iadd vi02, vi03, vi04 | addw.z vf09, vf00, vf09 ;; vi02 = end rgba, vert1 (v.z = v.w)
iaddi vi02, vi02, 0x2 | nop ;; end rgba more
lq.xyzw vf14, 118(vi00) | maxw.xyzw vf21, vf00, vf00 ;; vf14 = rgba-fade, vf21 = [1, 1, 1, 1]
lq.xyzw vf26, 371(vi00) | nop ;; vf26 = the giftag
nop | mul.xyz vf15, vf09, vf10 ;; multiply: v.xyz * n.xyz (start of dot(v, n))
lq.xyzw vf27, 119(vi00) | nop ;; vf27 = e-adgif0
nop | nop
lq.xyzw vf28, 120(vi00) | nop ;; vf28 = e-adgif1
nop | adday.xyzw vf15, vf15
lq.xyzw vf31, 121(vi00) | maddz.x vf15, vf21, vf15 ;; vf31 = e-adgif2, vf15.x = dot(v, n)
nop | nop
sq.xyzw vf26, 813(vi00) | nop ;; store giftag
lq.xyzw vf08, 2(vi01) | nop ;; pipe
lqi.xyzw vf11, vi03 | nop ;; pipe
div Q, vf15.x, vf10.z | nop ;; div: Q = dot(v, n) / n.z
sq.xyzw vf27, 814(vi00) | mulaw.xyzw ACC, vf09, vf00 ;; store e-ad0, mul: ACC = v
nop | mul.xyzw vf09, vf08, vf23 ;; pipe
sq.xyzw vf28, 815(vi00) | subw.z vf11, vf11, vf00 ;; store e-ad1, pipe
iaddi vi01, vi01, 0x3 | nop ;; pipe
sq.xyzw vf31, 816(vi00) | nop ;; store e-ad2
nop | addw.z vf09, vf00, vf09 ;; pipe
lq.xyzw vf26, 0(vi00) | madd.xyzw vf10, vf10, Q ;; vf26 = tristrip giftag, madd: r = v + Q * n
nop | nop
lq.xyzw vf27, 122(vi00) | nop ;; vf27 = e-ad3
nop | mul.xyz vf15, vf09, vf11 ;; pipe
eleng.xyz P, vf10 | nop ;; len: P = |r.xyz|
lq.xyzw vf28, 123(vi00) | nop ;; vf28 = e-ad4
nop | nop
lq.xyzw vf31, 377(vi00) | adday.xyzw vf15, vf15 ;; vf31 = old tristrip???
nop | maddz.x vf15, vf21, vf15 ;; pipe
mr32.xyzw vf26, vf26 | nop ;; rotate tristrip template
nop | nop
lq.xyzw vf08, 2(vi01) | nop ;; pipe
lqi.xyzw vf12, vi03 | nop ;; pipe
div Q, vf15.x, vf11.z | nop ;; pipe
mr32.xyzw vf26, vf26 | mulaw.xyzw ACC, vf09, vf00 ;; rotate | pipe
sq.xyzw vf27, 817(vi00) | mul.xyzw vf09, vf08, vf23 ;; store adgif3 | pipe
lq.xyzw vf25, -5(vi01) | subw.z vf12, vf12, vf00 ;; vf25 = lump[1] | pipe
iaddi vi01, vi01, 0x3 | nop ;; pipe
sq.xyzw vf28, 818(vi00) | nop ;; e-ad4 store
nop | addw.z vf09, vf00, vf09 ;; pipe
sq.xyzw vf31, 819(vi00) | madd.xyzw vf11, vf11, Q ;; tristrip store | pipe
nop | nop
mfp.w vf10, P | nop ;; vf10.w = |r|
sq.y vf26, 819(vi00) | mul.xyz vf15, vf09, vf12 ;; set abe | pipe
eleng.xyz P, vf11 | nop
nop | nop
div Q, vf23.z, vf10.w | nop ;; NOT PIPE (!): Q = vf23.z / |r|
nop | adday.xyzw vf15, vf15 ;; pipe
nop | maddz.x vf15, vf21, vf15 ;; pipe
nop | nop
nop | add.xyzw vf25, vf25, vf18 ;; lump dest stuff
L83:
lq.xyzw vf08, 2(vi01) | nop ;; pipe
lqi.xyzw vf13, vi03 | addaz.xyzw vf00, vf23 ;; ACC = vf00 + vf23.z
div Q, vf15.x, vf12.z | madd.xyzw vf10, vf10, Q ;; st = vf23.z + r * (vf23.z / |r|)
mtir vi10, vf25.x | mulaw.xyzw ACC, vf09, vf00 ;; vi10, vi13 = output vertex addresses
mtir vi13, vf25.y | mul.xyzw vf09, vf08, vf23
lq.xyzw vf25, -5(vi01) | subw.z vf13, vf13, vf00
;;
iaddi vi01, vi01, 0x3 | nop
lq.xyzw vf24, 0(vi10) | nop ;; first-pass STQ
lq.xyzw vf16, 2(vi10) | addw.z vf09, vf00, vf09 ;; first-pass position
lq.xyzw vf20, 2(vi13) | madd.xyzw vf12, vf12, Q
sq.xyzw vf14, 443(vi10) | nop ;; rgba-fade
mfp.w vf11, P | nop
sq.xyzw vf14, 443(vi13) | mul.xyz vf15, vf09, vf13
eleng.xyz P, vf12 | mulz.xy vf24, vf10, vf24 ;; st.xy *= first-pass Q
sq.xyzw vf16, 444(vi10) | nop ;; position copy
div Q, vf23.z, vf11.w | nop
sq.xyzw vf20, 444(vi13) | adday.xyzw vf15, vf15
sq.xyzw vf24, 442(vi10) | maddz.x vf15, vf21, vf15 ;; envmap STQ
ibeq vi02, vi03, L84 | nop
sq.xyzw vf24, 442(vi13) | add.xyzw vf25, vf25, vf18
lq.xyzw vf08, 2(vi01) | nop
lqi.xyzw vf10, vi03 | addaz.xyzw vf00, vf23
div Q, vf15.x, vf13.z | madd.xyzw vf11, vf11, Q
mtir vi10, vf25.x | mulaw.xyzw ACC, vf09, vf00
mtir vi13, vf25.y | mul.xyzw vf09, vf08, vf23
lq.xyzw vf25, -5(vi01) | subw.z vf10, vf10, vf00
iaddi vi01, vi01, 0x3 | nop
lq.xyzw vf24, 0(vi10) | nop
lq.xyzw vf16, 2(vi10) | addw.z vf09, vf00, vf09
lq.xyzw vf20, 2(vi13) | madd.xyzw vf13, vf13, Q
sq.xyzw vf14, 443(vi10) | nop
mfp.w vf12, P | nop
sq.xyzw vf14, 443(vi13) | mul.xyz vf15, vf09, vf10
eleng.xyz P, vf13 | mulz.xy vf24, vf11, vf24
sq.xyzw vf16, 444(vi10) | nop
div Q, vf23.z, vf12.w | nop
sq.xyzw vf20, 444(vi13) | adday.xyzw vf15, vf15
sq.xyzw vf24, 442(vi10) | maddz.x vf15, vf21, vf15
ibeq vi02, vi03, L84 | nop
sq.xyzw vf24, 442(vi13) | add.xyzw vf25, vf25, vf18
lq.xyzw vf08, 2(vi01) | nop
lqi.xyzw vf11, vi03 | addaz.xyzw vf00, vf23
div Q, vf15.x, vf10.z | madd.xyzw vf12, vf12, Q
mtir vi10, vf25.x | mulaw.xyzw ACC, vf09, vf00
mtir vi13, vf25.y | mul.xyzw vf09, vf08, vf23
lq.xyzw vf25, -5(vi01) | subw.z vf11, vf11, vf00
iaddi vi01, vi01, 0x3 | nop
lq.xyzw vf24, 0(vi10) | nop
lq.xyzw vf16, 2(vi10) | addw.z vf09, vf00, vf09
lq.xyzw vf20, 2(vi13) | madd.xyzw vf10, vf10, Q
sq.xyzw vf14, 443(vi10) | nop
mfp.w vf13, P | nop
sq.xyzw vf14, 443(vi13) | mul.xyz vf15, vf09, vf11
eleng.xyz P, vf10 | mulz.xy vf24, vf12, vf24
sq.xyzw vf16, 444(vi10) | nop
div Q, vf23.z, vf13.w | nop
sq.xyzw vf20, 444(vi13) | adday.xyzw vf15, vf15
sq.xyzw vf24, 442(vi10) | maddz.x vf15, vf21, vf15
ibeq vi02, vi03, L84 | nop
sq.xyzw vf24, 442(vi13) | add.xyzw vf25, vf25, vf18
lq.xyzw vf08, 2(vi01) | nop
lqi.xyzw vf12, vi03 | addaz.xyzw vf00, vf23
div Q, vf15.x, vf11.z | madd.xyzw vf13, vf13, Q
mtir vi10, vf25.x | mulaw.xyzw ACC, vf09, vf00
mtir vi13, vf25.y | mul.xyzw vf09, vf08, vf23
lq.xyzw vf25, -5(vi01) | subw.z vf12, vf12, vf00
iaddi vi01, vi01, 0x3 | nop
lq.xyzw vf24, 0(vi10) | nop
lq.xyzw vf16, 2(vi10) | addw.z vf09, vf00, vf09
lq.xyzw vf20, 2(vi13) | madd.xyzw vf11, vf11, Q
sq.xyzw vf14, 443(vi10) | nop
mfp.w vf10, P | nop
sq.xyzw vf14, 443(vi13) | mul.xyz vf15, vf09, vf12
eleng.xyz P, vf11 | mulz.xy vf24, vf13, vf24
sq.xyzw vf16, 444(vi10) | nop
div Q, vf23.z, vf10.w | nop
sq.xyzw vf20, 444(vi13) | adday.xyzw vf15, vf15
sq.xyzw vf24, 442(vi10) | maddz.x vf15, vf21, vf15
ibne vi02, vi03, L83 | nop
sq.xyzw vf24, 442(vi13) | add.xyzw vf25, vf25, vf18
L84:
iaddiu vi08, vi00, 0x32d | nop ;; 0x32d = 813, where the giftag was stored
xgkick vi08 | nop
nop | nop :e
nop | nop
```
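The skinning and transform steps of the main loop can be restated in C++ to make the listing easier to follow. This is a minimal sketch inferred from the register usage in the listing, not the OpenGOAL renderer's code. It makes four assumptions:

- Each bone matrix is the seven quadwords loaded from `0(viNN)` through `6(viNN)`.
- Quadwords 0 to 3 transform the position. Its x, y and z sit in the `.w` lanes of the three lump quadwords (`mulaw`/`maddaw`/`maddw`, then `add ... vf28`).
- Quadwords 4 to 6 transform the normal, which is taken from the `.z` lanes (`mulaz`/`maddaz`/`maddz.xyz`).
- The blend weights are already floats.

The `mat 3` path blends three matrices using the `.x`/`.y`/`.z` weight lanes. The two-matrix blend at `L44` uses `.z`/`.w` and corresponds to `wc = 0` here. The per-quadword bias adds (`vf17`, `vf18`, `vf19`) are left out. `Vec4`, `BoneMatrix`, `blend3`, `transform_position` and `transform_normal` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// 4-lane float standing in for a VU vfNN register (hypothetical type).
struct Vec4 {
  float x, y, z, w;
};

Vec4 operator*(const Vec4& a, float s) { return {a.x * s, a.y * s, a.z * s, a.w * s}; }
Vec4 operator+(const Vec4& a, const Vec4& b) {
  return {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// One bone matrix as loaded by lq 0(viNN)..6(viNN).
// Assumption: quadwords 0-3 act on the position, 4-6 on the normal.
using BoneMatrix = std::array<Vec4, 7>;

// Weighted sum of three bone matrices, quadword by quadword
// (the mulax / madday / maddz groups of the "mat 3" path).
BoneMatrix blend3(const BoneMatrix& a, const BoneMatrix& b, const BoneMatrix& c,
                  float wa, float wb, float wc) {
  BoneMatrix out{};
  for (int i = 0; i < 7; i++) {
    out[i] = a[i] * wa + b[i] * wb + c[i] * wc;
  }
  return out;
}

// Position: x, y, z from the .w lanes of the lump quadwords
// (mulaw / maddaw / maddw), then quadword 3 added as the translation.
Vec4 transform_position(const BoneMatrix& m, float x, float y, float z) {
  return m[0] * x + m[1] * y + m[2] * z + m[3];
}

// Normal: components from the .z lanes (mulaz / maddaz / maddz.xyz),
// then normalized (erleng.xyz P, mfp, mulw).
Vec4 transform_normal(const BoneMatrix& m, float nx, float ny, float nz) {
  Vec4 n = m[4] * nx + m[5] * ny + m[6] * nz;
  float inv_len = 1.0f / std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
  return {n.x * inv_len, n.y * inv_len, n.z * inv_len, 0.0f};  // w lane not used here
}
```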
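The lighting code that runs on each normalized normal matches the "up to 3 directional lights, plus an ambient light" and "vertex colors" effects from the outline. It consists of:

- the `mulax`/`madday`/`maddz` against `vf01` to `vf03`;
- the `maxx` against `vf00`;
- the accumulation over `vf04` to `vf07`;
- the multiply by the `itof0`'d vertex color;
- `miniy` against `vf17`, then `ftoi0`.

The sketch below transcribes that arithmetic under these assumptions:

- `vf01` to `vf03` hold the three light directions transposed, so each result lane is one light's N·L.
- `vf04` to `vf06` hold the light colors.
- `vf07` holds the ambient color.
- `vf17.y` is the color clamp.

`LightRegs` and `light_vertex` are hypothetical names, and no particular color scale is implied.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<float, 4>;  // stands in for a VU vfNN register

// Hypothetical bundle of the registers the lighting code reads.
struct LightRegs {
  V4 dir[3];        // vf01, vf02, vf03: scaled by n.x, n.y, n.z
  V4 color[3];      // vf04, vf05, vf06: scaled by the clamped intensities
  V4 ambient;       // vf07: added once (maddw with vf00.w == 1)
  float max_color;  // vf17.y: upper clamp (miniy)
};

// n: normalized normal; rgba: vertex color after itof0.
std::array<int32_t, 4> light_vertex(const LightRegs& L, const V4& n, const V4& rgba) {
  // mulax / madday / maddz, then maxx with vf00.x (== 0): lane j is light j's N.L
  V4 inten{};
  for (int j = 0; j < 4; j++) {
    float d = L.dir[0][j] * n[0] + L.dir[1][j] * n[1] + L.dir[2][j] * n[2];
    inten[j] = std::max(d, 0.0f);
  }
  std::array<int32_t, 4> out{};
  for (int j = 0; j < 4; j++) {
    // mulax / madday / maddaz / maddw: light colors weighted by intensities x, y, z
    float c = L.color[0][j] * inten[0] + L.color[1][j] * inten[1] +
              L.color[2][j] * inten[2] + L.ambient[j];
    c *= rgba[j];                      // mul by the vertex color
    c = std::min(c, L.max_color);      // miniy with vf17.y
    out[j] = static_cast<int32_t>(c);  // ftoi0 truncates toward zero
  }
  return out;
}
```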
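After the skinned position is stored back with `sq.xyzw vfNN, -4(vi01)`, each main-loop iteration performs a perspective divide and a series of clamps before `ftoi4`. This sketch follows the listing's order of operations:

1. `miniw.w` against `vf01.w`.
2. `div Q, vf01.w, vf08.w`.
3. Scale `xyz` and the texture quadword by `Q`. The `mr32.z` on the third lump quadword sets its `z` to 1, so the STQ appears to become `(s*Q, t*Q, Q)`.
4. Add `vf22`, loaded from `2(vi00)`.
5. Clamp `w` between `vf02.w` and `vf03.w`, and clamp `x`/`y` to `[1024, 3072]` through the `I` register.
6. Convert to 12.4 fixed point.

Field names in `ProjRegs` are hypothetical. The conditional `addx.w vf21, vf21, vf17` guarded by `vi09` is not modelled.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<float, 4>;

// Hypothetical names for the constants used around the divide.
struct ProjRegs {
  float q_num;    // vf01.w: w is clamped to it (miniw.w) and it is the divide numerator
  V4 offset;      // vf22, loaded from 2(vi00), added after the divide
  float fog_min;  // vf02.w (maxw.w)
  float fog_max;  // vf03.w (miniw.w)
};

struct ProjectedVertex {
  std::array<int32_t, 4> xyzf;  // after ftoi4 (12.4 fixed point)
  std::array<float, 3> stq;     // (s * Q, t * Q, Q)
};

// p: skinned position; s, t: texture coordinates from the third lump quadword.
ProjectedVertex project(const ProjRegs& R, V4 p, float s, float t) {
  p[3] = std::min(p[3], R.q_num);                   // miniw.w vf08, vf08, vf01
  float q = R.q_num / p[3];                         // div Q, vf01.w, vf08.w
  for (int i = 0; i < 3; i++) p[i] *= q;            // mul.xyz vf08, vf08, Q
  for (int i = 0; i < 4; i++) p[i] += R.offset[i];  // add.xyzw vf08, vf08, vf22
  p[3] = std::min(p[3], R.fog_max);                 // miniw.w vf08, vf08, vf03
  for (int i = 0; i < 2; i++) {
    p[i] = std::max(p[i], 1024.0f);                 // maxi.xy with I = 1024.0
    p[i] = std::min(p[i], 3072.0f);                 // minii.xy with I = 3072.0
  }
  p[3] = std::max(p[3], R.fog_min);                 // maxw.w vf08, vf08, vf02
  ProjectedVertex out{};
  for (int i = 0; i < 4; i++) {
    out.xyzf[i] = static_cast<int32_t>(p[i] * 16.0f);  // ftoi4
  }
  out.stq = {s * q, t * q, q};  // third lane set to 1 by mr32.z, then scaled by Q
  return out;
}
```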
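The envmap pass is the startup code after `L82` plus the loop at `L83`, which is unrolled four times. It computes a second set of texture coordinates per vertex and writes a copy of each vertex at three offsets:

- `+442`: the new STQ;
- `+443`: the `rgba-fade` color from `118(vi00)`;
- `+444`: the position read from `2(vi10)`.

The packet is then kicked from the giftag at `813` (`0x32d`). The per-vertex arithmetic, transcribed lane by lane from the listing, is sketched below. It is a reading of the microcode, not a verified port. `envmap_stq` is a hypothetical name, and `k` stands for `vf23.z` (the `unperspect` vector loaded from `124(vi00)`). The VU's non-IEEE divide (it has no infinities) is not modelled, so a normal with `z == 1` divides by zero here.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using V4 = std::array<float, 4>;

// vert: position read from 2(vi01); unperspect: vf23 from 124(vi00);
// normal: normalized normal read back through vi03; stq: first-pass STQ at 0(vi10).
// Returns the STQ written to 442(vi10).
V4 envmap_stq(const V4& vert, const V4& unperspect, const V4& normal, const V4& stq) {
  V4 v{};
  for (int i = 0; i < 4; i++) v[i] = vert[i] * unperspect[i];  // mul.xyzw vf09, vf08, vf23
  v[2] = v[3];                                                   // addw.z vf09, vf00, vf09
  V4 n = normal;
  n[2] -= 1.0f;                                                  // subw.z vf10, vf10, vf00
  float d = v[0] * n[0] + v[1] * n[1] + v[2] * n[2];             // mul.xyz, adday, maddz.x
  float ratio = d / n[2];                                        // div Q, vf15.x, vf10.z
  float r[3];
  for (int i = 0; i < 3; i++) r[i] = v[i] + ratio * n[i];        // mulaw ACC, then madd
  float len = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);  // eleng.xyz
  float k = unperspect[2];
  float scale = k / len;                                         // div Q, vf23.z, vf10.w
  float s = k + r[0] * scale;                                    // addaz ACC, then madd
  float t = k + r[1] * scale;
  return V4{s * stq[2], t * stq[2], stq[2], stq[3]};             // mulz.xy by the first-pass Q
}
```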